Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough: Added an async HTTP middleware that logs request durations.
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 1 passed | ❌ 2 failed
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Hey - I've found 4 issues and left some high-level feedback:
- In `log_request_duration`, consider including request metadata like HTTP method, path, and response status code in the log entry so that duration metrics can be correlated with specific endpoints and outcomes.
- The new authentication and authorization logs (e.g., failed API key in `fetch_user`, unauthorized study and setup operations) are all emitted at `info` level; it may be more appropriate to use `warning` (or similar) for these security-relevant events to make them easier to surface operationally.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `log_request_duration`, consider including request metadata like HTTP method, path, and response status code in the log entry so that duration metrics can be correlated with specific endpoints and outcomes.
- The new authentication and authorization logs (e.g., failed API key in `fetch_user`, unauthorized study and setup operations) are all emitted at `info` level; it may be more appropriate to use `warning` (or similar) for these security-relevant events to make them easier to surface operationally.
## Individual Comments
### Comment 1
<location path="src/core/logging.py" line_range="58-64" />
<code_context>
return await call_next(request)
+async def log_request_duration(
+ request: Request,
+ call_next: Callable[[Request], Awaitable[Response]],
+) -> Response:
+ """Log the process and wallclock time for each call.
+
+ Reported times cannot be attributed solely to processing the request.
+ As multiple requests can be handled concurrently in the same process,
+ process time may be spent on other requests as well. The same goes for
+ wallclock time, which is additionally influenced by e.g., context switches.
+ """
+ start_mono_ns = time.monotonic_ns()
+ start_process_ns = time.process_time_ns()
+ response: Response = await call_next(request)
+
+ duration_mono_ns = time.monotonic_ns() - start_mono_ns
+ duration_process_ns = time.process_time_ns() - start_process_ns
+ logger.info(
+ "Request took {mono_ms} ms wallclock time (process time {process_ms} ms)",
+ mono_ms=int(duration_mono_ns / 1_000_000),
</code_context>
<issue_to_address>
**suggestion:** Include request metadata (method, path, status) in duration logs for better traceability.
Right now the log entry only includes timing. Please extend the `logger.info` call to also log fields like `method=request.method`, `path=request.url.path`, and `status_code=response.status_code` so individual duration logs can be correlated with specific requests in isolation.
```suggestion
logger.info(
"Request {method} {path} [{status_code}] took {mono_ms} ms wallclock time (process time {process_ms} ms)",
method=request.method,
path=request.url.path,
status_code=response.status_code,
mono_ms=int(duration_mono_ns / 1_000_000),
process_ms=int(duration_process_ns / 1_000_000),
wallclock_time_ns=duration_mono_ns,
process_time_ns=duration_process_ns,
)
```
</issue_to_address>
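As a quick sanity check on the suggestion above: Loguru interpolates keyword arguments into the message with `str.format`-style placeholders, so the suggested template can be previewed standalone. The method, path, and timing values below are illustrative, not taken from the project.

```python
# Loguru formats its message template with str.format semantics, so the
# suggested log line can be previewed without any logging setup.
# All field values here are made up for illustration.
template = (
    "Request {method} {path} [{status_code}] took {mono_ms} ms "
    "wallclock time (process time {process_ms} ms)"
)
line = template.format(
    method="GET",
    path="/studies/1",
    status_code=200,
    mono_ms=12,
    process_ms=3,
)
print(line)
```

Keeping the raw `wallclock_time_ns`/`process_time_ns` kwargs out of the template (as the suggestion does) still attaches them to the record as structured fields for sinks that serialize extras.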
### Comment 2
<location path="src/routers/dependencies.py" line_range="41-44" />
<code_context>
+
+ duration_mono_ns = time.monotonic_ns() - start_mono_ns
+ duration_process_ns = time.process_time_ns() - start_process_ns
+ logger.info(
+ "Request took {mono_ms} ms wallclock time (process time {process_ms} ms)",
+ mono_ms=int(duration_mono_ns / 1_000_000),
</code_context>
<issue_to_address>
**nitpick (typo):** Fix wording in authentication success log message.
The log message currently says `"User {identifier} authenticated in with api key ..."`, which has an extra `in`. Please adjust the wording (e.g., `"User {identifier} authenticated with api key ending in '{api_key}'"`) to keep logs clear for analysis.
</issue_to_address>
### Comment 3
<location path="src/routers/openml/study.py" line_range="77-80" />
<code_context>
msg = (
f"You may not remove tag {tag!r} of setup {setup_id} because it was not created by you."
)
+ logger.warning(
+ "User attempted to remove tag '{tag}' from setup {setup_id}.",
+ setup_id=setup_id,
</code_context>
<issue_to_address>
**suggestion:** Include the user identifier in the unauthorized study modification warning.
Since `user` is available here, please also include a user identifier in this warning (e.g. `user_id=user.user_id`) so audit logs clearly show who attempted the unauthorized operation.
</issue_to_address>
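A minimal sketch of the attributable warning, assuming `user` exposes a `user_id` attribute as the comment suggests; stdlib `logging` stands in for Loguru, and the tag and setup values are made up:

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("setups-demo")

@dataclass
class User:
    user_id: int  # assumed attribute, mirroring `user.user_id` in the review

def warn_unauthorized_untag(user: User, tag: str, setup_id: int) -> str:
    # Include who attempted the operation so audit logs are attributable.
    msg = f"User {user.user_id} attempted to remove tag {tag!r} from setup {setup_id}."
    logger.warning(msg)
    return msg

message = warn_unauthorized_untag(User(user_id=42), "weka", 10)
```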
### Comment 4
<location path="tests/dependencies/fetch_user_test.py" line_range="20-22" />
<code_context>
],
)
async def test_fetch_user(api_key: str, user: User, user_test: AsyncConnection) -> None:
- db_user = await fetch_user(api_key, user_data=user_test)
+ async with aclosing(fetch_user(api_key, user_data=user_test)) as agen:
+ db_user = await anext(agen)
assert isinstance(db_user, User)
assert user.user_id == db_user.user_id
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests asserting the new authentication logging behavior for successful and failed logins.
Current tests don’t verify the new logging paths. Please add tests that capture or patch the logger to assert that: (1) a successful `fetch_user` call emits the expected log with `user_id` and masked API key; (2) an invalid API key raises `AuthenticationFailedError` and logs the failure with the masked API key; and (3) `fetch_user_or_raise` logs when no user is provided.
Suggested implementation:
```python
from contextlib import aclosing
import logging
import pytest
from sqlalchemy.ext.asyncio import AsyncConnection
```
```python
async def test_fetch_user_invalid_key_raises(
    user_test: AsyncConnection, caplog: pytest.LogCaptureFixture
) -> None:
    invalid_api_key = "invalid-api-key"
    with caplog.at_level(logging.INFO):
        with pytest.raises(AuthenticationFailedError):
            async with aclosing(fetch_user(api_key=invalid_api_key, user_data=user_test)) as agen:
                # Trigger the first iteration which should raise
                await anext(agen)
    # Ensure the failure was logged and the raw API key is not exposed
    assert any(
        "authentication failed" in record.getMessage().lower()
        for record in caplog.records
    )
    assert any("api_key=" in record.getMessage() for record in caplog.records)
    assert all(
        invalid_api_key not in record.getMessage()
        for record in caplog.records
    )


async def test_fetch_user_logs_success(
    api_key: str,
    user: User,
    user_test: AsyncConnection,
    caplog: pytest.LogCaptureFixture,
) -> None:
    with caplog.at_level(logging.INFO):
        async with aclosing(fetch_user(api_key, user_data=user_test)) as agen:
            db_user = await anext(agen)
    assert isinstance(db_user, User)
    assert user.user_id == db_user.user_id
    # Ensure a success log is emitted with user_id and a masked API key
    messages = [record.getMessage() for record in caplog.records]
    assert any(str(user.user_id) in message for message in messages)
    assert any("api_key=" in message for message in messages)
    # Raw API key should never appear in logs
    assert all(api_key not in message for message in messages)


async def test_fetch_user_or_raise_logs_no_user(
    user_test: AsyncConnection,
    caplog: pytest.LogCaptureFixture,
) -> None:
    with caplog.at_level(logging.INFO):
        with pytest.raises(AuthenticationFailedError):
            await fetch_user_or_raise(api_key=None, user=None, user_data=user_test)
    # Ensure the "no user" condition is logged with masked/absent API key
    messages = [record.getMessage().lower() for record in caplog.records]
    assert any("no user" in message for message in messages)
```
To integrate these tests correctly, you will also need to:
1. Ensure the following are imported at the top of `tests/dependencies/fetch_user_test.py` (or adjust the names/paths to match your codebase):
- `from app.dependencies.fetch_user import fetch_user, fetch_user_or_raise, AuthenticationFailedError`
- `from app.models import User` (or wherever `User` is defined).
2. Adjust the logger name, log level, and log message substrings in the assertions if your implementation uses:
- A non-INFO log level for these messages.
- A different wording than `"authentication failed"` or `"no user"`.
3. If your logging uses structured fields (e.g. `extra={"user_id": ..., "api_key": ...}`) instead of encoding everything in the message, you may want to assert on `record.user_id` / `record.api_key` instead of `record.getMessage()`.
4. If the API key masking format is stricter (for example, only showing a fixed prefix), replace the generic checks (`"api_key=" in message` and `api_key not in message`) with an assertion that matches your exact masking convention.
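For point 3 above, here is a self-contained sketch of asserting on structured fields rather than the rendered message. Stdlib `logging` with a capture handler stands in for `caplog`, and the field names (`user_id`, `api_key_suffix`) are assumptions, not the project's actual schema:

```python
import logging

class CaptureHandler(logging.Handler):
    """Minimal stand-in for pytest's caplog fixture."""
    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)

logger = logging.getLogger("auth-demo")
handler = CaptureHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)

api_key = "AD000000000000000000000000000000"  # made-up key for illustration
# Log only the key's tail; structured fields travel via `extra` onto the record.
logger.info(
    "User %s authenticated with api key ending in '%s'",
    42,
    api_key[-4:],
    extra={"user_id": 42, "api_key_suffix": api_key[-4:]},
)

record = handler.records[0]
```

Fields passed through `extra` become attributes on the `LogRecord`, so tests can check `record.user_id` directly instead of parsing `record.getMessage()`.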
</issue_to_address>
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##             main     #308     +/- ##
========================================
+ Coverage   93.28%   93.56%   +0.27%
========================================
  Files          72       72
  Lines        3070     3109      +39
  Branches      222      222
========================================
+ Hits         2864     2909      +45
+ Misses        151      143       -8
- Partials       55       57       +2
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/core/logging.py (1)
52-65: Emit duration logs for failed requests too.
Currently, duration is only logged on successful `call_next`. Wrap timing/logging in `try/finally` so error paths are also measured.

♻️ Proposed adjustment

```diff
-    response: Response = await call_next(request)
-
-    duration_mono_ns = time.monotonic_ns() - start_mono_ns
-    duration_process_ns = time.process_time_ns() - start_process_ns
-    logger.info(
-        "Request took {mono_ms} ms wallclock time (process time {process_ms} ms)",
-        mono_ms=int(duration_mono_ns / 1_000_000),
-        process_ms=int(duration_process_ns / 1_000_000),
-        wallclock_time_ns=duration_mono_ns,
-        process_time_ns=duration_process_ns,
-    )
-    return response
+    try:
+        response: Response = await call_next(request)
+        return response
+    finally:
+        duration_mono_ns = time.monotonic_ns() - start_mono_ns
+        duration_process_ns = time.process_time_ns() - start_process_ns
+        logger.info(
+            "Request took {mono_ms} ms wallclock time (process time {process_ms} ms)",
+            mono_ms=int(duration_mono_ns / 1_000_000),
+            process_ms=int(duration_process_ns / 1_000_000),
+            wallclock_time_ns=duration_mono_ns,
+            process_time_ns=duration_process_ns,
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/logging.py` around lines 52 - 65, Wrap the timing and logging around the call to call_next in a try/finally so durations are computed and logger.info is invoked regardless of whether call_next raises; capture start_mono_ns and start_process_ns before calling call_next, compute duration_mono_ns and duration_process_ns in the finally block, and ensure the logger.info call still uses the same keys (mono_ms, process_ms, wallclock_time_ns, process_time_ns) before re-raising or returning the response from the try block (refer to start_mono_ns, start_process_ns, call_next, duration_mono_ns, duration_process_ns, and logger.info).

src/routers/openml/study.py (1)
77-81: Bound `entity_ids` logging payload to avoid oversized log events.
`entity_ids` can be arbitrarily large; logging full lists in both paths can inflate log volume and slow sinks. Prefer logging `entity_count` (and optionally a short sample).

♻️ Proposed adjustment

```diff
     logger.warning(
         "User attempted to attach entities to study they do not own.",
         study_id=study_id,
-        entity_ids=entity_ids,
+        entity_count=len(entity_ids),
     )
@@
     logger.info(
         "User {user_id} attached entities to study {study_id}.",
         study_id=study_id,
-        entity_ids=entity_ids,
+        entity_count=len(entity_ids),
         user_id=user.user_id,
     )
```

Also applies to: 102-107
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/study.py` around lines 77 - 81, The current logger.warning call (and the similar logging at the other path) logs the full entity_ids list which may be arbitrarily large; change these logs to record entity_count=len(entity_ids) and a short sample (e.g., entity_ids[:N]) instead of the entire list, and include study_id as before; update both logger.warning and the other logging call around the same logic (the other block that references entity_ids) to use the count+sample pattern and ensure the sample size is small and safely serializable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/core/logging.py`:
- Around line 24-28: The logger is configured with enqueue=True which requires
an explicit drain on shutdown; update the application shutdown/lifespan handler
that currently calls close_databases() to also call logger.complete() after
closing DBs to flush queued records. If logger is not in scope in that handler,
import the logger object defined in core.logging (the one where
sink_configuration and logger.add(...) are used) and invoke logger.complete() as
the final step of the shutdown sequence. Ensure this call runs synchronously
during shutdown so all enqueued records are flushed before process exit.
---
Nitpick comments:
In `@src/core/logging.py`:
- Around line 52-65: Wrap the timing and logging around the call to call_next in
a try/finally so durations are computed and logger.info is invoked regardless of
whether call_next raises; capture start_mono_ns and start_process_ns before
calling call_next, compute duration_mono_ns and duration_process_ns in the
finally block, and ensure the logger.info call still uses the same keys
(mono_ms, process_ms, wallclock_time_ns, process_time_ns) before re-raising or
returning the response from the try block (refer to start_mono_ns,
start_process_ns, call_next, duration_mono_ns, duration_process_ns, and
logger.info).
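The `try/finally` timing pattern described in the prompt above can be sketched independently of FastAPI and Loguru; the hypothetical `handle` coroutine stands in for `await call_next(request)`:

```python
import asyncio
import time

async def handle() -> str:
    # Stand-in for `await call_next(request)`: yields control, may raise.
    await asyncio.sleep(0.01)
    return "ok"

async def timed_call() -> dict[str, int]:
    start_mono_ns = time.monotonic_ns()
    start_process_ns = time.process_time_ns()
    try:
        await handle()
    finally:
        # Runs even if handle() raises, so failed requests are measured too.
        duration_mono_ns = time.monotonic_ns() - start_mono_ns
        duration_process_ns = time.process_time_ns() - start_process_ns
    return {
        "mono_ms": int(duration_mono_ns / 1_000_000),
        "process_ms": int(duration_process_ns / 1_000_000),
    }

timings = asyncio.run(timed_call())
```

As the original docstring notes, neither number can be attributed solely to this request when requests are handled concurrently in the same process.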
In `@src/routers/openml/study.py`:
- Around line 77-81: The current logger.warning call (and the similar logging at
the other path) logs the full entity_ids list which may be arbitrarily large;
change these logs to record entity_count=len(entity_ids) and a short sample
(e.g., entity_ids[:N]) instead of the entire list, and include study_id as
before; update both logger.warning and the other logging call around the same
logic (the other block that references entity_ids) to use the count+sample
pattern and ensure the sample size is small and safely serializable.
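The count-plus-sample pattern from the prompt can be sketched as follows; the function name, sample size, and use of stdlib `logging` are assumptions for illustration, not the project's actual API:

```python
import logging

logger = logging.getLogger("study-demo")
SAMPLE_SIZE = 5  # keep the sample small and safely serializable

def bounded_log_payload(study_id: int, entity_ids: list[int]) -> dict:
    # Never log the full list: record its size plus a short head sample.
    payload = {
        "study_id": study_id,
        "entity_count": len(entity_ids),
        "entity_sample": entity_ids[:SAMPLE_SIZE],
    }
    logger.warning(
        "User attempted to attach entities to study they do not own.",
        extra=payload,
    )
    return payload

payload = bounded_log_payload(7, list(range(10_000)))
```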
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ed9b7853-b75a-4162-9bda-d883688194ea
📒 Files selected for processing (7)
- src/core/logging.py
- src/main.py
- src/routers/dependencies.py
- src/routers/openml/datasets.py
- src/routers/openml/setups.py
- src/routers/openml/study.py
- tests/dependencies/fetch_user_test.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main.py`:
- Line 37: The call to logger.complete() in the async lifespan must be awaited
so coroutine sinks finish cleanly; update the async lifespan function to use
await logger.complete() (i.e., replace the bare logger.complete() invocation
with an awaited call inside the async function where logger.complete() is
currently called) to ensure Loguru's async sink tasks are awaited before
shutdown.
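The shutdown ordering described above can be sketched with the stdlib only; `FakeLogger.complete()` stands in for Loguru's awaitable `logger.complete()`, and `close_databases` mirrors the teardown hook mentioned in the review:

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []

class FakeLogger:
    """Stand-in for Loguru: complete() drains queued/async sink work."""
    async def complete(self) -> None:
        await asyncio.sleep(0)  # let pending sink tasks run
        events.append("logs flushed")

logger = FakeLogger()

async def close_databases() -> None:
    events.append("databases closed")

@asynccontextmanager
async def lifespan(app: object):
    yield
    # Shutdown sequence: close DBs first, then drain the log queue last,
    # so records emitted during teardown are flushed before exit.
    await close_databases()
    await logger.complete()

async def main() -> None:
    async with lifespan(app=None):
        events.append("serving")

asyncio.run(main())
```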
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
We will restructure the use of Lifespan and test client creation in a separate PR.
Add common logging for authentication, and also add logging to endpoints which mutate the data.