Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough: Added an async HTTP middleware that logs request durations.
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 1 passed | ❌ 2 failed
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Hey - I've found 4 issues and left some high-level feedback:
- In `log_request_duration`, consider including request metadata like HTTP method, path, and response status code in the log entry so that duration metrics can be correlated with specific endpoints and outcomes.
- The new authentication and authorization logs (e.g., failed API key in `fetch_user`, unauthorized study and setup operations) are all emitted at `info` level; it may be more appropriate to use `warning` (or similar) for these security-relevant events to make them easier to surface operationally.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `log_request_duration`, consider including request metadata like HTTP method, path, and response status code in the log entry so that duration metrics can be correlated with specific endpoints and outcomes.
- The new authentication and authorization logs (e.g., failed API key in `fetch_user`, unauthorized study and setup operations) are all emitted at `info` level; it may be more appropriate to use `warning` (or similar) for these security-relevant events to make them easier to surface operationally.
## Individual Comments
### Comment 1
<location path="src/core/logging.py" line_range="58-64" />
<code_context>
return await call_next(request)
+async def log_request_duration(
+ request: Request,
+ call_next: Callable[[Request], Awaitable[Response]],
+) -> Response:
+ """Log the process and wallclock time for each call.
+
+ Reported times cannot be attributed solely to processing the request.
+ As multiple requests can be handled concurrently in the same process,
+ process time may be spent on other requests as well. The same goes for
+ wallclock time, which is additionally influenced by e.g., context switches.
+ """
+ start_mono_ns = time.monotonic_ns()
+ start_process_ns = time.process_time_ns()
+ response: Response = await call_next(request)
+
+ duration_mono_ns = time.monotonic_ns() - start_mono_ns
+ duration_process_ns = time.process_time_ns() - start_process_ns
+ logger.info(
+ "Request took {mono_ms} ms wallclock time (process time {process_ms} ms)",
+ mono_ms=int(duration_mono_ns / 1_000_000),
</code_context>
<issue_to_address>
**suggestion:** Include request metadata (method, path, status) in duration logs for better traceability.
Right now the log entry only includes timing. Please extend the `logger.info` call to also log fields like `method=request.method`, `path=request.url.path`, and `status_code=response.status_code` so individual duration logs can be correlated with specific requests in isolation.
```suggestion
logger.info(
"Request {method} {path} [{status_code}] took {mono_ms} ms wallclock time (process time {process_ms} ms)",
method=request.method,
path=request.url.path,
status_code=response.status_code,
mono_ms=int(duration_mono_ns / 1_000_000),
process_ms=int(duration_process_ns / 1_000_000),
wallclock_time_ns=duration_mono_ns,
process_time_ns=duration_process_ns,
)
```
</issue_to_address>
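As a quick sanity check on the suggestion above: Loguru interpolates keyword arguments into the message with `str.format`-style placeholders, so the suggested template can be previewed standalone. The method, path, and timing values below are illustrative, not taken from the project.

```python
# Loguru formats its message template with str.format semantics, so the
# suggested log line can be previewed without any logging setup.
# All field values here are made up for illustration.
template = (
    "Request {method} {path} [{status_code}] took {mono_ms} ms "
    "wallclock time (process time {process_ms} ms)"
)
line = template.format(
    method="GET",
    path="/studies/1",
    status_code=200,
    mono_ms=12,
    process_ms=3,
)
print(line)
```

Keeping the raw `wallclock_time_ns`/`process_time_ns` kwargs out of the template (as the suggestion does) still attaches them to the record as structured fields for sinks that serialize extras.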
### Comment 2
<location path="src/routers/dependencies.py" line_range="41-44" />
<code_context>
+
+ duration_mono_ns = time.monotonic_ns() - start_mono_ns
+ duration_process_ns = time.process_time_ns() - start_process_ns
+ logger.info(
+ "Request took {mono_ms} ms wallclock time (process time {process_ms} ms)",
+ mono_ms=int(duration_mono_ns / 1_000_000),
</code_context>
<issue_to_address>
**nitpick (typo):** Fix wording in authentication success log message.
The log message currently says `"User {identifier} authenticated in with api key ..."`, which has an extra `in`. Please adjust the wording (e.g., `"User {identifier} authenticated with api key ending in '{api_key}'"`) to keep logs clear for analysis.
</issue_to_address>
### Comment 3
<location path="src/routers/openml/study.py" line_range="77-80" />
<code_context>
msg = (
f"You may not remove tag {tag!r} of setup {setup_id} because it was not created by you."
)
+ logger.warning(
+ "User attempted to remove tag '{tag}' from setup {setup_id}.",
+ setup_id=setup_id,
</code_context>
<issue_to_address>
**suggestion:** Include the user identifier in the unauthorized study modification warning.
Since `user` is available here, please also include a user identifier in this warning (e.g. `user_id=user.user_id`) so audit logs clearly show who attempted the unauthorized operation.
</issue_to_address>
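A minimal sketch of the attributable warning, assuming `user` exposes a `user_id` attribute as the comment suggests; stdlib `logging` stands in for Loguru, and the tag and setup values are made up:

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("setups-demo")

@dataclass
class User:
    user_id: int  # assumed attribute, mirroring `user.user_id` in the review

def warn_unauthorized_untag(user: User, tag: str, setup_id: int) -> str:
    # Include who attempted the operation so audit logs are attributable.
    msg = f"User {user.user_id} attempted to remove tag {tag!r} from setup {setup_id}."
    logger.warning(msg)
    return msg

message = warn_unauthorized_untag(User(user_id=42), "weka", 10)
```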
### Comment 4
<location path="tests/dependencies/fetch_user_test.py" line_range="20-22" />
<code_context>
],
)
async def test_fetch_user(api_key: str, user: User, user_test: AsyncConnection) -> None:
- db_user = await fetch_user(api_key, user_data=user_test)
+ async with aclosing(fetch_user(api_key, user_data=user_test)) as agen:
+ db_user = await anext(agen)
assert isinstance(db_user, User)
assert user.user_id == db_user.user_id
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests asserting the new authentication logging behavior for successful and failed logins.
Current tests don’t verify the new logging paths. Please add tests that capture or patch the logger to assert that: (1) a successful `fetch_user` call emits the expected log with `user_id` and masked API key; (2) an invalid API key raises `AuthenticationFailedError` and logs the failure with the masked API key; and (3) `fetch_user_or_raise` logs when no user is provided.
Suggested implementation:
```python
from contextlib import aclosing
import logging
import pytest
from sqlalchemy.ext.asyncio import AsyncConnection
```
```python
async def test_fetch_user_invalid_key_raises(
    user_test: AsyncConnection, caplog: pytest.LogCaptureFixture
) -> None:
    invalid_api_key = "invalid-api-key"
    with caplog.at_level(logging.INFO):
        with pytest.raises(AuthenticationFailedError):
            async with aclosing(fetch_user(api_key=invalid_api_key, user_data=user_test)) as agen:
                # Trigger the first iteration which should raise
                await anext(agen)
    # Ensure the failure was logged and the raw API key is not exposed
    assert any(
        "authentication failed" in record.getMessage().lower()
        for record in caplog.records
    )
    assert any("api_key=" in record.getMessage() for record in caplog.records)
    assert all(
        invalid_api_key not in record.getMessage()
        for record in caplog.records
    )


async def test_fetch_user_logs_success(
    api_key: str,
    user: User,
    user_test: AsyncConnection,
    caplog: pytest.LogCaptureFixture,
) -> None:
    with caplog.at_level(logging.INFO):
        async with aclosing(fetch_user(api_key, user_data=user_test)) as agen:
            db_user = await anext(agen)
    assert isinstance(db_user, User)
    assert user.user_id == db_user.user_id
    # Ensure a success log is emitted with user_id and a masked API key
    messages = [record.getMessage() for record in caplog.records]
    assert any(str(user.user_id) in message for message in messages)
    assert any("api_key=" in message for message in messages)
    # Raw API key should never appear in logs
    assert all(api_key not in message for message in messages)


async def test_fetch_user_or_raise_logs_no_user(
    user_test: AsyncConnection,
    caplog: pytest.LogCaptureFixture,
) -> None:
    with caplog.at_level(logging.INFO):
        with pytest.raises(AuthenticationFailedError):
            await fetch_user_or_raise(api_key=None, user=None, user_data=user_test)
    # Ensure the "no user" condition is logged with masked/absent API key
    messages = [record.getMessage().lower() for record in caplog.records]
    assert any("no user" in message for message in messages)
```
To integrate these tests correctly, you will also need to:
1. Ensure the following are imported at the top of `tests/dependencies/fetch_user_test.py` (or adjust the names/paths to match your codebase):
- `from app.dependencies.fetch_user import fetch_user, fetch_user_or_raise, AuthenticationFailedError`
- `from app.models import User` (or wherever `User` is defined).
2. Adjust the logger name, log level, and log message substrings in the assertions if your implementation uses:
- A non-INFO log level for these messages.
- A different wording than `"authentication failed"` or `"no user"`.
3. If your logging uses structured fields (e.g. `extra={"user_id": ..., "api_key": ...}`) instead of encoding everything in the message, you may want to assert on `record.user_id` / `record.api_key` instead of `record.getMessage()`.
4. If the API key masking format is stricter (for example, only showing a fixed prefix), replace the generic checks (`"api_key=" in message` and `api_key not in message`) with an assertion that matches your exact masking convention.
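For point 3 above, here is a self-contained sketch of asserting on structured fields rather than the rendered message. Stdlib `logging` with a capture handler stands in for `caplog`, and the field names (`user_id`, `api_key_suffix`) are assumptions, not the project's actual schema:

```python
import logging

class CaptureHandler(logging.Handler):
    """Minimal stand-in for pytest's caplog fixture."""
    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)

logger = logging.getLogger("auth-demo")
handler = CaptureHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)

api_key = "AD000000000000000000000000000000"  # made-up key for illustration
# Log only the key's tail; structured fields travel via `extra` onto the record.
logger.info(
    "User %s authenticated with api key ending in '%s'",
    42,
    api_key[-4:],
    extra={"user_id": 42, "api_key_suffix": api_key[-4:]},
)

record = handler.records[0]
```

Fields passed through `extra` become attributes on the `LogRecord`, so tests can check `record.user_id` directly instead of parsing `record.getMessage()`.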
</issue_to_address>
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##             main     #308     +/- ##
========================================
+ Coverage   93.28%   93.56%   +0.27%
========================================
  Files          72       72
  Lines        3070     3109      +39
  Branches      222      222
========================================
+ Hits         2864     2909      +45
+ Misses        151      143       -8
- Partials       55       57       +2
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/core/logging.py (1)
52-65: Emit duration logs for failed requests too.
Currently, duration is only logged on successful `call_next`. Wrap timing/logging in `try/finally` so error paths are also measured.

♻️ Proposed adjustment

```diff
-    response: Response = await call_next(request)
-
-    duration_mono_ns = time.monotonic_ns() - start_mono_ns
-    duration_process_ns = time.process_time_ns() - start_process_ns
-    logger.info(
-        "Request took {mono_ms} ms wallclock time (process time {process_ms} ms)",
-        mono_ms=int(duration_mono_ns / 1_000_000),
-        process_ms=int(duration_process_ns / 1_000_000),
-        wallclock_time_ns=duration_mono_ns,
-        process_time_ns=duration_process_ns,
-    )
-    return response
+    try:
+        response: Response = await call_next(request)
+        return response
+    finally:
+        duration_mono_ns = time.monotonic_ns() - start_mono_ns
+        duration_process_ns = time.process_time_ns() - start_process_ns
+        logger.info(
+            "Request took {mono_ms} ms wallclock time (process time {process_ms} ms)",
+            mono_ms=int(duration_mono_ns / 1_000_000),
+            process_ms=int(duration_process_ns / 1_000_000),
+            wallclock_time_ns=duration_mono_ns,
+            process_time_ns=duration_process_ns,
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/logging.py` around lines 52 - 65, Wrap the timing and logging around the call to call_next in a try/finally so durations are computed and logger.info is invoked regardless of whether call_next raises; capture start_mono_ns and start_process_ns before calling call_next, compute duration_mono_ns and duration_process_ns in the finally block, and ensure the logger.info call still uses the same keys (mono_ms, process_ms, wallclock_time_ns, process_time_ns) before re-raising or returning the response from the try block (refer to start_mono_ns, start_process_ns, call_next, duration_mono_ns, duration_process_ns, and logger.info).

src/routers/openml/study.py (1)
77-81: Bound `entity_ids` logging payload to avoid oversized log events.
`entity_ids` can be arbitrarily large; logging full lists in both paths can inflate log volume and slow sinks. Prefer logging `entity_count` (and optionally a short sample).

♻️ Proposed adjustment

```diff
     logger.warning(
         "User attempted to attach entities to study they do not own.",
         study_id=study_id,
-        entity_ids=entity_ids,
+        entity_count=len(entity_ids),
     )
@@
     logger.info(
         "User {user_id} attached entities to study {study_id}.",
         study_id=study_id,
-        entity_ids=entity_ids,
+        entity_count=len(entity_ids),
         user_id=user.user_id,
     )
```

Also applies to: 102-107
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/openml/study.py` around lines 77 - 81, The current logger.warning call (and the similar logging at the other path) logs the full entity_ids list which may be arbitrarily large; change these logs to record entity_count=len(entity_ids) and a short sample (e.g., entity_ids[:N]) instead of the entire list, and include study_id as before; update both logger.warning and the other logging call around the same logic (the other block that references entity_ids) to use the count+sample pattern and ensure the sample size is small and safely serializable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/core/logging.py`:
- Around line 24-28: The logger is configured with enqueue=True which requires
an explicit drain on shutdown; update the application shutdown/lifespan handler
that currently calls close_databases() to also call logger.complete() after
closing DBs to flush queued records. If logger is not in scope in that handler,
import the logger object defined in core.logging (the one where
sink_configuration and logger.add(...) are used) and invoke logger.complete() as
the final step of the shutdown sequence. Ensure this call runs synchronously
during shutdown so all enqueued records are flushed before process exit.
---
Nitpick comments:
In `@src/core/logging.py`:
- Around line 52-65: Wrap the timing and logging around the call to call_next in
a try/finally so durations are computed and logger.info is invoked regardless of
whether call_next raises; capture start_mono_ns and start_process_ns before
calling call_next, compute duration_mono_ns and duration_process_ns in the
finally block, and ensure the logger.info call still uses the same keys
(mono_ms, process_ms, wallclock_time_ns, process_time_ns) before re-raising or
returning the response from the try block (refer to start_mono_ns,
start_process_ns, call_next, duration_mono_ns, duration_process_ns, and
logger.info).
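The `try/finally` timing pattern described in the prompt above can be sketched independently of FastAPI and Loguru; the hypothetical `handle` coroutine stands in for `await call_next(request)`:

```python
import asyncio
import time

async def handle() -> str:
    # Stand-in for `await call_next(request)`: yields control, may raise.
    await asyncio.sleep(0.01)
    return "ok"

async def timed_call() -> dict[str, int]:
    start_mono_ns = time.monotonic_ns()
    start_process_ns = time.process_time_ns()
    try:
        await handle()
    finally:
        # Runs even if handle() raises, so failed requests are measured too.
        duration_mono_ns = time.monotonic_ns() - start_mono_ns
        duration_process_ns = time.process_time_ns() - start_process_ns
    return {
        "mono_ms": int(duration_mono_ns / 1_000_000),
        "process_ms": int(duration_process_ns / 1_000_000),
    }

timings = asyncio.run(timed_call())
```

As the original docstring notes, neither number can be attributed solely to this request when requests are handled concurrently in the same process.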
In `@src/routers/openml/study.py`:
- Around line 77-81: The current logger.warning call (and the similar logging at
the other path) logs the full entity_ids list which may be arbitrarily large;
change these logs to record entity_count=len(entity_ids) and a short sample
(e.g., entity_ids[:N]) instead of the entire list, and include study_id as
before; update both logger.warning and the other logging call around the same
logic (the other block that references entity_ids) to use the count+sample
pattern and ensure the sample size is small and safely serializable.
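The count-plus-sample pattern from the prompt can be sketched as follows; the function name, sample size, and use of stdlib `logging` are assumptions for illustration, not the project's actual API:

```python
import logging

logger = logging.getLogger("study-demo")
SAMPLE_SIZE = 5  # keep the sample small and safely serializable

def bounded_log_payload(study_id: int, entity_ids: list[int]) -> dict:
    # Never log the full list: record its size plus a short head sample.
    payload = {
        "study_id": study_id,
        "entity_count": len(entity_ids),
        "entity_sample": entity_ids[:SAMPLE_SIZE],
    }
    logger.warning(
        "User attempted to attach entities to study they do not own.",
        extra=payload,
    )
    return payload

payload = bounded_log_payload(7, list(range(10_000)))
```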
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ed9b7853-b75a-4162-9bda-d883688194ea
📒 Files selected for processing (7)
- src/core/logging.py
- src/main.py
- src/routers/dependencies.py
- src/routers/openml/datasets.py
- src/routers/openml/setups.py
- src/routers/openml/study.py
- tests/dependencies/fetch_user_test.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main.py`:
- Line 37: The call to logger.complete() in the async lifespan must be awaited
so coroutine sinks finish cleanly; update the async lifespan function to use
await logger.complete() (i.e., replace the bare logger.complete() invocation
with an awaited call inside the async function where logger.complete() is
currently called) to ensure Loguru's async sink tasks are awaited before
shutdown.
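The shutdown ordering described above can be sketched with the stdlib only; `FakeLogger.complete()` stands in for Loguru's awaitable `logger.complete()`, and `close_databases` mirrors the teardown hook mentioned in the review:

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []

class FakeLogger:
    """Stand-in for Loguru: complete() drains queued/async sink work."""
    async def complete(self) -> None:
        await asyncio.sleep(0)  # let pending sink tasks run
        events.append("logs flushed")

logger = FakeLogger()

async def close_databases() -> None:
    events.append("databases closed")

@asynccontextmanager
async def lifespan(app: object):
    yield
    # Shutdown sequence: close DBs first, then drain the log queue last,
    # so records emitted during teardown are flushed before exit.
    await close_databases()
    await logger.complete()

async def main() -> None:
    async with lifespan(app=None):
        events.append("serving")

asyncio.run(main())
```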
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
We will restructure the use of Lifespan and test client creation in a separate PR.
Add common logging for authentication, and also add logging to endpoints which mutate the data.