feat: pagination in services and types#61
Conversation
Reviewer's GuideAdds optional, backward-compatible pagination support to the GA4GH registry services and service types endpoints, and updates the OpenAPI specification to describe the new query parameters and paginated response schemas. Sequence diagram for optional pagination in getServices endpointsequenceDiagram
actor Client
participant FlaskServer
participant getServices
participant ServicesCollection
Client->>FlaskServer: GET /services?page&page_size
FlaskServer->>getServices: getServices
alt [page and page_size missing]
getServices->>ServicesCollection: find(filter, projection)
ServicesCollection-->>getServices: cursor(all services)
getServices-->>FlaskServer: list(services)
else [page or page_size provided]
getServices->>ServicesCollection: count_documents({})
ServicesCollection-->>getServices: total_count
getServices->>ServicesCollection: find(filter, projection)
ServicesCollection-->>getServices: cursor
getServices->>ServicesCollection: cursor.skip(skip_items).limit(page_size)
ServicesCollection-->>getServices: paginated_services
getServices-->>FlaskServer: {results, pagination}
end
FlaskServer-->>Client: 200 OK
Sequence diagram for optional pagination in getServiceTypes endpointsequenceDiagram
actor Client
participant FlaskServer
participant getServiceTypes
participant getServices
Client->>FlaskServer: GET /service/types?page&page_size
FlaskServer->>getServiceTypes: getServiceTypes
getServiceTypes->>getServices: getServices.__wrapped__
getServices-->>getServiceTypes: services or {results, pagination}
getServiceTypes->>getServiceTypes: build uniq_types
alt [page and page_size missing]
getServiceTypes-->>FlaskServer: uniq_types
else [page or page_size provided]
getServiceTypes->>getServiceTypes: paginate uniq_types
getServiceTypes-->>FlaskServer: {results, pagination}
end
FlaskServer-->>Client: 200 OK
Entity relationship diagram for new pagination response schemaserDiagram
PaginationMetadata {
int page
int page_size
int total
}
PaginatedExternalService {
}
PaginatedServiceType {
}
ExternalService {
}
ServiceType {
}
PaginatedExternalService ||--o{ ExternalService : results
PaginatedExternalService ||--|| PaginationMetadata : pagination
PaginatedServiceType ||--o{ ServiceType : results
PaginatedServiceType ||--|| PaginationMetadata : pagination
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- In
getServices, thetotal_pagescalculation causes aBadRequestwhentotal_count == 0(e.g., page=1 and no records yieldtotal_pages == 0), which is surprising; consider treating empty collections as a valid first page and returning an emptyresultslist with appropriate pagination metadata instead of erroring. - Both
getServicesandgetServiceTypesnow contain nearly identical pagination logic (defaults, bounds checking, skip/limit or slicing); it would be cleaner and less error-prone to extract this into a shared helper that can be reused across endpoints. - The mixed return type of
getServices(raw list vs. object withresults/pagination) forces callers likegetServiceTypesto useisinstancechecks and__wrapped__workarounds; consider normalizing the return shape (e.g., always a paginated object with sensible defaults) or providing a separate internal helper that always returns a list.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `getServices`, the `total_pages` calculation causes a `BadRequest` when `total_count == 0` (e.g., page=1 and no records yield `total_pages == 0`), which is surprising; consider treating empty collections as a valid first page and returning an empty `results` list with appropriate pagination metadata instead of erroring.
- Both `getServices` and `getServiceTypes` now contain nearly identical pagination logic (defaults, bounds checking, skip/limit or slicing); it would be cleaner and less error-prone to extract this into a shared helper that can be reused across endpoints.
- The mixed return type of `getServices` (raw list vs. object with `results`/`pagination`) forces callers like `getServiceTypes` to use `isinstance` checks and `__wrapped__` workarounds; consider normalizing the return shape (e.g., always a paginated object with sensible defaults) or providing a separate internal helper that always returns a list.
## Individual Comments
### Comment 1
<location path="cloud_registry/ga4gh/registry/server.py" line_range="45-48" />
<code_context>
+ # Return paginated response
+ page = page or 1
+ page_size = page_size or 10
+ total_count = db_collection_service.count_documents({})
+ total_pages = floor(total_count / page_size) + (1 if total_count % page_size > 0 else 0)
+
+ if page < 1 or page > total_pages:
+ raise BadRequest
+
</code_context>
<issue_to_address>
**issue:** Requesting page 1 on an empty collection results in BadRequest instead of an empty paginated response.
Because `total_count == 0` yields `total_pages = 0`, any `page >= 1` satisfies `page > total_pages` and raises `BadRequest`. This makes `/services?page=1` on an empty store an error instead of an empty `results` with pagination metadata, and it’s inconsistent with `getServiceTypes`, where `page > total_pages` is only invalid when `total_count > 0`. Please adjust the condition so that when `total_count == 0`, `page=1` (and possibly higher pages) returns an empty `results` instead of an error.
</issue_to_address>
### Comment 2
<location path="cloud_registry/ga4gh/registry/server.py" line_range="43-46" />
<code_context>
+ return list(records)
+
+ # Return paginated response
+ page = page or 1
+ page_size = page_size or 10
+ total_count = db_collection_service.count_documents({})
+ total_pages = floor(total_count / page_size) + (1 if total_count % page_size > 0 else 0)
+
+ if page < 1 or page > total_pages:
</code_context>
<issue_to_address>
**issue (bug_risk):** page_size is not validated and can be zero or negative, leading to runtime errors and invalid skip/limit values.
If a client sends `page_size=0` or a negative value, this will raise `ZeroDivisionError` when computing `total_pages` and also result in invalid `.limit(page_size)` and negative skip values. The same applies to the `getServiceTypes` pagination. Please enforce `page_size >= 1` (and optionally a maximum) and reject invalid values with a `BadRequest`.
</issue_to_address>
### Comment 3
<location path="cloud_registry/ga4gh/registry/server.py" line_range="101-102" />
<code_context>
+ page = request.args.get("page", type=int)
+ page_size = request.args.get("page_size", type=int)
+
services = getServices.__wrapped__()
+ if isinstance(services, dict):
+ services = services["results"]
types = [s["type"] for s in services]
</code_context>
<issue_to_address>
**issue (bug_risk):** Pagination parameters on /service-types currently affect the underlying /services call, which restricts the set of services considered when computing distinct service types.
Because `page` and `page_size` are still present in `request.args` when calling `getServices.__wrapped__()`, `getServices` paginates the services before `getServiceTypes` builds the type list from `services["results"]`, and then you paginate the types again. `getServiceTypes` should derive distinct types from the full, unpaginated service set, and only then apply pagination to the types. Consider calling a helper that always returns the complete service list or otherwise disabling pagination inside `getServices` for this endpoint.
</issue_to_address>
### Comment 4
<location path="cloud_registry/api/20201108.e0358db.openapi.yaml" line_range="24-35" />
<code_context>
+ schema:
+ type: integer
+ example: 1
+ - name: page_size
+ in: query
+ required: false
+ schema:
+ type: integer
+ example: 10
responses:
'200':
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The OpenAPI schema for pagination parameters does not constrain page/page_size to positive integers, which conflicts with server-side expectations.
Backend code requires `page` and `page_size` to be positive, non-zero values, but the OpenAPI spec only marks them as generic integers. Please add `minimum: 1` (and optionally an upper `maximum`) to both schemas so that validation and generated clients correctly enforce the backend constraints.
```suggestion
- name: page
in: query
required: false
schema:
type: integer
minimum: 1
example: 1
- name: page_size
in: query
required: false
schema:
type: integer
minimum: 1
example: 10
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| services = getServices.__wrapped__() | ||
| if isinstance(services, dict): |
There was a problem hiding this comment.
issue (bug_risk): Pagination parameters on /service-types currently affect the underlying /services call, which restricts the set of services considered when computing distinct service types.
Because page and page_size are still present in request.args when calling getServices.__wrapped__(), getServices paginates the services before getServiceTypes builds the type list from services["results"], and then you paginate the types again. getServiceTypes should derive distinct types from the full, unpaginated service set, and only then apply pagination to the types. Consider calling a helper that always returns the complete service list or otherwise disabling pagination inside getServices for this endpoint.
Description
Adds backward compatible pagination to the
servicesandservice/typesendpoint.Fixes #58 (issue)
Type of change
New feature (non-breaking change which adds functionality)
Checklist:
Summary by Sourcery
Add optional pagination support to the services and service types endpoints while preserving the existing non-paginated behaviour.
New Features:
Enhancements: