RAS-1925: Investigate Rops interaction with collection exercise #1073

Open
SteveScorfield wants to merge 1 commit into main from investigation-into-rops-interaction-with-ce

@SteveScorfield SteveScorfield commented Jun 1, 2026

Contributor

What and why?

Multiple calls are made to the CE service when navigating through a CE in ROPS, and most of them are unnecessary when we are just populating lists and data. For instance, selecting a specific period of a CE from the list populates the page with data that was already retrieved for the list page. Equally, navigating back to the CE list page fetches the same data again. From what I found, about 7 unnecessary calls are made to the CE service. All of this data could be held in a Redis session. Storing it there and refreshing it only when changes are made or the session expires would keep calls to a minimum. The changes here are a starting point and don't necessarily need to be implemented this way.
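As a rough illustration of the idea (not the code in this PR), the sketch below shows a cache-aside lookup that is refreshed only on expiry or after an explicit invalidation. `TTLCache` is a hypothetical in-memory stand-in for a Redis client so the example runs without a server; the key format, the 600-second expiry, and the `fetch_from_ce_service` callable are all assumptions.

```python
import json
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for a Redis client (get / set with expiry / delete)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry has outlived its TTL, so behave as if it was never stored.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._store[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


class CollectionExerciseCache:
    EXPIRY = 600  # assumed TTL in seconds, chosen for illustration only

    def __init__(self, cache: TTLCache, fetch_from_ce_service: Callable[[str], List[Any]]) -> None:
        self._cache = cache
        self._fetch = fetch_from_ce_service  # hypothetical call to the CE service

    @staticmethod
    def _key(survey_id: str) -> str:
        return f"collection-exercises:{survey_id}"  # hypothetical key format

    def get_collection_exercises(self, survey_id: str) -> List[Any]:
        cached = self._cache.get(self._key(survey_id))
        if cached is not None:
            return json.loads(cached)
        # Cache miss: call the CE service once and remember the result.
        result = self._fetch(survey_id)
        self._cache.set(self._key(survey_id), json.dumps(result), self.EXPIRY)
        return result

    def invalidate(self, survey_id: str) -> None:
        """Call after a CE is created or edited so the next read is fresh."""
        self._cache.delete(self._key(survey_id))
```

With this shape, repeated page loads for the same survey hit the CE service once per expiry window, and any create/edit path would call `invalidate` so stale data is not served.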

How to test?

  1. Deploy this locally
  2. Play about creating CEs

Note: As this is a concept, the tests are not working correctly. I didn't want to fully update the tests on a concept piece of work.

Jira

https://officefornationalstatistics.atlassian.net/browse/RAS-1925

@SteveScorfield SteveScorfield requested a review from a team as a code owner June 1, 2026 06:34

@arroyoAle arroyoAle left a comment

Contributor


This looks like a good start, but I think we need a bit more information about where collection exercise is being called from and which calls might be redundant. It might also be worth thinking about whether Redis is appropriate for this. We do this for surveys, so it might be similar, but since we create/edit collection exercises more regularly, we might want to always request the most up-to-date information from collection exercise.

self.set(redis_key, json.dumps(result), self.EXPIRY)
return result

def remove_old_collection_exercises(self, survey_id: str) -> str:

Contributor


Is there a reason we'd need to remove them from the cache instead of relying on the TTL to clear the exercises?

@LJBabbage

Contributor

Would Redis improve performance? Yes, but it doesn't address the root cause. The underlying issue is the architecture and code design: the logic is in the wrong place and is performing the wrong operations. There is also a broader data-volume problem around Collection Exercises (CEs) and Cases, which is not being managed effectively.

What's happening today

ROPS calls the Case service and requests every case for a party since the start of RAS/RM in 2017. In PreProd, this can return up to 1,374 cases for a single party, and the party ranked 100th by case count still has 497 cases.

The process then becomes highly inefficient:

ROPS loops through every returned case.
For each case, it makes an individual call to the Collection Exercise service.
The Collection Exercise service holds 1,798 CEs.
After processing all of these calls, ROPS attempts to determine which CEs are "live".
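To make the cost of the steps above concrete, here is a small sketch contrasting one lookup per case with one lookup per distinct CE id. It is not the ROPS code: the `ce_id` key on each case and the `get_ce` callable are assumptions made for illustration, and deduplication is shown only to highlight how many of the calls repeat, not as the proposed fix.

```python
from typing import Any, Callable, Dict, Iterable, List, Mapping


def fetch_ce_per_case(
    cases: Iterable[Mapping[str, Any]], get_ce: Callable[[str], Any]
) -> List[Any]:
    """One CE service call for every case, as in the loop described above."""
    return [get_ce(case["ce_id"]) for case in cases]


def fetch_ce_once_per_id(
    cases: Iterable[Mapping[str, Any]], get_ce: Callable[[str], Any]
) -> Dict[str, Any]:
    """Same lookups, but each distinct CE id is requested only once."""
    seen: Dict[str, Any] = {}
    for case in cases:
        ce_id = case["ce_id"]
        if ce_id not in seen:
            seen[ce_id] = get_ce(ce_id)
    return seen
```

The first function makes as many calls as there are cases; the second is bounded by the number of distinct CEs those cases reference.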

Even this determination is flawed. A CE is considered live if the current date is after scheduledStartDateTime, but no check is made to see whether the CE has already ended.
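A minimal sketch of a check that also considers the end of the exercise might look like the following. Only `scheduledStartDateTime` is named in this thread; the `scheduledEndDateTime` field, the ISO 8601 string format, and treating a missing end date as still live are all assumptions.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Optional


def is_live(ce: Mapping[str, Any], now: Optional[datetime] = None) -> bool:
    """Return True only if the CE has started and has not yet ended."""
    if now is None:
        now = datetime.now(timezone.utc)
    start = datetime.fromisoformat(ce["scheduledStartDateTime"])
    if now < start:
        return False
    # Hypothetical field name: the real CE payload may call this something else.
    end_raw = ce.get("scheduledEndDateTime")
    if end_raw is None:
        # Assumption: a CE with no end date is treated as still live.
        return True
    return now < datetime.fromisoformat(end_raw)
```

The timestamps are expected to carry a UTC offset (for example `+00:00`) so they can be compared with the timezone-aware `now`.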

ROPS then takes the latest CE for each survey and ultimately produces a very small result set—often fewer than 20 records containing only three columns of data. However, getting to that result requires a large amount of processing and a potentially huge number of service calls.

Would Redis help?

Redis would likely improve the performance of the CE lookups, but it would only reduce the cost of an inefficient design.

The real issue is that ROPS should not be making up to 1,374 calls in the first place. It should be making a single call and receiving the data it actually needs.

From looking at the code, this logic appears better suited to the Case service. The Case service already holds both ce_id and survey_id within Case Groups. With support from the Collection Exercise service (which could itself use Redis for caching), the Case service should be able to determine the relevant records and return only the required results to ROPS.
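As a sketch of what that could look like on the Case service side, the function below takes a party's case groups and a CE lookup and returns one CE per survey. The `CaseGroup` type, the `ce_by_id` mapping, and the choice of `scheduledStartDateTime` as the measure of "latest" are assumptions for illustration; the real service would decide how CE data is obtained and ordered.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Mapping, NamedTuple


class CaseGroup(NamedTuple):
    """Hypothetical view of a Case Group: just the two ids mentioned above."""
    ce_id: str
    survey_id: str


def latest_ce_per_survey(
    case_groups: Iterable[CaseGroup],
    ce_by_id: Mapping[str, Mapping[str, Any]],
) -> Dict[str, Mapping[str, Any]]:
    """Return the CE with the most recent start date for each survey."""
    latest: Dict[str, Mapping[str, Any]] = {}
    for group in case_groups:
        ce = ce_by_id.get(group.ce_id)
        if ce is None:
            continue  # CE not known to the lookup; skip rather than fail
        start = datetime.fromisoformat(ce["scheduledStartDateTime"])
        current = latest.get(group.survey_id)
        if current is None or start > datetime.fromisoformat(current["scheduledStartDateTime"]):
            latest[group.survey_id] = ce
    return latest
```

ROPS would then receive only the small result set it displays, rather than assembling it from per-case calls.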

Longer-term considerations

At a higher level, we should stop returning historical data going all the way back to 2017 by default. Either:

Remove historical CEs and Cases entirely where they are not needed; or
Introduce sensible limits and filtering so only relevant, recent records are returned.

Without addressing the volume problem, we will continue to process and transfer large amounts of unnecessary data regardless of any caching improvements.


3 participants