Refactor grain_data_processing_test and improve time#3918
Conversation
|
🤖 Hi @aireenmei, I've received your request, and I'm working on it now! You can track my progress in the logs for more details. |
There was a problem hiding this comment.
This PR successfully refactors grain_data_processing_test.py to improve test execution time. By introducing specialized mixins for setup and determinism, it avoids redundant and expensive checks in variant subclasses while actually increasing the validation depth for the primary format tests.
🔍 General Feedback
- Refactoring Quality: The use of
GrainDeterminismMixinand private setup mixins like_GrainArrayRecordSetupis a clean architectural improvement that makes the test suite more modular and maintainable. - Coverage: The addition of auxiliary tensor checks in the determinism tests is a proactive improvement to the overall test quality.
- Performance: This change should significantly reduce the time spent in
grain_data_processing_test.pyduring CI runs by eliminating redundanttest_batch_determinismandtest_for_loop_repeatablecalls in specialized variant tests.
There was a problem hiding this comment.
🟢 This is a solid refactoring. Moving the determinism tests into a separate mixin (GrainDeterminismMixin) and setup logic into format-specific mixins (_GrainArrayRecordSetup, _GrainTFRecordSetup) effectively reduces redundant test execution in variant subclasses, which will significantly improve CI performance.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
b3ec285 to
6c2949e
Compare
| super().setUpClass() | ||
|
|
||
| def setUp(self): | ||
| """common setup for ArrayReord format""" |
There was a problem hiding this comment.
nit: Common and ArrayRecord
There was a problem hiding this comment.
Fixed. Thanks for catching
6c2949e to
1ab5b5b
Compare
1ab5b5b to
feaf467
Compare
Description
As data pipeline unit tests in grain_data_processing_test.py grow, it take a lot of time from the CI runner. The base test includes three tests: test_train_ds, test_batch_determinism, test_for_loop_repeatable. And all the variants inherit these three test. Since it's unlikely for the config variants to change the determinism and for_loop_repeatable behavior, this PR refactor introduces _Grain{format}Setup, GrainBaseProcessingTest, GrainDeterminismMixin class, and have the config variant test only inherit _Grain{format}Setup and GrainBaseProcessingTest to reduce the time for running determinism tests.
Tests
When running locally (CI runner time will be different but expect similar percentage)
Before this PR, all tests in grain_data_processing_test.py takes 7:08 min, log
After this PR, all tests in grain_data_processing_test.py takes 4:15 min, log
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-reviewlabel.