Fixes for import automation #1969

Open
vish-cs wants to merge 1 commit into datacommonsorg:master from vish-cs:fix

Conversation

@vish-cs vish-cs commented Apr 20, 2026

  • Fix the condition to check for an empty diff (schema_diff_size)
  • Add the option to import-helper to run the batch job or the dataflow job
  • Update spanner client to only return staging imports

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the import differ logic to return a summary dictionary and updates the validation workflow to handle this new structure. It also introduces a conditional post-processing step for Spanner ingestion, allowing for status updates to 'STAGING' and corresponding filtering in the Spanner client. I have no feedback to provide.
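The status handling described above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `ImportStatus` and `ImportSummary` here are simplified stand-ins for the project's own types, and `resolve_status` and `staging_only` are hypothetical helper names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ImportStatus(Enum):
    # Stand-in for the project's enum; only the two members seen in the diff.
    SKIP = 'SKIP'
    STAGING = 'STAGING'


@dataclass
class ImportSummary:
    # Simplified stand-in; the real summary likely carries more fields.
    import_name: str
    status: ImportStatus


def resolve_status(diff_found: bool) -> ImportStatus:
    """Mark an import as STAGING when a diff exists, otherwise SKIP."""
    return ImportStatus.STAGING if diff_found else ImportStatus.SKIP


def staging_only(imports: List[ImportSummary]) -> List[ImportSummary]:
    """Keep only imports in STAGING (assumed in-memory analogue of the
    filtering the Spanner client performs)."""
    return [s for s in imports if s.status is ImportStatus.STAGING]
```

In this sketch the filter runs in memory; in the actual client the equivalent filtering would presumably happen in the Spanner query.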

    logging.info("Marking import as SKIP due to no data diff.")
    import_summary.status = ImportStatus.SKIP
else:
    import_summary.status = ImportStatus.STAGING

return validation_status

    'import_version',
    datetime.now(timezone.utc).strftime("%Y-%m-%d"))
run_ingestion = True
post_process = attributes.get('post_process', '')

Is this a new attribute? How is it used?

The name suggests that this step runs after the import workflow.
Can we rename it to run_process, so that when it is set to spanner_ingestion_workflow it is clear that it only runs a Dataflow ingestion?

        import_input=import_input,
        absolute_import_dir=absolute_import_dir)
if differ_summary is not None:
    diff_found = (differ_summary['obs_diff_size'] != 0 or

Can we use .get() instead of []?
differ_summary.get('obs_diff_size', 0) != 0 or differ_summary.get('schema_diff_size', 0) != 0
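A minimal sketch of the suggested check, assuming `differ_summary` is a plain dict that may omit either size key. The helper name `has_diff` is hypothetical and does not appear in the pull request.

```python
def has_diff(differ_summary: dict) -> bool:
    """Return True if the summary reports any observation or schema diff.

    Hypothetical helper: a missing key is treated as a size of 0, so an
    incomplete summary does not raise KeyError.
    """
    return (differ_summary.get('obs_diff_size', 0) != 0 or
            differ_summary.get('schema_diff_size', 0) != 0)
```

With `[]` indexing, a summary that lacks `schema_diff_size` would raise `KeyError`; with `.get()` and a default of 0, it is simply treated as having no schema diff. Whether that silent default is the desired behavior is a design choice for the author.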
