Skip to content

Script to upload time series and observations to spanner graph#1957

Open
ajaits wants to merge 8 commits intodatacommonsorg:masterfrom
ajaits:dc-spanner
Open

Script to upload time series and observations to spanner graph#1957
ajaits wants to merge 8 commits intodatacommonsorg:masterfrom
ajaits:dc-spanner

Conversation

@ajaits
Copy link
Copy Markdown
Contributor

@ajaits ajaits commented Apr 16, 2026

No description provided.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a script for uploading statistical variable observations from CSV files to Google Cloud Spanner using a template-based configuration. The review identified several critical issues, including a reference to an undefined attribute, a logic error in the mutation insertion loop due to a misaligned 'else' block, and incorrect row counting. Other improvements suggested include more robust string splitting for default values, consistent key naming for template lookups, and ensuring compatibility with the Spanner API by passing lists instead of dictionary views.


num_mutations = 0
if not self._dry_run:
with self.database.batch() as batch:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

self.database is not defined. It should be self._database, which was initialized in the __init__ method.

Suggested change
with self.database.batch() as batch:
with self._database.batch() as batch:

Comment on lines +195 to +210
for table, table_row in mutations:
if isinstance(table_row, list):
for row in table_row:
batch.insert(table=table,
columns=row.keys(),
values=[row.values()])
num_mutations += len(row.keys())
self._counters.add_counter(
f'spanner-rows-inserted-{table}', 1)
else:
batch.insert(table=table,
columns=table_row.keys(),
values=[table_row.values()])
num_mutations += len(table_row.keys())
self._counters.add_counter(f'spanner-rows-inserted-{table}',
1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This block contains several critical issues:

  1. Logic Error (for-else bug): The else at line 204 is aligned with the for loop at line 195. This causes the code to only process the last mutation in the list if it's a dictionary, and it will crash if the last mutation is a list (calling .keys() on a list). The else should be aligned with the if at line 196 to correctly handle both list and dictionary mutation types for every item in mutations.
  2. Incorrect Row Counting: The count is incremented by the number of columns instead of the number of rows. Additionally, the counter at line 203 only adds 1 regardless of how many rows were in the list. The variable should be renamed to mutations_count to follow the repository's naming convention (plural_noun_count).
  3. API Compatibility: The Spanner batch.insert method expects values to be a list of lists. Converting row.values() to a list is recommended.
  4. Guard Checks: Using explicit checks like if row: is preferred over broad exception handling.
Suggested change
for table, table_row in mutations:
if isinstance(table_row, list):
for row in table_row:
batch.insert(table=table,
columns=row.keys(),
values=[row.values()])
num_mutations += len(row.keys())
self._counters.add_counter(
f'spanner-rows-inserted-{table}', 1)
else:
batch.insert(table=table,
columns=table_row.keys(),
values=[table_row.values()])
num_mutations += len(table_row.keys())
self._counters.add_counter(f'spanner-rows-inserted-{table}',
1)
for table, table_row in mutations:
if isinstance(table_row, list):
for row in table_row:
if row:
batch.insert(table=table,
columns=list(row.keys()),
values=[list(row.values())])
mutations_count += 1
self._counters.add_counter(
f'spanner-rows-inserted-{table}', len(table_row))
elif table_row:
batch.insert(table=table,
columns=list(table_row.keys()),
values=[list(table_row.values())])
mutations_count += 1
self._counters.add_counter(f'spanner-rows-inserted-{table}',
1)
References
  1. When naming a variable for a count of items, prefer the pattern plural_noun_count (e.g., records_count) over singular_noun_counts (e.g., record_counts).
  2. Prefer explicit guard checks over broad exception control flow (try...except) for handling nested data structures, as they are easier to reason about.

if default_values:
# Parse default values as prop=value
for default_value in default_values:
key, value = default_value.split("=")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using split("=") without a limit will fail if the value itself contains an equals sign (e.g., property=value=extra). It is safer to use split("=", 1) to ensure it only splits on the first occurrence.

Suggested change
key, value = default_value.split("=")
key, value = default_value.split("=", 1)

value = self._build_value_from_template(template, pvs)
if value:
sub_table_row[col] = value
pvs_key = f'{table}{col}'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The pvs_key construction here is inconsistent with the dictionary case at line 174. It is missing an underscore between the table name and column name, which might break template lookups if they expect the Table_Column format (e.g., {TimeSeries_id}).

Suggested change
pvs_key = f'{table}{col}'
pvs_key = f'{table}_{col}'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant