Script to upload time series and observations to spanner graph#1957
Script to upload time series and observations to spanner graph#1957ajaits wants to merge 8 commits intodatacommonsorg:masterfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a script for uploading statistical variable observations from CSV files to Google Cloud Spanner using a template-based configuration. The review identified several critical issues, including a reference to an undefined attribute, a logic error in the mutation insertion loop due to a misaligned 'else' block, and incorrect row counting. Other improvements suggested include more robust string splitting for default values, consistent key naming for template lookups, and ensuring compatibility with the Spanner API by passing lists instead of dictionary views.
|
|
||
| num_mutations = 0 | ||
| if not self._dry_run: | ||
| with self.database.batch() as batch: |
| for table, table_row in mutations: | ||
| if isinstance(table_row, list): | ||
| for row in table_row: | ||
| batch.insert(table=table, | ||
| columns=row.keys(), | ||
| values=[row.values()]) | ||
| num_mutations += len(row.keys()) | ||
| self._counters.add_counter( | ||
| f'spanner-rows-inserted-{table}', 1) | ||
| else: | ||
| batch.insert(table=table, | ||
| columns=table_row.keys(), | ||
| values=[table_row.values()]) | ||
| num_mutations += len(table_row.keys()) | ||
| self._counters.add_counter(f'spanner-rows-inserted-{table}', | ||
| 1) |
There was a problem hiding this comment.
This block contains several critical issues:
- Logic Error (for-else bug): The
elseat line 204 is aligned with theforloop at line 195. This causes the code to only process the last mutation in the list if it's a dictionary, and it will crash if the last mutation is a list (calling.keys()on a list). Theelseshould be aligned with theifat line 196 to correctly handle both list and dictionary mutation types for every item inmutations. - Incorrect Row Counting: The count is incremented by the number of columns instead of the number of rows. Additionally, the counter at line 203 only adds 1 regardless of how many rows were in the list. The variable should be renamed to
mutations_countto follow the repository's naming convention (plural_noun_count). - API Compatibility: The Spanner
batch.insertmethod expectsvaluesto be a list of lists. Convertingrow.values()to a list is recommended. - Guard Checks: Using explicit checks like
if row:is preferred over broad exception handling.
| for table, table_row in mutations: | |
| if isinstance(table_row, list): | |
| for row in table_row: | |
| batch.insert(table=table, | |
| columns=row.keys(), | |
| values=[row.values()]) | |
| num_mutations += len(row.keys()) | |
| self._counters.add_counter( | |
| f'spanner-rows-inserted-{table}', 1) | |
| else: | |
| batch.insert(table=table, | |
| columns=table_row.keys(), | |
| values=[table_row.values()]) | |
| num_mutations += len(table_row.keys()) | |
| self._counters.add_counter(f'spanner-rows-inserted-{table}', | |
| 1) | |
| for table, table_row in mutations: | |
| if isinstance(table_row, list): | |
| for row in table_row: | |
| if row: | |
| batch.insert(table=table, | |
| columns=list(row.keys()), | |
| values=[list(row.values())]) | |
| mutations_count += 1 | |
| self._counters.add_counter( | |
| f'spanner-rows-inserted-{table}', len(table_row)) | |
| elif table_row: | |
| batch.insert(table=table, | |
| columns=list(table_row.keys()), | |
| values=[list(table_row.values())]) | |
| mutations_count += 1 | |
| self._counters.add_counter(f'spanner-rows-inserted-{table}', | |
| 1) |
References
- When naming a variable for a count of items, prefer the pattern plural_noun_count (e.g., records_count) over singular_noun_counts (e.g., record_counts).
- Prefer explicit guard checks over broad exception control flow (try...except) for handling nested data structures, as they are easier to reason about.
| if default_values: | ||
| # Parse default values as prop=value | ||
| for default_value in default_values: | ||
| key, value = default_value.split("=") |
There was a problem hiding this comment.
| value = self._build_value_from_template(template, pvs) | ||
| if value: | ||
| sub_table_row[col] = value | ||
| pvs_key = f'{table}{col}' |
There was a problem hiding this comment.
The pvs_key construction here is inconsistent with the dictionary case at line 174. It is missing an underscore between the table name and column name, which might break template lookups if they expect the Table_Column format (e.g., {TimeSeries_id}).
| pvs_key = f'{table}{col}' | |
| pvs_key = f'{table}_{col}' |
No description provided.