HIVE-29546: Iceberg: [V3] Support of ROW LINEAGE in COMPACTION#6407
kokila-19 wants to merge 2 commits into apache:master
```java
private static void setRowLineageConfFlag(Configuration conf, boolean enabled) {
  if (enabled) {
    conf.setBoolean(SessionStateUtil.ROW_LINEAGE, true);
  } else {
    conf.unset(SessionStateUtil.ROW_LINEAGE);
  }
}

/**
 * Enables the row lineage session flag for the current statement execution.
 */
public static void enableRowLineage(SessionState sessionState) {
  setRowLineageConfFlag(sessionState.getConf(), true);
}

public static void disableRowLineage(SessionState sessionState) {
  setRowLineageConfFlag(sessionState.getConf(), false);
}
```
Why can't we do this directly?

```java
public static void enableRowLineage(SessionState sessionState) {
  sessionState.getConf().setBoolean(SessionStateUtil.ROW_LINEAGE, true);
}

public static void disableRowLineage(SessionState sessionState) {
  sessionState.getConf().setBoolean(SessionStateUtil.ROW_LINEAGE, false);
}
```

You have Javadoc on one method but not on the others; since this is a util class, we can drop it.
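The suggestion also changes how the flag is cleared: the original hunk calls `conf.unset(...)`, while the suggested version calls `setBoolean(..., false)`. Assuming every reader checks the flag with a default of `false` (as Hadoop's `Configuration.getBoolean(name, false)` would), the two are equivalent. The sketch below illustrates this with a hypothetical map-backed `SketchConf` standing in for Hadoop's `Configuration`; `hive.row.lineage` is an assumed key, not necessarily the value of `SessionStateUtil.ROW_LINEAGE`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-backed stand-in for org.apache.hadoop.conf.Configuration,
// exposing only the calls discussed in this thread.
class SketchConf {
  private final Map<String, String> props = new HashMap<>();

  void setBoolean(String name, boolean value) {
    props.put(name, Boolean.toString(value));
  }

  void unset(String name) {
    props.remove(name);
  }

  // Returns defaultValue when the key is absent, roughly like getBoolean(name, default).
  boolean getBoolean(String name, boolean defaultValue) {
    String v = props.get(name);
    return v == null ? defaultValue : Boolean.parseBoolean(v);
  }

  boolean isSet(String name) {
    return props.containsKey(name);
  }
}

class RowLineageFlagSketch {
  // Assumed key name; the patch uses SessionStateUtil.ROW_LINEAGE.
  static final String ROW_LINEAGE = "hive.row.lineage";

  static void enableRowLineage(SketchConf conf) {
    conf.setBoolean(ROW_LINEAGE, true);
  }

  // Variant from the original hunk: remove the key entirely.
  static void disableByUnset(SketchConf conf) {
    conf.unset(ROW_LINEAGE);
  }

  // Variant from the review suggestion: keep the key but set it to false.
  static void disableBySettingFalse(SketchConf conf) {
    conf.setBoolean(ROW_LINEAGE, false);
  }
}
```

The two variants only diverge for a caller that reads the flag with a default of `true`, or that checks whether the key is present at all.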
```java
private static String buildSelectColumnList(Table icebergTable, HiveConf conf) {
  return icebergTable.schema().columns().stream()
      .map(Types.NestedField::name)
      .map(col -> HiveUtils.unparseIdentifier(col, conf))
      .collect(Collectors.joining(", "));
}
```
I don't think this logic should always kick in. If row lineage isn't enabled, it should just return `*`, as before.
If row lineage is enabled, add the column names from `ROW_LINEAGE_COLUMNS_TO_FILE_NAME`.
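A minimal sketch of the suggested behaviour, assuming `*` does not already expand to the row lineage virtual columns and that the lineage columns are appended after it. The class and method names are hypothetical, and the literal column names `ROW__LINEAGE__ID` and `LAST__UPDATED__SEQUENCE__NUMBER` are placeholders for `VirtualColumn.ROW_LINEAGE_ID.getName()` and `VirtualColumn.LAST_UPDATED_SEQUENCE_NUMBER.getName()`.

```java
import java.util.List;

class CompactionSelectListSketch {
  // Placeholder names; the patch obtains the real ones from VirtualColumn.
  static final List<String> ROW_LINEAGE_COLUMNS =
      List.of("ROW__LINEAGE__ID", "LAST__UPDATED__SEQUENCE__NUMBER");

  // Without row lineage the select list stays "*", so the existing
  // compaction query is unchanged; with it, the lineage columns are appended.
  static String selectList(boolean rowLineageEnabled) {
    if (!rowLineageEnabled) {
      return "*";
    }
    return "*, " + String.join(", ", ROW_LINEAGE_COLUMNS);
  }
}
```

Keeping `*` for the default path means tables without row lineage do not depend on per-column identifier quoting at all.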
I’ve refactored the code to perform the row lineage check only once and handle all related changes accordingly.
```java
public static String getRowLineageSelectColumns(boolean rowLineageEnabled) {
  return rowLineageEnabled
      ? ", " + VirtualColumn.ROW_LINEAGE_ID.getName() + ", " + VirtualColumn.LAST_UPDATED_SEQUENCE_NUMBER.getName()
      : "";
}
```
Can you rename it to `getRowLineageColumnsForCompaction`?
```java
if (rowLineageEnabled) {
  RowLineageUtils.enableRowLineage(sessionState);
  LOG.debug("Row lineage flag set for compaction of table {}", compactTableName);
}
```
Can we not do this where we add the row lineage columns? That would avoid checking `rowLineageEnabled` redundantly.
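One way to follow this suggestion, sketched under assumptions: a single helper that sets the session flag and returns the extra columns in the same branch, so `rowLineageEnabled` is tested once. A plain `Map` stands in for the session configuration, and the key and column names are placeholders rather than the patch's actual values.

```java
import java.util.HashMap;
import java.util.Map;

class CompactionRowLineageSketch {
  // Placeholder key; the patch uses SessionStateUtil.ROW_LINEAGE.
  static final String ROW_LINEAGE = "hive.row.lineage";

  // rowLineageEnabled is checked exactly once: the same branch both sets the
  // session flag and returns the select-list suffix for the lineage columns.
  static String getRowLineageColumnsForCompaction(boolean rowLineageEnabled,
                                                  Map<String, String> sessionConf) {
    if (!rowLineageEnabled) {
      return "";
    }
    sessionConf.put(ROW_LINEAGE, "true");
    // Placeholder names for the VirtualColumn lineage columns.
    return ", ROW__LINEAGE__ID, LAST__UPDATED__SEQUENCE__NUMBER";
  }
}
```

The trade-off is that a getter-style method now has a side effect; keeping the flag update in the caller, inside the same `if` block that appends the columns, gives the single check without that.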
What changes were proposed in this pull request?
- Preserve Iceberg v3 row lineage during compaction by generating compaction rewrite queries that select the existing row lineage values (`ROW_LINEAGE_ID` and `LAST_UPDATED_SEQUENCE_NUMBER`) so they carry over into the rewritten data.
- Reliably propagate the row lineage flag into the write job properties using `RowLineageUtils.isRowLineageInsert(conf)`.
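A sketch of the propagation step, assuming job properties are a `Map<String, String>` and that an absent or `false` flag means "not a row lineage insert". The local `isRowLineageInsert` is a hypothetical stand-in for `RowLineageUtils.isRowLineageInsert(conf)`, whose exact semantics are not shown here, and the key name is a placeholder.

```java
import java.util.HashMap;
import java.util.Map;

class JobPropertiesSketch {
  // Placeholder key; the patch uses SessionStateUtil.ROW_LINEAGE.
  static final String ROW_LINEAGE = "hive.row.lineage";

  // Stand-in for RowLineageUtils.isRowLineageInsert(conf):
  // Boolean.parseBoolean(null) is false, so an absent key means "off".
  static boolean isRowLineageInsert(Map<String, String> conf) {
    return Boolean.parseBoolean(conf.get(ROW_LINEAGE));
  }

  // Copies the flag into the write job's properties only when it is on,
  // leaving job properties for ordinary inserts untouched.
  static void propagateRowLineage(Map<String, String> conf, Map<String, String> jobProperties) {
    if (isRowLineageInsert(conf)) {
      jobProperties.put(ROW_LINEAGE, "true");
    }
  }
}
```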
Why are the changes needed?
Compaction is implemented as an INSERT OVERWRITE. Without special handling, it can rewrite the data with new row lineage values instead of keeping the original ones.
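To make the problem concrete, here is a toy model, not Iceberg's or Hive's actual code. It assumes, per the Iceberg v3 row lineage rules as summarized here, that a row written without explicit lineage values inherits a row ID from the table's next-row-id counter (plus its position) and the new commit's sequence number. The record and method names are hypothetical, and sequence-number handling is simplified.

```java
import java.util.ArrayList;
import java.util.List;

class RowLineageRewriteSketch {
  // Toy row: payload plus its two lineage values.
  record Row(String data, long rowId, long lastUpdatedSeq) {}

  // Naive rewrite: lineage values are dropped, so each row inherits a fresh
  // ID from the next-row-id counter and the new commit's sequence number.
  static List<Row> rewriteWithoutLineage(List<Row> rows, long nextRowId, long commitSeq) {
    List<Row> out = new ArrayList<>();
    for (int pos = 0; pos < rows.size(); pos++) {
      out.add(new Row(rows.get(pos).data(), nextRowId + pos, commitSeq));
    }
    return out;
  }

  // Lineage-preserving rewrite: existing values are selected and written
  // explicitly, so they survive compaction unchanged.
  static List<Row> rewriteWithLineage(List<Row> rows) {
    List<Row> out = new ArrayList<>();
    for (Row r : rows) {
      out.add(new Row(r.data(), r.rowId(), r.lastUpdatedSeq()));
    }
    return out;
  }
}
```

In the naive case, a downstream reader tracking rows by ID would see every compacted row as new, even though no data changed.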
Does this PR introduce any user-facing change?
No, this PR does not introduce any user-facing changes. It adds internal support for row lineage during compaction.
How was this patch tested?
Tested with qtests.