docs: add upstream sync process documentation #1524
Open
timsaucer wants to merge 9 commits into apache:main
Document the three-PR workflow used to sync to a newer upstream apache/datafusion version: bump crate deps + fix breakage, consolidate transitive deps, then fill API and documentation gaps via /check-upstream. Cross-reference from dev/release/README.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New AI agent skill at .ai/skills/audit-skill-md/SKILL.md to keep the user-facing skills/datafusion_python/SKILL.md in sync with the public Python API. Audits SessionContext, DataFrame, Expr, and functions surfaces for new APIs not covered, stale mentions, examples that drifted from idiomatic style, and missing version notes. Wired into PR 3 of the upstream sync workflow documented in dev/release/upstream-sync.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
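Two of the four finding kinds described above (new APIs not covered, stale mentions) reduce to a set comparison between the public API and the names the document mentions. The following is a minimal sketch of that idea only, not the skill's actual implementation: the `DataFrame` stand-in class, its method names, the sample text, and the `audit` helper are all hypothetical.

```python
import re


class DataFrame:
    """Stand-in for the real class; every method name here is hypothetical."""

    def select(self): ...
    def filter(self): ...
    def new_method(self): ...
    def _internal(self): ...


# Hypothetical excerpt of a skill document that refers to methods as `df.<name>(`.
SKILL_TEXT = """
Use `df.select(...)` to project columns and `df.filter(...)` to filter rows.
The old `df.removed_method()` helper is also available.
"""


def public_names(obj) -> set[str]:
    """Names on obj that do not start with an underscore."""
    return {name for name in dir(obj) if not name.startswith("_")}


def audit(cls, receiver: str, text: str) -> tuple[list[str], list[str]]:
    """Return (public names missing from text, mentioned names missing from cls)."""
    api = public_names(cls)
    mentioned = set(re.findall(rf"\b{re.escape(receiver)}\.(\w+)\(", text))
    return sorted(api - mentioned), sorted(mentioned - api)


missing, stale = audit(DataFrame, "df", SKILL_TEXT)
print("not covered:", missing)
print("stale mentions:", stale)
```

Style drift and missing version notes need judgment that a set difference cannot supply, which is presumably why the audit is delegated to an agent skill rather than a script.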
Add a checklist item to "Preparing the main Branch" pointing release managers at dev/release/upstream-sync.md so the crate bump, dependency consolidation, and /check-upstream and /audit-skill-md passes are confirmed done before the release branch is cut. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace `cargo update -p datafusion` with an explicit multi-`-p` invocation listing every `datafusion-*` workspace dependency, so PR 1 of the upstream-sync workflow refreshes only the datafusion family and leaves other transitives for PR 2 to consolidate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR 1 step 1 incorrectly stated downstream `datafusion-*` crates are pinned in `crates/core/Cargo.toml`. Pins live in the root `[workspace.dependencies]`; per-crate manifests inherit via `workspace = true`. Reword step 1 to point at the right file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR 1 step 1 must also bump `[workspace.package].version` because the `datafusion-python` major version tracks the upstream `datafusion` major. The previous reword dropped that instruction. Reinstate it alongside the `[workspace.dependencies]` updates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Frontmatter description referenced "requires upstream DataFusion vX", but the body of the skill settles on the `datafusion-python NN` form (consistent with the package/upstream-major equivalence). Switch the description to match so the skill speaks one language end to end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit-skill-md documents the order `/check-upstream` -> `/make-pythonic` (optional) -> `/audit-skill-md`, but PR 3 of the upstream-sync workflow only listed the first and last. Insert the make-pythonic pass as step 3 so signatures get aligned before the SKILL.md audit, avoiding example churn. Drops the orphan trailing paragraph in favor of inline guidance on when to defer larger reshapes to their own PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace literal `version = "53.0.0"` example with a pointer to the `[workspace.package]` field plus an `NN.0.0` placeholder so the skill prose does not drift each major bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Which issue does this PR close?
Part of #1394 (PR 5 in the implementation plan).
Rationale for this change
The repository periodically syncs to a newer upstream `apache/datafusion` version. The workflow we have converged on splits that work into three sequential PRs (crate bump + breakage fixes, transitive dependency consolidation, then API and documentation gap filling), but the workflow has lived only in tribal knowledge. New maintainers have no document to follow, and the agent skills that automate parts of the third step (`/check-upstream`, the new `/audit-skill-md`) have no canonical entry point that describes when to invoke them.
This PR captures the workflow in `dev/release/upstream-sync.md`, adds the `audit-skill-md` agent skill that PR 3 of the workflow relies on, and wires a verification checkbox into the existing release instructions so the sync is confirmed complete before a release branch is cut.
What changes are included in this PR?
- `dev/release/upstream-sync.md`: describes the three-PR workflow: bump DataFusion crate dependencies and fix breakage, consolidate transitive dependencies, and fill API and documentation gaps via `/check-upstream` and `/audit-skill-md`. Cross-references existing skills and prior reference PRs (Upgrade to Datafusion 51 #1311, Prepare for DF52 release #1337).
- `.ai/skills/audit-skill-md/SKILL.md`: new agent skill that audits `skills/datafusion_python/SKILL.md` against the current public Python API. Covers `SessionContext`, `DataFrame`, `Expr`, and `functions` surfaces. Reports four kinds of findings (new APIs not covered, stale mentions, examples that drifted from idiomatic style, missing version notes) and can apply edits directly when asked. Version notes use a single `datafusion-python NN` form because the package and upstream crate share a major version.
- `dev/release/README.md`: adds an "Upstream Sync" pointer in the intro and a checklist item under "Preparing the `main` Branch" that confirms the sync workflow has been completed.
Are there any user-facing changes?
No. Documentation and agent-skill files only. No code or public API changes.