
docs: add upstream sync process documentation #1524

Open
timsaucer wants to merge 9 commits into apache:main from timsaucer:feat/update-sync-process-documentation


@timsaucer
Member

Which issue does this PR close?

Part of #1394 (PR 5 in the implementation plan).

Rationale for this change

The repository periodically syncs to a newer upstream apache/datafusion version. The workflow we have converged on splits that work into three sequential PRs (crate bump + breakage fixes, transitive dependency consolidation, then API and documentation gap filling), but the workflow has lived only in tribal knowledge. New maintainers have no document to follow, and the agent skills that automate parts of the third step (/check-upstream, the new /audit-skill-md) have no canonical entry point that describes when to invoke them.

This PR captures the workflow in dev/release/upstream-sync.md, adds the audit-skill-md agent skill that PR 3 of the workflow relies on, and wires a verification checkbox into the existing release instructions so the sync is confirmed complete before a release branch is cut.

What changes are included in this PR?

  • dev/release/upstream-sync.md — describes the three-PR workflow: bump DataFusion crate dependencies and fix breakage, consolidate transitive dependencies, and fill API and documentation gaps via /check-upstream and /audit-skill-md. Cross-references existing skills and prior reference PRs (Upgrade to Datafusion 51 #1311, Prepare for DF52 release #1337).
  • .ai/skills/audit-skill-md/SKILL.md — new agent skill that audits skills/datafusion_python/SKILL.md against the current public Python API. Covers SessionContext, DataFrame, Expr, and functions surfaces. Reports four kinds of findings (new APIs not covered, stale mentions, examples that drifted from idiomatic style, missing version notes) and can apply edits directly when asked. Version notes use a single datafusion-python NN form because the package and upstream crate share a major version.
  • dev/release/README.md — adds an "Upstream Sync" pointer in the intro and a checklist item under "Preparing the main Branch" that confirms the sync workflow has been completed.

Are there any user-facing changes?

No. Documentation and agent-skill files only. No code or public API changes.

timsaucer and others added 9 commits May 3, 2026 10:58
Document the three-PR workflow used to sync to a newer upstream
apache/datafusion version: bump crate deps + fix breakage, consolidate
transitive deps, then fill API and documentation gaps via
/check-upstream. Cross-reference from dev/release/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New AI agent skill at .ai/skills/audit-skill-md/SKILL.md to keep the
user-facing skills/datafusion_python/SKILL.md in sync with the public
Python API. Audits SessionContext, DataFrame, Expr, and functions
surfaces for new APIs not covered, stale mentions, examples that drifted
from idiomatic style, and missing version notes. Wired into PR 3 of the
upstream sync workflow documented in dev/release/upstream-sync.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add a checklist item to "Preparing the main Branch" pointing release
managers at dev/release/upstream-sync.md so the crate bump, dependency
consolidation, and /check-upstream and /audit-skill-md passes are
confirmed done before the release branch is cut.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace `cargo update -p datafusion` with an explicit multi-`-p`
invocation listing every `datafusion-*` workspace dependency, so PR 1
of the upstream-sync workflow refreshes only the datafusion family
and leaves other transitives for PR 2 to consolidate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR 1 step 1 incorrectly stated downstream `datafusion-*` crates are
pinned in `crates/core/Cargo.toml`. Pins live in the root
`[workspace.dependencies]`; per-crate manifests inherit via
`workspace = true`. Reword step 1 to point at the right file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR 1 step 1 must also bump `[workspace.package].version` because the
`datafusion-python` major version tracks the upstream `datafusion`
major. The previous reword dropped that instruction. Reinstate it
alongside the `[workspace.dependencies]` updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Frontmatter description referenced "requires upstream DataFusion vX",
but the body of the skill settles on the `datafusion-python NN` form
(consistent with the package/upstream-major equivalence). Switch the
description to match so the skill speaks one language end to end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Audit-skill-md documents the order
`/check-upstream` -> `/make-pythonic` (optional) -> `/audit-skill-md`,
but PR 3 of the upstream-sync workflow only listed the first and last.
Insert the make-pythonic pass as step 3 so signatures get aligned
before the SKILL.md audit, avoiding example churn. Drops the orphan
trailing paragraph in favor of inline guidance on when to defer
larger reshapes to their own PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace literal `version = "53.0.0"` example with a pointer to the
`[workspace.package]` field plus an `NN.0.0` placeholder so the skill
prose does not drift each major bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>