
[FSTORE-2030] Add support for specifying lookback windows for PIT queries #583

Merged
manu-sj merged 2 commits into logicalclocks:main from manu-sj:FSTORE-2030 on Jun 2, 2026


manu-sj commented May 21, 2026

Summary

  • Adds a user-guide section, "Lookback window for PIT joins", to feature_view/batch-data.md. It covers the two modes, the dict and dataclass call shapes, partition-pruning behavior, and the one-sided, lower-only form.
  • Adds a cross-link from feature_view/training-data.md so that users calling create_training_data find the same reference.
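As a rough illustration of the partition-pruning behavior the summary refers to, the sketch below assumes a feature group partitioned by a daily date column and shows a lower bound (plus an optional upper bound) deciding which partitions need to be listed at all. The helper `partitions_to_list` and the `dt=YYYY-MM-DD` path layout are hypothetical; the real pruning happens inside the Hopsworks query engine, not in user code.

```python
from datetime import date
from typing import Iterable, List, Optional


def partitions_to_list(
    partition_values: Iterable[date],
    lower: date,
    upper: Optional[date] = None,
) -> List[str]:
    """Return hypothetical partition paths inside [lower, upper].

    upper=None models a one-sided, lower-only window. Partitions outside
    the window are skipped before any data would be read.
    """
    selected = []
    for value in sorted(partition_values):
        if value < lower:
            continue  # older than the lookback: pruned
        if upper is not None and value > upper:
            continue  # newer than the upper bound: pruned
        selected.append(f"dt={value.isoformat()}")  # hypothetical path layout
    return selected


# One month of daily partitions; without a bound, all 31 would be candidates.
all_days = [date(2026, 5, day) for day in range(1, 32)]
window = partitions_to_list(all_days, lower=date(2026, 5, 25))
print(len(all_days), "partitions exist;", len(window), "are listed")
# prints: 31 partitions exist; 7 are listed
```

Without any bound every daily partition stays a candidate, which is why scan cost keeps growing with the feature group's history.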

JIRA

FSTORE-2030

Test plan

  • One-sentence-per-line convention respected.
  • Python code blocks are valid Python (checked with ruff under the workspace policy).
  • Reviewer to verify the page renders correctly in the mkdocs preview.

Companion PRs

  • Backend: logicalclocks/hopsworks-ee → branch FSTORE-2030
  • SDK: logicalclocks/hopsworks-api → branch FSTORE-2030
  • Integration tests: logicalclocks/loadtest → branch FSTORE-2030

manu-sj marked this pull request as ready for review May 23, 2026 23:55
manu-sj force-pushed the FSTORE-2030 branch 2 times, most recently from 37944b9 to ddb1106, May 31, 2026 19:03
manu-sj requested a review from Copilot June 1, 2026 05:55

Copilot AI left a comment


Pull request overview

This PR updates the Hopsworks Feature Store user guides to document the new lookback option for Point-in-Time (PIT) joins, including uniform and per-Feature-Group modes, and adds cross-links between batch-data retrieval and training-data materialization.

Changes:

  • Adds a new “Lookback window for PIT joins” section to the batch-data guide, including Lookback / Lookbacks examples and pruning guidance.
  • Adds a corresponding training-data section that references the batch-data lookback documentation and notes persistence of the resolved window.
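The overview mentions `Lookback` examples, and the summary describes dict and dataclass call shapes plus a one-sided, lower-only form. The sketch below shows how two such shapes can be normalized to one value. `LookbackSketch`, its `mode`/`lower`/`upper` fields, and `normalize` are hypothetical stand-ins; the SDK's actual `Lookback` signature is not shown in this PR.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Union


@dataclass(frozen=True)
class LookbackSketch:
    """Hypothetical stand-in for a lookback value; field names are assumptions."""

    mode: str  # assumed values: "EVENT_TIME" or "PARTITION_KEY"
    lower: date  # oldest point the PIT join may reach
    upper: Optional[date] = None  # None models the one-sided, lower-only form

    def __post_init__(self) -> None:
        # Reject an inverted window early.
        if self.upper is not None and self.upper < self.lower:
            raise ValueError("upper bound must not precede lower bound")

    @classmethod
    def from_dict(cls, raw: dict) -> "LookbackSketch":
        """Build an instance from the dict shape, parsing ISO date strings."""
        upper = raw.get("upper")
        return cls(
            mode=raw["mode"],
            lower=date.fromisoformat(raw["lower"]),
            upper=date.fromisoformat(upper) if upper is not None else None,
        )


def normalize(value: Union[LookbackSketch, dict]) -> LookbackSketch:
    """Accept either call shape and return a single instance."""
    if isinstance(value, LookbackSketch):
        return value
    return LookbackSketch.from_dict(value)


# The same lower-only window expressed in both shapes.
as_instance = LookbackSketch(mode="PARTITION_KEY", lower=date(2026, 1, 1))
as_dict = {"mode": "PARTITION_KEY", "lower": "2026-01-01"}
assert normalize(as_instance) == normalize(as_dict)
```

Normalizing at the call boundary keeps the rest of the code working with one type, whichever shape the user passed.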

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File | Description
--- | ---
docs/user_guides/fs/feature_view/batch-data.md | Adds the main lookback documentation for PIT joins, with uniform/per-FG shapes and pruning notes.
docs/user_guides/fs/feature_view/training-data.md | Adds a training-data-facing lookback section and links readers to the batch-data lookback reference.

manu-sj and others added 2 commits June 2, 2026 07:49
…ries

https://hopsworks.atlassian.net/browse/FSTORE-2030

PIT joins on partitioned feature groups defeat partition pruning: the
join predicate (`right_fg.event_time <= root_fg.event_time`) is a range
rather than an equality, so the engine has to scan an unbounded slice of
history on every read. As feature groups grow with daily ingestion, this
linearly inflates scan cost. Lookback windows expose an explicit upper
bound on the historical depth a PIT join may reach. This repo carries
the user-facing reference for the new parameter.

The batch-data and training-data user guides describe the `lookback`
argument on `FeatureView.get_batch_data` and `create_training_data`. The
pages cover the two-bound `Lookback` value, the trade-off between
partition-key mode (pushed down to file listing) and event-time mode
(engine-dependent), and the container that carries per-feature-group
overrides. Each mode is shown in both the instance form and the
dict-equivalent form with concrete partition-column placeholders and
literal dates.

Reviewed-by: GitHub Copilot <Copilot@users.noreply.github.com>
Reviewed-by: OpenAI Codex (GPT-5 via codex-plugin-cc 1.0.4) <codex@openai.com>
Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
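The first commit message attributes the pruning problem to the range predicate `right_fg.event_time <= root_fg.event_time`. The sketch below models that predicate for a single root row in plain Python, with and without a lookback bound. It assumes the right-hand feature group is a list of `(event_time, value)` pairs sorted by time; `pit_lookup` is a hypothetical helper, not part of the Hopsworks SDK, and a real engine evaluates this as a join over whole tables.

```python
import bisect
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Row = Tuple[datetime, str]  # (event_time, feature value)


def pit_lookup(
    right_rows: List[Row],
    root_time: datetime,
    lookback: Optional[timedelta] = None,
) -> Optional[str]:
    """Return the latest value with event_time <= root_time (hypothetical helper).

    right_rows must be sorted by event_time. With lookback=None the match may
    come from arbitrarily old history; otherwise rows older than
    root_time - lookback are treated as out of range.
    """
    times = [event_time for event_time, _ in right_rows]
    # Position of the last row satisfying the range predicate event_time <= root_time.
    idx = bisect.bisect_right(times, root_time) - 1
    if idx < 0:
        return None  # no history at or before root_time
    event_time, value = right_rows[idx]
    # Every earlier row is older still, so if this one is outside the window, all are.
    if lookback is not None and event_time < root_time - lookback:
        return None
    return value


history = [(datetime(2025, 1, 1), "old"), (datetime(2026, 5, 30), "recent")]
print(pit_lookup(history, datetime(2026, 5, 1)))  # prints: old (unbounded search reaches 2025)
print(pit_lookup(history, datetime(2026, 5, 1), timedelta(days=30)))  # prints: None
```

With the bound in place, history older than the window can never produce a match, which is what allows an engine to skip reading it.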
…ries

https://hopsworks.atlassian.net/browse/FSTORE-2030

Rename the lookback API surface across the SDK, backend, and docs:
the feature-view-level container is now Lookback (top-level wire
field "lookback") and the per-feature-group bound is
FeatureGroupLookback, with feature_group_lookbacks naming the per-FG
override map. Lookback keys are canonically uppercase (EVENT_TIME,
PARTITION_KEY) with case-insensitive validation. The backend now
echoes the persisted lookback configuration on training dataset
responses so the client no longer rehydrates it locally.

The batch-data and training-data guides switch their examples to the
renamed classes, the feature_group_lookbacks keyword, and the
uppercase canonical keys.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
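The second commit describes a feature-view-level `Lookback` with a `feature_group_lookbacks` override map and uppercase canonical keys (`EVENT_TIME`, `PARTITION_KEY`) validated case-insensitively. The sketch below shows one plausible resolution order: a per-feature-group entry wins, otherwise the uniform default applies. The helper names, the plain string-keyed map, and the `(mode, days)` tuples are assumptions for illustration; they do not reproduce the SDK's `Lookback` or `FeatureGroupLookback` classes.

```python
from typing import Dict, Optional, Tuple

CANONICAL_MODES = ("EVENT_TIME", "PARTITION_KEY")

# Hypothetical shape: (mode, window length in days).
LookbackSpec = Tuple[str, int]


def canonical_mode(raw: str) -> str:
    """Validate a mode key case-insensitively and return its uppercase form."""
    mode = raw.strip().upper()
    if mode not in CANONICAL_MODES:
        raise ValueError(f"unknown lookback mode {raw!r}; expected one of {CANONICAL_MODES}")
    return mode


def resolve_lookback(
    feature_group: str,
    default: Optional[LookbackSpec],
    feature_group_lookbacks: Dict[str, LookbackSpec],
) -> Optional[LookbackSpec]:
    """Return the override for feature_group if present, else the uniform default."""
    chosen = feature_group_lookbacks.get(feature_group, default)
    if chosen is None:
        return None  # no bound at all: the PIT join may reach all history
    mode, days = chosen
    return canonical_mode(mode), days


overrides = {"transactions": ("partition_key", 7)}
print(resolve_lookback("transactions", ("event_time", 30), overrides))
# prints: ('PARTITION_KEY', 7)
print(resolve_lookback("customers", ("Event_Time", 30), overrides))
# prints: ('EVENT_TIME', 30)
```

Canonicalizing at resolution time means mixed-case input compares equal to the uppercase keys everywhere downstream.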
manu-sj merged commit 9e7d910 into logicalclocks:main Jun 2, 2026
1 check passed
manu-sj added a commit that referenced this pull request Jun 3, 2026
…T queries (#583) (#597)

---------

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>