
Add APIs to prune completed state for LSPS1/LSPS2/LSPS5#4526

Open
f3r10 wants to merge 4 commits into lightningdevkit:main from
f3r10:feat/add_apis_to_prune_completed_state

Conversation


@f3r10 f3r10 commented Mar 31, 2026

Closes #4444.

Add explicit pruning APIs to LSPS1ServiceHandler, LSPS2ServiceHandler, and LSPS5ServiceHandler so LSP operators can reclaim memory from completed historical state.

@ldk-reviews-bot

ldk-reviews-bot commented Mar 31, 2026

I've assigned @valentinewallace as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

///
/// Returns an [`APIError::APIMisuseError`] if the counterparty has no state, the order is
/// unknown, or the order is in a non-terminal state.
pub async fn prune_order(
Collaborator


Hmm, is the right API to require downstream code to list the specific orders to prune, or should we have some kind of "prune all failed orders older than X" API? @tnull wdyt?

Author


You are right @TheBlueMatt, something like this would be much better:
handler.prune_orders(client_id, Duration::from_secs(30 * 24 * 3600)).await?;

I am going to update that part. Thanks for the early review

Contributor


Yeah, I agree, allowing older entries to be pruned within an interval is great. At some point we might still want to expose the LSPS{1,2} service state via the API, at which point allowing specific orders to be dropped would make sense, but that may be better done in a dedicated follow-up.

Author


I think it would be better as a follow-up, once service state listing exists

Author


The new interval-based method is ready @tnull @TheBlueMatt

Add `prune_orders(counterparty_node_id, max_age: Duration)` to both
`LSPS1ServiceHandler` and `LSPS1ServiceHandlerSync`. It removes all
terminal orders (`CompletedAndChannelOpened` / `FailedAndRefunded`) for
a given peer that are at least `max_age` old, persists the updated state,
and returns the number of entries removed.

Passing `Duration::ZERO` prunes all terminal orders immediately regardless
of age, which is the recommended approach to unblock a client that has
hit the per-peer request limit due to accumulated failed orders.

On the `PeerState` layer, `prune_terminal_orders(now, max_age)` uses
`retain` for a single-pass removal and sets `needs_persist` only when
at least one entry is removed.
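
The retain-based single-pass removal described above can be sketched roughly as follows. All type and field names here are simplified stand-ins, not the actual LDK definitions, and the saturating age computation is an assumption layered on top of the described behavior:

```rust
use std::collections::HashMap;
use std::time::Duration;

enum OrderState {
    Pending,
    CompletedAndChannelOpened,
    FailedAndRefunded,
}

struct Order {
    state: OrderState,
    created_at: Duration, // seconds since the Unix epoch, simplified
}

#[derive(Default)]
struct PeerState {
    orders: HashMap<u64, Order>,
    needs_persist: bool,
}

impl PeerState {
    /// Single-pass removal of terminal orders at least `max_age` old.
    /// `Duration::ZERO` prunes all terminal orders regardless of age.
    fn prune_terminal_orders(&mut self, now: Duration, max_age: Duration) -> usize {
        let before = self.orders.len();
        self.orders.retain(|_, order| {
            let terminal = matches!(
                order.state,
                OrderState::CompletedAndChannelOpened | OrderState::FailedAndRefunded
            );
            // Saturating subtraction: never treat an entry "from the future" as old.
            let age = now.saturating_sub(order.created_at);
            !(terminal && (max_age == Duration::ZERO || age >= max_age))
        });
        let removed = before - self.orders.len();
        // Only mark the state dirty when something was actually removed.
        if removed > 0 {
            self.needs_persist = true;
        }
        removed
    }
}
```

Returning the removal count lets the caller decide whether a persistence round-trip is worthwhile, matching the "sets `needs_persist` only when at least one entry is removed" behavior in the commit message.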
@f3r10 f3r10 force-pushed the feat/add_apis_to_prune_completed_state branch from 25a876b to ebaa7ed on April 8, 2026 17:40
@f3r10
Author

f3r10 commented Apr 8, 2026

While trying to add the API for LSPS2 using the same interval-based strategy as LSPS1, I realized it is not quite possible because OutboundJITChannel has no created_at timestamp.
It would only be possible to create a function like prune_channel(counterparty_node_id, intercept_scid), but that would require the caller to list every scid.

The solution would be to add created_at: LSPSDateTime to OutboundJITChannel (new TLV field — backwards-compatible since missing fields default on read).

My question is, should this change be done in this same PR or in a follow-up @tnull @TheBlueMatt wdyt?

@f3r10 f3r10 requested review from TheBlueMatt and tnull April 8, 2026 17:49
@TheBlueMatt
Collaborator

Yea, makes sense to add a created-at timestamp in LSPS2 requests imo.

f3r10 added 3 commits April 13, 2026 16:55
Add a `created_at: LSPSDateTime` field to `OutboundJITChannel` to
record when each JIT channel was created (i.e., when the buy request
was accepted by the LSP). This timestamp is needed to implement
time-based bulk pruning of completed channel state.

The field is persisted as TLV type 10 with a `default_value` of
Unix epoch, ensuring old serialized data (without TLV 10) is read
back successfully with the epoch sentinel rather than failing
deserialization.
Add `LSPS2ServiceHandler::prune_channels` that lets the LSP operator
remove all channels in the `PaymentForwarded` terminal state whose
`created_at` timestamp is at least `max_age` old. Passing
`Duration::ZERO` prunes all terminal channels regardless of age. All
associated state is cleaned up atomically:

- per-peer `intercept_scid_by_channel_id` and
  `intercept_scid_by_user_channel_id`
- handler-level `peer_by_intercept_scid` and `peer_by_channel_id`

A new `PeerState::prune_terminal_channels` helper handles the intra-peer
map cleanup and returns the removed `(scid, channel_id)` pairs for the handler to clean up the outer maps.

Integration tests cover: non-terminal channels not pruned, unknown
counterparty errors, age filtering, and successful bulk prune.
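
The two-level cleanup above can be sketched like this. All type, field, and method names are simplified stand-ins (channel ids reduced to integers), not the real LSPS2 state or the LDK API:

```rust
use std::collections::HashMap;
use std::time::Duration;

#[derive(PartialEq)]
enum ChannelState {
    PendingPayment,
    PaymentForwarded,
}

struct OutboundJITChannel {
    state: ChannelState,
    created_at: Duration, // seconds since the Unix epoch, simplified
}

#[derive(Default)]
struct PeerState {
    // Keyed by intercept scid.
    channels: HashMap<u64, OutboundJITChannel>,
    intercept_scid_by_channel_id: HashMap<u32, u64>,
}

impl PeerState {
    /// Remove terminal channels and return the pruned (scid, channel_id)
    /// pairs so the handler can clean up its outer maps afterwards.
    fn prune_terminal_channels(&mut self, now: Duration, max_age: Duration) -> Vec<(u64, u32)> {
        let mut pruned = Vec::new();
        // Split borrows so the retain closure can touch the sibling map.
        let id_map = &mut self.intercept_scid_by_channel_id;
        self.channels.retain(|scid, ch| {
            let prune = ch.state == ChannelState::PaymentForwarded
                && (max_age == Duration::ZERO || now.saturating_sub(ch.created_at) >= max_age);
            if prune {
                // Drop the channel-id mapping pointing at this scid, if any.
                if let Some(cid) = id_map.iter().find(|(_, s)| **s == *scid).map(|(c, _)| *c) {
                    id_map.remove(&cid);
                    pruned.push((*scid, cid));
                }
            }
            !prune
        });
        pruned
    }
}
```

Returning the pruned pairs keeps the intra-peer and handler-level maps in sync without the helper needing a reference to the outer state.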
Add `LSPS5ServiceHandler::prune_webhook` that removes a single webhook
entry identified by `counterparty_node_id` and `LSPS5AppName`. The
method is synchronous, consistent with all other public `notify_*`
methods on this handler: it marks the peer state as dirty and relies on
the normal `LiquidityManager::persist` loop to flush the change to
the KVStore.

The method reuses the existing private `PeerState::remove_webhook`
helper and returns an `APIError::APIMisuseError` if the counterparty
has no registered state or the given `app_name` is not found.
@f3r10 f3r10 marked this pull request as ready for review April 16, 2026 16:00
(4, opening_fee_params, required),
(6, payment_size_msat, option),
(8, trust_model, required),
(10, created_at, (default_value, LSPSDateTime::new_from_duration_since_epoch(Duration::ZERO))),
Collaborator


Bug: The default_value of epoch (Duration::ZERO = 1970-01-01T00:00:00Z) means that any pre-existing OutboundJITChannel deserialized from old persisted state (before this field existed) will have created_at = epoch. When prune_channels is called, now.duration_since(epoch) will return ~54 years, so any max_age shorter than that will prune all pre-upgrade terminal channels. This silently breaks the age-filter semantics for operators who upgrade and then call prune_channels expecting only old-enough channels to be pruned.

Consider using a sentinel value (e.g., Option<LSPSDateTime>) and skipping the age check for channels without a created_at, or document this prominently so operators know the first post-upgrade prune with max_age > 0 will still prune all legacy terminal channels.
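
One way to realize the `Option`-based suggestion, as a minimal sketch (names assumed, the datetime simplified to a duration since the Unix epoch): legacy entries deserialized without a timestamp carry `None` and are skipped by the age filter instead of being misjudged as ~54 years old.

```rust
use std::time::Duration;

fn should_prune(created_at: Option<Duration>, now: Duration, max_age: Duration) -> bool {
    // An explicit "prune everything" request still applies to legacy entries.
    if max_age == Duration::ZERO {
        return true;
    }
    match created_at {
        Some(ts) => now.saturating_sub(ts) >= max_age,
        // Age unknown: never old enough under a non-zero max_age.
        None => false,
    }
}
```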

Comment on lines +684 to +685
let should_prune = max_age == Duration::ZERO
|| now.duration_since(&channel.created_at) >= max_age;
Collaborator


Bug (pre-existing, but used by new code): LSPSDateTime::duration_since uses abs_diff, so it returns the absolute time difference. If now < created_at (e.g., due to clock skew or time adjustments), this returns a positive duration equal to the gap, which could exceed max_age and cause a channel to be incorrectly pruned even though it was just created.

For age-based pruning, a saturating subtraction (return Duration::ZERO when now < created_at) would be correct — an entry created "in the future" relative to now should never be considered old enough to prune. Same issue applies to the LSPS1 usage in prune_terminal_orders.
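
A minimal demonstration of the skew issue, with the datetime simplified to a duration since the Unix epoch (these helper names are illustrative, not the LDK API): `abs_diff` reports a large "age" for an entry created after `now`, while saturating subtraction correctly yields zero.

```rust
use std::time::Duration;

// Mirrors the abs_diff-based behavior being flagged above.
fn age_abs_diff(now: Duration, created_at: Duration) -> Duration {
    Duration::from_secs(now.as_secs().abs_diff(created_at.as_secs()))
}

// The suggested fix: an entry "from the future" has age zero.
fn age_saturating(now: Duration, created_at: Duration) -> Duration {
    now.saturating_sub(created_at)
}
```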

let mut outer_state_lock = self.per_peer_state.write().unwrap();
match outer_state_lock.get_mut(&counterparty_node_id) {
	Some(peer_state) => {
		if !peer_state.remove_webhook(app_name) {
Collaborator


Nit: remove_webhook unconditionally sets needs_persist |= true (line 822 of peer state) even when no webhook is found. When prune_webhook is called with a non-existent app_name, remove_webhook returns false and this method correctly returns an error — but the needs_persist flag has already been set, causing a needless persistence cycle. Consider moving the needs_persist assignment inside the retain's removal branch in remove_webhook.
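
A sketch of the suggested fix, with simplified stand-in types rather than the LDK definitions: `needs_persist` is set only inside the removal branch, so a lookup miss does not trigger a needless persistence cycle.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct PeerState {
    webhooks: HashMap<String, String>, // app_name -> url
    needs_persist: bool,
}

impl PeerState {
    fn remove_webhook(&mut self, app_name: &str) -> bool {
        let removed = self.webhooks.remove(app_name).is_some();
        if removed {
            // Mark dirty only when an entry was actually removed.
            self.needs_persist = true;
        }
        removed
    }
}
```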

@ldk-claude-review-bot
Collaborator

Review Summary

Issues Found

Inline comments posted:

  1. lightning-liquidity/src/lsps2/service.rs:512 — created_at defaults to epoch (1970) for pre-existing persisted channels. This silently causes all pre-upgrade terminal channels to be pruned regardless of max_age, breaking the age-filter contract on upgrade.

  2. lightning-liquidity/src/lsps2/service.rs:684-685 — LSPSDateTime::duration_since uses abs_diff, so if now < created_at (clock skew), channels/orders are incorrectly considered old and get pruned. Same issue applies to the LSPS1 usage in prune_terminal_orders at lsps1/peer_state.rs:419.

  3. lightning-liquidity/src/lsps5/service.rs:600 — remove_webhook unconditionally sets needs_persist = true even when no webhook was found, causing unnecessary persistence when prune_webhook is called with a non-existent app_name.

Cross-cutting observations

  • Persistence inconsistency across handlers: LSPS1 prune_orders and LSPS2 prune_channels both call persist_peer_state() immediately after pruning. LSPS5 prune_webhook does not — it defers to the next LiquidityManager::persist call. This is documented but means state could be lost on crash. Consider aligning the persistence strategy or documenting the risk more prominently.

@codecov

codecov bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 84.31953% with 53 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.03%. Comparing base (47122e8) to head (e56052d).
⚠️ Report is 94 commits behind head on main.

Files with missing lines (Patch % · Lines)
lightning-liquidity/src/lsps1/service.rs — 0.00% · 39 Missing ⚠️
lightning-liquidity/src/lsps2/service.rs — 92.45% · 8 Missing and 4 partials ⚠️
lightning-liquidity/src/lsps1/peer_state.rs — 99.14% · 0 Missing and 1 partial ⚠️
lightning-liquidity/src/manager.rs — 75.00% · 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4526      +/-   ##
==========================================
+ Coverage   86.18%   87.03%   +0.84%     
==========================================
  Files         160      161       +1     
  Lines      108410   109304     +894     
  Branches   108410   109304     +894     
==========================================
+ Hits        93433    95132    +1699     
+ Misses      12344    11686     -658     
+ Partials     2633     2486     -147     
Flag Coverage Δ
fuzzing 39.54% <0.00%> (?)
tests 86.14% <84.31%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.


Development

Successfully merging this pull request may close these issues.

Add APIs to prune completed state for LSPS1/LSPS2/LSPS5

5 participants