Add APIs to prune completed state for LSPS1/LSPS2/LSPS5 #4526

f3r10 wants to merge 4 commits into lightningdevkit:main
Conversation
I've assigned @valentinewallace as a reviewer!
```rust
/// Returns an [`APIError::APIMisuseError`] if the counterparty has no state, the order is
/// unknown, or the order is in a non-terminal state.
pub async fn prune_order(
```
Hmm, is the right API to require downstream code to list the specific orders to prune, or should we have some kind of "prune all failed orders older than X" API? @tnull wdyt?
You are right @TheBlueMatt, something like this would be much better:

```rust
handler.prune_orders(client_id, Duration::from_secs(30 * 24 * 3600)).await?;
```
I am going to update that part. Thanks for the early review
Yeah, I agree, allowing to prune older entries in an interval is great. At some point we still might want to expose the LSPS{1,2} service state via the API, at which point allowing to drop specific orders would make sense, but that may be better done in a dedicated follow-up.
I think that would be better in a follow-up, once service state listing exists.
The new method using an interval is ready @tnull @TheBlueMatt
Add `prune_orders(counterparty_node_id, max_age: Duration)` to both `LSPS1ServiceHandler` and `LSPS1ServiceHandlerSync`. It removes all terminal orders (`CompletedAndChannelOpened` / `FailedAndRefunded`) for a given peer that are at least `max_age` old, persists the updated state, and returns the number of entries removed. Passing `Duration::ZERO` prunes all terminal orders immediately regardless of age, which is the recommended approach to unblock a client that has hit the per-peer request limit due to accumulated failed orders. On the `PeerState` layer, `prune_terminal_orders(now, max_age)` uses `retain` for a single-pass removal and sets `needs_persist` only when at least one entry is removed.
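The single-pass `retain` approach described above can be sketched as follows. This is a minimal illustration with hypothetical stand-in types (`Order`, `OrderStatus`, a simplified `PeerState`), not the actual lightning-liquidity types; the real handler works against `LSPS1ServiceHandler` state and `LSPSDateTime` timestamps.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Hypothetical simplified stand-ins for the real LSPS1 types.
enum OrderStatus {
    Pending,
    CompletedAndChannelOpened,
    FailedAndRefunded,
}

struct Order {
    status: OrderStatus,
    age: Duration, // stand-in for `now - created_at`
}

struct PeerState {
    orders: HashMap<u64, Order>,
    needs_persist: bool,
}

impl PeerState {
    // Single-pass removal of terminal orders at least `max_age` old.
    // `needs_persist` is set only when at least one entry was removed.
    fn prune_terminal_orders(&mut self, max_age: Duration) -> usize {
        let before = self.orders.len();
        self.orders.retain(|_, o| {
            let terminal = matches!(
                o.status,
                OrderStatus::CompletedAndChannelOpened | OrderStatus::FailedAndRefunded
            );
            // `Duration::ZERO` prunes all terminal orders regardless of age.
            !(terminal && (max_age == Duration::ZERO || o.age >= max_age))
        });
        let removed = before - self.orders.len();
        self.needs_persist |= removed > 0;
        removed
    }
}
```

Returning the removed count lets callers log or assert how much state was reclaimed without re-scanning the map.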
Force-pushed from 25a876b to ebaa7ed.
Trying to add the API for LSPS2 using the same strategy as LSPS1 (intervals), I realized it is not quite possible, because the channel state does not record when it was created. The solution would be to add a creation timestamp. My question is, should this change be done in this same PR or in a follow-up @tnull @TheBlueMatt wdyt?
Yea, makes sense to add a created-at timestamp in LSPS2 requests imo.
Add a `created_at: LSPSDateTime` field to `OutboundJITChannel` to record when each JIT channel was created (i.e., when the buy request was accepted by the LSP). This timestamp is needed to implement time-based bulk pruning of completed channel state. The field is persisted as TLV type 10 with a `default_value` of Unix epoch, ensuring old serialized data (without TLV 10) is read back successfully with the epoch sentinel rather than failing deserialization.
Add `LSPS2ServiceHandler::prune_channels` that lets the LSP operator remove all channels in the `PaymentForwarded` terminal state whose `created_at` timestamp is at least `max_age` old. Passing `Duration::ZERO` prunes all terminal channels regardless of age. All associated state is cleaned up atomically:

- per-peer `intercept_scid_by_channel_id` and `intercept_scid_by_user_channel_id`
- handler-level `peer_by_intercept_scid` and `peer_by_channel_id`

A new `PeerState::prune_terminal_channels` helper handles the intra-peer map cleanup and returns the removed `(scid, channel_id)` pairs for the handler to clean up the outer maps. Integration tests cover: non-terminal channels not pruned, unknown counterparty errors, age filtering, and successful bulk prune.
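The helper's shape can be sketched like this, again with hypothetical stand-in types (`JitChannel`, `ChannelId`, a simplified `PeerState`) rather than the real `OutboundJITChannel` state machine: the intra-peer map is cleaned up in place, and the removed pairs are handed back so the handler can fix up its outer maps.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Hypothetical stand-ins for the real LSPS2 types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ChannelId(u64);

struct JitChannel {
    forwarded: bool, // stand-in for the `PaymentForwarded` terminal state
    age: Duration,   // stand-in for `now - created_at`
    scid: u64,
    channel_id: ChannelId,
}

struct PeerState {
    channels: Vec<JitChannel>,
    intercept_scid_by_channel_id: HashMap<ChannelId, u64>,
}

impl PeerState {
    // Removes terminal channels at least `max_age` old, cleans the intra-peer
    // map, and returns the removed (scid, channel_id) pairs so the handler can
    // also clean its outer `peer_by_intercept_scid` / `peer_by_channel_id` maps.
    fn prune_terminal_channels(&mut self, max_age: Duration) -> Vec<(u64, ChannelId)> {
        let mut removed = Vec::new();
        self.channels.retain(|c| {
            let prune = c.forwarded && (max_age == Duration::ZERO || c.age >= max_age);
            if prune {
                removed.push((c.scid, c.channel_id));
            }
            !prune
        });
        for (_, channel_id) in &removed {
            self.intercept_scid_by_channel_id.remove(channel_id);
        }
        removed
    }
}
```

Returning the pairs (rather than mutating the outer maps from inside the peer state) keeps the lock ordering simple: the handler touches its own maps only after the per-peer work is done.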
Add `LSPS5ServiceHandler::prune_webhook` that removes a single webhook entry identified by `counterparty_node_id` and `LSPS5AppName`. The method is synchronous, consistent with all other public `notify_*` methods on this handler: it marks the peer state as dirty and relies on the normal `LiquidityManager::persist` loop to flush the change to the KVStore. The method reuses the existing private `PeerState::remove_webhook` helper and returns an `APIError::APIMisuseError` if the counterparty has no registered state or the given `app_name` is not found.
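The control flow of such a method can be sketched as below. The types here are hypothetical simplifications (a `String` error instead of `APIError::APIMisuseError`, a plain map instead of the real webhook store); note that `remove_webhook` only marks the state dirty when an entry was actually removed, avoiding a needless persistence cycle on the error path.

```rust
use std::collections::HashMap;

// Hypothetical simplified stand-in for the LSPS5 per-peer webhook state.
struct PeerState {
    webhooks: HashMap<String, String>, // app_name -> webhook URL
    needs_persist: bool,
}

impl PeerState {
    // Returns true if a webhook was removed; marks dirty only in that case.
    fn remove_webhook(&mut self, app_name: &str) -> bool {
        let removed = self.webhooks.remove(app_name).is_some();
        self.needs_persist |= removed;
        removed
    }
}

// Sketch of the handler-level method: error if the peer is unknown or the
// app name has no registered webhook, otherwise rely on the normal persist
// loop to flush the dirtied state.
fn prune_webhook(
    peers: &mut HashMap<[u8; 33], PeerState>, counterparty: [u8; 33], app_name: &str,
) -> Result<(), String> {
    match peers.get_mut(&counterparty) {
        Some(peer_state) => {
            if !peer_state.remove_webhook(app_name) {
                return Err(format!("Unknown app name: {}", app_name));
            }
            Ok(())
        },
        None => Err("No state for counterparty".to_string()),
    }
}
```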
```rust
(4, opening_fee_params, required),
(6, payment_size_msat, option),
(8, trust_model, required),
(10, created_at, (default_value, LSPSDateTime::new_from_duration_since_epoch(Duration::ZERO))),
```
Bug: The `default_value` of epoch (`Duration::ZERO` = 1970-01-01T00:00:00Z) means that any pre-existing `OutboundJITChannel` deserialized from old persisted state (before this field existed) will have `created_at` = epoch. When `prune_channels` is called, `now.duration_since(epoch)` will return ~54 years, so any `max_age` shorter than that will prune all pre-upgrade terminal channels. This silently breaks the age-filter semantics for operators who upgrade and then call `prune_channels` expecting only old-enough channels to be pruned.

Consider using a sentinel value (e.g., `Option<LSPSDateTime>`) and skipping the age check for channels without a `created_at`, or document this prominently so operators know the first post-upgrade prune with `max_age > 0` will still prune all legacy terminal channels.
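The suggested sentinel semantics could look like the sketch below, where an age of `None` stands in for "persisted before the field existed" (i.e., `Option<LSPSDateTime>` being `None`); such legacy entries are only pruned when the caller explicitly asks for everything with `Duration::ZERO`.

```rust
use std::time::Duration;

// Hypothetical helper: decide whether to prune given an optional age.
// `None` means the entry predates the `created_at` field.
fn should_prune(created_age: Option<Duration>, max_age: Duration) -> bool {
    if max_age == Duration::ZERO {
        return true; // explicit "prune everything" request
    }
    match created_age {
        Some(age) => age >= max_age,
        // Legacy entry: never considered old enough under a finite max_age,
        // preserving the age-filter semantics across the upgrade.
        None => false,
    }
}
```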
```rust
let should_prune = max_age == Duration::ZERO
	|| now.duration_since(&channel.created_at) >= max_age;
```
Bug (pre-existing, but used by new code): `LSPSDateTime::duration_since` uses `abs_diff`, so it returns the absolute time difference. If `now < created_at` (e.g., due to clock skew or time adjustments), this returns a positive duration equal to the gap, which could exceed `max_age` and cause a channel to be incorrectly pruned even though it was just created.

For age-based pruning, a saturating subtraction (return `Duration::ZERO` when `now < created_at`) would be correct: an entry created "in the future" relative to `now` should never be considered old enough to prune. The same issue applies to the LSPS1 usage in `prune_terminal_orders`.
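A minimal sketch of the saturating variant, modeling timestamps as `Duration` since epoch (the real code uses `LSPSDateTime`); `Duration::saturating_sub` from the standard library already gives exactly the wanted behavior.

```rust
use std::time::Duration;

// Saturating "age" computation: an entry created in the future relative to
// `now` (clock skew) has age zero and is never pruned under a finite max_age.
fn age_since(now: Duration, created_at: Duration) -> Duration {
    now.saturating_sub(created_at)
}
```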
```rust
let mut outer_state_lock = self.per_peer_state.write().unwrap();
match outer_state_lock.get_mut(&counterparty_node_id) {
	Some(peer_state) => {
		if !peer_state.remove_webhook(app_name) {
```
Nit: `remove_webhook` unconditionally sets `needs_persist |= true` (line 822 of peer state) even when no webhook is found. When `prune_webhook` is called with a non-existent `app_name`, `remove_webhook` returns false and this method correctly returns an error, but the `needs_persist` flag has already been set, causing a needless persistence cycle. Consider moving the `needs_persist` assignment inside the `retain`'s removal branch in `remove_webhook`.
Review Summary

Issues found: see the inline comments posted above.
Codecov Report

❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main    #4526      +/-   ##
==========================================
+ Coverage   86.18%   87.03%   +0.84%
==========================================
  Files         160      161       +1
  Lines      108410   109304     +894
  Branches   108410   109304     +894
==========================================
+ Hits        93433    95132    +1699
+ Misses      12344    11686     -658
+ Partials     2633     2486     -147
```

☔ View full report in Codecov by Sentry.
Closes #4444.

Add explicit pruning APIs to `LSPS1ServiceHandler`, `LSPS2ServiceHandler`, and `LSPS5ServiceHandler` so LSP operators can reclaim memory from completed historical state.