pdp-sim

A deterministic discrete-event simulator for blob-aggregation strategies in the fil.one storage layer. It answers one question with numbers instead of intuition:

Given how clients actually write and delete data, which packing / flush / delete-batching strategy minimizes cost without blowing the delete-latency SLA — and by how much?

It models how a storage provider (SP) aggregates client blobs into PDP pieces, commits them to chain, and times deletes, and it accounts for gas, storage cost, infra, client revenue, and delete-enactment latency (including the upper-bound SLA the team cares about) over a simulated horizon.

This README explains what is modeled, how, and why the outputs are the right things to measure. For the underlying math see docs/MODELING.md; for the architecture see docs/DESIGN.md; for the Go contract see docs/INTERFACES.md.

1. The problem, precisely

fil.one sits between storage clients and Filecoin storage providers:

   clients ──$/TB/month──▶ fil.one ──$/TB/month──▶ SP ──gas per chain op──▶ Filecoin (PDP)
   (S3 PUT/DELETE)                  (storage)            + disk/power (infra)

Clients PUT arbitrarily-sized blobs. The SP can't put every tiny blob on chain individually — gas would dwarf the data — so it aggregates many blobs into one PDP piece (a committed root CID) and commits pieces to a data set that is proven on chain every proving period. Three decisions drive the whole cost structure, and they fight each other:

Lever	Wait longer / pack bigger ⇒	Cost of doing so
How to pack (aggregate size)	fewer adds per byte; gas amortized	a delete inside a big piece forces a bigger rewrite
When to flush (to chain)	bigger, cheaper batches	data sits off-chain longer (held but unbilled/at-risk)
How long to batch deletes	fewer remove txs; gas saved	higher delete-enact latency (SLA risk) + longer paying-for-deleted-data

The punchline the team cares about: batching is not free. It trades gas for latency and for a window where the SP stores (and proves, and pays for) data the client already asked to delete. There is no closed-form answer because it depends on the workload — the size distribution and the churn (how fast blobs die). So we simulate.

Why a simulator and not a spreadsheet: the costs are time-integrals over a changing system (bytes on chain rise and fall as pieces are added, proven each period, and removed) and the deletes are path-dependent (which blobs share a piece at write time determines the rewrite cost at delete time). A spreadsheet can't capture "this blob died while still buffered so it never hit chain" or "these two deletes landed in the same batch window so they shared one remove tx." A discrete-event model can, exactly.

2. What it measures — and why these are the right metrics

Every output maps to a real dollar or a real SLA commitment. A run prints (see §6 for a full table):

Metric	What it is	Why it's the metric that matters
gas: add / remove / proving / create	per PDP op class, in FVM units + FIL (price-independent) and USD	gas is the SP's per-op chain cost; broken out so you see which op dominates, and reported in FIL so it survives FIL-price changes
gas: proving (recurring)	per-data-set per-period proving cost	the cost floor at low churn — it runs forever regardless of activity
write amplification	chain bytes written ÷ unique bytes stored	how much survivor data gets rewritten on deletes — the hidden cost of packing
zombie storage %	deleted-but-not-yet-removed byte-seconds ÷ total	the share of storage-time spent on data the client already deleted (pure loss)
delete latency mean/p95/p99/MAX	request → on-chain removal	the SLA metric; MAX is the headline upper bound, not the average
SLA violations	per-client deletes past `sla_max_delete`	direct contract-breach count
time-to-chain	arrival → proven on chain	how long data is at-risk / off-chain before it's committed
revenue / storage / infra / margins	the money	split across the two balance sheets (see §3.5)

The design principle: report the upper bound, not just the average. Delete latency is reported as a full distribution with MAX first-class, because an SLA is a promise about the worst case.
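To make the "upper bound, not just the average" point concrete, here is a minimal sketch of a latency summary. It assumes nearest-rank percentiles and a single `slaMax` bound; the simulator's own percentile method and per-client SLA handling are not shown in this README, and `LatencyStats` and `Summarize` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// LatencyStats is a hypothetical summary of delete latencies (seconds).
type LatencyStats struct {
	Mean, P95, P99, Max float64
	Violations          int // samples strictly above the SLA bound
}

// percentile returns the nearest-rank p-th percentile of a sorted, non-empty slice.
// Integer arithmetic computes ceil(p*n/100) and avoids float rounding at the rank edge.
func percentile(sorted []float64, p int) float64 {
	rank := (p*len(sorted) + 99) / 100
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// Summarize reports the whole distribution, with Max as a first-class field.
func Summarize(latencies []float64, slaMax float64) LatencyStats {
	if len(latencies) == 0 {
		return LatencyStats{}
	}
	s := append([]float64(nil), latencies...) // copy so the caller's slice is not reordered
	sort.Float64s(s)
	var sum float64
	violations := 0
	for _, v := range s {
		sum += v
		if v > slaMax {
			violations++
		}
	}
	return LatencyStats{
		Mean:       sum / float64(len(s)),
		P95:        percentile(s, 95),
		P99:        percentile(s, 99),
		Max:        s[len(s)-1],
		Violations: violations,
	}
}

func main() {
	// Four fast deletes and one slow one: the mean looks tolerable, the max does not.
	st := Summarize([]float64{10, 20, 30, 40, 1000}, 100)
	fmt.Printf("mean=%.0f p95=%.0f p99=%.0f max=%.0f violations=%d\n",
		st.Mean, st.P95, st.P99, st.Max, st.Violations)
}
```

With these five samples the mean is 220 s while the max is 1000 s, which is the gap an SLA stated as a worst-case promise has to cover.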

3. What is modeled, and how

3.1 Entities and the blob lifecycle

Blob — one client PUT: id, clientID, sizeBytes, arrivalTime, deleteTime, state.
Root / piece — a committed PDP root CID holding one or more blobs; the unit added to / removed from chain. Tracks generation (how many rewrites produced it).
DataSet — a PDP proof set: a collection of pieces proven together every proving period.
ChainOp / confirm effect — a queued add/remove that takes effect after a confirmation delay.

A blob walks a fixed state machine (internal/model): Pending (arrived, buffered) → OnChain (its piece is proven) → MarkedDeleted (delete requested, still on chain) → Removed (removal enacted on chain). Every cost is attached to a specific transition, so nothing is double-counted and nothing is free that shouldn't be.

3.2 The engine: deterministic discrete-event

internal/engine is a min-heap of events ordered by (time, sequence). The loop pops the earliest event, advances the clock, dispatches to a handler, and handlers schedule new events. Event kinds: BlobArrival, BlobDelete, AggregateTick, BatcherTick, DeleteTick, ChainConfirm, ProvingPeriod, WarmupEnd.
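The (time, sequence) ordering can be sketched with the standard library's container/heap. This is an illustration under assumptions, not the code in internal/engine: the `Event` fields and the `Queue` type are hypothetical, and only the ordering rule comes from the description above.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Event is a hypothetical stand-in for the simulator's event type.
type Event struct {
	Time float64 // simulated seconds
	Seq  uint64  // insertion order; breaks ties between equal times
	Kind string
}

// eventHeap implements heap.Interface, ordered by (Time, Seq).
type eventHeap []Event

func (h eventHeap) Len() int { return len(h) }
func (h eventHeap) Less(i, j int) bool {
	if h[i].Time != h[j].Time {
		return h[i].Time < h[j].Time
	}
	return h[i].Seq < h[j].Seq
}
func (h eventHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *eventHeap) Push(x any)   { *h = append(*h, x.(Event)) }
func (h *eventHeap) Pop() any {
	old := *h
	n := len(old)
	e := old[n-1]
	*h = old[:n-1]
	return e
}

// Queue stamps each scheduled event with a monotonically increasing sequence number.
type Queue struct {
	h   eventHeap
	seq uint64
}

// Schedule enqueues an event at simulated time t.
func (q *Queue) Schedule(t float64, kind string) {
	heap.Push(&q.h, Event{Time: t, Seq: q.seq, Kind: kind})
	q.seq++
}

// Next pops the earliest event; ok is false when the queue is empty.
func (q *Queue) Next() (Event, bool) {
	if q.h.Len() == 0 {
		return Event{}, false
	}
	return heap.Pop(&q.h).(Event), true
}

// Drain pops every event in order and returns the kinds, as a stand-in for the dispatch loop.
func Drain(q *Queue) []string {
	var kinds []string
	for {
		e, ok := q.Next()
		if !ok {
			return kinds
		}
		kinds = append(kinds, e.Kind)
	}
}

func main() {
	q := &Queue{}
	q.Schedule(10, "BlobDelete")
	q.Schedule(5, "BlobArrival")
	q.Schedule(10, "ChainConfirm") // same time as BlobDelete, scheduled later
	fmt.Println(Drain(q))
}
```

The sequence number is what keeps two events at the same timestamp in a fixed order; without it, heap order among ties would depend on insertion history in ways that are hard to reason about.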

Determinism is a first-class property. Same config + same seed ⇒ byte-identical results. RNG is a registry of independent streams keyed by (clientID, purpose) (internal/engine/rng.go), so adding a client or a new random draw never perturbs the existing streams — every strategy in a comparison sees the exact same workload. This is what makes a compare or sweep apples-to-apples: the only thing that changes between columns is the strategy, never the traffic.
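One way to get streams that do not perturb each other is to derive each stream's seed from the master seed and its key alone. The sketch below assumes an FNV-1a hash and math/rand; the actual derivation in internal/engine/rng.go may differ, and `Registry` and `Stream` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// Registry hands out one independent stream per (clientID, purpose) key.
type Registry struct {
	seed    int64
	streams map[string]*rand.Rand
}

// NewRegistry creates an empty registry for a master seed.
func NewRegistry(seed int64) *Registry {
	return &Registry{seed: seed, streams: make(map[string]*rand.Rand)}
}

// Stream returns the stream for a key, creating it on first use.
// The stream seed depends only on the master seed and the key, never on
// how many other streams exist or how many draws they have made.
func (r *Registry) Stream(clientID int, purpose string) *rand.Rand {
	key := fmt.Sprintf("%d/%s", clientID, purpose)
	if s, ok := r.streams[key]; ok {
		return s
	}
	h := fnv.New64a()
	fmt.Fprintf(h, "%d|%s", r.seed, key)
	s := rand.New(rand.NewSource(int64(h.Sum64())))
	r.streams[key] = s
	return s
}

func main() {
	// Baseline: client 1 draws a size.
	a := NewRegistry(42)
	x := a.Stream(1, "size").Float64()

	// Same seed, but an extra client and an extra purpose draw first.
	b := NewRegistry(42)
	b.Stream(2, "size").Float64()
	b.Stream(1, "lifetime").Float64()
	y := b.Stream(1, "size").Float64()

	fmt.Println(x == y) // client 1's size stream is unaffected
}
```

A single shared generator would fail this check, because every extra draw shifts all later values.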

3.3 Workload: configurable, replayable

Clients are defined as archetypes with a count: one entry spawns N statistically identical, independently seeded clients, so you scale from 1 to thousands by changing one number. Each archetype has a Poisson arrival rate and configurable size and lifetime (churn) distributions: const, uniform, normal, lognormal, exponential, pareto, and a histogram type that replays measured telemetry (sampling log-uniformly within a bucket, so values vary inside ranges rather than sitting at the named edges). The point: you can drive the model with real measured write-size and churn distributions, not just toy parametric ones.
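The histogram replay can be sketched as a two-step draw: pick a bucket in proportion to its weight, then sample log-uniformly between its edges. The `Bucket` layout and function names below are assumptions for illustration; the config schema in internal/workload is not reproduced here.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// Bucket is a hypothetical histogram bin covering [Lo, Hi) with a relative weight.
// Lo must be > 0 because the draw is uniform in ln(x).
type Bucket struct {
	Lo, Hi float64
	Weight float64
}

// NewRNG returns a seeded generator so runs are repeatable.
func NewRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// Sample picks a bucket by weight, then draws log-uniformly inside it.
func Sample(rng *rand.Rand, buckets []Bucket) float64 {
	total := 0.0
	for _, b := range buckets {
		total += b.Weight
	}
	u := rng.Float64() * total
	chosen := buckets[len(buckets)-1] // fallback guards against float round-off
	for _, b := range buckets {
		if u < b.Weight {
			chosen = b
			break
		}
		u -= b.Weight
	}
	// Uniform in log space: small values inside a wide bucket are as likely
	// per decade as large ones.
	lnLo, lnHi := math.Log(chosen.Lo), math.Log(chosen.Hi)
	return math.Exp(lnLo + rng.Float64()*(lnHi-lnLo))
}

func main() {
	// Placeholder buckets, not measured telemetry.
	buckets := []Bucket{
		{Lo: 1024, Hi: 65536, Weight: 3},       // 1 KiB to 64 KiB
		{Lo: 65536, Hi: 1073741824, Weight: 1}, // 64 KiB to 1 GiB
	}
	rng := NewRNG(1)
	for i := 0; i < 5; i++ {
		fmt.Printf("%.0f bytes\n", Sample(rng, buckets))
	}
}
```

Log-uniform sampling matters for wide buckets: a plain uniform draw over 64 KiB to 1 GiB would almost never produce a value below 100 MiB.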

3.4 The strategies — the experiment surface

Everything under test is a pluggable interface (internal/strategy), so a study is a config change, not a code change:

Aggregator (how blobs pack, when a piece seals): none, fixed_size, time_window, size_or_time, churn_aware (bucket blobs by predicted lifetime so whole pieces die together).
Batcher (when queued chain ops submit): immediate, fixed_interval, size_threshold.
DeletePolicy (when marked-deleted blobs are compacted): immediate, batched, sla_bounded, and garbage_collected — the compaction-timing lever: tombstone deletes and compact an aggregate only once its garbage fraction crosses a threshold (the design knob), with an optional max_age cap and SLA force.
Rewrite model: full (the faithful PDP model) vs partial (a counterfactual lower bound, not buildable) — see §3.6.

3.5 The cost model

Gas is contract-grounded and not flat per tx. AddPieces and ProvePossession scale logarithmically with the data set's current piece count (proving also scales with byte size), fit to calibnet PDP measurements. The remove and create-dataset ops and the FilecoinWarmStorageService service-contract surcharge are grounded against the PDPVerifier.sol and FilecoinWarmStorageService.sol sources. See docs/GAS_GROUNDING.md for the full methodology, the EVM↔FVM unit problem, and what is measured vs. projected. The formulas live in internal/cost/gas.go, with anchors in docs/MODELING.md §2:

G_add(n)     = add.base   + add.per_ln_piece · ln(n)              # one-time, on add
G_prove(n,B) = prove.base + prove.per_ln_piece·ln(n) + prove.per_ln_byte·ln(B)   # recurring
G_next       = next.base                                          # recurring, ~constant
G_rem(k)     = rem.base   + rem.per_piece · k                     # one tx drops k pieces
gas in FIL   = units · gas_price · 1e-18                          # post-hoc
gas in USD   = gas_FIL · fil_usd                                  # post-hoc
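The formulas above translate directly into code. The sketch below is not internal/cost/gas.go: the `GasParams` type is hypothetical, the coefficients in `main` are placeholders rather than calibnet fits, and it assumes gas_price is expressed in attoFIL per gas unit (which is what the 1e-18 factor suggests).

```go
package main

import (
	"fmt"
	"math"
)

// GasParams holds the fitted coefficients of the gas formulas.
type GasParams struct {
	AddBase, AddPerLnPiece                     float64
	ProveBase, ProvePerLnPiece, ProvePerLnByte float64
	NextBase                                   float64
	RemBase, RemPerPiece                       float64
}

// Add is G_add(n): one-time cost of an add when the data set holds n pieces (n >= 1).
func (p GasParams) Add(n float64) float64 {
	return p.AddBase + p.AddPerLnPiece*math.Log(n)
}

// Prove is G_prove(n, B): recurring cost for n pieces totalling B bytes (n, B >= 1).
func (p GasParams) Prove(n, bytes float64) float64 {
	return p.ProveBase + p.ProvePerLnPiece*math.Log(n) + p.ProvePerLnByte*math.Log(bytes)
}

// Next is G_next: recurring and roughly constant.
func (p GasParams) Next() float64 { return p.NextBase }

// Remove is G_rem(k): one transaction that drops k pieces.
func (p GasParams) Remove(k float64) float64 {
	return p.RemBase + p.RemPerPiece*k
}

// ToFIL converts gas units to FIL; gasPrice is assumed to be attoFIL per unit.
func ToFIL(units, gasPrice float64) float64 { return units * gasPrice * 1e-18 }

// ToUSD applies the FIL/USD rate as a final, post-hoc step.
func ToUSD(fil, filUSD float64) float64 { return fil * filUSD }

func main() {
	// Placeholder coefficients for illustration only.
	p := GasParams{
		AddBase: 1e6, AddPerLnPiece: 1e5,
		ProveBase: 2e6, ProvePerLnPiece: 1e5, ProvePerLnByte: 5e4,
		NextBase: 5e5,
		RemBase:  1e6, RemPerPiece: 1e4,
	}
	fmt.Printf("G_add(1000)       = %.0f units\n", p.Add(1000))
	fmt.Printf("G_prove(1000,1e9) = %.0f units\n", p.Prove(1000, 1e9))
	fmt.Printf("G_rem(50)         = %.0f units\n", p.Remove(50))

	// Pricing is a scalar layer on top of the accumulated units.
	fil := ToFIL(2.102e9, 1e9)
	fmt.Printf("%.3f FIL, $%.2f\n", fil, ToUSD(fil, 5.0))
}
```

The last two lines use 2.102e9 units at 1e9 attoFIL per unit and 5 USD per FIL. Those two prices are inferred from the ratios in the sample table in §6 (2.102e+09 units, 2.102 FIL, $10.51), not stated anywhere as configuration.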

Prices are applied post-hoc, so you can re-price a run instead of re-running it. The sim accumulates only price-independent invariants (FVM gas units and TB-months) and reports gas in both FVM gas and FIL, and storage in TB-months. Pricing is a final scalar layer, matched to each flow's real settlement token: gas settles in FIL, so the volatile FIL/USD rate (~$0.70 today) applies to gas only; storage and revenue settle in USDFC, which is USD-pegged, so the $/TB/month rate applies and they don't move with FIL. pdp-sim reprice -i summary.json --fil-usd 0.70 re-prices a saved run in milliseconds, which turns gas-price and storage-price sensitivity into a reprice rather than a re-run. See docs/GAS_GROUNDING.md.

The recurring proving cost is the floor: for every active data set, every proving period, the SP pays G_prove + G_next whether or not anything changed. At low churn this dominates total gas — so minimizing the number of data sets matters more than minimizing adds, a non-obvious result the model surfaces directly. A ProvingPeriod event fires per data set and re-reads its current (piece count, bytes), so the floor tracks the data set as it grows and shrinks.

The ledger is an exact time-integral, not a sampler (internal/cost/ledger.go). It advances the clock and accrues value · Δt on every byte-count change, so storage/revenue integrals are exact to the event, not approximated by periodic sampling. It tracks three distinct byte counts because they have different lifetimes, and conflating them is the usual way these models go wrong:

Quantity	Window	Drives	Whose money
proven bytes	piece on chain → removed	storage payment `S_sp`	fil.one pays SP
held bytes	arrival (PUT) → removal enacted	infra `S_infra` (disk/power)	SP's own cost
billed bytes	arrival → delete request (default)	revenue `R_c`	clients pay fil.one

Held ⊇ proven: the extra is the off-chain holding window ([arrival, on-chain]) where the SP is already burning disk but not yet being paid — modeling infra on held (not proven) bytes is what makes lazy flush show its true holding cost.
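The exact time-integral idea is small enough to sketch: accrue level · Δt every time the level is about to change, and again when the total is read. This is an illustration of the technique, not internal/cost/ledger.go; `Integrator` is a hypothetical name, and the real ledger would keep one such accumulator for each of the proven, held, and billed byte counts.

```go
package main

import "fmt"

// Integrator accumulates the exact integral of a piecewise-constant level over time.
type Integrator struct {
	now   float64 // time of the last update, in seconds
	level float64 // current byte count
	total float64 // accumulated byte-seconds
}

// advance accrues level * dt up to time t. Times must be non-decreasing.
func (g *Integrator) advance(t float64) {
	g.total += g.level * (t - g.now)
	g.now = t
}

// Change applies a byte-count delta at time t, after accruing the old level.
func (g *Integrator) Change(t, delta float64) {
	g.advance(t)
	g.level += delta
}

// ByteSeconds returns the integral up to time t.
func (g *Integrator) ByteSeconds(t float64) float64 {
	g.advance(t)
	return g.total
}

func main() {
	var proven Integrator
	proven.Change(0, 100)   // 100 bytes go on chain at t=0
	proven.Change(10, 50)   // 50 more at t=10
	proven.Change(30, -100) // first piece removed at t=30
	// 100*10 + 150*20 + 50*10 = 4500 byte-seconds
	fmt.Println(proven.ByteSeconds(40))
}
```

Because accrual happens at every change rather than on a sampling tick, a piece that lives for less than one tick is still charged for exactly its lifetime.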

Margin is split across the two balance sheets, because the storage payment is an internal transfer (fil.one's cost, the SP's revenue) that cancels in the consolidated view. A naive single "margin" that also subtracts the storage payment charges the system for a transfer that nets to zero:

fil.one margin = Revenue − StoragePayment
SP margin      = StoragePayment − Infra − Gas
system margin  = Revenue − Infra − Gas        (= fil.one + SP; the transfer cancels)
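A short check of the identity, using made-up USD figures: the two per-party margins sum to the system margin because the storage payment appears once with each sign. `Flows` is a hypothetical type for illustration.

```go
package main

import "fmt"

// Flows holds priced totals in USD for one run (illustrative values only).
type Flows struct {
	Revenue        float64 // clients pay fil.one
	StoragePayment float64 // fil.one pays the SP
	Infra          float64 // SP's disk and power
	Gas            float64 // SP's chain cost
}

// FilOneMargin is what fil.one keeps after paying the SP.
func (f Flows) FilOneMargin() float64 { return f.Revenue - f.StoragePayment }

// SPMargin is what the SP keeps after infra and gas.
func (f Flows) SPMargin() float64 { return f.StoragePayment - f.Infra - f.Gas }

// SystemMargin is the consolidated view; the storage payment does not appear.
func (f Flows) SystemMargin() float64 { return f.Revenue - f.Infra - f.Gas }

func main() {
	f := Flows{Revenue: 100, StoragePayment: 60, Infra: 10, Gas: 5}
	fmt.Println("fil.one:", f.FilOneMargin()) // 40
	fmt.Println("SP:     ", f.SPMargin())     // 45
	fmt.Println("system: ", f.SystemMargin()) // 85 = 40 + 45
}
```

Changing StoragePayment moves money between the first two lines but leaves the third untouched, which is why the storage rate is a split decision and not a system-cost lever.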

3.6 Deletes: the PDP removal constraint, scheduled removal, and latency

When blob b in aggregate r is deleted, b is marked deleted at t_d (billing stops, the zombie clock starts) and the DeletePolicy decides when to compact. deleteLatency = enactTime − t_d.

PDP cannot remove a member from a committed aggregate — removePieces drops whole aggregates, with no operation to excise one blob. So to reclaim a deleted blob's space you must remove the whole aggregate and re-add a new one without it (strategy.rewrite.type: full, the faithful model; survivors are re-committed → write amplification, up to 241× on 1 GiB aggregates with eager deletes). partial (shrink-in-place / sub-piece removal) is kept only as a counterfactual lower bound to quantify what that constraint costs — it is not a buildable option, so don't read full-vs-partial as a design choice. The real levers under forced full-rewrite are aggregate size, churn-aware co-location, and compaction timing.
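A toy calculation shows why full rewrite is expensive and why batching deletes helps. It assumes an aggregate of n equal-sized blobs that are all eventually deleted, `batch` at a time, with the survivors re-committed after each batch; sizes are counted in blob units. These assumptions are for illustration only and are not how the simulator derives its figures.

```go
package main

import "fmt"

// WriteAmp returns chain bytes written divided by unique bytes stored for one
// aggregate of n equal blobs whose deletes are compacted `batch` at a time
// under the full-rewrite model. It returns 0 for non-positive inputs.
func WriteAmp(n, batch int) float64 {
	if n <= 0 || batch <= 0 {
		return 0
	}
	written := 0
	// The first iteration is the initial commit of all n blobs; each later
	// iteration is a rewrite of the survivors. Nothing is written once no
	// survivors remain.
	for survivors := n; survivors > 0; survivors -= batch {
		written += survivors
	}
	return float64(written) / float64(n)
}

func main() {
	fmt.Println(WriteAmp(4, 1)) // 4+3+2+1 = 10 writes over 4 blobs: 2.5
	fmt.Println(WriteAmp(4, 2)) // 4+2 = 6 writes: 1.5
	fmt.Println(WriteAmp(4, 4)) // dropped only when fully dead: 1
}
```

With eager one-at-a-time deletes the ratio works out to (n+1)/2 in this toy, so it grows linearly with the number of blobs packed into an aggregate; waiting until the whole aggregate is dead brings it back to 1.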

Removal is scheduled, not immediate. A removePieces only takes effect when the proving period it was scheduled in passes — at the next proving boundary for that data set. So delete latency has a floor of up to one proving period, the deleted bytes stay proven and paid-for until then, and during a rewrite the survivors are double-proven (old + new aggregate) over the lag. These are real costs the model now charges; sla_bounded submits confirm_delay + proving_period earlier to still meet the SLA.
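The timing can be sketched in integer seconds. The assumptions here are mine: proving boundaries fall at firstBoundary + k · period, a removal confirmed exactly on a boundary is enacted at that boundary, and the latest safe submit time budgets a full proving period as the worst case. `EnactTime` and `LatestSubmit` are hypothetical names.

```go
package main

import "fmt"

// EnactTime returns when a removal submitted at `submit` takes effect: the first
// proving boundary at or after submit + confirmDelay. All values are in seconds.
func EnactTime(submit, confirmDelay, firstBoundary, period int64) int64 {
	confirmed := submit + confirmDelay
	if confirmed <= firstBoundary {
		return firstBoundary
	}
	k := (confirmed - firstBoundary + period - 1) / period // ceiling division
	return firstBoundary + k*period
}

// LatestSubmit returns the latest submit time that still meets the SLA in the
// worst case, by reserving confirmDelay plus one full proving period.
func LatestSubmit(deleteRequested, sla, confirmDelay, period int64) int64 {
	return deleteRequested + sla - confirmDelay - period
}

func main() {
	const (
		period       = 86400  // one day; a placeholder, not the network's value
		confirmDelay = 3600   // one hour
		sla          = 259200 // three days
	)
	deleteRequested := int64(100000)

	// Submitting right away still waits for the next boundary.
	enact := EnactTime(deleteRequested, confirmDelay, 0, period)
	fmt.Println("latency if submitted immediately:", enact-deleteRequested)

	// Waiting as long as the SLA allows.
	submit := LatestSubmit(deleteRequested, sla, confirmDelay, period)
	enact = EnactTime(submit, confirmDelay, 0, period)
	fmt.Println("latency at latest submit:", enact-deleteRequested, "of", sla)
}
```

Even an immediate submit cannot beat the proving boundary, which is the latency floor described above; the latest-submit rule trades the remaining slack for larger delete batches.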

Compaction timing is the real lever (the garbage_collected policy). Since the rewrite is unavoidable, the SP decides when to pay it: tombstone deletes and rewrite an aggregate only once its garbage fraction (dead ÷ total bytes) crosses a threshold. One knob sweeps the whole trade-off — at threshold = 1.0 (the default) an aggregate is dropped only when fully dead (write amp ~1.0, but max zombie storage and latency); lower thresholds rewrite eagerly (higher write amp/gas, less zombie). An optional max_age cap bounds tombstone debt, and an SLA force overrides a lazy threshold. See docs/MODELING.md §5.3 for the measured trade-off curve.
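The decision rule reads naturally as a predicate over one aggregate. The sketch below assumes the three triggers combine as a simple OR and that a zero max_age disables the cap; the field names and the exact precedence in the garbage_collected policy are assumptions, not taken from the code.

```go
package main

import (
	"fmt"
	"time"
)

// GCPolicy is a hypothetical view of the garbage_collected parameters.
type GCPolicy struct {
	Threshold float64       // garbage fraction that triggers a rewrite; 1.0 = only when fully dead
	MaxAge    time.Duration // cap on tombstone age; 0 disables the cap
}

// AggregateState is a hypothetical snapshot of one aggregate.
type AggregateState struct {
	TotalBytes      int64
	DeadBytes       int64         // bytes of tombstoned (marked-deleted) blobs
	OldestTombstone time.Duration // age of the oldest unenacted delete
	SLAForce        bool          // a pending delete is about to breach its SLA
}

// ShouldCompact reports whether the aggregate should be rewritten or dropped now.
func ShouldCompact(a AggregateState, p GCPolicy) bool {
	if a.DeadBytes <= 0 || a.TotalBytes <= 0 {
		return false // nothing to reclaim
	}
	if a.SLAForce {
		return true // the SLA overrides a lazy threshold
	}
	if p.MaxAge > 0 && a.OldestTombstone >= p.MaxAge {
		return true // bound tombstone debt
	}
	return float64(a.DeadBytes)/float64(a.TotalBytes) >= p.Threshold
}

func main() {
	lazy := GCPolicy{Threshold: 1.0}
	eager := GCPolicy{Threshold: 0.25, MaxAge: 48 * time.Hour}
	a := AggregateState{TotalBytes: 1 << 30, DeadBytes: 1 << 28} // 25% garbage

	fmt.Println("lazy: ", ShouldCompact(a, lazy))
	fmt.Println("eager:", ShouldCompact(a, eager))
}
```

Sweeping Threshold from 1.0 down toward 0 moves the same aggregate from "wait until fully dead" to "rewrite on the first delete", which is the single-knob trade-off the paragraph describes.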

3.7 Confirmation, warmup, and other realism knobs

Confirmation delay: a fixed latency from submit to on-chain (no reorgs modeled). It propagates into the SLA logic: sla_bounded submits early by confirm_delay (plus the proving-period lag from §3.6) so that the enactment, not the submission, meets the deadline.
Warmup: simulation.warmup excludes a cold-start window from the reported totals (the ledger and metrics reset at the boundary while system state carries forward), so you measure steady state.

4. Why you can trust the numbers

This is the part to show a skeptical reviewer. Confidence comes from four places, not from "the code looks right":

Determinism. Same config + seed ⇒ byte-identical totals (TestDeterminism). Results are reproducible and comparisons are controlled — strategy is the only variable.
Analytic invariants hold. no-aggregation ⇒ pieces == blobs and write amplification == exactly 1.0; immediate delete with zero confirm delay ⇒ every delete enacted instantly (TestNoAggregationInvariants). These have known closed-form answers and the simulator hits them.
Conservation is enforced. A regression test asserts the ledger's proven-byte count always equals the bytes of pieces actually on chain (TestQueuedAddDeleteRaceConserved, TestPartialRewrite) — bytes can't be created, lost, or double-counted as pieces are added, rewritten, partially shrunk, and removed across the confirm delay. Each of these tests was confirmed to fail on the pre-fix code, so they're guarding real bugs, not tautologies.
SLA guarantees are tested under adversarial timing. sla_bounded ⇒ zero violations and max latency ≤ SLA, verified with both zero and 12-hour confirm delays (TestSLABounded).

And the model behaves sensibly under sweeps — e.g. tightening the delete-batch interval trades a monotonic decrease in zombie storage and latency for a monotonic increase in write-amp and gas, a coherent five-metric trade-off curve rather than noise. Coherence across independent metrics is itself evidence the model is internally consistent.

5. What's grounded vs. assumed (read before quoting absolute dollars)

Intellectual honesty about the model's edges is part of the argument. The relative ranking of strategies is trustworthy earlier than the absolute USD figures.

Grounded	Placeholder / assumption
AddPieces, ProvePossession, NextProvingPeriod gas (calibnet anchors)	RemovePieces & CreateDataSet gas — not yet in the benchmark set
Billing structure (per-TB-month, paid on proven bytes)	Gas price & FIL→USD — set per experiment
Log-scaling of add/prove with piece count	Size histogram default has overlapping buckets (needs a clean disjoint set)
Recurring proving floor exists and dominates at low churn	Confirmation modeled as fixed delay; no reorgs
Full-rewrite forced by PDP + scheduled (proving-period) removal	Compaction timing (when to do the unavoidable rewrite) not yet a tunable lever

All of these are config inputs or flagged open questions (docs/MODELING.md §7-8), not hidden fudge factors. Treat absolute USD as indicative until the placeholder gas is measured on the target network; treat comparisons between strategies as valid now.

6. Running it

go build ./...
go test ./...        # runs the validation suite in §4

# Single run from a config (wired with uber-go/fx)
go run . run -c config.example.yaml

# Compare strategies side by side — same workload/seed, apples-to-apples
go run . compare -c configs/fixed-size.yaml -c configs/fixed-size-partial.yaml

# Sweep one key across values (numeric OR categorical), with a progress bar
go run . sweep -c configs/fixed-size.yaml --key strategy.delete_policy.params.interval --values 600,21600,86400
go run . sweep -c configs/fixed-size.yaml --key strategy.rewrite.type --values full,partial

# Re-price a saved run under different prices — NO re-run. Gas re-prices via FIL/USD; the
# USDFC storage rate re-prices storage independently (prices are post-hoc, §3.5).
go run . run -c config.example.yaml -o /tmp/run
go run . reprice -i /tmp/run/summary.json --fil-usd 0.70 --sp-tb-month 2.27

Output formats are set per-config under output.formats (table, json, csv); -o <dir> overrides the output directory. The output reports gas in FVM units and FIL (price-independent) plus USD, and storage in TB-months — so a saved run can be re-priced. A single-run table:

write amplification            1.790
roots created                  96
  of which rewrites            57 (max gen 3)
gas: TOTAL                     $10.51
gas: TOTAL (FIL)               2.102121 FIL      # price-independent SP gas burden
gas: TOTAL (FVM gas)           2.102e+09
proven storage (TB-months)     0.00
revenue (clients→fil.one)      $0.01
storage payment (fil.one→SP)   $0.01
fil.one margin (rev−storage)   $0.00
SP margin (storage−infra−gas)  $-10.51
system margin (rev−infra−gas)  $-10.51
zombie storage %               41.96%

(Small absolute dollars here are just a toy run; scale duration and volume up for realistic magnitudes. Gas dominates at this scale because the proving floor and per-op costs don't shrink with a tiny dataset.)

7. Layout

cmd/                 cobra CLI (run / compare / sweep / reprice), fx wiring
internal/engine/     deterministic event heap + seeded RNG registry
internal/model/      Blob, Root (piece), DataSet, ChainOp, state machine
internal/workload/   client archetypes + distributions (incl. histogram replay)
internal/strategy/   Aggregator / Batcher / DeletePolicy + registry (the experiment surface)
internal/cost/       PDP gas model (FVM units) + physical ledger + post-hoc PriceSet (FIL/USD)
internal/metrics/    latency & time-to-chain distributions, write-amp, zombie, generations, SLA
internal/report/     terminal tables + JSON/CSV export + reprice
internal/sim/        orchestrator: event handlers tying it all together + validation tests
configs/             example strategy variants for compare/sweep
bench/               PiB-scale reference workload + golden-output regression guard
docs/                DESIGN.md, MODELING.md (the math), INTERFACES.md, GAS_GROUNDING.md
analysis/            sensitivity scan + FINDINGS.md, SUMMARY.md (team-facing)

License

Dual-licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE)
MIT license (LICENSE-MIT)

at your option. Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
