spike: benchmark cache-access persistence — file mtime vs SQLite #320
Closed
3lvis wants to merge 4 commits into
Self-contained benchmark (Benchmarks/CacheAccessPersistence.swift, run via swift, not part of the package build) comparing how to persist per-entry last-access for the disk cache's sliding TTL, without a manifest. Result (docs/benchmarks/cache-access-persistence.md): on the hot path mtime touch (~26us) and SQLite autocommit UPSERT (~31us) are comparable; debounced mtime drops to a stat. SQLite wins the once-per-launch sweep (indexed, ~40x) but that's a cheap background op at realistic sizes. mtime is zero-dependency, zero-footprint, self-syncing; SQLite adds a ~650KB store that can drift from the files. Recommendation: mtime, debounced.
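To make "debounced mtime drops to a stat" concrete, here is a minimal sketch of a debounced touch. The helper name `touchIfStale` and the one-hour debounce window are assumptions for illustration, not code from the benchmark or the library.

```swift
import Foundation

/// Hypothetical helper (not from the PR): bump a cache file's mtime only when
/// the stored value is older than `debounce`. Returns true if a write happened.
func touchIfStale(_ url: URL,
                  now: Date = Date(),
                  debounce: TimeInterval = 3600,  // assumed window
                  fileManager: FileManager = .default) throws -> Bool {
    // One stat: read the current modification date.
    let attributes = try fileManager.attributesOfItem(atPath: url.path)
    guard let modified = attributes[.modificationDate] as? Date else { return false }

    // Recently touched: the stat above was the only cost on this access.
    if now.timeIntervalSince(modified) < debounce { return false }

    // Stale: pay for the metadata write once, then later reads skip it.
    try fileManager.setAttributes([.modificationDate: now], ofItemAtPath: url.path)
    return true
}
```

With this shape, repeated reads inside the window cost only the attribute lookup, which is the case the benchmark describes as a bare stat.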
…ikes The 10k sweep gap (39ms vs 1ms) isn't yet painful; scaling the sweep to 200k surfaces the real spike: mtime 39ms→215ms→471ms→1388ms vs sqlite staying ~flat (0.9→26ms). So the choice depends on cache size: mtime is fine to ~tens of thousands; past ~100k the indexed query (or a sharded sweep) earns its keep. Recommendation refined accordingly.
Sharding the cache dir into K=16 subdirs and sweeping one per launch makes the per-launch sweep O(N/K): at 200k it drops from 864ms (full scan) to 53ms — within ~1.5x of SQLite (33ms), zero dependency. Trade: bounded GC latency (K launches to sweep everything; lazy-on-read still expires requested entries) + a one-integer next-shard counter. Recommendation: mtime debounced -> shard if N gets large -> SQLite only as a last resort.
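A rough sketch of the one-shard-per-launch sweep described above, assuming shard directories named by a single hex digit and a plain-text cursor file. The function name `sweepNextShard` and the cursor file name used here are illustrative assumptions, not the benchmark's code.

```swift
import Foundation

/// Hypothetical sweep: delete expired files in one shard, then advance the
/// cursor so the next launch handles the next shard. Returns the shard swept.
func sweepNextShard(domain: URL,
                    ttl: TimeInterval,
                    shardCount: Int = 16,
                    now: Date = Date(),
                    fileManager: FileManager = .default) throws -> Int {
    // The one-integer next-shard counter, stored as text (assumed format).
    let cursorURL = domain.appendingPathComponent(".sweep-shard")
    var stored = 0
    if let text = try? String(contentsOf: cursorURL, encoding: .utf8),
       let value = Int(text) {
        stored = value
    }
    // Fall back to shard 0 if the cursor is missing or out of range.
    let shard = (stored >= 0 && stored < shardCount) ? stored : 0

    // Only this shard is listed, so the work is O(N/K) per launch.
    let shardURL = domain.appendingPathComponent(String(shard, radix: 16))
    let files = (try? fileManager.contentsOfDirectory(
        at: shardURL,
        includingPropertiesForKeys: [.contentModificationDateKey])) ?? []

    for file in files {
        let values = try file.resourceValues(forKeys: [.contentModificationDateKey])
        if let modified = values.contentModificationDate,
           now.timeIntervalSince(modified) > ttl {
            try fileManager.removeItem(at: file)
        }
    }

    try String((shard + 1) % shardCount)
        .write(to: cursorURL, atomically: true, encoding: .utf8)
    return shard
}
```

A dead file in a shard that was just swept waits up to K launches for its next turn, which is the bounded GC latency named in the trade-off.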
Clarify the recommendation: shard unconditionally from the first entry (derive the shard from the key hash destinationURL already computes), not dynamically when the cache gets large. Always-sharded is one code path with no migration, no count-tracking, no threshold; dynamic needs an O(N) reorg and two paths. The small-cache cost is negligible.
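As an illustration of always-sharded placement, the sketch below derives the shard from a key hash. FNV-1a is a stand-in chosen only because it is short and deterministic; the actual hash is whatever destinationURL computes, and the names `fnv1a`, `shardName`, and `shardedURL` are hypothetical.

```swift
import Foundation

/// Stand-in 64-bit FNV-1a hash (assumption: the real code reuses the hash
/// that destinationURL already computes).
func fnv1a(_ key: String) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325   // FNV offset basis
    for byte in key.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3        // FNV prime, wrapping multiply
    }
    return hash
}

/// One hex nibble of the hash selects one of 16 shard directories.
func shardName(forKey key: String) -> String {
    String(fnv1a(key) & 0xF, radix: 16)
}

/// Every entry is placed under <root>/<shard>/<fileName> from the first write,
/// so there is a single code path and nothing to migrate later.
func shardedURL(root: URL, key: String, fileName: String) -> URL {
    root.appendingPathComponent(shardName(forKey: key))
        .appendingPathComponent(fileName)
}
```

Because the shard is a pure function of the key, no entry count or threshold needs to be tracked, which is the argument for sharding unconditionally.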
3lvis added a commit that referenced this pull request on Jun 17, 2026
Replace the in-memory access map (which didn't survive launches) with the file's modification date as the sliding-TTL clock: objectFromCache expires a disk entry whose mtime is older than cacheTTL and re-warms on a disk hit by bumping mtime. NSCache absorbs repeat reads, so the touch is ~once per entry per launch — no explicit debounce. CacheExpiry collapses to a synchronized TTL holder (isExpired(fileDate:)). Shard the cache dir from the start: destinationURL lays files under domain/<shard>/<file> (one hex nibble of the key hash, 16 shards); the startup sweep does one shard per launch via a .sweep-shard cursor, so per-launch work is O(N/16) and never spikes (cf. benchmark #320). Sweep and deleteCacheFolder serialize on cacheMutationLock so the background sweep can't race clearCache. Also clears pre-sharding strays at the domain root (one-time migration). Tests updated/added (mtime expiry, disk-hit re-warm, sharding layout); full suite green (176). README/repo notes updated.
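The read path in this commit message could look roughly like the sketch below. It is not the library's objectFromCache: the function name `readCachedData` is hypothetical, and deleting the expired file at read time is an assumption about how lazy expiry is carried out.

```swift
import Foundation

/// Hypothetical disk read using the file's mtime as the sliding-TTL clock.
func readCachedData(at url: URL,
                    ttl: TimeInterval,
                    now: Date = Date(),
                    fileManager: FileManager = .default) -> Data? {
    guard let attributes = try? fileManager.attributesOfItem(atPath: url.path),
          let modified = attributes[.modificationDate] as? Date else {
        return nil  // no entry on disk
    }

    // Expired: mtime is older than the TTL, so drop the entry (assumed behavior).
    if now.timeIntervalSince(modified) > ttl {
        try? fileManager.removeItem(at: url)
        return nil
    }

    guard let data = fileManager.contents(atPath: url.path) else { return nil }

    // Disk hit: re-warm by bumping mtime, which restarts the sliding window.
    try? fileManager.setAttributes([.modificationDate: now], ofItemAtPath: url.path)
    return data
}
```

If an in-memory cache sits in front of this function, as the commit describes for NSCache, the touch runs about once per entry per launch without any explicit debounce.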
3lvis added a commit that referenced this pull request on Jun 17, 2026
…long-URL fix

Implements roadmap #10. One PR covering the whole cache surface.

- Consistency: clearCache() empties both the in-memory NSCache and the on-disk files; reset() = clearCache() + wipe credentials/headers/fakes. The disk-only static deleteCachedFiles() (which left memory serving deleted data) is gone.
- Warm/cold TTL: on-disk entries carry a sliding TTL (cacheTTL, default 7 days) whose persisted clock is the file's modification date, re-warmed on a disk read/write; no in-memory map, no manifest. NSCache absorbs repeat reads (the debounce).
- Sharded from the start: files live under domain/<shard>/<file>; the startup sweep does one shard per launch via a cursor, so per-launch work is O(N/16) and never spikes. Sweep and clears serialize on a lock.
- Long-URL fix: destinationURL hashes the filename component past the 255-byte limit (red-first).

Breaking vs 7.0.0 (v8). Backed by the mtime-vs-SQLite benchmark in #320. README, repo notes, CHANGELOG, roadmap updated. 176 tests green.
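One way to read the long-URL fix is sketched here: when a filename component exceeds the 255-byte limit the commit mentions, substitute a short hash of it. The helper name `safeFileName` and the FNV-1a hash are assumptions for illustration; the library's actual hash and naming are not shown in this conversation.

```swift
import Foundation

/// Per-component filename limit, taken from the commit message above.
let maxFileNameBytes = 255

/// Hypothetical helper: keep short names as they are, replace over-long ones
/// with a hex digest so the path component always fits on disk.
func safeFileName(_ name: String) -> String {
    // Measure UTF-8 bytes, not characters: the limit is in bytes.
    guard name.utf8.count > maxFileNameBytes else { return name }

    // Stand-in 64-bit FNV-1a hash (assumption, not the library's hash).
    var hash: UInt64 = 0xcbf29ce484222325
    for byte in name.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3
    }
    return String(hash, radix: 16)  // at most 16 hex characters
}
```

Short names pass through unchanged, so only entries that would previously have failed to write get a hashed name.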
A perf comparison (requested) to decide how the disk cache should persist per-entry last access for the warm/cold sliding TTL without a manifest: file mtime (option 1, flat and sharded) vs an SQLite index.

Benchmarks/CacheAccessPersistence.swift is a self-contained spike, run as swift Benchmarks/CacheAccessPersistence.swift [entries accesses], not part of the package build (no library code, no new dependency; SQLite3 is a system framework). Write-up: docs/benchmarks/cache-access-persistence.md.

Hot path & footprint: 10k entries / 100k accesses (macOS/APFS, indicative)

Hot path: mtime-touch and SQLite-autocommit are comparable (durability-bound); debounced mtime is a bare stat.

Sweep scaling: where mtime spikes, and how sharding fixes it

The full mtime scan spikes to ~0.9 s at 200k. Sharding (split the cache dir into K subdirectories, sweep one per launch → O(N/K)) cuts that ~16× to 53 ms, the SQLite ballpark, with zero dependency. Trade: bounded GC latency (a never-requested dead file lingers up to K launches; lazy-on-read still expires anything you actually request) + a one-integer next-shard counter (not a per-entry index, so nothing to drift).

Recommendation

mtime, debounced: enough on its own for this cache's profile (hundreds to low-thousands; single-digit-ms sweep). If 100k+ is in scope, shard the directory rather than adopt SQLite; it keeps the per-launch sweep in the tens of ms while preserving every mtime advantage. Order of preference: mtime debounced → add sharding if N gets large → SQLite only as a last resort.

Decision-support spike, no library change. Follow-up (if accepted): wire debounced mtime persistence (+ sharding if wanted) into the cache on #319.