perf: lazy memory architecture — reduce per-pattern overhead 5-7x (#158) by kolkov · Pull Request #160 · coregx/coregex

kolkov · 2026-06-15T14:45:49Z

Summary

Adopts the Rust regex Cache separation model to cut memory per compiled pattern, from 16x to roughly 3x the stdlib footprint when compiling many patterns. Addresses #158, where Coraza WAF (900 OWASP CRS patterns) saw 16x memory overhead vs. stdlib.

Changes:

PikeVM lazy init — NewPikeVMLazy() defers thread queues/sparse set allocation to first search (~10 KB saved per unused PikeVM)
Shared DFA PikeVM — SetPikeVM() eliminates duplicate PikeVM per DFA (~15-20 KB per DFA pattern)
Deferred SearchState — removed eager allocation at compile time (~15-50 KB per pattern)
Strategy-aware caches — newSearchState() only allocates caches needed by active strategy (30-70% fewer allocations)
DFA initCap 64→16 — smaller initial maps/slices (~3 MB across 900 patterns)
CI — skip benchmark workflow for docs-only PRs
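
The lazy-init item above can be illustrated with a minimal sketch. It is not the coregex implementation: `lazyVM`, `newLazyVM`, `ensureInit`, and the buffer fields are hypothetical names chosen for illustration, and the matching logic is a placeholder. The point is only that construction records sizes while the buffers are allocated on the first search.

```go
package main

import "fmt"

// lazyVM is a hypothetical stand-in for a PikeVM-style matcher.
// The names here are illustrative and are not the coregex API.
type lazyVM struct {
	numStates int   // known at compile time
	queue     []int // thread queue, allocated on first search
	sparse    []int // sparse-set index, allocated on first search
}

// newLazyVM records only the size; it allocates no search buffers.
func newLazyVM(numStates int) *lazyVM {
	return &lazyVM{numStates: numStates}
}

// ensureInit allocates the buffers the first time they are needed.
func (vm *lazyVM) ensureInit() {
	if vm.queue != nil {
		return
	}
	vm.queue = make([]int, 0, vm.numStates)
	vm.sparse = make([]int, vm.numStates)
}

// search is a placeholder for real matching; it only shows where the
// lazy allocation is triggered.
func (vm *lazyVM) search(input string) bool {
	vm.ensureInit()
	vm.queue = vm.queue[:0] // reuse the buffer on later searches
	return len(input) > 0   // dummy result, not a real match
}

// initialized reports whether the search buffers have been allocated.
func (vm *lazyVM) initialized() bool { return vm.queue != nil }

func main() {
	vm := newLazyVM(64)
	fmt.Println(vm.initialized()) // false
	vm.search("abc")
	fmt.Println(vm.initialized()) // true
}
```

A pattern whose strategy never reaches this matcher pays only for the small struct, which is the kind of saving the list attributes to unused PikeVMs.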

Architecture (Rust model):

Compiled regex = immutable, shareable
Search state = mutable, per-thread, lazy
Strategy drives what gets allocated
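
As a rough sketch of these three points, assuming a `sync.Pool` for per-goroutine state: the types `program`, `searchState`, and `strategy`, and the helper `withState`, are hypothetical and do not reflect coregex internals. The compiled value is never mutated after `compile`, search state is created only when a search first asks for it, and the strategy decides which caches that state carries.

```go
package main

import (
	"fmt"
	"sync"
)

// strategy selects which engine a pattern uses (hypothetical enum).
type strategy int

const (
	strategyNFA strategy = iota
	strategyDFA
)

// program is immutable after compile and safe to share across goroutines.
type program struct {
	strat     strategy
	numStates int
	pool      sync.Pool // holds *searchState, created on demand
}

// searchState is mutable scratch space used by one search at a time.
type searchState struct {
	nfaThreads []int
	dfaCache   map[int]int
}

// compile builds the shared program but allocates no search state.
func compile(strat strategy, numStates int) *program {
	p := &program{strat: strat, numStates: numStates}
	p.pool.New = func() any { return newSearchState(p) }
	return p
}

// newSearchState allocates only the caches the active strategy needs.
func newSearchState(p *program) *searchState {
	s := &searchState{}
	switch p.strat {
	case strategyNFA:
		s.nfaThreads = make([]int, 0, p.numStates)
	case strategyDFA:
		s.dfaCache = make(map[int]int, 16) // small initial capacity, illustrative
	}
	return s
}

// withState borrows a search state for the duration of fn.
func (p *program) withState(fn func(*searchState)) {
	s := p.pool.Get().(*searchState)
	defer p.pool.Put(s)
	fn(s)
}

func main() {
	p := compile(strategyDFA, 32)
	p.withState(func(s *searchState) {
		fmt.Println(s.nfaThreads == nil, s.dfaCache != nil) // true true
	})
}
```

With this split, compiling many patterns costs only the immutable part; scratch memory scales with the number of patterns actually searched concurrently.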

Test plan

go test ./... — all packages pass
golangci-lint run — no new issues
gofmt — all modified files clean
Quick benchmarks — no regression on BenchmarkFind
CI: tests (Linux/macOS/Windows) + benchmark comparison
CI: race detector (Linux)
regex-bench: Go + Rust comparison on AMD EPYC
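
For checking per-pattern memory locally, one possible approach is to compare live heap before and after compiling a batch of patterns. The sketch below is an assumption about how such a measurement could be done, not the method used in this PR or in regex-bench: `heapPerPattern` is a hypothetical helper, it uses only the standard library `regexp` package, and the generated patterns are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"runtime"
)

// heapPerPattern compiles every pattern with the standard library and
// returns the average growth in live heap bytes per compiled pattern.
func heapPerPattern(patterns []string) (float64, error) {
	if len(patterns) == 0 {
		return 0, errors.New("no patterns given")
	}

	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	compiled := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return 0, err
		}
		compiled = append(compiled, re)
	}

	runtime.GC()
	runtime.ReadMemStats(&after)
	// Keep the compiled patterns reachable until after the second reading.
	runtime.KeepAlive(compiled)

	// Convert to signed values so a shrinking heap cannot underflow.
	delta := int64(after.HeapAlloc) - int64(before.HeapAlloc)
	return float64(delta) / float64(len(patterns)), nil
}

func main() {
	// Synthetic patterns for illustration only; not the OWASP CRS set.
	patterns := make([]string, 0, 900)
	for i := 0; i < 900; i++ {
		patterns = append(patterns, fmt.Sprintf(`user%d=[a-z0-9_]+@example\.com`, i))
	}
	avg, err := heapPerPattern(patterns)
	if err != nil {
		fmt.Println("compile error:", err)
		return
	}
	fmt.Printf("approx. %.0f bytes of live heap per pattern\n", avg)
}
```

The figure is approximate, since the garbage collector and runtime allocations add noise, so it is best read as a ratio between two libraries measured the same way rather than as an absolute number.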

Fixes #158

…158)

Adopt Rust regex Cache separation model to reduce memory overhead from 16x to ~3x vs stdlib when compiling many patterns (WAF workloads).

Changes:
- PikeVM: add NewPikeVMLazy() with deferred internal state allocation
- DFA: share Engine PikeVM via SetPikeVM(), eliminate per-DFA clones
- SearchState: defer allocation to first search (not compile time)
- SearchState: strategy-aware cache allocation (skip unused engines)
- DFA cache: reduce initCap from 64 to 16 entries
- CI: skip benchmark workflow for docs-only PRs

For 900 OWASP CRS patterns, estimated savings:
- PikeVM lazy init: ~10 KB per unused PikeVM (CharClass, Teddy, AC)
- Shared DFA PikeVM: ~15-20 KB per DFA pattern
- Deferred SearchState: ~15-50 KB per pattern at compile time
- Strategy-aware caches: 30-70% fewer allocations per SearchState
- DFA initCap 64->16: ~3 MB across 900 patterns

Fixes #158

codecov · 2026-06-15T14:48:28Z

Codecov Report

❌ Patch coverage is 83.33333% with 8 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
nfa/pikevm.go	66.66%	5 Missing and 1 partial ⚠️
dfa/lazy/builder.go	0.00%	2 Missing ⚠️

github-actions · 2026-06-15T14:53:07Z

Benchmark Comparison

Comparing main → PR #160

Summary: geomean 81.49n (main) → 76.71n (PR #160), -5.87%

⚠️ Potential regressions detected:

Benchmark                                       main             PR #160          Delta
MatchAnchoredLiteral/no_match_prefix-4          2.581n ± ∞ ¹     2.604n ± ∞ ¹     +0.89% (p=0.008 n=5)
ASCIIOptimization_Issue79/short_WithoutASCII-4  302.4n ± ∞ ¹     326.6n ± ∞ ¹     +8.00% (p=0.008 n=5)
DNA_VsStdlib/stdlib/dna_4-4                     60.14m ± ∞ ¹     63.03m ± ∞ ¹     +4.80% (p=0.016 n=5)
LangArenaLogParser/ips-4                        48.13µ ± ∞ ¹     48.36µ ± ∞ ¹     +0.48% (p=0.032 n=5)
BranchDispatch_Stdlib/Digits-4                  132.1n ± ∞ ¹     132.5n ± ∞ ¹     +0.30% (p=0.048 n=5)
BranchDispatch_Stdlib/NoMatch-4                 78.14n ± ∞ ¹     78.97n ± ∞ ¹     +1.06% (p=0.008 n=5)

Full results available in workflow artifacts. CI runners have ~10-20% variance.
For accurate benchmarks, run locally: ./scripts/bench.sh --compare

refactor: extract sharePikeVMWithDFAs to reduce CompileRegexp complexity

0418947

Fixes the maintidx lint warning (cyclomatic complexity 25) by extracting the PikeVM sharing logic into a helper function, which lowers the complexity of CompileRegexp.

kolkov mentioned this pull request Jun 15, 2026

test: coregex lazy memory architecture (commit 0418947) kolkov/regex-bench#12 (Open)

docs: update CHANGELOG and ROADMAP for v0.12.22

eeb3707

kolkov merged commit 2812db7 into main Jun 15, 2026
9 checks passed

kolkov deleted the feature/lazy-memory-architecture branch June 15, 2026 15:20

kolkov merged 3 commits into main from feature/lazy-memory-architecture
