interp: bound-check elision for provably in-range array/vector access #3327
Draft
aleksisch wants to merge 3 commits into
Add unchecked At/ArrayAt/AtVector simulate node variants and a
noBoundCheck flag on ExprAt. A conservative optimizer pass, enabled by
options bound_check_elision, marks accesses whose index is provably in
range, giving the interpreter the equivalent of the JIT's
unsafe_range_check hint. It covers:
- constant index into a fixed-size array or vector
- induction var over a constant range, into a fixed-size array/vector
- induction var over range(length(x)), indexing that same dynamic
array x (symbolic length fact)
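A minimal C++ sketch of how a per-access flag can select between a checked and an unchecked node at simulate time. `AtNode`, `CheckedAt`, `UncheckedAt`, this simplified `ExprAt` and `simulateAt` are hypothetical stand-ins, not the real `SimNode_At*` classes or daScript's simulate code; the point is only that the optimizer sets a flag and node selection does the rest.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins for the interpreter's access nodes.
// These are NOT the real SimNode_At / SimNode_AtU classes.
struct AtNode {
    virtual ~AtNode() = default;
    virtual int32_t eval(const std::vector<int32_t>& arr, uint32_t idx) const = 0;
};

// Checked access: one compare + branch on every evaluation.
struct CheckedAt : AtNode {
    int32_t eval(const std::vector<int32_t>& arr, uint32_t idx) const override {
        if (idx >= arr.size()) {
            std::fprintf(stderr, "index %u out of range\n", static_cast<unsigned>(idx));
            std::abort();
        }
        return arr[idx];
    }
};

// Unchecked access: only valid because the optimizer proved idx < size.
struct UncheckedAt : AtNode {
    int32_t eval(const std::vector<int32_t>& arr, uint32_t idx) const override {
        assert(idx < arr.size());  // debug-only sanity check, compiled out under NDEBUG
        return arr[idx];
    }
};

// Hypothetical ExprAt: the optimizer pass only sets this flag.
struct ExprAt {
    bool noBoundCheck = false;
};

// At simulate time the flag alone selects which node variant is emitted.
std::unique_ptr<AtNode> simulateAt(const ExprAt& expr) {
    if (expr.noBoundCheck) {
        return std::make_unique<UncheckedAt>();
    }
    return std::make_unique<CheckedAt>();
}
```

Keeping the decision in a single flag means the analysis never has to touch node construction; it stays a read-only pass that only annotates the AST.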
Facts carry either a constant [lo,hi) interval or a symbolic upper
bound length(var). A constant-index fact is immune to array mutation
(fixed dims never change). A symbolic length(x) fact is killed by ANY
length-changing array op (resize/erase/push/move/rebind) on ANY array
in the loop body: we assume everything may alias everything, so aliased
mutation can never invalidate an elided access. Element writes (x[i]=v)
do not change length and are excused. The index var must not be
reassigned.
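The fact representation and kill rules above can be modelled roughly as follows. This is a sketch under stated assumptions: `BoundFact`, `Op`, `kills` and `canElide` are hypothetical names, facts are keyed by plain strings, and the real pass works on AST/CFG nodes instead.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical model of one fact "lo <= idx < BOUND" (requires C++17).
// Exactly one of constHi / lengthOf is set.
struct BoundFact {
    std::string indexVar;
    int64_t lo = 0;                       // lower bound (0 for induction facts)
    std::optional<int64_t> constHi;       // constant upper bound
    std::optional<std::string> lengthOf;  // symbolic upper bound: length(lengthOf)
};

BoundFact constFact(const std::string& var, int64_t lo, int64_t hi) {
    assert(lo <= hi);
    return BoundFact{var, lo, hi, std::nullopt};
}

BoundFact lengthFact(const std::string& var, const std::string& arr) {
    return BoundFact{var, 0, std::nullopt, arr};
}

enum class Op { ElementWrite, Resize, Erase, Push, Move, Rebind, AssignVar };

// Does `op`, applied to array or variable `target`, kill `fact`?
bool kills(const BoundFact& fact, Op op, const std::string& target) {
    switch (op) {
    case Op::ElementWrite:
        return false;                    // x[i] = v never changes a length
    case Op::AssignVar:
        return target == fact.indexVar;  // reassigning the index invalidates its facts
    default:
        // Length-changing op on ANY array: assume everything aliases everything,
        // so every symbolic length(.) fact dies. Constant facts survive,
        // because fixed dims never change.
        return fact.lengthOf.has_value();
    }
}

// May `arr[idx]` skip its bound check under `fact`?
// fixedDim is the array's fixed dimension, or 0 for a dynamic array.
bool canElide(const BoundFact& fact, const std::string& idx,
              const std::string& arr, int64_t fixedDim) {
    if (fact.indexVar != idx || fact.lo < 0) return false;
    if (fact.constHi) return fixedDim > 0 && *fact.constHi <= fixedDim;
    return fact.lengthOf && *fact.lengthOf == arr;
}
```

Note the asymmetry in `kills`: a push on an unrelated array `b` still kills a `length(a)` fact, which is exactly the "everything may alias everything" assumption.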
Also make the CFG a shared pass: ProgramCfg / buildProgramCfg build a
per-function CFG cache once, handed to consumers as a const pointer.
Escape analysis's flow-sensitive pass now reads the shared cache
instead of building its own CFG per function; each consumer is gated
independently (force_partial_escape_free / bound_check_elision).
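A rough model of the shared, build-once CFG cache with independently gated consumers. It reuses the names `ProgramCfg` / `buildProgramCfg` from the commit but with invented, simplified signatures; `Cfg`, `Options`, `unsafeIndexPass`, `escapePass`, `runPostInfer` and the build counter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real ProgramCfg / buildProgramCfg are richer.
struct Cfg {
    std::vector<int> blockIds;
};

struct ProgramCfg {
    std::map<std::string, Cfg> perFunction;  // filled once, read-only afterwards

    const Cfg* find(const std::string& fn) const {
        auto it = perFunction.find(fn);
        return it == perFunction.end() ? nullptr : &it->second;
    }
};

struct Options {
    bool boundCheckElision = false;       // stands in for bound_check_elision
    bool forcePartialEscapeFree = false;  // stands in for force_partial_escape_free
};

int g_cfgBuilds = 0;  // counts builds, to show the cache is built at most once per run

std::unique_ptr<ProgramCfg> buildProgramCfg(const std::vector<std::string>& fns) {
    ++g_cfgBuilds;
    auto program = std::make_unique<ProgramCfg>();
    for (const auto& fn : fns) {
        program->perFunction[fn] = Cfg{{0, 1}};  // toy two-block CFG per function
    }
    return program;
}

// Both consumers receive a const pointer, so neither can mutate the shared cache.
std::size_t unsafeIndexPass(const ProgramCfg* cfg, const std::string& fn) {
    const Cfg* c = cfg->find(fn);
    return c ? c->blockIds.size() : 0;  // a real pass would only set noBoundCheck flags
}

std::size_t escapePass(const ProgramCfg* cfg, const std::string& fn) {
    const Cfg* c = cfg->find(fn);
    return c ? c->blockIds.size() : 0;  // a real pass would go on to insert scope_free
}

// Returns how many CFG blocks the enabled consumers visited in total.
std::size_t runPostInfer(const Options& opt, const std::vector<std::string>& fns) {
    if (!opt.boundCheckElision && !opt.forcePartialEscapeFree) {
        return 0;  // no consumer enabled: never build the CFG
    }
    const std::unique_ptr<ProgramCfg> owned = buildProgramCfg(fns);
    const ProgramCfg* shared = owned.get();
    assert(shared != nullptr);
    std::size_t visited = 0;
    if (opt.boundCheckElision) {  // unsafe-index pass runs first
        for (const auto& fn : fns) visited += unsafeIndexPass(shared, fn);
    }
    if (opt.forcePartialEscapeFree) {  // then escape analysis reads the same CFG
        for (const auto& fn : fns) visited += escapePass(shared, fn);
    }
    return visited;
}
```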
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the AST-visitor fact stack with a forward "must" dataflow over
each function's CFG.
A fact is 0 <= idx < BOUND (BOUND = a constant or length(arrayVar)),
plus a set of provably-nonnegative index vars. Facts are:
- genned by loop induction at the for-loop body block (range(N) /
range(length(x))), and by branch guards on the taken edge
(if (i < length(a)) ... , if (i >= length(a)) return; ...),
- merged by INTERSECTION at CFG joins (a fact must hold on every
incoming edge - so an else-branch that establishes ~cond, or code
dominated by an early-exit guard, carries the guard fact),
- killed by any array mutation (any resize/rebind - everything may
alias everything) and by reassigning the index.
An ExprAt whose (index, array/dim) matches a live fact is marked
noBoundCheck; a constant index into a fixed dim is marked directly.
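A minimal sketch of the forward "must" analysis described above, assuming facts are opaque strings such as `"i<length(a)"` and that every fact is a `length(.)` fact, so any array mutation kills all of them. `Block`, `Edge` and `solveMust` are hypothetical; nonnegativity tracking, constant bounds and index reassignment are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Facts = std::set<std::string>;  // e.g. "i<length(a)"

struct Block {
    Facts gen;              // facts established inside the block (e.g. loop induction)
    bool killsAll = false;  // an array mutation: kills every length(.) fact
};

struct Edge {
    int from;
    int to;
    Facts guardGen;  // facts that hold only on this taken edge (branch guard)
};

Facts intersect(const Facts& a, const Facts& b) {
    Facts r;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(r, r.begin()));
    return r;
}

// Forward "must" dataflow; returns the facts holding on entry to each block.
// Block 0 is the function entry and is assumed to have no predecessors.
std::vector<Facts> solveMust(const std::vector<Block>& blocks,
                             const std::vector<Edge>& edges) {
    const std::size_t n = blocks.size();
    Facts universe;  // "top": every fact that could ever be genned
    for (const auto& b : blocks) universe.insert(b.gen.begin(), b.gen.end());
    for (const auto& e : edges) {
        assert(e.from >= 0 && static_cast<std::size_t>(e.from) < n);
        universe.insert(e.guardGen.begin(), e.guardGen.end());
    }

    std::vector<Facts> in(n, universe), out(n, universe);  // optimistic start
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t b = 0; b < n; ++b) {
            Facts newIn;  // a block with no predecessors knows nothing
            bool first = true;
            for (const auto& e : edges) {
                if (e.to != static_cast<int>(b)) continue;
                Facts incoming = out[e.from];
                incoming.insert(e.guardGen.begin(), e.guardGen.end());
                newIn = first ? incoming : intersect(newIn, incoming);  // meet = intersection
                first = false;
            }
            Facts newOut = blocks[b].killsAll ? Facts{} : newIn;
            newOut.insert(blocks[b].gen.begin(), blocks[b].gen.end());
            if (newIn != in[b] || newOut != out[b]) {
                in[b] = std::move(newIn);
                out[b] = std::move(newOut);
                changed = true;
            }
        }
    }
    return in;
}
```

Starting every non-entry block at "all facts" and only ever shrinking the sets is what makes this a must analysis: a fact survives a join only if every incoming edge supplies it, so an `if` without a matching `else` guard loses it, while code after an early `return` guard keeps it.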
Pipeline: the CFG is built ONCE at the post-infer stable point and
shared by two consumers - the unsafe-index pass (read-only: only sets
flags) runs first, then the flow-sensitive escape pass reads the same
CFG before it inserts scope_free. noBoundCheck is a runtime-semantic
property, so it survives the later fold loop and the scope_free
re-infer. Each consumer is gated independently (bound_check_elision /
force_partial_escape_free); the CFG is built only if at least one
consumer is enabled.
CFG change (additive): a for-loop body CfgBlock records its ExprFor
(loopSource) so induction facts survive the flattened cond-less header.
Non-truncating numeric casts (uint/int/int64/uint64) around length() or
the index are seen through; truncating casts are not.
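A toy sketch of seeing through non-truncating casts when matching `length(x)` or the index variable. The `Expr` tree and the helpers `mkVar`, `mkLength`, `mkCast`, `seeThrough`, `isLengthOf` and `isIndexVar` are hypothetical and are not daScript's expression classes; the set of accepted cast targets simply mirrors the list in the commit message.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical toy expression tree; NOT the daScript Expression hierarchy.
struct Expr {
    enum Kind { Var, Length, Cast };
    Kind kind;
    std::string name;               // Var: variable; Length: array; Cast: target type
    std::shared_ptr<Expr> operand;  // only used by Cast
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr mkVar(const std::string& v) {
    return std::make_shared<Expr>(Expr{Expr::Var, v, nullptr});
}
ExprPtr mkLength(const std::string& arr) {
    return std::make_shared<Expr>(Expr{Expr::Length, arr, nullptr});
}
ExprPtr mkCast(const std::string& type, ExprPtr e) {
    return std::make_shared<Expr>(Expr{Expr::Cast, type, std::move(e)});
}

// Target types treated as non-truncating (the list from the commit message).
bool isNonTruncating(const std::string& type) {
    static const std::set<std::string> kSafe{"int", "uint", "int64", "uint64"};
    return kSafe.count(type) != 0;
}

// Peel non-truncating casts; stop at the first truncating one (e.g. int8, uint16).
const Expr* seeThrough(const Expr* e) {
    assert(e != nullptr);
    while (e->kind == Expr::Cast && isNonTruncating(e->name)) {
        e = e->operand.get();
    }
    return e;
}

// Does `bound` mean length(arr), possibly behind non-truncating casts?
bool isLengthOf(const Expr& bound, const std::string& arr) {
    const Expr* b = seeThrough(&bound);
    return b->kind == Expr::Length && b->name == arr;
}

// Is `index` the variable `var`, possibly behind non-truncating casts?
bool isIndexVar(const Expr& index, const std::string& var) {
    const Expr* i = seeThrough(&index);
    return i->kind == Expr::Var && i->name == var;
}
```

Stopping at the first truncating cast is the conservative choice: a cast such as `uint8(length(a))` can wrap, so after it the value no longer equals `length(a)` and the fact must not match.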
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…skip ci]

Without this, marking an access noBoundCheck emitted a generic SimNode_AtU / ArrayAtU that the fusion optimizer does not recognize, so the access fell off the fused ArgLoc/LocLoc fast path: a net regression (the lost fusion cost far more than the removed bounds-check branch).

Generate fused unchecked node families (AtU / AtR2VU / ArrayAtU / ArrayAtR2VU) alongside the checked ones by parameterizing the bounds check as a toggled macro (DAS_AT_CHECK / DAS_ARRAYAT_CHECK): one node-body macro, instantiated ON for the checked op-name and OFF for the unchecked one, with no duplicated struct bodies. The elided access now fuses to e.g. ArrayAtUArgLoc.

Logging: `options log_bound_check_elision` reports every elided access (function, source location, access, reason).

Benchmark: benchmarks/micro/bound_check_elision.das, index-heavy loops in plain functions (dynamic range(length) read/write + sum, fixed-array const range). Release, interp, elided vs checked (approximate ns/op):
- array_rw 3.8 -> 3.4
- array_sum 2.0 -> 1.8
- fixed_rw 3.5 -> 3.3
That is ~6-11%.

The pass targets function-body loops; accesses nested in a block/lambda argument stay checked.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
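The "one node body, bounds check toggled by a macro" technique can be sketched in a few lines of C++. `AT_CHECK_ON`, `AT_CHECK_OFF`, `DEFINE_AT_NODE`, `AtArgLoc` and `AtUArgLoc` are hypothetical names; the real `DAS_AT_CHECK` / `DAS_ARRAYAT_CHECK` macros and the fusion operand specializations are structured differently and are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Two interchangeable check macros with the same parameter list.
#define AT_CHECK_ON(idx, size) \
    if ((idx) >= (size)) throw std::out_of_range("index out of range");
#define AT_CHECK_OFF(idx, size) (void)(size); /* proven in range: no compare, no branch */

// One node-body macro; CHECK is the name of one of the macros above,
// so both node variants share a single body.
#define DEFINE_AT_NODE(NAME, CHECK)                                  \
    struct NAME {                                                    \
        static int32_t eval(const int32_t* data, uint32_t size,      \
                            uint32_t idx) {                          \
            CHECK(idx, size)                                         \
            return data[idx];                                        \
        }                                                            \
    };

DEFINE_AT_NODE(AtArgLoc, AT_CHECK_ON)    // checked op-name
DEFINE_AT_NODE(AtUArgLoc, AT_CHECK_OFF)  // unchecked op-name, identical body
```

Passing the check macro's name as an argument works because a function-like macro name that is not followed by `(` is left unexpanded during argument substitution; it only expands when the body's `CHECK(idx, size)` is rescanned. Any future change to the body therefore applies to both variants at once.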
What
Bound-check elision for the interpreter, mirroring the JIT's `[hint(unsafe_range_check)]`, but per-access and auto-deduced. Opt-in via `options bound_check_elision` (default off). WIP / draft, pushed with `[skip ci]`.

When the compiler can prove an array/vector index is in range, the access is marked `noBoundCheck` and lowered to an unchecked simulate node, dropping the per-access `idx >= size` compare + branch.

How
Unchecked simulate nodes: `SimNode_AtU` / `SimNode_AtR2VU` (fixed arrays, `simulate_nodes.h`), `SimNode_AtVectorU` (vectors), `SimNode_ArrayAtU` / `SimNode_ArrayAtR2VU` (dynamic arrays, `runtime_array.h`). Pointers are already unchecked. A `noBoundCheck:1` flag on `ExprAt` (serialized, cloned) selects them at `simulate`.

Fused unchecked nodes: a generic unchecked node would fall off the interpreter's fusion fast path (the `ArgLoc`/`LocLoc`/… specializations), which costs more than the removed check. So the fusion generators emit unchecked families (`AtU`/`ArrayAtU`/…) alongside the checked ones by toggling the bounds check as a macro (`DAS_AT_CHECK`/`DAS_ARRAYAT_CHECK`): one node-body macro, instantiated check-ON for the checked op-name and check-OFF for the unchecked one. An elided access fuses to e.g. `ArrayAtUArgLoc`.

Fact analysis (CFG dataflow): a forward must analysis over each function's CFG. A fact is `0 <= idx < BOUND` (`BOUND` = a constant or `length(arrayVar)`) plus a set of proven-nonnegative index vars. Facts are:
- genned by loop induction at the for-loop body block (`range(N)`, `range(length(x))`) and by branch guards on the taken edge (`if (i < length(a)) …`, `if (i >= length(a)) return; …`);
- merged by intersection at CFG joins: an `else` establishing `¬cond`, or code dominated by an early-exit guard, carries the fact;
- killed by any array mutation (`resize`/`erase`/`push`/rebind; we assume everything may alias everything, so any length-changing op invalidates every `length(·)` fact) and by reassigning the index. Element writes (`a[i]=v`) don't change length and are excused.

A constant index into a fixed dim is marked directly. Non-truncating numeric casts (`uint`/`int`/`int64`/`uint64`) around `length()` or the index are seen through.

Pipeline: the CFG is a single shared pass, built once at the post-infer stable point (only if a consumer is enabled) and handed to two consumers as a `const` pointer. The unsafe-index pass (read-only, only sets flags) runs first, then the flow-sensitive escape pass reads the same CFG before it inserts `scope_free`. `noBoundCheck` is a runtime-semantic property, so it survives the later fold loop and re-infer. (`CfgBlock` gained an additive `loopSource` anchor so induction survives the flattened cond-less loop header.)

Logging: `options log_bound_check_elision` reports every elided access (function, source location, access, reason).

Benchmark
`benchmarks/micro/bound_check_elision.das`: index-heavy loops in plain functions (dynamic `range(length)` read/write + sum, fixed-array constant range). Release, interpreter, ns/op (median of 5):

~6–11% on tight index loops. The bounds check is a small slice of per-element interpreter cost, so the win is modest but consistent and, importantly, not a regression (the earlier generic-unchecked-node version regressed ~70% by losing fusion, which the fused variants fix).
How much does it catch (corpus statistics)
Measured over 696 modules (`daslib` + `tests` + `examples` + `tutorials`, each compiled once) by forcing the pass on and counting candidate accesses vs elided:

Bimodal, not uniform: most array-indexing files elide nothing (they use `unsafe`, iterators, or computed indices the analysis can't prove), but a tail of ~25 files with tight `range(length)` / fixed-dim loops are 75–100% elidable.

14% is a conservative floor, chiefly because accesses inside block/lambda arguments (`foreach`/`run`/comprehension bodies, very common in daslib) are not analysed: the CFG is a function's CFG, and facts can't soundly cross into a deferred lambda body. Running the dataflow per block/lambda scope (future work) would raise this.

Limitations / notes
- … `*U` fusion added for those); the flag is ignored by the AOT C++ emitter.
- Accesses nested in a block/lambda argument (e.g. a `run`/`foreach` block) are left checked; the analysis walks a function's own CFG.
- Elision also requires `idx >= 0`, satisfied by an unsigned index or a lower-bound fact. A bare signed `x < len` won't elide, because a negative `x` also satisfies `x < len` and would slip past the check. Loop induction carries `>= 0` for free.

🤖 Generated with Claude Code