
interp: bound-check elision for provably in-range array/vector access#3327

Draft
aleksisch wants to merge 3 commits into master from aleksisch/bound-check


@aleksisch aleksisch commented Jul 1, 2026

Collaborator

What

Bound-check elision for the interpreter, mirroring the JIT's [hint(unsafe_range_check)] — but per-access and auto-deduced. Opt-in via options bound_check_elision (default off). WIP / draft, pushed with [skip ci].

When the compiler can prove an array/vector index is in range, the access is marked noBoundCheck and lowered to an unchecked simulate node, dropping the per-access idx >= size compare+branch.

How

Unchecked simulate nodes: SimNode_AtU / SimNode_AtR2VU (fixed arrays, simulate_nodes.h), SimNode_AtVectorU (vectors), SimNode_ArrayAtU / SimNode_ArrayAtR2VU (dynamic arrays, runtime_array.h). Pointers are already unchecked. A noBoundCheck:1 flag on ExprAt (serialized, cloned) selects them at simulate.

Fused unchecked nodes — a generic unchecked node would fall off the interpreter's fusion fast path (the ArgLoc/LocLoc/… specializations), which costs more than the removed check. So the fusion generators emit unchecked families (AtU/ArrayAtU/…) alongside the checked ones by toggling the bounds check as a macro (DAS_AT_CHECK / DAS_ARRAYAT_CHECK) — one node-body macro, instantiated check-ON for the checked op-name and check-OFF for the unchecked one. An elided access fuses to e.g. ArrayAtUArgLoc.
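The macro-toggle idea can be illustrated with a minimal C++ sketch. This is not the PR's code: `DEMO_AT_NODE`, `DEMO_CHECK_ON`, `DEMO_CHECK_OFF` and the two generated functions are hypothetical names chosen for illustration, and the real nodes are SimNode structs rather than free functions. The point it shows is one body, instantiated twice, with the bounds check either present or compiled out.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a node-body macro (not the PR's DAS_AT_CHECK).
// NAME is the generated function; CHECK is the bounds-check statement or a no-op.
#define DEMO_AT_NODE(NAME, CHECK)                                           \
    inline int32_t NAME(const int32_t *data, uint32_t size, uint32_t idx) { \
        CHECK;                                                              \
        return data[idx];                                                   \
    }

// Check ON: the per-access idx >= size compare and branch.
#define DEMO_CHECK_ON \
    if (idx >= size) throw std::out_of_range("index out of range")

// Check OFF: no compare, no branch (the cast only silences an unused warning).
#define DEMO_CHECK_OFF (void)size

// Same body, two instantiations: no duplicated function bodies.
DEMO_AT_NODE(demo_at_checked, DEMO_CHECK_ON)
DEMO_AT_NODE(demo_at_unchecked, DEMO_CHECK_OFF)
```

Calling `demo_at_checked(a, 3, 3)` on a three-element array throws, while `demo_at_unchecked` is only safe when the caller has already proven the index in range, which is the role the fact analysis plays in this PR.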

Fact analysis (CFG dataflow) — a forward must analysis over each function's CFG. A fact is 0 <= idx < BOUND (BOUND = a constant or length(arrayVar)) plus a set of proven-nonnegative index vars. Facts are:

  • genned by loop induction at the for-body block (range(N), range(length(x))) and by branch guards on the taken edge (if (i < length(a)) …, if (i >= length(a)) return; …);
  • merged by intersection at CFG joins — so an else establishing ¬cond, or code dominated by an early-exit guard, carries the fact;
  • killed by any length-changing array operation (resize/erase/push/rebind): we assume everything may alias everything, so such an op on any array invalidates every length(·) fact. Reassigning the index also kills its facts. Element writes (a[i]=v) don't change length, so they kill nothing.

A constant index into a fixed dim is marked directly. Non-truncating numeric casts (uint/int/int64/uint64) around length() or the index are seen through.
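The gen/merge/kill rules above can be sketched in C++ as operations on a set of facts. The names here (`Fact`, `FactSet`, `mergeFacts`, `killOnArrayMutation`, `killOnIndexAssign`) are hypothetical and the representation is simplified to strings; the PR's actual data structures are not shown in this description. The sketch assumes, as the first commit message states, that constant-bound facts survive array mutation while symbolic length(·) facts do not.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <tuple>

// Hypothetical fact: 0 <= index < bound. The bound is kept as text for
// simplicity: a constant such as "16", or a symbolic "length(a)".
struct Fact {
    std::string index;
    std::string bound;
    bool symbolic;  // true when the bound is length(someArray)
    bool operator<(const Fact &o) const {
        return std::tie(index, bound, symbolic) <
               std::tie(o.index, o.bound, o.symbolic);
    }
};

using FactSet = std::set<Fact>;

// Merge at a CFG join: a "must" fact has to hold on every incoming edge,
// so the join keeps only the intersection.
inline FactSet mergeFacts(const FactSet &a, const FactSet &b) {
    FactSet out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.begin()));
    return out;
}

// Kill on a length-changing op: with everything assumed to alias everything,
// every symbolic length(.) fact is dropped; constant-bound facts remain.
inline FactSet killOnArrayMutation(const FactSet &in) {
    FactSet out;
    for (const Fact &f : in) {
        if (!f.symbolic) out.insert(f);
    }
    return out;
}

// Kill on reassigning an index variable: drop every fact about that variable.
inline FactSet killOnIndexAssign(const FactSet &in, const std::string &var) {
    FactSet out;
    for (const Fact &f : in) {
        if (f.index != var) out.insert(f);
    }
    return out;
}
```

Intersection is what makes the analysis conservative: a fact established on only one branch of an if disappears at the join, so an access after the join stays checked unless every path proves it.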

Pipeline — the CFG is a single shared pass: built once at the post-infer stable point (only if a consumer is enabled) and handed to two consumers as a const pointer — the unsafe-index pass (read-only, only sets flags) runs first, then the flow-sensitive escape pass reads the same CFG before it inserts scope_free. noBoundCheck is a runtime-semantic property, so it survives the later fold loop and re-infer. (CfgBlock gained an additive loopSource anchor so induction survives the flattened cond-less loop header.)
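A rough C++ sketch of the ordering described above, under stated assumptions: `DemoCfg`, `DemoOptions` and the `demo*` functions are hypothetical stand-ins (the commit messages name the real entry points as ProgramCfg / buildProgramCfg), and the passes here only record that they ran. It shows the CFG being built once, only when a consumer is enabled, and passed to both consumers as a const pointer in a fixed order.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the shared per-function CFG cache.
struct DemoCfg {
    int functionCount = 0;
};

struct DemoOptions {
    bool boundCheckElision = false;       // mirrors options bound_check_elision
    bool forcePartialEscapeFree = false;  // mirrors force_partial_escape_free
};

// Both consumers take the CFG by const pointer: neither may modify it.
inline void demoMarkInRangeAccesses(const DemoCfg *, std::vector<std::string> &trace) {
    trace.push_back("elision");  // read-only pass: would only set flags
}

inline void demoEscapeAnalysis(const DemoCfg *, std::vector<std::string> &trace) {
    trace.push_back("escape");   // would later insert scope_free
}

// Returns the sequence of steps that ran, for illustration.
inline std::vector<std::string> demoRunPostInfer(const DemoOptions &opt) {
    std::vector<std::string> trace;
    // No consumer enabled: the CFG is never built.
    if (!opt.boundCheckElision && !opt.forcePartialEscapeFree) return trace;

    DemoCfg cfg;  // built once at the post-infer stable point
    trace.push_back("build_cfg");
    const DemoCfg *shared = &cfg;

    // The flag-setting pass runs first, then the escape pass reads the same CFG.
    if (opt.boundCheckElision) demoMarkInRangeAccesses(shared, trace);
    if (opt.forcePartialEscapeFree) demoEscapeAnalysis(shared, trace);
    return trace;
}
```

With both options on, the trace is `build_cfg`, `elision`, `escape`; with both off it is empty.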

Logging: options log_bound_check_elision reports every elided access (function, source location, access, reason).

Benchmark

benchmarks/micro/bound_check_elision.das — index-heavy loops in plain functions (dynamic range(length) read/write + sum, fixed-array constant range). Release, interpreter, ns/op (median of 5):

| bench | checked (fused), ns/op | elided (fused), ns/op |
| --- | --- | --- |
| array_rw/100000 | 3.8 | 3.4 |
| array_sum/100000 | 2.0 | 1.8 |
| fixed_rw/256 | 3.5 | 3.3 |

~6–11% on tight index loops. The bounds check is a small slice of per-element interpreter cost, so the win is modest but consistent — and, importantly, not a regression (the earlier generic-unchecked-node version regressed ~70% by losing fusion, which the fused variants fix).

How much does it catch (corpus statistics)

Measured over 696 modules (daslib + tests + examples + tutorials, each compiled once) by forcing the pass on and counting candidate accesses vs elided:

  • 5,500 array/vector index accesses carry a runtime bounds check ("candidates").
  • 763 (≈14%) are provably in range and get elided.

Bimodal, not uniform — most array-indexing files elide nothing (they use unsafe, iterators, or computed indices the analysis can't prove), but a tail of ~25 files with tight range(length) / fixed-dim loops are 75–100% elidable.

14% is a conservative floor, chiefly because accesses inside block/lambda arguments (foreach/run/comprehension bodies — very common in daslib) are not analysed: the CFG is a function's CFG, and facts can't soundly cross into a deferred lambda body. Running the dataflow per block/lambda scope (future work) would raise this.

Limitations / notes

  • Interpreter only: AOT keeps full checks (the AOT C++ emitter ignores the flag), and so does the interpreter's fusion of I64/U64-indexed arrays (no *U fused variants were added for those).
  • Function-body loops — accesses nested inside a block/lambda argument (e.g. the body of a run/foreach block) are left checked; the analysis walks a function's own CFG.
  • Guard facts need idx >= 0, which is satisfied by an unsigned index or by a lower-bound fact. A bare signed x < len won't elide, because a negative x passes that guard yet is out of range. Loop induction carries >= 0 for free.
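The last limitation is easy to see in a small C++ sketch. The function names are hypothetical and the code is plain integer arithmetic, independent of the interpreter: a signed upper-bound test alone accepts negative indices, whereas adding a lower bound, or comparing as unsigned, rejects them.

```cpp
#include <cstdint>

// The guard as written on a signed index: only the upper bound is tested.
inline bool upperBoundOnly(int32_t x, int32_t len) {
    return x < len;
}

// What an in-range access actually requires: both bounds.
inline bool inRange(int32_t x, int32_t len) {
    return x >= 0 && x < len;
}

// A single unsigned compare covers both bounds, assuming len >= 0:
// a negative x converts to a value above any valid length.
inline bool inRangeUnsigned(int32_t x, int32_t len) {
    return static_cast<uint32_t>(x) < static_cast<uint32_t>(len);
}
```

For x = -1 and len = 4, `upperBoundOnly` returns true while `inRange` and `inRangeUnsigned` return false, which is why the analysis refuses to elide on a signed upper-bound guard without a separate non-negativity fact.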

🤖 Generated with Claude Code

@aleksisch aleksisch force-pushed the aleksisch/bound-check branch 2 times, most recently from 4f4714f to dde1ed5 Compare July 1, 2026 10:20
Add unchecked At/ArrayAt/AtVector simulate node variants and a
noBoundCheck flag on ExprAt. A conservative optimizer pass, enabled by
options bound_check_elision, marks accesses whose index is provably in
range, mirroring the JIT unsafe_range_check hint for the interpreter:

  - constant index into a fixed-size array or vector
  - induction var over a constant range, into a fixed-size array/vector
  - induction var over range(length(x)), indexing that same dynamic
    array x (symbolic length fact)

Facts carry either a constant [lo,hi) interval or a symbolic upper
bound length(var). A constant-index fact is immune to array mutation
(fixed dims never change). A symbolic length(x) fact is killed by ANY
length-changing array op (resize/erase/push/move/rebind) on ANY array
in the loop body: we assume everything may alias everything, so aliased
mutation can never invalidate an elided access. Element writes (x[i]=v)
do not change length and are excused. The index var must not be
reassigned.

Also make the CFG a shared pass: ProgramCfg / buildProgramCfg build a
per-function CFG cache once, handed to consumers as a const pointer.
Escape analysis's flow-sensitive pass now reads the shared cache
instead of building its own CFG per function; each consumer is gated
independently (force_partial_escape_free / bound_check_elision).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@aleksisch aleksisch force-pushed the aleksisch/bound-check branch from dde1ed5 to 5f645a4 Compare July 1, 2026 12:36
aleksisch and others added 2 commits July 1, 2026 16:30
Replace the AST-visitor fact stack with a forward "must" dataflow over
each function's CFG.

A fact is 0 <= idx < BOUND (BOUND = a constant or length(arrayVar)),
plus a set of provably-nonnegative index vars. Facts are:
  - genned by loop induction at the for-loop body block (range(N) /
    range(length(x))), and by branch guards on the taken edge
    (if (i < length(a)) ... , if (i >= length(a)) return; ...),
  - merged by INTERSECTION at CFG joins (a fact must hold on every
    incoming edge - so an else-branch that establishes ~cond, or code
    dominated by an early-exit guard, carries the guard fact),
  - killed by any array mutation (any resize/rebind - everything may
    alias everything) and by reassigning the index.
An ExprAt whose (index, array/dim) matches a live fact is marked
noBoundCheck; a constant index into a fixed dim is marked directly.

Pipeline: the CFG is built ONCE at the post-infer stable point and
shared by two consumers - the unsafe-index pass (read-only: only sets
flags) runs first, then the flow-sensitive escape pass reads the same
CFG before it inserts scope_free. noBoundCheck is a runtime-semantic
property, so it survives the later fold loop and the scope_free
re-infer. Gated by bound_check_elision / force_partial_escape_free;
the CFG is built only if a consumer is enabled.

CFG change (additive): a for-loop body CfgBlock records its ExprFor
(loopSource) so induction facts survive the flattened cond-less header.
Non-truncating numeric casts (uint/int/int64/uint64) around length() or
the index are seen through; truncating casts are not.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…skip ci]

Without this, marking an access noBoundCheck emitted a generic SimNode_AtU /
ArrayAtU that the fusion optimizer does not recognize, so the access fell off
the fused ArgLoc/LocLoc fast path - a net regression (the lost fusion cost far
more than the removed bounds-check branch). Generate fused unchecked node
families (AtU / AtR2VU / ArrayAtU / ArrayAtR2VU) alongside the checked ones by
parameterizing the bounds check as a toggled macro (DAS_AT_CHECK /
DAS_ARRAYAT_CHECK) - one node-body macro, instantiated ON for the checked
op-name and OFF for the unchecked one, no duplicated struct bodies. The elided
access now fuses to e.g. ArrayAtUArgLoc.

Logging: `options log_bound_check_elision` reports every elided access
(function, source location, access, reason).

Benchmark: benchmarks/micro/bound_check_elision.das - index-heavy loops in plain
functions (dynamic range(length) r/w + sum, fixed-array const range). Release,
interp: elided vs checked ~ array_rw 3.8->3.4, array_sum 2.0->1.8,
fixed_rw 3.5->3.3 ns/op (~6-11%). The pass targets function-body loops; accesses
nested in a block/lambda argument stay checked.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>