perf(felt252): compute felt252_mul via a 2-CIOS runtime binding #1647
✅ Code is now correctly formatted.
Benchmarking results

Benchmark for program

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 10.711 ± 0.126 | 10.564 | 10.959 | 5.64 ± 0.12 |
| cairo-native (embedded AOT) | 1.932 ± 0.021 | 1.909 | 1.980 | 1.02 ± 0.02 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 1.899 ± 0.035 | 1.843 | 1.945 | 1.00 |

Benchmark for program dict_snapshot

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 570.5 ± 9.6 | 559.2 | 585.9 | 1.00 |
| cairo-native (embedded AOT) | 1730.8 ± 51.4 | 1635.5 | 1829.6 | 3.03 ± 0.10 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 1716.4 ± 49.6 | 1667.5 | 1821.2 | 3.01 ± 0.10 |

Benchmark for program factorial_2M

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 4.631 ± 0.058 | 4.561 | 4.772 | 2.60 ± 0.06 |
| cairo-native (embedded AOT) | 1.782 ± 0.031 | 1.730 | 1.832 | 1.00 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 1.856 ± 0.022 | 1.815 | 1.888 | 1.04 ± 0.02 |

Benchmark for program fib_2M

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 4.617 ± 0.133 | 4.477 | 4.913 | 2.70 ± 0.10 |
| cairo-native (embedded AOT) | 1.732 ± 0.049 | 1.654 | 1.817 | 1.01 ± 0.04 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 1.707 ± 0.042 | 1.670 | 1.788 | 1.00 |

Benchmark for program linear_search

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 603.6 ± 8.2 | 593.6 | 615.0 | 1.00 |
| cairo-native (embedded AOT) | 1736.8 ± 11.3 | 1722.0 | 1759.5 | 2.88 ± 0.04 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 1746.5 ± 51.2 | 1683.1 | 1851.2 | 2.89 ± 0.09 |

Benchmark for program logistic_map

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 510.0 ± 24.0 | 490.1 | 573.0 | 1.00 |
| cairo-native (embedded AOT) | 1668.1 ± 23.7 | 1648.4 | 1720.4 | 3.27 ± 0.16 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 1729.7 ± 35.9 | 1689.5 | 1793.1 | 3.39 ± 0.17 |

Benchmark results Main vs HEAD: [Base vs. Head comparison charts]
e126ffd to 8e70357
orizi left a comment
@orizi reviewed all commit messages and made 1 comment.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on TomerStarkware).
src/runtime.rs line 101 at r1 (raw file):
/// Montgomery multiplication.
pub extern "C" fn cairo_native__felt252_mul(dst: &mut [u8; 32], lhs: &[u8; 32], rhs: &[u8; 32]) {
    // value = a * R⁻¹ (free: reinterpret canonical bytes as raw limbs).
comment on all lines with raw vs montgomery representation.
abc0202 to 6cc4024
TomerStarkware left a comment
@TomerStarkware made 1 comment.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on orizi).
src/runtime.rs line 101 at r1 (raw file):
Previously, orizi wrote…
comment on all lines with raw vs montgomery representation.
Done.
orizi left a comment
@orizi reviewed 1 file, made 1 comment, and resolved 1 discussion.
Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on TomerStarkware).
src/runtime.rs line 112 at r2 (raw file):
let product = lhs * rhs;
// read the raw bytes (= a · b), the canonical product.
*dst = felt_raw_to_le_bytes(&product);
impl doc - only on impl.
use programming notation.
Code quote:
let lhs = felt_from_raw_le_bytes(lhs);
// value = b raw bytes = b · R (1 Montgomery mul)
let rhs = Felt::from_bytes_le(rhs);
// value = a · b · R⁻¹ raw bytes = a · b (1 Montgomery mul)
let product = lhs * rhs;
// read the raw bytes (= a · b), the canonical product.
*dst = felt_raw_to_le_bytes(&product);
6cc4024 to f876fb2
TomerStarkware left a comment
@TomerStarkware made 1 comment.
Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on orizi).
src/runtime.rs line 112 at r2 (raw file):
Previously, orizi wrote…
impl doc - only on impl.
use programming notation.
Done.
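For readers following this thread, the "raw" reinterpretation discussed above can be pictured as a plain byte-to-limb regrouping with no arithmetic. The sketch below is only an illustration: `le_bytes_to_limbs` and `limbs_to_le_bytes` are hypothetical names, not the PR's `felt_from_raw_le_bytes` / `felt_raw_to_le_bytes`, and the limb order (least significant limb first) is an assumption that may not match the internal layout of the actual `Felt` type.

```rust
/// Hypothetical helper: regroup 32 little-endian bytes into four u64 limbs,
/// least significant limb first (assumed order, for illustration only).
fn le_bytes_to_limbs(bytes: &[u8; 32]) -> [u64; 4] {
    let mut limbs = [0u64; 4];
    for (i, chunk) in bytes.chunks_exact(8).enumerate() {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        limbs[i] = u64::from_le_bytes(buf);
    }
    limbs
}

/// Hypothetical inverse: write four u64 limbs back out as 32 little-endian bytes.
fn limbs_to_le_bytes(limbs: &[u64; 4]) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for (i, limb) in limbs.iter().enumerate() {
        bytes[i * 8..(i + 1) * 8].copy_from_slice(&limb.to_le_bytes());
    }
    bytes
}
```

Because neither direction performs a Montgomery multiplication, this is the step the quoted comments describe as "free".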
orizi left a comment
@orizi reviewed 2 files and all commit messages, and resolved 1 discussion.
Reviewable status: complete! All files reviewed, all discussions resolved (waiting on TomerStarkware).
Route felt252 multiplication through a runtime binding instead of the inline i512 multiply-and-reduce. The inline lowering legalized into huge limb sequences that made O3 register allocation of multiplication-heavy contracts pathologically slow (e.g. the Falcon account contract: 655s -> 173s). The binding computes the product with a 2-CIOS canonical multiply, which is also ~1.77x faster than the naive 4-CIOS form.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
orizi left a comment
@orizi made 1 comment.
Reviewable status: complete! All files reviewed, all discussions resolved (waiting on TomerStarkware).
f876fb2 to 1cb3979
What
Route felt252 multiplication through a runtime binding (cairo_native__felt252_mul) instead of lowering it inline as an i512 multiply-and-reduce.

Why
The inline lowering legalizes into very large limb-operation sequences. For multiplication-heavy contracts this makes O3 register allocation pathologically slow: the wide (252/512-bit) live values blow up the greedy allocator's interference graph on large functions.
Concretely, on the Falcon post-quantum account contract (a polynomial/NTT-heavy mega-function), O3 compilation dropped from ~655s to ~173s.
The binding computes the product with a 2-CIOS canonical multiply (interpreting the canonical little-endian operand as raw Montgomery limbs for one side), which also makes the multiply itself ~1.77× faster than the naive 4-CIOS form. Both the speedup and correctness are verified by a microbench and a field-product equivalence test in runtime.rs.
felt252 stays an inline i256 everywhere else, so add/sub/is_zero and runtime performance for all other felt operations are unchanged.

Changes
- src/runtime.rs: cairo_native__felt252_mul (2-CIOS) + felt_from_raw_le_bytes / felt_raw_to_le_bytes helpers + equivalence/microbench tests
- src/metadata/runtime_bindings.rs: Felt252Mul binding (symbol, fn-ptr, MLIR emitter, setup_runtime registration)
- src/libfuncs/felt252.rs: Mul branch routed to the binding instead of the inline i512 path

🤖 Generated with Claude Code
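
The arithmetic behind the "2 vs 4" count in the Why section can be sketched on a toy field. This is not the PR's code: it assumes a single 64-bit limb with the Mersenne prime 2^61 - 1 as modulus (not the 252-bit Stark prime), so the multi-limb CIOS loop degenerates to one plain Montgomery reduction, and every function name is hypothetical. It also assumes that the "naive 4-CIOS form" means converting both operands into Montgomery form, multiplying, and converting back. Treating the canonical `a` as if it were already in Montgomery form gives it the value a·R⁻¹, and multiplying by `b` in Montgomery form (b·R) yields raw limbs equal to a·b, which is the canonical product.

```rust
/// Toy modulus (assumption): the Mersenne prime 2^61 - 1, so one u64 limb suffices.
const P: u64 = (1u64 << 61) - 1;

/// -P^{-1} mod 2^64 via Newton iteration; each step doubles the correct low bits.
fn neg_p_inv() -> u64 {
    let mut inv: u64 = 1; // correct modulo 2 because P is odd
    for _ in 0..6 {
        inv = inv.wrapping_mul(2u64.wrapping_sub(P.wrapping_mul(inv)));
    }
    inv.wrapping_neg()
}

/// R^2 mod P with R = 2^64, used to convert a canonical value into Montgomery form.
fn r2_mod_p() -> u64 {
    let r = (1u128 << 64) % (P as u128);
    ((r * r) % (P as u128)) as u64
}

/// One Montgomery multiplication: returns x * y * R^{-1} mod P for x, y < P.
fn mont_mul(x: u64, y: u64) -> u64 {
    let t = (x as u128) * (y as u128);
    let m = (t as u64).wrapping_mul(neg_p_inv());
    // t + m*P is divisible by 2^64 and stays below 2^128 because P < 2^63.
    let u = ((t + (m as u128) * (P as u128)) >> 64) as u64;
    if u >= P { u - P } else { u }
}

/// Naive form: 4 Montgomery multiplications (two conversions in, multiply, convert out).
fn mul_naive(a: u64, b: u64) -> u64 {
    let r2 = r2_mod_p();
    let a_mont = mont_mul(a, r2); // a * R
    let b_mont = mont_mul(b, r2); // b * R
    let prod_mont = mont_mul(a_mont, b_mont); // a * b * R
    mont_mul(prod_mont, 1) // a * b
}

/// Two-step form: reinterpret canonical `a` as raw limbs, convert only `b`.
fn mul_two_step(a: u64, b: u64) -> u64 {
    let b_mont = mont_mul(b, r2_mod_p()); // b * R          (1 Montgomery mul)
    mont_mul(a, b_mont) // a * (b * R) * R^{-1} = a * b      (1 Montgomery mul)
}
```

Both functions return the same canonical product for inputs below `P`; the two-step form simply skips the conversion of `a` into Montgomery form and the conversion of the result back out.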