perf(felt252): compute felt252_mul via a 2-CIOS runtime binding by TomerStarkware · Pull Request #1647 · starkware-libs/cairo_native

TomerStarkware · 2026-06-29T11:59:25Z

What

Route felt252 multiplication through a runtime binding (cairo_native__felt252_mul) instead of lowering it inline as an i512 multiply-and-reduce.

Why

The inline lowering legalizes into very large limb-operation sequences. For multiplication-heavy contracts this makes O3 register allocation pathologically slow — the wide (252/512-bit) live values blow up the greedy allocator's interference graph on large functions.

Concretely, on the Falcon post-quantum account contract (a polynomial/NTT-heavy mega-function), O3 compilation time dropped from ~655s to ~173s, roughly a 3.8× reduction.

The binding computes the product with a 2-CIOS canonical multiply (interpreting the canonical little-endian operand as raw Montgomery limbs for one side), which also makes the multiply itself ~1.77× faster than the naive 4-CIOS form — verified by a microbench and a field-product equivalence test in runtime.rs.
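
To illustrate why reinterpreting one canonical operand as raw Montgomery limbs saves two reductions, here is a minimal sketch on a toy field. It is not the code in runtime.rs: the 61-bit modulus `P`, the single-limb reduction `mont_mul` (a stand-in for the 4-limb CIOS loop), and the names `mul_naive` and `mul_two_step` are all assumptions made for illustration. With R = 2^64, a Montgomery multiply returns x·y·R⁻¹, so multiplying raw `a` by `b·R` yields the canonical product directly.

```rust
/// Toy modulus (assumption): the Mersenne prime 2^61 - 1, not the 252-bit Stark prime.
const P: u64 = (1 << 61) - 1;

/// -P^{-1} mod 2^64 via Newton iteration; each step doubles the correct low bits.
const fn neg_p_inv() -> u64 {
    let mut inv: u64 = 1;
    let mut i = 0;
    while i < 6 {
        inv = inv.wrapping_mul(2u64.wrapping_sub(P.wrapping_mul(inv)));
        i += 1;
    }
    inv.wrapping_neg()
}
const NEG_P_INV: u64 = neg_p_inv();

/// R^2 mod P with R = 2^64, used to convert a canonical value x into x·R.
const fn r2() -> u64 {
    let r = (1u128 << 64) % (P as u128);
    ((r * r) % (P as u128)) as u64
}
const R2: u64 = r2();

/// Montgomery multiply: returns x·y·R^{-1} mod P for inputs x, y < P.
pub fn mont_mul(x: u64, y: u64) -> u64 {
    let t = (x as u128) * (y as u128);
    // Choose m so that t + m·P is divisible by R = 2^64.
    let m = (t as u64).wrapping_mul(NEG_P_INV);
    let u = ((t + (m as u128) * (P as u128)) >> 64) as u64;
    if u >= P { u - P } else { u }
}

/// Naive form: four Montgomery multiplies (two conversions in, one product, one conversion out).
pub fn mul_naive(a: u64, b: u64) -> u64 {
    let a_m = mont_mul(a, R2); // a·R
    let b_m = mont_mul(b, R2); // b·R
    let ab_m = mont_mul(a_m, b_m); // a·b·R
    mont_mul(ab_m, 1) // a·b
}

/// Two-multiply form: treat canonical `a` as raw limbs (value a·R^{-1}) for free.
pub fn mul_two_step(a: u64, b: u64) -> u64 {
    let b_m = mont_mul(b, R2); // b·R
    mont_mul(a, b_m) // a · (b·R) · R^{-1} = a·b
}
```

Both functions return the same canonical product; `mul_two_step` simply skips converting `a` into Montgomery form and converting the result back out.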

felt252 stays an inline i256 everywhere else, so add/sub/is_zero and runtime performance for all other felt operations are unchanged.

Changes

src/runtime.rs — cairo_native__felt252_mul (2-CIOS) + felt_from_raw_le_bytes/felt_raw_to_le_bytes helpers + equivalence/microbench tests
src/metadata/runtime_bindings.rs — Felt252Mul binding (symbol, fn-ptr, MLIR emitter, setup_runtime registration)
src/libfuncs/felt252.rs — Mul branch routed to the binding instead of the inline i512 path

🤖 Generated with Claude Code

github-actions · 2026-06-29T11:59:54Z

✅ Code is now correctly formatted.

github-actions · 2026-06-29T12:33:19Z

Benchmarking results

Benchmark for program `dict_insert`

Command	Mean [s]	Min [s]	Max [s]	Relative
`Cairo-vm (Rust, Cairo 1)`	10.711 ± 0.126	10.564	10.959	5.64 ± 0.12
`cairo-native (embedded AOT)`	1.932 ± 0.021	1.909	1.980	1.02 ± 0.02
`cairo-native (embedded JIT using LLVM's ORC Engine)`	1.899 ± 0.035	1.843	1.945	1.00

Benchmark for program `dict_snapshot`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`Cairo-vm (Rust, Cairo 1)`	570.5 ± 9.6	559.2	585.9	1.00
`cairo-native (embedded AOT)`	1730.8 ± 51.4	1635.5	1829.6	3.03 ± 0.10
`cairo-native (embedded JIT using LLVM's ORC Engine)`	1716.4 ± 49.6	1667.5	1821.2	3.01 ± 0.10

Benchmark for program `factorial_2M`

Command	Mean [s]	Min [s]	Max [s]	Relative
`Cairo-vm (Rust, Cairo 1)`	4.631 ± 0.058	4.561	4.772	2.60 ± 0.06
`cairo-native (embedded AOT)`	1.782 ± 0.031	1.730	1.832	1.00
`cairo-native (embedded JIT using LLVM's ORC Engine)`	1.856 ± 0.022	1.815	1.888	1.04 ± 0.02

Benchmark for program `fib_2M`

Command	Mean [s]	Min [s]	Max [s]	Relative
`Cairo-vm (Rust, Cairo 1)`	4.617 ± 0.133	4.477	4.913	2.70 ± 0.10
`cairo-native (embedded AOT)`	1.732 ± 0.049	1.654	1.817	1.01 ± 0.04
`cairo-native (embedded JIT using LLVM's ORC Engine)`	1.707 ± 0.042	1.670	1.788	1.00

Benchmark for program `linear_search`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`Cairo-vm (Rust, Cairo 1)`	603.6 ± 8.2	593.6	615.0	1.00
`cairo-native (embedded AOT)`	1736.8 ± 11.3	1722.0	1759.5	2.88 ± 0.04
`cairo-native (embedded JIT using LLVM's ORC Engine)`	1746.5 ± 51.2	1683.1	1851.2	2.89 ± 0.09

Benchmark for program `logistic_map`

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`Cairo-vm (Rust, Cairo 1)`	510.0 ± 24.0	490.1	573.0	1.00
`cairo-native (embedded AOT)`	1668.1 ± 23.7	1648.4	1720.4	3.27 ± 0.16
`cairo-native (embedded JIT using LLVM's ORC Engine)`	1729.7 ± 35.9	1689.5	1793.1	3.39 ± 0.17

github-actions · 2026-06-29T12:35:18Z

Benchmark results Main vs HEAD.

Base

Command	Mean [s]	Min [s]	Max [s]	Relative
`base dict_insert.cairo (JIT)`	1.819 ± 0.011	1.803	1.837	1.02 ± 0.01
`base dict_insert.cairo (AOT)`	1.784 ± 0.010	1.761	1.797	1.00

Head

Command	Mean [s]	Min [s]	Max [s]	Relative
`head dict_insert.cairo (JIT)`	1.870 ± 0.008	1.858	1.882	1.01 ± 0.01
`head dict_insert.cairo (AOT)`	1.848 ± 0.012	1.831	1.865	1.00

Base

Command	Mean [s]	Min [s]	Max [s]	Relative
`base dict_snapshot.cairo (JIT)`	1.641 ± 0.006	1.631	1.652	1.02 ± 0.01
`base dict_snapshot.cairo (AOT)`	1.616 ± 0.013	1.600	1.634	1.00

Head

Command	Mean [s]	Min [s]	Max [s]	Relative
`head dict_snapshot.cairo (JIT)`	1.670 ± 0.009	1.657	1.683	1.01 ± 0.01
`head dict_snapshot.cairo (AOT)`	1.652 ± 0.007	1.644	1.668	1.00

Base

Command	Mean [s]	Min [s]	Max [s]	Relative
`base factorial_2M.cairo (JIT)`	2.100 ± 0.030	2.071	2.171	1.00 ± 0.02
`base factorial_2M.cairo (AOT)`	2.098 ± 0.024	2.070	2.132	1.00

Head

Command	Mean [s]	Min [s]	Max [s]	Relative
`head factorial_2M.cairo (JIT)`	1.734 ± 0.009	1.721	1.752	1.00 ± 0.02
`head factorial_2M.cairo (AOT)`	1.732 ± 0.028	1.702	1.787	1.00

Base

Command	Mean [s]	Min [s]	Max [s]	Relative
`base fib_2M.cairo (JIT)`	1.663 ± 0.013	1.644	1.682	1.01 ± 0.01
`base fib_2M.cairo (AOT)`	1.646 ± 0.015	1.616	1.666	1.00

Head

Command	Mean [s]	Min [s]	Max [s]	Relative
`head fib_2M.cairo (JIT)`	1.680 ± 0.008	1.666	1.690	1.02 ± 0.01
`head fib_2M.cairo (AOT)`	1.655 ± 0.012	1.642	1.681	1.00

Base

Command	Mean [s]	Min [s]	Max [s]	Relative
`base linear_search.cairo (JIT)`	1.735 ± 0.054	1.670	1.833	1.03 ± 0.04
`base linear_search.cairo (AOT)`	1.685 ± 0.029	1.657	1.754	1.00

Head

Command	Mean [s]	Min [s]	Max [s]	Relative
`head linear_search.cairo (JIT)`	1.703 ± 0.007	1.692	1.714	1.02 ± 0.01
`head linear_search.cairo (AOT)`	1.671 ± 0.008	1.660	1.684	1.00

Base

Command	Mean [s]	Min [s]	Max [s]	Relative
`base logistic_map.cairo (JIT)`	1.937 ± 0.021	1.908	1.973	1.07 ± 0.02
`base logistic_map.cairo (AOT)`	1.813 ± 0.030	1.781	1.864	1.00

Head

Command	Mean [s]	Min [s]	Max [s]	Relative
`head logistic_map.cairo (JIT)`	1.693 ± 0.025	1.673	1.762	1.02 ± 0.02
`head logistic_map.cairo (AOT)`	1.661 ± 0.014	1.645	1.689	1.00
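
For readers comparing the Base and Head tables, a small sketch like the following computes the relative change of the AOT means. The numbers are copied from the tables above; the helper names `percent_change` and `report` are hypothetical. These are single CI runs, so small differences may be noise; the larger shifts are factorial_2M (about 17% faster on head) and logistic_map (about 8% faster on head).

```rust
/// (program, base AOT mean [s], head AOT mean [s]), copied from the Main vs HEAD tables.
const AOT_MEANS: [(&str, f64, f64); 6] = [
    ("dict_insert", 1.784, 1.848),
    ("dict_snapshot", 1.616, 1.652),
    ("factorial_2M", 2.098, 1.732),
    ("fib_2M", 1.646, 1.655),
    ("linear_search", 1.685, 1.671),
    ("logistic_map", 1.813, 1.661),
];

/// Relative change of head vs base in percent; negative means head is faster.
pub fn percent_change(base: f64, head: f64) -> f64 {
    (head - base) / base * 100.0
}

/// One formatted line per benchmark program.
pub fn report() -> Vec<String> {
    AOT_MEANS
        .iter()
        .map(|&(name, base, head)| format!("{name}: {:+.1}%", percent_change(base, head)))
        .collect()
}
```

Printing each line of `report()` gives a quick per-program summary without rereading the twelve tables.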

orizi

@orizi reviewed all commit messages and made 1 comment.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on TomerStarkware).

src/runtime.rs line 101 at r1 (raw file):

/// Montgomery multiplication.
pub extern "C" fn cairo_native__felt252_mul(dst: &mut [u8; 32], lhs: &[u8; 32], rhs: &[u8; 32]) {
    // value = a * R⁻¹ (free: reinterpret canonical bytes as raw limbs).

Comment on all lines with the raw vs. Montgomery representation.

TomerStarkware

@TomerStarkware made 1 comment.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on orizi).

src/runtime.rs line 101 at r1 (raw file):

Previously, orizi wrote…

Comment on all lines with the raw vs. Montgomery representation.

Done.

orizi

@orizi reviewed 1 file, made 1 comment, and resolved 1 discussion.
Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on TomerStarkware).

src/runtime.rs line 112 at r2 (raw file):

    let product = lhs * rhs;
    // read the raw bytes (= a · b), the canonical product.
    *dst = felt_raw_to_le_bytes(&product);

impl doc - only on impl.

use programming notation.

Code quote:

    let lhs = felt_from_raw_le_bytes(lhs);
    // value = b          raw bytes = b · R          (1 Montgomery mul)
    let rhs = Felt::from_bytes_le(rhs);
    // value = a · b · R⁻¹   raw bytes = a · b        (1 Montgomery mul)
    let product = lhs * rhs;
    // read the raw bytes (= a · b), the canonical product.
    *dst = felt_raw_to_le_bytes(&product);

TomerStarkware

@TomerStarkware made 1 comment.
Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on orizi).

src/runtime.rs line 112 at r2 (raw file):

Previously, orizi wrote…

impl doc - only on impl.

use programming notation.

Done.

orizi

@orizi reviewed 2 files and all commit messages, and resolved 1 discussion.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on TomerStarkware).

Route felt252 multiplication through a runtime binding instead of the inline i512 multiply-and-reduce. The inline lowering legalized into huge limb sequences that made O3 register allocation of multiplication-heavy contracts pathologically slow (e.g. the Falcon account contract: 655s -> 173s). The binding computes the product with a 2-CIOS canonical multiply, which is also ~1.77x faster than the naive 4-CIOS form.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

orizi

@orizi made 1 comment.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on TomerStarkware).

TomerStarkware force-pushed the tomer/felt252_mul_pr branch 2 times, most recently from e126ffd to 8e70357 on June 29, 2026 13:54

TomerStarkware requested a review from orizi June 29, 2026 14:21

TomerStarkware mentioned this pull request Jun 30, 2026

perf(felt252): compute felt252_div via a runtime binding #1648

Merged

orizi requested changes Jun 30, 2026

TomerStarkware force-pushed the tomer/felt252_mul_pr branch from abc0202 to 6cc4024 on June 30, 2026 12:05

TomerStarkware commented Jun 30, 2026

orizi requested changes Jun 30, 2026

TomerStarkware force-pushed the tomer/felt252_mul_pr branch from 6cc4024 to f876fb2 on June 30, 2026 13:08

TomerStarkware commented Jun 30, 2026

orizi approved these changes Jun 30, 2026

TomerStarkware force-pushed the tomer/felt252_mul_pr branch from f876fb2 to 1cb3979 on June 30, 2026 13:13

TomerStarkware added this pull request to the merge queue Jun 30, 2026

Merged via the queue into main with commit 289c67c Jun 30, 2026
15 checks passed

TomerStarkware deleted the tomer/felt252_mul_pr branch June 30, 2026 14:55

TomerStarkware merged 1 commit into main from tomer/felt252_mul_pr
