perf(felt252): compute felt252_mul via a 2-CIOS runtime binding #1647

Merged: TomerStarkware merged 1 commit into main from tomer/felt252_mul_pr on Jun 30, 2026
Conversation

@TomerStarkware commented Jun 29, 2026 (Collaborator)

What

Route felt252 multiplication through a runtime binding (cairo_native__felt252_mul) instead of lowering it inline as an i512 multiply-and-reduce.

Why

The inline lowering legalizes into very large limb-operation sequences. For multiplication-heavy contracts this makes O3 register allocation pathologically slow — the wide (252/512-bit) live values blow up the greedy allocator's interference graph on large functions.

Concretely, on the Falcon post-quantum account contract (a polynomial/NTT-heavy mega-function), O3 compilation dropped from ~655s to ~173s.

The binding computes the product with a 2-CIOS canonical multiply (interpreting the canonical little-endian operand as raw Montgomery limbs for one side), which also makes the multiply itself ~1.77× faster than the naive 4-CIOS form — verified by a microbench and a field-product equivalence test in runtime.rs.
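The saving can be illustrated on a toy field. The sketch below is not the code from runtime.rs: it assumes a single 64-bit limb, R = 2^64, and the stand-in modulus 2^61 - 1 instead of the felt252 prime, and every name in it (`mont_mul`, `canonical_mul_2`, `canonical_mul_4`) is hypothetical. It shows why reading one canonical operand as if it were already raw Montgomery limbs leaves only two Montgomery products instead of four.

```rust
/// Toy modulus: the Mersenne prime 2^61 - 1 (a stand-in, not the felt252 prime).
const P: u64 = (1 << 61) - 1;

/// -P^{-1} mod 2^64, computed by Newton iteration (each step doubles the valid bits).
const fn neg_inv(p: u64) -> u64 {
    let mut inv: u64 = 1;
    let mut i = 0;
    while i < 6 {
        inv = inv.wrapping_mul(2u64.wrapping_sub(p.wrapping_mul(inv)));
        i += 1;
    }
    inv.wrapping_neg()
}

const NEG_INV: u64 = neg_inv(P);

/// R^2 mod P with R = 2^64, used to move a canonical value into Montgomery form.
const R2: u64 = {
    let r = (1u128 << 64) % (P as u128);
    ((r * r) % (P as u128)) as u64
};

/// Montgomery product: returns x * y * R^{-1} mod P for x, y < P.
fn mont_mul(x: u64, y: u64) -> u64 {
    let t = (x as u128) * (y as u128);
    // m is chosen so that the low 64 bits of t + m * P are zero.
    let m = (t as u64).wrapping_mul(NEG_INV);
    let u = ((t + (m as u128) * (P as u128)) >> 64) as u64;
    if u >= P { u - P } else { u }
}

/// Two Montgomery products: `a` is used as-is (its raw limbs are read as
/// the Montgomery form of a * R^{-1}); only `b` is converted.
fn canonical_mul_2(a: u64, b: u64) -> u64 {
    let b_mont = mont_mul(b, R2); // raw = b * R
    mont_mul(a, b_mont) // raw = a * (b * R) * R^{-1} = a * b
}

/// Four Montgomery products: convert both operands in, multiply, convert out.
fn canonical_mul_4(a: u64, b: u64) -> u64 {
    let a_mont = mont_mul(a, R2); // raw = a * R
    let b_mont = mont_mul(b, R2); // raw = b * R
    let prod_mont = mont_mul(a_mont, b_mont); // raw = a * b * R
    mont_mul(prod_mont, 1) // raw = a * b
}
```

Both functions expect canonical inputs (values below `P`) and return the canonical product, so they can be checked against a plain `u128` multiply-and-reduce. That is the same kind of equivalence check the description mentions for the real four-limb implementation.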

felt252 stays an inline i256 everywhere else, so add/sub/is_zero and runtime performance for all other felt operations are unchanged.

Changes

  • src/runtime.rs: cairo_native__felt252_mul (2-CIOS) + felt_from_raw_le_bytes/felt_raw_to_le_bytes helpers + equivalence/microbench tests
  • src/metadata/runtime_bindings.rs: Felt252Mul binding (symbol, fn-ptr, MLIR emitter, setup_runtime registration)
  • src/libfuncs/felt252.rs: Mul branch routed to the binding instead of the inline i512 path
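For readers unfamiliar with the runtime-binding pattern, here is a minimal sketch of its general shape: an `extern "C"` function that writes its result through an out-pointer, registered under a symbol name and called through a function pointer. It is not the cairo-native implementation. The registry, the symbol `toy__felt_mul`, the helper `call_mul`, and the toy modulus are all assumptions made for illustration; only the `(dst, lhs, rhs)` signature shape over 32-byte little-endian buffers is taken from the snippet quoted in the review.

```rust
use std::collections::HashMap;

/// Signature shape used in this sketch: out-pointer first, operands by reference.
type BinOpFn = extern "C" fn(dst: &mut [u8; 32], lhs: &[u8; 32], rhs: &[u8; 32]);

/// Toy modulus (2^61 - 1); the real binding reduces modulo the felt252 prime.
const TOY_P: u128 = (1 << 61) - 1;

/// Reads the low 8 bytes of a 32-byte little-endian buffer.
fn low_u64(bytes: &[u8; 32]) -> u64 {
    let mut low = [0u8; 8];
    low.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(low)
}

/// Hypothetical stand-in binding: multiplies the low 8 bytes of each operand
/// modulo TOY_P and writes the little-endian result into `dst`.
extern "C" fn toy_felt_mul(dst: &mut [u8; 32], lhs: &[u8; 32], rhs: &[u8; 32]) {
    let a = low_u64(lhs) as u128 % TOY_P;
    let b = low_u64(rhs) as u128 % TOY_P;
    let product = (a * b % TOY_P) as u64;
    *dst = [0u8; 32];
    dst[..8].copy_from_slice(&product.to_le_bytes());
}

/// Hypothetical registry mapping symbol names to function pointers.
fn register_bindings() -> HashMap<&'static str, BinOpFn> {
    let mut symbols = HashMap::new();
    symbols.insert("toy__felt_mul", toy_felt_mul as BinOpFn);
    symbols
}

/// Looks up a binding by symbol and calls it the way generated code would:
/// operands and result are passed as 32-byte buffers.
fn call_mul(
    symbols: &HashMap<&'static str, BinOpFn>,
    name: &str,
    a: u64,
    b: u64,
) -> Option<u64> {
    let f = *symbols.get(name)?;
    let (mut lhs, mut rhs, mut dst) = ([0u8; 32], [0u8; 32], [0u8; 32]);
    lhs[..8].copy_from_slice(&a.to_le_bytes());
    rhs[..8].copy_from_slice(&b.to_le_bytes());
    f(&mut dst, &lhs, &rhs);
    Some(low_u64(&dst))
}
```

The design point the sketch tries to capture is that the compiled code only sees an opaque call taking three pointers, so the wide multiply-and-reduce stays out of the function being optimized.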

🤖 Generated with Claude Code



github-actions (bot) commented Jun 29, 2026

✅ Code is now correctly formatted.

github-actions (bot) commented Jun 29, 2026

Benchmarking results

Benchmark for program dict_insert

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 10.711 ± 0.126 | 10.564 | 10.959 | 5.64 ± 0.12 |
| cairo-native (embedded AOT) | 1.932 ± 0.021 | 1.909 | 1.980 | 1.02 ± 0.02 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 1.899 ± 0.035 | 1.843 | 1.945 | 1.00 |

Benchmark for program dict_snapshot

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 570.5 ± 9.6 | 559.2 | 585.9 | 1.00 |
| cairo-native (embedded AOT) | 1730.8 ± 51.4 | 1635.5 | 1829.6 | 3.03 ± 0.10 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 1716.4 ± 49.6 | 1667.5 | 1821.2 | 3.01 ± 0.10 |

Benchmark for program factorial_2M

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 4.631 ± 0.058 | 4.561 | 4.772 | 2.60 ± 0.06 |
| cairo-native (embedded AOT) | 1.782 ± 0.031 | 1.730 | 1.832 | 1.00 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 1.856 ± 0.022 | 1.815 | 1.888 | 1.04 ± 0.02 |

Benchmark for program fib_2M

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 4.617 ± 0.133 | 4.477 | 4.913 | 2.70 ± 0.10 |
| cairo-native (embedded AOT) | 1.732 ± 0.049 | 1.654 | 1.817 | 1.01 ± 0.04 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 1.707 ± 0.042 | 1.670 | 1.788 | 1.00 |

Benchmark for program linear_search

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 603.6 ± 8.2 | 593.6 | 615.0 | 1.00 |
| cairo-native (embedded AOT) | 1736.8 ± 11.3 | 1722.0 | 1759.5 | 2.88 ± 0.04 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 1746.5 ± 51.2 | 1683.1 | 1851.2 | 2.89 ± 0.09 |

Benchmark for program logistic_map

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 510.0 ± 24.0 | 490.1 | 573.0 | 1.00 |
| cairo-native (embedded AOT) | 1668.1 ± 23.7 | 1648.4 | 1720.4 | 3.27 ± 0.16 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 1729.7 ± 35.9 | 1689.5 | 1793.1 | 3.39 ± 0.17 |

github-actions (bot) commented Jun 29, 2026

Benchmark results Main vs HEAD.

Base

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base dict_insert.cairo (JIT) | 1.819 ± 0.011 | 1.803 | 1.837 | 1.02 ± 0.01 |
| base dict_insert.cairo (AOT) | 1.784 ± 0.010 | 1.761 | 1.797 | 1.00 |

Head

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| head dict_insert.cairo (JIT) | 1.870 ± 0.008 | 1.858 | 1.882 | 1.01 ± 0.01 |
| head dict_insert.cairo (AOT) | 1.848 ± 0.012 | 1.831 | 1.865 | 1.00 |

Base

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base dict_snapshot.cairo (JIT) | 1.641 ± 0.006 | 1.631 | 1.652 | 1.02 ± 0.01 |
| base dict_snapshot.cairo (AOT) | 1.616 ± 0.013 | 1.600 | 1.634 | 1.00 |

Head

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| head dict_snapshot.cairo (JIT) | 1.670 ± 0.009 | 1.657 | 1.683 | 1.01 ± 0.01 |
| head dict_snapshot.cairo (AOT) | 1.652 ± 0.007 | 1.644 | 1.668 | 1.00 |

Base

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base factorial_2M.cairo (JIT) | 2.100 ± 0.030 | 2.071 | 2.171 | 1.00 ± 0.02 |
| base factorial_2M.cairo (AOT) | 2.098 ± 0.024 | 2.070 | 2.132 | 1.00 |

Head

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| head factorial_2M.cairo (JIT) | 1.734 ± 0.009 | 1.721 | 1.752 | 1.00 ± 0.02 |
| head factorial_2M.cairo (AOT) | 1.732 ± 0.028 | 1.702 | 1.787 | 1.00 |

Base

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base fib_2M.cairo (JIT) | 1.663 ± 0.013 | 1.644 | 1.682 | 1.01 ± 0.01 |
| base fib_2M.cairo (AOT) | 1.646 ± 0.015 | 1.616 | 1.666 | 1.00 |

Head

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| head fib_2M.cairo (JIT) | 1.680 ± 0.008 | 1.666 | 1.690 | 1.02 ± 0.01 |
| head fib_2M.cairo (AOT) | 1.655 ± 0.012 | 1.642 | 1.681 | 1.00 |

Base

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base linear_search.cairo (JIT) | 1.735 ± 0.054 | 1.670 | 1.833 | 1.03 ± 0.04 |
| base linear_search.cairo (AOT) | 1.685 ± 0.029 | 1.657 | 1.754 | 1.00 |

Head

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| head linear_search.cairo (JIT) | 1.703 ± 0.007 | 1.692 | 1.714 | 1.02 ± 0.01 |
| head linear_search.cairo (AOT) | 1.671 ± 0.008 | 1.660 | 1.684 | 1.00 |

Base

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base logistic_map.cairo (JIT) | 1.937 ± 0.021 | 1.908 | 1.973 | 1.07 ± 0.02 |
| base logistic_map.cairo (AOT) | 1.813 ± 0.030 | 1.781 | 1.864 | 1.00 |

Head

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| head logistic_map.cairo (JIT) | 1.693 ± 0.025 | 1.673 | 1.762 | 1.02 ± 0.02 |
| head logistic_map.cairo (AOT) | 1.661 ± 0.014 | 1.645 | 1.689 | 1.00 |

@orizi left a comment (Collaborator)

@orizi reviewed all commit messages and made 1 comment.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on TomerStarkware).


src/runtime.rs line 101 at r1 (raw file):

/// Montgomery multiplication.
pub extern "C" fn cairo_native__felt252_mul(dst: &mut [u8; 32], lhs: &[u8; 32], rhs: &[u8; 32]) {
    // value = a * R⁻¹ (free: reinterpret canonical bytes as raw limbs).

comment on all lines with raw vs montgomery representation.

@TomerStarkware left a comment (Collaborator, Author)

@TomerStarkware made 1 comment.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on orizi).


src/runtime.rs line 101 at r1 (raw file):

Previously, orizi wrote…

comment on all lines with raw vs montgomery representation.

Done.

@orizi left a comment (Collaborator)

@orizi reviewed 1 file, made 1 comment, and resolved 1 discussion.
Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on TomerStarkware).


src/runtime.rs line 112 at r2 (raw file):

    let product = lhs * rhs;
    // read the raw bytes (= a · b), the canonical product.
    *dst = felt_raw_to_le_bytes(&product);

impl doc - only on impl.

use programming notation.

Code quote:

    let lhs = felt_from_raw_le_bytes(lhs);
    // value = b          raw bytes = b · R          (1 Montgomery mul)
    let rhs = Felt::from_bytes_le(rhs);
    // value = a · b · R⁻¹   raw bytes = a · b        (1 Montgomery mul)
    let product = lhs * rhs;
    // read the raw bytes (= a · b), the canonical product.
    *dst = felt_raw_to_le_bytes(&product);

@TomerStarkware left a comment (Collaborator, Author)

@TomerStarkware made 1 comment.
Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on orizi).


src/runtime.rs line 112 at r2 (raw file):

Previously, orizi wrote…

impl doc - only on impl.

use programming notation.

Done.

@orizi left a comment (Collaborator)

@orizi reviewed 2 files and all commit messages, and resolved 1 discussion.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on TomerStarkware).

Route felt252 multiplication through a runtime binding instead of the
inline i512 multiply-and-reduce. The inline lowering legalized into huge
limb sequences that made O3 register allocation of multiplication-heavy
contracts pathologically slow (e.g. the Falcon account contract: 655s ->
173s). The binding computes the product with a 2-CIOS canonical multiply,
which is also ~1.77x faster than the naive 4-CIOS form.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@orizi left a comment (Collaborator)

:lgtm:

@orizi made 1 comment.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on TomerStarkware).

@TomerStarkware TomerStarkware added this pull request to the merge queue Jun 30, 2026
Merged via the queue into main with commit 289c67c Jun 30, 2026
15 checks passed
@TomerStarkware TomerStarkware deleted the tomer/felt252_mul_pr branch June 30, 2026 14:55