Reduce CKKS rotation-key footprint via single-giant-step Horner BSGS by pascoec · Pull Request #1209 · openfheorg/openfhe-development

pascoec · 2026-06-24T20:44:02Z

Summary

The baby-step/giant-step (BSGS) linear transforms used throughout CKKS -- the
bootstrapping homomorphic FFT (CoeffsToSlots / SlotsToCoeffs) and the CKKS<->FHEW
scheme-switching transforms -- generate one rotation (automorphism) key per giant
step. These keys dominate the memory footprint of a bootstrapping / scheme-
switching context. This PR reformulates the giant-step accumulation into Horner
form, which needs only a single giant-step key per level instead of (b-1) distinct
keys, then drops the now-unneeded keys from the matching key generation. The
underlying math is unchanged; results are equivalent.

What changed

- Horner giant-step accumulation: sum_i Aut_{i*t}(block_i) becomes
  block_0 + Aut_t(block_1 + Aut_t(...)), using one giant-step key (t = g*scale)
  instead of the b-1 distinct keys {t, 2t, ..., (b-1)t}. Applied to every BSGS
  transform in both subsystems.
- Zero-based inner indices: the j=0 baby term maps to rotation 0 (handled by
  KeySwitchExt, no key); the per-level offset is pushed offline into the
  precomputed plaintexts.
- Folded per-level corrections: the corrections are folded into a single
  accumulated rotation applied once at the end of SlotsToCoeffs, replacing the
  previous O(levels) EvalAtIndex calls. The correction is split so that
  CoeffsToSlots still emits correctly-ordered slots (fixes FBT_CONSECLEV).
- Sparse-packing index reduction: rotation indices are reduced modulo
  min(2*slots, M/4), keeping automorphisms consistent with the period-(2*slots)
  plaintext pre-rotations (no-op for full and half packing).
- Key-gen reduction: FindCoeffsToSlotsRotationIndices,
  FindSlotsToCoeffsRotationIndices, FindLinearTransformRotationIndices,
  FindLTRotationIndicesSwitch, and FindLTRotationIndicesSwitchArgmin no longer
  emit the now-unneeded giant-step keys.
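The Horner rewrite in the list above can be checked on plain vectors. The sketch below is a plaintext model only, not OpenFHE code: `rot`, `blocks`, `forward`, and `horner` are hypothetical helper names, list rotation stands in for a homomorphic automorphism, and the sizes (16 slots, baby-step count g = 4) are made up for illustration. It relies on rotations being additive and composing (rotating by g twice equals rotating by 2g), which is what lets the nested form reuse one index.

```python
# Plaintext model of a BSGS matrix-vector product over n cyclic "slots".
# rot() stands in for a homomorphic rotation; every nonzero index passed
# with a key set would need its own rotation key.

def rot(v, k, keys=None):
    """Rotate v left by k slots; record k as a required key if keys is given."""
    n = len(v)
    k %= n
    if keys is not None and k != 0:
        keys.add(k)
    return v[k:] + v[:k]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def mul(a, b):
    return [x * y for x, y in zip(a, b)]

def blocks(M, x, g, keys):
    """Inner (baby-step) sums with zero-based indices j = 0..g-1."""
    n = len(x)
    b = n // g
    # Generalized diagonals: diag[k][r] = M[r][(r + k) % n].
    diag = [[M[r][(r + k) % n] for r in range(n)] for k in range(n)]
    baby = [rot(x, j, keys) for j in range(g)]  # j = 0 needs no key
    out = []
    for i in range(b):
        acc = [0] * n
        for j in range(g):
            # Offline plaintext pre-rotation by -i*g (no key involved).
            pt = rot(diag[i * g + j], -i * g)
            acc = add(acc, mul(pt, baby[j]))
        out.append(acc)
    return out

def forward(M, x, g):
    """Forward outer sum: sum_i rot(block_i, i*g)."""
    keys = set()
    blk = blocks(M, x, g, keys)
    y = blk[0]
    for i in range(1, len(blk)):
        y = add(y, rot(blk[i], i * g, keys))  # a distinct index per giant step
    return y, keys

def horner(M, x, g):
    """Nested form: block_0 + rot(block_1 + rot(...), g)."""
    keys = set()
    blk = blocks(M, x, g, keys)
    acc = blk[-1]
    for i in range(len(blk) - 2, -1, -1):
        acc = add(blk[i], rot(acc, g, keys))  # always the same index g
    return acc, keys

n, g = 16, 4
M = [[(3 * r + 5 * c + r * c) % 7 - 3 for c in range(n)] for r in range(n)]
x = list(range(1, n + 1))
direct = [sum(M[r][c] * x[c] for c in range(n)) for r in range(n)]

y_fwd, keys_fwd = forward(M, x, g)
y_hor, keys_hor = horner(M, x, g)
print(y_fwd == direct, y_hor == direct)
print(sorted(keys_fwd), sorted(keys_hor))
```

In this toy setting both forms perform three giant-step rotations, but the forward form touches the giant-step indices 4, 8, and 12 while the nested form only ever rotates by 4.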

Results

Rotation-key count, dev vs. this PR, CKKS bootstrapping (ring 2^16, UNIFORM,
budget {3,3}):

packing	keys.dev	keys.new	reduction
full	54	49	1.10x
1/4	88	63	1.40x
1/16	81	57	1.42x
1/64	50	29	1.72x

The reduction grows with sparsity (more giant steps relative to the work).
Scheme switching uses the identical technique for the same per-transform saving.
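The reduction column appears to be the ratio keys.dev / keys.new. A short sketch that recomputes it from the table, with the counts copied from the rows above and the names `rows`, `reduction`, `ratios`, and `saved` chosen here for illustration:

```python
# Rotation-key counts copied from the table: (packing, keys.dev, keys.new).
rows = [("full", 54, 49), ("1/4", 88, 63), ("1/16", 81, 57), ("1/64", 50, 29)]

def reduction(dev, new):
    """Ratio of key counts before and after the change."""
    return dev / new

ratios = {packing: f"{reduction(dev, new):.2f}x" for packing, dev, new in rows}
saved = {packing: dev - new for packing, dev, new in rows}

for packing, dev, new in rows:
    print(packing, ratios[packing], f"({saved[packing]} fewer keys)")
```

Note that it is the ratio that increases with sparsity; the absolute number of keys removed (5, 25, 24, 21) does not grow monotonically.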

Not included

The SlotsToCoeffs decode-layer re-partition (moving the remainder group to scale 1
to mirror CoeffsToSlots) is intentionally left out -- the StC remainder still sits
at the largest-scale position, so the additional key sharing that enables is a
follow-up.

Reformulate the baby-step/giant-step (BSGS) evaluation of the CoeffsToSlots and
SlotsToCoeffs linear transforms so that the same slot permutations are realized
with far fewer distinct rotation (automorphism) keys. The keys these transforms
need dominate the memory cost of a bootstrapping context, and
EvalBootstrapKeyGen now generates a smaller, more heavily shared set. The
underlying math is unchanged.

Core changes to EvalCoeffsToSlots / EvalSlotsToCoeffs and their precompute:

- Horner single giant-step. Replace the forward outer sum
  sum_i Aut_{i*t}(block_i) with the nested Horner form
  block_0 + Aut_t(block_1 + Aut_t(...)). Both use the same rotation count, but
  Horner needs only one giant-step key per level (stride t = g*scale) instead
  of the b-1 distinct keys {t, 2t, ..., (b-1)t}.
- Zero-based hoisted inner rotations. Replace the centered inner baby-step
  indices {(j-offset)*sigma} with zero-based {j*sigma}. The j=0 term is now
  always rotation 0 and is handled directly by KeySwitchExt (no key), and the
  per-level offset delta_s = offset*scale is pushed offline into the
  precomputed plaintexts as a pre-rotation.
- Folded per-level corrections. The per-level zero-basing corrections (an
  O(levels) set of runtime EvalAtIndex calls in both transforms) are commuted
  forward and absorbed into the precomputed plaintexts, leaving a single
  accumulated rotation applied once at the end of SlotsToCoeffs. EvalMod is
  equivariant under slot rotations, so the CoeffsToSlots correction passes
  through it unchanged.
- Split, not combined, accumulated correction. Apply Aut_{-(slots-1)} at the
  end of CoeffsToSlots so it always outputs correctly-ordered slots, rather
  than deferring a single combined correction to the end of SlotsToCoeffs.
  This fixes EvalFBTNoDecoding + EvalHomDecoding (FBT_CONSECLEV), where a user
  operation between the two transforms previously saw a residual rotation.
- Sparse-packing index reduction. Reduce all BSGS rotation indices modulo
  min(2*slots, M/4). Under sparse packing the precomputed-plaintext vector has
  cyclic period 2*slots (the concatenated real/imaginary blocks), not M/4, so
  indices reduced only to [0, M/4) could otherwise be inconsistent with the
  period-2*slots plaintext pre-rotations. For full and half packing the
  modulus equals M/4 and behavior is unchanged.

The single-level linear transform (EvalLinearTransform, used when the level
budget is 1) is converted to the same single-giant-step Horner form, and
FindLinearTransformRotationIndices no longer emits the giant-step keys
{2g, 3g, ..., (h-1)g} that the forward form required.

Supporting cleanups (no behavior change):

- Inline the precomputed rot_in index tables in EvalCoeffsToSlots /
  EvalSlotsToCoeffs (compute per level, drop the 2D allocation and a redundant
  scale pass).
- Fix the over-large reserve() in the Find*RotationIndices helpers (they
  reserved ~M entries for a list of a few hundred).
- Hoist a redundant KeySwitchExt out of the EvalLinearTransform giant-step
  loop, and hoist repeated GetParams()/GetElementAtIndex() calls in
  ExtendCiphertext.
- Take crypto parameters by const reference and compute the bootstrap scale
  factor with std::ldexp.
- Add Doxygen for the transform functions and a note in CKKS_BOOTSTRAPPING.md.

Not included: the SlotsToCoeffs decode-layer re-partition that moves the
remainder group to scale 1 to mirror CoeffsToSlots. The StC remainder still
sits at the last (largest-scale) position, so the additional key sharing that
re-partition enables (e.g. dropping the StC-specific large-scale remainder key)
is not realized here.
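The sparse-packing index reduction described in the commit message can be illustrated with a small plaintext model. This is a sketch under assumptions, not the library's code: `reduce_index` and `rot` are hypothetical helpers, and the sizes (M = 64, slots = 2) are toy values. It shows that two indices congruent modulo 2*slots act identically on a vector of period 2*slots, even though they would be distinct indices if reduced only to [0, M/4).

```python
def reduce_index(k, slots, M):
    """Reduce a rotation index modulo min(2*slots, M/4), into [0, modulus)."""
    modulus = min(2 * slots, M // 4)
    return k % modulus

def rot(v, k):
    """Rotate v left by k positions (cyclically)."""
    k %= len(v)
    return v[k:] + v[:k]

# Toy sizes (hypothetical): M/4 = 16 entries in total, sparse packing, slots = 2.
M, slots = 64, 2
period = 2 * slots

# A plaintext vector with cyclic period 2*slots, repeated to fill M/4 entries.
pt = [10, 20, 30, 40] * ((M // 4) // period)

k = -3
full = k % (M // 4)                # index reduced only to [0, M/4)
small = reduce_index(k, slots, M)  # index reduced to [0, 2*slots)
same_action = rot(pt, full) == rot(pt, small)
print(full, small, same_action)

# Full packing (slots = M/4) and half packing (slots = M/8): modulus is M/4.
full_mod = min(2 * (M // 4), M // 4)
half_mod = min(2 * (M // 8), M // 4)
print(full_mod, half_mod)
```

Reducing to the smaller modulus picks one representative per equivalence class, which keeps the runtime indices aligned with pre-rotations computed on the period-2*slots vector.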


pascoec added 2 commits June 24, 2026 11:52

Apply single-giant-step Horner BSGS to CKKS/FHEW scheme-switching tra…

cd91d8b

…nsforms

pascoec added this to the Release 1.6.0 milestone Jun 24, 2026

pascoec self-assigned this Jun 24, 2026

pascoec added the cleanup (Code cleanup) and optimization (Improves performance) labels Jun 24, 2026

pascoec linked an issue Jun 24, 2026 that may be closed by this pull request

Implement CKKS bootstrapping linear transform optimizations from FIDESlib #1203

Open

pascoec requested a review from yspolyakov June 25, 2026 02:13

map-hoist + seed-peel of the Horner giant-step loop

2eb89dc

pascoec wants to merge 3 commits into dev from transform-optimizations
