perf: 2.5x faster string split on single-char separators #504
Merged
Benchmark Results (Linux x86-64)
CLI Tool Benchmarks
cs01 added a commit that referenced this pull request on Apr 13, 2026.
User-visible impact
`"text".split("\n")`, `.split(",")`, and `.split(" ")`, the most common split patterns, are now roughly 2.5x faster. Any ChadScript program that parses line-based or delimited data (log readers, CSV loaders, tokenizers, grep-style tools) gets the speedup for free.

Measured on the string search benchmark (recursively grep for `console.log` in `src/`), Chad now runs within ~20% of a hand-written C implementation of the same benchmark. Instruction count drops from 267M to 101M; cycles from 91M to 37M.
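For illustration, the three patterns named above in plain JavaScript; ChadScript's `split` semantics are assumed to match:

```javascript
// The single-char separator patterns the PR reports as ~2.5x faster.
// Plain JavaScript stand-in; ChadScript behavior is assumed to match.
const log = "GET /a\nGET /b\nGET /c";
const lines = log.split("\n");      // line-based parsing

const row = "id,name,score";
const fields = row.split(",");      // CSV-style parsing

const sentence = "split is now faster";
const tokens = sentence.split(" "); // whitespace tokenizing

console.log(lines.length, fields.length, tokens.length); // 3 3 4
```

Multi-character separators (e.g. `.split("\r\n")`) fall outside the fast path and keep their previous performance.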
Root cause
`cs_str_split` (the C bridge backing `String.prototype.split`) walked the source buffer one byte at a time and called `memcmp(src + pos, sep, 1)` on every byte when the separator was a single character. That's a PLT-stub call per byte, and `memcmp` with `n = 1` never reaches its SIMD fast path. For a typical 2-3 KB source file that's thousands of stub calls per split; profiling the old benchmark showed ~80% of total time in `cs_str_split` → `_platform_memcmp`.

Fix
Added a `sep_len == 1` fast path that uses `memchr` instead. `memchr` is a libc SIMD primitive that scans whole cache lines at once (NEON on arm64, AVX2 on glibc x86_64), so finding the next separator in a long string costs ~1 cycle per 16 bytes instead of a full function call per byte. Both passes of `cs_str_split` (counting parts, then filling them) take the fast path. Multi-char separators are unchanged.
Test plan
`npm run verify:quick`: all tests and stage 1 self-hosting pass.