perf: 2.5x faster string split on single-char separators #504

Merged
cs01 merged 1 commit into main from perf/string-search
Apr 13, 2026

@cs01 (Owner) commented Apr 13, 2026

User-visible impact

"text".split("\n"), .split(","), .split(" ") — the most common split patterns — are now roughly 2.5x faster. Any ChadScript program that parses line-based or delimited data (log readers, CSV loaders, tokenizers, grep-style tools) gets the speedup for free.

Measured on the string search benchmark (recursively grep console.log in src/):

| binary | before | after |
| --- | --- | --- |
| chad | ~21 ms | ~8.5 ms |
| C baseline | ~7 ms | ~7 ms |

Chad now runs within ~20% of a hand-written C implementation of the same benchmark. Instruction count drops from 267M to 101M; cycles from 91M to 37M.

Root cause

cs_str_split (the C bridge backing String.prototype.split) walked the source buffer one byte at a time and called memcmp(src + pos, sep, 1) on every byte when the separator was a single character. That's a PLT-stub call per byte, and memcmp with n=1 never reaches its SIMD fast path. For a typical 2-3 KB source file that's thousands of stub calls per split — profiling the old benchmark showed ~80% of total time in cs_str_split_platform_memcmp.
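
The PR text does not include the old source, so the following is only a minimal sketch of the pattern described above, not the actual cs_str_split code. The function name `count_parts_bytewise` and its signature are hypothetical. It shows the counting pass issuing one memcmp call per source byte, which for a one-byte separator means one library call per byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reconstruction of the slow pattern: the scan advances one
 * byte at a time and calls memcmp at every position. With sep_len == 1
 * each call compares a single byte, so the call overhead dominates. */
static size_t count_parts_bytewise(const char *src, size_t src_len,
                                   const char *sep, size_t sep_len) {
    size_t parts = 1; /* a string with no separator is still one part */
    size_t pos = 0;

    while (sep_len > 0 && pos + sep_len <= src_len) {
        if (memcmp(src + pos, sep, sep_len) == 0) {
            parts++;
            pos += sep_len; /* skip past the separator */
        } else {
            pos++;          /* no match: try the next byte */
        }
    }
    return parts;
}
```

Under these assumptions, a 3 KB input with a one-byte separator results in roughly 3,000 memcmp calls for the counting pass alone.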

Fix

Added a sep_len == 1 fast path that uses memchr instead. memchr is a libc routine that mainstream implementations vectorize (NEON on arm64, AVX2 on glibc x86_64), so it examines 16 or more bytes per step, and finding the next separator in a long string costs ~1 cycle per 16 bytes instead of a full function call per byte. Both passes of cs_str_split (counting parts, then filling them) take the fast path.

Multi-char separators are unchanged.
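
As an illustration of the fast path, here is a minimal sketch of a two-pass single-byte split built on memchr. It is not the code from this PR: the `split_result` type and the `split_single_char` and `split_free` names are hypothetical, and the real bridge's string representation and allocation strategy may well differ.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: an array of NUL-terminated copies. */
typedef struct {
    char **parts;
    size_t count;
} split_result;

static void split_free(split_result *r) {
    for (size_t i = 0; i < r->count; i++) free(r->parts[i]);
    free(r->parts);
    r->parts = NULL;
    r->count = 0;
}

/* Split src[0..len) on a single byte, using memchr in both passes. */
static split_result split_single_char(const char *src, size_t len, char sep) {
    split_result out = {NULL, 0};
    const char *end = src + len;

    /* Pass 1: count parts. Each memchr call jumps to the next separator. */
    size_t count = 1;
    const char *p = src;
    while (p < end) {
        const char *hit = memchr(p, (unsigned char)sep, (size_t)(end - p));
        if (!hit) break;
        count++;
        p = hit + 1;
    }

    out.parts = malloc(count * sizeof *out.parts);
    if (!out.parts) return out;

    /* Pass 2: fill parts, copying the bytes between separators. */
    p = src;
    for (;;) {
        const char *hit =
            (p < end) ? memchr(p, (unsigned char)sep, (size_t)(end - p)) : NULL;
        const char *stop = hit ? hit : end;
        size_t n = (size_t)(stop - p);

        char *piece = malloc(n + 1);
        if (!piece) { split_free(&out); return out; }
        memcpy(piece, p, n);
        piece[n] = '\0';
        out.parts[out.count++] = piece;

        if (!hit) break; /* the last part runs to the end of the input */
        p = hit + 1;
    }
    return out;
}
```

In this sketch the number of library calls scales with the number of separators rather than the number of bytes, which is the property the fast path relies on. Empty parts are preserved, so splitting "a,b,,c" on ',' yields four parts.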

Test plan

  • npm run verify:quick — all tests + stage 1 self-hosting pass
  • String search benchmark improves from ~21 ms to ~8.5 ms
  • String ops benchmark (also uses single-char split) runs clean at ~13 ms
  • CI green

@github-actions (Contributor)

Benchmark Results (Linux x86-64)

Benchmark C ChadScript Go Node Place
Binary Trees 1.589s 1.300s 2.720s 1.157s 🥈
Cold Start 0.9ms 0.8ms 1.2ms 28.7ms 🥇
Fibonacci 0.816s 0.765s 1.393s 3.164s 🥇
File I/O 0.117s 0.093s 0.086s 0.205s 🥈
JSON Parse/Stringify 0.004s 0.005s 0.018s 0.015s 🥈
Matrix Multiply 0.445s 0.732s 0.614s 0.364s #4
Monte Carlo Pi 0.389s 0.410s 0.405s 2.248s 🥉
N-Body Simulation 1.666s 2.122s 2.202s 2.400s 🥈
Quicksort 0.216s 0.247s 0.213s 0.262s 🥉
SQLite 0.355s 0.406s 0.424s 🥈
Sieve of Eratosthenes 0.015s 0.027s 0.020s 0.040s 🥉
String Manipulation 0.008s 0.037s 0.016s 0.037s 🥉

CLI Tool Benchmarks

| Benchmark | ChadScript | grep | node | xxd | Place |
| --- | --- | --- | --- | --- | --- |
| Hex Dump | 0.434s | | 0.995s | 0.131s | 🥈 |
| Recursive Grep | 0.019s | 0.010s | 0.099s | | 🥈 |

@cs01 cs01 merged commit db3ba9a into main Apr 13, 2026
13 checks passed
cs01 added a commit that referenced this pull request Apr 13, 2026
Co-authored-by: cs01 <cs01@users.noreply.github.com>