[Fix] Skip FTS search for queries shorter than 2 characters (#391) by dpanbug · Pull Request #557 · clawwork-ai/ClawWork

dpanbug · 2026-07-01T05:57:03Z

Closes #391.

Problem

globalSearch and searchArtifacts turn the query into an FTS5 prefix match by appending *. A single-character query like "a" becomes "a*", which matches almost every indexed token — effectively a full-table scan that blocks the main process for hundreds of milliseconds on a sizeable workspace.
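To make the cost concrete, here is a toy model of prefix selectivity. It does not touch SQLite or FTS5: the `documents` array and `countMatchingDocs` are hypothetical stand-ins, meant only to show how much faster a two-character prefix narrows the candidate set than a one-character prefix does.

```typescript
// Toy illustration only: each inner array stands in for one indexed row's tokens.
const documents: string[][] = [
  ['deploy', 'the', 'agent'],
  ['build', 'an', 'artifact'],
  ['cache', 'and', 'retry'],
  ['search', 'archive', 'index'],
  ['fix', 'login', 'bug'],
];

/** Count rows with at least one token that a prefix expression like "a*" would match. */
function countMatchingDocs(docs: string[][], prefix: string): number {
  return docs.filter((tokens) => tokens.some((t) => t.startsWith(prefix))).length;
}

console.log(countMatchingDocs(documents, 'a'));   // 4 of 5 rows
console.log(countMatchingDocs(documents, 'ar'));  // 2 of 5 rows
console.log(countMatchingDocs(documents, 'arc')); // 1 of 5 rows
```

On this tiny sample the one-character prefix touches four of the five rows. Real index behavior depends on the tokenizer and on how the data is distributed.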

Fix

Extract the shared normalization into toFtsPrefixQuery, which returns null for empty or single-character queries so both functions bail out before preparing and executing the statement (MIN_FTS_QUERY_LENGTH = 2). Word and CJK characters are still preserved.
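A minimal sketch of the bail-out pattern described here. The helper body mirrors the version in this PR's first commit; the `SearchDb` and `Statement` interfaces, the `searchArtifactsSketch` caller, and the SQL text are assumptions for illustration, not the project's real code.

```typescript
const MIN_FTS_QUERY_LENGTH = 2;

// Assumed minimal database shape (prepare -> statement with all); not the project's real type.
interface Statement {
  all(...params: unknown[]): unknown[];
}
interface SearchDb {
  prepare(sql: string): Statement;
}

/** Returns an FTS5 prefix expression, or null when the query is too short to run. */
function toFtsPrefixQuery(query: string): string | null {
  // Character class as in the first commit: ASCII word characters plus U+4E00..U+9FFF.
  const normalized = query.replace(/[^\w\u4e00-\u9fff]/g, ' ').trim();
  if (normalized.length < MIN_FTS_QUERY_LENGTH) return null;
  return normalized + '*';
}

// Hypothetical caller: the table name and SQL are placeholders.
function searchArtifactsSketch(db: SearchDb, query: string): unknown[] {
  const ftsQuery = toFtsPrefixQuery(query);
  if (ftsQuery === null) return []; // bail out before prepare() is ever called
  return db
    .prepare('SELECT rowid FROM artifacts_fts WHERE artifacts_fts MATCH ?')
    .all(ftsQuery);
}
```

Returning `null` rather than an empty string keeps the "do not search" case distinct from any valid expression, so each caller needs only one check.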

Question for maintainers (CJK)

This guard also skips single-character CJK queries (e.g. a lone 中). A single-character prefix carries the same full-scan cost regardless of script, so I applied the threshold uniformly (matching the issue's suggested < 2). If single CJK-character search should stay enabled, I'm happy to special-case it; just let me know your preference.

Tests

Added unit tests covering the single-character guard, punctuation-only input, and the normal two-character path. Search tests pass (vitest run test/artifact-search.test.ts → 7 passed).

Release note

Fixed a brief UI freeze when typing a single character in search; queries now run from two characters onward.

…-ai#391)

`globalSearch` and `searchArtifacts` turn the query into an FTS5 prefix match by appending `*`. A single-character query like `"a"` becomes `"a*"`, which matches almost every indexed token; this is effectively a full-table scan that blocks the main process for hundreds of milliseconds on a sizeable workspace.

Extract the shared normalization into `toFtsPrefixQuery`, which returns `null` for empty or single-character queries so both functions bail out before preparing and executing the statement. Word and CJK characters are still preserved.

Added unit tests for the single-character guard, punctuation-only input, and the normal two-character path.

Note: this also skips single-character CJK queries (e.g. a lone `中`), which carry the same short-prefix scan cost. See the PR description for a question on whether single CJK-character search should be exempted.

Release note: Fixed a brief UI freeze when typing a single character in search; queries now run from two characters onward.

Signed-off-by: kainam <tek@dpan-bug.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-07-01T05:57:11Z

Hi @dpanbug,
Thanks for your pull request!
If the PR is ready, use the /auto-cc command to assign Reviewer to Review.
We will review it shortly.

gemini-code-assist · 2026-07-01T06:01:06Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a performance issue where single-character search queries triggered inefficient full-table scans, causing UI freezes. By enforcing a minimum length of two characters for FTS5 prefix matches, the application now avoids unnecessary database operations for short or invalid queries. The change includes a refactor to unify query normalization and adds robust test coverage to prevent regressions.

Highlights

Search Performance Optimization: Introduced a minimum query length requirement of 2 characters for FTS5 prefix searches to prevent expensive full-table scans.
Code Refactoring: Centralized query normalization logic into a new toFtsPrefixQuery helper function used by both global and artifact search modules.
Test Coverage: Added comprehensive unit tests to verify the new length guard, ensuring single-character and punctuation-only queries are correctly ignored.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a minimum query length guard of two characters for FTS5 prefix matches in global and artifact searches to prevent performance degradation, along with corresponding unit tests. The review feedback identifies a critical internationalization bug where non-ASCII and non-CJK characters are stripped, as well as a usability issue where single-character CJK queries are blocked. It is recommended to use JavaScript Unicode property escapes to preserve international characters and dynamically allow single-character searches for CJK scripts, and to add tests covering these scenarios.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-07-01T06:03:43Z

+/**
+ * Normalize a raw user query into an FTS5 prefix expression, or return `null`
+ * when it is empty or shorter than {@link MIN_FTS_QUERY_LENGTH}. Word and CJK
+ * characters are preserved; everything else is treated as a separator.
+ */
+function toFtsPrefixQuery(query: string): string | null {
+  const normalized = query.replace(/[^\w\u4e00-\u9fff]/g, ' ').trim();
+  if (normalized.length < MIN_FTS_QUERY_LENGTH) return null;
+  return normalized + '*';
+}


🌍 Internationalization & CJK Search Usability Issues

There are two significant issues with the current implementation of toFtsPrefixQuery:

Internationalization Bug (High Severity): The regex /[^\w\u4e00-\u9fff]/g strips out all non-ASCII and non-CJK letters (such as accented characters like é, ö, Cyrillic, Greek, Japanese Hiragana/Katakana, and Korean Hangul) and replaces them with spaces. This completely breaks search for users in non-English/non-Chinese locales.

CJK Usability (Medium Severity): As you noted in the PR description, skipping single-character CJK queries (e.g., 中) degrades usability because single-character words are extremely common and meaningful in Chinese, Japanese, and Korean.
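A quick check of the first point, as a sketch. `normalizeOld` is a hypothetical wrapper around the character class quoted in the diff above, applied to a few sample strings.

```typescript
// Character class from the reviewed diff: ASCII \w plus U+4E00..U+9FFF only.
const asciiAndHanOnly = /[^\w\u4e00-\u9fff]/g;

/** Hypothetical helper: applies only the normalization step, without the length guard. */
function normalizeOld(query: string): string {
  return query.replace(asciiAndHanOnly, ' ').trim();
}

console.log(JSON.stringify(normalizeOld('café')));     // "caf"  (é is replaced by a space, then trimmed)
console.log(JSON.stringify(normalizeOld('Привет')));   // ""     (Cyrillic is outside \w)
console.log(JSON.stringify(normalizeOld('ひらがな'))); // ""     (Hiragana is outside U+4E00..U+9FFF)
console.log(JSON.stringify(normalizeOld('한국어')));   // ""     (Hangul is outside U+4E00..U+9FFF)
console.log(JSON.stringify(normalizeOld('中文')));     // "中文" (Han ideographs are kept)
```

So `café` would be searched as `caf*`, while kana, Hangul, and Cyrillic queries normalize to an empty string and are then dropped by the length guard.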

💡 Solution

We can use modern JavaScript Unicode property escapes (enabled by the `u` regex flag) to:

Correctly preserve letters and numbers across all languages (\p{L} and \p{N}).

Detect CJK/Japanese/Korean scripts (Han, Hiragana, Katakana, Hangul) to dynamically allow single-character searches for those languages while keeping the 2-character limit for Latin/alphanumeric scripts.

/**
 * Normalize a raw user query into an FTS5 prefix expression, or return `null`
 * when it is empty or shorter than the minimum required length.
 * Letters (including international scripts like CJK, Hiragana, Katakana, Hangul,
 * Cyrillic, etc.) and numbers are preserved; everything else is treated as a separator.
 */
function toFtsPrefixQuery(query: string): string | null {
  // Use Unicode property escapes to preserve letters and numbers across all languages,
  // while collapsing multiple non-alphanumeric characters into a single space.
  const normalized = query.replace(/[^\p{L}\p{N}_]+/gu, ' ').trim();

  // For CJK/Japanese/Korean scripts, single-character queries are highly meaningful
  // and should not be blocked.
  const isCjk = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]/u.test(normalized);
  const minLength = isCjk ? 1 : MIN_FTS_QUERY_LENGTH;

  if (normalized.length < minLength) return null;
  return normalized + '*';
}

gemini-code-assist · 2026-07-01T06:03:43Z

+  it('runs the prefix query once the minimum length is reached', () => {
+    const { db, all } = mockDb([]);
+
+    globalSearch(db as never, 'ab');
+
+    expect(all).toHaveBeenCalledWith('ab*', 'ab*', 'ab*');
+  });
+});


🧪 Add Tests for Internationalization and CJK Queries

To ensure that international characters are correctly preserved and single-character CJK queries are allowed, we should add unit tests covering these cases.

  it('runs the prefix query once the minimum length is reached', () => {
    const { db, all } = mockDb([]);

    globalSearch(db as never, 'ab');

    expect(all).toHaveBeenCalledWith('ab*', 'ab*', 'ab*');
  });

  it('allows single-character CJK/Japanese/Korean queries', () => {
    const { db, all } = mockDb([]);

    globalSearch(db as never, '中');

    expect(all).toHaveBeenCalledWith('中*', '中*', '中*');
  });

  it('preserves international and accented characters', () => {
    const { db, all } = mockDb([]);

    globalSearch(db as never, 'Café');

    expect(all).toHaveBeenCalledWith('Café*', 'Café*', 'Café*');
  });
});

dpanbug · 2026-07-01T06:07:25Z

/auto-cc

Addresses the Gemini review: the previous `[^\w\u4e00-\u9fff]` character class kept only ASCII word characters plus CJK ideographs, so it stripped accented Latin (é, ñ, ü), Cyrillic, and Arabic. Despite the CJK intent, it also stripped Japanese kana and Korean Hangul (both outside U+4E00–U+9FFF), which the app ships locales for.

Switch to `[^\p{L}\p{N}_]` with the `u` flag so letters and digits of any script are preserved. Added tests for accented Latin and Japanese kana.

The single-character guard still applies uniformly, so a lone CJK/kana/Hangul character is still skipped; see the PR discussion for the open question on exempting single-character searches for those scripts.

Signed-off-by: kainam <tek@dpan-bug.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

dpanbug · 2026-07-01T06:20:28Z

Thanks @gemini-code-assist — good catch on the tokenization. Fixed in df1829d: the character class is now [^\p{L}\p{N}_] (with the u flag), which preserves accented Latin, Cyrillic, Japanese kana, and Korean Hangul instead of stripping them. The previous \w-based class only kept ASCII word chars plus CJK ideographs, so kana/Hangul were dropped despite the CJK intent — even though the app ships ja/ko locales. Added tests for accented Latin (café) and kana (ひらがな).
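For reference, a small sketch of what the revised character class keeps. `normalizeRevised` is a hypothetical name for the normalization step alone (no length guard, no trailing `*`); only the regular expression comes from the comment above.

```typescript
// Revised class: any Unicode letter, any Unicode number, or underscore is kept.
const lettersAndDigits = /[^\p{L}\p{N}_]/gu;

/** Hypothetical helper: normalization step only. */
function normalizeRevised(query: string): string {
  return query.replace(lettersAndDigits, ' ').trim();
}

console.log(normalizeRevised('café'));     // café
console.log(normalizeRevised('ひらがな')); // ひらがな
console.log(normalizeRevised('한국어'));   // 한국어
console.log(normalizeRevised('foo-bar!')); // foo bar

// Caveat: combining marks are category M, not L. A decomposed "e" + U+0301
// loses its accent unless the input is first normalized to NFC.
console.log(normalizeRevised('cafe\u0301'));                  // cafe
console.log(normalizeRevised('cafe\u0301'.normalize('NFC'))); // café
```

Whether decomposed input ever reaches this code path is not something the thread establishes. The last two lines only mark where the class has a boundary.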

Still open for maintainers: the minimum-length guard applies uniformly, so a single-character CJK/kana/Hangul query is skipped along with single ASCII characters. A one-character prefix carries the same near-full-scan cost in any script, so I kept it uniform — but if single-character search should stay enabled for those scripts, I'm happy to special-case it. Let me know your preference.
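One detail that may matter for this question: `String.prototype.length` counts UTF-16 code units, not characters. The sketch below uses hypothetical helper names and assumes the guard compares `.length` against `MIN_FTS_QUERY_LENGTH`, as in the diff. Under that assumption, a lone ideograph outside the Basic Multilingual Plane would pass a guard that a lone `中` fails.

```typescript
const MIN_FTS_QUERY_LENGTH = 2;

const han = '中';            // U+4E2D, one UTF-16 code unit
const hanExtB = '\u{20000}'; // a CJK Extension B ideograph, stored as a surrogate pair

/** Counts Unicode code points rather than UTF-16 code units. */
function codePointLength(s: string): number {
  return Array.from(s).length;
}

// Hypothetical guards, for comparison only.
function passesByCodeUnits(s: string): boolean {
  return s.length >= MIN_FTS_QUERY_LENGTH;
}
function passesByCodePoints(s: string): boolean {
  return codePointLength(s) >= MIN_FTS_QUERY_LENGTH;
}

console.log(han.length, passesByCodeUnits(han));          // 1 false
console.log(hanExtB.length, passesByCodeUnits(hanExtB));  // 2 true
console.log(codePointLength(hanExtB), passesByCodePoints(hanExtB)); // 1 false
```

If the threshold is meant to be uniform per character, counting code points would make it so. Whether that edge case is worth handling is a maintainer call.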

gemini-code-assist · 2026-07-01T06:20:40Z

Great work on the tokenization fix! Using the u flag with the \p{L} and \p{N} Unicode property escapes is definitely the right approach for robust, script-agnostic normalization. It correctly handles the wide range of characters your users might input across different locales.

Regarding your question about single-character CJK queries: keeping the threshold uniform is a sound decision for performance, especially since FTS5 prefix matching on a single character can still be quite expensive depending on the index size and distribution. If you find that users are frequently searching for single CJK characters and it's a critical use case, you could consider a more nuanced approach later (e.g., checking the script type before applying the length guard), but for now, maintaining consistency is cleaner and safer for the UI performance goal.

dpanbug requested review from HiddenPuppy, mvanhorn and samzong as code owners July 1, 2026 05:57

gemini-code-assist Bot reviewed Jul 1, 2026

github-actions Bot requested a review from yankay July 1, 2026 06:07

samzong merged commit 182a861 into clawwork-ai:main Jul 3, 2026
11 checks passed

samzong merged 2 commits into clawwork-ai:main from dpanbug:fix/search-min-query-length