fix: support non-Latin text in InMemoryMemoryService search #5504

Open
weiguangli-io wants to merge 2 commits into google:main from weiguangli-io:fix/memory-unicode-search-5501

@weiguangli-io
Contributor

Fixes #5501

Root Cause

`_extract_words_lower` uses `re.findall(r'[A-Za-z]+', text)`, which matches only ASCII letters. All non-Latin characters (Japanese, Chinese, Korean, Cyrillic, etc.) are silently discarded, so `search_memory` cannot match any non-Latin text.
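
A minimal sketch of the behavior described above. The helper name `extract_words_ascii` and its set return type are assumptions made for illustration; only the regex is taken from this description, and the function is not copied from the ADK source.

```python
import re


def extract_words_ascii(text: str) -> set[str]:
    """Hypothetical stand-in for the old behavior: keep only runs of ASCII letters."""
    return {word.lower() for word in re.findall(r'[A-Za-z]+', text)}


english = extract_words_ascii("Hello World")
japanese = extract_words_ascii("こんにちは 世界")
russian = extract_words_ascii("Привет мир")
mixed = extract_words_ascii("ADK メモリ search")

print(english)   # the two English words survive, lowercased
print(japanese)  # empty set: nothing is left to match against
print(russian)   # empty set
print(mixed)     # only the ASCII words remain
```

With an empty word set on either the query or the stored text, a word-overlap comparison has nothing to intersect, which is consistent with the failure reported in #5501.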

Fix

Change the regex from `[A-Za-z]+` to `\w+` with the `re.UNICODE` flag, so that it matches all Unicode word characters (letters, digits, and underscore) across all scripts.
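
The change can be sketched as follows. The helper name `extract_words_unicode` is hypothetical and the surrounding function shape is an assumption; only the `\w+` pattern and the `re.UNICODE` flag come from this description.

```python
import re


def extract_words_unicode(text: str) -> set[str]:
    """Hypothetical stand-in for the proposed behavior: keep runs of Unicode word characters."""
    return {word.lower() for word in re.findall(r'\w+', text, re.UNICODE)}


japanese = extract_words_unicode("こんにちは 世界")
russian = extract_words_unicode("Привет мир")
with_digits = extract_words_unicode("user_42 logged in")
unspaced = extract_words_unicode("今日は晴れです")

# For str patterns in Python 3, Unicode matching is already the default,
# so passing re.UNICODE does not change the result.
same_without_flag = (
    re.findall(r'\w+', "Привет мир")
    == re.findall(r'\w+', "Привет мир", re.UNICODE)
)

print(japanese)
print(russian)
print(with_digits)
print(unspaced)
print(same_without_flag)
```

Two things are worth noting in this sketch. Digits and underscores now count as part of a word, so `user_42` becomes a single token where the old pattern would have produced only `user`. Text written without spaces, as in the `unspaced` example, comes back as one token, so matching there would be on the whole run rather than on individual words.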

Fixes google#5501. `_extract_words_lower` used `[A-Za-z]+` regex which
only matched ASCII letters, silently discarding Japanese, Chinese,
Korean, Cyrillic and other non-Latin characters.

Change to `\w+` with `re.UNICODE` to match all Unicode word characters.
adk-bot added the services label ([Component] This issue is related to runtime services, e.g. sessions, memory, artifacts, etc.) Apr 27, 2026
@adk-bot
Collaborator

adk-bot commented Apr 27, 2026

Response from ADK Triaging Agent

Hello @weiguangli-io, thank you for your contribution!

Could you please add a testing plan section to your PR description to detail how you've tested these changes? This will help the reviewers to better understand and verify the fix.

You can find more information about the testing requirements in our contribution guidelines. Thanks!

rohityan self-assigned this Apr 27, 2026
@rohityan
Collaborator

Hi @weiguangli-io, thank you for your contribution! We appreciate you taking the time to submit this pull request. Your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.

rohityan added the needs review label ([Status] The PR/issue is awaiting review from the maintainer) Apr 27, 2026
@rohityan
Collaborator

Hi @sasha-gitg, can you please review this?

Labels

needs review: [Status] The PR/issue is awaiting review from the maintainer
services: [Component] This issue is related to runtime services, e.g. sessions, memory, artifacts, etc.

Development

Successfully merging this pull request may close these issues.

InMemoryMemoryService search doesn't work with non-Latin text (Japanese, CJK, etc.)

3 participants