Skip to content

cachedb_redis: add ASK redirect handling for cluster resharding#3852

Open
dondetir wants to merge 1 commit intoOpenSIPS:masterfrom
dondetir:feature/redis-cluster-ask-topology
Open

cachedb_redis: add ASK redirect handling for cluster resharding#3852
dondetir wants to merge 1 commit intoOpenSIPS:masterfrom
dondetir:feature/redis-cluster-ask-topology

Conversation

@dondetir
Copy link
Copy Markdown
Contributor

@dondetir dondetir commented Mar 29, 2026

Summary

Add support for Redis ASK redirects during cluster resharding, completing the cluster redirect handling started in PR #3639.

Background

When a Redis cluster is resharding (migrating slots between nodes), the source node returns an ASK response instead of MOVED. The key difference:

  • MOVED = permanent redirect (slot ownership changed) — already handled by PR Add Redis MOVED Redirection support #3639
  • ASK = temporary redirect (slot is mid-migration) — requires sending ASKING to the target node before retrying

Without ASK handling, queries during resharding operations fail with unhandled error responses.

Implementation

  • Detects ASK responses alongside existing MOVED handling in the query retry loop
  • Sends ASKING command to the target node before retrying the original query
  • Does NOT update the slot table (correct per Redis spec — ASK is one-time, unlike MOVED)
  • Refactors parse_moved_reply() into parse_redirect_reply() that accepts the prefix as a parameter, with inline wrappers parse_moved_reply() and parse_ask_reply() for backward compatibility

Note on hash slot fix

The original version of this PR also included a hash slot calculation fix. That has been removed per @NormB's feedback, as his PR #3815 addresses this more comprehensively (including hash tag support). This PR now contains only the ASK redirect handling.

Testing

  • Builds clean, zero warnings
  • All 2545 unit tests pass
  • @NormB verified the parse_redirect_reply() refactor passes integration tests with live slot migrations across a 3-node cluster

Partially addresses #2811

@NormB
Copy link
Copy Markdown
Member

NormB commented Mar 29, 2026

This appears to be a partial duplicate of #3815

@dondetir
Copy link
Copy Markdown
Contributor Author

Thanks for pointing that out @NormB. You're right — the hash slot fix (& con->slots_assigned% REDIS_CLUSTER_SLOTS) overlaps with your PR #3815, which also adds hash tag support that mine doesn't cover.

The unique part of this PR is the ASK redirect handling (detection + ASKING command before retry), which completes the cluster redirect support alongside the MOVED handling you added in #3639.

Happy to rebase and drop the hash slot change once #3815 merges, keeping only the ASK redirect logic. Or if you'd prefer, I can split this into a separate ASK-only PR right now. Let me know what works best.

Add support for Redis ASK redirects during cluster resharding.  When
a slot is being migrated between nodes, Redis returns an ASK response
instead of MOVED.  Unlike MOVED (permanent redirect), ASK is a
one-time redirect that requires sending the ASKING command to the
target node before retrying the original query.

The implementation:
 - Detects ASK responses alongside existing MOVED handling
 - Sends ASKING command to the target node before retrying
 - Reuses the MOVED redirect infrastructure (endpoint lookup,
   reconnection, retry logic)

Also refactor parse_moved_reply() into parse_redirect_reply() that
accepts the prefix as a parameter, with inline wrappers
parse_moved_reply() and parse_ask_reply() for backward compatibility.

Partially addresses OpenSIPS#2811
@dondetir dondetir force-pushed the feature/redis-cluster-ask-topology branch from f3ee6c8 to fc8a736 Compare March 30, 2026 13:18
@dondetir dondetir changed the title cachedb_redis: add ASK redirect handling and fix hash slot calculation cachedb_redis: add ASK redirect handling for cluster resharding Mar 30, 2026
@dondetir
Copy link
Copy Markdown
Contributor Author

Rebased to remove the hash slot fix per @NormB's feedback — this PR now contains only the ASK redirect handling + parse_redirect_reply() refactor.

The hash slot calculation change is properly covered by #3815 (which also adds hash tag support).

Force-pushed with the updated commit.

NormB added a commit to NormB/opensips that referenced this pull request Apr 20, 2026
With use_tls=1 on cachedb_redis, every SIP worker that touches a
Redis/Valkey connection SIGSEGVs on the first SSL_new. Two
independent problems stacked on the same line:

1. The old code indexed d->ctx as a per-process array:
     SSL_new(((void**)d->ctx)[process_no])
   This predates the tls_mgm refactor (upstream 11aa15d,
   'refactor TCP/TLS code to have everything in a single process'),
   after which ctx became a single SSL_CTX *. Every worker with
   process_no > 0 read garbage past the SSL_CTX struct.

2. Even after fixing the dereference, d->ctx is not safely usable
   from sibling workers. tls_mgm builds its SSL_CTX in
   child_init(PROC_TCP_MAIN), which runs *after* SIP workers and
   the MI process have already forked. So those workers see
   d->ctx == NULL when their cachedb_redis:child_init ->
   cachedb_do_init -> redis_init_ssl path fires. Even waiting for
   TCP_MAIN doesn't help: the SSL_CTX was built post-fork in
   TCP_MAIN's address space, and OpenSSL embeds pointers into
   per-process storage inside the CTX (method tables, providers,
   OSSL_LIB_CTX).

Fix: each worker builds its own SSL_CTX from the static
configuration on the tls_domain (cert, pkey, ca, verify_cert).
CTXs are cached per-worker, keyed by tls_domain pointer, so
multiple connections sharing a domain share one CTX inside that
worker. Floors min proto at TLS 1.2 to match Valkey/Redis 7+
defaults and the tls_method=TLSv1_2 that most tls_mgm configs
already set.

Also tightens the adjacent error paths: SSL_free the SSL on
redisInitiateSSL failure, and promote the printf in that path to
LM_ERR for consistency.

Note: a companion parser fix for 'tls-port' in parse_cluster_shards
is still needed for TLS cluster topology updates to use the right
port on MOVED/ASK redirects, but depends on PR OpenSIPS#3852
(parse_cluster_shards introduction); it can land as a small
follow-up once that PR is merged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants