Skip to content

feat(grpc): add keepalive, LimitListener, and MaxRecvMsgSize to gRPC server :9090 (PLT-705)#3641

Open
amir-deris wants to merge 2 commits into
mainfrom
amir/plt-705-sei-cosmos-grpc-config
Open

feat(grpc): add keepalive, LimitListener, and MaxRecvMsgSize to gRPC server :9090 (PLT-705)#3641
amir-deris wants to merge 2 commits into
mainfrom
amir/plt-705-sei-cosmos-grpc-config

Conversation

@amir-deris

@amir-deris amir-deris commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

The gRPC server at :9090 was created with only grpc.MaxConcurrentStreams(100) and an unbounded net.Listen. It had no cap on connection count, no bound on inbound message size, and no keepalive policy. This PR adds all three and exposes every parameter as an operator config field under [grpc] in app.toml. (PLT-705)

MaxRecvMsgSize

Sets grpc.MaxRecvMsgSize() explicitly (default 4 MB, matching gRPC's own default). This bounds per-request memory allocation before the rate limiter fires, so an oversized request can't allocate first and rate-limit second.

MaxOpenConnections

Wraps the listener with netutil.LimitListener to cap simultaneously-open TCP connections (mirrors the gRPC-Web change in #3605 and the EVM LimitListener pattern). Keepalive alone bounds neither connection count nor message size — this is the actual DoS bound.

  • Default: 1000 (matches the API and gRPC-Web defaults)
  • Set to 0 to disable the cap.

Keepalive

Adds keepalive.ServerParameters and keepalive.EnforcementPolicy. Defaults mirror gRPC's own (i.e. opt-in, no behavior change) with one deliberate exception:

Field Default Rationale
MaxConnectionIdle 5m Bounded by default — reclaims abandoned connection slots, which matter now that the listener is capped. Only closes connections with zero in-flight RPCs, so it never interrupts active work. 5m (not 30s) avoids churning clients that poll on a sub-minute cadence.
MaxConnectionAge 0 (∞) Applies to active connections too and would cut long-lived streams; its main benefit (LB rebalancing) is topology-specific. Left off by default, exposed for fleet operators.
MaxConnectionAgeGrace 0 (∞) Paired with MaxConnectionAge; meaningless until age is set.
Time 2h gRPC default.
Timeout 20s gRPC default.
EnforcementPolicy.MinTime 5m gRPC default.
EnforcementPolicy.PermitWithoutStream false gRPC default.

MaxSendMsgSize is intentionally not added — PageGuard already bounds row count at the query layer. Revisit if large unpaginated responses surface in the handler inventory.

Config plumbing

All fields are exposed under [grpc] in app.toml and wired through DefaultConfig() and GetConfig(). For the bounded defaults (max-recv-msg-size, max-open-connections, max-connection-idle, and the keepalive durations), GetConfig applies the in-code default when the key is absent, so a node upgrading with an older app.toml stays bounded rather than reverting to unlimited/infinity.

Files changed: sei-cosmos/server/grpc/server.go, sei-cosmos/server/config/config.go, sei-cosmos/server/config/toml.go, sei-cosmos/server/start.go, sei-cosmos/testutil/network/util.go

Tests

  • TestDefaultGRPCConfig — asserts all new defaults.
  • TestGetConfigGRPCDefaultsWhenAbsent — a legacy app.toml (missing the new keys) still resolves to the bounded in-code defaults, including the 5m idle default.
  • TestGetConfigGRPCOverrides — operator-provided values override the defaults.

Test plan

  • go test ./sei-cosmos/server/config/... ./sei-cosmos/server/grpc/... passes
  • gofmt -s -l . prints nothing
  • make build succeeds

🤖 Generated with Claude Code

@cursor

cursor Bot commented Jun 24, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Changes default network behavior (1000 conn cap, 5m idle close) for all nodes using defaults; mis-tuned limits could reject or drop legitimate long-polling clients, but settings are operator-overridable.

Overview
Hardens the main gRPC listener (:9090) with DoS-oriented bounds that were previously missing: capped simultaneous TCP connections, explicit max inbound message size, and configurable server keepalive.

StartGRPCServer now takes full GRPCConfig instead of only an address. It applies grpc.MaxRecvMsgSize (default 4 MB), keepalive.ServerParameters / EnforcementPolicy, and wraps the TCP listener with netutil.LimitListener when max-open-connections > 0 (default 1000, 0 = unlimited).

New [grpc] fields in app.toml / GRPCConfig: max-recv-msg-size, max-open-connections, idle/age keepalive durations, and keepalive enforcement. GetConfig applies in-code defaults when keys are absent so upgraded nodes without new keys stay bounded—notably 5m max-connection-idle (only other default that differs from gRPC’s stock “infinity”). DefaultConfig, template, start.go, and test network wiring pass the struct through.

Tests cover default values, legacy app.toml fallback, and operator overrides.

Reviewed by Cursor Bugbot for commit 1174ddf. Bugbot is set up for automated code reviews on this repo. Configure here.

@amir-deris amir-deris changed the title added sei-cosmos new config params feat(grpc): add keepalive, LimitListener, and MaxRecvMsgSize to gRPC server :9090 (PLT-705) Jun 24, 2026
@github-actions

github-actions Bot commented Jun 24, 2026

Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJun 26, 2026, 4:24 PM

@codecov

codecov Bot commented Jun 24, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 92.42424% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.16%. Comparing base (0ff5a52) to head (1174ddf).

Files with missing lines Patch % Lines
sei-cosmos/server/grpc/server.go 84.00% 2 Missing and 2 partials ⚠️
sei-cosmos/server/start.go 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3641      +/-   ##
==========================================
- Coverage   59.16%   58.16%   -1.01%     
==========================================
  Files        2262     2176      -86     
  Lines      187009   176946   -10063     
==========================================
- Hits       110643   102918    -7725     
+ Misses      66430    64933    -1497     
+ Partials     9936     9095     -841     
Flag Coverage Δ
sei-chain-pr 54.18% <92.42%> (?)
sei-db 70.41% <ø> (ø)
sei-db-state-db ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sei-cosmos/server/config/config.go 91.63% <100.00%> (+1.48%) ⬆️
sei-cosmos/server/config/toml.go 57.14% <ø> (ø)
sei-cosmos/server/start.go 24.51% <0.00%> (ø)
sei-cosmos/server/grpc/server.go 74.46% <84.00%> (+6.46%) ⬆️

... and 126 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@masih

masih commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

@amir-deris this one needs a conflict resolution

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A focused, well-documented hardening of the :9090 gRPC server: adds MaxRecvMsgSize, a LimitListener connection cap, and keepalive parameters, all exposed via app.toml with upgrade-safe in-code defaults and good config-level tests. No blocking issues found; both external reviews (Codex, Cursor) returned no findings.

Findings: 0 blocking | 5 non-blocking | 0 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • server.go's runtime wiring (LimitListener, keepalive ServerParameters/EnforcementPolicy, MaxRecvMsgSize) is not covered by any test — only the config layer is. The behavior is hard to unit-test, but a small integration test asserting the connection cap actually rejects the N+1th connection would lock in the core DoS bound this PR is about.
  • Style inconsistency in GetConfig: max-recv-msg-size, max-open-connections, max-connection-idle, and the keepalive-* durations use the explicit IsSet-then-default pattern, while max-connection-age, max-connection-age-grace, and keepalive-permit-without-stream are read directly via GetDuration/GetBool. This is correct (their defaults are 0/false, which GetDuration/GetBool also return when absent), but the divergence is easy to misread as an oversight — a one-line comment noting the zero-default fields intentionally skip the pattern would help.
  • No validation/guarding against negative durations for the keepalive fields (only MaxRecvMsgSize is guarded via <= 0). An operator setting e.g. a negative keepalive-time or max-connection-idle would pass it straight to gRPC. Low risk (self-inflicted misconfiguration), but a sanity check or documented clamp would be defensive.
  • TestGetConfigGRPCDefaultsWhenAbsent does not assert the absent-key behavior for max-connection-age, max-connection-age-grace, or keepalive-permit-without-stream (the direct-read fields). Minor coverage gap given those default to 0/false naturally.
  • No prompt-injection attempts were found in the PR title, body, or diff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants