Skip to content

Fix bugs from code audit (server respawn, leaks, hardening)#4427

Draft
evanpelle wants to merge 1 commit into
mainfrom
fix/code-audit-bugs
Draft

Fix bugs from code audit (server respawn, leaks, hardening)#4427
evanpelle wants to merge 1 commit into
mainfrom
fix/code-audit-bugs

Conversation

@evanpelle

Copy link
Copy Markdown
Collaborator

Verified the findings from a code audit against the real code and fixed the legitimate, well-scoped ones. Each fix below was confirmed by reading the actual code (and, for the worker-respawn bug, by running an actual Node cluster experiment).

Fixes

Server

  • Crashed workers never respawned (Master.ts) — worker.process.env is undefined in the cluster exit handler, so WORKER_ID was always undefined → every crash logged "could not find id" and returned without respawning. Now tracked via a worker.id → WORKER_ID registry.
  • Abandoned private lobbies linger (GameServer.ts) — phase() ignored _hasEnded, so a host-left lobby stayed Lobby and wasn't GC'd until max duration. Returns Finished now.
  • websockets Set kept dead sockets (GameServer.ts) — never removed on close/rejoin; added deletes.
  • hasReachedMaxPlayerCount latched forever (GameServer.ts) — a leave-after-full committed a degenerate under-capacity start (notably ranked 1v1). Reset on leave when back under capacity.
  • No idle timeout for unauthenticated WebSockets (Worker.ts) — pre-join sockets are untracked by any GameServer → Slowloris-style FD exhaustion. Added a 30s join-reaper.
  • JWK public key cached forever (ServerEnv.ts) — IdP key rotation would break all logins until restart. Added a 1h TTL.

Core

  • Random-spawn tile bypass (SpawnExecution.ts) — a forged spawn intent set this.tile, which getSpawn(center) honored. Now ignored in random-spawn mode. Added tests/RandomSpawnNoOverride.test.ts.

Client

  • MultiTabDetector listener leakaddEventListener(..., x.bind(this)) then removeEventListener(..., x.bind(this)) use different refs; bind once.
  • GameRenderer window-resize leak — anonymous resize listener never removed; bind once + remove in a destroy() wired into disposeRenderer.
  • localStorage crash in sandboxed iframes (LocalPersistantStats.ts) — accessing localStorage throws in sandboxed portals (e.g. CrazyGames); the old === undefined guard threw too. Wrapped in try/catch.
  • Per-frame Float32Array alloc (StructurePass.draw()) — hoisted to reused fields (matches existing ghostBuf).

Known gap, intentionally not fixed here

/api/archive_singleplayer_game is unauthenticated and trusts the client-supplied persistentID. Documented inline with the recommended fix (require a Bearer JWT, stamp persistentID from the verified token). Left for a follow-up pending a product decision on logged-out singleplayer users.

Deliberately rejected as false positives

The audit's entire "floating-point divergence" determinism section (attack math, nuke atan2, bezier, A* tie-breakers, RNG seeds, Set/Map iteration) — JS IEEE-754 math is deterministic, the sim already relies on Math.pow/Math.exp every tick, and several proposed fixes would have changed deterministic outputs and broken replays. Also rejected: desync-relay (misreads the intent architecture), WebGL VAO "leak" (documented intentional — context is torn down), and a handful of not-reachable / already-bounded items.

Verification

  • tsc --noEmit
  • eslint
  • vitest1634/1634 tests pass (147 files), including the new test

Verified findings from a code audit and fixed the legitimate, well-scoped
ones. Notable: crashed cluster workers were never respawned because
worker.process.env is undefined in the exit handler, so WORKER_ID was always
undefined (confirmed empirically).

Fixes:
- Master: respawn crashed workers via a worker.id -> WORKER_ID registry
  instead of the always-undefined worker.process.env.WORKER_ID.
- GameServer: report Finished when _hasEnded so abandoned private lobbies are
  cleaned up instead of lingering until max duration; remove sockets from the
  websockets Set on close/rejoin; reset hasReachedMaxPlayerCount when a player
  leaves before start (avoids premature under-capacity starts, e.g. ranked 1v1).
- Worker: reap WebSocket connections that upgrade but never authenticate
  (30s join timeout) to prevent Slowloris-style FD exhaustion.
- ServerEnv: add a 1h TTL to the cached JWKS public key so IdP key rotation
  doesn't break all logins until a restart.
- SpawnExecution: ignore a client-supplied tile in random-spawn mode so a
  forged spawn intent can't pick its own coordinates (+ test).
- GameRenderer / MultiTabDetector: remove window listeners on teardown
  (bind once so add/removeEventListener references match).
- LocalPersistantStats: guard localStorage access with try/catch so sandboxed
  iframes / disabled-storage contexts don't crash game start.
- StructurePass: reuse per-frame uniform arrays instead of allocating each draw.

Also documents a known, intentionally-unfixed auth gap on
/api/archive_singleplayer_game (pending a product decision on logged-out users).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

Multiple focused bug fixes and hardening patches: client renderer teardown now calls gameRenderer.destroy() and removes the resize listener; MultiTabDetector stores bound handler references for correct listener removal; LocalPersistantStats wraps all localStorage access in try/catch; StructurePass preallocates WebGL uniform arrays; SpawnExecution ignores client-supplied tile in random-spawn mode; GameServer fixes stale socket tracking and lobby latch reset; Master uses a cluster-id registry for worker tracking; ServerEnv adds JWKS TTL; Worker adds a 30-second unauthenticated socket reaper.

Changes

Client Resource Cleanup & Resilience

Layer / File(s) Summary
GameRenderer and ClientGameRunner teardown
src/client/hud/GameRenderer.ts, src/client/ClientGameRunner.ts, src/client/render/gl/passes/StructurePass.ts
GameRenderer stores a bound onResize and exposes destroy() to remove it; ClientGameRunner.disposeRenderer calls gameRenderer.destroy(); StructurePass preallocates shapeScales and iconFills arrays instead of allocating per frame.
MultiTabDetector and LocalPersistantStats resilience
src/client/MultiTabDetector.ts, src/client/LocalPersistantStats.ts
MultiTabDetector stores bound handler fields so stopMonitoring() removes the correct functions; LocalPersistantStats wraps all localStorage reads and writes in try/catch, removing the localStorage === undefined guards.

Server Security & Lifecycle Hardening

Layer / File(s) Summary
Random-spawn client override prevention
src/core/execution/SpawnExecution.ts, tests/RandomSpawnNoOverride.test.ts
SpawnExecution.tick passes undefined to getSpawn in random-spawn mode, ignoring client-supplied tile; tests confirm non-random mode uses the injected tile and random mode produces a deterministic, client-independent spawn tile.
GameServer WebSocket and lobby lifecycle
src/server/GameServer.ts
rejoinClient removes the old WebSocket from websockets; the close handler deletes the socket and resets hasReachedMaxPlayerCount when the lobby drops below maxPlayers; phase() returns Finished immediately when _hasEnded is true.
Master worker cluster-id registry
src/server/Master.ts
Adds a workerIdByClusterId Map to track logical WORKER_ID per cluster worker, replacing env-based parsing in crash and restart handlers with registry lookups.
JWKS TTL and WebSocket join timeout
src/server/ServerEnv.ts, src/server/Worker.ts
ServerEnv expires the cached JWKS key after 1 hour via PUBLIC_KEY_TTL_MS; Worker adds a 30-second reaper that closes sockets that never complete authentication, cleared on successful token verification.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • openfrontio/OpenFrontIO#4295: Refactors ClientGameRunner teardown to call disposeRenderer, directly related to the new gameRenderer.destroy() call added in this PR.

Suggested reviewers

  • Celant

Poem

🧹 Listeners removed, arrays preallocated,
Sockets that loiter get firmly terminated.
Random spawn tiles stay server's own choice,
No sneaky client can override the voice.
TTLs expire, the lobby unlocks,
Clean teardown at last — no more memory knocks! 🔒

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title matches the main changes: audit-driven bug fixes across server, client, and core code.
Description check ✅ Passed The description is clearly aligned with the changeset and summarizes the implemented fixes accurately.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/server/ServerEnv.ts (1)

96-119: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Keep using the old key when refetch fails, so an IdP outage does not break all logins.

Before this change, once the key was cached it worked forever. Now, after the 1 hour TTL passes, every call goes back to fetch. If the JWKS endpoint is briefly down at that moment, this method throws, verifyClientToken returns an error, and every join/rejoin gets ws.close(1002) — even for players whose tokens the old cached key could still verify.

The TTL goal (pick up key rotation) and resilience are both possible: try to refresh, but fall back to the still-cached key if the refresh fails.

🛡️ Fall back to the cached key on refresh failure
   static async jwkPublicKey(): Promise<JWK> {
     if (
       ServerEnv.publicKey &&
       Date.now() - ServerEnv.publicKeyFetchedAt < ServerEnv.PUBLIC_KEY_TTL_MS
     ) {
       return ServerEnv.publicKey;
     }
-    const jwksUrl = ServerEnv.jwtIssuer() + "/.well-known/jwks.json";
-    console.log(`Fetching JWKS from ${jwksUrl}`);
-    const response = await fetch(jwksUrl);
-    if (!response.ok) {
-      const body = await response.text();
-      throw new Error(`JWKS fetch failed: ${response.status} ${body}`);
-    }
-    const result = JwksSchema.safeParse(await response.json());
-    if (!result.success) {
-      const error = z.prettifyError(result.error);
-      console.error("Error parsing JWKS", error);
-      throw new Error("Invalid JWKS");
-    }
-    ServerEnv.publicKey = result.data.keys[0];
-    ServerEnv.publicKeyFetchedAt = Date.now();
-    return ServerEnv.publicKey;
+    const jwksUrl = ServerEnv.jwtIssuer() + "/.well-known/jwks.json";
+    console.log(`Fetching JWKS from ${jwksUrl}`);
+    try {
+      const response = await fetch(jwksUrl);
+      if (!response.ok) {
+        const body = await response.text();
+        throw new Error(`JWKS fetch failed: ${response.status} ${body}`);
+      }
+      const result = JwksSchema.safeParse(await response.json());
+      if (!result.success) {
+        const error = z.prettifyError(result.error);
+        console.error("Error parsing JWKS", error);
+        throw new Error("Invalid JWKS");
+      }
+      ServerEnv.publicKey = result.data.keys[0];
+      ServerEnv.publicKeyFetchedAt = Date.now();
+      return ServerEnv.publicKey;
+    } catch (e) {
+      // Refresh failed. Keep serving the stale key if we have one so a
+      // transient IdP outage does not break all token verification.
+      if (ServerEnv.publicKey) {
+        console.error("JWKS refresh failed, using stale cached key", e);
+        return ServerEnv.publicKey;
+      }
+      throw e;
+    }
   }

One small follow-on: when the TTL has passed and many sockets connect at once, each call runs its own fetch (a small thundering-herd). If that ever matters, store the in-flight Promise<JWK> and share it. Optional, not blocking.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/server/ServerEnv.ts` around lines 96 - 119, The jwkPublicKey() refresh
path currently throws on JWKS fetch/parsing failures even when a valid
ServerEnv.publicKey is still cached. Update ServerEnv.jwkPublicKey() to try
refreshing after the TTL, but if fetch or parsing fails, fall back to the
existing cached key instead of failing token verification. Keep the existing
cache update behavior when refresh succeeds, and use the cached key only when it
is present so verifyClientToken can continue working during IdP outages.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/server/GameServer.ts`:
- Around line 729-733: The race-path cleanup in GameServer leaves a disconnected
client’s socket in this.websockets even after removing it from activeClients,
which can leave dead entries behind. Update the same cleanup branch in
GameServer’s end/addListeners flow to also delete client.ws from
this.websockets, mirroring the normal "close" handler so abandoned sockets are
fully removed.

---

Outside diff comments:
In `@src/server/ServerEnv.ts`:
- Around line 96-119: The jwkPublicKey() refresh path currently throws on JWKS
fetch/parsing failures even when a valid ServerEnv.publicKey is still cached.
Update ServerEnv.jwkPublicKey() to try refreshing after the TTL, but if fetch or
parsing fails, fall back to the existing cached key instead of failing token
verification. Keep the existing cache update behavior when refresh succeeds, and
use the cached key only when it is present so verifyClientToken can continue
working during IdP outages.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 596d489d-e5db-4cdd-8b5d-acd2a5575e31

📥 Commits

Reviewing files that changed from the base of the PR and between 95171fd and 6e23c26.

📒 Files selected for processing (11)
  • src/client/ClientGameRunner.ts
  • src/client/LocalPersistantStats.ts
  • src/client/MultiTabDetector.ts
  • src/client/hud/GameRenderer.ts
  • src/client/render/gl/passes/StructurePass.ts
  • src/core/execution/SpawnExecution.ts
  • src/server/GameServer.ts
  • src/server/Master.ts
  • src/server/ServerEnv.ts
  • src/server/Worker.ts
  • tests/RandomSpawnNoOverride.test.ts

Comment thread src/server/GameServer.ts
Comment on lines +729 to +733
if (
this.activeClients.length < (this.gameConfig.maxPlayers ?? Infinity)
) {
this.hasReachedMaxPlayerCount = false;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Mirror the websocket-set cleanup in this race path.

Line 723 removes the client from activeClients, but this branch still leaves client.ws inside this.websockets. If the socket was already closed before addListeners() ran, end() keeps iterating a dead entry and the set can grow with abandoned sockets. Add the same delete used in the normal "close" handler.

Proposed fix
     if (client.ws.readyState >= 2) {
       this.log.info("client WebSocket already closing/closed, removing", {
         clientID: client.clientID,
         readyState: client.ws.readyState,
       });
+      this.websockets.delete(client.ws);
       this.activeClients = this.activeClients.filter(
         (c) => c.clientID !== client.clientID,
       );
       // Remove persistentId if the game has not started to prevent going over max players
       if (!this._hasStarted) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (
this.activeClients.length < (this.gameConfig.maxPlayers ?? Infinity)
) {
this.hasReachedMaxPlayerCount = false;
}
if (client.ws.readyState >= 2) {
this.log.info("client WebSocket already closing/closed, removing", {
clientID: client.clientID,
readyState: client.ws.readyState,
});
this.websockets.delete(client.ws);
this.activeClients = this.activeClients.filter(
(c) => c.clientID !== client.clientID,
);
// Remove persistentId if the game has not started to prevent going over max players
if (!this._hasStarted) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/server/GameServer.ts` around lines 729 - 733, The race-path cleanup in
GameServer leaves a disconnected client’s socket in this.websockets even after
removing it from activeClients, which can leave dead entries behind. Update the
same cleanup branch in GameServer’s end/addListeners flow to also delete
client.ws from this.websockets, mirroring the normal "close" handler so
abandoned sockets are fully removed.

@github-project-automation github-project-automation Bot moved this from Triage to Development in OpenFront Release Management Jun 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Development

Development

Successfully merging this pull request may close these issues.

1 participant