Integrate docs into msgvault #403

Merged
wesm merged 25 commits into main from integrate-docs on Jun 21, 2026

wesm (Member) commented Jun 21, 2026

Why

This brings the standalone msgvault-docs project into the parent repository so docs can evolve with product changes, while keeping bulky screenshots, diagrams, favicons, and generated media out of the main branch. The asset branch split mirrors the proven docs integration pattern used in the sibling projects: durable docs source, scripts, checks, and Zensical config live on integrate-docs; hydrated static/generated assets are published from orphan branches.

The old in-repo maintainer docs were also reconciled so they remain available for engineering reference without becoming part of the published Zensical site. The migrated docs keep the existing screenshot and demo-data generation approach, including regenerated local demo fixtures and Docker/tmux/freeze screenshots.

Asset check

I checked the proposed PR diff against origin/main for accidental main-branch assets. Added/modified files contain no image/video/archive/database/parquet media extensions, no binary numstat additions, and no hydrated docs/assets/static, docs/assets/generated, docs/site, or docs/screenshots/demo-data paths. The only media paths in the PR diff are four deleted legacy concept PNGs. The largest added/modified file in the PR is about 66 KB.
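A scan like the one described above can be sketched in Python as a filter over `git diff --name-status origin/main...HEAD` output. This is a minimal illustration, not the script used for this PR: the function name `find_offending_paths`, the exact extension list, and the rename handling are assumptions; only the forbidden hydrated paths come from the description. Deleted paths are skipped so that removing the legacy concept PNGs does not trip the check.

```python
from pathlib import PurePosixPath

# Assumed extension list covering the image/video/archive/database/parquet
# categories mentioned above; the real scan may use a different set.
MEDIA_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg", ".ico",
    ".mp4", ".mov", ".webm",
    ".zip", ".tar", ".gz", ".tgz",
    ".db", ".sqlite", ".sqlite3", ".parquet",
}

# Hydrated or generated paths that must never be committed to main.
FORBIDDEN_PREFIXES = (
    "docs/assets/static/",
    "docs/assets/generated/",
    "docs/site/",
    "docs/screenshots/demo-data/",
)


def find_offending_paths(name_status: str) -> list[str]:
    """Return added/modified paths from `git diff --name-status` that look like media."""
    offenders = []
    for line in name_status.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # For renames (e.g. R100), the last field is the new path.
        status, path = fields[0], fields[-1]
        if status.startswith("D"):
            continue  # deleting old media is allowed
        suffix = PurePosixPath(path).suffix.lower()
        if suffix in MEDIA_EXTENSIONS or path.startswith(FORBIDDEN_PREFIXES):
            offenders.append(path)
    return offenders
```

A binary check can be layered on the same way using `git diff --numstat`, which reports `-` in both count columns for binary files.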

The orphan branches are seeded and pushed:

  • docs-assets: 4e4adcba3183fe78f65e362021dd88b0142aaa88
  • docs-generated-assets: d2601b66a0f8b83cffe64071ebf5998c84992e4a

Validation

  • make docs-check
  • make test
  • git diff --check
  • PR asset scan against origin/main...HEAD

wesm and others added 23 commits June 21, 2026 09:21
The docs migration has enough moving pieces that implementation needs a committed design before file movement starts: a Zensical port, Vercel rooted at docs/, orphan asset branches, preserved internal references, and asset hydration checks all need to align.

The spec records the user constraint that this is a porting project. In particular, the existing msgvault-docs demo-data and screenshot generation workflow should be path-adapted, not redesigned or replaced with a separate fixture branch.

Validation: approved by spec review subagent; scanned the spec for TODO/TBD/FIXME placeholders.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
External review found that the initial design under-specified the actual content port and missed repo-local media state that would make the planned docs checks fail immediately.

The updated design records the tracked concept PNG disposition, establishes msgvault-docs/diagrams as the source of truth, corrects demo-data handling to match the source repo's ignored regenerated fixture model, and makes port-fidelity checks part of the implementation contract.

Validation: revised spec approved by spec review subagent; rg placeholder scan found no TODO/TBD/FIXME/PLACEHOLDER markers; git diff --check passed.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
The approved migration design is now actionable as a task-by-task implementation plan. The plan captures the porting boundaries, source page inventory, generated/static asset branch bootstrap, first-run local hydration path, screenshot/demo-data preservation, Zensical validation, and final remote branch verification so implementation can proceed without re-litigating the migration shape.

Validation: approved by plan review subagent; placeholder scan found no TODO/TBD/FIXME/PLACEHOLDER markers; git diff --check passed.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
The docs integration needs a committed handoff inventory before follow-on tasks move pages, seed asset branches, or delete old tracked media. Recording the source mappings and asset disposition separately keeps later mechanical migration work anchored to the observed state of msgvault-docs and this repository.

Validation: find inventory counts verified 29 source MDX pages, 33 public files, and 4 tracked docs media files; git check-ignore verified vhs/demo-data paths are ignored; git diff --cached --check passed.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
The docs migration needs tests in place before the hydration and publisher scripts are added, so the next task has concrete behavior to satisfy for asset branch fetching and publisher input validation.

These tests encode the full msgvault static and generated asset inventories, including tui-filter-modal.svg, and exercise the missing-script RED state expected before Task 3 creates the docs asset scripts.

Validation: go test -tags "fts5 sqlite_vec" ./scripts -run 'TestHydrateAssets|TestAssetPublishers' -count=1 failed because the Task 3 docs asset scripts are not present yet, which is the expected RED result.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
Code-quality review found gaps that would let the future asset scripts leave stale hydrated files behind or update asset branch refs before rejecting bad publisher input.

The tests now assert exact hydrated asset inventories, verify failed publisher validation does not create or change the target asset refs, and install the relevant script/support tree so future shared helpers are exercised instead of producing fixture-only failures.

Validation: go test -tags "fts5 sqlite_vec" ./scripts -run '^$' -count=1 passed; go test -tags "fts5 sqlite_vec" ./scripts -run 'TestHydrateAssets|TestAssetPublishers' -count=1 failed because the Task 3 docs asset scripts are still absent, which is the expected RED result.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
The docs integration needs repo-local build, deployment, and asset-branch plumbing before public content can move into docs/. This adds the Zensical/Vercel scaffold, strict asset hydration and publisher scripts, root docs targets, and validation entrypoints while keeping generated and hydrated media out of main.

Validation: bash scripts/update-docs.sh --dry-run; go test -tags "fts5 sqlite_vec" ./scripts -run 'TestHydrateAssets|TestAssetPublishers' -count=1; go test -tags "fts5 sqlite_vec" ./scripts -run '^$' -count=1; git diff --check

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
Spec review found two ways the docs skeleton could give false confidence: docs-check treated a missing uv executable as a skipped validation path, and the generated asset publisher exposed an alternate no-generate mode without guardrail coverage. Make docs-check fail when uv is unavailable and remove the redundant skip-generate mode so --source remains the tested path for publishing existing generated assets.

Validation: bash scripts/update-docs.sh --dry-run; go test -tags "fts5 sqlite_vec" ./scripts -run 'TestHydrateAssets|TestAssetPublishers' -count=1; go test -tags "fts5 sqlite_vec" ./scripts -run '^$' -count=1; git diff --check

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
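The fail-closed behavior for a missing `uv` can be illustrated with a small Python sketch. `require_tool` is a hypothetical helper, not part of the PR; the actual check lives in the docs shell tooling. The point is that a missing prerequisite stops validation with a non-zero exit instead of quietly skipping it.

```python
import shutil


def require_tool(name: str) -> str:
    """Return the path to `name` on PATH, or abort instead of skipping validation."""
    path = shutil.which(name)
    if path is None:
        # SystemExit with a message exits non-zero, so docs-check cannot
        # report success without having run its checks.
        raise SystemExit(f"docs-check: required tool {name!r} not found on PATH")
    return path
```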
Docs deployment should only publish the committed source tree, and asset publishers should not be able to overwrite product branches through a misconfigured branch environment variable. Require update-docs to reject untracked non-ignored files before deployment, and require docs asset branch names to pass both Git ref validation and an explicit docs-asset allowlist before any local ref update or push.

Validation: bash scripts/update-docs.sh --dry-run; go test -tags "fts5 sqlite_vec" ./scripts -run 'TestHydrateAssets|TestAssetPublishers' -count=1; go test -tags "fts5 sqlite_vec" ./scripts -run '^$' -count=1; git diff --check; manual temp-repo static publisher checks for MSGVAULT_DOCS_ASSETS_BRANCH=main rejection and default docs-assets publication

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
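A minimal sketch of that two-step branch guard, written in Python for illustration (the real check is in the shell publishers). The allowlist is assumed to contain exactly the two orphan branches named in this PR, and the regex covers only a subset of the rules enforced by `git check-ref-format`; a real script can call that command directly. `validate_asset_branch` is a hypothetical name.

```python
import re

# Assumed allowlist: the two orphan asset branches named in this PR.
ALLOWED_ASSET_BRANCHES = frozenset({"docs-assets", "docs-generated-assets"})

# A subset of git-check-ref-format rules: no leading '-' or '.', no '..',
# no control/space characters or ~^:?*[\, no '@{', no '//', and no
# trailing '/', '.', or '.lock'.
_BAD_REF = re.compile(r"^[-.]|\.\.|[\x00-\x20\x7f~^:?*\[\\]|@\{|//|/$|\.$|\.lock$")


def validate_asset_branch(name: str) -> str:
    """Return `name` if it is a well-formed, allowlisted docs asset branch."""
    if not name or _BAD_REF.search(name):
        raise ValueError(f"invalid git branch name: {name!r}")
    if name not in ALLOWED_ASSET_BRANCHES:
        # e.g. MSGVAULT_DOCS_ASSETS_BRANCH=main must never reach a ref update or push.
        raise ValueError(f"refusing to publish to non-docs-asset branch: {name!r}")
    return name
```

Running the format check first gives a clearer error for malformed input, while the allowlist is what actually keeps product branches out of reach.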
The docs deployment helper validates the source tree before generation begins, but build and check steps can also leave non-ignored files behind. Re-check the working tree after docs-check and before production deploy so Vercel only runs from a committed workspace.

Validation: bash scripts/update-docs.sh --dry-run; go test -tags "fts5 sqlite_vec" ./scripts -run 'TestHydrateAssets|TestAssetPublishers' -count=1; go test -tags "fts5 sqlite_vec" ./scripts -run '^$' -count=1; git diff --check

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
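The working-tree guard can be sketched as a parser over `git status --porcelain` output, run once before generation and again after docs-check. In the porcelain v1 format, untracked files appear as `?? <path>` and ignored files are listed only when `--ignored` is passed, so ignored build output is not flagged. `untracked_paths` and `require_committed_workspace` are hypothetical names, and paths that Git quotes (for example, names containing whitespace) are returned in their quoted form.

```python
import subprocess


def untracked_paths(porcelain: str) -> list[str]:
    """Return untracked, non-ignored paths from `git status --porcelain` (v1) output."""
    return [line[3:] for line in porcelain.splitlines() if line.startswith("?? ")]


def require_committed_workspace(repo: str = ".") -> None:
    """Abort a deploy if the working tree contains untracked, non-ignored files."""
    out = subprocess.run(
        # --untracked-files=all lists files inside untracked directories individually.
        ["git", "-C", repo, "status", "--porcelain", "--untracked-files=all"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    leftovers = untracked_paths(out)
    if leftovers:
        raise RuntimeError("untracked files would be deployed: " + ", ".join(leftovers))
```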
The docs integration now needs the public msgvault pages to live in this repository in a form Zensical can render, while keeping hydrated media out of main for the later asset-branch tasks. This ports the Starlight source pages, preserves the sidebar order in Zensical config, and carries over the dark msgvault presentation through mkdocs-compatible CSS and overrides.

The source validators are updated to track the exact imported public page set and built-site metadata expectations without touching legacy in-repo docs that Task 5 will reconcile.

Validation: marker scan for Starlight/MDX syntax produced no matches; all 29 mapped destination pages exist; python3 docs/scripts/check_markdown_sources.py passed; docs/zensical.toml parsed with tomllib; git diff --check and git diff --cached --check passed.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
The public docs port should preserve the source homepage feature text exactly, so that semantic search is documented across all exposed surfaces.

The Starlight marker guardrails also need to fail closed on any leftover colon directive, including labels that were not present in the initial source inventory.

Validation: focused checks confirmed the restored homepage sentence and that :::warning is rejected by both source and built-site validators; marker scan produced no matches; python3 docs/scripts/check_markdown_sources.py passed; all 29 mapped pages exist; git diff --check and git diff --cached --check passed.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
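Failing closed on colon directives means matching the `:::` opener itself rather than a list of known labels such as `note` or `tip`. A minimal Python sketch of that idea; `leftover_directives` is a hypothetical helper, and it deliberately does not skip fenced code blocks, so it also rejects `:::` inside examples.

```python
import re

# Any line starting with ':::' (after optional indentation) is treated as a
# leftover Starlight container directive, whatever label follows it.
_DIRECTIVE = re.compile(r"^\s*:::")


def leftover_directives(markdown: str) -> list[int]:
    """Return 1-based line numbers that still contain a ':::' directive marker."""
    return [
        lineno
        for lineno, line in enumerate(markdown.splitlines(), start=1)
        if _DIRECTIVE.match(line)
    ]
```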
Public docs now live in the Zensical tree, so durable engineering references need to move out of the published surface, and duplicate diagram sources and media need to leave main.

The PostgreSQL prompt and issue draft files were historical planning artifacts rather than current operational reference, so dropping them avoids preserving stale implementation guidance alongside the active PostgreSQL status document.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
Zensical config and template url filters take relative paths such as assets/static/favicon.svg, while rendered public Markdown still uses absolute /assets/... URLs. The checker should enforce hydrated asset locations without rejecting those source-level relative inputs.

Validation: bash scripts/check-docs.sh now gets past source media validation and fails at the expected unbootstrapped docs-assets branch fetch; git diff --check passed.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
The docs checker allowed relative hydrated asset paths, but filtering whole lines made an allowed reference hide a forbidden one on the same line. Validate individual matches instead so stale root media references are still rejected before hydration runs.

Validation: red/green go test -tags "fts5 sqlite_vec" ./scripts -run 'TestCheckDocsRejectsForbidden.*MediaReferenceOnLineWithAllowedAsset' -count=1; go test -tags "fts5 sqlite_vec" ./scripts -count=1; bash scripts/check-docs.sh now reaches only the expected unbootstrapped docs-assets branch failure; git diff --check passed.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
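The difference between line-level and match-level filtering shows up on a line that contains one allowed and one stale reference. In this Python sketch (the regexes and `forbidden_refs` are illustrative assumptions, not the checker's actual patterns), a whole-line filter such as "skip lines that mention assets/static/" would drop the entire line, while checking each match still reports the stale `/screenshots/...` path. The allowlist accepts both relative (`assets/static/...`) and root-absolute (`/assets/...`) forms, matching the config and Markdown inputs described earlier.

```python
import re

# Illustrative pattern for media references in docs sources.
MEDIA_REF = re.compile(r"[\w./-]+\.(?:png|svg|jpe?g|gif|webp)")
# Hydrated asset locations, accepted in relative or root-absolute form.
ALLOWED = re.compile(r"^/?assets/(?:static|generated)/")


def forbidden_refs(line: str) -> list[str]:
    """Return every media reference on the line that is not a hydrated asset path."""
    return [
        m.group(0)
        for m in MEDIA_REF.finditer(line)
        if not ALLOWED.match(m.group(0))
    ]
```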
The integrated docs tree needs the source generators available locally so the later asset bootstrap can regenerate screenshots and concept diagrams without depending on the old msgvault-docs checkout. This keeps the existing demo-data and rendering approach while retargeting outputs to the ignored generated assets tree and requiring a docs-specific Docker ignore file for screenshot image builds.

Validation: git status --short --untracked-files=all; find docs/screenshots docs/diagrams -maxdepth 2 -type f | sort; demo-data git check-ignore probe; bash -n docs/screenshots/generate-all.sh docs/screenshots/generate-screenshots.sh docs/diagrams/build.sh docs/screenshots/update-generated-assets-branch.sh; python3 -m py_compile docs/screenshots/generate_demo_data.py; bash docs/screenshots/generate-all.sh --help; git diff --check

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
The screenshot demo-data generator is now part of the parent repository, so direct runs should find internal/store/schema.sql through the integrated docs layout instead of the old sibling msgvault checkout. MSGVAULT_REPO remains as the explicit override used by the wrapper.

Validation: red/green go test -tags "fts5 sqlite_vec" ./scripts -run TestDocsScreenshotDemoDataUsesIntegratedRepoSchemaPath -count=1; go test -tags "fts5 sqlite_vec" ./scripts -count=1; python3 -m py_compile docs/screenshots/generate_demo_data.py; bash -n docs/screenshots/generate-all.sh docs/screenshots/generate-screenshots.sh docs/diagrams/build.sh docs/screenshots/update-generated-assets-branch.sh; bash docs/screenshots/generate-all.sh --help; stale-path rg scan returned no matches; git diff --check passed.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
Docker Buildx on this machine does not support a docker build --ignorefile flag, so the screenshot generator should rely on the documented Dockerfile-specific ignore file next to the screenshot Dockerfile. The demo-data schema override now also takes precedence over the integrated fallback so --repo drives both the built binary and schema source.

Validation: red/green go test -tags "fts5 sqlite_vec" ./scripts -run 'TestDocsScreenshot(GenerateAllDoesNotRequireDockerIgnorefileFlag|DockerfileSpecificIgnoreMatchesSourceIgnore|DemoDataPrefersMSGVAULTRepoSchema)' -count=1; go test -tags "fts5 sqlite_vec" ./scripts -count=1; bash -n docs/screenshots/generate-all.sh docs/screenshots/generate-screenshots.sh docs/diagrams/build.sh docs/screenshots/update-generated-assets-branch.sh; python3 -m py_compile docs/screenshots/generate_demo_data.py; bash docs/screenshots/generate-all.sh --help; stale-path rg scan returned no matches in docs/screenshots and docs/diagrams; git diff --check passed.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
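The schema lookup order described above (explicit `MSGVAULT_REPO` override first, integrated layout second) might look like this in the Python demo-data generator. This is a sketch: `resolve_schema_path` is a hypothetical name, and it assumes the script sits in `docs/screenshots/`, so the repository root is two directories above it.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_schema_path(script_file: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Locate internal/store/schema.sql, preferring an explicit MSGVAULT_REPO override."""
    env = os.environ if env is None else env
    override = env.get("MSGVAULT_REPO")
    if override:
        # The override wins so --repo drives both the built binary and the schema.
        return Path(override) / "internal" / "store" / "schema.sql"
    # Integrated fallback: <repo>/docs/screenshots/<script> -> <repo>
    repo_root = Path(script_file).resolve().parent.parent.parent
    return repo_root / "internal" / "store" / "schema.sql"
```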
Bootstrapping generated docs assets exposed drift between the ported screenshot tooling and the current msgvault tree: the Docker build needs Go 1.26 plus SQLite headers, tmux needs to create the session while applying options in this container, and current TUI labels differ from the older capture waits. The generator also now emits the legacy tui-time.svg asset expected by the orphan branch inventory.

Validation: red/green focused Go tests for Dockerfile version, CGO deps, tmux startup, TUI wait labels, subgroup recipient wait, and tui-time capture; go test -tags "fts5 sqlite_vec" ./scripts -count=1; bash -n docs/screenshots/generate-all.sh docs/screenshots/generate-screenshots.sh docs/diagrams/build.sh docs/screenshots/update-generated-assets-branch.sh; bash docs/screenshots/generate-all.sh completed through Docker build and demo-data generation, then bash docs/screenshots/generate-all.sh --skip-data --skip-build generated 21 SVGs; bash docs/diagrams/build.sh rendered five concept PNGs; bash docs/screenshots/update-generated-assets-branch.sh --source docs/assets/generated published the local docs-generated-assets ref; git diff --check passed.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
Full local-asset docs validation exposed two final porting issues: Starlight and Zensical generated different Microsoft 365 heading slugs, and override templates were being copied as public source pages. Fix those and add checks so both problems fail fast in future docs builds.

Validation: MSGVAULT_DOCS_USE_LOCAL_ASSET_BRANCHES=1 bash scripts/check-docs.sh; cd docs && uv run --frozen bash ./zensical-docs.sh build; cd docs && uv run --frozen python scripts/check_built_site.py; cd docs && uv run --frozen python scripts/check_vercel_redirects.py; MSGVAULT_DOCS_USE_LOCAL_ASSET_BRANCHES=1 make docs-check; go test -tags "fts5 sqlite_vec" ./scripts -count=1; python3 docs/scripts/check_markdown_sources.py; git diff --check passed.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
The docs integration exposed a few local polish issues after the first end-to-end pass: Vercel can leave a docs-local .gitignore behind, the custom header duplicated the msgvault wordmark, and generated terminal SVGs depended on Liberation Mono being installed in the reader's browser. Keep the logo source icon-only, make screenshot SVGs fall back to common monospace fonts, and guard those expectations in the docs checks.

Jesse Robbins' post-0.16.0 documentation contributions landed in the source docs just after the release docs update, so credit those vector-search and accounts/deduplication corrections in the 0.16.0 acknowledgements.

Validation: reproduced the docs/.gitignore leak with a red guardrail test; regenerated screenshots through docs/screenshots/generate-all.sh --skip-data --skip-build; published origin/docs-generated-assets at d2601b6; verified rendered header and TUI screenshots with headless Chrome.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
The docs media-reference tests should verify check-docs behavior, not whether a CI runner happens to have ripgrep installed. macOS and PostgreSQL CI runners failed before those assertions because rg was absent from PATH.

Provide a tiny rg fixture for those tests so the harness reaches the intended media-reference checks consistently across local machines and CI images.

Validation: reproduced the CI failure with a stripped PATH containing no rg, then reran that same stripped-PATH targeted test successfully.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
The CI lint target runs the custom testify-helper analyzer after tests. The docs asset test file had several assertion-heavy cases that were fine at runtime but violated that repository convention once the test step no longer failed early.

Use local assert and require helpers in those cases so the docs checks follow the same testify style enforced across the rest of the repo.

Validation: make lint-ci; make test; make docs-check; stripped-PATH media-reference reproduction without rg; git diff --check.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
roborev-ci (bot) commented Jun 21, 2026

roborev: Combined Review (5b7180c)

High: asset hydration can publish build secrets via symlinked assets.

  • High: docs/assets/hydrate-assets.sh:125 / has_expected_assets at docs/assets/hydrate-assets.sh:60
    The Vercel build extracts remote asset branch archives directly into docs/assets, and the post-check only uses -f, which follows symlinks. An actor able to update an asset branch could make an expected file like favicon.svg a symlink to /proc/self/environ; hydration would pass, and the docs build could publish build environment variables into the public site.
    Fix: Treat hydrated branches as untrusted input: extract to a temporary directory, reject symlinks and non-regular files, enforce an allowlist, and only then move assets into docs/assets. Also require [[ -f "$path" && ! -L "$path" ]] or validate git ls-tree modes for regular blobs only.

  • Medium: docs/screenshots/generate_demo_data.py:632
    The generator picks a random sid per message but may reuse the previous conv_id without confirming that conversation belongs to the same source. With multiple accounts, this can create messages whose messages.source_id does not match conversations.source_id, corrupting the demo database used for screenshots and analytics cache generation.
    Fix: Track reusable conversations per source, or only reuse a conversation when its source_id matches the current sid.
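The Medium finding's suggested fix (track reusable conversations per source) can be sketched as follows. This is not the code in generate_demo_data.py: `assign_conversations`, the reuse probability, and the row shape are assumptions made for illustration. Keying the reusable conversation by source id guarantees that a reused `conv_id` always belongs to the message's own source.

```python
import random


def assign_conversations(num_messages: int, source_ids: list[int],
                         reuse_prob: float = 0.6, seed: int = 0):
    """Return (message rows, conversation->source map) with source-consistent threading."""
    rng = random.Random(seed)
    last_conv_by_source: dict[int, int] = {}  # only reuse a conversation from the same source
    conv_source: dict[int, int] = {}
    rows: list[tuple[int, int, int]] = []  # (message_id, source_id, conv_id)
    next_conv = 1
    for msg_id in range(1, num_messages + 1):
        sid = rng.choice(source_ids)
        conv_id = last_conv_by_source.get(sid)
        if conv_id is None or rng.random() >= reuse_prob:
            # Start a new conversation owned by this message's source.
            conv_id = next_conv
            next_conv += 1
            conv_source[conv_id] = sid
            last_conv_by_source[sid] = conv_id
        rows.append((msg_id, sid, conv_id))
    return rows, conv_source
```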


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 7m47s), codex_security (codex/security, done, 6m48s) | Total: 14m45s
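The High finding's suggested remediation (extract to a temporary directory, reject symlinks and non-regular files, and enforce an allowlist before moving anything into docs/assets) can be sketched in Python as follows. `validate_hydrated_tree` is a hypothetical helper and the allowlist is supplied by the caller; the actual hydration script is shell. The key detail is `os.lstat`, which inspects a link itself, whereas a `[[ -f ]]` test, like `os.stat`, follows it.

```python
import os
import stat
from pathlib import Path


def validate_hydrated_tree(root: Path, allowed: set[str]) -> list[str]:
    """Return problems in an extracted asset tree; an empty list means it is safe to move."""
    problems: list[str] = []
    found: set[str] = set()
    # os.walk does not descend into symlinked directories by default; they are
    # still listed in `dirnames`, so the lstat check below catches them.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            if stat.S_ISLNK(mode):
                problems.append(f"symlink: {rel}")
            elif stat.S_ISDIR(mode):
                continue
            elif not stat.S_ISREG(mode):
                problems.append(f"non-regular file: {rel}")
            elif rel not in allowed:
                problems.append(f"unexpected file: {rel}")
            else:
                found.add(rel)
    problems.extend(f"missing: {rel}" for rel in sorted(allowed - found))
    return problems
```

Only when the returned list is empty would the temporary tree be moved into docs/assets.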

Windows CI runs the screenshot wrapper through a Unix shell, so the docker command log contains MSYS-style temp paths even though the Go test built the fixture path with native Windows separators. The assertion should verify the requested repo context without depending on one temp-path spelling.

Validation: inspected failed Windows CI job 82596760481; go test -tags "fts5 sqlite_vec" ./scripts -run TestDocsScreenshotGenerateAllDoesNotRequireDockerIgnorefileFlag -count=1; make lint-ci; go test -tags "fts5 sqlite_vec" ./scripts -count=1; make test; make docs-check; stripped-PATH media-reference reproduction without rg; git diff --check.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
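One way to make such an assertion spelling-agnostic is to normalize both Windows (`C:\Users\...`) and MSYS (`/c/Users/...`) forms before comparing. The sketch is in Python for illustration only (the actual test is Go), and `normalize_path_spelling` and `log_mentions_path` are hypothetical names. It assumes paths in the log contain no spaces and does not handle 8.3 short names such as `RUNNER~1`.

```python
import re


def normalize_path_spelling(path: str) -> str:
    r"""Map C:\x\y and /c/x/y spellings of a Windows path to one form, c:/x/y."""
    p = path.replace("\\", "/")
    msys = re.match(r"^/([A-Za-z])/(.*)$", p)
    if msys:  # MSYS style: /c/Users/... -> c:/Users/... (assumes a Windows context)
        p = f"{msys.group(1)}:/{msys.group(2)}"
    if re.match(r"^[A-Za-z]:/", p):
        p = p[0].lower() + p[1:]  # drive letters are case-insensitive
    return p.rstrip("/")


def log_mentions_path(log: str, path: str) -> bool:
    """True if any whitespace-separated token in `log` names the same path."""
    target = normalize_path_spelling(path)
    return any(normalize_path_spelling(tok) == target for tok in log.split())
```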
roborev-ci (bot) commented Jun 21, 2026

roborev: Combined Review (cb0e4ae)

High severity issue found.

High

  • docs/zensical-docs.sh:61 stages the docs tree with tar but only excludes ./.env*.local. Repo-ignored secret files such as docs/.env or docs/.env.production could be copied into the temporary docs tree and published as static output, exposing credentials to unauthenticated users.

    Suggested remediation: exclude all dotenv and known secret patterns during staging, such as ./.env, ./.env.*, ./client_secret*.json, ./oauth_client*.json, or switch to an explicit allowlist of public docs files. Add a built-site validation check that fails if dotfiles or known secret filenames appear under docs/site.


Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 11m32s), codex_security (codex/security, done, 5m10s) | Total: 16m49s

The docs build is a public artifact boundary, unlike the single-user local app. Repo-ignored dotenv and OAuth client files should not be able to cross into the temporary Zensical tree or survive in docs/site, and review guidance should treat that boundary as real while avoiding generic local-file noise.

Document the testing rule that came out of this fix: avoid tautological shell-copy tests that prove a fixture rather than production behavior. Future exceptions need to prove a stable external contract that the real path cannot cover.

Validation: created ignored docs/.env, docs/.env.production, docs/client_secret.json, and docs/oauth_client.json, ran the real Zensical docs build, and confirmed no matching files appeared under docs/site; then planted docs/site/.env and confirmed check_built_site.py failed with "forbidden public site dotfile".

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
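A built-site guard along those lines might walk docs/site and reject dotfiles plus the secret-like names from the review. This Python sketch is hypothetical: `forbidden_site_files` and the exact pattern list are assumptions, and only the "forbidden public site dotfile" wording comes from the validation note above. `os.walk` is used because it lists hidden files, so a planted `.env` cannot slip past a glob that skips them.

```python
import fnmatch
import os
from pathlib import Path

# Secret-like filenames suggested in the review; the real list may be longer.
SECRET_PATTERNS = ("client_secret*.json", "oauth_client*.json")


def forbidden_site_files(site: Path) -> list[str]:
    """Return problems for dotfiles or secret-like files anywhere under the built site."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(site):
        for name in filenames:
            rel = (Path(dirpath) / name).relative_to(site).as_posix()
            if name.startswith("."):
                problems.append(f"forbidden public site dotfile: {rel}")
            elif any(fnmatch.fnmatch(name, pattern) for pattern in SECRET_PATTERNS):
                problems.append(f"forbidden secret-like file: {rel}")
    return sorted(problems)
```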
wesm merged commit 6e39c04 into main on Jun 21, 2026
12 of 13 checks passed
wesm deleted the integrate-docs branch on June 21, 2026 at 20:46