Skip to content

cache: publish cached closure narinfos#127

Open
amaanq wants to merge 1 commit into
mainfrom
cache-closure-narinfos
Open

cache: publish cached closure narinfos#127
amaanq wants to merge 1 commit into
mainfrom
cache-closure-narinfos

Conversation

@amaanq

@amaanq amaanq commented Jun 24, 2026

Copy link
Copy Markdown
Member

Cached builds that reused an existing output only published narinfos for the primary output, so clients could miss dependencies from the project cache.

Publish signed narinfos for the whole closure, keep project ownership separate for shared store paths, and move recursive lookup and signing out of the scheduler loop.

@amaanq amaanq force-pushed the cache-closure-narinfos branch from 04d9936 to 558242d Compare June 24, 2026 20:07
Cached builds that reused an existing output only published narinfos
for the primary output, so clients could miss dependencies from the
project cache.

Publish signed narinfos for the whole closure, keep project ownership
separate for shared store paths, and move recursive lookup and signing
out of the scheduler loop.

Co-authored-by: Amaan Qureshi <git@amaanq.com>
@amaanq amaanq force-pushed the cache-closure-narinfos branch from 558242d to 9f713ce Compare June 24, 2026 20:39

@NotAShelf NotAShelf left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. Some of them are just a little bit speculative but I'd like to catch things before they explode.

Comment on lines 67 to +99
@@ -82,6 +83,20 @@ pub async fn upsert(pool: &PgPool, info: UpsertNarInfo<'_>) -> Result<()> {
.bind(info.project_id)
.execute(pool)
.await?;

if let Some(project_id) = info.project_id {
sqlx::query(
"INSERT INTO narinfo_cache_projects (store_path, project_id, build_id, \
updated_at) VALUES ($1, $2, $3, NOW()) ON CONFLICT (store_path, \
project_id) DO UPDATE SET build_id = COALESCE(EXCLUDED.build_id, \
narinfo_cache_projects.build_id), updated_at = NOW()",
)
.bind(info.store_path)
.bind(project_id)
.bind(info.build_id)
.execute(pool)
.await?;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the first insert succeeds and the second fails, the narinfo_cache row exists without its project association. I feel like this'd silently break narinfo serving for that path until the next build touches it. If I'm correct, wrapping both in a transaction eliminates this partial-write window.

Comment on lines +266 to +282
async fn has_signed_persisted_narinfo(
pool: &PgPool,
store_path: &str,
project_id: Option<Uuid>,
) -> Result<bool, ApiError> {
sqlx::query_scalar::<_, bool>(
"SELECT EXISTS(SELECT 1 FROM narinfo_cache n WHERE n.store_path = $1 AND \
n.sig IS NOT NULL AND btrim(n.sig) != '' AND ($2::uuid IS NULL OR \
n.project_id = $2 OR EXISTS (SELECT 1 FROM narinfo_cache_projects ncp \
WHERE ncp.store_path = n.store_path AND ncp.project_id = $2)))",
)
.bind(store_path)
.bind(project_id)
.fetch_one(pool)
.await
.map_err(|e| ApiError(circus_common::CiError::Database(e)))
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit superficial. We're checking btrim(n.sig) != '' rather than verifying the signature cryptographically against the store path fingerprint. "Exploiting" this would require DB write access anyway so it's not particularly a concern right now, but I'd like to consider (now or in a followup PR) validating the signature against the narinfo fingerprint using the server's signing key.

Comment on lines +128 to +144
fn narinfo_row_uses_local_nar_route(
row: &circus_common::repo::narinfo_cache::NarInfo,
) -> bool {
if row.compression != "none" {
return false;
}
let Some(path) = row.url.strip_prefix("nar/") else {
return false;
};
let Some((object, query)) = path.split_once('?') else {
return false;
};
path::Path::new(object)
.extension()
.is_some_and(|ext| ext.eq_ignore_ascii_case("nar"))
&& query.split('&').any(|part| part.starts_with("hash="))
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This classifies NAR routes as local purely by URL format (compression == "none", starts with nar/, has ?hash=, ends in .nar). It doesn't validate hash segments are valid nix store hashes, and seems incredibly fragile to me.

Comment on lines +124 to +142
fn spawn_succeeded_output_closure_publication(
pool: &PgPool,
worker_pool: &Arc<WorkerPool>,
build_id: Uuid,
output_path: &str,
) {
let pool = pool.clone();
let worker_pool = Arc::clone(worker_pool);
let output_path = output_path.to_owned();
tokio::spawn(async move {
publish_succeeded_output_closure(
&pool,
&worker_pool,
build_id,
&output_path,
)
.await;
});
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panics or runner shutdown silently drop the work because closure publication is spawned as tokio::spawn. The main build dispatch in worker.rs, however, calls persist_closure_narinfos inline (with .await) and thus, we've got a discrepancy. I think it's worth reconciling so maybe consider a JoinSet or tracking completion so panics don't silently drop publication.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants