cache: publish cached closure narinfos#127
Conversation
04d9936 to
558242d
Compare
Cached builds that reused an existing output only published narinfos for the primary output, so clients could miss dependencies from the project cache. Publish signed narinfos for the whole closure, keep project ownership separate for shared store paths, and move recursive lookup and signing out of the scheduler loop. Co-authored-by: Amaan Qureshi <git@amaanq.com>
558242d to
9f713ce
Compare
NotAShelf
left a comment
There was a problem hiding this comment.
Left some comments. Some of them are just a little bit speculative but I'd like to catch things before they explode.
| @@ -82,6 +83,20 @@ pub async fn upsert(pool: &PgPool, info: UpsertNarInfo<'_>) -> Result<()> { | |||
| .bind(info.project_id) | |||
| .execute(pool) | |||
| .await?; | |||
|
|
|||
| if let Some(project_id) = info.project_id { | |||
| sqlx::query( | |||
| "INSERT INTO narinfo_cache_projects (store_path, project_id, build_id, \ | |||
| updated_at) VALUES ($1, $2, $3, NOW()) ON CONFLICT (store_path, \ | |||
| project_id) DO UPDATE SET build_id = COALESCE(EXCLUDED.build_id, \ | |||
| narinfo_cache_projects.build_id), updated_at = NOW()", | |||
| ) | |||
| .bind(info.store_path) | |||
| .bind(project_id) | |||
| .bind(info.build_id) | |||
| .execute(pool) | |||
| .await?; | |||
| } | |||
There was a problem hiding this comment.
If the first insert succeeds and the second fails, the narinfo_cache row exists without its project association. I feel like this'd silently break narinfo serving for that path until the next build touches it. If I'm correct, wrapping both in a transaction eliminates this partial-write window.
| async fn has_signed_persisted_narinfo( | ||
| pool: &PgPool, | ||
| store_path: &str, | ||
| project_id: Option<Uuid>, | ||
| ) -> Result<bool, ApiError> { | ||
| sqlx::query_scalar::<_, bool>( | ||
| "SELECT EXISTS(SELECT 1 FROM narinfo_cache n WHERE n.store_path = $1 AND \ | ||
| n.sig IS NOT NULL AND btrim(n.sig) != '' AND ($2::uuid IS NULL OR \ | ||
| n.project_id = $2 OR EXISTS (SELECT 1 FROM narinfo_cache_projects ncp \ | ||
| WHERE ncp.store_path = n.store_path AND ncp.project_id = $2)))", | ||
| ) | ||
| .bind(store_path) | ||
| .bind(project_id) | ||
| .fetch_one(pool) | ||
| .await | ||
| .map_err(|e| ApiError(circus_common::CiError::Database(e))) | ||
| } |
There was a problem hiding this comment.
This feels a bit superficial. We're checking btrim(n.sig) != '' rather than verifying the signature cryptographically against the store path fingerprint. "Exploiting" this would require DB write access anyway so it's not particularly a concern right now, but I'd like to consider (now or in a followup PR) validating the signature against the narinfo fingerprint using the server's signing key.
| fn narinfo_row_uses_local_nar_route( | ||
| row: &circus_common::repo::narinfo_cache::NarInfo, | ||
| ) -> bool { | ||
| if row.compression != "none" { | ||
| return false; | ||
| } | ||
| let Some(path) = row.url.strip_prefix("nar/") else { | ||
| return false; | ||
| }; | ||
| let Some((object, query)) = path.split_once('?') else { | ||
| return false; | ||
| }; | ||
| path::Path::new(object) | ||
| .extension() | ||
| .is_some_and(|ext| ext.eq_ignore_ascii_case("nar")) | ||
| && query.split('&').any(|part| part.starts_with("hash=")) | ||
| } |
There was a problem hiding this comment.
This classifies NAR routes as local purely by URL format (compression == "none", starts with nar/, has ?hash=, ends in .nar). It doesn't validate hash segments are valid nix store hashes, and seems incredibly fragile to me.
| fn spawn_succeeded_output_closure_publication( | ||
| pool: &PgPool, | ||
| worker_pool: &Arc<WorkerPool>, | ||
| build_id: Uuid, | ||
| output_path: &str, | ||
| ) { | ||
| let pool = pool.clone(); | ||
| let worker_pool = Arc::clone(worker_pool); | ||
| let output_path = output_path.to_owned(); | ||
| tokio::spawn(async move { | ||
| publish_succeeded_output_closure( | ||
| &pool, | ||
| &worker_pool, | ||
| build_id, | ||
| &output_path, | ||
| ) | ||
| .await; | ||
| }); | ||
| } |
There was a problem hiding this comment.
Panics or runner shutdown silently drop the work because closure publication is spawned as tokio::spawn. The main build dispatch in worker.rs, however, calls persist_closure_narinfos inline (with .await) and thus, we've got a discrepancy. I think it's worth reconciling so maybe consider a JoinSet or tracking completion so panics don't silently drop publication.
Cached builds that reused an existing output only published narinfos for the primary output, so clients could miss dependencies from the project cache.
Publish signed narinfos for the whole closure, keep project ownership separate for shared store paths, and move recursive lookup and signing out of the scheduler loop.