fix(FromTarGz): use Extraction field, compare basename#7
Merged
myleshorton merged 2 commits intomainfrom Apr 22, 2026
Merged
Conversation
Two back-to-back bugs in the mholt/archives port (commit 017d542): 1. archives.CompressedArchive.Extract() reads ca.Extraction (not ca.Archival) and returns "no extraction format" if it's nil. We were setting Archival, which made every Fetch() fail silently — every error was swallowed by the retry loop up to the 30-try ceiling. 2. info.NameInArchive is the full stored path (e.g. "GeoLite2-City_20260116/GeoLite2-City.mmdb"). Callers pass a basename, matching the archiver/v3 behavior this replaced. Compare on path.Base so real tarballs with dated directory prefixes (as MaxMind ships) actually match. Observed downstream: lantern-box's MaxMind fetch never completed on any VPS, so every proxy.io data point was tagged geo.country.iso_code="" and the dashboard's country filter matched zero everywhere.
There was a problem hiding this comment.
Pull request overview
Fixes regressions in the FromTarGz source implementation introduced during the mholt/archiver → mholt/archives port, ensuring tar.gz extraction works and expected filenames match archived paths.
Changes:
- Use
archives.CompressedArchive.Extraction(instead ofArchival) soExtract()has a non-nil extraction format. - Compare
path.Base(info.NameInArchive)to the expected filename to match callers that pass basenames.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Regression tests for the two bugs fixed in the prior commit. Verified
that both fail without the fix:
- TestFromTarGzBasenameMatch: "no extraction format" (bug 1) and,
after that's fixed, falls through to the full-path compare (bug 2).
- TestFromTarGzMissingFile: surfaces "not found" rather than the
silent "no extraction format" that hid the original regression.
Uses an in-memory tar.gz with a MaxMind-style dated directory prefix
so no fixture files are needed.
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two back-to-back bugs in the `mholt/archiver` → `mholt/archives` port (commit `017d542`), discovered while debugging why the lantern-box country tag was empty across the whole bandit fleet.
Bug 1 — "no extraction format"
`archives.CompressedArchive.Extract()` reads the `Extraction` field, not `Archival`:
```go
// archives/formats.go:285
if ca.Extraction == nil {
return fmt.Errorf("no extraction format")
}
```
keepcurrent was setting `Archival` instead, so every `Fetch()` returned that error. Because `keepcurrent.syncOnce` feeds errors into `OnSourceError` (which is typically `ExpBackoffThenFail(time.Minute, 30, ...)`), the failure was silently retried for up to 30 tries with exponential backoff before any log line was emitted — the caller could wait >17 years for the final error.
Bug 2 — basename vs full path
`info.NameInArchive` is the full stored path in the archive (e.g. `GeoLite2-City_20260116/GeoLite2-City.mmdb` as MaxMind ships it). Callers pass the basename (`GeoLite2-City.mmdb`), matching the `archiver/v3` behavior this replaced. Compare with `path.Base()` so these match.
How this was found
lantern-box calls `geo.FromWeb("https://lanterngeo.lantern.io/GeoLite2-City.mmdb.tar.gz\", "GeoLite2-City.mmdb", ...)`. On every bandit VPS, the download was silently failing. Downstream effect: every `proxy.io` metric was tagged with `geo.country.iso_code=""` (the default after `*lookup.CountryCode` returns empty when the db never loads), and the Bandwidth dashboard's country filter matched zero everywhere. Confirmed by running a minimal probe binary against a live VPS — reproduced both failures, then verified the fix extracts 63 MB cleanly in <2 s.
Test plan