Skip to content

fix(FromTarGz): use Extraction field, compare basename#7

Merged
myleshorton merged 2 commits intomainfrom
fisk/fix-targz-extract
Apr 22, 2026
Merged

fix(FromTarGz): use Extraction field, compare basename#7
myleshorton merged 2 commits intomainfrom
fisk/fix-targz-extract

Conversation

@myleshorton
Copy link
Copy Markdown
Contributor

Two back-to-back bugs in the `mholt/archiver` → `mholt/archives` port (commit `017d542`), discovered while debugging why the lantern-box country tag was empty across the whole bandit fleet.

Bug 1 — "no extraction format"

`archives.CompressedArchive.Extract()` reads the `Extraction` field, not `Archival`:

```go
// archives/formats.go:285
if ca.Extraction == nil {
return fmt.Errorf("no extraction format")
}
```

keepcurrent was setting `Archival` instead, so every `Fetch()` returned that error. Because `keepcurrent.syncOnce` feeds errors into `OnSourceError` (which is typically `ExpBackoffThenFail(time.Minute, 30, ...)`), the failure was silently retried for up to 30 tries with exponential backoff before any log line was emitted — the caller could wait >17 years for the final error.

Bug 2 — basename vs full path

`info.NameInArchive` is the full stored path in the archive (e.g. `GeoLite2-City_20260116/GeoLite2-City.mmdb` as MaxMind ships it). Callers pass the basename (`GeoLite2-City.mmdb`), matching the `archiver/v3` behavior this replaced. Compare with `path.Base()` so these match.

How this was found

lantern-box calls `geo.FromWeb("https://lanterngeo.lantern.io/GeoLite2-City.mmdb.tar.gz\", "GeoLite2-City.mmdb", ...)`. On every bandit VPS, the download was silently failing. Downstream effect: every `proxy.io` metric was tagged with `geo.country.iso_code=""` (the default after `*lookup.CountryCode` returns empty when the db never loads), and the Bandwidth dashboard's country filter matched zero everywhere. Confirmed by running a minimal probe binary against a live VPS — reproduced both failures, then verified the fix extracts 63 MB cleanly in <2 s.

Test plan

  • `go build ./...` clean
  • `go test ./...` passes (existing tests still OK)
  • Verified live against the real MaxMind tarball on a prod bandit VPS: extracts `GeoLite2-City.mmdb` (63417044 bytes) in 1.6 s
  • Once merged, bump in lantern-box and rebuild packer image

Two back-to-back bugs in the mholt/archives port (commit 017d542):

1. archives.CompressedArchive.Extract() reads ca.Extraction (not
   ca.Archival) and returns "no extraction format" if it's nil. We were
   setting Archival, which made every Fetch() fail silently — every
   error was swallowed by the retry loop up to the 30-try ceiling.

2. info.NameInArchive is the full stored path (e.g.
   "GeoLite2-City_20260116/GeoLite2-City.mmdb"). Callers pass a
   basename, matching the archiver/v3 behavior this replaced. Compare
   on path.Base so real tarballs with dated directory prefixes (as
   MaxMind ships) actually match.

Observed downstream: lantern-box's MaxMind fetch never completed on
any VPS, so every proxy.io data point was tagged
geo.country.iso_code="" and the dashboard's country filter matched
zero everywhere.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes regressions in the FromTarGz source implementation introduced during the mholt/archivermholt/archives port, ensuring tar.gz extraction works and expected filenames match archived paths.

Changes:

  • Use archives.CompressedArchive.Extraction (instead of Archival) so Extract() has a non-nil extraction format.
  • Compare path.Base(info.NameInArchive) to the expected filename to match callers that pass basenames.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread source.go
Regression tests for the two bugs fixed in the prior commit. Verified
that both fail without the fix:
  - TestFromTarGzBasenameMatch: "no extraction format" (bug 1) and,
    after that's fixed, falls through to the full-path compare (bug 2).
  - TestFromTarGzMissingFile: surfaces "not found" rather than the
    silent "no extraction format" that hid the original regression.

Uses an in-memory tar.gz with a MaxMind-style dated directory prefix
so no fixture files are needed.
@myleshorton myleshorton merged commit 54a4d9a into main Apr 22, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants