
feat: log environment info and IO throughput summary for debugging #490

Open
aftersnow wants to merge 5 commits into main from show-env

Conversation

Contributor

aftersnow commented Apr 7, 2026

Summary

Add diagnostic logging to help identify performance bottlenecks in push/pull/fetch operations:

  1. Log system environment info (CPU, memory, disk, cgroup) at startup
  2. Show IO throughput summary with bottleneck indicator after each transfer

Example Output

Push (100MB to registry, tested on Linux x86_64):

IO summary: 100.00 MB in 34.2s, 2.93 MB/s | disk read: 1723.95 MB/s | network write: 2.93 MB/s ← bottleneck

Pull (50MB from registry):

IO summary: 50.00 MB in 1.2s, 40.21 MB/s | network read: 360.24 MB/s | disk write: 36.60 MB/s ← bottleneck

Log file (structured, for automated analysis):

level=info msg="io throughput summary" bottleneck="network write" disk\ read="1723.95 MB/s" effectiveThroughput="2.93 MB/s" network\ write="2.93 MB/s" operation=push totalBytes="100.00 MB" wallClock=34.163s

How It Works

Wraps the source io.Reader with a countingReader using sync/atomic counters. Each Read() call records bytes and duration. TrackTransfer() measures per-goroutine wall-clock time. After all goroutines complete:

  • Source throughput = totalBytes / cumulative source Read() time
  • Sink throughput = totalBytes / (cumulative transfer time − source read time)
  • The slower side is marked as ← bottleneck

Labels are operation-aware: push shows disk read / network write, pull/fetch shows network read / disk write.

Changes

New packages:

  • pkg/envinfo/ — CPU, memory, disk usage, cgroup v1/v2 limits (Linux), virtual FS detection
  • pkg/iometrics/ — Tracker, countingReader, throughput formatting

Modified files:

  • cmd/root.go — call envinfo.LogEnvironment() at startup
  • cmd/build.go, cmd/pull.go — log disk info for work directories
  • pkg/backend/push.go, pull.go, fetch.go — integrate Tracker

Test Results

Unit tests (macOS, 17 tests):

ok  github.com/modelpack/modctl/pkg/iometrics  1.394s  (12 tests)
ok  github.com/modelpack/modctl/pkg/envinfo     1.660s  (5 tests)

Covers: concurrent aggregation, timing accuracy, bottleneck detection, error propagation, zero-value safety, data integrity, virtual FS detection.

Remote machine test (Linux x86_64, Intel Xeon E5-2640 v3):

# Push 100MB random data → registry
IO summary: 100.00 MB in 34.2s, 2.93 MB/s | disk read: 1723.95 MB/s | network write: 2.93 MB/s ← bottleneck

# Pull 50MB from registry → local disk
IO summary: 50.00 MB in 1.2s, 40.21 MB/s | network read: 360.24 MB/s | disk write: 36.60 MB/s ← bottleneck

Both correctly identified the bottleneck side.


gemini-code-assist bot left a comment


Code Review

This pull request introduces a new envinfo package that logs system environment details (CPU, memory, disk usage, and cgroup limits) and an iometrics package that tracks and summarizes IO throughput during push, pull, and fetch operations. The review identified several areas for improvement:

  • tryV2 and tryV1 in cgroup_linux.go should be refactored so memory limits are detected even when CPU limits are inaccessible
  • cpu.Percent usage in envinfo.go may return unhelpful data at startup
  • the isVirtualFS function appears to be dead code
  • the formatBytes utility is duplicated and should be consolidated or replaced with the existing go-humanize dependency

Comment on lines +52 to +81
cpuMax, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
if err != nil {
	return false
}

parts := strings.Fields(strings.TrimSpace(string(cpuMax)))
if len(parts) == 2 && parts[0] != "max" {
	quota, err1 := strconv.ParseFloat(parts[0], 64)
	period, err2 := strconv.ParseFloat(parts[1], 64)
	if err1 == nil && err2 == nil && period > 0 {
		limits.CPUQuota = quota / period
		limits.InCgroup = true
	}
}

memMax, err := os.ReadFile("/sys/fs/cgroup/memory.max")
if err != nil {
	return limits.InCgroup
}

memStr := strings.TrimSpace(string(memMax))
if memStr != "max" {
	memLimit, err := strconv.ParseUint(memStr, 10, 64)
	if err == nil {
		limits.MemLimit = memLimit
		limits.InCgroup = true
	}
}

return limits.InCgroup


medium

The tryV2 function returns early if it fails to read the CPU quota file. This prevents the function from checking for memory limits, which might still be present even if CPU limits are not configured or accessible. It's better to attempt reading both independently to ensure comprehensive limit detection.

if cpuMax, err := os.ReadFile("/sys/fs/cgroup/cpu.max"); err == nil {
	parts := strings.Fields(strings.TrimSpace(string(cpuMax)))
	if len(parts) == 2 && parts[0] != "max" {
		quota, err1 := strconv.ParseFloat(parts[0], 64)
		period, err2 := strconv.ParseFloat(parts[1], 64)
		if err1 == nil && err2 == nil && period > 0 {
			limits.CPUQuota = quota / period
			limits.InCgroup = true
		}
	}
}

if memMax, err := os.ReadFile("/sys/fs/cgroup/memory.max"); err == nil {
	memStr := strings.TrimSpace(string(memMax))
	if memStr != "max" {
		if memLimit, err := strconv.ParseUint(memStr, 10, 64); err == nil {
			limits.MemLimit = memLimit
			limits.InCgroup = true
		}
	}
}

Comment on lines +86 to +125
quotaBytes, err := os.ReadFile("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
if err != nil {
	return
}

quota, err := strconv.ParseFloat(strings.TrimSpace(string(quotaBytes)), 64)
if err != nil || quota <= 0 {
	// -1 means no limit.
	return
}

periodBytes, err := os.ReadFile("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
if err != nil {
	return
}

period, err := strconv.ParseFloat(strings.TrimSpace(string(periodBytes)), 64)
if err != nil || period <= 0 {
	return
}

limits.CPUQuota = quota / period
limits.InCgroup = true

// cgroup v1: Memory limit.
memBytes, err := os.ReadFile("/sys/fs/cgroup/memory/memory.limit_in_bytes")
if err != nil {
	return
}

memLimit, err := strconv.ParseUint(strings.TrimSpace(string(memBytes)), 10, 64)
if err != nil {
	return
}

// Very large values (like 2^63) indicate no limit.
const noLimitThreshold = 1 << 62
if memLimit < noLimitThreshold {
	limits.MemLimit = memLimit
}


medium

In tryV1, the function returns early if any CPU-related file is missing or if the quota is not set, which skips the memory limit detection. Additionally, limits.InCgroup should be set to true if a memory limit is successfully detected, even if no CPU limit is found.

if quotaBytes, err := os.ReadFile("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"); err == nil {
	if quota, err := strconv.ParseFloat(strings.TrimSpace(string(quotaBytes)), 64); err == nil && quota > 0 {
		if periodBytes, err := os.ReadFile("/sys/fs/cgroup/cpu/cpu.cfs_period_us"); err == nil {
			if period, err := strconv.ParseFloat(strings.TrimSpace(string(periodBytes)), 64); err == nil && period > 0 {
				limits.CPUQuota = quota / period
				limits.InCgroup = true
			}
		}
	}
}

// cgroup v1: Memory limit.
if memBytes, err := os.ReadFile("/sys/fs/cgroup/memory/memory.limit_in_bytes"); err == nil {
	if memLimit, err := strconv.ParseUint(strings.TrimSpace(string(memBytes)), 10, 64); err == nil {
		// Very large values (like 2^63) indicate no limit.
		const noLimitThreshold = 1 << 62
		if memLimit < noLimitThreshold {
			limits.MemLimit = memLimit
			limits.InCgroup = true
		}
	}
}

	fields["model"] = infos[0].ModelName
}

percents, err := cpu.Percent(0, false)


medium

cpu.Percent(0, false) returns the CPU usage since the last call. When called for the first time at process startup with a 0 interval, it typically returns 0.0% or an error, which doesn't provide useful information about the system's current load. Consider using a small interval (e.g., 100ms) to get a meaningful snapshot, or omit this field if startup delay is a concern.

Comment on lines +144 to +168
func isVirtualFS(fstype, device string) bool {
	fstypeLower := strings.ToLower(fstype)

	virtualFSTypes := []string{
		"fuse", "nfs", "cifs", "smb", "smbfs",
		"tmpfs", "ramfs", "devtmpfs",
		"overlay", "aufs",
		"sshfs", "s3fs", "gcsfuse", "ossfs",
		"9p", "virtiofs",
	}

	for _, vfs := range virtualFSTypes {
		if fstypeLower == vfs || strings.HasPrefix(fstypeLower, vfs+".") {
			return true
		}
	}

	// Devices not under /dev/ are generally not block devices.
	// e.g., "s3fs", "sshfs#user@host:", "server:/export"
	if device != "" && !strings.HasPrefix(device, "/dev/") {
		return true
	}

	return false
}


medium

The isVirtualFS function is defined and tested but appears to be unused in the current implementation. If it was intended to filter disk statistics in LogDiskInfo or elsewhere, please integrate it; otherwise, it should be removed to avoid dead code.

Comment on lines +170 to +190
func formatBytes(b uint64) string {
	const (
		KB = 1024
		MB = KB * 1024
		GB = MB * 1024
		TB = GB * 1024
	)

	switch {
	case b >= TB:
		return fmt.Sprintf("%.2f TB", float64(b)/float64(TB))
	case b >= GB:
		return fmt.Sprintf("%.2f GB", float64(b)/float64(GB))
	case b >= MB:
		return fmt.Sprintf("%.2f MB", float64(b)/float64(MB))
	case b >= KB:
		return fmt.Sprintf("%.2f KB", float64(b)/float64(KB))
	default:
		return fmt.Sprintf("%d B", b)
	}
}


medium

The formatBytes function is implemented twice (here and in pkg/iometrics/format.go). Furthermore, the project already depends on github.com/dustin/go-humanize, which provides robust and well-tested functions for formatting byte sizes (e.g., humanize.Bytes). Consider consolidating this logic or using the existing dependency.

aftersnow changed the title from "feat: log environment info and IO throughput summary" to "feat: log environment info and IO throughput summary for debugging" on Apr 7, 2026
Add envinfo package that logs build version, runtime, CPU, memory, and
disk usage information at startup. On Linux, cgroup CPU/memory limits
are also logged to help diagnose container resource constraints.

Signed-off-by: Zhao Chen <zhaochen.zju@gmail.com>
Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
Add pkg/iometrics package with a Tracker that wraps source io.Reader
using atomic counters to measure bytes and cumulative Read() call
duration. TrackTransfer() measures per-goroutine wall-clock time.

After each operation completes, a summary is output to both terminal
(stderr) and log file showing effective throughput, source read
throughput, and readFraction (sourceReadTime / transferTime) to help
identify whether the bottleneck is in source reading or sink writing.

Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
Replace generic "source/effective" labels with operation-aware labels
(push: disk read / network write, pull: network read / disk write).
Derive sink throughput from transferTime - sourceReadTime. Mark the
slower side as bottleneck. Add overall effective throughput to output.

Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
- Fix cgroup v1/v2: detect CPU and memory limits independently so
  one missing limit doesn't prevent detecting the other
- Remove cpu.Percent(0) from startup log — unreliable at process start
- Remove unused isVirtualFS function (dead code after IO removal)
- Replace duplicated formatBytes with humanize.IBytes from go-humanize

Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
- Read /proc/self/cgroup to find the process's actual cgroup path
  before reading limit files, so containers in k8s/Docker sub-cgroups
  get the correct CPU/memory limits instead of root cgroup values
- Wrap manifest bytes.NewReader with tracker.WrapReader so push
  summaries include manifest bytes in totalBytes

Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
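The /proc/self/cgroup lookup described in the last commit could be sketched roughly as follows. This is a hedged illustration, not the PR's actual code: the function name ownCgroupPath is hypothetical, and the sketch parses file contents passed in as a string rather than reading /proc directly, so it is portable:

```go
package main

import (
	"fmt"
	"strings"
)

// ownCgroupPath extracts the process's cgroup path from the contents
// of /proc/self/cgroup. A cgroup v2 entry looks like "0::/kubepods/..."
// (hierarchy ID 0, empty controller list); v1 entries look like
// "4:memory:/docker/<id>". controller selects the v1 subsystem of
// interest and is irrelevant for v2.
func ownCgroupPath(procSelfCgroup, controller string) string {
	for _, line := range strings.Split(procSelfCgroup, "\n") {
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			continue
		}
		// v2 unified hierarchy.
		if parts[0] == "0" && parts[1] == "" {
			return parts[2]
		}
		// v1: the controller field may list several subsystems,
		// e.g. "cpu,cpuacct".
		for _, c := range strings.Split(parts[1], ",") {
			if c == controller {
				return parts[2]
			}
		}
	}
	return ""
}

func main() {
	v2 := "0::/kubepods/burstable/pod123/container456"
	fmt.Println(ownCgroupPath(v2, "")) // /kubepods/burstable/pod123/container456

	v1 := "4:memory:/docker/abc\n3:cpu,cpuacct:/docker/abc"
	fmt.Println(ownCgroupPath(v1, "cpu")) // /docker/abc
}
```

The returned path would then be joined onto /sys/fs/cgroup/... before reading cpu.max or memory.max, so a process in a Kubernetes or Docker sub-cgroup reads its own limits instead of the root cgroup's.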
