
Perf: speed up task --list on projects with many includes#2819

Open
romnn wants to merge 4 commits into go-task:main from romnn:perf/speed-up-listing-tasks

Conversation

@romnn
Contributor

@romnn romnn commented Apr 30, 2026

I work in an org with a fairly large monorepo and we use task heavily as our cross-language entry point. The taskfile setup looks like this:

  • 379 taskfiles checked in
  • 199 of them transitively loaded from the root (25 top-level includes, recursive from there)
  • 2,662 tasks visible to task --list (3,547 total before filtering internals)
  • 328 merged env: entries and 13 merged vars: at taskfile scope

task --list here takes about 13.7 seconds.

Four commits, ordered with the biggest gain first. Each is scoped to a single change and should be readable on its own; the gains accumulate, but the commits are largely independent and can be cherry-picked individually if any one of them seems too risky:

  1. Skip the template engine for strings without {{ (13.7s → 5.0s).
    Most env values are static literals; running them through template.Parse + Execute is pure overhead.
  2. Share one Cache across the whole getVariables rangeFunc loop (5.0s → 3.1s).
    The previous code allocated a fresh templater.Cache per variable, so cacheMap got rebuilt from scratch for every entry.
  3. Share cacheMap directly when the template can't mutate it; pool clones for the cases that can (3.1s → 0.44s).
    The shallow maps.Clone was there to defend against sprig's set/unset writing back into cacheMap. A cheap substring check on the template tells us when that's possible — for everything else (and for nil values, which can't run a template at all) we just hand tpl.Execute the cacheMap directly. When we do need to clone, the clone goes through a sync.Pool.
  4. Cache parsed templates by source string (0.44s → 0.25s).
    template.New("").Funcs(templateFuncs).Parse(text) copies the ~170-entry sprig FuncMap into a fresh template every time. With many tasks processing the same merged env/vars, the same source strings get parsed over and over. Parsed templates are documented as safe for concurrent Execute, so memoizing them in a sync.Map is straightforward.
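Commits 1 and 4 can be sketched together in a few lines of Go. This is a minimal illustration, not the PR's actual code; the function and variable names (`render`, `parseCached`, `templateCache`) are hypothetical, and the real change lives in go-task's templater package:

```go
package main

import (
	"strings"
	"sync"
	"text/template"
)

// templateCache memoizes parsed templates keyed by their source string.
// Parsed templates are documented as safe for concurrent Execute, so a
// sync.Map works without extra locking. (Hypothetical name.)
var templateCache sync.Map

// parseCached returns a previously parsed template for text, or parses
// it once and stores it. This avoids re-copying the large FuncMap into
// a fresh template for every occurrence of the same source string.
func parseCached(text string, funcs template.FuncMap) (*template.Template, error) {
	if t, ok := templateCache.Load(text); ok {
		return t.(*template.Template), nil
	}
	t, err := template.New("").Funcs(funcs).Parse(text)
	if err != nil {
		return nil, err
	}
	templateCache.Store(text, t)
	return t, nil
}

// render skips the template engine entirely for static literals: a
// string without "{{" can never contain a template action, so running
// it through Parse + Execute is pure overhead (commit 1's fast path).
func render(text string, funcs template.FuncMap, data any) (string, error) {
	if !strings.Contains(text, "{{") {
		return text, nil
	}
	t, err := parseCached(text, funcs)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}
```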

End-to-end: 13.7s → 0.25s, ~55× faster on the repo above, measured on an 11-core M3 Pro MacBook with 18GB RAM.

Correctness

This is best effort. I've validated:

  • All upstream tests pass on Go 1.25.x and 1.26.x.
  • task --list output is byte-identical to v3.50.0 on our repo.
  • Templates that mutate . via sprig's set/unset produce the same output as the system task binary in targeted reproductions.

The risk areas, in case I missed something:

  • The COW decision in commit 3 uses a substring check ("set" anywhere in the template) to decide whether to clone. False positives are fine — they just clone when not strictly necessary. False negatives would silently leak template-side mutations back into cacheMap. I went through the templater's func map and the only top-level dict mutators I found were sprig's set and unset (unset is also caught by the substring scan for "set"). Push/append/prepend mutate slice values, which the original shallow clone never protected against either. If there are mutators I missed — especially any that get added later — this could regress.
  • The shared Cache in commit 2 is per-getVariables-call, not across goroutines. Each task compilation still gets its own cache. But it does mean that if a template mutates cacheMap (i.e. clone path didn't run because the substring check missed), pollution would persist across subsequent variables in the same compilation. The combination of the always-present clone in commit 1 and the COW logic in commit 3 should prevent this in practice — but it's worth flagging.
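The clone decision from commit 3 can be sketched as follows. This is an illustrative reconstruction under the assumption stated above (sprig's set/unset are the only top-level dict mutators), with hypothetical names (`dataFor`, `clonePool`, `release`):

```go
package main

import (
	"maps"
	"strings"
	"sync"
)

// clonePool recycles the map clones that are only needed when a
// template might write back into the shared cacheMap. (Sketch; names
// are illustrative, not the PR's actual identifiers.)
var clonePool = sync.Pool{
	New: func() any { return map[string]any{} },
}

// dataFor returns the map to hand to tpl.Execute. Templates that
// cannot mutate the dot share cacheMap directly; only templates
// containing "set" get a pooled clone. The substring check
// over-approximates: "unset" also contains "set", and false positives
// merely cost an unnecessary clone.
func dataFor(tmpl string, cacheMap map[string]any) (data map[string]any, pooled bool) {
	if !strings.Contains(tmpl, "set") {
		return cacheMap, false
	}
	clone := clonePool.Get().(map[string]any)
	clear(clone)
	maps.Copy(clone, cacheMap)
	return clone, true
}

// release returns a pooled clone after Execute has finished with it.
func release(m map[string]any) { clonePool.Put(m) }
```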

So: tests pass, behavior matches on our workspace, but I can't 100% guarantee no edge case in some other repo's taskfiles.

On the four-commit structure

The contributing guide prefers a single commit. I deliberately split this into four because each one is a self-contained change with its own measurable win and its own correctness argument. If any single commit turns out to be unsafe in some repo I haven't thought of, you can revert just that one without losing the others. Happy to squash before merge if you'd rather have a single commit.

On the doc comments

The comments I added on the new functions are wordy — I left "why" in there explicitly because I wanted the trade-offs visible. Happy to trim, reword, or drop them entirely.

Also open to feedback on the substring approach in commit 3. A more rigorous version would parse the template AST and look for FuncCall nodes matching set/unset. That felt like over-engineering for a static set of two function names, but if you'd prefer that approach I can add it.

AI disclosure

Per the contributing guide: I used Claude as a profiling/research tool while digging into where the time was actually going and validating correctness against the upstream test suite. The code went through several rounds of review and adjustment by me before landing here.

@trulede
Contributor

trulede commented Apr 30, 2026

@romnn The first commit as a separate PR would be useful. Please, without AI-generated comments.
Also please indicate the processor/ram/os as basis for benchmarks.

It's reasonable to assume that if you make your build infrastructure more modular (Task calls Task), things would naturally be faster. I also think that simply caching content for task --list would actually be much, much faster. Here I mean: run Task to generate the list, then put a script frontend over it (adapted from one of the existing methods), and you would probably get much better performance.

@romnn
Contributor Author

romnn commented Apr 30, 2026

@trulede thanks for the quick feedback!

The numbers are from an 11-core M3 Pro MacBook with 18GB of RAM, so it's a fast multi-core machine that I feel should be able to list tasks in under a second.

Caching the output of task --list would require fingerprinting all taskfiles in our repo with additional infrastructure to maintain, so I don't think it would be worth it. I agree that "Task calls Task" is likely much faster, but it loses the ability to get the full list of all tasks that are available and do a quick task --list | grep ... right?

I agree that the first commit is the most obvious, but it's still 5 seconds due to the heavy allocation churn. I will open a separate PR #2820 with just the first optimization regardless.

Best, Roman

