
Add --parallel-io CLI flag for concurrent loop execution #129

Open
vlap wants to merge 3 commits into uwefladrich:master from vlap:perf/parallel-loops

vlap commented Jun 15, 2026

Summary

  • Adds a --parallel-io flag to the se CLI that runs loop iterations concurrently using a ThreadPoolExecutor
  • The same YAML scripts work unchanged: parallelism is opt-in at invocation time (se --parallel-io script.yml)
  • Adds a fast path in Jinja rendering that skips template parsing for strings containing no { character
  • Logs a warning when tasks return context updates inside parallel loops (such updates are discarded)
  • Includes a benchmark script and two unit tests
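A minimal sketch of how an opt-in boolean switch like --parallel-io can be declared with `argparse`, so the same script runs sequentially by default and concurrently only when asked. The `build_parser` helper and the `scripts` argument are hypothetical illustrations, not the actual `se` CLI code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical se-style parser; not ScriptEngine's real CLI definition.
    parser = argparse.ArgumentParser(prog="se")
    parser.add_argument("scripts", nargs="+", help="YAML script files to run")
    parser.add_argument(
        "--parallel-io",
        action="store_true",  # False unless given: parallelism is strictly opt-in
        help="run loop iterations concurrently in a thread pool",
    )
    return parser
```

Because argparse derives the attribute name from the long option, the flag arrives as `args.parallel_io`; omitting it leaves the default sequential behaviour, and the YAML files themselves never mention it.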

Motivation

EC-Earth4 setup scripts contain many I/O-bound loops (copying inidata, weights, and restart files). On HPC parallel filesystems such as Lustre, these loops are bottlenecked by per-file latency rather than bandwidth. Threading lets multiple file operations overlap, so their latencies are no longer paid one after another.
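The latency-versus-bandwidth argument can be illustrated with a small simulation in which each "file operation" is dominated by a fixed wait. The wait is modelled with `time.sleep`, which, like blocking I/O system calls, releases the GIL. The latency value and worker count are arbitrary assumptions, not measured Lustre figures.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.05  # assumed per-file latency, not a measured value


def fake_file_op(name: str) -> str:
    # Stand-in for a latency-bound call such as copying a small file.
    time.sleep(LATENCY_S)
    return name


def run_sequential(names):
    return [fake_file_op(n) for n in names]


def run_threaded(names, max_workers=8):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, regardless of completion order
        return list(pool.map(fake_file_op, names))


def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


names = [f"file_{i}.nc" for i in range(8)]
seq_result, seq_time = timed(run_sequential, names)
par_result, par_time = timed(run_threaded, names)
print(f"sequential: {seq_time:.2f} s, threaded: {par_time:.2f} s")
```

With 8 iterations and 8 workers, the threaded version overlaps the waits, so its wall time approaches one latency period instead of eight. For bandwidth-bound transfers the gain would be much smaller, which matches the intent behind the flag name.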

Benchmark results

Tested on two HPC Lustre filesystems using the included script test-se-run/bench/run-bench.sh:

| Platform | Test data | base.copy | base.move |
|---|---|---|---|
| MN5 (BSC) | 10 files, 2.6 GB | 38% speedup | No benefit (same-filesystem rename) |
| hpc2020 (ECMWF) | 10 files, 3 GB | 60% speedup | No benefit (same-filesystem rename) |

Design decisions

  • --parallel-io, not --parallel: the name communicates the constraint that only I/O-bound, independent iterations benefit.
  • ThreadPoolExecutor: the GIL is released during blocking I/O system calls (e.g. in shutil.copy2), so threads give real concurrency for file operations without the overhead of multiprocessing.
  • No context accumulation in parallel mode: loop iterations are treated as independent, and a warning is logged if a task returns a context update inside a parallel loop.
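A sketch of the behaviour described in the design decisions above: copy-loop iterations are submitted to a `ThreadPoolExecutor`, and any context update an iteration returns is logged and dropped instead of merged. The names `run_loop`, `make_copy_task` and `demo`, and the plain-dict context, are hypothetical stand-ins, not ScriptEngine's engine or task API.

```python
import logging
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

log = logging.getLogger("se.sketch")


def make_copy_task(dst_dir: Path):
    def copy_task(item):
        index, src = item
        shutil.copy2(src, dst_dir / src.name)  # copy2 also preserves metadata
        # The task also tries to publish something to the context.
        return {"last_copied": src.name, "index": index}
    return copy_task


def run_loop(items, body, context, parallel=False, max_workers=8):
    """Run body(item) for each item; merge returned dicts only when sequential."""
    if not parallel:
        for item in items:
            update = body(item)
            if update:
                context.update(update)  # merged in iteration order
        return context

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for all iterations and re-raises the first exception
        updates = list(pool.map(body, items))
    n_updates = sum(1 for u in updates if u)
    if n_updates:
        log.warning("%d parallel iteration(s) returned context updates; discarded", n_updates)
    return context


def demo(n_files=5):
    """Copy n small files sequentially and in parallel; return contexts and copied names."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src_dir, seq_dir, par_dir = root / "src", root / "seq", root / "par"
        for d in (src_dir, seq_dir, par_dir):
            d.mkdir()
        items = []
        for i in range(n_files):
            path = src_dir / f"f{i}.nc"
            path.write_text(str(i))
            items.append((i, path))
        ctx_seq = run_loop(items, make_copy_task(seq_dir), {}, parallel=False)
        ctx_par = run_loop(items, make_copy_task(par_dir), {}, parallel=True)
        copied = sorted(p.name for p in par_dir.iterdir())
    return ctx_seq, ctx_par, copied
```

In this sketch the sequential run ends with the last iteration's update in the context, while the parallel run copies every file but leaves the context untouched and emits a single warning.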

Test plan

  • All 170 existing tests pass
  • Two new tests: test_parallel_loop, test_parallel_loop_with_context_var
  • Benchmarked on MN5 (BSC) and hpc2020 (ECMWF)

🤖 Generated with Claude Code

vlap and others added 3 commits June 15, 2026 20:54
Enable thread-based parallelism for loop iterations via `se --parallel`.
Uses ThreadPoolExecutor to run I/O-bound loop bodies (base.copy, base.move)
concurrently. The same YAML scripts work unchanged — parallelism is
activated purely by the CLI flag.

Also adds a fast-path in Jinja rendering that skips template parsing for
strings without template markers, reducing per-iteration overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Self-contained bench script that measures base.copy loop speedup from
thread parallelism on HPC Lustre filesystems. Tested on MN5 (BSC) and
hpc2020 (ECMWF) with 51-58% speedup for I/O-bound copy loops.

Usage: bash test-se-run/bench/run-bench.sh <dir_with_nc_files> [N]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Rename CLI flag to --parallel-io to clarify it's for I/O-bound loops
- Warn when tasks return context updates inside parallel loops (they are
  discarded since iterations run independently)
- Simplify and add base.move to benchmark script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
vlap changed the title from "Add --parallel CLI flag for concurrent loop execution" to "Add --parallel-io CLI flag for concurrent loop execution" on Jun 15, 2026
Review comment on src/scriptengine/jinja.py, lines +108 to +109:
```python
if "{" not in string_arg:
    return string_arg
```

uwefladrich (Owner):

I think that this little change slipped in from #128 and does not really belong here, does it?
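For context on this question, a minimal sketch of what such a fast path does: with Jinja's default delimiters, every template construct ({{ }}, {% %}, {# #}) starts with {, so a string without { can be returned without parsing. The `render` function and the pluggable `renderer` callable are hypothetical; in ScriptEngine the slow path would go through Jinja2. The check assumes default delimiters and no line statements, which Jinja2 disables by default.

```python
from typing import Callable, Mapping

# Hypothetical signature for the "slow path" template renderer.
Renderer = Callable[[str, Mapping[str, object]], str]


def render(string_arg: str, context: Mapping[str, object], renderer: Renderer) -> str:
    # Fast path: no "{" means no Jinja construct with default delimiters.
    if "{" not in string_arg:
        return string_arg
    # Slow path: hand the string to the real template engine.
    return renderer(string_arg, context)
```

With Jinja2, `renderer` could be `lambda s, ctx: jinja2.Environment().from_string(s).render(ctx)`. One subtlety: with Jinja2's default `keep_trailing_newline=False`, rendering strips a single trailing newline, whereas the fast path returns the string unchanged, so the two paths can differ for such strings.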

uwefladrich (Owner) commented:

Hi @vlap,

Thanks a lot for the PR! Parallelisation has been on my wish list since the start of ScriptEngine. In fact, abstracting the actual "engine" that runs the scripts was partly motivated by the option of implementing an advanced engine that executes tasks in parallel. However, such an engine has never been realised.

The bottleneck you describe makes sense to me, given the number of files and the characteristics of the file system, so parallelisation seems a reasonable approach.

When SE executes tasks, an important step is to update the context consistently, which is why this is done in the engine rather than in the task code itself. When tasks run in parallel (in a loop or otherwise), updating the context is not trivial; it is the same issue as shared variables in other parallel languages.

You chose to ignore all context updates from parallel tasks, which is of course one way to handle potentially conflicting updates. However, it also deprives the tasks in a parallel loop of any way to communicate data, whether conflicting or not.
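To make "conflicting" concrete, here is a sketch of one alternative to discarding: collect each iteration's update and merge them in iteration order once the pool has finished, flagging keys that different iterations set to different values. This is purely illustrative; `merge_updates` and its conflict rule are assumptions, not something the PR or ScriptEngine implements.

```python
from typing import Any, Dict, List, Mapping, Optional, Set, Tuple


def merge_updates(
    updates: List[Optional[Mapping[str, Any]]],
) -> Tuple[Dict[str, Any], Set[str]]:
    """Merge per-iteration context updates in iteration order.

    Returns the merged dict and the set of keys that at least two
    iterations set to different values (a write-write conflict).
    """
    merged: Dict[str, Any] = {}
    conflicts: Set[str] = set()
    for update in updates:
        for key, value in (update or {}).items():
            if key in merged and merged[key] != value:
                conflicts.add(key)
            merged[key] = value  # last iteration wins, as in a sequential loop
    return merged, conflicts
```

If the updates come from `ThreadPoolExecutor.map`, which returns results in input order, merging this way reproduces the sequential final values only when no iteration reads what another one wrote. That read-after-write dependency is exactly what independent threads cannot honour without extra synchronisation.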

Moreover, parallelisation is controlled by a command line switch that works script-globally and affects all loops. In my opinion this changes the semantics quite heavily, because context updates from all tasks in all loops are suddenly lost compared with a run without --parallel-io. At least, that is my understanding of the changes. Would the full set of EC-Earth run scripts work with this feature switched on?

Finally, why is the command line switch called --parallel-io? The implementation of this feature does not seem to be I/O-specific. Why not, for example, --parallel-loops?

All this being said, I honestly appreciate your work, and I also realise that I should not have the final word, because you and the ECE users are using SE much more than I do (at least at the moment). So my intention is rather to spark a discussion about this among the ECE users of SE. Maybe you can ping the right people.
