Add --parallel-io CLI flag for concurrent loop execution #129
Enable thread-based parallelism for loop iterations via `se --parallel`. Uses ThreadPoolExecutor to run I/O-bound loop bodies (base.copy, base.move) concurrently. The same YAML scripts work unchanged; parallelism is activated purely by the CLI flag.
Also adds a fast-path in Jinja rendering that skips template parsing for strings without template markers, reducing per-iteration overhead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Self-contained bench script that measures base.copy loop speedup from thread parallelism on HPC Lustre filesystems. Tested on MN5 (BSC) and hpc2020 (ECMWF) with 51-58% speedup for I/O-bound copy loops.
Usage: bash test-se-run/bench/run-bench.sh <dir_with_nc_files> [N]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename CLI flag to --parallel-io to clarify it's for I/O-bound loops
- Warn when tasks return context updates inside parallel loops (they are discarded since iterations run independently)
- Simplify and add base.move to benchmark script
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    if "{" not in string_arg:
        return string_arg
I think that this little change slipped in from #128 and does not really belong here, does it?
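For readers following the thread, the fast path quoted in the diff can be sketched in isolation. This is a minimal sketch, not ScriptEngine's actual rendering code: `render_with_fast_path`, `slow_render`, and the example paths are hypothetical names, and the stand-in renderer only mimics a template engine. The check relies on the fact that Jinja2's default delimiters all begin with `{`.

```python
from typing import Callable


def render_with_fast_path(string_arg: str, render: Callable[[str], str]) -> str:
    """Return string_arg unchanged when it cannot contain template markup."""
    # Jinja2's default delimiters ({{ }}, {% %}, {# #}) all begin with "{",
    # so a string without "{" has nothing to render and parsing can be skipped.
    if "{" not in string_arg:
        return string_arg
    return render(string_arg)


calls = []


def slow_render(s: str) -> str:
    # Hypothetical stand-in for the real template engine; records each call.
    calls.append(s)
    return s.replace("{{ name }}", "restart.nc")


plain = render_with_fast_path("/data/inidata/file.nc", slow_render)
templated = render_with_fast_path("/data/{{ name }}", slow_render)
```

Only the templated string reaches the renderer. The check would no longer be sufficient if the environment were configured with custom delimiters or line statements, which is one reason to review it separately from the parallel loop change.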
Hi @vlap,

Thanks a lot for the PR! Parallelisation has been on my wish list from the start of ScriptEngine. In fact, the abstraction of the actual "engine" that runs the scripts was partly motivated by the option of implementing an advanced engine that executes tasks in parallel. However, it has never been realised.

The bottleneck you describe makes sense to me, given the number of files and the characteristics of the file system. So parallelisation seems a reasonable approach.

When SE executes tasks, an important step is to update the context consistently, which is why this is done in the engine, not in the task code itself. When executing tasks in parallel (in a loop or otherwise), the context update is not trivial. This is the same issue as for shared variables in other parallel languages. You chose to ignore all context updates from parallel tasks, which is of course one option for handling potentially conflicting updates. However, it also deprives all tasks in a parallel loop of the ability to communicate any data, conflicting or not.

Moreover, the parallelisation is controlled by a command line switch, which works script-globally, affecting all loops. This changes semantics quite heavily, in my opinion, as context updates of all tasks in all loops are suddenly lost, compared to a run without […].

Last, I wonder why the command line switch is called […].

All this being said, I honestly appreciate your work and I also realise that I should not have the final word, because you and the ECE users are using SE much more than I do (at least at the moment). So my intention is more to spark a discussion among the ECE users of SE about this. Maybe you can ping the right people.
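The semantic change raised in this comment can be illustrated with a toy model. This is a sketch under stated assumptions, not ScriptEngine's engine code: `run_loop`, `task`, the dictionary-based context, and the warning text are all hypothetical simplifications. It shows a sequential loop merging each task's returned update into the context, while the parallel variant discards the updates and warns, as the PR describes.

```python
import warnings
from concurrent.futures import ThreadPoolExecutor


def task(item, context):
    # Hypothetical task: returns a context update instead of mutating context.
    return {"last_item": item, f"seen_{item}": True}


def run_loop(items, context, parallel=False):
    """Toy engine loop: the engine, not the task, applies context updates."""
    if not parallel:
        for item in items:
            update = task(item, context)
            if update:
                context.update(update)  # later iterations see earlier updates
        return context
    # Parallel variant: every iteration sees the same snapshot of the context.
    snapshot = dict(context)
    with ThreadPoolExecutor(max_workers=4) as pool:
        updates = list(pool.map(lambda item: task(item, snapshot), items))
    if any(updates):
        warnings.warn("context updates returned in a parallel loop are discarded")
    return context


sequential = run_loop(["a", "b"], {"start": 1})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    parallel = run_loop(["a", "b"], {"start": 1}, parallel=True)
```

In this model the same loop leaves four keys in the context when run sequentially and only the original key when run in parallel, which is the script-global change in behaviour the comment is concerned about.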
Summary
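- Add `--parallel-io` flag to the `se` CLI that runs loop iterations concurrently using `ThreadPoolExecutor`
- Existing YAML scripts work unchanged; parallelism is activated purely by the CLI flag (`se --parallel-io script.yml`)
- Fast path in Jinja rendering that skips template parsing for strings without `{` markers
Motivation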
EC-Earth4 setup scripts contain many I/O-bound loops (copying inidata, weights, restart files). On HPC parallel filesystems like Lustre, these loops are bottlenecked by per-file latency, not bandwidth. Threading allows multiple file operations to overlap.
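The overlap of file operations described here can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: `copy_all`, the worker count of 8, and the temporary files are assumptions made for the example.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_all(sources, dest_dir, max_workers=8):
    """Copy files into dest_dir concurrently; returns the destination paths."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps input order; an exception in any copy is re-raised
        # when its result is consumed by the list comprehension.
        return [Path(p) for p in pool.map(lambda src: shutil.copy2(src, dest_dir), sources)]


with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "src"
    src_dir.mkdir()
    sources = []
    for i in range(5):
        f = src_dir / f"file_{i}.nc"
        f.write_text(f"data {i}")
        sources.append(f)
    copied = copy_all(sources, Path(tmp) / "dst")
    copied_names = [p.name for p in copied]
    copied_contents = [p.read_text() for p in copied]
```

Each copy is independent of the others, which is the property that makes this kind of loop safe to run concurrently; whether it is faster depends on the filesystem's per-file latency.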
Benchmark results
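Tested on two HPC Lustre filesystems, MN5 (BSC) and hpc2020 (ECMWF), with the included `test-se-run/bench/run-bench.sh`; the benchmark commit reports a 51-58% speedup for I/O-bound copy loops.
Design decisions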
- `--parallel-io`, not `--parallel`: the name communicates the constraint that only I/O-bound, independent iterations benefit.
- […] (`shutil.copy2`, etc.), so threads provide real concurrency for file operations without the overhead of multiprocessing.
Test plan
- `test_parallel_loop`, `test_parallel_loop_with_context_var`

🤖 Generated with Claude Code