diff --git a/docs/site_build/overview.md b/docs/site_build/overview.md
new file mode 100644
index 0000000000..9c2317d327
--- /dev/null
+++ b/docs/site_build/overview.md
@@ -0,0 +1,45 @@
+# Introduction
+This documentation is aimed at HPC sites or other facilities that make EESSI available on their system, but would like to offer additional installations that are performed 'on top' of EESSI (i.e. using dependencies provided by EESSI).
+
+There are several reasons why, as a site, you may want to offer additional software on top of EESSI. For example:
+1. You want to offer software that is not suitable for upstream deployment in EESSI (e.g. because it is proprietary, or because it is a development build or otherwise very specific build that is not useful for a general audience).
+2. You need to make software available to your users on (very) short notice, and cannot wait for it to be deployed in upstream EESSI.
+3. You want to retain full autonomy over what gets deployed.
+
+While all of these are valid arguments, note that there is also one major downside to deploying things locally: you lose one of the core benefits of EESSI, namely that it provides _the same software on every system_. The more site-specific installations you have, the more difficult it will be for your users to move their workflows from e.g. their own development machine/cloud environment to your cluster, or to scale up to larger clusters. If you're doing site builds to make software available to your users on short notice, we highly encourage you to _also_ contribute the same software installation to upstream EESSI. This way, once it is accepted upstream, users that rely on that software retain their 'mobility'.
+
+# Choosing your approach
+There are two approaches to doing site builds, each with its own advantages and disadvantages.
+
+1. Performing site builds using EESSI-extend on a shared filesystem.
+2. Leveraging all of EESSI's tooling for site builds. In this approach, you use the EESSI build bot (`EESSI/eessi-bot-software-layer`), together with the EESSI build scripts (`EESSI/software-layer-scripts`), to build and deploy software into a CernVM-FS repository of your own. This means you'll build in a way that is essentially identical to how it is done for upstream EESSI - with the only major difference being the target CernVM-FS repository.
+
+In both cases, you build 'on top' of EESSI, meaning that dependencies that are already provided by EESSI will not be reinstalled: they will simply be loaded from EESSI.
+
+Here, we list some advantages and disadvantages to help you choose which approach best suits your requirements.
+
+## Approach 1: using EESSI-extend on shared FS
+
+Advantages:
+- Easy to get started: no additional setup or knowledge needed
+- Automatically optimizes for the host on which you run the installation, and installs in an architecture-specific prefix that matches the host architecture. This means you can install optimized software for each of your CPU/GPU architectures in an organized way.
+
+Disadvantages:
+- This is a manual procedure (unless you create your own automation around it). As such, it doesn't scale well to installing large amounts of software and/or installing software for many different hardware targets.
+- The fact that you get optimized installations means that on a very heterogeneous system, you will have to run the installation many times - once for each architecture on which you want to offer that particular piece of software.
+- Shared filesystems (and especially _parallel_ filesystems) are generally ill-suited to serving software. This means start-up time can be quite long (you can find some numbers [here](../training-events/2025/tutorial-best-practices-cvmfs-hpc/performance.md)).
+
+## Approach 2: leveraging all of EESSI's tooling for site builds
+
+Advantages:
+- Highly automated
+- Scalable to many architectures & installations
+- Site builds are done based on a list of software in a GitHub repo - making it very transparent what is available / got added on your system
+- Maintenance of the automation is shared with the EESSI community
+- End-user look & feel are very similar to EESSI
+
+Disadvantages:
+- More setup time
+- Requires more extensive knowledge (CVMFS, EESSI build bot, object store)
+- More hardware resources (CVMFS infrastructure, bot infrastructure)
+- More components (software/hardware) to maintain
diff --git a/docs/site_build/shared_fs.md b/docs/site_build/shared_fs.md
new file mode 100644
index 0000000000..1333ed77b7
--- /dev/null
+++ b/docs/site_build/shared_fs.md
@@ -0,0 +1 @@
+TODO
diff --git a/docs/site_build/site_cvmfs.md b/docs/site_build/site_cvmfs.md
new file mode 100644
index 0000000000..9b63b76d9c
--- /dev/null
+++ b/docs/site_build/site_cvmfs.md
@@ -0,0 +1,32 @@
+# Leverage EESSI's build procedure for site builds
+In this approach, you use the EESSI build bot (`EESSI/eessi-bot-software-layer`), together with the EESSI build scripts (`EESSI/software-layer-scripts`), to build and deploy software into a CernVM-FS repository of your own. This means you'll build in a way that is essentially identical to how it is done for upstream EESSI - with the only major difference being the target CernVM-FS repository.
+
+## Setup steps
+What we need:
+- Infrastructure for a site-specific CVMFS repository (Stratum 0, Stratum 1, proxies, client configuration)
+- An instance of the EESSI build bot
+- A bucket in an AWS S3-compatible object store (though you could work around this)
+- A GitHub organization on which you can install GitHub Apps
+- A GitHub repository within that organization which will be used to list the software you want to build
+- Optionally: an automated procedure to ingest tarballs
+
+This documentation will go through the steps to set each of these up, in order. Since many of these individual steps are documented elsewhere, we will often refer to that documentation (and only give a very short summary here).
+
+### A site-specific CVMFS infrastructure
+The recommended CVMFS setup for a site-specific CVMFS repository is:
+- A Stratum 0 server
+- Two (or more) Stratum 1 servers
+- Two (or more) proxies
+
+The main reasons for this are:
+- Having two Stratum 1's provides redundancy: if one dies, proxies seamlessly fail over to the other one.
+- Having two proxies provides both redundancy _and_ load balancing. If one proxy dies, clients fail over to the other one. If clients are configured to use the proxies in a [proxy group](https://cvmfs.readthedocs.io/en/2.8/cpt-configure.html#proxy-lists), each client selects a proxy randomly, thus providing load balancing.
+
+!!! note
+
+    The recommended CVMFS setup requires a fair number of machines. If this is more than you can afford, there are some tricks you can pull. First, you can combine each proxy with a Stratum 1 on the same machine, only use the proxies for proxying upstream EESSI, and simply have your clients contact your site-specific Stratum 1's directly (without a proxy). In this scenario, you can achieve load balancing by configuring half your clients with `CVMFS_SERVER_URL=";"` and half with `CVMFS_SERVER_URL=";"`, where `instance_1` and `instance_2` are the IPs of your Stratum 1's. Finally, you can even use the Stratum 0 instead of a second Stratum 1. Note that this has security implications, as it means your Stratum 0 needs to be directly accessible to your clients. This is a potential concern: if there are vulnerabilities in the Stratum 0 software, end-users may be able to push (malicious) software into it.
+
+An extensive [tutorial](https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/) is available that teaches how to set up each of these machines, and how to configure the clients to use the relevant Stratum 1's and proxies. Below, we will summarize some of the key steps, and point out things that are specifically relevant for this setup.
+
+#### Setting up the Stratum 0
+