Mark Holliman, University of Edinburgh
Euclid is an ESA satellite mission that aims to map dark matter in the universe. The UK's involvement in Euclid includes developing the weak lensing algorithm that detects the distortion of light from galaxies as it passes clumps of dark matter. The UK also hosts one of nine Euclid Science Data Centres, where telescope images will be analysed to derive science data products. Both responsibilities require significant computing resources and a hybrid HTC/HPC-like environment. We must also adhere to the Euclid computing rules, which specify the OS version, library and compiler versions, and system resources (shared filesystem). To meet these computing needs, Euclid UK uses cloud computing resources from IRIS (https://www.iris.ac.uk/about/what-is-iris/). With help from StackHPC (https://www.stackhpc.com/), we have built a “Slurm as a Service” (SaaS) batch cluster that spans multiple data centres yet provides a single operational system for running our simulations and processing our data. This SaaS cluster includes a federated CephFS file system visible to all worker nodes across the disparate sites, a critical resource requirement for running Euclid pipeline codes. This talk will explain the tools we use to deploy our distributed cluster infrastructure across cloud platforms, and how it operates and performs in meeting our needs.