Supporting UK Covid-19 surveillance with AWS Step Functions and Fargate at Wellcome Sanger Institute

Sam Proctor, Wellcome Sanger Institute

During the COVID-19 pandemic the Wellcome Sanger Institute has been responsible for providing PHE/UKHSA with timely information on COVID-19 lineages. To provide the most accurate picture of the pandemic, all samples required frequent re-processing as new lineages were detected. As a proof of concept we implemented a hybrid pipeline using Airflow and the AWS cloud to enable processing of all samples at scale, reducing processing times by an order of magnitude compared with local infrastructure. Airflow orchestrated tasks running locally, while AWS Step Functions managed tasks running in the cloud; this combination worked well in practice. The pipeline was architected to ensure sensitive data remained on local infrastructure. Using the AWS CDK to author the cloud stack in C# and Python allowed fast development and easy environment separation. Docker images allowed us to reuse existing code and deploy it rapidly into AWS, while extensive use of AWS Lambda and AWS Fargate eliminated the need to manage clusters. We discuss the lessons learnt from this project and the benefits we have seen; it serves as a useful reference for those wishing to undertake a similar project.
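The abstract does not show the pipeline's actual definitions, but the cloud-side pattern it describes can be sketched in Amazon States Language: a Step Functions Task state that runs a container on Fargate via the ECS RunTask integration and waits for it to complete. All names below (cluster, task definition, state names) are hypothetical.

```python
import json

# Minimal sketch (not the Institute's actual pipeline) of a Step Functions
# state machine with one cloud-side step: running a containerised
# sample-reprocessing task on AWS Fargate.
state_machine = {
    "Comment": "Hypothetical sketch: reprocess a batch of samples on Fargate",
    "StartAt": "ReprocessSamples",
    "States": {
        "ReprocessSamples": {
            "Type": "Task",
            # The .sync suffix makes Step Functions wait for the task to finish
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": "lineage-cluster",         # hypothetical name
                "TaskDefinition": "reprocess-task:1", # hypothetical name
            },
            "Retry": [
                {"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2}
            ],
            "End": True,
        }
    },
}

definition_json = json.dumps(state_machine, indent=2)
```

In a CDK app such a definition would normally be produced by higher-level constructs rather than written by hand; the JSON form is shown here only to make the orchestration pattern concrete.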

Ensuring fairer access and reducing obstacles to research in fixed capacity clouds

Paul Browne, University of Cambridge & Pierre Riteau, Stack HPC

Presenting resources as a cloud now provides a familiar access mode for many research disciplines. For large-scale HPC and ML, optimal infrastructure is still best provided on-premise, often in siloed systems that can create a disjunct in operation or service; recently, however, such systems can be delivered in a cloud-native form. Hybrid cloud, in which both on- and off-premise resources can be exploited, promises the best of both worlds as research organisations explore the most cost-effective way of providing computational resources across the full gamut of education and research. In this talk we present an overview of the on-premise Cambridge cloud and how we are presenting clusters via a CaaS portal that can be used to create and manage platforms within multiple clouds, heralding a path to wider exploitation of resources.

King’s CREATE: a new research computing ecosystem and the journey so far

Matt Penn, King’s College London

In summer 2020 the King’s College London e-Research team started reviewing options for replacing an ageing HPC cluster and OpenStack private cloud. Our primary goal for the refresh was to build a tightly integrated ecosystem with a high degree of flexibility, catering to traditional scheduled HPC and more bespoke virtualised enclaves. Building King’s Computational Research, Engineering And Technology Environment (CREATE, launching Q1 2022) has taken us on a journey involving selection of OpenStack provisioning frameworks, the opening of a new data centre facility, adoption of Ubuntu and CephFS, institutional MFA integration, re-integration of incumbent storage and compute hardware, and re-tooling of our approach to software builds and vulnerability management. Our talk will describe our experiences building from the ground up an ecosystem that supports a highly diverse research community with workloads of all shapes and sizes.

The PITHIA-NRF e-Science Centre – towards a Cloud-based Platform to support Ionosphere, Thermosphere, and Plasmasphere Research

Tamas Kiss, University of Westminster

The PITHIA Network of Research Facilities (PITHIA-NRF) project, funded by the European Commission’s H2020 programme, aims to build a distributed network integrating observing facilities, data collections, data processing tools and prediction models dedicated to ionosphere, thermosphere and plasmasphere research. One of the core components of PITHIA-NRF is the PITHIA-NRF e-Science Centre, which supports access to distributed data resources and facilitates the execution of various scientific applications on cloud computing infrastructures. The development is led by the University of Westminster, in strong collaboration with EGI. When designing and implementing the e-Science Centre, we follow a novel approach based on the dynamic creation and instantiation of cloud-based reference architectures composed of multiple application components or microservices, described in the form of a deployment descriptor and automatically deployed and managed at run-time. A reference architecture can include various components, such as generic or custom GUIs, data analytics, machine learning, simulation or other scientific applications, databases, and any other components that are required to realise a particular user scenario. This presentation focuses on the design principles of the e-Science Centre and demonstrates proof-of-concept case studies.
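The abstract does not specify the descriptor format, but the general idea of a reference architecture as a deployment descriptor can be illustrated with a hypothetical example: named components with declared dependencies, deployed in an order that respects those dependencies. Every name and image below is invented for illustration.

```python
# Illustrative only: not the PITHIA-NRF descriptor format. A hypothetical
# reference architecture for a user scenario, expressed as components with
# dependencies, plus a helper that derives a valid deployment order.
descriptor = {
    "name": "ionosphere-analysis",  # hypothetical scenario
    "components": {
        "database":  {"image": "postgres:15", "depends_on": []},
        "model-api": {"image": "example/prediction-model", "depends_on": ["database"]},
        "gui":       {"image": "example/custom-gui", "depends_on": ["model-api"]},
    },
}

def deploy_order(desc):
    """Return component names in an order satisfying depends_on (topological sort)."""
    comps = desc["components"]
    ordered, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in comps[name]["depends_on"]:
            visit(dep)
        ordered.append(name)

    for name in comps:
        visit(name)
    return ordered
```

A real orchestrator would additionally instantiate each component on a target cloud and manage it at run-time; the sketch only shows how a descriptor can drive that process.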

Maintaining versioned 3D digital designs using a hybrid and multi-cloud solution

Niall Kennedy, YellowDog

CAE Tech’s OASES3D (Open Architecture Storage and Execution Service for 3D) feasibility project is designing and prototyping a software architecture for maintaining 3D digital design and analysis data in a robust, version-controlled, scalable and future-proof way. The architecture is platform-agnostic; however, a reference implementation is demonstrating and validating its feasibility as a core approach for managing design and analysis data at UKAEA (UK Atomic Energy Authority). A key feature of the OASES3D solution is to react to new versions of CAD or other design data by triggering analyses or simulations, so results can be tracked over the history of the design project. For the UKAEA STEP (Spherical Tokamak for Energy Production) project this is important as the design process will be long, complex and involve a large team, with a need for traceability of all decisions made. This dynamic creation of computing tasks results in a need for flexible provisioning of cloud compute resources and task scheduling. The scale of each task is not known in advance, and the potential impacts of a design change could include a multitude of tasks. YellowDog has been chosen to provide this elasticity, creating cloud compute clusters on demand at any scale, anywhere.

An Introduction to DosNa: Distributed NumPy Arrays for High-Performance Cloud Computing

Gabryel Mason-Williams, Rosalind Franklin Institute

The cloud primarily deals with data via object stores such as S3; however, HPC data processing is primarily done using file-based formats such as HDF5, which can make offloading data to the cloud difficult. DosNa is a Python wrapper that can distribute N-dimensional arrays over an object store server. The main goal of DosNa is to provide an easy and seamless interface to store and manage N-dimensional datasets over a remote cloud. It supports S3 and Ceph backends and allows parallelised data access through the MPI engine. Features for converting HDF5 files to DosNa objects, an API to visualise data, object locking, BLOSC compression, and checksums are currently under development. This talk introduces DosNa and showcases the current features and what’s to come.
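DosNa's own API is not shown in the abstract, so as a conceptual illustration only (not DosNa code), the underlying idea can be sketched: an N-dimensional array is split into fixed-size chunks, each stored as a separate object keyed by its chunk index. A plain dict stands in for the object store (S3 or Ceph in DosNa's case).

```python
from itertools import product
import numpy as np

# Conceptual sketch of array-to-object-store mapping (not DosNa's actual API).
# Each chunk of the array becomes one object, e.g. key "vol/1.2" holds the
# chunk at grid position (1, 2); a small metadata object records the shape.

def put_array(store, name, data, chunk=64):
    """Store `data` as one object per chunk, plus a shape metadata object."""
    shape = data.shape
    for origin in product(*[range(0, s, chunk) for s in shape]):
        key = name + "/" + ".".join(str(o // chunk) for o in origin)
        slices = tuple(slice(o, min(o + chunk, s)) for o, s in zip(origin, shape))
        store[key] = data[slices].copy()
    store[name + "/shape"] = np.array(shape)

def get_array(store, name, chunk=64):
    """Reassemble the full array from its chunk objects."""
    shape = tuple(int(s) for s in store[name + "/shape"])
    first_key = name + "/" + ".".join("0" for _ in shape)
    out = np.empty(shape, dtype=store[first_key].dtype)
    for origin in product(*[range(0, s, chunk) for s in shape]):
        key = name + "/" + ".".join(str(o // chunk) for o in origin)
        slices = tuple(slice(o, min(o + chunk, s)) for o, s in zip(origin, shape))
        out[slices] = store[key]
    return out
```

Because each chunk is an independent object, slices can be read or written without touching the rest of the dataset, which is what makes parallel (e.g. MPI) access to a remote store practical.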

The Genes And Health TRE in the Google cloud

Vivek Iyer, Wellcome Sanger Institute

The Genes & Health (G&H) project is a large (c. 50,000 donors) population-based cohort of British Pakistanis and Bangladeshis. The project’s goal is to investigate the genetic contribution to common diseases in this community, and to identify rare homozygous gene knockouts and their consequences. This involves jointly analysing genetic data and electronic health record data inside a certified TRE. The project is a collaboration between QMUL, KCL, WSI and a consortium of pharma partners. The G&H TRE is provisioned and administered by us (WSI) entirely in a Google cloud environment built solely for this project. It allows separated bubbles of scientists from different organisations to work securely with virtual desktops on sensitive data, whilst still being able to share selected data between bubbles. The TRE also allows users to run High Performance Computing (HPC) workloads within their secure bubbles. I will sketch why the project chose this TRE, its architecture, the way it’s used, and how we are approaching certification. (Note: The codebase was licensed from U Helsinki and written by Solita for the Finngen project.)

Data Safe Haven Classification and Trusted environments in the cloud: extending a Turing Django based application across multiple institutions

Rebecca Osselton, Newcastle University

Data Safe Havens provide a secure, robust, cloud-deployed research environment for dataset exploration. This data may be sensitive in nature, and the Data Safe Haven gives institutions a trusted environment in which to develop and extend their research. Safe Havens require a security classification, ranging from tiers where no sensitive data is used up to the highest levels of security, such as those needed by governments and defence agencies. The Data Safe Haven Classification app is a web-based Information Governance application that guides stakeholders through a process to determine the correct level of classification. Users have defined roles within the system and must answer a sequence of questions to determine the correct level of security. The app exists independently from the Safe Haven environment and allows the linking of multiple datasets across work packages, giving flexibility to institutions while holding no sensitive data internally. Work to improve and increase the portability of the classification app is underway with multiple institutions, including University College London, Newcastle University, the University of Cambridge and the Alan Turing Institute. This presentation will discuss features of the app and the challenges in its distribution across different institutions, in terms of technology and policy.

TREEHOOSE: Trusted Research Environment and Enclave Hosting Open Original Scientific Exploration

Simon Li, University of Dundee

Trusted Research Environments (TREs) are critical for enabling research on sensitive data. Traditionally, they require large up-front capital investment in specialist infrastructure, which can struggle to keep pace with user demands for increased power and flexibility. At the Health Informatics Centre (HIC) we have designed a TRE in the cloud that can scale with the additional demands made by complex imaging datasets and machine learning experiments. The implementation required considerable custom work, with a challenging learning curve for operations staff. We are developing an open-source toolkit including code and documentation to streamline the deployment of a public cloud TRE. It will share the knowledge and lessons learnt so far in developing and running the HIC TRE and assist more institutions in making their data securely accessible at scale. This includes processes for the management of customised research environments, and examples of taking advantage of specialised cloud services that are challenging to use in a traditional TRE. The toolkit will enable future federated analytical workflows across TREs, since a common codebase, and ultimately open standards, aid portability of code and reproducibility.