Radoslaw Poplawski and Nick Loman, University of Birmingham
In March 2020, a partnership of academic laboratories and public health agencies launched COG-UK, a nationwide distributed genome sequencing project to monitor the evolution of SARS-CoV-2. To integrate, store, process and analyse this genomic data stream, we deployed CLIMB-COVID on the UKRI-funded CLIMB cloud platform, working with the University of Birmingham’s BEAR project. The project has so far processed over 2 million viral genome sequences in the UK and contributed to the discovery and detection of important new variants of SARS-CoV-2. In this talk we will describe the development and deployment of the hardware and software elements of the CLIMB-COVID platform, discuss some of the scaling challenges faced along the way, and describe some of the key scientific and public health outputs generated by this unique surveillance dataset.
Oliver Gray and Przemyslaw Stempor, UK BioBank
UK Biobank is a world-leading biomedical resource with data describing the health, lifestyle, physical characteristics, metabolomics, and genetics of half a million UK participants. The UK Biobank Research Analysis Platform (RAP), powered by DNAnexus and Amazon Web Services (AWS), has been designed to accommodate the UK Biobank resource’s vast and dramatically increasing scale, providing accessibility to the data for researchers around the world. Here, we will provide a brief tour of the RAP and the UK Biobank data, and describe how researchers can use the RAP to achieve their scientific aims. We will also discuss the challenges that we and established users of our data have encountered in moving from a download-only system to a cloud-based framework.
Peter Martin, University of Bristol
Driven by step-change advances in cloud computing, the Internet of Things (IoT), and microcontroller technologies, progress in Machine Learning (ML) over the past decade has pioneered an increasing number of new technologies; from self-driving (or ‘driverless’) cars to enhanced weather forecasting, and from targeted produce advertising to self-cleaning houses. However, such computational intelligence has yet to be applied to analyse, interpret, and streamline the potentially vast radiological monitoring dataset that is, or could be, continually collected using multiple survey ‘nodes’ as part of the UK’s national nuclear safety and security provision. Presently, individual detection events are each investigated in isolation, with no wider “situational context” applied to their occurrence: an approach that is inefficient, costly and time-consuming, as well as blind to small-scale or transient variations (and slow increases in activity) that may otherwise be missed in a large and unwieldy dataset. Work at the University of Bristol, alongside current academic and industrial collaborators, has sought to develop an Artificial Intelligence (AI) and ML system for the enhanced processing and evaluation of “Big Data” derived from a large (and potentially unlimited) number of mobile and fixed-position radiological monitoring devices, yielding a more informed detection response and thereby enhancing the UK’s current national radiological surveillance provision.
Cameron Kyle-Davidson, University of York
When radiologists evaluate mammograms, images from the left and right breasts are shown concurrently. Radiologists can detect abnormalities even up to three years before the onset of cancer, and even when the mammograms are presented rapidly. However, if the normal mammogram contralateral to the abnormal mammogram is replaced by that of another woman, this ability suffers a performance decrease. Evidently, a global signal of abnormality exists that depends on both mammograms. We investigated whether this effect also appears in a pre-trained neural network mammography model. Further, we explored the effect of bilateral differences by developing and training a neural network model that can reliably detect whether a set of mammograms is composed of images taken from the same woman or from two different women. Detection of bilateral asymmetry persists even when mammograms are balanced by size and age, indicating that a “symmetry signal” exists and is relevant for breast cancer detection. We pilot off-site cloud GPU resources for both training and inference of the neural networks, which would have been intractable on our local hardware. In addition, we develop a semi-autonomous mammography dataset-cleaning pipeline that takes advantage of high-CPU-count cloud machines through multithreaded image processing.
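The multithreaded cleaning pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `clean_image` step (a simple pixel rescaling) is a hypothetical stand-in for their actual mammogram-cleaning logic, which in practice would call an image library that releases the GIL and so benefits from threads.

```python
# Minimal sketch of a multithreaded image-cleaning pipeline on a
# high-CPU-count machine. clean_image is a hypothetical placeholder
# for the real per-image cleaning step.
from concurrent.futures import ThreadPoolExecutor

def clean_image(pixels):
    """Hypothetical cleaning step: rescale pixel values to [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    span = (hi - lo) or 1  # guard against constant images
    return [(p - lo) / span for p in pixels]

def clean_dataset(images, workers=8):
    """Clean many images in parallel; map preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(clean_image, images))

if __name__ == "__main__":
    raw = [[0, 128, 255], [10, 20, 30]]
    print(clean_dataset(raw))
```

With a real image library in `clean_image`, the thread pool lets a single cloud VM keep all of its cores busy while the main process coordinates the dataset.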
Matt Pryor, John Garbutt, Matt Anson, StackHPC
Recent years have seen increasing divergence from the traditional HPC model, with researchers keen to take advantage of new and rapidly developing tools such as Jupyter Notebooks, Dask, Apache Spark and Kubeflow while still maintaining the ability to run existing codes in a traditional batch environment, all without sacrificing performance. The explosion of tools and platforms, coupled with the fact that many of these platforms also need to be customised for each use-case, places a heavy burden on the operators of traditional HPC systems where individual platforms are deployed and maintained by the operator on behalf of users. We demonstrate here how the Azimuth portal is able to reduce time-to-science and operational overhead by providing researchers with self-service access to HPC and machine learning platforms via a simple and intuitive user interface. Azimuth builds on work done at JASMIN, with funding from the IRIS collaboration, to present users with a catalogue of customisable platforms that they can deploy into their cloud allocation. Leveraging cloud-native technologies and automation, these platforms can be deployed on virtual machines or in Kubernetes clusters and are able to take advantage of hardware acceleration such as GPUs or RDMA networking without explicit configuration from the user. The Azimuth portal is in use at several IRIS sites, and is providing platforms for projects including the SKA.
Mike Jones, Independent Researcher
Tom Green, Cardiff University
We outline the service work underway at Cardiff University to inform the transition of our research computing community to the cloud. With the cloud now recognised as an attractive solution for bursting out on-premise HPC capacity, this project is both exploring such options and looking to provide a more thematic approach to cloud sourcing. Following a procurement for cloud services, including a performance evaluation exercise, AWS was selected to provide the initial resource for this pilot project. We discuss the methodology used in mapping the thematic usage profile of our 21,000-core on-premise cluster to assess the available environments. This usage varies from the compute-intensive workloads associated with the Physical Sciences & Engineering community to the more data-intensive workloads from Biology & Life Sciences. The cost and performance attributes of such workloads, based on a variety of use cases, are set to quantitatively inform the future procurement of HPC services. To help with project management and cost control, we have selected the Ronin user interface, which also permits management of the resources available to users. On completion of this project we will be better positioned to direct users from a cost and performance perspective in a future featuring a hybrid cloud and on-premise HPC service.
Steven Chapman, University of Bath
Jay DesLauriers, University of Westminster
As once cloud-hesitant industries are encouraged to move to the cloud to run computational workloads, shortfalls in technical skills and cloud knowledge become apparent. Manufacturing is one such industry. Over the past five years, the CPC at Westminster has participated in several European projects investigating Manufacturing in the Cloud. The most recent of these, DIGITbrain (digitbrain.eu), aims to build a platform for Manufacturing-as-a-Service and support the industry with running workloads and accessing Digital Twins on top of cloud infrastructure. The platform will feature open-source DevOps tools such as Kubernetes, Terraform and Ansible, integrated inside the MiCADO Execution Engine (micado-scale.eu). End-users of the platform need not be familiar with these tools’ domain-specific languages or the cloud platforms and middleware that support their workloads. Instead, users will provide values for pre-defined metadata fields that describe the microservices, models, data and infrastructure that make up and support their workloads. This metadata will be automatically compiled down to an intermediary language based on the OASIS TOSCA Specification (Topology and Orchestration Specification for Cloud Applications). The intermediary language can be interpreted by MiCADO and transformed into formats understood by orchestration tools such as Kubernetes and Terraform, which will execute the users’ workloads.
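The compilation step described above — flat metadata fields translated into a TOSCA-style topology — can be sketched as follows. This is an illustrative outline, not the DIGITbrain implementation: the metadata field names, node type, and registry URL are all hypothetical, and a real compiler would emit the full intermediary language rather than this reduced structure.

```python
# Illustrative sketch of compiling user-supplied metadata fields into a
# TOSCA-style topology that an orchestrator such as MiCADO could interpret.
# All field names and values here are hypothetical examples.
def compile_to_tosca(metadata):
    """Turn flat metadata fields into a TOSCA-like node template."""
    return {
        "tosca_definitions_version": "tosca_simple_yaml_1_3",
        "topology_template": {
            "node_templates": {
                metadata["service_name"]: {
                    "type": "tosca.nodes.Container.Application",
                    "properties": {"image": metadata["container_image"]},
                    "requirements": [{"host": metadata["infrastructure"]}],
                }
            }
        },
    }

# Example metadata as an end-user might supply it via pre-defined fields.
example = {
    "service_name": "twin-sim",
    "container_image": "registry.example.com/twin-sim:1.0",
    "infrastructure": "cloud-vm-small",
}

if __name__ == "__main__":
    print(compile_to_tosca(example))
```

The point of the design is that users only ever touch the flat `example`-style fields; the nesting, node types and orchestrator-specific detail are generated for them.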