Towards HPC resiliency, breakout and temporary transition in the Cloud

Cliff Addison, University of Liverpool

This talk was motivated by the need to move an HPC system from one data centre to another and by the extended downtime that users experience during such a move. One solution is to provide some form of HPC system in the cloud. There are some immediate challenges. All users need access, which means that some way of reproducing on-campus authentication and authorisation is needed. Users will be accustomed to a particular range of packages being available, so the login and node images need to be mirrored. Similarly, it is probably desirable for at least some user data to be copied automatically onto the cloud HPC platforms. Large-scale cloud computing can become expensive, so some appreciation of the budget, and of what type and number of cloud HPC nodes are cost-effective, is a prerequisite. This talk is ideal for those looking at public cloud for Disaster Recovery, those wanting to know how public cloud supports on-premises expansion, and those interested in multi-cloud solution architecture.