It’s been a few months since our November workshop, so there’s been some time to digest and reflect on the common themes that emerged. Having attended a couple of other conferences and workshops from my own community (AGU and AMS), it’s interesting to compare.
Firstly, it was great to see such a variety of application areas represented. For this, our second annual workshop, we opened up the submission of abstracts and this made a real difference. There was a great response, with life sciences having the edge over other domain areas. We had 160 registrations and 120 attendees on the day. It was fantastic to have the Crick as a venue. It worked really well.
The first session looked at applications of hybrid and public cloud. Two really interesting use cases (Edinburgh and NCAS, NERC) looked at trying out HPC workloads on public cloud. This raised issues around comparative performance and costs between public cloud and on-prem HPC facilities.
On AWS, Placement Groups allow instances to be placed close to one another to improve inter-node communication for MPI-based workloads. This showed comparable performance with Archer (the UK national supercomputer) for smaller workloads, but there was clearly a limit: performance tailed off as the number of nodes increased, whereas Archer’s performance continued to scale linearly. This tallies with what I’ve seen anecdotally at the AMS conference, where there seemed to be increasing uptake of public cloud for Numerical Weather Prediction jobs (which need MPI). However, this seemed to be happening for smaller-scale workloads, which can stay within the envelope of the node affinity features available.
Another theme was portability – what kind of approaches can be used to engineer workloads so that they can be easily moved between providers. Andrew Lahiff from STFC presented a very different use case, showing how container technologies can be used to run Particle Physics workloads. In these cases the focus is on high-throughput rather than HPC requirements, which makes them much more amenable to cloud. This work has been done as part of a pilot for the Cloud Working Group to specifically investigate how containers and container orchestration technology can be used to provide an abstraction layer for cloud interoperability. A really nice slide showed Kubernetes clusters running on Azure and Google cloud, managed from the same command-line console app. Dario Vianello’s talk (EMBL-EBI) showed how an alternative approach, using a combination of Ansible and Terraform, can be used to deploy workloads across multiple clouds.
It was great to have talks from hyper-scale cloud providers AWS, Azure and Google. The scale in hyper-scale is, as ever, impressive, as is the pace of change in technology: it was very interesting to see Deep Learning applications driving the development of custom hardware – TPUs and FPGAs. Plans underway to host data centres in the UK will ease uptake. The OpenStack Foundation’s and Andy McNab’s talks showed examples of federation across OpenStack clouds.
In the private cloud session, Stig Telfer gave a nice illustration of network performance for VMs, showing how a number of aspects of virtualisation can be changed or removed to progressively improve network performance towards line rate. Alongside the talks on private cloud, the parallel session looked at Legal, Policy and Regulatory Issues, a critical area for the adoption of public cloud. Steven Newhouse gave some practical insights from a recent cloud tender for EMBL. There is clearly a need for further work around these issues so that the community can be better informed about its choices. This is something that the working group will be taking forward.
For this workshop, we experimented with an interactive session – bringing together a group of around 20 delegates to work on some technical themes agreed ahead of time, including bulk data movement and cloud-hosting of Jupyter notebooks. There was plenty of useful interaction and discussion, but we will need to look at the networking provision for next time to ensure groups can get on with technical work on the day.
We discussed next steps in the final session. There is clearly interest in taking particular areas forward from the meeting: focus groups on technical areas like HTC and the use of parallel file systems with cloud, or groups organised around specific domains within the research community. Training also featured, in the form of a cloud carpentry course so that researchers can more readily get up and running with cloud. Looking forward, in each of these cases we’re looking for discrete activities with an agreed set of goals and something to deliver at the end. Where possible, we’re seeking to support relevant work that is already underway and to initiate new work where there are perceived gaps. We will be looking at running smaller workshops targeted at specific themes in the coming months as a means to engage the community and disseminate some of this work.
Phil Kershaw, STFC & Cloud-WG chair