WORKSHOP: December 2015


This inaugural workshop brought together a range of e-infrastructure providers from across the UK research councils and the university sector, along with representatives of prominent international collaborations such as CERN and the SKA. As such it provided an informative snapshot of the current state of cloud adoption for research in the UK.

The sessions were divided into three broad topics: private cloud deployment, federation and use of public cloud, and experiences with cloud-hosted applications. Some strong common themes emerged. For the most part, organisations are in the early stages of cloud adoption. There was some novel use of public cloud, which I'll return to later, but private and community cloud deployments predominated. This was perhaps unsurprising given that the workshop represented the larger research infrastructure providers. What are the drivers for the use of cloud? The maturing of OpenStack as a viable open source solution for private cloud is one. I was struck by the range of groups and activities represented at an impromptu workshop over the summer, organised to bring together research groups with an interest in OpenStack; the landscape has changed rapidly over the last couple of years. STFC's Scientific Cloud and JASMIN are notable exceptions, using OpenNebula and VMware respectively. STFC's cloud grew out of a project to support the CERN LHC (Large Hadron Collider) Tier 1 at Rutherford Appleton Laboratory and has been successfully used as an applications development environment and as a resource to burst into for batch compute.

The application of cloud is being driven by challenges of scale that mean traditional models for infrastructure provision are no longer tenable. In the presentation from CERN, for example, we heard that they expect hundreds of petabytes per year within the next ten years. We heard from John Taylor (University of Cambridge) how the SKA is developing a tiered data delivery architecture to cope with the huge volumes of data expected, with cloud as the core model for data dissemination to the user community. However, the issue of scale does not apply to data volumes alone. As the presentation from the Wellcome Trust Sanger Institute (Tim Cutts) highlighted, there are requirements to address the needs of multiple communities and sectors, including industry, academia and international project consortia. Here many of the core characteristics of cloud are attractive: multi-tenancy, flexibility, scaling and accounting capabilities fit neatly with the collaborative nature of the research community and its changing needs.


A key challenge for these new private cloud deployments is not necessarily the cloud technology itself but where it, and the new modes of operation it brings, intersect with more established infrastructure and the specialist needs of many types of scientific workload. eMedlab is a good example, with its requirement for the high-memory nodes important for biomedical applications. This and other projects have had to be delivered on aggressive timescales, from initial procurement through to deployment and first operations. Use of global file systems is widespread in the research community, and integration between these and cloud was a common theme across many of the presentations. They can provide incredible I/O performance for scientific workloads, but the associated POSIX model of file permissions limits the ability to scale numbers of users and groups and is incompatible with many of the new possibilities cloud enables. Simon Thompson, in his CLIMB presentation, pointed out the problem of trying to connect untrusted VMs to a POSIX file system that has a global uid/gid namespace. Object store technologies are being explored for use with clouds in the research community, but they require a shift in thinking for the user community and impose the burden of porting legacy code.
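To make that shift in thinking concrete, here is a minimal sketch contrasting the POSIX-style access that much legacy scientific code assumes with an object-store call. The path, bucket, key and endpoint are invented for the example, and boto3 is just one S3-style client one might use.

```python
# Illustrative sketch only: POSIX file access versus an object-store request.
# All names here (path, bucket, key, endpoint) are placeholders.
import boto3

# Legacy POSIX access: permissions come from the shared uid/gid namespace of
# the global file system the process runs under.
with open("/gws/project/run42/output.nc", "rb") as f:
    data = f.read()

# Object-store access (an S3-style API via boto3): each request carries its
# own credentials, removing the dependency on a shared uid/gid namespace, but
# legacy code has to be ported away from plain file I/O to use it.
s3 = boto3.client("s3", endpoint_url="https://objectstore.example.ac.uk")
resp = s3.get_object(Bucket="project-data", Key="run42/output.nc")
data = resp["Body"].read()
```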

Jan van Eldik reported in his talk that CERN have used OpenStack to consolidate computing provision spanning so-called 'pets' (hosts running long-lived critical services) to 'cattle' (short-lived VMs for batch compute). Tuning of the KVM hypervisor has significantly reduced virtualisation overhead, though they are also considering MaaS (Metal as a Service). CERN have federated their cloud across two sites (CERN and Wigner, Budapest) using OpenStack cells, but the data volumes expected in coming years (~100 PB/year) may require bursting to public cloud. Another cloud venturing into federated capability is CLIMB. It's an ambitious project for the microbial bioinformatics community, with OpenStack deployments across four universities: Birmingham, Cardiff, Swansea and Warwick. Early operational experience has demonstrated the demand for such a system, but also some of the associated challenges, such as how to get the best utilisation of compute across sites and across hypervisors of varying specification. Single sign-on (SSO) and high-availability capability for management interfaces are the first goals for federation. David Chadwick presented on work to integrate ABFAB (Application Bridging for Federated Access Beyond web), a protocol-independent SSO mechanism, into OpenStack's security component, Keystone. An IETF protocol, ABFAB allows federated access over Eduroam infrastructure. This integration work provides an insight into OpenStack's current SSO capability.
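To give a flavour of why SSO matters here, below is a minimal client-side sketch, using openstacksdk with invented site URLs and credentials, of what working across a multi-site deployment such as CLIMB looks like before federation is in place: every site requires its own authentication, which is exactly the gap that Keystone federation and ABFAB aim to close.

```python
# Minimal sketch using openstacksdk; site URLs and credentials are
# hypothetical, not CLIMB's real endpoints.
import openstack

sites = {
    "site-a": "https://keystone.site-a.example.ac.uk:5000/v3",
    "site-b": "https://keystone.site-b.example.ac.uk:5000/v3",
}

for name, auth_url in sites.items():
    # Without federated SSO, each site needs its own credentials.
    conn = openstack.connect(
        auth_url=auth_url,
        project_name="climb",
        username="demo",
        password="********",
        user_domain_name="Default",
        project_domain_name="Default",
    )
    # List the instances running at each site to gauge utilisation.
    for server in conn.compute.servers():
        print(name, server.name, server.status)
```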


The two following presentations explored, from different angles, how to provide a platform for research users. The first example comes from JASMIN, a data hosting and analysis facility funded by NERC to serve the needs of environmental sciences research. The ESA (European Space Agency) OPTIRAD project required a collaborative research environment to help with the sharing of data and the development of algorithms for satellite-derived observations of the Earth. Based on the positive experiences that one of the project partners, UCL, had had with the IPython Notebook, the team decided to roll out a hosted Notebook service on JASMIN's cloud. The Notebook provides a web-based interface enabling users to code, document and share algorithm implementations in Python, and to scale out processing and storage using the extended resources available from the cloud. With relatively modest resources, use of off-the-shelf open source software was key: JupyterHub provided a means to host notebooks in a multi-user environment and demonstrated the potential of new container technologies such as Docker and Swarm. STFC Scientific Computing are also exploring the use of containers with Apache Mesos, an exciting new technology which may disrupt, or complement, cloud.
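As a rough illustration of how such a multi-user Notebook service can be wired together, here is a minimal JupyterHub configuration sketch assuming the dockerspawner package. It is not the OPTIRAD configuration; the image, network and mount names are placeholders.

```python
# jupyterhub_config.py -- a minimal sketch, not the OPTIRAD setup.
c = get_config()  # provided by JupyterHub when the config file is loaded

# Launch each user's notebook server in its own Docker container.
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "jupyter/scipy-notebook"
c.DockerSpawner.network_name = "jupyterhub"

# Persist each user's work on a named volume mounted into the container.
c.DockerSpawner.volumes = {"jupyterhub-user-{username}": "/home/jovyan/work"}

# The hub must listen on an address the spawned containers can reach.
c.JupyterHub.hub_ip = "0.0.0.0"
```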

The second example of a research platform is e-Science Central, which aims to provide an environment for researchers to do their science without needing to be system administrators, essentially using a SaaS approach to abstract away the complexities of the underlying hosting infrastructure. Simon Woodman shared lessons from the several years of experience Newcastle have in this area. There are tensions between service provision and research: although only one deployment was envisioned originally, many more have had to be provisioned to meet individual needs. Newcastle has also been running a Cloud Innovation Centre, working with regional firms, especially SMEs, to assist them in the use of cloud. Resources are allocated using a combination of OpenStack for private cloud and Azure. There are issues to tackle; for example, the long-term data storage required by institutions doesn't have a direct comparator in public cloud, and capex funding models don't necessarily fit with the opex spending that public cloud requires.

Eduserv, a not-for-profit organisation working in the public sector, provided another example of a hybrid cloud model, using a combination of AWS and private cloud for customers in the public sector. Their original development was too complicated, and there was a need to understand the fundamental difference between virtualisation and cloud. The move to AWS has yielded operational benefits through access to interfaces and APIs for the management of services. Matt Johnson shared how this has allowed aspects of the infrastructure's operation to be recorded as code and the system to become self-documenting. Their experience has been that there are legal and policy benefits to moving to AWS, and that public cloud is at least as secure as on-premise infrastructure. Cambridge University are also exploring hybrid cloud, but in their case for HPC workloads: the AWS-Zenotech-HPCS proof-of-concept will investigate the practicality, cost and performance of bursting from Cambridge's HPC Services to a VPC (Virtual Private Cloud) on Amazon.
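The "infrastructure recorded as code" point is worth illustrating. The sketch below, assuming boto3 and using placeholder AMI, subnet and key names rather than anything from the Eduserv or Cambridge work, shows how provisioning through the AWS API leaves behind a script that doubles as documentation of what was built.

```python
# A minimal "infrastructure as code" sketch with boto3; the AMI, subnet and
# key names are placeholders, not values from any of the projects above.
import boto3

ec2 = boto3.resource("ec2", region_name="eu-west-1")

# Because provisioning is an API call, the script itself records what was
# built -- one aspect of the self-documenting infrastructure described above.
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",       # placeholder AMI
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0123456789abcdef0",   # e.g. a subnet inside a VPC
    KeyName="example-key",
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "project", "Value": "hybrid-cloud-poc"}],
    }],
)
print(instances[0].id)
```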

Work undertaken by UCL's Cosmology group provides an example of what can be achieved by jumping straight into direct use of public cloud, notably using resources tailored to the problem domain: low-latency and HTC capabilities from Azure and AWS, and Google's DeepMind machine learning suite. The applications include the study of exoplanet spectra, modelling the creation of universes, and astrochemistry. The data volumes involved mean that traditional techniques become impracticable at scale; the ability to automate procedures and expand compute resources is an essential requirement. The projects have been successful but uncover issues to consider for the future, such as how to bootstrap cloud hosting in the first place: who are the contacts, and what is the process? There are also the issues of data movement between on-premise infrastructure and the cloud, and how to cost effectively for long-term sustainability.

In summary, it's an exciting time of change to observe: adoption of cloud is growing fast. Private cloud is being applied widely, and UK e-infrastructure providers are transitioning to cloud service models underpinned by specialised high-performance compute, networking and storage solutions. There are challenges in reconciling the cloud model with some of the traditional technologies, particularly POSIX-based global file systems. Hybrid cloud looks likely to become more commonplace as private cloud use matures and the need to scale drives expansion. There are some great examples of public cloud use for research, but guidance is needed on when, where and how best to utilise it and to provide sustainable solutions.