Autoscaling, Reservation, Contention and Preemption – the Coral Reef Cloud

Pierre Riteau, StackHPC Ltd

Typically, OpenStack clouds use quotas to manage the sharing of resources, but quotas only limit the resources any one project can consume. Statically dividing up a cloud between all the projects that use it is time-consuming and leads to heavy underutilisation of resources, while at the same time often starving some groups of users of the resources they need. We explore how multiple projects can use reservations to negotiate access to a large chunk of resources at some point in the future, and how preemptible workloads can increase utilisation by making use of the gaps between reservations. We look at how DIRAC and vCycle are proving to be useful tools for creating preemptible resources. We describe how computing platforms such as Slurm can be extended to provide demand-driven infrastructure autoscaling. Looking to the future, we explain how autoscaling platforms, such as Kubernetes and Slurm, can scale down and release unused resources back to OpenStack for use by preemptible workloads, and then reclaim those resources when they need to scale back up.