Session 2a: Private and Community Cloud and OpenStack

Stig:

Supports different types of workload, e.g. Spark, Hadoop, elastic – can be reconfigured for high-performance access to data
Multi-chassis LAGs are a “bit warty”; layer 2 is used to allow RDMA with RoCE
PB of storage; initially the networks were very slow with VXLAN
SR-IOV is needed for very high performance using VLAN
OpenStack dissolves knowledge of the physical system, so it cannot expose hardware locality or NUMA
Mellanox tweaks and power saving off did not give much benefit; kernel update and hyper-threading off give better performance, but erratically; CPU pinning helps with the noise
Containing OpenStack services in a pinned environment and NUMA pass-through increases performance
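
A minimal sketch of the pinning idea above: keep a guest's vCPUs on a single NUMA node and keep some cores back for host and OpenStack services. The function names, the two-node topology and the reserved set are hypothetical; only the cpulist string format (as found in /sys/devices/system/node/nodeN/cpulist on Linux) is real.

```python
def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8-11" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def pin_within_node(node_cpulists, vcpus, reserved=()):
    """Pick `vcpus` host CPUs that all sit on one NUMA node.

    node_cpulists maps node id -> cpulist string.
    reserved holds CPUs kept back for host and OpenStack services.
    Returns (node_id, [cpu, ...]) or None if no single node has room.
    """
    for node_id in sorted(node_cpulists):
        free = [c for c in parse_cpulist(node_cpulists[node_id]) if c not in reserved]
        if len(free) >= vcpus:
            return node_id, free[:vcpus]
    return None


# Hypothetical two-node host; CPUs 0 and 1 are reserved for services.
topology = {0: "0-3", 1: "4-7"}
print(pin_within_node(topology, vcpus=4, reserved={0, 1}))  # (1, [4, 5, 6, 7])
```

In Nova this kind of placement is normally requested through flavor extra specs such as hw:cpu_policy=dedicated and hw:numa_nodes rather than computed by hand; the sketch only shows the locality constraint being applied.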

Bear cloud:

Importance of user portals – self-service: Galaxy, GitLab, Jupyter
How to cloud-enable current workloads
Mellanox ConnectX-4 VPI
Water cooling
100G InfiniBand
Spectrum Scale (formerly GPFS) – software-defined – optimise placement of files

  • Cinder and Glance integration
    Copy-on-write snapshot – performance issues with this
  • qcow2 as an alternative

How to manage a multi-user environment with a global file system

  • GPFS assumes the client is fully trusted
    • Clients need to be pre-created (in order to add GPFS access)!
  • Manila (part of OpenStack) as an alternative – but only works on flat networks
  • Fudge NFS?
    • Containerize? – not possible
    • Write performance tailed off after 6 clients

Native GPFS

  • Read significantly worse than write

RabbitMQ

  • Underpins all of OpenStack
  • Interesting challenges with HA

Ceilometer

  • Breaks with more than 10 meters enabled
  • With 100 hypervisors, reporting fails

 

Sanger:

Ran various OpenStack releases while moving from PoC to production
Arista switches support VXLAN encapsulation; double encapsulation is used due to ML2 plugin issues.
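
The notes do not record how the double encapsulation was configured. As a rough illustration of its cost, and assuming it means VXLAN nested in VXLAN over IPv4 with no outer VLAN tag, the header overhead stacks as below. The function name and the layer count are illustrative assumptions; the header sizes are the standard ones.

```python
# Header sizes in bytes for one layer of VXLAN over IPv4 (no outer VLAN tag).
INNER_ETHERNET = 14
VXLAN_HEADER = 8
UDP_HEADER = 8
IPV4_HEADER = 20
VXLAN_OVERHEAD = INNER_ETHERNET + VXLAN_HEADER + UDP_HEADER + IPV4_HEADER  # 50


def tenant_mtu(underlay_mtu, layers=1):
    """Largest IP MTU usable inside `layers` nested VXLAN tunnels."""
    if layers < 0:
        raise ValueError("layers must be non-negative")
    return underlay_mtu - layers * VXLAN_OVERHEAD


for underlay in (1500, 9000):
    print(underlay, tenant_mtu(underlay, 1), tenant_mtu(underlay, 2))
# 1500 1450 1400
# 9000 8950 8900
```

With a 1500-byte underlay each extra layer costs about 3% of the payload, so jumbo frames on the physical network make the nesting far less noticeable.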
Encouraging users to follow good practice when developing images, with a set of tools to support this, e.g. Packer, which helps give confidence in stability across deployments
CI in GitLab and Test Kitchen allows testing in various environments
Repository images for critical components need to be pulled from a fixed source, and tests are needed to validate whether a specific version is required
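
One way to make "pulled from a fixed source" checkable in CI is to pin a digest for each critical image and refuse anything that does not match. This is a sketch of that idea, not the tooling described in the talk; the function names and the stand-in file are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path, expected_sha256):
    """Raise ValueError unless the file matches the pinned digest."""
    actual = sha256_of(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"{path}: expected {expected_sha256}, got {actual}")
    return actual


# Stand-in for a downloaded image; a real pipeline would pin the digest
# published by the fixed source instead of computing it locally.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example image bytes")
pinned = hashlib.sha256(b"example image bytes").hexdigest()
print(verify_pinned(tmp.name, pinned) == pinned)  # True
os.remove(tmp.name)
```

A check like this would run before the image build step, so that a changed upstream artifact fails the pipeline instead of silently changing the resulting image.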