An Introduction to DosNa: Distributed NumPy Arrays for High-Performance Cloud Computing

Gabryel Mason-Williams, Rosalind Franklin Institute

The cloud primarily deals with data as objects in stores such as S3; HPC data processing, however, is primarily done with file-based formats such as HDF5, which can make offloading data to the cloud difficult. DosNa is a Python wrapper that distributes N-dimensional arrays over an object store server. Its main goal is to provide an easy and seamless interface for storing and managing N-dimensional datasets on a remote cloud. It supports S3 and Ceph backends and allows parallelised data access through the MPI engine. Features currently in development include conversion of HDF5 files to DosNa objects, an API for visualising data, object locking, BLOSC compression, and checksums. This talk introduces DosNa, showcases its current features, and previews what’s to come.
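
To make the idea of distributing an N-dimensional array over an object store concrete, here is a minimal sketch of the general chunking technique. It is not DosNa's API: the function names `put_chunked` and `get_chunked`, the key layout, and the plain Python dict standing in for an S3 or Ceph backend are all assumptions made for illustration.

```python
import itertools

import numpy as np


def chunk_origins(shape, chunk_shape):
    """Yield the starting index of every chunk along each axis."""
    grid = [range(0, dim, step) for dim, step in zip(shape, chunk_shape)]
    return itertools.product(*grid)


def put_chunked(store, name, array, chunk_shape):
    """Split `array` into chunks and write each one as a separate object.

    `store` is a hypothetical key/value object store (here, a dict).
    """
    store[f"{name}/meta"] = (array.shape, str(array.dtype), tuple(chunk_shape))
    for origin in chunk_origins(array.shape, chunk_shape):
        region = tuple(slice(o, o + c) for o, c in zip(origin, chunk_shape))
        key = f"{name}/" + ".".join(str(o) for o in origin)
        # Slices past the array edge are clipped, so edge chunks may be smaller.
        store[key] = array[region].tobytes()


def get_chunked(store, name):
    """Reassemble the full array from its chunk objects."""
    shape, dtype, chunk_shape = store[f"{name}/meta"]
    out = np.empty(shape, dtype=dtype)
    for origin in chunk_origins(shape, chunk_shape):
        region = tuple(
            slice(o, min(o + c, d)) for o, c, d in zip(origin, chunk_shape, shape)
        )
        block_shape = tuple(s.stop - s.start for s in region)
        key = f"{name}/" + ".".join(str(o) for o in origin)
        out[region] = np.frombuffer(store[key], dtype=dtype).reshape(block_shape)
    return out


# Usage: a dict plays the role of the remote object store.
store = {}
data = np.arange(4 * 6 * 5, dtype=np.float64).reshape(4, 6, 5)
put_chunked(store, "volume", data, chunk_shape=(2, 4, 5))
restored = get_chunked(store, "volume")
```

Because every chunk is an independent object, separate workers could in principle read or write different chunks concurrently, which is the property that makes parallel access over an object store attractive.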
