Supporting UK Covid-19 surveillance with AWS Step Functions and Fargate at Wellcome Sanger Institute

Sam Proctor, Wellcome Sanger Institute

During the COVID-19 pandemic, the Wellcome Sanger Institute has been responsible for providing PHE/UKHSA with timely information on COVID-19 lineages. To provide the most accurate picture of the pandemic, all samples required frequent re-processing as new lineages were detected. As a proof of concept, we implemented a hybrid pipeline using Airflow and the AWS cloud to process all samples at scale, with processing times an order of magnitude shorter than on local infrastructure. Airflow orchestrated the tasks running locally, whilst AWS Step Functions managed the tasks running in the cloud; this combination worked well in practice. The pipeline was architected to ensure that sensitive data remained on local infrastructure. Authoring the cloud stack with the AWS CDK in C# and Python allowed fast development and easy separation of environments. Docker images allowed us to reuse existing code and deploy it rapidly into AWS, whilst extensive use of AWS Lambda and AWS Fargate eliminated the need to manage clusters. We discuss the lessons learnt from this project and the benefits we have seen, as a reference for those wishing to undertake a similar project.
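
To make the cloud half of such a pipeline concrete, the following is a minimal sketch of a Step Functions definition that fans batches of samples out to Fargate tasks. It is an illustration under stated assumptions, not the pipeline described in this talk: the state names, the `lineage-caller` container name, the `BATCH_URI` environment variable and the input shape (`{"batches": [{"batch_uri": ...}]}`) are all hypothetical. It uses only the Python standard library to build the Amazon States Language document; in a CDK project the same structure would normally be expressed with CDK constructs instead.

```python
import json


def build_state_machine(cluster_arn, task_definition_arn, subnets,
                        container_name="lineage-caller", max_concurrency=10):
    """Build an Amazon States Language definition as a plain dict.

    All names here (container, environment variable, state names) are
    hypothetical placeholders chosen for illustration.
    """
    # One Fargate task per batch. The ".sync" suffix makes Step Functions
    # wait for the ECS task to finish before the state completes.
    run_task = {
        "Type": "Task",
        "Resource": "arn:aws:states:::ecs:runTask.sync",
        "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": cluster_arn,
            "TaskDefinition": task_definition_arn,
            "NetworkConfiguration": {
                "AwsvpcConfiguration": {
                    "Subnets": subnets,
                    "AssignPublicIp": "DISABLED",
                }
            },
            "Overrides": {
                "ContainerOverrides": [
                    {
                        "Name": container_name,
                        # "Value.$" reads the value from the state input,
                        # here the current item of the Map state.
                        "Environment": [
                            {"Name": "BATCH_URI", "Value.$": "$.batch_uri"}
                        ],
                    }
                ]
            },
        },
        # Retry failed tasks a couple of times with exponential backoff.
        "Retry": [
            {
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 30,
                "MaxAttempts": 2,
                "BackoffRate": 2.0,
            }
        ],
        "End": True,
    }

    # The Map state iterates over "$.batches" and runs the task for each item.
    return {
        "Comment": "Sketch: fan sample batches out to Fargate tasks",
        "StartAt": "ProcessBatches",
        "States": {
            "ProcessBatches": {
                "Type": "Map",
                "ItemsPath": "$.batches",
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "StartAt": "RunLineageTask",
                    "States": {"RunLineageTask": run_task},
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs and subnet ID, for illustration only.
    definition = build_state_machine(
        "arn:aws:ecs:eu-west-2:111111111111:cluster/demo",
        "arn:aws:ecs:eu-west-2:111111111111:task-definition/demo:1",
        ["subnet-0123456789abcdef0"],
    )
    print(json.dumps(definition, indent=2))
```

In a hybrid setup of the kind described above, a locally running orchestrator such as Airflow could start an execution of this state machine and wait for it to finish, so that only references to batches, rather than the data held on local infrastructure, need to appear in the execution input.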
