2 Aug 2022
Cancer is a leading cause of death globally, with nearly 10 million deaths per year. Rare diseases affect more than 400 million people worldwide, and 95 percent of rare diseases have no approved treatment. In the vast majority of cases, both cancer and rare diseases are diseases of the genome, caused by monogenic or polygenic variants. Organizations around the world are turning to genetics as the key to diagnosing and treating patients.
Although each individual's genome is unique, researchers need large, robust cohorts of data from both affected and healthy individuals to identify similarities and differences in disease-causing regions of the genome. In all corners of the globe, governing bodies, research organizations, and corporations have established population-wide genomics projects designed to increase understanding of disease origins, identify new treatments, and move genomics from research practice into healthcare settings.
Genomics England (GEL) was formally established in July 2013 as part of the 65th birthday celebrations of the National Health Service (NHS). Wholly owned by the Department of Health and Social Care, GEL was tasked with a flagship project to sequence 100,000 whole genomes from NHS patients with rare diseases and their families, as well as patients with common cancers. After the successful completion of the pilot project in 2018, the NHS announced that it would partner with GEL and the UK Biobank to sequence up to 5 million genomes in 5 years and make the data available for research.
To make genomic healthcare a reality, GEL is transitioning from project to platform, using Amazon Web Services (AWS) tools to give researchers reliable, comprehensive, and privacy-compliant access to these massive datasets. Through secure collaboration and analysis, this initiative will inform diagnoses, drive drug development, and unlock the future of precision medicine.
Enabling Scalability for Growing Genomic Datasets
Through the 100,000 Genomes Project alone, GEL amassed 50 petabytes of data – about three times the size of the entire Library of Congress. To make this data broadly accessible to the research community, GEL is migrating it to AWS.
“We knew that putting the data generated from the 100,000 Genomes Project into the hands of the research community would play a vital role in accelerating scientific breakthroughs, and we’re working to migrate our data to AWS to do just that,” said Peter Sinden, GEL’s chief information officer.
To build a more comprehensive understanding of patient genomics, the organization will integrate “long-read” genome formats alongside the current “short-read” format. Long-read genomes contain around five times the data of short-read genomes, giving researchers more information about each part of the genome they study and potentially uncovering nuances that might have gone unnoticed before.
“As we continue to progress our work and generate more robust data sets, access to elastic storage and compute services will enable our organization and the research community to access and analyze data securely and cost-effectively,” added Sinden. “By hosting on AWS, we can democratize access to our data. All researchers need is a small budget to fund compute costs and access to a computer.”
Turning Science into Healthcare through Technology
GEL’s mission is to create a scalable and durable data infrastructure that can evolve in tune with scientific and technological advances. Security, compliance, and democratized access were integral to the research platform, and GEL selected AWS Partner Lifebit to develop the platform on AWS. Together with AWS, they have created a Trusted Research Environment (TRE) that allows researchers to work collaboratively and glean insights from genomic data using advanced cloud computing tools.
“Our goal is to enable the ecosystem to deliver data-driven healthcare and conduct genomic research, and AWS enables us to securely achieve this mission,” said Sinden. “With access to the latest GPUs and other services, we can push the technological envelope and accelerate the adoption of genomics into healthcare.”
GEL is working with AWS Professional Services and AWS Partner Kainos as part of the AWS Migration Acceleration Program (MAP). Migrating petabytes of genomic data and its on-premises research environments to AWS will help GEL accelerate scientific outcomes. Running analytics and tooling in the cloud also makes the data more secure. The genomic data is stored in Amazon Simple Storage Service (Amazon S3), which is designed for 99.999999999% (eleven nines) durability.
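To give a rough sense of scale for “eleven nines,” here is a back-of-the-envelope calculation. It assumes the figure is read as an annual per-object durability, which is how AWS describes it, and uses a hypothetical collection of 10 million stored objects (not a figure from GEL):

```latex
% Assumption: annual per-object durability d, hypothetical object count N
d = 0.99999999999, \qquad 1 - d = 10^{-11}, \qquad N = 10^{7}

% Expected number of objects lost per year
E[\text{loss}] = N\,(1 - d) = 10^{7} \times 10^{-11} = 10^{-4}\ \text{objects/year}

% i.e. on average about one object lost every 1 / 10^{-4} = 10{,}000 years
```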
For genomic analyses and related computing needs, GEL is using Amazon Elastic Compute Cloud (Amazon EC2), which provides reliable, resizable compute capacity in the cloud. Researchers and data scientists can scale their compute capacity up or down on demand and pay only for what they use, which makes it more cost-effective. Sinden also pointed out that a pay-as-you-go model is ideal for academic and government-funded research because it can flex with fluctuations in grant funding. Building on AWS with the help of its partners, GEL optimized its high-performance computing architecture for both cost and speed, cutting common tasks that previously took 25 hours down to just 23 seconds.
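Converting both durations to seconds makes the size of this reported speedup concrete. This is simple arithmetic on the two figures quoted above. It does not describe how the optimization was achieved:

```latex
% Both durations expressed in seconds
t_{\text{before}} = 25\ \text{h} \times 3600\ \tfrac{\text{s}}{\text{h}} = 90{,}000\ \text{s}, \qquad t_{\text{after}} = 23\ \text{s}

% Speedup factor
S = \frac{t_{\text{before}}}{t_{\text{after}}} = \frac{90{,}000}{23} \approx 3{,}913
```

In other words, these tasks now run roughly 3,900 times faster.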
Building the Future of Cloud Genomics on AWS
While many big data problems involve managing a large number of small files, genomics analysis usually involves a relatively small number of extremely large files. As a result, cloud genomics requires unique data distribution models. GEL is working with AWS to use compression technologies and other advanced tools to optimize cloud storage and analysis of genomic data based on the field’s specific needs. This industry-leading venture will pave the way for efficient, research-friendly genomic data management in the years to come.