Large Scale Genomics Datasets in the UK

Feb 28, 2020


We previously wrote about the enormous impact that the UK Biobank (UKBB) is having on translational research. Although the scientific outputs from this project are now beginning to stack up (as of February 2020, 1,053 scientific papers had been published using the resource), the project was originally established almost 15 years ago, so these are the fruits of many years of labour.

The UKBB took several years of negotiation and planning to set up, followed by a further four years of participant engagement and recruitment, and then additional time for the subsequent data generation. From developing appropriate participant consent processes to working out how best to make data available to as many groups as possible, there were a great many ethical, regulatory and fiscal challenges in setting up the UKBB. But overcoming these laid the foundations for additional innovative biomedical research projects in the UK and further afield.

In this post I want to briefly highlight how the biomedical innovation landscape in the UK has been shaped by a series of forward-looking political strategies which have explicitly aimed to maintain the country’s place at the forefront of health research and development. Implementing data-driven approaches to healthcare, including those involving genomic technology, will require significant readjustment and engagement, and in several ways the UK is leading the thinking on how to implement such systems.

The UK’s regulatory environment and state-sponsored National Health Service (NHS) mean that it’s well placed to implement broad-reaching policies that clearly aim to benefit the whole nation, and so it’s a particularly interesting nation to follow. On top of this, as we’ll find out, the UK has grand ambitions for the NHS to be the world’s leading genomic healthcare system, so it’s exciting to follow for those of us focused on developing products for translating research into healthcare technology.

The UK Life Science Industrial Strategy

In 2017 a major new strategy was published for the life science sector in the UK. Led by Professor Sir John Bell — the UK’s ‘Life Sciences Champion’ — this strategy acknowledged the UK’s strong scientific base in the life sciences and made a number of recommendations aimed at maximising its future translational potential. These recommendations were based around five key themes: continued support for UK science; an industrial growth environment; strengthening industry and academic collaboration with the NHS; a better and more integrated use of data in healthcare; and development of a sufficient skills base.

The strategy document is well worth a read, but a key component of it is that industry, academia and the NHS should increasingly work together to share advances in scientific research and translate these into healthcare. Pursuing the strategy will involve a broad spectrum of initiatives, from securing and increasing funding for healthcare innovation to upskilling the healthcare workforce (the subject of last year’s excellent Topol Review, Preparing the healthcare workforce to deliver the digital future). An important aspect of the strategy, though, involves bringing together and sharing different types of healthcare-relevant data.

Accessible national scale genomic data

Even though the strategy was published only two and a half years ago, it didn’t come out of the blue: a number of foundational proof-of-principle projects were already underway. If we take data as an example, in addition to the UK Biobank, which was started back in 2007, in 2013 the UK government incorporated Genomics England. A company wholly owned by the Department of Health and Social Care (a UK government ministry), Genomics England was set up to deliver the 100,000 Genomes Project, an ambitious undertaking to sequence 100,000 whole genomes from NHS patients with rare diseases and some common cancers.

With translation at its core, Genomics England aims to bring sequencing technology into the NHS and, as a company, was specifically designed to be flexible to the market and to work with researchers and companies to share and work on the genomic data it produced.

With 100,000 genomes now sequenced, in 2018 the UK Secretary of State for Health, Matt Hancock, announced that the government intended to sequence an additional 1 million whole genomes, split across the NHS (via Genomics England) and UKBB. He also announced that from 2019 all seriously ill children with a suspected genetic disorder, including cancer, would be offered whole genome sequencing by the NHS.

The NHS is building a genomics-based service to translate genomics into healthcare practice. The NHS Genomic Medicine Service, working closely with Genomics England, will identify potential patients and, importantly, develop appropriate methods to feed results back through clinicians and genetic counsellors.

As CEO of a company focused on developing polygenic risk score (PRS) technology for use across a range of scales, from the population to the individual, I’m also incredibly excited about the UK government’s recent announcement of the Accelerating the Detection of Disease (ADD) project. This initiative aims to calculate millions of PRS for individuals, with the option of re-contact so that the results can be explained and integrated into disease prevention and early identification strategies.

Fostering an innovative environment

These national projects, which are being driven by the UK Life Sciences Industrial Strategy, are part of a bigger political effort to build a more efficient healthcare system in the UK. The establishment of several large-scale genomics projects shows that the UK is leading the charge for a data-driven, genomics-informed NHS.

Imitation is the sincerest form of flattery

The UKBB wasn’t the first national biobank, but because of its size, the breadth of the data generated and the openness of its data sharing policies, it’s one of the highest profile. Nevertheless, the idea of a national biobank that stores multiple ‘big’ data types for healthcare, which can be securely shared across academia and industry, is catching on. Amongst others, several European countries already have or are building biobanks (e.g. FinnGen in Finland, and the Estonian biobank, which has been around for at least 20 years), and biobanks are also being established further afield in China, Japan and, importantly, Nigeria. From a scientific point of view this is welcome, as a key challenge in generating accurate PRS is obtaining data from the population to which that PRS is being applied. We’re looking forward to working to make our PRS applicable to populations outside of western Europe.

You can learn more about how Allelica is using large genomic databases to develop PRS technology to be utilised at scale on our website.

Originally published on February 28, 2020.




Allelica is a software genomics company developing algorithms and digital tools to accelerate the integration of polygenic risk scores into clinical practice.