The IMPUTE Module: Enriching Low-Coverage and Microarray Data

Fast and scalable genome imputation of Low-Coverage WGS and Microarray data in just minutes

Allelica’s polygenic risk score (PRS) pipeline is built around a set of interconnected modules. Each module gives the user enough control to run it as needed whilst keeping the heavy computational work behind the scenes. Which modules a user runs will depend on their goals, but the complete PRS pipeline can run a full workflow from a starting dataset containing just two things: genomic data from your sample population and a set of summary statistics from a genome-wide association study (GWAS) for a disease or trait of interest. (Users who want to compute a PRS for an individual using an already available PRS can use the PREDICT module with input genomic data only.)

Why do we need to impute genetic data?

Genomic data fed into a PRS pipeline often does not contain all the genetic variants listed in the GWAS summary statistics, so the first step of many analyses is to impute the missing variants. But why do we need to do this?

Imputation maximises the power of genetic data

Despite the availability of whole-genome sequencing (WGS) technology, the vast majority of large genetics projects generate data on individuals using genotyping chips or low-coverage WGS. Although these approaches only generate data at a few hundred to a few million genetic variants per individual, imputation can fill in the gaps between these variants.
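
To make the gap concrete, here is a minimal Python sketch that checks which variants in a set of GWAS summary statistics are absent from a genotyped dataset and would therefore need imputing. It is an illustration only, not part of Allelica's pipeline: the function name `missing_gwas_variants` and the `chrom:pos:ref:alt` variant identifiers are hypothetical, and the data are made up.

```python
def missing_gwas_variants(genotyped_ids, gwas_ids):
    """Return the GWAS variant IDs that are absent from the genotype data.

    The order of the GWAS list is preserved. Both arguments are iterables
    of variant identifier strings (format is an assumption of this sketch).
    """
    genotyped = set(genotyped_ids)  # set gives constant-time membership checks
    return [variant for variant in gwas_ids if variant not in genotyped]


# Hypothetical toy data: three genotyped variants, five GWAS variants.
genotyped_ids = ["1:1000:A:G", "1:2500:C:T", "2:300:G:A"]
gwas_ids = [
    "1:1000:A:G",
    "1:1800:T:C",
    "1:2500:C:T",
    "2:300:G:A",
    "2:950:A:C",
]

missing = missing_gwas_variants(genotyped_ids, gwas_ids)
coverage = 1 - len(missing) / len(gwas_ids)  # fraction already genotyped

print(f"{len(missing)} of {len(gwas_ids)} GWAS variants need imputing")
print(f"Fraction already genotyped: {coverage:.0%}")
```

In real data the identifiers would come from the genotype file and the summary statistics file, and matching would also need to account for genome build and allele orientation, which this sketch ignores.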

Allelica’s IMPUTE module

In addition to the diversity of the reference panel, several other factors influence the quality of imputation. These include the frequency of the variants in the population and the imputation method used, amongst other things.

Allelica is a software genomics company developing algorithms and digital tools to accelerate the integration of polygenic risk scores into clinical practice.