How to apply polygenic risk scores across diverse populations

Allelica
Jul 30, 2024

George Busby, Allelica CSO & Co-Founder

Polygenic risk scores (PRS) predict the likelihood of developing complex diseases based on an individual’s genetic makeup. However, their application across different genetic ancestries presents numerous challenges. These difficulties arise primarily from the differences in genetic ancestry — a term distinct from ethnicity — and the biases associated with imbalances that are inherent in current genetic research datasets.

At Allelica, we are committed to ensuring that the PRSs that we develop for clinical use are optimized for individuals of diverse genetic ancestries. In this article we are going to highlight the challenges of applying PRSs across diverse populations and what can be done to overcome these challenges when developing a clinical PRS test.

Categories of racial descriptors have changed over time due to shifts in scientific, political and social thinking about race and ethnicity. (source NHGRI Website: https://www.genome.gov/about-genomics/policy-issues/population-descriptors-in-genomics)

Genetic Ancestry vs. Ethnicity

First, we need to establish some terminology that will allow us to be on the same page when referring to ancestry.

Several authors have articulated the concepts of genetic ancestry and ethnicity in the context of population descriptors for genomics and health research (Bentz et al 2023; Khan et al 2022; Lewis et al 2022; Mathieson and Scally 2020; Wagner et al 2023; NHGRI website). These are important concepts to get right because disease risk is influenced both by genetic factors relating to ancestry and by social determinants of health, which can be linked to an individual’s ethnicity.

Genetic ancestry refers to the genetic makeup inherited from one’s ancestors, tracing back to ancient populations. It is determined by the genetic variations present in one’s DNA. Some have argued, correctly, that what we mean when we say genetic ancestry is more accurately described as genetic similarity. So when we talk about genetic ancestry, we are referring to the similarity between people’s genomes that is due to shared ancestral history.

A representation of global genetic diversity in the 1000 Genomes Project (Auton et al 2015).

So for the purposes of this current discussion, genetic ancestry means genetic similarity.

In contrast to the inherently genetic description of ancestry presented above, ethnicity is a social construct, encompassing cultural, linguistic, and historical factors. While there may be overlaps, genetic ancestry and ethnicity are not synonymous.

For example, individuals of African descent who currently reside in the United States often self-identify as “African American” or “Black”, to use the current US racial classification system. These labels acknowledge both that these individuals have recent African ancestry and that they are American. They also represent cultural and social distinctions and are associated with a rich cultural heritage as well as a far more troubling history of systematic discrimination.

Whilst it is tempting to assume that individuals who self-identify as Black are equivalent to individuals with African genetic ancestry, the two labels are not interchangeable. Black individuals can have a range of diverse genetic ancestries, both from within Africa and from non-African populations with dark skin, and may also have more recent non-African genetic ancestry from European populations as a result of interracial marriages in the past. So we can’t assume that anyone who identifies as Black has 100% African genetic ancestry.

At this point, we also want to acknowledge that labelling genetic ancestry by the continent in which most individuals with that ancestry currently reside is problematic and imprecise. If we look at a global scale, genetic variation is continuous. Africa is a hugely diverse continent both ethnographically and genetically. So there is no single African genetic ancestry.

However, given that our current focus is on adapting and optimizing genetic tests for individuals on the basis of inherent genetic similarities present as a result of shared ancestral history, we believe that a pragmatic but flexible approach to ancestry definitions is necessary.

Challenges in PRS Application

So how does variation across populations due to genetic ancestry and ethnicity differences lead to challenges in applying PRSs?

1. Population Bias: PRSs developed using European datasets cannot be naively used to estimate risk in diverse populations. Distributions of PRS values vary across populations, and extensive validation and calibration are needed in ancestry-specific populations to ensure that PRS have utility.

2. Genetic Diversity: Different populations have unique genetic architectures. Variants associated with disease in one ancestry might be different in another. Variants can also have different impacts and frequencies in different populations. This diversity requires further analysis to ensure the transferability of PRS across populations.

3. Linkage Disequilibrium: The patterns of linkage disequilibrium (LD), the non-random association of alleles at different loci, vary between populations. PRS developed from European data might not capture the LD structure in African or Asian populations accurately.

4. Environmental and Lifestyle Factors: These factors, often intertwined with genetic data, can influence disease risk differently across populations, affecting the validity of PRS when applied universally.
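To make the first and third challenges concrete, here is a minimal Python sketch. It assumes the common additive form of a PRS (a weighted sum of effect-allele dosages) and Hardy-Weinberg proportions. All weights, allele frequencies and haplotypes are made-up illustrative values, and the function names are hypothetical rather than part of any Allelica tool. It shows how the same weights give different expected scores when allele frequencies differ, and how LD between the same two variants can differ between groups.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def expected_score(effect_allele_freqs, weights):
    """Population mean score, assuming Hardy-Weinberg: E[dosage] = 2 * p."""
    return sum(2 * p * w for p, w in zip(effect_allele_freqs, weights))


def ld_r2(haplotypes):
    """r^2 between two biallelic loci from (a, b) haplotype pairs coded 0/1.

    Both loci must be polymorphic in the sample, otherwise the
    denominator is zero.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # the LD coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Illustrative (made-up) per-variant weights, e.g. log odds ratios.
weights = [0.10, 0.25, -0.05]

# Illustrative (made-up) effect-allele frequencies in two groups.
freqs_group_a = [0.30, 0.10, 0.50]
freqs_group_b = [0.60, 0.20, 0.20]

mean_a = expected_score(freqs_group_a, weights)
mean_b = expected_score(freqs_group_b, weights)
print(f"expected score, group A: {mean_a:.2f}")
print(f"expected score, group B: {mean_b:.2f}")

# Illustrative haplotypes at the same two loci in each group.
haps_group_a = [(1, 1), (1, 1), (0, 0), (0, 0)]  # alleles always co-occur
haps_group_b = [(1, 1), (1, 0), (0, 1), (0, 0)]  # alleles are independent
print(f"r^2, group A: {ld_r2(haps_group_a):.2f}")
print(f"r^2, group B: {ld_r2(haps_group_b):.2f}")
```

In this toy example the two groups have different expected scores even though no individual's disease risk has been specified, which is why a raw score cannot be compared against a single universal threshold. Likewise, a tag variant that is a perfect proxy for a causal variant in one group may carry no information about it in another.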

Addressing the Challenges

To make PRS applicable to all genetic ancestries, several strategies and novel approaches have been developed:

1. Diverse Genomic Data: Increasing the representation of diverse populations in genomic studies is crucial. Efforts like the All of Us Research Program have built a diverse genetic database to enhance PRS accuracy across different ancestries.

2. Transethnic Genome-Wide Association Studies (GWAS): Conducting GWAS across multiple ancestries can identify common and ancestry-specific variants, improving the transferability of PRS.

3. Multi-Ancestry PRS Models: Developing models that incorporate data from various ancestries can enhance the predictive power of PRS. Techniques like meta-analysis of GWAS data from different populations are being employed.

4. Machine Learning Approaches: Advanced algorithms can integrate genetic data from diverse populations, learning to adjust for population-specific patterns in genetic variation.

5. Functional Genomics: Understanding the biological mechanisms behind genetic variants can help identify which variants are likely to be causal across populations, aiding in the refinement of PRS.
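As an illustration of the meta-analysis mentioned in point 3, the sketch below combines per-ancestry GWAS effect estimates for a single variant using a standard inverse-variance weighted fixed-effect model. This is a generic textbook technique and not necessarily the method used in any particular multi-ancestry PRS. The function name and the input numbers are hypothetical.

```python
from math import sqrt


def inverse_variance_meta(estimates):
    """Fixed-effect meta-analysis of (beta, standard_error) pairs.

    Each study is weighted by 1 / se^2, so more precise estimates
    (usually from larger cohorts) contribute more to the pooled effect.
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Illustrative (made-up) estimates for one variant from two GWAS:
# a large cohort with a small standard error and a smaller cohort
# with a larger one.
per_ancestry_estimates = [(0.20, 0.05), (0.10, 0.10)]

pooled_beta, pooled_se = inverse_variance_meta(per_ancestry_estimates)
print(f"pooled beta = {pooled_beta:.3f}, pooled SE = {pooled_se:.4f}")
```

Note that a fixed-effect model assumes the variant has the same true effect in every ancestry. Where effect sizes genuinely differ between populations, as discussed under Genetic Diversity above, the pooled estimate is dominated by the largest cohort, which is one reason more flexible multi-ancestry models are needed.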

Allelica’s ancestry-first approach to developing and implementing clinical PRSs

Allelica’s multiancestry PRSs are developed to overcome the challenges of applying genetic tests to diverse populations.

  1. We utilize diverse datasets.
  2. We perform ancestry-informed fine-mapping.
  3. We optimize different PRS models for different groups.
  4. We build ancestry-specific risk distributions to accurately validate and translate a PRS to specific populations and admixed individuals.
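To give a rough sense of what an ancestry-specific risk distribution does in step 4, the following Python sketch converts a raw score into a percentile relative to a reference distribution for the matching group. It is a simplified illustration and not Allelica's implementation: it assumes scores are approximately normally distributed within each group, the group labels and reference parameters are made up, and it does not handle admixed individuals.

```python
from statistics import NormalDist

# Hypothetical reference distributions: (mean, standard deviation) of the
# raw score in each reference group. Real values would be estimated from
# a large reference cohort for each group.
REFERENCE = {
    "group_a": (0.06, 0.20),
    "group_b": (0.20, 0.25),
}


def score_percentile(raw_score, group, reference=REFERENCE):
    """Percentile of a raw score within the chosen group's distribution.

    Assumes the score is approximately normal within the group.
    """
    mean, sd = reference[group]
    return 100 * NormalDist(mean, sd).cdf(raw_score)


raw = 0.20
for group in REFERENCE:
    print(f"{group}: percentile = {score_percentile(raw, group):.1f}")
```

In this example the same raw score sits at the median of one reference group but well above the median of the other, so interpreting it against the wrong distribution would misclassify the individual's relative risk.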

You can read much more detail about these developments in our recent papers (Busby et al 2023; Busby et al under review).

Conclusion

The application of polygenic risk scores across different genetic ancestries raises complexities due to population biases, genetic diversity, and differing environmental influences. However, by diversifying genomic data, employing multiancestry models, and utilizing advanced machine learning and functional genomics, Allelica has developed accurate and universally applicable PRS. These efforts ensure that the benefits of genetic research and personalized medicine are equitably distributed across all populations.

Allelica

Allelica is a Software Genomics Company developing algorithms and digital tools to accelerate the integration of Polygenic Risk Scores into clinical practice.