PRS-powered disease risk reporting

Predicting disease risk with PRS

A PRS is a single number: a statistic calculated by adding up the effect sizes of the variants in a PRS panel, each weighted by the number of risk alleles (0, 1 or 2) that an individual carries at that variant. Although there are multiple breast cancer PRS, and multiple ways of developing a new one, we’re going to assume that we have a score (such as ours) that has been validated and tested on independent datasets and that has a robust association with breast cancer risk.
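
In code, this calculation is usually written as a weighted sum of allele dosages. The sketch below is a minimal illustration, not Allelica’s implementation; the variant IDs, weights and the `polygenic_risk_score` helper are hypothetical, and treating a missing genotype as dosage 0 is a simplification (real pipelines usually impute missing genotypes).

```python
from typing import Mapping


def polygenic_risk_score(weights: Mapping[str, float],
                         dosages: Mapping[str, int]) -> float:
    """Weighted sum of risk-allele dosages.

    weights: effect size (e.g. log odds ratio) for each variant in the PRS panel
    dosages: number of risk alleles (0, 1 or 2) the individual carries per variant
    Variants absent from `dosages` contribute 0 here (a simplifying assumption).
    """
    return sum(beta * dosages.get(variant, 0) for variant, beta in weights.items())


# Hypothetical three-variant panel and genotype, for illustration only.
panel = {"rs0001": 0.10, "rs0002": -0.05, "rs0003": 0.20}
genotype = {"rs0001": 2, "rs0002": 1, "rs0003": 0}
print(round(polygenic_risk_score(panel, genotype), 4))  # 0.15
```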

PRS are approximately normally distributed in a population

PRS are generated by adding up the effects of hundreds to millions of alleles at common variants spread across the genome. When a PRS is applied to a population and the scores are plotted, the result is an approximately normal distribution, as expected for a sum of many small contributions. Most people carry a moderate number of risk alleles, resulting in a score somewhere in the middle of the distribution. A smaller number of individuals carry either many risk alleles, and therefore have very high scores, or fewer risk alleles than average, leading to low scores.
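
A small simulation shows why the distribution looks normal. This is a sketch under stated assumptions, not a model of any real panel: the 300 variants, their frequencies and effect sizes are randomly generated placeholders, and variants are assumed independent (no linkage disequilibrium) and in Hardy-Weinberg equilibrium, so each dosage is Binomial(2, p) with mean 2p and variance 2p(1 - p).

```python
import random
import statistics


def simulate_prs(n_people: int, allele_freqs: list[float], effects: list[float],
                 seed: int = 0) -> list[float]:
    """Simulate raw PRS values for a synthetic population.

    Assumes independent variants in Hardy-Weinberg equilibrium, so the
    risk-allele dosage at a variant with frequency p is Binomial(2, p).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_people):
        score = 0.0
        for p, beta in zip(allele_freqs, effects):
            dosage = (rng.random() < p) + (rng.random() < p)  # two allele draws: 0, 1 or 2
            score += beta * dosage
        scores.append(score)
    return scores


# Hypothetical panel: 300 variants with random frequencies and small effects.
setup = random.Random(42)
freqs = [setup.uniform(0.05, 0.5) for _ in range(300)]
betas = [setup.gauss(0.0, 0.05) for _ in range(300)]
scores = simulate_prs(2_000, freqs, betas)

# Theoretical mean and SD of the score under the same assumptions.
expected_mean = sum(2 * p * b for p, b in zip(freqs, betas))
expected_sd = sum(2 * p * (1 - p) * b * b for p, b in zip(freqs, betas)) ** 0.5

mean = statistics.fmean(scores)
sd = statistics.stdev(scores)
# Share of people within one SD of the mean (about 68% for a normal distribution).
within_1sd = sum(abs(s - mean) <= sd for s in scores) / len(scores)
print(f"mean={mean:.3f} (expected {expected_mean:.3f})")
print(f"sd={sd:.3f} (expected {expected_sd:.3f}), within 1 SD: {within_1sd:.1%}")
```

Most simulated people land near the middle, with progressively fewer in either tail, which is the shape described above.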

We calculated Maria’s raw PRS for breast cancer and compared it with a reference distribution (left). To aid interpretation and to define quantiles of the PRS distribution, the distribution is standardized to have a mean of 0 and a standard deviation of 1 (right). Her standardized score shows that Maria’s PRS is 1.4 standard deviations above the average score.
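
Standardization is the usual z-score transformation: subtract the reference mean and divide by the reference standard deviation. A minimal sketch; the `standardize` helper and the six reference scores are hypothetical placeholders rather than real data.

```python
import statistics


def standardize(raw_score: float, reference_scores: list[float]) -> float:
    """Convert a raw PRS into a z-score relative to a reference distribution."""
    mu = statistics.fmean(reference_scores)
    sigma = statistics.stdev(reference_scores)
    return (raw_score - mu) / sigma


reference = [0.8, 1.0, 1.2, 1.0, 0.9, 1.1]  # hypothetical raw reference scores
print(round(standardize(1.3, reference), 2))  # 2.12
```

After this transformation the reference scores themselves have mean 0 and standard deviation 1, so a z-score can be read directly as "standard deviations above or below average".
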
The reference population is divided into percentiles to identify where Maria’s PRS falls (this chart shows only the decile boundaries). Her PRS, marked by the red line, is in the 91st percentile of the distribution.
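
A percentile can be read off the reference scores directly, or, if the standardized scores are assumed to follow a standard normal distribution, from the normal cumulative distribution function. A sketch using only the Python standard library; the helper names are hypothetical. Under the normal assumption, a z-score of 1.4 means about 91.9% of the reference population scores lower, which is consistent with the 91st percentile.

```python
import math
from bisect import bisect_left
from statistics import NormalDist


def empirical_percentile(score: float, reference_scores: list[float]) -> float:
    """Percentage of reference scores strictly below `score`."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


def normal_percentile(z: float) -> float:
    """Percentile implied by a z-score, assuming standardized PRS ~ N(0, 1)."""
    return 100.0 * NormalDist().cdf(z)


pct = normal_percentile(1.4)
print(f"{pct:.1f}% of reference scores are lower -> percentile {math.floor(pct)}")
```

The empirical version makes no distributional assumption and is preferable when the reference cohort is large; the normal version is a convenient approximation.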

Developing ancestry-specific reference populations

One of the main considerations when building a reference population is that its genetic ancestry should match that of the individual(s) whose risk you plan to predict. The reference population also needs the genetic and clinical data required to build a prediction model that can then be applied to new individuals like Maria. It’s also helpful, although not essential, for the PRS itself to have been developed in an ancestry group that matches Maria’s genetic ancestry.
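
The practical effect of ancestry matching is that the same raw score is standardized against a different mean and spread depending on the reference chosen. The sketch below assumes one reference cohort per ancestry label; the labels, scores and the `standardize_for_ancestry` helper are hypothetical. Some pipelines instead adjust the raw score for genetic principal components, which is not shown here.

```python
import statistics

# Hypothetical raw PRS values from two ancestry-specific reference cohorts.
# The same panel can have a different mean and spread in different ancestry
# groups, for example because allele frequencies differ between them.
REFERENCES = {
    "EUR": [0.9, 1.0, 1.1, 1.0, 0.95, 1.05],
    "AFR": [0.5, 0.6, 0.7, 0.6, 0.55, 0.65],
}


def standardize_for_ancestry(raw_score: float, ancestry: str) -> float:
    """Standardize against the reference cohort matching the individual's
    genetic ancestry; refuse to score if no matching reference exists."""
    if ancestry not in REFERENCES:
        raise ValueError(f"no reference population for ancestry {ancestry!r}")
    ref = REFERENCES[ancestry]
    return (raw_score - statistics.fmean(ref)) / statistics.stdev(ref)


# Same raw score, very different interpretation depending on the reference.
for group in REFERENCES:
    print(group, round(standardize_for_ancestry(1.0, group), 2))
```

Raising an error rather than silently falling back to another cohort reflects the point above: a mismatched reference can badly misplace someone in the distribution.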

PRS percentile to relative and absolute risk

If Maria’s genetic ancestry is European, we can define the reference population using data from many thousands of women of European ancestry. A good example of such a dataset is the UK Biobank. Because we know which women in this reference population developed the disease, we can estimate Maria’s relative risk by comparing disease outcomes in women with scores like hers with those in women with average or lower scores.
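
At its simplest, relative risk is the ratio of disease incidence in one PRS group to incidence in a comparison group. A minimal sketch, assuming each reference participant has a PRS percentile and a yes/no disease outcome; the function names and toy data are hypothetical, and a real analysis would typically use time-to-event models rather than simple proportions. The comparison group here is "everyone below the cutoff"; it could equally be women in the middle of the distribution.

```python
def relative_risk(cases_exposed: int, n_exposed: int,
                  cases_unexposed: int, n_unexposed: int) -> float:
    """Incidence in the exposed group divided by incidence in the comparison group."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


def rr_top_vs_rest(percentiles: list[float], had_disease: list[bool],
                   cutoff: float) -> float:
    """Relative risk for people at or above a PRS percentile cutoff versus
    everyone below it. Both groups must be non-empty."""
    top = [d for p, d in zip(percentiles, had_disease) if p >= cutoff]
    rest = [d for p, d in zip(percentiles, had_disease) if p < cutoff]
    return relative_risk(sum(top), len(top), sum(rest), len(rest))


# Hypothetical counts: 200 cases among 1,000 women in the top decile versus
# 900 cases among the other 9,000 women gives a relative risk of 2.
print(relative_risk(200, 1_000, 900, 9_000))
```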

PRS distributions can be used to translate a specific score into relative and absolute risk. On the left, the risk relative to the remainder of the population is shown for the tail of the distribution. On the right, Maria’s PRS percentile translates to an absolute lifetime risk of breast cancer of almost 23%, based on a risk model derived from a reference population.
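
The last step of that translation, going from a percentile to an absolute risk, can be pictured as a lookup into a table produced by the reference-population risk model. In the sketch below the band edges, the risk values and the `lifetime_risk_from_percentile` helper are all hypothetical placeholders, not real estimates; real risk models typically also account for age-specific incidence and competing risks.

```python
from bisect import bisect_right

# Hypothetical lookup table: lower edge of each PRS percentile band and the
# lifetime risk a reference-population model assigns to that band.
# PLACEHOLDER VALUES for illustration only, NOT real risk estimates.
BAND_LOWER_EDGES = [0, 10, 20, 40, 60, 80, 90, 95]
BAND_LIFETIME_RISK = [0.05, 0.07, 0.09, 0.11, 0.13, 0.17, 0.21, 0.27]


def lifetime_risk_from_percentile(percentile: float) -> float:
    """Return the modelled lifetime risk for the band containing `percentile`."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    band = bisect_right(BAND_LOWER_EDGES, percentile) - 1
    return BAND_LIFETIME_RISK[band]


print(f"{lifetime_risk_from_percentile(91):.0%}")
```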

Reporting risk

So, we now know Maria’s PRS percentile, having adjusted for her ancestry where needed (for example, if she is not of European ancestry), and we have assessed her relative and absolute risk of disease by comparing her PRS with an ancestry-specific reference distribution. The final step of the process is to communicate Maria’s breast cancer risk back to her. There are a variety of ways to do this. It’s possible to simply give someone their PRS percentile, which implicitly tells them how their score compares with others’, but, as we mentioned at the top, from a clinical point of view this provides no value. We can do much better by providing an assessment of absolute risk, which gives actionable information and empowers individuals to actively mitigate their risk. When designing such a report, several questions need to be answered:

  • Is the report physician-focused or patient-focused? This affects the language and tone of any resulting communication and determines how much educational content might be required.
  • Can absolute risk, incorporating clinical factors, PRS and potentially rare pathogenic variation, be communicated?
  • Are there clear risk thresholds from disease guidelines that can be used to align genomics-integrated predictions of risk with current standard of care?
  • Are there recommendations that can be applied to help the individual mitigate any extra risk?

A guidelines-first approach to defining risk thresholds

At Allelica, our risk reports take a guidelines-first approach to defining high risk. According to the American Cancer Society, women with a lifetime risk of breast cancer of about 20% to 25% or greater are considered high risk. Given an average lifetime risk of breast cancer of around 10–12% in the general population, these thresholds correspond to roughly twice the ‘average’ risk (between about 1.7 and 2.5 times). Other authors have suggested, as a rule of thumb, that a factor that at least doubles your risk of disease might be described as high risk. Although this is a relative measure, made by comparing a group against either an average individual or the remainder of the population, it provides a benchmark that we can use with PRS, and with more sophisticated models that integrate PRS with other risk factors, to provide estimates of risk.
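
The arithmetic behind comparing the guideline threshold with average risk, and the threshold check itself, can be sketched as follows. The 20% cut-off and the 10–12% average come from the text; the function names are hypothetical, and a real report would apply whatever threshold the relevant guideline specifies.

```python
HIGH_RISK_THRESHOLD = 0.20  # lower end of the guideline's 20-25% lifetime-risk range


def risk_ratio_to_average(lifetime_risk: float, average_lifetime_risk: float) -> float:
    """How many times the population-average lifetime risk this risk represents."""
    return lifetime_risk / average_lifetime_risk


def is_high_risk(lifetime_risk: float, threshold: float = HIGH_RISK_THRESHOLD) -> bool:
    """Flag an absolute lifetime risk at or above the guideline threshold."""
    return lifetime_risk >= threshold


# Guideline thresholds expressed relative to an average lifetime risk of 10-12%.
for threshold in (0.20, 0.25):
    for average in (0.10, 0.12):
        ratio = risk_ratio_to_average(threshold, average)
        print(f"{threshold:.0%} vs {average:.0%} average: {ratio:.2f}x")

print(is_high_risk(0.23))  # an absolute risk of about 23% is above the 20% threshold
```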

Maria’s PRS puts her above the guideline high-risk threshold (20%).
An example of Allelica’s breast cancer risk report incorporating PRS, rare pathogenic mutations and family history of disease.


Genetic information from PRS has the potential to inform risk management approaches for populations and for individuals. We have known for a long time that the genetic risk of complex disease results from both highly penetrant rare pathogenic mutations and polygenic variants. Until recently, however, assessing this polygenic component has not been possible. PRS provide one tool for assessing it, and their predictive performance is becoming increasingly powerful.

Allelica is a Software Genomics Company developing algorithms and digital tools to accelerate the integration of Polygenic Risk Scores into clinical practice.