Imputation is a statistical technique that “guesses” the parts of your genetic code that a common genetic test (such as an ancestry test) did not read. Instead of sequencing all 3 billion letters of your genome, which is far more expensive, the test reads a limited set of positions and the computer uses reference panels to fill in the gaps.
- The Reference Library (The Template)
For the system to know what to fill in, it needs “maps” of real humans whose entire genome has already been sequenced.
- The 1000 Genomes Project (2,504 people) was the pioneer.
- The HRC (Haplotype Reference Consortium) expanded the panel size to approximately 32,000 people.
- TOPMed is one of the most robust databases available today, with over 97,000 complete sequences.
The larger the panel, the greater the genetic diversity available. This makes it much more likely that someone in the panel has “blocks” of DNA similar to yours, which increases the accuracy of the prediction.
- How does the process work in practice?
Imagine that your test read only a scattered subset of positions in your genome (each labeled with an identifier called an rsID). The system looks at these positions and searches the reference panels for people who match you at them:
- Block A: If your letters at markers rs123, rs1234, and rs12345 are identical to those of Human #1,233 on the panel, the system assumes that you also inherited the letters that lie between those markers. It uses that human as a “template” for that region.
- Block B: Further on, at markers rs345, rs3456, and rs34567, your pattern changes and becomes similar to that of Human #3. The system switches templates and fills in what is missing with that person's sequence.
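The block-by-block template idea above can be sketched in a few lines of Python. This is a deliberately simplified toy, not how production imputation software works: real tools rely on probabilistic models rather than the fixed windows and simple match counting used here. The panel contents, the names `human_3`, `human_1233`, and `human_42`, the `impute` function, and the `window` parameter are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy reference panel: each entry is one fully sequenced
# stretch of DNA, and each character is one position on the chromosome.
PANEL: Dict[str, str] = {
    "human_3":    "ACGTTGCAAGCT",
    "human_1233": "ACCTAGCATGGT",
    "human_42":   "TCGTAGGATGCA",
}

# Your test result: only some positions were read; "?" marks the gaps.
SAMPLE = "A?C?A?C?A?C?"


def impute(sample: str, panel: Dict[str, str], window: int = 6) -> Tuple[str, List[str]]:
    """Fill the gaps block by block, copying from the best-matching template.

    Assumption for this sketch: blocks are fixed-size windows, and the best
    template is simply the one that agrees with the most typed positions.
    """
    filled_blocks: List[str] = []
    templates: List[str] = []
    for start in range(0, len(sample), window):
        chunk = sample[start:start + window]

        def matches(name: str) -> int:
            # Count typed positions where the panel member agrees with the sample.
            ref = panel[name][start:start + window]
            return sum(1 for s, r in zip(chunk, ref) if s != "?" and s == r)

        best = max(panel, key=matches)  # ties go to the first panel entry
        ref_chunk = panel[best][start:start + window]
        # Keep every letter that was actually read; copy the template only into gaps.
        filled_blocks.append(
            "".join(r if s == "?" else s for s, r in zip(chunk, ref_chunk))
        )
        templates.append(best)
    return "".join(filled_blocks), templates


imputed, used_templates = impute(SAMPLE, PANEL)
print(imputed)         # ACCTAGCAAGCT
print(used_templates)  # ['human_1233', 'human_3']
```

In this toy run the first block is copied from `human_1233` and the second from `human_3`, mirroring the switch between Block A and Block B.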
- The Result:
Your imputed genome ends up being a digital patchwork. For each segment of the chromosome, the system identifies which “neighbor” on the reference panel you most closely match. The result is a highly accurate estimate (often over 99% accurate for common variants) of your complete genome, allowing the identification of disease risks or traits that the original test did not measure directly.
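One common way to put a number on accuracy is to take a genome whose true letters are known, hide some positions, impute them, and count how often the guess matches. The sketch below shows that calculation on made-up data. The function name `concordance` and all three strings are hypothetical, and the toy figure it produces says nothing about real-world accuracy.

```python
def concordance(truth: str, imputed: str, observed: str) -> float:
    """Fraction of imputed (originally unread) positions that match the truth.

    `observed` uses "?" for positions the test did not read; only those
    positions are scored, because the others were never guessed.
    """
    gaps = [i for i, letter in enumerate(observed) if letter == "?"]
    if not gaps:
        raise ValueError("no imputed positions to score")
    correct = sum(1 for i in gaps if truth[i] == imputed[i])
    return correct / len(gaps)


# Made-up example: 6 positions were imputed and 1 of them is wrong.
truth    = "ACCTAGCAAGCT"
observed = "A?C?A?C?A?C?"
guess    = "ACCTAGCAAGCA"
print(round(concordance(truth, guess, observed), 3))  # 0.833
```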
