BIN300 W10: Genotype imputation


14 Terms

1

what is genotype imputation

estimating missing data

2

applications of imputation

SNP chip genotyping results in some no-calls (5%): impute the missing genotypes

different SNP chips from different vendors are in use, or a denser SNP chip comes on the market: do I need to re-genotype all the previously genotyped animals? Not necessarily: use imputation

a new, cheaper SNP chip comes on the market; many SNPs overlap but not all: impute the missing SNPs, both ways: SNPs on the old chip → impute on the new chip, SNPs on the new chip → impute on the old chip

3

use expensive dense chip on

important ancestors of populations

4

use cheap sparse SNP chip on

production animals

  • impute missing genotypes of the sparse chip

  • impute all genotyped animals up to high density

  • impute all genotyped animals up to whole genome sequence

example in fish breeding

genotype all parents with a dense SNP chip (~200/generation)

genotype all offspring and test-sibs with a sparse chip (many thousands/generation)

  • impute genotypes of offspring up to high density

5

general method

training set animals: high density genotypes

  • no/few missing genotypes

  • e.g. 1000 Bull Genomes project: used to impute whole-genome sequences in cattle

imputation set: animals whose genotypes need to be imputed

some methods use pedigree

  • reduces the set of relevant haplotypes (to those within the pedigree)

    • however: most imputation methods are well able to find the most relevant haplotypes on their own, so pedigree does not help (much)

      most genotype imputation is entirely based on linkage disequilibrium (LD)

6

2-step approach

  1. phasing: which alleles are next to each other on the same chromosome

  2. imputation of missing genotypes

often: prephased/imputed reference population (used for training)

one-step simultaneous phasing and imputation is more accurate, but computationally more costly (speed is especially important with WGS (whole genome sequence) data)

7

haplotype library approach

split chromosome into pieces/segments

set up library of haplotypes for each segment (prephased reference)

match sample

how large should the segments be → art of imputation

  • need some overlap between the segments: how much overlap
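The library lookup can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the notes: the sample haplotype is already phased, segments have a fixed length and do not overlap, and the best match is the library haplotype with the fewest mismatches at observed markers. The names `match_segment` and `impute_haplotype` are hypothetical.

```python
from typing import List, Optional, Sequence

Allele = Optional[int]  # 0 or 1, or None when the marker is missing


def match_segment(sample: Sequence[Allele], library: Sequence[Sequence[int]]) -> Sequence[int]:
    """Return the library haplotype with the fewest mismatches at observed markers."""
    def mismatches(hap: Sequence[int]) -> int:
        return sum(1 for s, h in zip(sample, hap) if s is not None and s != h)
    return min(library, key=mismatches)


def impute_haplotype(sample: Sequence[Allele],
                     libraries: Sequence[Sequence[Sequence[int]]],
                     seg_len: int) -> List[Allele]:
    """Fill missing alleles segment by segment from the best-matching library haplotype."""
    imputed = list(sample)
    for k, library in enumerate(libraries):
        start = k * seg_len
        segment = sample[start:start + seg_len]
        best = match_segment(segment, library)
        for j, allele in enumerate(segment):
            if allele is None:
                imputed[start + j] = best[j]
    return imputed


# Toy data: two segments of four markers, two library haplotypes per segment.
libraries = [
    [(0, 0, 1, 1), (1, 1, 0, 0)],
    [(0, 1, 0, 1), (1, 0, 1, 0)],
]
sample = [0, None, 1, None, None, 0, 1, None]
result = impute_haplotype(sample, libraries, seg_len=4)
print(result)
```

With non-overlapping segments a recombination inside a segment cannot be represented, which is one reason segment length and overlap matter in practice.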

8

Hidden Markov Model (HMM)

HMM: estimate probabilities for unobserved (hidden) states (S)

  • states are the library haplotypes (see haplotype library approach)

  • which haplotype does the animal carry on its paternal chromosome, and which on its maternal chromosome; the state can change from one position to the next due to recombination (switch to a new haplotype)

observations are the genotypes, which are sometimes missing (G)

estimating probabilities for animal i

where G = all (known) genotypes at all loci on the chromosome; the summation is over all possible states at all loci on the chromosome (only computationally feasible with the forward-backward algorithm)
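A posterior of the following form matches this description. It is a reconstruction that assumes Bayes' rule; the subscripted symbols S_i and G_i (states and genotypes of animal i) and the dummy variable S' are notation assumed here, not taken from the notes.

```latex
\[
P(S_i \mid G_i) \;=\;
\frac{P(G_i \mid S_i)\, P(S_i)}
     {\sum_{S'} P(G_i \mid S')\, P(S')}
\]
```

The denominator runs over every possible sequence of states along the chromosome, which is why it is evaluated with the forward-backward recursion rather than by enumeration.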

9

HMM

a haplotype is a mosaic of reference haplotypes

HMM: finds mosaic given some/few known marker genotypes (minimises number of crossovers between reference haplos)
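The mosaic idea can be illustrated with a small dynamic program. This is a deterministic simplification, not a real HMM: instead of probabilities it minimises a cost made of switches between reference haplotypes plus mismatches at observed markers (a Viterbi-style search). The function name `best_mosaic` and the cost values are hypothetical, and the target is treated as a single, already phased haplotype.

```python
from typing import List, Optional, Sequence, Tuple


def best_mosaic(target: Sequence[Optional[int]],
                refs: Sequence[Sequence[int]],
                switch_cost: float = 1.0,
                mismatch_cost: float = 2.0) -> Tuple[List[int], List[int]]:
    """Return (path of reference indices, imputed haplotype) with minimal total cost."""
    n_ref, n_loci = len(refs), len(target)

    def emit(h: int, j: int) -> float:
        # Missing markers cost nothing; observed markers are penalised on mismatch.
        if target[j] is not None and target[j] != refs[h][j]:
            return mismatch_cost
        return 0.0

    cost = [emit(h, 0) for h in range(n_ref)]
    back: List[List[int]] = []
    for j in range(1, n_loci):
        best_prev = min(range(n_ref), key=lambda h: cost[h])
        new_cost, pointers = [], []
        for h in range(n_ref):
            stay = cost[h]
            switch = cost[best_prev] + switch_cost
            if stay <= switch:
                new_cost.append(stay + emit(h, j))
                pointers.append(h)
            else:
                new_cost.append(switch + emit(h, j))
                pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final state.
    h = min(range(n_ref), key=lambda k: cost[k])
    path = [h]
    for pointers in reversed(back):
        h = pointers[h]
        path.append(h)
    path.reverse()
    imputed = [refs[h][j] for j, h in enumerate(path)]
    return path, imputed


# Toy data: two reference haplotypes, sparse target with two missing markers.
refs = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
target = [0, None, 0, None, 1, 1]
path, imputed = best_mosaic(target, refs)
print(path, imputed)
```

In this toy case the cheapest mosaic copies the first reference at the start and the second at the end with a single switch; where exactly the switch falls around the missing marker is a tie, which a probabilistic HMM would express as genotype probabilities instead of a hard call.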

10

Imputation software output

11

measuring imputation accuracy

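discordance between estimated and true genotype: highly dependent on allele frequency. A low-MAF (minor allele frequency) SNP is easy to impute by this measure: with freq(A) = 0.99 (MAF = 0.01), P(AA) = 0.99² ≈ 0.98 under Hardy-Weinberg frequencies, so always guessing AA is correct about 98% of the time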

squared correlation (r²) between estimated and true allele score

  • allele score is 0,1,2 for genotypes 00, 01, 11

  • less dependent on allele frequency

  • hard to get high r² for low MAF SNPs

mask some known genotypes and impute them

  • i.e. some SNPs on the sparse chip in the imputation set
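The contrast between the two accuracy measures can be checked on a toy low-MAF SNP. The sketch below assumes allele scores 0/1/2 as defined above, 100 masked animals, and a naive imputation that always calls the major homozygote; the function names and the convention of returning 0.0 when a variance is zero are assumptions for illustration.

```python
from typing import Sequence


def discordance(true: Sequence[int], imputed: Sequence[int]) -> float:
    """Fraction of genotypes where the imputed call differs from the true one."""
    return sum(t != i for t, i in zip(true, imputed)) / len(true)


def r_squared(true: Sequence[float], imputed: Sequence[float]) -> float:
    """Squared Pearson correlation between true and imputed allele scores."""
    n = len(true)
    mean_t, mean_i = sum(true) / n, sum(imputed) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true, imputed))
    var_t = sum((t - mean_t) ** 2 for t in true)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        return 0.0  # assumed convention: no variation means no measurable correlation
    return cov * cov / (var_t * var_i)


# Hardy-Weinberg expectation for the major homozygote at major allele frequency 0.99.
p_major_homozygote = 0.99 ** 2

# Masked true genotypes: 98 major homozygotes (score 0) and 2 heterozygotes (score 1).
true_scores = [0] * 98 + [1] * 2
naive_calls = [0] * 100  # always impute the major homozygote

print(p_major_homozygote)
print(discordance(true_scores, naive_calls))
print(r_squared(true_scores, naive_calls))
```

The naive imputation looks good on discordance but carries no information about who has the minor allele, which is what r² exposes.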

12

genotype imputation generally based on LD

Haplotype library method

Hidden Markov Model method

13

useful for

imputing occasional missing genotypes, imputing between different versions of SNP chips, systematically imputing low-density genotypes up to high density, possibly up to sequence data

14

output

genotype and allele probabilities, phased genotypes, genotype calls