L2: Applying massively parallel sequencing: detecting and interpreting genome variation


51 Terms

1

Applications of massively parallel sequencing

  1. Making de novo genome sequencing feasible

  2. Sequencing cancer cell genomes

  3. Sequencing clinical isolates to identify causative pathogens

  4. Metagenomics

  5. Archaeological genomics

  6. Phylogenomics

  7. RNA-seq

2
  1. Making de novo genome sequencing feasible

  • combining PacBio (long reads) with Illumina (short reads, higher throughput)

  • together with paired-end reads

3
  2. Sequencing cancer cell genomes

  • to identify changes likely to be causative

  • to monitor cancer progression and the tumour ecosystem

  • to classify cancers for therapy

4
  3. Sequencing clinical isolates to identify causative pathogens

  • tracing their spread by their nucleotide divergence

  • e.g. how new SARS-CoV-2 variants have been identified and tracked

    • if there is no infrastructure to courier large numbers of samples from the field to sequencing centres

      • → portable nanopore sequencing offers speed and portability

5
  4. Metagenomics

Sequencing of ecosystems

  • e.g. soil, sea, gut contents

  • Single-molecule-based sequencing potentially yields a ‘metagenome’

    • i.e. parasites, viruses and bacteria that colonise the host tissue from which DNA is extracted

Note: single-molecule sequencing is highly sensitive to contamination by DNA from researchers, samplers, etc.

6
  5. Archaeological genomics

  • works on minute amounts of DNA

    • e.g. Neanderthals

7
  6. Phylogenomics

e.g.

  • 10,000 vertebrate genomes, including distant branches and endangered species → 260 to date

  • 70,000 eukaryotes in GB: 12 marine worms, 11 lepidoptera, 10 mountain bryophytes, 9 Oxfordshire earthworms, 8 diverse diptera, 7 types of apple, 6 algae cultures, 5 festive fucoids, 4 fungi, 3 coastal lichens, 2 chordates

  • GOAL → 1.5 million known eukaryotes over 10 years

8
  7. RNA-seq

  • profiling gene expression by sequencing reverse-transcribed RNA populations

9

Population genome sequence projects: cost of sequencing large numbers of individuals

  • gradually decreasing

  • but still expensive

10

Population genome sequence projects:

  1. 1000 genomes consortium

  2. UK10K consortium (2015)

  3. NHS England Genomics England

11
  1. 1000 genomes consortium

  • after this came a number of 10k genome consortia

  • NHS England now has a 100k genome consortium

  • there are others worldwide

12
  2. UK10K consortium (2015)

  • identifies rare variants in health and disease

13
  3. NHS England Genomics England

  • 100,000 genomes

14

Genomes represented

  • over-representation of white/European genomes

  • misses the full range of human genetic variation

    • especially in Africa where modern humans originated

    • and where genetic diversity is highest

15

Single nucleotide polymorphisms (SNPs) → how to find

  • since next-generation sequencing uses single molecules

    • heterozygous SNPs appear as different bases in different reactions
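
A minimal sketch of the idea above, assuming each sequencing reaction (read) reports one base at a given genome position. The function name `call_genotype` and the 20% cutoff `min_fraction` are illustrative assumptions, not part of the lecture or of any real variant caller.

```python
from collections import Counter

def call_genotype(bases, min_fraction=0.2):
    """Call the genotype at one position from the bases seen in individual reads.

    min_fraction is an illustrative cutoff (an assumption): an allele must
    appear in at least this fraction of reads to be counted as real.
    """
    counts = Counter(bases)
    depth = sum(counts.values())
    # Keep only alleles supported by enough reads; rarer bases are treated as errors
    alleles = sorted(b for b, n in counts.items() if n / depth >= min_fraction)
    if len(alleles) == 1:
        return "homozygous " + alleles[0] * 2
    if len(alleles) == 2:
        return "heterozygous " + "".join(alleles)
    return "ambiguous"

# Hypothetical pileups: each character is the base one read shows at this position
print(call_genotype("GGGGGGGGGG"))  # every read agrees
print(call_genotype("GGTGTTGTGT"))  # two bases, each in about half the reads
print(call_genotype("GGGGGGGGGT"))  # one odd read, treated as a sequencing error
```

Real pipelines also weigh base quality and read depth; this only shows why a heterozygous site shows up as two bases split across reads.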

16

What are polymorphisms?

  • alleles present at a frequency too high to be recent mutations awaiting elimination by negative selection

  • → rule of thumb is a frequency of >1%

17

Alleles that are rarer than this may be…

  • recessive lethals

  • → will eventually be eliminated

  • (rare SNPs)

18

Alleles at higher frequency…

  • more likely to be selectively neutral

  • their frequency can increase or decrease by ‘genetic drift’

    • (rare advantageous SNPs)

19

SNPs causing amino acid substitution

  • about 1/300

  • → most are silent and probably neutral

20

By sequencing many individuals

  • can make extensive SNP catalogs

but

  • may still miss SNPs from ethnic groups not sequenced

    • e.g. African populations are still undersampled!

e.g. 150,000 human genomes from diverse UK populations reveal:

  • 600M SNPs

    • (1 nucleotide in 5 in the genome)

21

SNP catalogs from diverse populations reveal…

  • highest variation (heterozygous sites per kb) in AFRICA

    • Decreasing with distance from Africa

  • THEREFORE: supporting the region as the one where modern humans originated

    • (or at least most of their genomes)

22

Origin of SNPs

  1. population sequencing of human pedigrees suggests a mutation rate of 1.3 × 10^-8 nucleotide substitutions per bp per generation

  2. Multiply the mutation rate per bp by the size of the human diploid genome

    • → ~70 new mutations per generation per diploid genome

  3. this is a tiny % of the 2M-3M SNPs in most humans (number estimated from heterozygosity)

    • BUT: a continual source of new variation

23

Note:

  • With 8 billion humans on the planet, their pan-genome carries ~560 billion new mutations per generation

  • i.e. each basepair in the genome is mutated on average ~180 times globally per generation
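
The mutation arithmetic in these cards can be checked with a short sketch. The haploid genome size of about 3.1 × 10^9 bp is an assumption (the notes do not state it), and the 2.5M SNP figure is just the midpoint of the 2M-3M range; the other numbers are taken from the notes.

```python
MUTATION_RATE = 1.3e-8       # substitutions per bp per generation (from the notes)
HAPLOID_GENOME_BP = 3.1e9    # ASSUMPTION: approximate haploid human genome size
QUOTED_NEW_MUTATIONS = 70    # per diploid genome per generation (from the notes)
WORLD_POPULATION = 8e9       # from the notes
SNPS_PER_PERSON = 2.5e6      # midpoint of the 2M-3M range in the notes

# Mutation rate per bp multiplied by the diploid genome size
estimated_new_mutations = MUTATION_RATE * 2 * HAPLOID_GENOME_BP

# New mutations as a share of the SNPs a person already carries
share_of_existing_snps = QUOTED_NEW_MUTATIONS / SNPS_PER_PERSON

# Global totals per generation, using the quoted figure of 70
global_new_mutations = QUOTED_NEW_MUTATIONS * WORLD_POPULATION
hits_per_base_pair = global_new_mutations / HAPLOID_GENOME_BP

print(f"new mutations per diploid genome: {estimated_new_mutations:.0f}")
print(f"share of existing SNPs: {share_of_existing_snps:.4%}")
print(f"new mutations worldwide: {global_new_mutations:.2e}")
print(f"hits per base pair worldwide: {hits_per_base_pair:.0f}")
```

With the assumed genome size the product comes out near 80, the same order as the ~70 quoted; the exact value depends on the genome size used. The worldwide figures (560 billion, about 180 hits per base pair) follow from the quoted 70.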

24

Mutation rate male vs female germline

  • about 4× higher in the male than in the female germline

  • increases by about 2 mutations per year with the combined age of each parent

    • mostly depending on father’s age

25

What can we learn from single-nucleotide variation?: Large-scale SNP detection

i.e. post-discovery, by hybridisation

  • classical genetic mapping identifies rare high-risk disease alleles

    • if pedigrees are available

  • BUT: poor at identifying loci that make a small contribution to diseases with heterogeneous genetic and environmental causes:

    • diabetes, multiple sclerosis, schizophrenia, bipolar disorder, asthma, heart disease, Parkinson’s disease

26

Can SNPs explain quantitative genetic variation in complex (non-Mendelian) traits?

27

Hypothesis

  • Many complex conditions or diseases are caused by alleles of numerous genes

    • each gene or allele making a small contribution to the phenotype

How to identify these genes?

28

Test: how to identify these genes

  • are any SNP alleles more likely to be associated with a given disease than you would expect if they co-occurred only at random?

29

Test: selection of SNPs for chip analysis:

  1. Common SNPs→ not rare ones, to maximise variation from limited sample size

  2. International HapMap Project

    • genotyped millions of SNPs from hundreds of mother-father-child trios from diverse populations to identify ‘haplotypes’

30

What are haplotypes?

  • genome regions where neighbouring SNPs show ‘linkage disequilibrium’

    • i.e. non-random association

31

Why are haplotypes useful?

  • not necessary to genotype every SNP

    • WHY: although haplotypes are not perfectly maintained and gradually decay by recombination over many generations

      • a single SNP from a haplotype still identifies which haplotype is present in many individuals
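
Non-random association between two neighbouring SNPs can be quantified with the standard linkage disequilibrium measures D and r². The sketch below is illustrative only: the haplotype counts are hypothetical, and the function name and the allele labels A/a and B/b are assumptions made for the example.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic SNPs from counts of the four haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total              # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total      # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / total      # frequency of allele B at the second SNP
    D = p_AB - p_A * p_B             # excess over the frequency expected at random
    r_squared = D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, r_squared

# Hypothetical haplotype counts from 100 chromosomes
print(linkage_disequilibrium(50, 0, 0, 50))    # alleles always travel together
print(linkage_disequilibrium(25, 25, 25, 25))  # alleles combine at random
print(linkage_disequilibrium(45, 5, 5, 45))    # haplotype partly decayed by recombination
```

When r² is close to 1, genotyping one SNP predicts the other, which is why a single ‘tag’ SNP can stand in for a whole haplotype.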

32

Result

  • potential to detect many SNPs that each make a small contribution to disease susceptibility

  • genome-wide association studies (GWAS)

33

Some maths of association studies

There is a SNP in the angiogenin gene with two alleles, T and G 

  • In an affected sample of 364 patients suffering from ALS (motor neurone disease): G has an allele frequency of 21% (155/728), T has a frequency of 79% (573/728)

  • In a matched unaffected sample of 299 individuals: G has an allele frequency of 14% (83/598), T has a frequency of 86% (515/598)

34

Odds Ratio

  • powerful way to compare the relative risk for ALS conferred by the G allele

Odds Ratio = (155/573) ÷ (83/515) ≈ 1.7
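
The odds ratio can be reproduced from the allele counts in the ALS example. This is a minimal sketch; the function name `odds_ratio` and its argument names are illustrative.

```python
def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds of carrying the risk allele in cases divided by the odds in controls."""
    return (case_risk / case_other) / (control_risk / control_other)

# Allele counts from the notes: G and T in 364 patients (728 alleles)
# and in 299 unaffected individuals (598 alleles)
g_cases, t_cases = 155, 573
g_controls, t_controls = 83, 515

print(f"G frequency in patients: {g_cases / (g_cases + t_cases):.0%}")
print(f"G frequency in controls: {g_controls / (g_controls + t_controls):.0%}")
print(f"Odds ratio for G: {odds_ratio(g_cases, t_cases, g_controls, t_controls):.2f}")
```

An odds ratio of 1 would mean G is equally common in both groups; a value above 1 means G is enriched among patients.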

35

How to test if this association of G with ALS is significant?

Chi-squared test:

  • highly significant P-value <0.001

  • i.e. the probability that a difference this large arises from random sampling variation alone is less than 1 in a thousand
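
A sketch of the chi-squared test on the ALS allele counts (G and T in patients: 155 and 573; in controls: 83 and 515). It uses the plain Pearson statistic without a continuity correction, which is an assumption, since the notes do not say which variant was used. For 1 degree of freedom the P value can be obtained from the standard library's `math.erfc`.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and P value (1 degree of freedom)
    for the 2x2 table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: patients, controls; columns: G alleles, T alleles
stat, p = chi_squared_2x2(155, 573, 83, 515)
print(f"chi-squared = {stat:.2f}, P = {p:.5f}")
```

The statistic comes out at about 12.2, which gives a P value below 0.001, in line with the card.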

36

In conclusion:

  • the association of the G allele with the disease is significant

  • however→ even this P value needs caution…

37

How to interpret association

  • look at the Manhattan plot

38

Pitfalls of large-scale SNP analysis

  1. If testing a single SNP→ a P value of <0.001 may appear highly significant

    • but if testing multiple SNPs→ may be an artefact of random variation!

    • → with 1M SNPs→ 1000 would show this P-value by random sampling alone

      • these would be scored as false positives (Type I statistical error)

  2. Perhaps angiogenin is causative?→ BUT it might just be closely linked to a causative gene

  3. Artefacts of non-random sampling

    • e.g. a disease more common in particular ethnic groups will show associations with alleles of other loci that are also enriched in the same ethnic group

    • → if the control and affected groups are not matched for ethnicity
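
The multiple-testing problem in pitfall 1 is simple arithmetic. The sketch below also shows a Bonferroni-style threshold; that correction and the 0.05 family-wise error level are assumptions brought in for illustration, as the notes do not name a specific correction.

```python
n_snps_tested = 1_000_000      # SNPs tested (from the notes)
p_threshold = 0.001            # per-SNP P value that looks 'highly significant'

# Number of SNPs expected to reach this P value by chance alone
expected_false_positives = n_snps_tested * p_threshold

# ASSUMPTION: Bonferroni correction at a family-wise error level of 0.05
family_wise_alpha = 0.05
corrected_threshold = family_wise_alpha / n_snps_tested

print(f"expected false positives: {expected_false_positives:.0f}")
print(f"corrected per-SNP threshold: {corrected_threshold:.0e}")
print("P = 0.001 passes corrected threshold:", p_threshold < corrected_threshold)
```

A P value just under 0.001 is far from passing once a million SNPs are tested.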

39

Caveats

  • P values used as criteria for statistical significance must be extremely stringent

    • Even then→never trust an association until it is independently replicated

but how meaningful are strong P values with small effects anyway?

40

If a SNP shows irrefutable association with disease→ what does this mean?

  • Most ‘causative’ SNPs are not protein coding→ some affect expression of adjacent genes

  • Most are of no use for prediction and therapy→ due to low odds ratios

  • HOWEVER: groups of genes affecting similar pathways/processes offer clues to disease mechanisms

    • e.g. many GWAS hits in multiple sclerosis are at immunity genes

      • this is only the start of a very long process of biological study

41

Heritability

  • genetic contribution to variation

    • estimated e.g. from twin studies

42

Dark matter: genetic contributions on SNP chips compared to heritability

  • genetic contributions of SNPs found in GWAS add up to much less than known heritability

→ There must be some kind of ‘dark matter’: additional genetic contributions that are not on SNP chips

43

 What could this dark matter be?

  1. Copy number variants

  2. Omnigenic hypothesis

    • larger samples can identify ever more genes with ever smaller effects

    • but: what is the point in knowing that anything can be affected by everything???

  3. Rare alleles of large effect?

    • Not represented in SNP screens→ may have large effects

    • recessive disease-causing alleles are now easily identified by exome sequencing

44

Exome sequencing→ to find rare alleles

  • exome sequencing of affected individuals and immediate family

    • exome?→ the 2-3% of the genome made up of exons

    • BUT CAN’T INTRONS AFFECT GENE EXPRESSION AND DISEASE?

  • Yields proportionately more severe mutations (affecting coding sequence)→ than the whole genome

    • for a fraction of the cost and time

  • NEED: careful interpretation to infer that any mutations identified are causal

45

Rare recessive alleles – the genetic “dark matter” not detected by GWAS?: how many recessive disease-causing mutations do we carry?

  • Typically (i.e. heterozygous) → 150 loss-of-function variants

    • frameshifts, stops, splice-site disruptions

  • Most are common enough (>0.5%) to be neutral

  • BUT→ 10-20 of these are rare variants

    • i.e. likely to be selected against and pathogenic if homozygous

(found by the 1000 Genomes Project consortium)

46

What do homozygotes for rare disease alleles reveal about disease mechanisms and underlying biology?

→ like screens in model organisms

  • reveal more than the small effect sizes in brute-force GWAS

    • depends on finding humans who are homozygous or ‘compound heterozygous’ (two different alleles in the same individual) for loss-of-function alleles, and understanding their phenotypes

  • Ever more ‘human knockouts’ are emerging

47

More human knockouts are emerging via

  1. exome sequencing of patients with persistent undiagnosed disease and immediate families

    • 30-50% of such individuals have good candidate loss-of-function rare alleles for their disease

  2. Phenotypically unbiased sequencing of populations with higher levels of homozygosity:

    • populations with not many founders or children of consanguineous parents

      • consanguineous→ relating to or denoting people descended from the same ancestor

48

Some examples of this

  1. From 2636 Icelandic human genomes, ~8% homozygous loss-of-function for at least one gene

  2. From 3222 Pakistani British exomes, a subset were children of first-cousin marriages; 41% of these were homozygous loss-of-function for at least one gene, most without obvious illness

  3. Screening 874 disease genes in ~500,000 genomes showed 13 individuals homozygous for 8 severe Mendelian conditions, but no symptoms

49

Human knockouts: other than human knockouts with severe effects as expected…

  • others are surprisingly mild

    • any who consent to follow-up studies are a valuable resource for biology and medicine

50

Example of these mild knockouts

APOC3→ loss-of-function individuals:

  • have lowered triglyceride lipoproteins in blood

    • will they be less susceptible to cardiovascular disease?

51
  • So is APOC3 a plausible therapeutic target?

  • perhaps genetics may improve on the limited success of big pharma in delivering novel medicines in recent decades?

    • (rather than incrementally improved ones)