L3: Applying massively parallel sequencing (Pt2)

1

Evolution and SNPs: SNP life cycles

SNPs are born by mutation and may drift in frequency or face selection over their evolutionary life cycle

General tendencies:

  1. Neutral SNPs (most SNPs)→ drift up or down in frequency

    • eventually facing loss or fixation

  2. Deleterious SNPs→ usually remain rare

    • until lost by selection

  3. Advantageous SNPs→ usually increase in frequency

    • → towards fixation

Note: they are all subject to random fluctuations, which can be significant, e.g. in small populations

N here is the population size (see E and B notes for how this equation is derived)

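The drift of neutral SNPs towards loss or fixation can be explored with a Wright-Fisher simulation, a standard population-genetics model that is not itself part of these notes. A minimal Python sketch, assuming a constant-size diploid population, random mating and no selection; the function names are hypothetical:

```python
import random


def wright_fisher(n_individuals, start_freq, max_generations=100_000, seed=None):
    """Follow one neutral biallelic SNP in a diploid population of constant size N.

    Each generation the 2N gene copies are drawn at random (binomial sampling)
    from the current allele frequency: pure drift, no selection.
    Returns (final_frequency, generations_elapsed).
    """
    rng = random.Random(seed)
    copies = 2 * n_individuals
    count = round(start_freq * copies)
    for generation in range(1, max_generations + 1):
        freq = count / copies
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        if count in (0, copies):  # loss (0 copies) or fixation (all 2N copies)
            return count / copies, generation
    return count / copies, max_generations


def count_fates(n_individuals, n_snps, seed=None):
    """Follow many new neutral mutations, each starting as a single copy (1/2N)."""
    rng = random.Random(seed)
    lost = fixed = 0
    for _ in range(n_snps):
        final, _ = wright_fisher(n_individuals, 1 / (2 * n_individuals),
                                 seed=rng.randrange(2**32))
        if final == 0.0:
            lost += 1
        elif final == 1.0:
            fixed += 1
    return lost, fixed


lost, fixed = count_fates(n_individuals=50, n_snps=1000, seed=1)
print(f"lost: {lost}, fixed: {fixed}")
```

With 2N = 100 gene copies almost every new neutral mutation is lost; standard neutral theory gives a fixation probability of 1/(2N) for a new neutral mutation, about 1% here. Increasing `n_individuals` slows drift, so loss or fixation takes more generations.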
2

Gene divergence

  • sequence divergence between species from a common ancestor

    • which gives rise to ‘orthologs’

      • (as opposed to paralogs→ formed by gene duplication within a species)

3

Divergence rate reflects…

  • a molecular clock:

    • Allows estimations of:

      1. divergence time 

      2. phylogenetic relationships
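Under the simplest molecular-clock assumption (a constant substitution rate r in both lineages and no correction for repeated changes at the same site), the proportion of differing sites K grows as K ≈ 2rT, so the divergence time is T = K / (2r). The factor 2 is there because both lineages have accumulated changes since the split. A minimal Python sketch; the numbers in the example are illustrative, not data from these notes:

```python
def divergence_time(differences, sites, rate_per_site_per_year):
    """Years since two lineages split, assuming a constant molecular clock.

    K = differences / sites is the observed proportion of differing sites.
    Both lineages accumulate substitutions after the split, so K ~= 2 * r * T.
    No multiple-hit correction is applied, so times for distant pairs are underestimated.
    """
    if sites <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("sites and rate must be positive")
    k = differences / sites
    return k / (2 * rate_per_site_per_year)


# Illustrative values: 120 differences in 10,000 aligned sites, r = 1e-9 per site per year
print(f"{divergence_time(120, 10_000, 1e-9):.3g} years")  # 6e+06 years
```

If the clock speed differs between proteins or species, plugging in a single r gives misleading times.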

4

Clock speed can vary among

  1. proteins

  2. species

THEREFORE→ can result in misleading trees if interpretation is too naïve

5

How to detect positively selected alleles that drive evolutionary change?

Some causative changes should show traces of positive selection:

  1. higher rate of change or polymorphism for ‘non-synonymous’ positions that cause amino acid changes

    • compared to ‘silent’ nucleotide changes→ e.g. 3rd codon positions

  2. selective sweep→ a new mutation is selected for, and linked polymorphisms are swept along with it (hitchhiking)

    • see examples below

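The "higher rate at non-synonymous positions" signal is usually summarised as the ratio of non-synonymous to synonymous change per site (dN/dS, also written Ka/Ks). A simplified Python sketch, assuming the numbers of non-synonymous and synonymous sites and differences have already been counted from a codon alignment, and using uncorrected proportions rather than the substitution-model corrections real tools apply; the example counts are hypothetical:

```python
def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ratio of non-synonymous to synonymous change per site (uncorrected p-distances)."""
    p_n = nonsyn_diffs / nonsyn_sites  # amino-acid-changing differences per non-synonymous site
    p_s = syn_diffs / syn_sites        # silent differences per synonymous site
    if p_s == 0:
        raise ValueError("no synonymous differences: ratio undefined")
    return p_n / p_s


def interpret(ratio):
    """Rough reading of the ratio: >1 suggests positive selection on amino-acid changes."""
    if ratio > 1:
        return "excess amino-acid change: consistent with positive selection"
    if ratio < 1:
        return "amino-acid change suppressed: consistent with purifying selection"
    return "consistent with neutral change"


ratio = dn_ds(nonsyn_diffs=30, nonsyn_sites=750, syn_diffs=5, syn_sites=250)
print(ratio, interpret(ratio))  # 2.0 excess amino-acid change: consistent with positive selection
```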
6
  1. Examples of sweeps from human-chimp comparisons

  • immune proteins in host-parasite ‘arms-races’

  • olfactory receptors

7

Other evidence: archaeological DNA shows recent positive selection

  1. e.g 1→ alleles that give lighter skin colour→ 10k-20k years ago

    • affecting some genes independently in Europe and East Asia

  2. e.g. 2→ adult persistence of lactase→ <3k years ago

8

Other sweeps in human populations

  1. Immune loci→ specific disease resistance

  2. Altitude→ genes controlled by HIF (hypoxia-inducible factor)

    • selected haplotype in Tibetans appears to originate from Denisovan hominins

  3. Arctic diet: decreased fatty acid import into mitochondria

9

Nucleotide variation: summary of SNPs

  • SNPs are a reflection of our individuality and a snapshot of evolution in action

  • Common SNPs→ useful tool for GWAS and mapping inheritance of common traits

    • BUT: account for only a fraction of heritability

  • Rare alleles→ from large banks of exome and genome sequences

    • coupled with phenotypic data

    • → VALUABLE source of genetic data on human traits and diseases

  • Selected differences between different human populations→ informative about the genetics of human physiology

  • HOWEVER: there are still large gaps in resources in non-white populations

10

Structural variants in genomes and how they evolve:

Copy number variants:

  • these are duplications and deletions of DNA

11

How do copy number variants arise?

  1. Polyploidisation

  2. Unequal Crossovers

12
  1. Polyploidisation

  • duplicates every gene in the genome→ whole genome duplication (WGD)

    • E.g. yeast→ underwent such a duplication, THEN lost 90% of the duplicated genes

    • E.g.→ evidence for similar events in teleost fish

13
  1. Unequal crossovers

  • give rise to tandem gene families

    • perhaps initially between two copies of repetitive sequence

    • THEN

    • between copies of the duplicated genes

  • Allows for→ frequent changes in copy number during evolution

14
  1. Unequal crossing over→ between short repetitive elements

  1. when repeats are in the same orientation→ a deletion and a duplication result

  2. recombination between inverted copies of a repeat in the same chromosome results in→ an inversion

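The two outcomes of recombination between repeats can be illustrated on toy DNA strings. A minimal Python sketch, assuming identical repeat copies R (so a crossover anywhere inside the mispaired repeats gives the same products) and modelling the inverted-repeat case as recombination within one chromosome; the sequences and function names are hypothetical:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def direct_repeat_crossover(left, repeat, middle, right):
    """Unequal crossing over between two copies of left+R+middle+R+right.

    The second R of chromosome A mispairs with the first R of chromosome B;
    a crossover within the paired repeats gives one duplication product
    and one deletion product.
    """
    duplication = (left + repeat + middle + repeat) + (middle + repeat + right)
    deletion = (left + repeat) + right
    return duplication, deletion


def inverted_repeat_recombination(left, repeat, middle, right):
    """Recombination between inverted copies (R ... revcomp(R)) on one chromosome
    flips the segment between them: an inversion."""
    before = left + repeat + middle + revcomp(repeat) + right
    start, end = len(left), len(before) - len(right)
    after = before[:start] + revcomp(before[start:end]) + before[end:]
    return before, after


duplication, deletion = direct_repeat_crossover("AAAA", "GGAC", "TTGCA", "CCCC")
print(duplication)  # AAAAGGACTTGCAGGACTTGCAGGACCCCC
print(deletion)     # AAAAGGACCCCC
_, inverted = inverted_repeat_recombination("AAAA", "GGAC", "TTGCA", "CCCC")
print(inverted)     # AAAAGGACTGCAAGTCCCCCC
```

The duplication and deletion products together contain exactly the DNA of the two starting chromosomes, while the inversion keeps the length and only reverses the orientation of the segment between the repeats.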
15
  1. Unequal crossing over→ between tandemly duplicated genes

  • increases or decreases copy number

  • A decrease:

    • can remove a functional copy

    • is often deleterious

  • Crossover within a gene:

    • creates hybrid genes: the N-terminal part of one copy joined to the C-terminal part of the other

16

Detecting, counting and characterising structural variants: originally identified from phenotypes

Given the frequency of indels and CNVs, most probably have no major phenotype, but some CNVs do have phenotypic consequences:

  1. Thalassemias→ unequal crossovers in globin genes

  2. Colourblindness→ unequal crossovers in red and green opsin genes

    • which themselves originated only recently, as a duplication in primates

  3. Prader-Willi syndrome→ deletion in chromosome 15 (15q11-q13) that causes intellectual disability

The more carefully we look for structural variants, the more we have found

17

Other ways of detecting structural changes→ sequencing a few individuals

  • detected hundreds of structural changes relative to the ‘reference’ sequence

  • 90% insertions or deletions (indels)

Ways to do this:

  1. Local read frequency (Illumina) in short-read genome sequencing

  2. Long-read sequencing (Nanopore)

18
  1. Local read frequency (Illumina) in short-read genome sequencing

  • long read frequency improves resolution and detects more variants

  • e.g in 2500 whole human genomes

    • each individual showed 1500 heterozygous and 2 homozygous deletions

19
  1. Local read frequency (Illumina) in short-read genome sequencing: OVERALL, what has been found

  • 5800 homozygous deletions in this study covered 240 genes

    • apparently dispensable for human life

      • → Time will tell how these individuals manage without them

Graph:

  • Blue line→ control

  • Green line dropping to the bottom→ homozygous deletion

  • Green line dropping halfway to the bottom→ heterozygous deletion

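The read-depth idea behind the graph can be sketched as: in windows along the genome, compare the sample's read count with a control, double the ratio to estimate copy number (the control is assumed to be diploid everywhere), and round. A minimal Python sketch; the window counts are made up, and real callers also normalise for effects such as GC content and mappability and use statistical segmentation rather than per-window rounding:

```python
def call_copy_number(sample_depth, control_depth):
    """Per-window copy-number calls from read depth relative to a diploid control.

    Assumes both inputs are per-window read counts already normalised to the
    same total sequencing depth.
    """
    labels = {0: "homozygous deletion", 1: "heterozygous deletion", 2: "normal"}
    calls = []
    for sample, control in zip(sample_depth, control_depth):
        if control <= 0:
            calls.append((None, "no coverage in control"))
            continue
        copies = round(2 * sample / control)  # control window assumed to hold 2 copies
        calls.append((copies, labels.get(copies, "duplication")))
    return calls


sample = [30, 29, 15, 14, 0, 1, 45]
control = [30, 30, 30, 30, 30, 30, 30]
for window, (copies, label) in enumerate(call_copy_number(sample, control)):
    print(window, copies, label)
```

Windows whose depth falls to about half the control correspond to the "halfway" dips (heterozygous deletions), and windows near zero to the dips that reach the bottom (homozygous deletions).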
20
  1. Long read sequencing (Nanopore)

  • Identified even more variation:

    • e.g. in 3622 Icelanders, >22k structural variants per individual

      • (13k insertions, 9k deletions)

      • A few are associated with phenotypes (e.g. low LDL, height)

      • Most aren’t

21

OVERALL: the bottom line

We have a lot of structural variation:

  • affecting much more of our genome than SNPs

    • Some of it can cause genetic disease

    • BUT

    • A surprising amount DOES NOT

22

Evolutionary consequences of this

  • increased gene number

23

Increased gene number: fate

  • Homologous genes formed by duplication in the same species are PARALOGOUS GENES

Have two (or three?) fates:

  1. Redundancy of function→ allows one copy to mutate into a non-functional pseudogene over several Myr

  2. New/ more specialised functions→ increase developmental and physiological complexity→ multigene families

  3. Increased gene dosage→ e.g rRNA, histones

  4. Duplications are a source of developmental and physiological complexity

    • e.g. globin and Hox genes

  5. Duplication of protein domains→ new protein architectures with combinations of domains

  6. Truly new domains appear occasionally (but hard to verify)

24

About 1000 types of domains…

  • give rise to diversity of proteins we see today by

  • duplication and divergence

25

VNTRs

  • Variable number tandem repeats

    • → short repeats with higher copy number

Types:

  1. satellite sequences

  2. Minisatellites

  3. microsatellites

26
  1. Satellite sequences, features

  • WHAT: tandem repeats, generally heterochromatic, condensed

  • WHERE: most in centromeric regions

    • some in telomeric regions

  • SIZE: repeat units vary from a few bp to several hundred bp

  • COPY NUMBERS: 10⁴→ 10⁶

  • HOW MUCH OF GENOME: 10-40% of the genome

    • humans→ 10%

27
  1. Why is there so much satellite DNA? 

Questions to ask:

  • does it have a function?

  • Selfish?→ replicates at the expense of the rest of the genome or the organism

  • Tolerated?

  • Junk? no function

  1. What Drosophila chromosomes with large satellite deletions suggest:

    • not much (if any) is needed

  2. Mammalian centromeres evidence?

    • long repeat arrays (unlike yeast)

    • BUT→ if one array is deleted→ an array with different sequence can substitute as a new centromere

28
  1. Mini/micro satellites compared to satellites

  • more dispersed through the genome

29
  1. Minisatellites

  • COPY NUMBER: 3-100 copies

  • LENGTH: 30bp

  • WHERE: 10³ locations per genome

30
  1. Microsatellites

  • COPY NUMBER: 10-50 copies

  • LENGTH: 2-6 bp

  • WHERE/HOW MANY IN MAMMAL GENOME: one every few kb→ >10⁵ per mammal genome

31

PCR analysis of these shows: (and applications)

  1. REPEAT NUMBER: High degrees of polymorphism in repeat number→ allows DNA fingerprinting

  2. INSTABILITY OF NO.: High rate of instability in repeat number→ 0.1-1% per locus per generation

    • a bias towards increased repeat number in succeeding generations

  3. DIVERGENCE: A small degree of sequence divergence between repeats 

    • for minisatellites

    • Expected→ since homogenisation of an array is not instantaneous

  • microsatellites are stable enough to be inherited over a few generations, yet highly polymorphic between individuals→ so they can be used for DNA fingerprinting

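DNA fingerprinting with microsatellites compares repeat numbers at several loci. A minimal Python sketch, assuming each profile is a dictionary from a (hypothetical) locus name to the two repeat counts of a diploid individual; the second function turns the 0.1-1% per-locus instability rate into an expected number of repeat-number changes for a panel of loci:

```python
def same_profile(profile_a, profile_b):
    """True if two microsatellite profiles carry the same repeat numbers at every locus.

    Each profile maps locus name -> (repeat count, repeat count); the two alleles
    are unordered, so (12, 15) matches (15, 12).
    """
    if profile_a.keys() != profile_b.keys():
        raise ValueError("profiles must cover the same loci")
    return all(sorted(profile_a[locus]) == sorted(profile_b[locus]) for locus in profile_a)


def expected_changes(n_loci, rate_per_locus_per_generation):
    """Expected number of repeat-number changes per generation across a panel of loci."""
    return n_loci * rate_per_locus_per_generation


sample_a = {"locus_1": (12, 15), "locus_2": (8, 8), "locus_3": (20, 23)}
sample_b = {"locus_1": (15, 12), "locus_2": (8, 8), "locus_3": (20, 23)}
print(same_profile(sample_a, sample_b))  # True
# A hypothetical 20-locus panel at the low and high ends of 0.1-1% per locus per generation
print(expected_changes(20, 0.001), expected_changes(20, 0.01))
```

The high polymorphism makes exact matches between unrelated people unlikely, while the instability rate means an occasional one-repeat difference between parent and child is expected.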
32

Mechanisms of tandem repeat dynamics

  1. Unequal crossing over

  2. Replication slippage 

  3. Selection

33
  1. Unequal crossing over

  • no bias towards gain or loss

34
  1. replication slippage

  • Could give either gain or loss of repeats

  • In Vivo→ there seems to be a bias towards gain

    • WHY: stable 5’ ‘flaps’ that are hard to remove from Okazaki fragments

35
  1. Selection

  • most eukaryotic genomes can tolerate long repeat arrays

BUT

  • at some length→ there must be selection against long arrays

    • this will reduce average repeat number

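The three mechanisms can be combined in a toy simulation of a population of repeat arrays: unequal crossing over moves a unit between two arrays without bias, slippage adds or removes a unit with a bias towards gain, and selection stops over-long arrays being passed on. A minimal Python sketch; every parameter value, including the length limit, is an illustrative assumption rather than a measured rate:

```python
import random


def simulate_repeat_arrays(n_arrays, start_copies, generations,
                           p_slip=0.05, p_gain=0.7, p_unequal=0.02,
                           max_tolerated=40, seed=None):
    """Toy model of copy number in a population of tandem-repeat arrays."""
    rng = random.Random(seed)
    arrays = [start_copies] * n_arrays
    for _ in range(generations):
        # 1. Unequal crossing over between paired arrays: one gains exactly
        #    what the other loses, and the direction is random (no bias).
        for i in range(0, n_arrays - 1, 2):
            if rng.random() < p_unequal:
                gainer, loser = (i, i + 1) if rng.random() < 0.5 else (i + 1, i)
                if arrays[loser] > 1:
                    arrays[gainer] += 1
                    arrays[loser] -= 1
        # 2. Replication slippage: gains more likely than losses when p_gain > 0.5.
        for i in range(n_arrays):
            if rng.random() < p_slip:
                step = 1 if rng.random() < p_gain else -1
                arrays[i] = max(1, arrays[i] + step)
        # 3. Selection: arrays longer than max_tolerated leave no descendants;
        #    the next generation is drawn at random from the survivors.
        survivors = [c for c in arrays if c <= max_tolerated]
        if not survivors:
            return []
        arrays = [rng.choice(survivors) for _ in range(n_arrays)]
    return arrays


arrays = simulate_repeat_arrays(n_arrays=200, start_copies=10, generations=500, seed=7)
if arrays:
    print(min(arrays), sum(arrays) / len(arrays), max(arrays))
```

Setting `p_gain=0.5` removes the slippage bias, and raising `max_tolerated` lets arrays expand further before selection caps them, so the average repeat number reflects the balance between the two.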
36

What this diagram shows: Replication slippage

  1. Daughter strand looping out→ due to mispairing

  2. Unpaired 5’ end of a daughter strand in an Okazaki fragment

  3. a stable G-rich non-Watson-Crick secondary structure

    • more stable than pairing of the entire Okazaki fragment to the template

    • e.g.→ in yeast, when the flap endonuclease is mutated

37

Deleterious effects of repeat expansion→ pathogenic microsatellites example:

Huntington’s Disease (HD)

38

What is the HD gene?

  • has a CAG repeat

  • encodes a polyQ stretch in the protein

    • prone to aggregate as the polyQ stretch expands

    • forms neurotoxic protein aggregates

    • which disrupt many cellular processes and are toxic to cells

note: polyQ is poly-glutamine

39

HD gene in most humans vs affected

  • Normal→ 10-35 copies of CAG

  • Affected→ >40
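Counting the CAG repeat in a sequence and comparing it with these ranges can be sketched in Python. The regular expression finds uninterrupted runs of CAG; the cut-offs are exactly those on this card, and counts that fall outside both ranges (e.g. 36-40) are flagged rather than classified, because these notes do not cover them. The example sequence is made up:

```python
import re


def longest_cag_run(seq):
    """Number of CAG units in the longest uninterrupted CAG run of a DNA sequence."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_hd_repeat(cag_copies):
    """Classify a CAG copy number using only the ranges given in these notes."""
    if 10 <= cag_copies <= 35:
        return "normal range"
    if cag_copies > 40:
        return "affected range"
    return "outside the ranges given here"


example = "ATG" + "CAG" * 43 + "CAACAG" + "CCG" * 7  # hypothetical sequence
n = longest_cag_run(example)
print(n, classify_hd_repeat(n))  # 43 affected range
```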

40

What do more repeats mean?

  • more severe degeneration

41

Repeat severity over generations

  • repeat number increases through generations

  • until onset becomes so early that affected individuals die before reproducing, and the expansion dies out

42

Why can it take decades for HD to emerge, if expanded polyQ is toxic to cells?

Evidence from: single cell sequencing of neurons from fresh postmortem brain:

  • somatic repeat expansion:

    1. CAG repeat length grows slowly with age

    2. but only becomes toxic above repeat numbers of 150 AND critical for neuronal survival above 350

  • Single cell sequencing shows:

    • distribution of repeat lengths in patient neuronal populations

      • even as high as 500 copies

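Given per-neuron repeat lengths from single-cell sequencing, the thresholds quoted above (toxic above 150 repeats, critical for neuronal survival above 350) can be applied to summarise a neuronal population. A minimal Python sketch; the repeat lengths in the example are invented for illustration, and treating the thresholds as strict "greater than" cut-offs is an assumption:

```python
def summarise_neurons(repeat_lengths, toxic_above=150, critical_above=350):
    """Fraction of neurons whose somatic CAG repeat length exceeds each threshold."""
    n = len(repeat_lengths)
    if n == 0:
        raise ValueError("no neurons given")
    toxic = sum(1 for r in repeat_lengths if r > toxic_above)
    critical = sum(1 for r in repeat_lengths if r > critical_above)
    return {
        "neurons": n,
        "fraction_above_toxic": toxic / n,
        "fraction_above_critical": critical / n,
        "longest": max(repeat_lengths),
    }


neurons = [42, 45, 60, 90, 120, 160, 400, 500]  # hypothetical per-neuron repeat lengths
print(summarise_neurons(neurons))
# {'neurons': 8, 'fraction_above_toxic': 0.375, 'fraction_above_critical': 0.25, 'longest': 500}
```

In this framing an inherited length just above 40 sits far below the toxic threshold; only neurons whose repeat has slowly expanded past 150 are affected, which is consistent with the long delay before onset.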
43

Overview of CNV/indel and repeat behaviour

  1. copy number variants are polymorphic→ a few have phenotypes

  2. Repetitive DNA is dynamic in populations→ with a balance between expansion and loss

  3. Variation among individuals is a snapshot of evolution in action→ as for SNPs

44

Human genetic Variation and Racism

  • Human genetic variation→ not about racism

    • only about understanding our genomes, their consequences for biology and human evolution

  • Racism→ a social value judgment→ BUT it gets dressed up with a smattering of biology

45

How can racism influence how human genetics is studied? (examples)

  1. Low representation→ of genetic diversity in the human ‘pangenome’ (although improving)

  2. Analysing human genetic variation:

    • without the informed involvement of,

    • and prospect of benefits to,

      • → the diverse groups participating

46

Eugenics

  • the idea that improving the human gene pool outranks the worth of certain human individuals

    • usually individuals not from the same ethnic, social or (white) racial group as the eugenicists

47

Consequences of eugenics

  • discriminatory social policy

  • forced sterilisation

  • involuntary euthanasia→ murder

  • genocide

48

Eugenics still active

  • two-child benefit policy

  • views of the current US administration on autism or infectious disease susceptibility

  • white supremacy

  • IQ

49

IQ

  • has a genetic component like all human traits

BUT

  • to measure apparent IQ differences between racial groups and conclude they are genetic is gross extrapolation

50

Example of IQ

  • Richard Lynn:

    • Irish low average IQ 

    • seen as an impediment to advancement of the Irish population

    • BUT

    • mysteriously climbed by 10 points in a generation

51

Extremely dangerous consequences

  • ‘certain groups must be left to fend for themselves because nothing can be done for them and the rest of us matter more’

Why should IQ be associated with skin colour any more than with hair or eye colour?

52

Overall, the more you understand genome variation, its causes and consequences

  • the more obvious will be the flaws in ‘scientific’ racism and eugenics