cab week 2
Sanger sequencing
uses di-deoxy variants of the nucleotides (A, T, C, G), radiolabelled, to find out the order of bases in a DNA strand
next generation sequencing
high throughput: NGS enables simultaneous sequencing of millions of DNA fragments, drastically increasing the volume of data generated compared to traditional methods like Sanger sequencing
clonal amplification: techniques like bridge PCR or emulsion PCR are used to amplify DNA fragments, ensuring sufficient signal for detection during sequencing
parallel processing: DNA fragments are sequenced in parallel, allowing for rapid analysis of entire genomes or transcriptomes
wide applications: NGS used in diverse fields, including medical diagnostics, evolutionary biology, microbiome analysis and personalised medicine
cost and accuracy: while initial equipment costs are high, NGS provides relatively low per-base sequencing costs and high accuracy, especially with deep sequencing
high throughput
simultaneous sequencing: NGS can sequence millions of DNA fragments at the same time, unlike Sanger, which processes one fragment at a time
large data output: generates massive amounts of sequencing data in a single run, ideal for whole genome sequencing
efficiency: shorter time
scalable: can be used for small scale or large scale
clonal amplification
emulsion PCR: DNA fragments attach to beads and are amplified within oil droplets, creating identical copies of each fragment
bridge PCR: DNA fragments bind to primers on glass slide, forming clusters of DNA through PCR repetitions
signal detection: amplification ensures a detectable signal from sequencing reactions as unamplified single molecules are too faint to read
accuracy boost: amplified DNA clusters reduce sequencing errors by generating stronger and clearer signals
parallel processing
flow cells or chips: NGS spreads DNA fragments across a flow cell or chip, enabling simultaneous sequencing of each fragment
millions of reactions: multiple sequencing reactions occur in parallel, increasing speed and throughput
automated analysis: machines handle most of the sequencing process, allowing researchers to focus on interpreting data
real time monitoring: advanced detectors, like cameras or pH sensors, track sequencing reactions in real time
applications
medical diagnostics: used for identifying genetic mutations, tumour profiling and personalised medicine
evolutionary studies: helps trace genetic relationships and evolutionary history by comparing genomes
microbiome analysis: enables study of microbial diversity and function within ecosystems or the human body
forensic biology: assists in solving crimes by identifying individuals through DNA evidence
cost and accuracy
lower cost per base: sequencing costs have dropped significantly, making genome sequencing affordable for routine use
deep sequencing: reads each part of the genome multiple times, ensuring high accuracy by reducing random errors
error detection: dual strand sequencing and improved chemistry help minimise mistakes
initial investment: while equipment is expensive, high throughput and scalability reduce long term costs for researchers
dideoxynucleosides are the key to Sanger sequencing
DNA polymerase joins nucleotides by a condensation reaction between one phosphate and one OH group
Sanger method
primer can be used to direct DNA polymerase to begin synthesising DNA strands from a specific location in target DNA
if di-deoxy nucleotides (e.g. ddTTP) are included in the reaction mix, the polymerase stalls and falls off whenever it incorporates one (for ddTTP, at a T in the new strand)
ddNTPs cause chain termination
DNA polymerase stalls and falls off whenever it incorporates a ddNTP, releasing a short, aborted chain
if you run 4 separate reactions (one each with ddATP, ddTTP, ddCTP and ddGTP), you can find out where the As, Ts, Cs and Gs are in a short sequence by separating these aborted chains by electrophoresis
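The chain-termination idea can be sketched in a few lines of Python. This is a simplified illustration, not a model of real reaction chemistry: it assumes every possible termination point is represented, and the function names `sanger_fragments` and `read_gel` are made up for this example.

```python
def sanger_fragments(new_strand):
    """For each of the 4 reactions, list the lengths of the aborted chains.

    new_strand is the strand being synthesised. In the ddA reaction a chain
    can stop at every A, in the ddT reaction at every T, and so on.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence by ordering all bands from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_fragments("GATTACA")
print(lanes)            # band positions in each of the 4 lanes
print(read_gel(lanes))  # sequence recovered from the band pattern
```

Each band length appears in exactly one lane, which is why reading the four lanes from the shortest band to the longest recovers the sequence.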
Sanger sequencing products were first run on gels
reaction products run on agarose/polyacrylamide gels
each band represents one point at which chain has terminated
band pattern indicates the sequence
good compared to earlier techniques
radiolabelled nucleotides are used to visualise the gel band patterns
dNTPs are radiolabelled with radioactive phosphorus (32P)
toxic and difficult to work with
gels need to be overlaid with x-ray film overnight
using 4 different fluorescent tags enables single lane sequencing
eventually realised that nucleotides could be fluorescently labelled instead of radiolabelled
non toxic and faster to read
no requirement for radiation protection or X-ray film
if 4 different fluorescent tags were used for ATCG, the sequencing reaction products could be separated in just one lane (rather than 4)
the separation of sequencing products moved on from gels to capillaries
companies soon built machines that separated DNA in capillaries rather than gels, and used lasers to detect the terminated chains in real time as they moved past a detector
accelerated progress, up to 1000 bp per reaction and multiple reactions can be measured at once on the same machine (multiple capillaries)
sequencing the first human genome
started in 90s
3 billion bases: too long to sequence directly
genome broken down into fragments cloned in bacterial artificial chromosomes (BACs)
each BAC: 150,000 bp
BAC sequences then aligned based on overlap
called shotgun sequencing
requires lots of cloning
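Aligning sequences "based on overlap" can be illustrated with a toy greedy merge. This is only a sketch under simplifying assumptions (error-free fragments, no repeats, tiny inputs); real assemblers are far more sophisticated, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # nothing overlaps any more; leave separate contigs
        merged = frags[i] + frags[j][size:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags


print(greedy_assemble(["ATGCGT", "CGTACC", "ACCGGA"]))
```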
amplification: how emulsion PCR works
genomic DNA is sheared randomly to create mixture of short genome fragments
short DNA pieces called adapters, which act as primer binding sites, are ligated to each fragment end
2 different adapters are used, one ligated to each end of the fragment
how emulsion PCR works
billions of tiny plastic beads are mixed with labelled DNA fragments
each bead is coated in primer matching adaptor sequence
mixture is diluted to the point where there is just one DNA fragment per bead
mixture is vibrated in oil to form an emulsion of tiny droplets (each containing 1 bead and 1 molecule of DNA)
mixture containing droplets is subjected to standard PCR thermal cycling
causes each bead to become covered in thousands of copies of same DNA sequence
each of these segments is 400-700 bp long
this amplification is required to generate sufficient DNA molecules so that they can be detected by fluorescence which is basis of most high throughput sequencers
how bridge PCR works
amplification can also be achieved on a glass slide using bridge PCR
short adapter sequences ligated to both ends of the DNA fragments bind to corresponding primers on a slide
PCR cycling is used to form dsDNA bridges: the free end of each bound fragment bends over and binds a nearby primer matching the adaptor at its other end
this process is repeated until localised clusters form on glass slide, each cluster containing hundreds of copies of same DNA sequence
HTS sequences millions of clonal clusters at the same time
once the millions of clusters have been formed on glass slides/beads, they are spread evenly over a glass slide or chip
every bead or cluster is then monitored by a sensitive camera/pH detecting chip
single type of nucleotide is washed over the whole chip causing a signal to be released from each cluster, then repeated with a different nucleotide
millions of beads or clusters can be sequenced in parallel (therefore, term: high throughput)
current high throughput DNA sequencing (HTS) technologies
Illumina: reversible dye terminator sequencing
Roche 454: pyrosequencing
Ion torrent: pH detection
Oxford Nanopore: nanopore sequencing (single molecule, real time)
Pacific Biosciences: single molecule real time sequencing
Illumina sequencing by synthesis
bridge PCR using 2 adapters is used to prepare clusters on glass slides
primer is added to bind to the first adaptor sequence to enable DNA polymerase extension
fluorescently labelled nucleotides (A T C or G) are flowed across flow cell
modified so only one nucleotide can be added to growing chain at a time
DNA polymerase adds one nucleotide to each chain
Illumina photographs 40 million clusters per flow cell
camera photographs entire flow cell and records colour of each cluster under fluorescence excitation
colour indicates what type of nucleotide (A T C G) was last to bind to that cluster
if next base in sequence was T, whole cluster would be red (since red T was latest nucleotide added)
another solution is flowed across cell, causing fluorescent tags to be cleaved from nucleotides
removes all colour from each cluster, enables another nucleotide to be added in next round
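The add-one-base, photograph, cleave cycle can be sketched as a small simulation. Only "T is red" comes from these notes; the other three colours are placeholders chosen for the example, and `image_cycles` and `call_bases` are hypothetical names.

```python
# Only the red T is taken from the notes; the other colours are assumptions.
DYE_COLOUR = {"T": "red", "A": "green", "C": "blue", "G": "yellow"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE_COLOUR.items()}


def image_cycles(clusters, n_cycles):
    """Return one 'photograph' per cycle: the colour of every cluster.

    clusters holds the strand being synthesised on each cluster. In each
    cycle every cluster gains exactly one nucleotide, the flow cell is
    imaged, and the dye is cleaved so no colour carries over.
    """
    images = []
    for cycle in range(n_cycles):
        images.append([DYE_COLOUR[strand[cycle]] for strand in clusters])
    return images


def call_bases(images):
    """Turn the per-cycle colours back into one read per cluster."""
    n_clusters = len(images[0])
    return [
        "".join(COLOUR_TO_BASE[image[i]] for image in images)
        for i in range(n_clusters)
    ]


clusters = ["TAC", "GGT"]
images = image_cycles(clusters, 3)
print(images[0])           # colours seen in the first photograph
print(call_bases(images))  # reads recovered from the colour series
```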
Illumina sequences each strand once in each direction
once the read from the first adaptor is finished, the complementary strand is washed away
process is then repeated in the same way, but starting from the other primer (second adaptor)
Illumina sequencing checks the sequence in both directions
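One way to picture the two-direction check: a read taken from the other end of the same fragment should be the reverse complement of the first read. The sketch below assumes, for simplicity, that both reads span the whole fragment and contain no errors; `reverse_complement` and `reads_agree` are names invented for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def reads_agree(forward_read, reverse_read):
    """True if the reverse read confirms the forward read."""
    return forward_read == reverse_complement(reverse_read)


print(reverse_complement("GATTACA"))
print(reads_agree("GATTACA", "TGTAATC"))
```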
Roche 454 sequencing
Roche 454 sequencing begins with cluster formation on plastic beads which are then spread onto a flow cell
1 type of nucleotide is flowed over the flow cell at a time
when DNA polymerase incorporates a nucleotide, pyrophosphate is released
converted to ATP, which is then used by an enzyme to produce a flash of light
camera records the flash of light
magnitude of flash of light indicates how many nucleotides were added
unbound nucleotides are degraded by apyrase and washed away
next type of nucleotide is washed over the flow cell and the process repeats
method is also called pyrosequencing
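The one-nucleotide-at-a-time flow, with signal size proportional to the number of bases added, can be sketched as follows. This is an idealised illustration with noise-free signals; the flow order "ATCG" and the function names `flowgram` and `decode` are assumptions made for the example.

```python
def flowgram(new_strand, flow_order="ATCG", n_flows=12):
    """Simulate flowing one nucleotide type at a time over a bead.

    Returns a list of (nucleotide, signal) pairs, where the signal is the
    number of bases incorporated in that flow (0 if none fit).
    """
    signals = []
    pos = 0
    for flow in range(n_flows):
        nucleotide = flow_order[flow % len(flow_order)]
        count = 0
        # A run of identical bases is incorporated within a single flow,
        # giving a proportionally larger flash of light.
        while pos < len(new_strand) and new_strand[pos] == nucleotide:
            count += 1
            pos += 1
        signals.append((nucleotide, count))
    return signals


def decode(signals):
    """Rebuild the sequence from the flow signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)


signals = flowgram("TTAGGGC")
print(signals)
print(decode(signals))
```

In this idealised version a signal of 3 is easy to tell from a signal of 2; in practice that distinction gets harder as runs of the same base get longer.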
how Ion Torrent sequencing works
ion torrent sequencing also uses beads trapped in tiny wells of a flow cell
at the bottom of the flow cell, silicon chip acts as a tiny pH meter
when a nucleotide is incorporated, H+ is released and this changes the pH in each well
magnitude of the change in pH indicates how many of that type of nucleotide were added each time
unbound nucleotides are washed away and the next type of nucleotide is flowed across the flow cell
repetition of this ATCG cycle reveals changes in pH and sequence of the cluster on each bead in each well
like pyrosequencing, this method struggles to accurately sequence long repeats of the same nucleotide
pros and cons of each method
the 3 HTS methods discussed produce read lengths of 400-700 bp
these must then be aligned with a reference genome to figure out where they fit in
they are not very useful for genomes that have not been sequenced before
resequencing vs. de novo sequencing
resequencing: sequencing a member of a species of which other members have already been sequenced. Reads need only be aligned to a reference genome, so need only be several hundred bp long
de novo sequencing: sequencing a genome from scratch. It is much more time-intensive and costly, and requires reads at least 1,000 bp long, which is partly why the first human genome was so expensive.
computers are used to re-assemble the reads against a reference genome
most high throughput sequencing technologies require lots of computer time to re-assemble a genome sequence from many thousands of short reads by alignment with a reference genome
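Placing a short read on a reference can be illustrated with a brute-force search for the offset with the fewest mismatches. This is a toy sketch only (no gaps, no indexing); real aligners use much faster methods, and `best_alignment` is a hypothetical name.

```python
def best_alignment(read, reference):
    """Return (position, mismatches) for the best ungapped placement.

    Every offset in the reference is tried; the first offset with the
    fewest mismatching bases wins.
    """
    best_pos, best_mismatches = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for r, w in zip(read, window) if r != w)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


reference = "ACGTTAGCCGATAGGCT"
print(best_alignment("TAGCC", reference))   # exact match
print(best_alignment("GATTGG", reference))  # one base differs from the reference
```

Trying every offset for every read is what makes this slow at genome scale, which hints at why reassembly needs so much computer time.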
definitions of depth and coverage
most high throughput sequencing platforms generate millions of short reads
aligned to a reference genome by homology
the same part of a genome can be sequenced many times by different fragments
referred to as the depth
a genome may be sequenced to 30x depth
increasing depth increases the sequence accuracy enormously
depth and coverage
not all of the genome is equally easy to sequence, some regions are easy and some are hard
only 90-95% of genome is fully sequenced and to good depth
called coverage
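Depth and coverage can be computed directly from where reads land on the genome. The sketch below is a minimal illustration; the input format (a list of start positions and read lengths), the 500 bp read length in the last line, and the function names are assumptions made for the example.

```python
import math


def depth_per_position(genome_length, alignments):
    """Count how many reads cover each position.

    alignments is a list of (start, read_length) pairs with 0-based starts.
    """
    depth = [0] * genome_length
    for start, length in alignments:
        for pos in range(start, min(start + length, genome_length)):
            depth[pos] += 1
    return depth


def summarise(depth, min_depth=1):
    """Return (mean depth, fraction of positions with at least min_depth)."""
    mean_depth = sum(depth) / len(depth)
    coverage = sum(1 for d in depth if d >= min_depth) / len(depth)
    return mean_depth, coverage


def reads_needed(genome_size, read_length, target_depth):
    """Number of reads for a target mean depth: depth = reads * length / genome."""
    return math.ceil(genome_size * target_depth / read_length)


depth = depth_per_position(10, [(0, 4), (2, 4), (3, 4)])
print(depth)
print(summarise(depth))
print(reads_needed(3_000_000_000, 500, 30))  # 30x depth of a 3 billion bp genome
```

The last three positions of the toy genome receive no reads, so mean depth is above 1x while coverage is still incomplete, which is the distinction the two terms capture.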
why genome sequencing is useful
medical diagnosis
rare mutations, tumour profiling, personalised med
biotechnology
discovering new genes, developing useful constructs
forensic biology
identifying suspects from DNA samples
virology
new viruses, diagnosis, monitoring of recombination
biological systematics
enormously useful for studies of evolution and relatedness
biomedical research
new gene discovery, microbiome
GWAS for diseases
common variants have been identified which are associated with risk of numerous common diseases
useful to be able to sequence the genomes of 1000s of people
improves personalised med
RNA sequencing
RNA-seq uses HTS to monitor a cell’s transcriptome
can give us unparalleled insight into exactly what type of program a cell is running
bisulphite sequencing for epigenetics
epigenetic marks on DNA control cell differentiation
methyl-cytosine is a key epigenetic mark
treatment of DNA with bisulphite converts unmethylated cytosine residues to uracil, but leaves 5-methylcytosine residues unaffected
HTS of bisulphite treated DNA can be used to discover where cytosine has been methylated in the genome
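The logic of finding methylated cytosines can be sketched by comparing a bisulphite-treated read with the reference. This simplified example assumes complete conversion, a single strand and an error-free read; uracil is written as T because it is read as T after amplification. The function names are hypothetical.

```python
def bisulphite_convert(sequence, methylated_positions):
    """Simulate the read obtained after bisulphite treatment.

    Unmethylated C becomes U (read as T); methylated C stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence)
    )


def call_methylation(reference, converted_read):
    """Positions where a reference C is still read as C, i.e. methylated."""
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference, converted_read))
        if ref == "C" and obs == "C"
    ]


reference = "ACGTCCGA"
read = bisulphite_convert(reference, methylated_positions={1, 5})
print(read)
print(call_methylation(reference, read))
```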