GENOMICS 2.4: Genome sequencing


Last updated 10:25 PM on 3/18/26


29 Terms


Which is the oldest sequencing method? (first generation)

Sanger sequencing (chain-termination method): still used today, based on primer extension by DNA polymerase, as PCR is


How does Sanger sequencing work?

1. You start with:
   1. dsDNA template (denatured so the primer can anneal)
   2. Primer that binds to it
   3. DNA polymerase
   4. dNTPs
   5. A small amount of fluorescently labeled ddNTPs → classically you run 4 reactions, one for each ddNTP; with a different dye on each ddNTP, a single reaction is enough
2. DNA polymerase extends the primer until, by chance, it adds a ddNTP, which stops the extension → this generates many fragments of different lengths
3. Separate the fragments by size using capillary electrophoresis with laser detection → short fragments run faster than long ones
   Before: gel electrophoresis
4. Read the sequence from the smallest fragment to the largest
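
A minimal sketch of the chain-termination idea above (not from the notes). It assumes the template is written 3'→5' so the new strand can be built position by position, and the names `sanger_fragments`, `read_sequence` and the 5% ddNTP fraction are hypothetical choices for illustration.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_fragments(template, n_molecules=5000, dd_fraction=0.05, seed=0):
    """Simulate many primer extensions; each one stops when a ddNTP is added."""
    rng = random.Random(seed)
    # New strand = complement of the template, in the order it is synthesized.
    new_strand = [COMPLEMENT[b] for b in template]
    fragments = set()
    for _ in range(n_molecules):
        for position, base in enumerate(new_strand, start=1):
            if rng.random() < dd_fraction:       # a labeled ddNTP was incorporated
                fragments.add((position, base))  # (fragment length, terminal base)
                break
    return fragments

def read_sequence(fragments):
    """Mimic electrophoresis: order fragments by length, read each terminal base."""
    return "".join(base for _, base in sorted(fragments))

template = "TACGGATCCA"
fragments = sanger_fragments(template)
print(read_sequence(fragments))
```

Sorting by fragment length plays the role of the capillary: the shortest fragment reports the first base after the primer, the longest reports the last.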


What is a read?

A read is the sequence obtained from a single sequencing reaction. The read length is the typical maximum number of nt that can be read in a single experiment → it depends on the sequencing technology used


What is the sequence read length of Sanger?

800 bp (between 500 and 900 bp)


Explain the steps of Sanger Sequencing Workflow for genomic libraries:

Wet lab Workflow:

DNA extraction
Fragment genomic DNA
Clone fragments in vectors and transform in bacteria
PCR amplification
Sanger Sequencing

Dry lab workflow:

Analysis through bioinformatics: read → contig → genome
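
The read → contig step can be sketched as merging reads that overlap at their ends. This is a toy greedy approach on error-free reads, not the assembler used in any real project; `overlap`, `greedy_assemble` and `min_len` are hypothetical names.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: the remaining pieces are separate contigs
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads

print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTTG"]))
```

If the reads do not all overlap, the function returns several contigs instead of one.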


What instrument do we use to do high-throughput sequencing with the Sanger method?

We use automated sequencers → e.g. ABI Prism 3730 DNA Analyzer

The human genome is 3300 Mb → Sanger was used to sequence it, but running the reactions one by one would have taken far longer than the project actually took
Automated robots made it possible to use 96-well microplates → 96 reactions at a time, 20 plates per day, about 2 Mb per day → sequencing the whole genome once would take 1650 days on one machine, and less with more machines running in parallel
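
The arithmetic above can be checked with a short sketch. It assumes about 800 bp per read (the Sanger read length from the earlier card), which gives roughly 1.5 Mb/day, rounded to 2 Mb/day in the notes; the function names are hypothetical.

```python
def megabases_per_day(wells_per_plate=96, plates_per_day=20, read_length_bp=800):
    """Raw output of one sequencer per day, in Mb."""
    return wells_per_plate * plates_per_day * read_length_bp / 1e6

def days_needed(genome_mb, mb_per_day, machines=1, coverage=1):
    """Days to sequence the genome `coverage` times with `machines` in parallel."""
    return genome_mb * coverage / (mb_per_day * machines)

print(megabases_per_day())                 # about 1.5 Mb per day
print(days_needed(3300, 2))                # 1650 days for one pass on one machine
print(days_needed(3300, 2, machines=100))  # 16.5 days with 100 machines
```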


What is the raw data that comes from the sequencing robot?

A chromatogram


What is the process to go from sequenced reads to an assembled genome?

Sanger Automated Sequencing
Reading Sequence traces
Contig Assembly
Genome Assembly


What is automated base calling? (Reading Sequence traces)

It is software that determines which base each peak of the chromatogram represents → it transforms the chromatogram into a DNA sequence

The software looks at:

- Distance between peaks
- Local minima (where one peak ends and another begins)
- Peak assignment
- Double peaks → they can mean several things, e.g. SNPs

It assigns a quality score to each position (accepted error rate: 0.01, i.e. 1 wrong call in 100)

You have to discard the first 30 bp (lots of noise) and everything after 800 bp (quality declines)
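
A small sketch of the quality score and the trimming rule. It assumes the score follows the widely used Phred convention Q = -10·log10(P), which the notes do not name explicitly; under that convention an error rate of 0.01 is Q20. All function names are hypothetical.

```python
import math

def error_to_phred(p_error):
    """Phred convention (assumed here): Q = -10 * log10(P_error)."""
    return -10 * math.log10(p_error)

def phred_to_error(q):
    """Inverse of the Phred formula: P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def trim_read(bases, quals, skip_start=30, max_len=800):
    """Drop the first `skip_start` positions and everything from `max_len` on."""
    return bases[skip_start:max_len], quals[skip_start:max_len]

def low_quality_positions(quals, min_q=20):
    """Indices whose quality is below the accepted threshold."""
    return [i for i, q in enumerate(quals) if q < min_q]

print(error_to_phred(0.01))  # an error rate of 0.01 corresponds to Q20
trimmed_bases, trimmed_quals = trim_read("A" * 1000, [30] * 1000)
print(len(trimmed_bases))    # positions 30 to 799 are kept
```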


What does the software decide when there are double peaks?

The software calls that position N (unknown base) → you have to run a new experiment to identify it


Explain contig assembly:

You go from multiple alignments of sequence reads to assembly of a contiguous DNA sequence (contig)
During this step ambiguities are resolved and a consensus sequence is reached
- N positions are resolved by comparing all the reads → if you have 3 Gs and 2 Ts, you decide that N is G
- Take into account quality scores
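
The consensus rule above can be sketched as a vote per alignment column. This assumes the reads are already aligned, and breaking ties by summed quality is one illustrative choice, not a rule stated in the notes; `consensus_base` and `consensus` are hypothetical names.

```python
from collections import defaultdict

def consensus_base(calls):
    """calls: list of (base, quality) pairs covering one alignment column."""
    votes = defaultdict(lambda: [0, 0])  # base -> [count, summed quality]
    for base, quality in calls:
        if base != "N":                  # ambiguous calls carry no vote
            votes[base][0] += 1
            votes[base][1] += quality
    if not votes:
        return "N"                       # nothing informative in this column
    # Most frequent base wins; summed quality breaks ties.
    return max(votes, key=lambda b: (votes[b][0], votes[b][1]))

def consensus(columns):
    """Build the consensus sequence from a list of alignment columns."""
    return "".join(consensus_base(column) for column in columns)

column = [("G", 30), ("G", 25), ("T", 20), ("G", 35), ("T", 15)]
print(consensus_base(column))  # 3 Gs against 2 Ts
```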


After contig assembly, there will be —

Genome assembly


What are the challenges when assembling a complex genome?

Tandemly repeated DNA and genome-wide repeats
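
Why repeats are a problem can be shown with a toy example: a read that falls entirely inside a repeat fits in several places, so the assembler cannot tell where it belongs. The sequence and the helper `placements` are made up for illustration.

```python
def placements(genome, read):
    """Return every start position where the read matches the genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)  # allow overlapping matches
    return positions

genome = "TTGACCAGCAGCAGCAGTTAC"      # contains a tandem CAG repeat
print(placements(genome, "TTGACC"))  # unique sequence: a single placement
print(placements(genome, "CAGCAG"))  # inside the repeat: several placements
```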


What are the different strategies for genome sequencing?

Hierarchical sequencing → International Human Genome Sequencing Consortium
Shotgun sequencing → private initiative: Celera (Craig Venter)


Do hierarchical sequencing and shotgun sequencing use different technologies?

No, they both use Sanger sequencing


What are the steps of Hierarchical Sequencing?

It’s a map-based, clone-contig, clone-by-clone strategy → very laborious

1. Fragment the genome into big pieces
2. Generate a BAC library
3. Physical or genetic mapping → order the BACs
   1. Markers can be physical or genetic
   2. If the markers are STSs (sequence-tagged sites, a physical marker):
      - Chromosome walking by:
        - Hybridization: use the STS as a probe → find BACs that contain it → take a new STS from the end → repeat.
        - PCR: use STS‑specific primers → test each BAC → positives overlap → take a new STS from the end → repeat.
      - Fingerprinting:
        - You digest each BAC with a restriction enzyme
        - If two BACs share many fragment sizes → they overlap
        - If they share none → they don’t
      - Both can be used together
4. Fragment the BACs into smaller pieces (2 kb) and subclone them for Sanger sequencing (only the insert is sequenced, not the vector backbone)
   - Forward and reverse primers (about 800 bp each → 1.6 kb ideally)
   - The restriction sites flanking each insert are the same (the vector's cloning site), so the same primers can be used for every clone
5. Sanger sequencing → forward and reverse reads are generated
6. Assemble reads into contigs
7. Assemble contigs into the whole genome
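
The fingerprinting step can be sketched as comparing restriction fragment sizes between two BACs. The 2% size tolerance and the 50% sharing threshold are arbitrary values chosen for illustration, and the fragment sizes are invented.

```python
def shared_fragments(sizes_a, sizes_b, tolerance=0.02):
    """Count fragments of BAC A that match an unused fragment of BAC B in size."""
    remaining = sorted(sizes_b)
    shared = 0
    for size in sorted(sizes_a):
        for candidate in remaining:
            if abs(candidate - size) <= tolerance * size:
                remaining.remove(candidate)  # each fragment can match only once
                shared += 1
                break
    return shared

def likely_overlap(sizes_a, sizes_b, min_fraction=0.5):
    """Call an overlap when enough of the smaller fingerprint is shared."""
    shared = shared_fragments(sizes_a, sizes_b)
    return shared / min(len(sizes_a), len(sizes_b)) >= min_fraction

bac1 = [1200, 3400, 5100, 800, 2300]
bac2 = [3400, 5100, 800, 9700, 450]
bac3 = [610, 1500, 7200, 2900]
print(likely_overlap(bac1, bac2))  # many shared sizes
print(likely_overlap(bac1, bac3))  # no shared sizes
```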


What happens if there is a gap between your contigs? You know that contigs 1 and 2 are adjacent, but a sequence in the middle is missing (the insert was bigger than 1.6 kb, so the forward and reverse reads did not cover all of it)

You have a sequence gap: you have to design primers inside the insert and look for the missing sequence in the library to fill in the gap.


What if the missing sequence is not in your library?

You have a physical gap: you have to start over from the plasmid cloning step → you make a new library with another vector (normally a lambda phage vector) and use the insert primers again to try to find the sequence


What are the advantages of hierarchical sequencing?

Facilitates the correct assembly of tandem repeat sequences
Coverage is uniform across the genome due to sequencing ordered clones

CHAT

"Facilitates the correct assembly of tandem repeat sequences."
- Explanation: The genome is full of repetitive regions, as if the book had paragraphs repeated almost identically in different places. If you try to read the whole book in one go (as in whole-genome sequencing), when you reach those repeated paragraphs you will not know which part of the book they belong to.
- The hierarchical advantage: Since you have already ordered the "chapters" (clones) before reading them, you know exactly which part of the book that repeated paragraph is in. It is much easier not to get lost.
"Coverage is uniform across the genome due to sequencing ordered clones."
- Explanation: When you build the map of "chapters", you make sure that none is missing and that all are in order. When sequencing, you commit to reading each "chapter" completely.
- The advantage: This guarantees that you read the whole book from start to finish, without skipping pages. No region of the genome is read more than the others; the "coverage" is even.


What are the disadvantages of hierarchical sequencing?

Only minimal clones are sequenced → you don’t sequence the entire BAC library, only the minimal set that covers the genome (low coverage: how many times, on average, each base of the genome is sequenced)
Large initial labor-intensive investment in generating a genomic library and ordering clones to create contig assembly

CHAT

"Coverage: only the minimal collection of clones that represent the genome is sequenced."
- Explanation: Following the analogy, you print and read a single copy of each "chapter". This is efficient in theory, but...
- The problem: If you have a doubt or find an error while reading a chapter (sequencing a clone), you have no other copy of the same chapter to check against. In more modern methods you have many copies of the whole book, which lets you correct errors by comparison.
"Large initial labor-intensive investment in generating a genomic library and ordering clones to create a map of contiguous clones."
- Explanation: This is the key point. Before you could even start to "read" the DNA, you had to do an enormous, slow piece of work.
- The problem: Creating the clone library (photocopying the chapters) and then ordering the clones (working out the book's index) took months or years of lab work at a very high cost. It was like having to build the laboratory and the instruments before being able to start the experiment.


Explain Shotgun Sequencing:

Whole-genome shearing into 2-3 kb fragments
Cloning in a plasmid library
Sequencing with 5-10 fold redundancy (coverage: how many times, on average, each base of the genome is sequenced.)
Assembly requires high computational capacity
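
A sketch of what 5-10 fold redundancy implies. The read count follows directly from the definition of coverage; the uncovered fraction uses the standard Poisson approximation e^(-coverage), which is not in the notes and ignores cloning bias and repeats. The 800 bp read length and the function names are assumptions.

```python
import math

def reads_needed(genome_bp, read_length_bp, coverage):
    """Coverage = reads * read length / genome size, solved for reads."""
    return math.ceil(coverage * genome_bp / read_length_bp)

def expected_fraction_uncovered(coverage):
    """Poisson approximation: chance that a given base is never sequenced."""
    return math.exp(-coverage)

human_genome_bp = 3_300_000_000
for fold in (5, 10):
    print(fold, reads_needed(human_genome_bp, 800, fold),
          expected_fraction_uncovered(fold))
```

Even at high coverage the random sampling leaves a small part of the genome unsequenced, which is one source of the gaps handled in the finishing phase.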


Why does whole-genome shotgun sequencing need high sequencing capacity?

At 2 Mb/day per sequencer and 10-fold coverage, the 3300 Mb human genome would take about 16500 days → 165 days with 100 sequencers

That is why Celera used lots of machines running in parallel


Why was the assembly a two-step process in whole-genome shotgun sequencing?

The shotgun strategy did not work well for repetitive DNA (they did not have physical maps)

Initial phase: successful assembly of up to 90% of the genome; repetitive sequences were left out
Finishing phase: very laborious, it requires new genomic libraries to solve:
- Assembly of repetitive sequences
- Closing sequence and physical gaps → more difficult because the gaps are bigger


How did they verify the assembly of repetitive sequences in the assembly of shotgun sequencing?

They used public information from the other project to design forward and reverse primers for the repeats. If the expected product was not found in the final assembly, it meant that they had lost that sequence


What is a scaffold?

In genomic mapping, a series of contigs that are in the right order but not necessarily connected in one continuous stretch of sequence
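
One way to picture a scaffold as data: an ordered list of contigs with an estimated gap between neighbours. Writing each gap as a run of N is a common convention in assembly files; the `Contig` class, the sequences and the gap size are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Contig:
    name: str
    sequence: str

def scaffold_sequence(contigs, gap_sizes):
    """Join ordered contigs, writing each gap of estimated size as a run of N."""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need one gap size between each pair of neighbouring contigs")
    parts = [contigs[0].sequence]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * gap)   # unknown bases between two ordered contigs
        parts.append(contig.sequence)
    return "".join(parts)

contigs = [Contig("contig1", "ATGC"), Contig("contig2", "GGTA")]
print(scaffold_sequence(contigs, [3]))
```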


How did they solve gaps in the assembly of shotgun sequencing?

Sequence gaps closed by further sequencing of individual clones
- A region is present in the plasmid library
- But you didn’t sequence enough clones
- Or the reads didn’t overlap enough
- Or the region is hard to sequence (GC‑rich, repeats, etc.)
Physical gaps are difficult to close: they require analysis of a library of 100 kb fragments → they needed to generate BAC libraries to try to close the gaps
- A region of the genome is missing entirely from the plasmid library
- BACs (100–300 kb inserts) are much more stable and represent the genome more completely.


What are the advantages of whole-genome shotgun sequencing?

Much faster in assembling a draft with 90% of the genome


What are the limitations of whole-genome shotgun sequencing?

Requires high computational capacity
Completeness of the genome: the finishing phase is very labor intensive → BAC libraries must be generated to close physical gaps
Accuracy: coverage is not uniform across the genome due to the random nature of the process


Comparison between Hierarchical shotgun sequencing and Whole genome shotgun sequencing: