large scale sequencing
random fragmentation (cut DNA and add tags to build a genome library)
sequence fragments
assemble (resequence) → de novo
gap filling/error check
redundant = more accurate
what is required for DNA sequencing
DNA
primer
nucleotides
DNA polymerase
nucleotide analogs (terminators) ddNTPs
de novo assembly
build a genome without a reference
redundancy
reduces sequencing errors
improves assembly
helps resolve repeats
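A minimal sketch of why redundancy reduces errors, assuming the reads are already aligned to the same positions and are the same length (the reads and the `consensus` helper are made up for illustration):

```python
from collections import Counter

def consensus(aligned_reads):
    """Majority-vote base at each column of equal-length, pre-aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0]  # most frequent base in this column
        for column in zip(*aligned_reads)
    )

# Hypothetical 4x coverage of the same region; the third read has one error (C instead of G).
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAC"]
result = consensus(reads)
print(result)
```

With only one read the error would go unnoticed; with four reads the correct base outvotes it.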
DNA sequencing - SANGER (requirements)
interrupt DNA synthesis (chain termination)
use a mixture of ddNTPs and dNTPs
highly accurate
need DNA template, primer, nucleotides, and DNA polymerase
nucleotide analogs (terminators) stop the reaction at each base
dNTPs
normal nucleotides
- have 3’ OH group
ddNTPs
chain-terminating nucleotides (terminators)
- lack 3’ OH group (have H instead)
DNA sequencing - SANGER (mechanism)
ddNTPs lack a 3′-OH group
Once incorporated → chain termination
Produces fragments that differ by exactly one base
ddNTP vs dNTP
ddNTP = terminates DNA synthesis bc it lacks the 3′-OH group
dNTP = allows continued DNA chain elongation bc it HAS the 3′-OH group
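A simplified sketch of the chain-termination idea, assuming every possible stop position gets sampled (in a real reaction, ddNTP incorporation is random and needs many template copies). The function names and the example strand are hypothetical:

```python
def sanger_fragments(new_strand):
    """All terminated fragments: one ending at each position where a ddNTP could be added."""
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]

def read_sequence(fragments):
    """Sort fragments by length (like electrophoresis) and read the terminal base of each."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))

new_strand = "ATGCCGTA"  # hypothetical strand being synthesized
frags = sanger_fragments(new_strand)
print(frags[:3])
print(read_sequence(frags))
```

Because the fragments differ by exactly one base, sorting them by length and reading each terminal base recovers the sequence.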
sanger pros and cons
Result
A mixture of DNA fragments of varying lengths
Fragment length identifies base position
Strengths
Extremely low error rate (<0.1%)
Gold standard for sequence confirmation
Ideal for single-gene studies
Limitations
Low throughput
Short read lengths (<800 bp)
**good for confirming sequences
NGS Next gen sequencing
Extremely high throughput (~1.8 trillion bases / 72 hrs)
Produces >1 billion reads
Read length: ~150 bp (up to ~100 kb with Nanopore)
Higher error rate (~0.26–15%)
Used for:
Whole-genome resequencing
Population genomics
Metagenomics
***SCALE AND DISCOVERY
Oxford Nanopore (MinION)
Very long reads (up to ~100 kb)
Higher error rate than other methods
Useful for:
Genome assembly
Structural variants
Scaffolding
FOR NUC ACID
***ACCURACY & CONFIRMATION
de Bruijn Graphs (Short Reads)
Reads are broken into k-mers
Overlapping k-mers form a graph
Paths through graph = assembled sequence
Efficient but sensitive to sequencing errors
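A toy sketch of the de Bruijn idea, assuming error-free reads and no repeated (k-1)-mers (real assemblers handle errors, repeats, and both strands). The reads, `k = 4`, and the helper names are made up for illustration:

```python
from collections import defaultdict

def kmers(read, k):
    """Break a read into all overlapping substrings of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk(graph, start):
    """Follow the path while each node has exactly one unvisited successor."""
    path, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in seen:
            break
        path += next_node[-1]  # each step adds one new base
        seen.add(next_node)
        node = next_node
    return path

reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # hypothetical overlapping short reads
graph = de_bruijn(reads, 4)
print(walk(graph, "ATG"))
```

A single sequencing error would add extra k-mers and therefore extra branches, which is why this approach is sensitive to errors.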
stages of genome assembly
Reads – raw sequencing output
Contigs – continuous assembled sequences
Scaffolds – ordered contigs with gaps (“NNNN”)
Chromosome-level assembly – near-complete genome
Reference genome – official, annotated version used by the community
contig N50
60 kbp
The shortest contig length such that 50% of the assembly is contained in contigs of that size or larger
i.e., sort contigs longest to shortest and add up lengths; N50 is the length of the contig that brings the running total to 50% of the total assembly length
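A short sketch of how contig N50 can be computed, assuming the total is the sum of contig lengths (using a known genome size instead would give NG50). The contig lengths are made up:

```python
def n50(contig_lengths):
    """Length of the contig at which the running total (longest first) reaches half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly

lengths = [100, 80, 60, 40, 20]  # hypothetical contig lengths in kbp, total 300
print(n50(lengths))  # 100 + 80 = 180 >= 150, so N50 is 80
```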
BUSCO
Benchmarking Universal Single Copy Orthologs
Measures genome completeness
>95% score = high-quality assembly
genome annotation tools
PhyloSift
Blast2GO
Use annotated reference genomes as a guide
Comparative genomics (evolutionary conservation)
Transcript (RNA-seq) evidence + Protein homology
**provides a roadmap for the genome
databases
Primary databases: raw data archives (e.g., sequence reads)
Secondary databases: curated, interpreted data
Relational databases: link information across databases
Local databases: built from your own NGS data (next gen sequencing)
genetic database landmarks
1965: Atlas of Protein Sequences (Dayhoff)
Protein Information Resource (PIR)
EMBL (1982)
Human Genome Project
GenBank + DDBJ + EMBL collaboration
NCBI Entrez → integrated database access
GenBank file anatomy
Header – metadata, organism, accession number
Features – genes, CDS, regulatory elements
Nucleotide sequence
FASTA Format
> header line
Sequence lines below
Simple, widely used for alignment and analysis
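A minimal FASTA reader sketch, assuming well-formed input where every record starts with a `>` header line. The record names and sequences are hypothetical:

```python
def parse_fasta(text):
    """Return {header: sequence}, joining the sequence lines under each '>' header."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}

example = ">seq1 hypothetical record\nATGC\nGGTA\n>seq2\nTTAA\n"
parsed = parse_fasta(example)
print(parsed)
```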
database errors
sources
PCR introduces mutations
PCR amplifies the wrong organism
Gene families → homology confusion
Incorrect taxonomic assignment
PCR
lab technique that makes millions of copies of a specific DNA segment (like a photocopier)
FASTA N
any nucleotide
FASTA R
purine
A
G
FASTA Y
pyrimidines
C
T/U
FASTA -
gap of indeterminate length
FASTA *
translation stop
**amino acids
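A small sketch of what the nucleotide ambiguity codes mean in practice, covering only N, R, and Y from the cards above (the full IUPAC table has more codes). The `expand` helper is hypothetical:

```python
from itertools import product

# Subset of IUPAC nucleotide codes: each symbol maps to the bases it can stand for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}

def expand(seq):
    """List every concrete DNA sequence an ambiguous sequence could represent."""
    return ["".join(bases) for bases in product(*(IUPAC[symbol] for symbol in seq))]

print(expand("ARY"))  # R = A or G (purine), Y = C or T (pyrimidine)
```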
linear model of progressive evolution
(old view): evolution was once thought of as a straight line from “simple → complex.”
linear model modern view
genomes change in a branching, tree-like pattern, reflecting common ancestry and divergence over time
**evolution is represented with phylogenetic trees, not ladders
newick nomenclature
A text-based format for representing phylogenetic trees.
Uses parentheses and commas to show branching relationships.
Common in computational biology and tree-building software.
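A tiny sketch of how nested groupings turn into a Newick string, assuming a topology-only tree (no branch lengths or support values). The taxa and the `to_newick` helper are made up for illustration:

```python
def to_newick(tree):
    """Convert nested tuples of taxon names into Newick parentheses-and-commas notation."""
    if isinstance(tree, str):
        return tree  # a leaf is just its name
    return "(" + ",".join(to_newick(child) for child in tree) + ")"

# Hypothetical tree: human and chimp are sister taxa, mouse branches off earlier.
tree = (("human", "chimp"), "mouse")
newick = to_newick(tree) + ";"  # a Newick tree ends with a semicolon
print(newick)
```

Each pair of parentheses is one internal node, and commas separate the branches descending from it.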
dichotomy
Modern phylogenetics focuses on relationships and ancestry, not ranking organisms
types of phylo trees
cladogram
phylogram
dendrogram
cladogram
Shows branching order only.
Branch lengths have no meaning.
Emphasizes shared ancestry.
phylogram
Branch lengths are proportional to the amount of evolutionary change.
Reflects genetic distance
Dendrogram (Ultrametric tree)
All tips are the same distance from the root.
Assumes a molecular clock (equal rates of evolution)
homology
Similarity due to shared ancestry.
Example: shared genes between cats and whales despite different functions.
Function ≠ homology; ancestry matters more than what the gene currently does.
homoplasy
Similarity not due to shared ancestry.
Arises from convergent evolution or reversals.
gene duplication & homology
Gene duplication complicates homology because multiple copies exist.
After duplication, copies can evolve independently and take on new functions.
orthologs
paralogs
xenolog
Example: Human Hemoglobin (Hb) Gene Family
Originated through gene duplication events.
Different Hb genes are paralogs.
Functional diversification allows different hemoglobins to act at different developmental stages (e.g., fetal vs adult Hb).
ortholog
Genes in different species that diverged via speciation.
Usually retain similar function.
gene copy
paralog
gene copy
genes related by duplication within a genome.
Often evolve new or specialized functions.
xenolog
gene copy
Genes acquired via horizontal gene transfer.