BS2040 Bioinformatics Lecture Notes
Genomes, Transcriptomes, and Proteomes
Genomes: provide the reference sequence data that transcriptomics and proteomics rely on, since both have technical limitations of their own.
Genome Size Variation
Genome size varies greatly across organisms.
Yeast: 12 Mb, 6,000 genes
Fruit Fly: 180 Mb, 13-14,000 genes
Human: 3,000 Mb, 25-30,000 genes (early estimates; current counts are nearer 20,000 protein-coding genes)
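The figures above imply very different gene densities. A minimal sketch of the arithmetic follows; using the midpoint of each quoted gene range is an assumption made only for illustration.

```python
# Genome size (Mb) and approximate gene count, taken from the list above.
# Midpoints of the quoted ranges are an illustrative assumption.
genomes = {
    "Yeast": (12, 6_000),
    "Fruit fly": (180, 13_500),
    "Human": (3_000, 27_500),
}


def gene_density(size_mb, genes):
    """Return the number of genes per megabase."""
    return genes / size_mb


densities = {
    name: gene_density(size_mb, genes)
    for name, (size_mb, genes) in genomes.items()
}

for name, density in densities.items():
    print(f"{name}: {density:.1f} genes per Mb")
```

Gene density falls sharply as genome size rises, which is why gene count tracks genome size so poorly.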
Genetic vs. Physical Maps
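Genetic maps: the order and relative spacing of abstract entities (genes and other loci), inferred from how often they are inherited together.
Physical maps: the positions of those markers on physical entities (chromosomes), measured in base pairs.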
Linkage Analysis and Genetic Mapping
Linked loci are inherited together; unlinked loci assort independently.
Recombination frequency measures genetic distance.
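As a rough sketch of how recombination frequency becomes a map distance: the function names and the testcross counts below are hypothetical, and the 1% = 1 cM conversion is only a reasonable approximation for closely linked loci.

```python
def recombination_frequency(recombinants, total):
    """Fraction of offspring that are recombinant for a pair of loci."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinants / total


def map_distance_cm(rf):
    """Approximate map distance in centimorgans (1% recombination ~ 1 cM).

    Double crossovers go undetected, so this underestimates the
    distance between loci that are far apart.
    """
    return rf * 100


def appears_linked(rf, threshold=0.5):
    """Unlinked loci assort independently, giving a frequency near 0.5."""
    return rf < threshold


# Hypothetical testcross: 80 recombinant offspring out of 1000.
rf = recombination_frequency(80, 1000)
print(f"RF = {rf:.2f}, distance ~ {map_distance_cm(rf):.0f} cM, "
      f"linked: {appears_linked(rf)}")
```

In practice a statistical test would be needed before calling two loci linked; the simple threshold here ignores sampling error.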
Genetic Mapping and Polymorphisms
Genetic mapping relies on polymorphic DNA variants.
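Marker types: RFLPs (restriction fragment length polymorphisms), SSLPs (simple sequence length polymorphisms), SNPs (single nucleotide polymorphisms).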
Variants must be common, show Mendelian inheritance, have low mutation rates, and be easily typed.
Physical Mapping Techniques
Restriction mapping
Fluorescence in situ hybridization (FISH)
Sequence tagged site (STS) mapping
Restriction Mapping
Restriction digests can be complete or partial.
Partial digestion provides additional information to resolve alternative maps.
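The sketch below illustrates why a partial digest helps, assuming a linear molecule and a single enzyme. The two candidate maps and the 10 kb length are made up for the example: both give identical complete-digest fragments, but their partial-digest fragments differ.

```python
from itertools import combinations


def complete_digest(length, cut_sites):
    """Fragment sizes when every site on a linear molecule is cut."""
    points = [0] + sorted(cut_sites) + [length]
    return sorted(b - a for a, b in zip(points, points[1:]))


def partial_digest(length, cut_sites):
    """All fragment sizes that can arise when only some sites are cut.

    Any two points (molecule ends or cut sites) can bound a fragment.
    """
    points = [0] + sorted(cut_sites) + [length]
    return sorted({b - a for a, b in combinations(points, 2)})


# Two hypothetical maps of a 10 kb molecule (positions in kb).
map_a = [2, 5]
map_b = [2, 7]

print(complete_digest(10, map_a), complete_digest(10, map_b))
print(partial_digest(10, map_a))
print(partial_digest(10, map_b))
```

Here a 7 kb partial-digest fragment is only possible under the second map, so observing it (or not) distinguishes the two alternatives.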
FISH (Fluorescence In Situ Hybridization)
Localizes DNA sequences on chromosomes using fluorescent probes.
Labor-intensive and low-throughput; probes containing repetitive DNA hybridize to many sites, which limits resolution.
STS (Sequence Tagged Site) Mapping
Uses unique sequences detectable by DNA hybridization or PCR.
The closer two markers lie on the chromosome, the more often they occur together on the same fragment.
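A minimal sketch of the co-occurrence idea, using an invented panel of five fragments scored for four hypothetical markers (A to D). Real STS mapping works with much larger panels and statistical distance estimates.

```python
from itertools import combinations

# Hypothetical panel: each set lists the STS markers detected on one fragment.
fragments = [
    {"A", "B"},
    {"A", "B", "C"},
    {"B", "C"},
    {"C", "D"},
    {"A", "B"},
]


def shared_fragments(panel, marker1, marker2):
    """Count fragments that carry both markers."""
    return sum(1 for frag in panel if marker1 in frag and marker2 in frag)


markers = sorted(set().union(*fragments))
cooccurrence = {
    pair: shared_fragments(fragments, *pair)
    for pair in combinations(markers, 2)
}

# Pairs sharing the most fragments are inferred to lie closest together.
for pair, count in sorted(cooccurrence.items(), key=lambda kv: -kv[1]):
    print(pair, count)
```

With this panel A and B share the most fragments and A and D share none, which is consistent with a marker order A, B, C, D.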
Prokaryotic Genomes
Generally smaller than eukaryotic genomes (typically ~5 Mb), but with great variety in size.
Usually organized as a single circular chromosome, located in the nucleoid.
Bacterial Chromosome Organization
High packing density achieved through supercoiling.
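DNA gyrase (a type II topoisomerase) and topoisomerase I control supercoiling: gyrase introduces negative supercoils and topoisomerase I relaxes them.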
HU tetramers package DNA.
Archaea use histone-like proteins instead of HU.
Genetic Organization in Prokaryotes
Operons define functional units.
Co-transcribed genes have common regulation.
Genome Fluidity
Plasmids are exchanged between bacteria.
DNA exchange occurs via transformation, transduction, and conjugation.
Lateral gene flow affects species concepts.
Carl Woese and 16S rRNA
Small-subunit ribosomal RNA sequences (16S in prokaryotes, 18S in eukaryotes) were used to build a tree of all life.
Metagenomics
Sequencing DNA from environmental samples.
Samples microorganisms without culturing them, so it captures unculturable species with far less bias than culture-based methods.
Microbiomes
Microbial communities in specific environments (e.g., human gut, skin).
Analyzed using 16S rRNA gene sequencing and whole-community (shotgun) sequencing.
Eukaryotic Genome Organization
Linear chromosomes; variation in number and size.
Genome Size Trends in Eukaryotes
Varies significantly; not always correlated with complexity (C-value paradox).
Genome Complexity
Reassociation kinetics (Cot analysis) distinguish repetitive from unique sequences: repetitive DNA reanneals faster.
Repetitive sequences contribute to genome size variation.
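Cot analysis rests on second-order reassociation kinetics, C/C0 = 1 / (1 + k * C0t). The sketch below evaluates that equation for two sequence classes; the rate constants are invented purely to show the contrast and are not measured values.

```python
def fraction_single_stranded(cot, k):
    """Second-order reassociation: C/C0 = 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * cot)


def cot_half(k):
    """C0t value at which half of the DNA has reassociated."""
    return 1.0 / k


# Hypothetical rate constants: repeats find partners much faster.
rate_constants = {"repetitive": 100.0, "unique": 0.01}

for name, k in rate_constants.items():
    remaining = fraction_single_stranded(1.0, k)
    print(f"{name}: Cot1/2 = {cot_half(k):g}, "
          f"single-stranded at Cot 1 = {remaining:.3f}")
```

A low Cot1/2 marks a repetitive fraction and a high Cot1/2 marks unique sequence, so a real Cot curve with several steps reveals how much of the genome falls into each class.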
Genome Sequencing Strategies: Top-Down vs. Bottom-Up
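Top-Down (Hierarchical Shotgun): Break the genome into large mapped clones (e.g. BACs), then fragment, sequence, and assemble each clone separately.
Bottom-Up (Whole Genome Shotgun): Fragment the whole genome at once, sequence the fragments, then assemble them computationally.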
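Both strategies end with an assembly step that joins reads by their overlaps. The following is a toy greedy overlap-merge sketch, not the algorithm of any particular assembler; the reads, the `min_len` cutoff, and the function names are all illustrative assumptions. It ignores sequencing errors, reverse complements, and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads drawn from the sequence ATGGCCTAAGGTTC.
contigs = greedy_assemble(["ATGGCCT", "GCCTAAGG", "AAGGTTC"])
print(contigs)
```

Repeats longer than a read create false overlaps for this kind of approach, which is one reason the hierarchical strategy restricts assembly to one mapped clone at a time.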
Shotgun Sequencing
Random fragmentation of DNA into overlapping fragments.
Relies on redundant coverage: each base should be sequenced several times over (5-10X is a typical target).
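Coverage is the total number of bases sequenced divided by the genome size. The sketch below computes it and, assuming reads land uniformly at random (the Lander-Waterman Poisson approximation), estimates the fraction of bases left unsequenced. The read count and read length are hypothetical.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size


def fraction_uncovered(c):
    """Expected fraction of bases hit by no read, assuming random placement."""
    return math.exp(-c)


# Hypothetical project: 120,000 reads of 500 bp on a 12 Mb genome.
c = coverage(120_000, 500, 12_000_000)
print(f"coverage = {c:.1f}X")

for depth in (1, 5, 10):
    print(f"{depth}X leaves about {fraction_uncovered(depth):.5%} uncovered")
```

Under this model the gaps shrink exponentially with depth, which is why several-fold redundancy is needed even though 1X already contains a genome's worth of bases.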
Genome Annotation
Converting assembled sequences into genomic landscapes: maps of where genes, regulatory elements, and repeats lie.
Involves de novo and comparative feature identification.
Gene Identification Challenges
Eukaryotes: low gene density, large intergenic regions.
Complex genes: many exons over large DNA regions.
Alternative splicing.
Tools for Gene Identification
Signal sensors: identify short sequence motifs (promoters, start/stop codons).
Content sensors: detect statistical properties of longer stretches of sequence (e.g. CpG islands).
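The two sensor types can be sketched in a few lines. The signal sensor below scans the forward strand for ATG...stop open reading frames, and the content sensor computes the observed/expected CpG ratio of a window. Function names, the `min_codons` cutoff, and the test sequences are illustrative assumptions; real gene finders also handle the reverse strand, introns, and probabilistic scoring.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Signal sensor: (start, end) of ATG...stop ORFs on the forward strand.

    min_codons counts the start and stop codons as part of the ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def cpg_observed_expected(seq):
    """Content sensor: observed CpG count over that expected from C and G."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


example = "AAATGGCCTAAGG"
for start, end in find_orfs(example):
    print(start, end, example[start:end])

print(cpg_observed_expected("CGCGCGCG"))
```

A commonly used rule of thumb treats a window as a candidate CpG island when this ratio exceeds about 0.6 and GC content is above 50%, though exact thresholds vary between tools.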
Gene Prediction Software
Ab initio methods: combine signal and content sensors.
Homology search methods: check similarity to known genes and proteins (e.g. BLASTX for translated genomic DNA, BLASTP for predicted proteins).
Machine learning approaches (neural networks, hidden Markov models).
Saccharomyces cerevisiae Genome
Complete sequence published in 1996, the first for a eukaryote.
Compact genome with few introns and little repetitive DNA.
Drosophila melanogaster Genome
First metazoan genome sequenced by whole genome shotgun (WGS) (2000), supported by a BAC-based physical map.
Human Genome
3.2 Gbp, 25-30,000 genes by early estimates (nearer 20,000 protein-coding genes by current counts), only 2% exons.
Many gene isoforms (alternative splicing).
~50% repeats.
Human Repetitive DNA
Tandem repeats: microsatellites, telomeric repeats, minisatellites, major satellites.
Interspersed repeats: transposable elements (LINEs, SINEs), duplicons.
Human Transposons
LINEs and SINEs (especially Alu) dominate.
DNA transposons are ancient and minor.
Comparative Genomics
Aligning entire genomes to find conserved regions and conserved gene order (synteny).
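Conserved gene order can be illustrated without a full genome alignment. The sketch below takes two hypothetical gene orders, assumes each ortholog has a unique name and the same orientation in both genomes, and reports runs of genes that are adjacent and in the same order in both. Real synteny tools also handle inversions, duplications, and gaps.

```python
def syntenic_blocks(order_a, order_b, min_len=2):
    """Runs of genes that are consecutive and collinear in both genomes."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []
    for gene in order_a:
        extends_run = (
            gene in pos_b
            and current
            and pos_b[gene] == pos_b[current[-1]] + 1
        )
        if extends_run:
            current.append(gene)
            continue
        # The run is broken: keep it if long enough, then start afresh.
        if len(current) >= min_len:
            blocks.append(current)
        current = [gene] if gene in pos_b else []
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Hypothetical gene orders; x1 and y1 have no ortholog in the other genome.
genome_a = ["g1", "g2", "g3", "y1", "g4", "g5", "g6"]
genome_b = ["g4", "g5", "g6", "x1", "g1", "g2", "g3"]

print(syntenic_blocks(genome_a, genome_b))
```

Here the two genomes share two conserved blocks that have swapped positions, the kind of rearrangement that whole-genome comparison is designed to expose.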
Identifies functional elements (promoters, enhancers, ncRNA genes).
ENCODE Project
Encyclopedia of DNA Elements.
Aims to annotate functional elements in the human genome.