BS2040 Bioinformatics Lecture Notes

Genomes, Transcriptomes, and Proteomes

  • Genomes: provide the reference sequence data that supports transcriptomics and proteomics, both of which have technical limitations.

Genome Size Variation

  • Genome size varies greatly across organisms.

    • Yeast: 12 Mb, 6,000 genes

    • Fruit Fly: 180 Mb, 13,000-14,000 genes

    • Human: 3,000 Mb, 25,000-30,000 genes
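
To make the comparison concrete, gene density can be worked out from the figures above. This is a minimal sketch; it uses the lower bound of each quoted gene-count range, which is an assumption made purely for illustration.

```python
# Genome size (Mb) and approximate gene counts from the list above.
# Lower bounds of the quoted gene-count ranges are used (an assumption).
genomes = {
    "Yeast": (12, 6_000),
    "Fruit fly": (180, 13_000),
    "Human": (3_000, 25_000),
}

def gene_density(size_mb, genes):
    """Return the number of genes per megabase."""
    return genes / size_mb

densities = {name: gene_density(size, genes) for name, (size, genes) in genomes.items()}

for name, d in densities.items():
    print(f"{name}: {d:.1f} genes/Mb")
```

Gene density drops far more steeply from yeast to human than gene number rises, which is why genome size alone says little about gene content.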

Genetic vs. Physical Maps

  • Genetic maps: order and relative spacing of genes (abstract loci), inferred from inheritance patterns.

  • Physical maps: positions of genes and markers on the physical DNA molecule (chromosome), measured in base pairs.

Linkage Analysis and Genetic Mapping

  • Linked loci are inherited together; unlinked loci assort independently.

  • Recombination frequency measures genetic distance (1% recombination = 1 centimorgan, cM).
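
The relationship between recombination frequency and map distance can be sketched as follows. The offspring counts are hypothetical, and the direct conversion to centimorgans is only a reasonable approximation for short distances, where double crossovers are rare.

```python
def recombination_frequency(recombinant, total):
    """Fraction of scored offspring that are recombinant for two loci."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinant / total

def map_distance_cm(rf):
    """Approximate map distance in centimorgans (1% recombination = 1 cM).
    Only reasonable for short distances, where double crossovers are rare."""
    return rf * 100

# Hypothetical test cross: 120 recombinant offspring out of 1,000 scored.
rf = recombination_frequency(120, 1000)
distance = map_distance_cm(rf)
print(f"RF = {rf:.2f}, about {distance:.0f} cM")
```

Unlinked loci assort independently and give a recombination frequency of 0.5, so observed values range from 0 (complete linkage) to 0.5.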

Genetic Mapping and Polymorphisms

  • Genetic mapping relies on polymorphic DNA variants.

    • RFLPs, SSLPs, SNPs

  • Variants must be common, show Mendelian inheritance, have low mutation rates, and be easily typed.

Physical Mapping Techniques

  • Restriction mapping

  • Fluorescence in situ hybridization (FISH)

  • Sequence tagged site (STS) mapping

Restriction Mapping

  • Restriction digests can be complete or partial.

  • Partial digestion provides additional information to resolve alternative maps.
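
A small sketch of why partial digestion helps: two hypothetical maps of a 10 kb linear fragment give identical complete digests but different partial digests. The site positions and the function names are illustrative assumptions.

```python
from itertools import combinations

def complete_digest(length, sites):
    """Fragment sizes when every site is cut (linear molecule)."""
    points = [0] + sorted(sites) + [length]
    return sorted(b - a for a, b in zip(points, points[1:]))

def partial_digest(length, sites):
    """All distinct fragment sizes that can arise when any subset of sites is cut."""
    points = [0] + sorted(sites) + [length]
    return sorted({b - a for a, b in combinations(points, 2)})

# Two hypothetical 10 kb maps with the same complete digest (2, 3 and 5 kb).
map_a = [2, 5]   # fragment order 2-3-5
map_b = [2, 7]   # fragment order 2-5-3

print(complete_digest(10, map_a), complete_digest(10, map_b))
print(partial_digest(10, map_a))
print(partial_digest(10, map_b))
```

Only the second map can produce a 7 kb partial-digest fragment, so observing (or not observing) that fragment distinguishes the two alternatives.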

FISH (Fluorescence In Situ Hybridization)

  • Localizes DNA sequences on chromosomes using fluorescent probes.

  • Labor-intensive and low-throughput; repetitive DNA limits resolution because probes can hybridize at multiple sites.

STS (Sequence Tagged Site) Mapping

  • Uses unique sequences detectable by DNA hybridization or PCR.

  • Closely linked markers are found together on more fragments of the mapping reagent (e.g., a clone library) than distant markers.
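
The idea that marker proximity shows up as co-occurrence on fragments can be sketched as follows. The clone contents and marker names are hypothetical, and real STS mapping uses far larger collections and statistical treatment.

```python
from itertools import combinations

# Hypothetical STS content of five clones (fragments) from a mapping reagent.
clones = [
    {"A", "B"},
    {"A", "B", "C"},
    {"B", "C"},
    {"C", "D"},
    {"D"},
]

def shared_fragments(clones):
    """Count, for each marker pair, the fragments that carry both markers."""
    markers = sorted(set().union(*clones))
    return {
        (m1, m2): sum(1 for c in clones if m1 in c and m2 in c)
        for m1, m2 in combinations(markers, 2)
    }

counts = shared_fragments(clones)
for pair, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(pair, n)
```

In this toy data set the pairs A/B and B/C share the most fragments and A/D share none, which is consistent with the marker order A-B-C-D.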

Prokaryotic Genomes

  • Generally smaller than eukaryotic genomes (typically ~5 Mb), but with great variety in size and organization.

  • Typically organized as a single circular chromosome located in the nucleoid.

Bacterial Chromosome Organization

  • High packing density achieved through supercoiling.

  • DNA gyrase (which introduces negative supercoils) and topoisomerase I (which relaxes them) control supercoiling.

  • HU tetramers package DNA.

  • Archaea use histone-like proteins instead of HU.

Genetic Organization in Prokaryotes

  • Operons define functional units.

  • Co-transcribed genes have common regulation.

Genome Fluidity

  • Plasmids are exchanged between bacteria.

  • DNA exchange occurs via transformation, transduction, and conjugation.

  • Lateral gene flow affects species concepts.

Carl Woese and 16S rRNA

  • 16S ribosomal RNA sequences were used to build a tree of all life.

Metagenomics

  • Sequencing DNA from environmental samples.

  • Provides culture-independent, and therefore less biased, sampling of microorganisms, including unculturable ones.

Microbiomes

  • Microbial communities in specific environments (e.g., human gut, skin).

  • Analyzed using 16S rRNA gene sequencing and whole-community (shotgun) sequencing.

Eukaryotic Genome Organization

  • Linear chromosomes; variation in number and size.

Genome Size Trends in Eukaryotes

  • Varies significantly; not always correlated with complexity (C-value paradox).

Genome Complexity

  • Hybridization kinetics (Cot analysis) defines repetitive and unique sequences.

  • Repetitive sequences contribute to genome size variation.
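
Cot analysis rests on the kinetics of reassociation: for ideal second-order kinetics, the fraction of DNA still single-stranded is 1 / (1 + k * C0t), so half of the DNA has reassociated at C0t = 1/k. The sketch below applies that relationship; the rate constants are hypothetical values in arbitrary units, chosen only to contrast a repetitive and a unique component.

```python
def fraction_single_stranded(cot, k):
    """Fraction of DNA still single-stranded at a given C0t value,
    for ideal second-order reassociation with rate constant k."""
    return 1.0 / (1.0 + k * cot)

def cot_half(k):
    """C0t value at which half of the DNA has reassociated."""
    return 1.0 / k

# Hypothetical rate constants (arbitrary units, for illustration only).
components = {"highly repetitive": 100.0, "unique": 0.001}

for name, k in components.items():
    print(f"{name}: Cot1/2 = {cot_half(k):g}")
```

Sequences present in many copies find partners quickly and so reassociate at low C0t values, while unique sequences reassociate only at high C0t values.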

Genome Sequencing Strategies: Top-Down vs. Bottom-Up

  • Top-Down (Hierarchical Shotgun): Break the genome into large mapped clones, then fragment, sequence, and assemble each clone.

  • Bottom-Up (Whole Genome Shotgun): Fragment whole genome, sequence, then assemble.

Shotgun Sequencing

  • Random fragmentation of DNA into overlapping fragments.

  • Relies on coverage, the average number of times each base is sequenced (5-10X is typically targeted).
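
Coverage can be estimated as c = N * L / G (number of reads times read length, divided by genome size). Under the simplifying assumption that reads land uniformly at random, the expected fraction of bases left unsequenced is about e^(-c). The project figures below are hypothetical.

```python
import math

def coverage(num_reads, read_length, genome_size):
    """Average number of times each base is sequenced: c = N * L / G."""
    return num_reads * read_length / genome_size

def fraction_uncovered(c):
    """Expected fraction of bases missed, assuming reads land uniformly
    at random (Poisson approximation): e^(-c)."""
    return math.exp(-c)

# Hypothetical project: 12 Mb genome sequenced with 192,000 reads of 500 bp.
c = coverage(num_reads=192_000, read_length=500, genome_size=12_000_000)
print(f"coverage = {c:.1f}X, uncovered = {fraction_uncovered(c):.4%}")
```

Under this model roughly 0.7% of bases are missed at 5X but only about 0.005% at 10X, which is why redundancy in this range is needed before gaps become rare. Real genomes do worse than the model predicts because of repeats and cloning bias.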

Genome Annotation

  • Converting assembled sequences into genomic landscapes.

  • Involves de novo and comparative feature identification.

Gene Identification Challenges

  • Eukaryotes: low gene density, large intergenic regions.

  • Complex genes: many exons over large DNA regions.

  • Alternative splicing.

Tools for Gene Identification

  • Signal sensors: identify short sequence motifs (promoters, start/stop codons).

  • Content sensors: detect extended sequence motifs (CpG islands).
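
The two kinds of sensor can be illustrated with toy versions: a signal sensor that looks for start and stop codons, and a content sensor that scores CpG enrichment as an observed/expected ratio. This is a simplified sketch with hypothetical function names and a made-up sequence; real gene finders combine many such sensors and also scan the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Signal sensor: report (start, end) of ATG...stop stretches on the
    forward strand, scanning all three reading frames.
    min_codons is the minimum number of codons before the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs

def cpg_obs_exp(seq):
    """Content sensor: observed/expected CpG ratio,
    (count of CG * sequence length) / (count of C * count of G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)

example = "ATGGCGCGCTAA"  # hypothetical sequence
print(find_orfs(example))
print(cpg_obs_exp(example))
```

A ratio well above the genome-wide background suggests a CpG island, but in practice the ratio is computed over sliding windows of a few hundred bases rather than a short string like this one.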

Gene Prediction Software

  • Ab initio methods: combine signal and content sensors.

  • Homology search methods: check similarity to known genes and proteins (e.g., BLASTX for translated DNA, BLASTP for predicted proteins).

  • Machine learning approaches (Neural Networks, Hidden Markov Models).

Saccharomyces cerevisiae Genome

  • First eukaryotic genome to be completely sequenced (1996).

  • Compact genome with few introns and little repetitive DNA.

Drosophila melanogaster Genome

  • First metazoan genome sequenced by whole genome shotgun (WGS) (2000), supported by a BAC-based physical map.

Human Genome

  • 3.2 Gbp, 25,000-30,000 genes; exons make up only ~2% of the sequence.

  • Many gene isoforms (alternative splicing).

  • ~50% repeats.

Human Repetitive DNA

  • Tandem repeats: microsatellites, telomeric repeats, minisatellites, major satellites.

  • Interspersed repeats: transposable elements (LINEs, SINEs), duplicons.

Human Transposons

  • LINEs and SINEs (especially Alu) dominate.

  • DNA transposons are ancient and minor.

Comparative Genomics

  • Aligning entire genomes to find conserved sequences and conserved gene order (synteny).

  • Identifies functional elements (promoters, enhancers, ncRNA genes).

ENCODE Project

  • Encyclopedia of DNA Elements.

  • Aims to annotate functional elements in the human genome.