Genomes and Their Evolution

Lecture Presentations by Nicole Tunbridge and Kathleen Fitzpatrick on Genomes and Their Evolution

© 2021 Pearson Education, Inc.

What is a Genome?

  • A genome is the complete set of DNA in an organism, including all of its genes.

Questions Explored by Sequencing and Comparing Genomes

  • What are the functions of the human genome?
  • How do genomes differ in number of genes?
    • Genes for proteins and RNA make up only about 1.5% of the human genome; the remaining 98.5% is noncoding DNA.
    • Examples of gene counts across species:
      • Escherichia coli: 4,400 genes
      • Homo sapiens (humans): 21,300 genes
      • Zea mays (corn): 32,000 genes
  • What do gene sequences tell us about evolutionary relationships between species?
    • Comparison of gene sequences can illuminate the evolutionary connections among species.
  • How do genomes evolve over time?
    • Example: genomes of different vertebrates evolve at different rates.
      • Elephant shark: a slower-evolving genome
      • Mouse: a relatively faster-evolving genome
      • Tiger tail sea horse: a faster-evolving genome than the others

Concept 21.1: The Human Genome Project

  • The Human Genome Project began in 1990; sequencing was largely completed in 2003, and the last of the chromosome-by-chromosome papers was published in 2006.
  • Key contributions:
    • Development of faster, less expensive sequencing techniques.
    • Establishment of genomics, the study of whole sets of genes and their interactions.
    • Introduction of bioinformatics, the application of computational methods to store and analyze biological data.

Sequencing the Human Genome

  1. Sequencing relied on DNA pooled from a few individuals.
  2. After sequencing, scientists reviewed the results and agreed on a reference genome: the full sequence that best represents the genome of a species.

Whole-Genome Shotgun Approach

  1. Cut the DNA into overlapping fragments that are short enough for sequencing.
  2. Clone the fragments in plasmids or other vectors.
  3. Sequence each fragment.
  4. Use computer software to assemble the overlapping fragment sequences into one continuous sequence.
    • Example sequence fragments:
      • CGCCATCAGT
      • AGTCCGCTATACGA
      • ACGATACTGGT
    • Resulting sequence: CGCCATCAGTCCGCTATACGATACTGGT…
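
The assembly in step 4 can be illustrated with a minimal Python sketch. It uses a simple greedy strategy (repeatedly merge the two fragments with the longest suffix-to-prefix overlap), which is an assumption chosen for illustration and not necessarily what production assemblers do. The function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of fragments until no overlaps remain."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        # Compare every ordered pair and remember the longest overlap.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to merge; return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# The three example fragments from the list above.
fragments = ["CGCCATCAGT", "AGTCCGCTATACGA", "ACGATACTGGT"]
print(greedy_assemble(fragments))
```

With the three example fragments, the sketch joins them through the shared AGT and ACGA overlaps and reproduces the resulting sequence shown above (without the trailing ellipsis). Real genomes contain repeats that defeat a purely greedy approach, so this is only a toy model.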

Centralized Resources for Analyzing Genome Sequences

  • Bioinformatics resources provided by various institutions:
    • National Library of Medicine (NLM) and National Institutes of Health (NIH) manage the National Center for Biotechnology Information (NCBI).
    • European Molecular Biology Laboratory.
    • DNA Data Bank of Japan.
    • BGI in Shenzhen, China.
  • The NCBI database is known as GenBank.
    • As of August 2019, GenBank included sequences of 214 million fragments totaling 366 billion base pairs.
    • A widely used software tool on the NCBI website is BLAST (Basic Local Alignment Search Tool), which lets users compare a DNA sequence with every sequence in GenBank.

Example: BLAST Comparison

  • A user submits a query sequence such as:
    • ATGTTTTCCGGTGGCGGCGGCCCGCTGTCCCCCGGAGGAAAGTCGG…
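
To give a feel for what a local alignment search does, here is a minimal Python sketch of a seed-and-extend comparison: find short exact matches (seeds) shared by a query and a database sequence, then extend each seed in both directions while the bases keep matching. This is a simplified illustration and not BLAST itself, which scores mismatches and gaps and searches an entire database. The function name `seed_and_extend`, the seed length `k`, and the query sequence are all hypothetical.

```python
def seed_and_extend(query: str, subject: str, k: int = 4) -> tuple[int, int, int]:
    """Return (length, query_start, subject_start) of the longest exact
    match that can be grown from a shared k-mer seed, or (0, -1, -1)."""
    # Index the start positions of every k-mer in the subject sequence.
    index = {}
    for s in range(len(subject) - k + 1):
        index.setdefault(subject[s:s + k], []).append(s)

    best = (0, -1, -1)
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            # Extend the seed to the left while bases still match.
            q_start, s_start = q, s
            while q_start > 0 and s_start > 0 and query[q_start - 1] == subject[s_start - 1]:
                q_start -= 1
                s_start -= 1
            # Extend the seed to the right while bases still match.
            q_end, s_end = q + k, s + k
            while q_end < len(query) and s_end < len(subject) and query[q_end] == subject[s_end]:
                q_end += 1
                s_end += 1
            if q_end - q_start > best[0]:
                best = (q_end - q_start, q_start, s_start)
    return best


subject = "CGCCATCAGTCCGCTATACGATACTGGT"  # stand-in for one database entry
query = "TTACGATACTGG"                    # hypothetical user query
length, q_start, s_start = seed_and_extend(query, subject)
print(query[q_start:q_start + length], "matches subject at position", s_start)
```

The k-mer index is what makes the search fast: only positions that share a seed with the query are examined, instead of every possible alignment.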

Systems Biology

  • Proteomics is the systematic study of large sets of proteins and their properties.
  • A proteome is the entire set of proteins expressed by a cell or group of cells.
  • Systems biology builds on catalogs of genes and proteins, shifting the focus from individual components to how they function together in biological systems.

Concept 21.3: Variability in Genomes

  • Genomes vary widely in size, number of genes, and gene density.
    • Thousands of genome sequences have been completed, with many more in progress.

Genome Size Comparisons

  • Genomes of most bacteria and archaea range from 1 to 6 million base pairs (Mb).
  • Eukaryotic genomes are generally larger: most plants and animals exceed 100 Mb, and humans have approximately 3,000 Mb.
  • Within each domain, there is no systematic correlation between genome size and phenotype.
  • Gene counts:
    • Bacteria and archaea: 1,500 to 7,500 genes
    • Unicellular fungi: ~5,000 genes
    • Multicellular eukaryotes: up to 40,000 genes
  • The number of genes is not directly correlated with genome size.

Genome Sizes and Estimated Gene Numbers

Organism                     Size (Mb)    Number of Genes    Genes per Mb
Haemophilus influenzae             1.8              1,700             940
Escherichia coli                   4.6              4,400             950
Archaeoglobus fulgidus             2.2              2,500           1,130
Methanosarcina barkeri             4.8              3,600             750
Saccharomyces cerevisiae            12              6,300             525
Utricularia gibba                   82             28,500             348
Caenorhabditis elegans             100             20,100             200
Arabidopsis thaliana               120             27,000             225
Drosophila melanogaster            165             14,000              85
Daphnia pulex                      200             31,000             155
Zea mays (corn)                  2,300             32,000              14
Ailuropoda melanoleuca           2,400             21,000               9
Homo sapiens (humans)            3,000             21,300               7
Paris japonica                ~149,000                 ND              ND

ND = not determined
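
The "Genes per Mb" column is simply the gene count divided by the genome size. The short Python sketch below recomputes it for a few rows copied from the table. The dictionary name `genomes` and the helper `genes_per_mb` are illustrative, and the table's own values appear to be rounded, so small differences are expected.

```python
# (genome size in Mb, estimated number of genes), copied from the table.
genomes = {
    "Escherichia coli": (4.6, 4_400),
    "Saccharomyces cerevisiae": (12, 6_300),
    "Drosophila melanogaster": (165, 14_000),
    "Zea mays": (2_300, 32_000),
    "Homo sapiens": (3_000, 21_300),
}


def genes_per_mb(size_mb: float, genes: int) -> float:
    """Gene density: number of genes per million base pairs."""
    return genes / size_mb


for name, (size_mb, genes) in genomes.items():
    print(f"{name:26s} {genes_per_mb(size_mb, genes):8.1f} genes per Mb")
```

Running it shows the pattern the table illustrates: gene density drops by roughly two orders of magnitude from E. coli to humans, even though the gene counts differ by less than a factor of five.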

Notable Findings:

  • Estimated gene counts:
    • C. elegans: 20,100 genes in a 100 Mb genome.
    • Drosophila melanogaster: 14,000 genes in a 165 Mb genome.
  • Early predictions for the human genome suggested 50,000 to 100,000 genes; the current estimate of about 21,300 is unexpectedly low.

Concept 21.4: Noncoding DNA in Multicellular Eukaryotes

  • About 98.5% of the human genome is noncoding DNA, which does not code for proteins.
  • Of the whole genome:
    • Gene regulatory sequences make up about 5%
    • Introns make up about 20%
  • Types of noncoding DNA include:
    • Pseudogenes: Nonfunctional former genes bearing accumulated mutations.
    • Repetitive DNA: Present in multiple copies throughout the genome.

Composition of Human Genome

  • Protein-coding regions: 1.5%
  • Regulatory sequences: 5%
  • Introns: ~20%
  • Unique noncoding DNA: 15%
  • Repetitive DNA that includes transposable elements and related sequences: 44%
    • L1 sequences: 17%
    • Alu elements: 10%
  • Repetitive DNA unrelated to transposable elements: about 14%
    • Simple sequence DNA: 3%
    • Large-segment duplications: 5-6%

Transposable Elements and Their Role

  • Transposable elements, found in both prokaryotes and eukaryotes, are stretches of DNA that can move from one location to another within the genome.
    • Transposable elements and related sequences make up approximately 75% of human repetitive DNA.
  • Eukaryotic transposable elements fall into two types:
    • Transposons: Move via a DNA intermediate and require a transposase enzyme.
    • Retrotransposons: Move via an RNA intermediate and utilize reverse transcriptase.

Mechanism of Transposon Movement

  1. The transposon DNA is copied.
  2. The new copy is inserted at another site in the genome.
  3. The result is a mobile copy of the transposon at the new site, with the original still in place.

Mechanism of Retrotransposon Movement

  1. The retrotransposon is transcribed into a single-stranded RNA intermediate.
  2. Reverse transcriptase synthesizes the first strand of DNA from the RNA template.
  3. The second DNA strand is synthesized, and the double-stranded DNA copy is inserted at a new site.
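
The copy-and-paste character of retrotransposon movement can be mimicked with a toy Python sketch that treats the genome as a string. It is a deliberately simplified model: it tracks only one DNA strand, ignores how the insertion site is chosen, and all names (`transcribe`, `reverse_transcribe`, `retrotranspose`) and sequences are hypothetical.

```python
def transcribe(dna: str) -> str:
    """Make the RNA intermediate (same sequence, with U in place of T)."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """Convert the RNA intermediate back into DNA, as reverse transcriptase does."""
    return rna.replace("U", "T")


def retrotranspose(genome: str, element: str, insert_at: int) -> str:
    """Insert a new DNA copy of `element` at `insert_at`; the original stays put."""
    if element not in genome:
        raise ValueError("element not present in genome")
    rna_intermediate = transcribe(element)
    dna_copy = reverse_transcribe(rna_intermediate)
    return genome[:insert_at] + dna_copy + genome[insert_at:]


genome = "AAAA" + "GATTACA" + "CCCC"  # made-up genome with one element
new_genome = retrotranspose(genome, "GATTACA", insert_at=len(genome))
print(genome.count("GATTACA"), "copy before,", new_genome.count("GATTACA"), "copies after")
```

Because the original element is never removed, each round of retrotransposition adds a copy, which is consistent with these elements making up a large share of the genome.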

Other Repetitive DNA Elements in Humans

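  • Repetitive DNA that is unrelated to transposable elements constitutes about 14% of the human genome.
    • Large-segment duplications, in which long stretches of DNA have been copied, account for 5-6% of the genome.
    • Simple sequence DNA, about 3% of the genome, contains many copies of short, tandemly repeated sequences.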
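
The percentages quoted in these notes can be cross-checked with a little arithmetic. The Python sketch below assumes that the 44% figure (repetitive DNA that includes transposable elements) and the 14% figure (other repetitive DNA) are separate, non-overlapping categories, and that the six values in `top_level` are the top-level slices of the genome. That grouping is an interpretation of the notes, and the variable names are illustrative.

```python
# Percent-of-genome figures quoted in these notes.
transposable_related = 44.0  # repetitive DNA incl. transposable elements
other_repetitive = 14.0      # repetitive DNA unrelated to transposable elements

total_repetitive = transposable_related + other_repetitive
transposable_share = transposable_related / total_repetitive
print(f"Repetitive DNA: {total_repetitive:.0f}% of the genome")
print(f"Share of repetitive DNA tied to transposable elements: {transposable_share:.0%}")

# Assumed top-level categories: protein-coding regions, regulatory sequences,
# introns, unique noncoding DNA, and the two classes of repetitive DNA.
top_level = [1.5, 5.0, 20.0, 15.0, transposable_related, other_repetitive]
print(f"Top-level categories sum to {sum(top_level):.1f}%")
```

The computed share comes out at about 76%, in line with the "approximately 75%" quoted for transposable elements, and the top-level categories sum to nearly 100%, as expected given that several of the inputs are rounded.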

Definition of Short Tandem Repeats (STRs)

  • A short tandem repeat (STR) is a series of back-to-back copies of a unit 2 to 5 nucleotides long; the number of copies varies from site to site within a genome and from one individual to another.
    • Application: DNA fingerprinting uses variation in STR repeat numbers to identify individuals.
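
As a rough illustration of how STR variation distinguishes individuals, the Python sketch below counts the longest run of back-to-back copies of a repeat unit in a sequence. The function name `longest_tandem_run`, the GATA repeat unit, and both sample sequences are made up for the example; real STR typing measures fragment lengths at defined loci and does not scan raw strings like this.

```python
import re


def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest number of consecutive copies of `unit` found anywhere in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical alleles at one STR site: same flanking DNA, different repeat counts.
person_a = "TTG" + "GATA" * 7 + "CCA"
person_b = "TTG" + "GATA" * 10 + "CCA"

print("Person A repeats:", longest_tandem_run(person_a, "GATA"))
print("Person B repeats:", longest_tandem_run(person_b, "GATA"))
```

Comparing repeat counts across several such sites yields a profile that is very unlikely to be shared by two unrelated people.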

Concept 21.5: Genome Evolution Mechanisms

  • Mutation is the basic mechanism of genomic change and underlies genome evolution.
    • The earliest forms of life likely had only the genes essential for survival and reproduction.
    • Genome size has generally increased over evolutionary time.

Polyploidy and Genome Changes

  • Accidents in meiosis can lead to polyploidy, a condition in which an organism has one or more extra sets of chromosomes.
    • A polyploid lineage can persist if the organism survives and reproduces.

Chromosome Structure Alteration

  • Humans possess 23 pairs of chromosomes; chimpanzees have 24 pairs.
    • After the human and chimpanzee lineages diverged from their common ancestor, two ancestral chromosomes fused in the human lineage.

Genome Comparisons for Evolutionary Insight

  • Comparing genomes between closely related species can illuminate recent evolutionary history.
  • Comparing genomes of distantly related species helps trace much older evolutionary history.
  • Evolutionary relationships among species can be depicted using tree-shaped diagrams.

Developmental Biology and Genome Conservation

  • Evolutionary developmental biology (evo-devo) compares the developmental processes of different multicellular organisms.
    • Genomic insights indicate that minor differences in gene sequences or their regulation can result in major morphological differences.
    • Homeotic genes in Drosophila melanogaster specify the identity of the fly's body segments and contain a conserved 180-nucleotide sequence called a homeobox. Homeobox-containing homeotic genes in animals are called Hox genes.

Visual Representation of Gene Conservation

  • Figures representing adult fruit flies and embryos illustrate conserved developmental processes among species such as flies and mice.
  • Over evolutionary time, slight changes in regulatory sequences can lead to significant changes in body form.