Genomes and Their Evolution
Lecture Presentations by Nicole Tunbridge and Kathleen Fitzpatrick on Genomes and Their Evolution
© 2021 Pearson Education, Inc.
What is a Genome?
- A genome is the complete set of DNA in an organism, including all of its genes.
Questions Explored by Sequencing and Comparing Genomes
- What are the functions of the human genome?
- Noncoding DNA accounts for 98.5% of the human genome, while genes for proteins and RNA make up 1.5%.
- How do genomes differ in number of genes?
- Examples of gene counts across species:
- Escherichia coli: 4,400 genes
- Homo sapiens (humans): 21,300 genes
- Zea mays (corn): 32,000 genes
- What do gene sequences tell us about evolutionary relationships between species?
- Comparison of gene sequences can illuminate the evolutionary connections among species.
- How do genomes evolve over time?
- Example:
- Elephant shark: a slower-evolving genome
- Mouse: by comparison, a relatively faster-evolving genome
- Tiger tail sea horse: a faster-evolving genome than others
Concept 21.1: The Human Genome Project
- The Human Genome Project officially began in 1990; the sequence was largely completed by 2003, and the last of the chromosome analyses was published in 2006.
- Key contributions:
- Development of faster, less expensive sequencing techniques.
- Establishment of genomics, the study of whole sets of genes and their interactions.
- Introduction of bioinformatics, the application of computational methods to store and analyze biological data.
Sequencing the Human Genome
- Sequencing relied on pooled DNA from a few individuals.
- After sequencing, scientists reviewed the results and agreed on a reference genome, a full sequence that best represents the genome of a species.
Whole-Genome Shotgun Approach
- Cut the DNA into overlapping fragments that are short enough for sequencing.
- Clone the fragments in plasmids or other vectors.
- Sequence each fragment.
- Use computer software to find the overlaps and assemble the fragment sequences into one overall sequence.
- Example sequence fragments:
- CGCCATCAGT
- AGTCCGCTATACGA
- ACGATACTGGT
- Resulting sequence: CGCCATCAGTCCGCTATACGATACTGGT…
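The assembly step can be illustrated with a minimal sketch. It assumes error-free fragments and uses a simple greedy strategy (repeatedly merging the pair with the longest suffix-to-prefix overlap); real assembly software is far more elaborate. The function names and the `min_len` parameter are illustrative assumptions, not part of the lecture.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Greedily merge fragments by their largest overlap (toy assembler)."""
    frags = list(fragments)
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags


# The three example fragments from the slide.
fragments = ["CGCCATCAGT", "AGTCCGCTATACGA", "ACGATACTGGT"]
contigs = greedy_assemble(fragments)
print(contigs)  # ['CGCCATCAGTCCGCTATACGATACTGGT']
```

The fragments overlap by "AGT" and "ACGA", which is what lets the software place them in order.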
Centralized Resources for Analyzing Genome Sequences
- Bioinformatics resources provided by various institutions:
- National Library of Medicine (NLM) and National Institutes of Health (NIH) manage the National Center for Biotechnology Information (NCBI).
- European Molecular Biology Laboratory.
- DNA Data Bank of Japan.
- BGI in Shenzhen, China.
- The NCBI database is known as GenBank.
- As of August 2019, GenBank included sequences of 214 million fragments totaling 366 billion base pairs.
- A widely used software tool on NCBI is BLAST (Basic Local Alignment Search Tool), allowing users to compare DNA sequences with all sequences in GenBank.
Example: BLAST Comparison
- The user enters a query sequence such as:
- ATGTTTTCCGGTGGCGGCGGCCCGCTGTCCCCCGGAGGAAAGTCGG…
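To give a feel for what "local alignment" means, here is a minimal sketch of Smith-Waterman scoring. This is not how BLAST itself works (BLAST uses a faster heuristic); it is the exact dynamic-programming method that local alignment search approximates. The scoring values, the toy query, and the two-entry "database" are all hypothetical.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b (Smith-Waterman)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical query and database entries, for illustration only.
query = "TTACGTTT"
database = {"entry_1": "GGACGTGG", "entry_2": "CCCCCCCC"}

ranked = sorted(
    database,
    key=lambda name: local_alignment_score(query, database[name]),
    reverse=True,
)
print(ranked)  # entries ordered from best to worst local match
```

Here `entry_1` ranks first because it shares the stretch "ACGT" with the query, even though the flanking bases differ.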
Systems Biology
- Proteomics is the systematic study of large sets of proteins and their properties.
- A proteome is the entire set of proteins expressed by a cell or group of cells.
- Systems biology goes beyond compiling catalogs of genes and proteins: it aims to model how they function together in whole biological systems.
Concept 21.3: Variability in Genomes
- Genomes vary widely in size, number of genes, and gene density.
- Thousands of genome sequences have been completed, with many more in progress.
Genome Size Comparisons
- Most bacterial and archaeal genomes range from 1 to 6 million base pairs (Mb).
- Eukaryotic genomes are generally larger, with most plants and animals exceeding 100 Mb; humans have approximately 3,000 Mb.
- Within each domain, there is no systematic relationship between genome size and phenotype.
- Gene counts:
- Bacteria and archaea: 1,500 to 7,500 genes
- Unicellular fungi: ~5,000 genes
- Multicellular eukaryotes: up to 40,000 genes
- Number of genes is not directly correlated to genome size.
Genome Sizes and Estimated Gene Numbers
| Organism | Size (Mb) | Number of Genes | Genes per Mb |
|---|---|---|---|
| Haemophilus influenzae | 1.8 | 1,700 | 940 |
| Escherichia coli | 4.6 | 4,400 | 950 |
| Archaeoglobus fulgidus | 2.2 | 2,500 | 1,130 |
| Methanosarcina barkeri | 4.8 | 3,600 | 750 |
| Saccharomyces cerevisiae | 12 | 6,300 | 525 |
| Utricularia gibba | 82 | 28,500 | 348 |
| Caenorhabditis elegans | 100 | 20,100 | 200 |
| Arabidopsis thaliana | 120 | 27,000 | 225 |
| Drosophila melanogaster | 165 | 14,000 | 85 |
| Daphnia pulex | 200 | 31,000 | 155 |
| Zea mays (corn) | 2,300 | 32,000 | 14 |
| Ailuropoda melanoleuca | 2,400 | 21,000 | 9 |
| Homo sapiens (humans) | 3,000 | 21,300 | 7 |
| Paris japonica | ~149,000 | ND | ND |
(ND = not determined)
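The "Genes per Mb" column is simply the gene count divided by the genome size. A small sketch using a few rows from the table follows; the function name is an illustrative assumption, and the table's values are rounded, so computed figures can differ slightly (for example, E. coli works out to about 957, listed as 950).

```python
def genes_per_mb(gene_count, genome_size_mb):
    """Gene density: number of genes per million base pairs."""
    return gene_count / genome_size_mb


# (size in Mb, number of genes), copied from the table above.
genomes = {
    "Escherichia coli": (4.6, 4_400),
    "Saccharomyces cerevisiae": (12, 6_300),
    "Drosophila melanogaster": (165, 14_000),
    "Homo sapiens": (3_000, 21_300),
}

for name, (size_mb, genes) in genomes.items():
    print(f"{name}: {genes_per_mb(genes, size_mb):.0f} genes per Mb")
```

The human genome is roughly 650 times larger than that of E. coli but has fewer than five times as many genes, which is why its gene density is so much lower.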
Notable Findings:
- Estimated gene counts for:
- C. elegans: 20,100 genes within a genome of 100 Mb.
- Drosophila melanogaster: 14,000 genes in a 165 Mb genome.
- Early predictions for the human genome suggested 50,000 to 100,000 genes; the current estimate of around 21,300 is unexpectedly low.
Concept 21.4: Noncoding DNA in Multicellular Eukaryotes
- About 98.5% of the human genome is noncoding DNA, which does not code for proteins or for RNAs such as rRNA and tRNA.
- Of the genome:
- Gene regulatory sequences make up 5%
- Introns make up around 20%
- Types of noncoding DNA include:
- Pseudogenes: Nonfunctional former genes bearing accumulated mutations.
- Repetitive DNA: Present in multiple copies throughout the genome.
Composition of Human Genome
- Protein Coding Regions: 1.5%
- Regulatory Sequences: 5%
- Introns: ~20%
- Repetitive DNA that includes transposable elements and related sequences: 44%, including:
  - L1 Sequences: 17%
  - Alu Elements: 10%
- Repetitive DNA unrelated to transposable elements: about 14%, including:
  - Simple Sequence DNA: 3%
  - Large-segment Duplications: 5-6%
- Unique Noncoding DNA: 15%
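As a quick consistency check, the percentages can be added up. This sketch assumes that L1 and Alu sequences are counted within the 44% and that simple sequence DNA and large-segment duplications fall within the roughly 14% of repetitive DNA unrelated to transposable elements mentioned later in these notes. The dictionary keys are illustrative labels.

```python
# Top-level categories only (percent of the human genome).
composition = {
    "protein-coding regions": 1.5,
    "regulatory sequences": 5.0,
    "introns": 20.0,
    "transposable elements and related sequences": 44.0,
    "repetitive DNA unrelated to transposable elements": 14.0,
    "unique noncoding DNA": 15.0,
}

total = sum(composition.values())
print(f"Total accounted for: {total}%")

# Share of all repetitive DNA that is transposable-element related.
te = composition["transposable elements and related sequences"]
other = composition["repetitive DNA unrelated to transposable elements"]
te_share = 100 * te / (te + other)
print(f"Transposable-element share of repetitive DNA: {te_share:.0f}%")
```

The categories sum to about 100% (several values are approximate), and the transposable-element share of repetitive DNA comes out near the roughly 75% quoted in these notes.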
Transposable Elements and Their Role
- Present in both prokaryotes and eukaryotes, transposable elements can shift locations within the genome.
- About 75% of human repetitive DNA consists of transposable elements and sequences related to them.
- Eukaryotic transposable elements are of two types:
- Transposons: Move via a DNA intermediate and require a transposase enzyme.
- Retrotransposons: Move via an RNA intermediate and utilize reverse transcriptase.
Mechanism of Transposon Movement
- The transposon is copied.
- The new copy is inserted at another site in the genome.
- The result is a mobile copy of the transposon at the new site, with the original left in place.
Mechanism of Retrotransposon Movement
- A single-stranded RNA intermediate is synthesized from the retrotransposon.
- Reverse transcriptase synthesizes the first strand of DNA from the RNA.
- The second DNA strand is synthesized, and the resulting double-stranded copy is inserted at a new site.
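The retrotransposon steps can be traced with a toy string model. This is only a sketch: the genome, the element "GATTACA", and the coordinates are hypothetical, each strand is written 5' to 3', and the enzymes are reduced to simple string operations.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def transcribe(dna):
    """RNA intermediate: same sequence as the DNA strand, with U in place of T."""
    return dna.replace("T", "U")


def reverse_transcribe(rna):
    """First DNA strand: complementary to the RNA, written 5' to 3'."""
    return rna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def second_strand(first_strand):
    """Second DNA strand: complementary to the first strand, written 5' to 3'."""
    return first_strand.translate(DNA_COMPLEMENT)[::-1]


def retrotranspose(genome, start, end, target):
    """Insert a copy of genome[start:end] at `target`; the original stays put."""
    element = genome[start:end]
    rna = transcribe(element)
    first = reverse_transcribe(rna)
    copy = second_strand(first)
    return genome[:target] + copy + genome[target:]


# Hypothetical genome with one retrotransposon ("GATTACA") at positions 4-10.
genome = "AAAA" + "GATTACA" + "CCCC"
new_genome = retrotranspose(genome, start=4, end=11, target=13)
print(new_genome)
print(new_genome.count("GATTACA"))  # number of copies after the move
```

Because the original element is never removed, each round of retrotransposition adds a copy, which helps explain why such elements make up so much of the genome.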
Other Repetitive DNA Elements in Humans
- Repetitive DNA that is not related to transposable elements makes up about 14% of the human genome.
- Of this, 5-6% of the genome consists of duplications of long stretches of DNA.
- Simple Sequence DNA contains many copies of short tandemly repeated sequences.
Definition of Short Tandem Repeats (STRs)
- A short tandem repeat (STR) is a series of repeating units of 2 to 5 nucleotides; the number of repeats varies among sites within a genome and among individuals.
- Application: DNA Fingerprinting utilizes STR variations for identification purposes.
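The idea behind STR-based identification can be sketched by counting how many times a repeat unit occurs back to back. The repeat unit "GATA", the flanking sequences, and the two individuals below are hypothetical, chosen only for illustration.

```python
import re


def longest_tandem_run(sequence, unit):
    """Largest number of consecutive copies of `unit` found in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


# Same STR site in two hypothetical individuals: identical flanking DNA,
# different numbers of GATA repeats.
person_a = "TTGC" + "GATA" * 7 + "CCAT"
person_b = "TTGC" + "GATA" * 10 + "CCAT"

print(longest_tandem_run(person_a, "GATA"))  # 7
print(longest_tandem_run(person_b, "GATA"))  # 10
```

A real profile compares repeat counts at many STR sites at once, which makes a chance match between two individuals very unlikely.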
Concept 21.5: Genome Evolution Mechanisms
- Mutation is the basis of change at the genomic level and underlies much of genome evolution.
- The earliest life forms likely had only the genes necessary for survival and reproduction.
- Genome size has generally increased over evolutionary time.
Polyploidy and Genome Changes
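- Accidents in meiosis can lead to one or more extra sets of chromosomes, a condition known as polyploidy.
- In a polyploid organism, one set of genes can provide essential functions while the genes in the extra sets diverge by accumulating mutations; these variations may persist if the organism survives and reproduces.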
Chromosome Structure Alteration
- Humans possess 23 pairs of chromosomes; chimpanzees have 24 pairs.
- After humans and chimpanzees diverged from a common ancestor, two ancestral chromosomes fused in the human lineage, which accounts for the difference in chromosome number.
Genome Comparisons for Evolutionary Insight
- Comparing genomes between closely related species can illuminate recent evolutionary history.
- Comparing genomes of distantly related species helps trace ancient evolutionary history.
- Evolutionary relationships among species can be depicted using tree-shaped diagrams.
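One simple way to turn sequence comparisons into a statement about relatedness is to count the fraction of aligned positions that differ (sometimes called the p-distance). The sketch below assumes the sequences are already aligned and of equal length; the three sequences and species labels are hypothetical.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Fraction of positions at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return differences / len(seq1)


# Hypothetical aligned sequences of the same gene from three species.
sequences = {
    "species_A": "ATGCTAGCTA",
    "species_B": "ATGCTAGCTT",
    "species_C": "ATGGTCGATT",
}

distances = {
    (a, b): p_distance(sequences[a], sequences[b])
    for a, b in combinations(sequences, 2)
}
for pair, d in distances.items():
    print(pair, d)

closest = min(distances, key=distances.get)
print("Most similar pair:", closest)
```

In a tree diagram, the most similar pair would typically be drawn as the most recently diverged branches, with the third species branching off earlier.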
Developmental Biology and Genome Conservation
- Evolutionary developmental biology (evo-devo) studies the evolution of developmental processes in multicellular organisms.
- Genomic insights indicate that minor variations in gene sequences or their regulation can result in major morphological differences.
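- Homeotic genes in Drosophila melanogaster specify the identity of the fly's body segments. They contain a conserved 180-nucleotide sequence called the homeobox, which encodes a 60-amino-acid homeodomain; the homeobox-containing homeotic genes of animals are called Hox genes.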
Visual Representation of Gene Conservation
- Figures of adult fruit flies and embryos illustrate that developmental genes and processes are conserved among species such as flies and mice.
- Slight changes in regulatory sequences can lead to significant changes in body form.