Chapter 21

Genomes and Their Evolution

Key Concepts

  • 21.1: The Human Genome Project fostered development of faster, less expensive sequencing techniques.

  • 21.2: Scientists use bioinformatics to analyze genomes and their functions.

  • 21.3: Genomes vary in size, number of genes, and gene density.

  • 21.4: Multicellular eukaryotes have a lot of noncoding DNA and many multigene families.

  • 21.5: Duplication, rearrangement, and mutation of DNA contribute to genome evolution.

  • 21.6: Comparing genome sequences provides clues to evolution and development.

21.1 The Human Genome Project & Sequencing Techniques

  • Genomics: The study of whole sets of genes and their interactions.

  • Bioinformatics: Application of computational methods to store and analyze biological data.

  • Human Genome Project: An international, publicly funded consortium, officially begun in 1990, that aimed to sequence the entire human genome.

    • Completed in 2003, with analysis of each chromosome finalized in 2006.

    • The sequence is over 99% complete, with gaps remaining in repetitive DNA regions.

    • A reference genome was created by pooling DNA from a few individuals.

  • Mapping a Genome: Determine the complete nucleotide sequence of each chromosome.

    • Initially, a methodical approach was used, ordering fragments based on earlier genetic mapping.

    • J. Craig Venter (Celera Genomics) introduced the whole-genome shotgun approach in 1998: cloning and sequencing random DNA fragments, then using computer programs to assemble the overlapping sequences into a single continuous sequence.

  • Technological Advancements:

    • Sequencing speed increased dramatically: from 1,000 base pairs a day in the 1980s to 1,000 base pairs per second by 2000.

    • Next-generation sequencing machines can sequence nearly 35 million base pairs per second.

    • The cloning step is now unnecessary: these techniques are sensitive enough that DNA fragments can be sequenced directly.

  • High-Throughput: Methods that can analyze biological materials very rapidly and produce enormous volumes of data.

  • Cost Reduction:

    • Sequencing the first human genome took 13 years and cost between $500 million and $1 billion.

    • By 2007, the same task took four months at a cost of $1 million.

    • In 2019, complete genomes of 48 individuals could be sequenced in 44 hours for under $1,000 per genome.

  • Metagenomics: DNA from an entire community of species (a metagenome) is collected from an environmental sample and sequenced.

    • Computer software sorts partial sequences and assembles them into individual species' genomes.

    • Eliminates the need to culture each species separately.

    • Applied to communities in the human intestine and extreme habitats like thermal springs.
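
The whole-genome shotgun idea in this section (sequence random overlapping fragments, then let software order them) can be illustrated with a toy greedy-overlap assembler. This is a minimal sketch, assuming short, error-free reads from a single strand; the names `overlap` and `greedy_assemble` and the minimum overlap of 3 bases are illustrative choices, not Celera's software, which had to cope with sequencing errors, millions of reads, and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b`
    (at least min_len bases), or 0 if there is none."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap.
    Returns the resulting contigs (a single string if everything joined)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # drop reads contained entirely within another read
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left; remaining reads stay separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


# Hypothetical error-free reads sampled from the sequence ATTAGACCTGCCGGAATAC
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))
```

With these four reads the greedy merges rebuild `ATTAGACCTGCCGGAATAC` as one contig. When a genome contains long repeats, a read can overlap equally well in more than one place, which is one reason assemblies tend to leave gaps in repetitive regions.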

21.2 Bioinformatics: Analyzing Genomes

  • Centralized Resources: Established to coordinate efforts and track sequences from the Human Genome Project.

  • National Center for Biotechnology Information (NCBI): Maintained by the National Library of Medicine (NLM) and the National Institutes of Health (NIH), offering extensive bioinformatics resources.

  • GenBank: NCBI database of sequences, containing 214 million fragments of genomic DNA, totaling 366 billion base pairs as of August 2019.

  • BLAST (Basic Local Alignment Search Tool): Software program on the NCBI website that allows users to compare a DNA sequence with every sequence in GenBank.

  • Protein Data Bank: Maintained by Rutgers University and the University of California, San Diego, a worldwide database of all three-dimensional protein structures that have been experimentally determined.

  • Gene Annotation: Identifying all protein-coding genes in a sequence and determining their functions.

    • Uses computers to scan for patterns indicating the presence of genes, such as transcriptional and translational start and stop signals, RNA-splicing sites, and promoter sequences.

    • Looks for expressed sequence tags (ESTs) collected from cDNA sequences.

  • Confirming Gene Identities: Using RNA-seq or other methods to show that the relevant RNA is actually expressed from the proposed gene.

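  • WD40 domains: Present in many eukaryotic proteins and known to function in signal transduction pathways; finding a WD40 domain in a newly identified protein therefore suggests a possible role in signal transduction.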

  • Biochemical Approach: Determining the three-dimensional structure of the protein and potential binding sites for other molecules.

  • Functional Studies: Blocking or disabling the gene in an organism to see how the phenotype is affected (e.g., using the CRISPR-Cas9 system).
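
Gene annotation software scans for signals such as translational start and stop codons. As a minimal sketch of just that step, assuming a single DNA strand, the standard genetic code, and a hypothetical minimum length of 3 codons, the function below reports open reading frames (ORFs): an in-frame ATG start codon followed by the first in-frame stop codon (TAA, TAG, or TGA). Real annotation pipelines also scan the reverse strand, model RNA-splicing sites and promoters, and weigh evidence such as ESTs or RNA-seq data; an intron-containing eukaryotic gene would not show up here as one continuous ORF. The name `find_orfs` is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ORFs on this strand only.
    `end` is exclusive and includes the stop codon; min_codons counts
    codons from ATG up to, but not including, the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):            # the three reading frames of this strand
        start = None                  # index of an open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i             # translational start signal
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None          # look for the next ATG in this frame
    return sorted(orfs)


print(find_orfs("CCATGAAATTTGGGTAACC"))
```

For `CCATGAAATTTGGGTAACC` the only ORF reported spans positions 2 to 17, the sequence `ATGAAATTTGGGTAA`.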

Understanding Genes and Gene Expression at the Systems Level

  • ENCODE (Encyclopedia of DNA Elements): A long-term research project to learn everything possible about the functionally important elements in the human genome.

    • Identifies protein-coding genes and genes for noncoding RNAs, along with sequences that regulate gene expression, such as enhancers and promoters.

    • Characterizes DNA and histone modifications and chromatin structure.

    • Found that about 75% of the human genome is transcribed at some point, even though less than 2% codes for proteins.

  • Roadmap Epigenomics Project: Aims to characterize the epigenome (the epigenetic features of the genome) of hundreds of human cell types and tissues.

    • Focuses on the epigenomes of stem cells, normal tissues from mature adults, and relevant tissues from individuals with diseases such as cancer and neurodegenerative and autoimmune disorders.

Systems Biology

  • Proteomics: Systematic studies of sets of proteins and their properties, such as their abundance, chemical modifications, and interactions.

  • Systems Biology: Aims to model the dynamic behavior of whole biological systems based on the study of interactions among the system’s parts.

21.3 Genome Size, Number of Genes, and Gene Density

  • Genome size generally differs between prokaryotes and eukaryotes.

  • Most bacterial genomes have between 1 and 6 million base pairs (Mb).

  • Eukaryotic genomes tend to be larger: The genome of the single-celled yeast Saccharomyces cerevisiae has about 12 Mb, while most animals and plants, which are multicellular, have genomes of at least 100 Mb.

  • The human genome has about 3,000 Mb, roughly 500 to 3,000 times as many base pairs as a typical bacterial genome.

  • Among eukaryotes, genome size varies enormously and does not track organismal complexity: the genome of Paris japonica, the Japanese canopy plant, contains 149 billion base pairs (149,000 Mb), while that of another plant, Utricularia gibba, a bladderwort, contains only 82 Mb.

  • The genome of the single-celled amoeba Polychaos dubium has been estimated at 670 billion base pairs (670,000 Mb).

  • Free-living bacteria and archaea have from 1,500 to 7,500 genes.

  • In eukaryotes, the number of genes ranges from about 5,000 in unicellular fungi (yeasts) to at least 40,000 in some multicellular eukaryotes.

  • Given the size of the human genome (3,000 Mb) and the number of known human proteins, biologists expected somewhere between 50,000 and 100,000 genes to be identified in the completed sequence.

    • However, the estimate was revised downward several times.

  • Humans

    • The estimated number of human genes has steadily declined; as of 2019 it stood at about 20,000, roughly the same as in the nematode Caenorhabditis elegans, a worm only about 1 mm long.

    • Gene density in humans is low, averaging only about 7 genes per million base pairs (about 20,000 genes in 3,000 Mb).

    • Only 1.5% of the human genome codes for proteins or functional RNAs; the rest is noncoding DNA.
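
Gene density is simply gene number divided by genome size. A minimal sketch using the rounded values quoted in these notes (about 5,000 genes in 12 Mb for yeast, about 20,000 genes in 3,000 Mb for humans, 1.5% coding); these are approximate figures, not exact counts, and the function name is illustrative.

```python
def genes_per_mb(num_genes: float, genome_mb: float) -> float:
    """Average gene density in genes per million base pairs (Mb)."""
    return num_genes / genome_mb


# Approximate (gene count, genome size in Mb) values quoted in these notes
genomes = {
    "Saccharomyces cerevisiae": (5_000, 12),
    "Homo sapiens": (20_000, 3_000),
}

for name, (genes, size_mb) in genomes.items():
    print(f"{name}: {genes_per_mb(genes, size_mb):.1f} genes per Mb")

# Average stretch of human DNA per gene: 3,000 Mb = 3,000,000 kb
human_kb_per_gene = 3_000 * 1_000 / 20_000

# Portion of the human genome coding for proteins or functional RNAs (1.5%), in Mb
human_coding_mb = 3_000 * 1.5 / 100
print(f"Human: one gene per ~{human_kb_per_gene:.0f} kb; ~{human_coding_mb:.0f} Mb coding")
```

This gives roughly 417 genes per Mb for yeast versus about 7 for humans: on average there is one human gene per roughly 150 kb, and only about 45 Mb of the 3,000 Mb codes for proteins or functional RNAs.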

21.4 Noncoding DNA and Multigene Families

  • Eukaryotic genomes have a lot of noncoding DNA.

    • Much of this consists of sequences related to genes, such as introns (noncoding regions within genes) and regulatory sequences.

    • The bulk of noncoding DNA consists of repetitive DNA sequences.

    • Other elements include pseudogenes, former genes that have accumulated mutations and are now nonfunctional segments of DNA.

  • Repetitive DNA:

    • Consists of nucleotide sequences that are present in multiple copies in the genome.

    • About three-fourths of repetitive DNA is made up of transposable elements and sequences related to them.

    • The bulk of many eukaryotic genomes consists of DNA sequences that neither code for proteins nor are transcribed to produce RNAs with known functions; this noncoding DNA was often described in the past as “junk DNA.”

  • Transposable Elements (Transposons):

    • Stretches of DNA that can move from one location to another within the genome.

    • Sometimes called “jumping genes,” although they never completely detach from the cell’s DNA.

    • First evidence for transposable elements came from Barbara McClintock’s work in the 1940s and 1950s, where she studied the inheritance of maize kernel color.

  • Movement of Transposons:

    • Occurs via a type of recombination between the transposable element DNA and a particular target site in the genome.

    • Enzymes that facilitate this movement are encoded by the transposable element itself.

    • Two types of eukaryotic transposable elements: transposons and retrotransposons (differ in mechanism of movement).

  • Transposons:

    • Move within a genome by means of a DNA intermediate.

    • Can move by a “cut and paste” mechanism, which removes the element from the original site, or by a “copy and paste” mechanism, which leaves a copy behind.

    • Both mechanisms require an enzyme called transposase, which is encoded by the transposon.

  • Retrotransposons:

    • Move by means of an RNA intermediate that is a transcript of the retrotransposon DNA.

    • RNA intermediate is converted back to DNA by reverse transcriptase, an enzyme also encoded by the retrotransposon.

    • The DNA copy is then inserted into a new location in the genome.

    • Retrotransposons leave a copy of the element at the original site.

    • Make up a substantial portion of many eukaryotic genomes (e.g., more than 90% of the transposable elements in the human genome are retrotransposons, with LINEs and SINEs being particularly abundant).

  • LINEs (Long Interspersed Nuclear Elements):

    • The main LINE type in humans, LINE-1 (L1), is about 6,500 base pairs long.

    • L1 sequences are present in many copies and make up about 17% of the human genome, an even larger share than Alu elements (10%).

    • Have a low rate of transposition; may help regulate gene expression.

  • SINES (Short Interspersed Nuclear Elements):

    • About 300 base pairs long.

    • Alu elements: make up 10% of the human genome; many are transcribed into RNA molecules with unknown function.

  • Noncoding DNA between Genes:

    • Unique noncoding DNA makes up about 15% of the human genome; most is in regions between genes.

    • Functions of intergenic DNA: control of gene expression, chromosome changes, etc.

  • Gene-Related Sequences:

    • Introns: Noncoding regions within genes.

    • Regulatory Sequences: Control gene expression.

  • Multigene Families:

    • Collections of two or more identical or very similar genes.

    • Presumably arose from a single ancestral gene through repeated gene duplication.

    • Some multigene families consist of identical DNA sequences clustered tandemly, such as the genes that code for rRNA.

  • Genes for rRNA:

    • A cell needs to have a large number of these genes to produce sufficient rRNA for protein synthesis.

    • In humans, several hundred of these genes are clustered tandemly on several different chromosomes.

      • Example: the family of identical DNA sequences, each of which includes the genes for the three largest rRNA molecules.

  • Multigene Families of Nonidentical Genes:

    • Example: the globin genes, which encode the globins, a family of proteins that includes the α and β polypeptide subunits of hemoglobin (α-globin and β-globin).

    • The different versions of each globin subunit are expressed at different times in development.

    • The human α-globin genes are located together in a cluster on chromosome 16, and the β-globin genes are located together in a cluster on chromosome 11.

      • Example: In humans, the embryonic and fetal forms of hemoglobin have a higher affinity for oxygen than the adult forms, allowing efficient transfer of oxygen from mother to fetus.
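
The three ways of moving described for transposons and retrotransposons can be contrasted with a toy string model. This is a minimal sketch under simplifying assumptions: the genome is one string (host DNA in lowercase, the element in uppercase), positions are chosen by hand rather than by transposase or any target-site preference, target-site duplications are ignored, and transcription and reverse transcription are reduced to swapping T and U, because the RNA intermediate carries the same sequence as the element's coding strand. All function names are hypothetical.

```python
def transcribe(dna: str) -> str:
    """Sequence of the RNA copy of a DNA coding strand (T -> U)."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """DNA copy made from an RNA template, written as its coding strand (U -> T)."""
    return rna.replace("U", "T")


def cut_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """DNA transposon: excise genome[start:end], then insert it at `target`
    (an index into the genome after excision)."""
    element = genome[start:end]
    remaining = genome[:start] + genome[end:]
    return remaining[:target] + element + remaining[target:]


def copy_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """DNA transposon: insert a copy at `target`; the original element stays put."""
    element = genome[start:end]
    return genome[:target] + element + genome[target:]


def retrotranspose(genome: str, start: int, end: int, target: int) -> str:
    """Retrotransposon: DNA -> RNA intermediate -> DNA copy inserted at `target`."""
    rna = transcribe(genome[start:end])
    dna_copy = reverse_transcribe(rna)  # stands in for reverse transcriptase
    return genome[:target] + dna_copy + genome[target:]


genome = "aaaaTTAGccccgggg"  # element TTAG occupies positions 4-7
print(cut_and_paste(genome, 4, 8, 8))    # element moves; length unchanged
print(copy_and_paste(genome, 4, 8, 12))  # two copies of the element
print(retrotranspose(genome, 4, 8, 12))  # two copies, made via an RNA intermediate
```

Only the cut-and-paste route leaves the number of copies unchanged; both copy-and-paste routes add a copy, which is how retrotransposons such as L1 and Alu elements have come to occupy large fractions of the human genome.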

21.5 Genome Evolution

  • Mutation is the basis of change at the genomic level and underlies much of genome evolution.

  • Genomes can evolve in several ways:

    • Duplication of entire sets of chromosomes (polyploidy).

    • Alteration of chromosome number.

    • Duplication of single genes.

    • Rearrangement of parts of genes.

    • Movement of genes to new locations in the genome.

  • Duplication of Entire Chromosome Sets:

    • Accidents in meiosis can lead to one or more extra sets of chromosomes – a condition known as polyploidy.

    • One way to assess the impact of gene duplication on evolution is to compare the genomes of organisms that have arisen independently since a gene duplication event.

    • As long as one copy of an essential gene is expressed, the divergence of another copy can lead to its encoded protein acting in a novel way, thereby changing the organism’s phenotype.

  • Alteration of Chromosome Structure:

    • Rearrangements of chromosome segments are more common than polyploidy and have played a major role in the evolution of mammals.

    • Inversion: Reverses a segment within a chromosome.

    • Translocation: Moves a segment from one chromosome to another.

  • Duplication and Divergence of Gene-Sized Regions:

    • The duplication rate varies among different regions of the genome, with low rates in genes that produce RNAs or proteins that interact with many other proteins.

    • Unequal crossing over between misaligned nonsister chromatids during prophase I of meiosis can leave one chromosome with a duplication of a particular region and the other with a deletion of that region.

  • Evolution of Genes with Related Functions:

    • Lysozyme: An enzyme that helps protect animals against bacterial infection.

    • α-lactalbumin: A nonenzymatic protein that plays a role in milk production in mammals.

    • The genes for both proteins are similar in sequence: an ancestral lysozyme gene was duplicated, and one copy diverged to become the α-lactalbumin gene.

  • Exon Duplication and Shuffling:

    • Errors in meiosis can result in exon duplication.

    • Exon shuffling can also occur: Errors in meiotic recombination occasionally result in the mixing and matching of exons, either within a gene or between two nonallelic genes.

    • Can lead to new proteins with novel functions.

  • Transposable Elements and Genome Evolution:

    • Movement of transposable elements is promoted by stress.

    • Multiple copies of similar transposable elements may facilitate recombination (crossing over) between different chromosomes.

    • Can disrupt cellular genes or control elements, leading to phenotypic effects.

    • Can carry genes or individual exons to new locations.
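
Unequal crossing over can be modeled by letting two nonsister chromatids exchange segments at mismatched positions. A minimal sketch, assuming chromatids written as lists of hypothetical segment labels and crossover points chosen by hand; here the chromatids misalign at two copies of a transposable element (`TE`), the kind of repeated sequence that can promote misaligned pairing. The function name `crossover` is illustrative.

```python
def crossover(chromatid_a: list[str], chromatid_b: list[str],
              cut_a: int, cut_b: int) -> tuple[list[str], list[str]]:
    """Swap everything after index cut_a on chromatid A with everything after
    cut_b on chromatid B. Equal crossing over uses matching cut points;
    unequal crossing over does not."""
    return (chromatid_a[:cut_a] + chromatid_b[cut_b:],
            chromatid_b[:cut_b] + chromatid_a[cut_a:])


# Two nonsister chromatids of homologous chromosomes (hypothetical labels)
chrom = ["x", "TE", "gene", "TE", "y"]

# Correct alignment: crossing over at matching points returns the same sequences
same_a, same_b = crossover(chrom, chrom, 2, 2)

# Misalignment: the first TE copy on A pairs with the second TE copy on B
deleted, duplicated = crossover(chrom, chrom, 2, 4)
print(deleted)     # gene lost
print(duplicated)  # gene present twice
```

The two products together still carry two copies of `gene`, one chromatid with none and one with two; the extra copy is then free to diverge by mutation, as in the lysozyme and α-lactalbumin example.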

21.6 Comparing Genome Sequences

  • Genome comparisons among organisms provide insights into evolution and other biological processes.

  • Comparing Genomes:

    • Comparing genome sequences is important for tracing evolutionary relationships among species, with interesting implications for the study of human evolution.

    • Analysis of human-chimpanzee differences is shedding light on the kinds of genetic changes that make us uniquely human.

  • Highly Conserved Genes:

    • Some genes are highly conserved across a wide range of organisms.

    • These genes often code for proteins involved in essential cellular processes.

  • Comparing Distantly Related Species:

    • Can provide information about ancient evolutionary events.

    • Bacteria, archaea, and eukaryotes share ancestral genes that date back to life’s early history.

  • Human and Chimpanzee Genomes:

    • 99% identical at the DNA sequence level.

    • Humans and chimpanzees differ in the expression of 19 regulatory genes.

  • FOXP2 Gene:

    • Evolved rapidly in the human lineage and is involved in vocalization.

    • Individuals with a mutated or disrupted version of this gene have difficulty speaking.

    • Several other genes are undergoing rapid evolution in humans, including those involved in defense against malaria and tuberculosis, metabolism of sugars and fats, and brain development.

  • Development:

    • The study of genes involved in development (the process by which a fertilized egg gives rise to an adult organism) has provided immense insight into the evolution of morphology.

  • Homeobox Genes:

    • Many developmental genes are highly conserved from species to species.

    • Homeobox genes, a large family of related genes that control development in animals and plants, each contain a 180-nucleotide sequence called the homeobox.

    • Hox genes in animals:

      • Specify the types of appendages and other structures that will form in each body segment.

      • Code for transcription factors that bind to DNA and regulate the expression of other genes.

      • A relatively small number of Hox genes can therefore control the expression of many other genes.

  • Colinearity:

    • Hox genes are arranged on chromosomes in the same order as the body regions whose development they control; this is known as colinearity.

    • Suggests that all Hox genes are derived from one or a few ancestral genes that duplicated and diverged over evolutionary time.
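
Genome comparisons such as the human-chimpanzee comparison in this section rest on aligning sequences and counting identical positions. The sketch below uses the Smith-Waterman local alignment algorithm, the exact dynamic-programming method that BLAST approximates with faster heuristics; it is not BLAST itself. The scoring values (match +2, mismatch -1, gap -2) and the function names are illustrative assumptions, and percent identity is taken here as identical columns divided by alignment length, one of several conventions in use.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Best local alignment of a and b with a linear gap penalty.
    Returns (score, aligned_a, aligned_b), using '-' for gaps."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,   # gap in b
                          H[i][j - 1] + gap)   # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of alignment length."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100 * same / len(aligned_a)


score, top, bottom = smith_waterman("ACGTACGT", "ACGAACGT")
print(score, top, bottom, percent_identity(top, bottom))
```

The single mismatch leaves 7 of 8 positions identical (87.5%). Whole-genome comparisons apply the same idea at vastly larger scale with more elaborate scoring, so a quoted identity figure depends on how both the alignment and the identity are defined.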