Evolution of Genes and Genomes

Introduction

  • This lecture focuses on how genes and genomes evolve, including methods for finding and working with genes in genomes.
  • The lecture builds on previous discussions about population evolution, mechanisms of evolution, and their implications, now focusing on the genome level.

Learning Objectives

  • Describe the difference between coding and non-coding regions in the genome.
  • Explain gene duplications and deletions and their implications.
  • Compare and contrast neutral evolution and natural selection at the gene level.
  • Concepts from previous lectures that are relevant include: mutation, genetic drift, selection, and speciation.

Genome Size Variation

  • There is significant variation in genome size across different species.
  • Genome size is measured in megabases (Mb); one megabase is one million base pairs (A, T, C, G nucleotides).
  • Genome sequences are typically stored as FASTA files: plain text files with header lines followed by sequences of As, Ts, Cs, and Gs.
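  • Genome size does not track how complex an organism appears: some protist genomes are larger than animal genomes.
  • Among well-known examples, bread wheat has one of the largest genomes (about 16,000 Mb, roughly five times the size of the human genome).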
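  • Because a FASTA file is plain text, genome size can be estimated by counting the sequence characters. The sketch below is a minimal illustration: the function name genome_size_mb and the toy two-scaffold "genome" are made up for this example, and a real genome would be read from a file on disk instead.

```python
import io


def genome_size_mb(fasta_handle):
    """Return the total sequence length in megabases (1 Mb = 1,000,000 bases)."""
    total_bases = 0
    for line in fasta_handle:
        line = line.strip()
        # Header lines start with '>' and name a chromosome or scaffold.
        if not line or line.startswith(">"):
            continue
        total_bases += len(line)
    return total_bases / 1_000_000


# Toy two-scaffold "genome" (made up for illustration only).
toy_fasta = io.StringIO(">scaffold_1\nATGCATGCAT\nGGCC\n>scaffold_2\nTTAACC\n")
toy_size_mb = genome_size_mb(toy_fasta)
print(f"Toy genome size: {toy_size_mb} Mb")
```

  • For a real genome, the same function could be called on an opened file, e.g. with open(path) as handle, where path points to a downloaded FASTA file.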

Genomic Information Databases: NCBI

  • NCBI (National Center for Biotechnology Information) is a major database where genomic information is stored.
  • The database includes: genome sequences and transcriptome sequences.
  • Researchers deposit genomic data (DNA sequences) from their projects into this database.
  • NCBI provides resources such as genome data viewers, which allow users to examine genes and their components in specific genomes.

Using Genome Data Viewers

  • Genome data viewers allow users to explore the genome of a species of interest and examine the genes.
  • Example: The dog genome (Canis lupus familiaris) in the NCBI genome viewer.
  • The viewer displays chromosomes or scaffolds (large contiguous stretches of assembled sequence) and the genes located within them.
  • Genes are identified by characteristics such as the presence of an open reading frame: a stretch of sequence that begins with a start codon and ends with a stop codon, marking the part of the gene that is translated into protein.
  • Exons are coding regions that encode amino acids, while introns are non-coding regions between exons.
  • The viewer also shows gene expression results, indicating which parts of the gene are actively transcribed.

Genome Sequencing and Annotation

  • Sequencing involves breaking open cells and their nuclei, extracting the DNA, and reading the order of the base pairs.
  • The next step is to identify which base pairs encode genes (coding regions) and which do not (non-coding regions).
  • This characterization process is called annotation.
  • NCBI runs an annotation pipeline that predicts genes and assigns putative functions to sequences.
  • How thoroughly a genome has been annotated varies among species.

Coding Regions

  • Coding regions are the parts of genes that are translated into protein products with specific functions.
  • These regions are identified by an open reading frame: a start codon followed, in the same reading frame, by a stop codon.
  • Coding regions include exons, which encode amino acids.
  • Example: An odorant receptor in dogs.
  • The start codon is usually ATG, which can be found directly in the sequence data.
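  • The idea of locating a coding region by its start and stop codons can be sketched in a few lines. This is a simplified illustration, not how NCBI's annotation pipeline works: it scans only the forward strand, ignores introns, and the function name find_orfs, the min_codons parameter, and the toy sequence are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=2):
    """Return (start, end, orf_sequence) for ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; each ORF includes its stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):  # three possible reading frames
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, sequence[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence (made up): one ORF, ATG AAA GGG TAA, starting at position 2.
toy_sequence = "CCATGAAAGGGTAACC"
toy_orfs = find_orfs(toy_sequence)
print(toy_orfs)
```

  • Real gene prediction also checks the reverse strand, splice sites, and expression evidence, so an ORF scan like this is only a first pass.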

Non-Coding Regions

  • The majority of the genome consists of non-coding regions.
  • These regions do not encode proteins but include:
    • Regulatory elements that influence gene transcription.
    • Introns, which are non-coding regions within genes that are spliced out of the RNA transcript before translation.
    • Non-coding RNAs, such as rRNAs and tRNAs.
    • Transposable elements, which are mobile pieces of DNA that can affect transcription of nearby genes.
    • Pseudogenes, which are sequences that resemble genes but are not translated into functional proteins.
  • NCBI provides information on the proportion of coding versus non-coding regions in different genomes.
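  • Example: For the dog genome, the figure given is 46.6% protein-coding; this most likely refers to the share of annotated genes rather than the share of bases, since protein-coding sequence makes up only a small percentage of a mammalian genome.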
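  • When exon coordinates are available from an annotation, the protein-coding share of the bases can be computed directly. The sketch below assumes 0-based, end-exclusive intervals and merges overlaps so that no base is counted twice; the function name coding_fraction and the toy coordinates are hypothetical and are not taken from any real genome.

```python
def coding_fraction(genome_length, coding_intervals):
    """Fraction of bases covered by coding intervals (0-based, end-exclusive)."""
    covered = 0
    current_end = 0
    for start, end in sorted(coding_intervals):
        # Skip the part of this interval already counted by an earlier one.
        start = max(start, current_end)
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


# Toy annotation (made up): three exons, two of which overlap, on a 10,000-base scaffold.
toy_exons = [(100, 250), (200, 300), (900, 1000)]
toy_fraction = coding_fraction(10_000, toy_exons)
print(f"Protein-coding share: {toy_fraction:.1%}")
```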

Exploring the NCBI Database

  • Procedure:
    • Go to NCBI and type in the species of interest (e.g., otter).
    • Select "genome" to find genomic data.
    • Choose the NCBI RefSeq genome, which includes annotated genes.
    • View annotated genes to find genes present in other animals.
    • Use the genome viewer to examine scaffolds and genes.

Variation in Protein Coding Proportions

  • There is significant variation among species in the proportion of protein-coding versus non-coding sequence.
  • Genome size may not always indicate complexity, as some organisms with smaller genomes can have more complex lifestyles.
  • Animals generally have more genes than single-celled organisms.
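  • However, gene number does not scale with genome size: the pufferfish genome is roughly one-eighth the size of the human genome yet contains a broadly similar number of genes.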

Gene Duplication and Loss

  • Mutations, gene duplications, and losses are major sources of genome variation.
  • Gene duplication occurs when a gene is copied in the genome, providing variation for evolutionary processes to act on.
  • Duplication and loss can be visualized with graphs showing the number of expanded versus contracted gene families across different groups of organisms.

Mechanisms of Gene Duplication

  • Unequal crossing over during recombination.
  • Replication slippage, where the polymerase loses its place on the template and copies a stretch of DNA twice.
  • Retrotransposition, where an mRNA is reverse transcribed into DNA and inserted back into the genome, creating an intron-less copy of the gene at a new location.

Fates of Duplicated Genes

  • Loss: the extra copy is deleted or decays into a non-functional pseudogene.
  • Dosage effect: two copies of the same gene produce more protein product, which can either:
    • reduce fitness, leading to deletion of the extra copy, or
    • be beneficial, leading to fixation of the extra copy.
  • Subfunctionalization: the two copies divide the ancestral gene's functions between them, each evolving a slightly different, complementary role, so both copies are retained.
  • Neofunctionalization: one copy evolves a new function, either related to the original or entirely new, while the other retains the original function; both copies are then fixed in the genome.

Gene Families

  • Gene families arise through repeated gene duplication.
  • Members of a family typically perform functions that are related to, but distinct from, one another.
  • Gene families are defined by homology: their members have similar sequences because they derive from a common ancestral gene.
  • Orthologs: Genes in different species that descend from a single gene in their common ancestor.
  • Paralogs: Genes that arose by duplication within a genome.

Orthologs vs. Paralogs

  • Orthologs: Genes in different species that originated from a single gene in a common ancestor. If humans and mice both have a gene for smelling coffee, each gene is the ortholog of the other, because their common ancestor's genome most likely carried this gene before speciation.
  • Paralogs: Genes that arise by duplication within a species. If the gene from the previous example was duplicated in mice after speciation, the mouse genome would carry two separate copies for smelling coffee, which then diverged slightly from each other. Gene duplication is very important for acquiring new biological functions. Taste receptors are a good example of this (covered in a later lecture).
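  • One common way to make this distinction concrete is to look at a gene tree in which each internal node is labeled as a speciation or a duplication event: two genes are orthologs if their most recent common ancestor node is a speciation, and paralogs if it is a duplication. The sketch below assumes a tiny hand-labeled tree stored as nested tuples; the gene names and the function names leaves and relationship are hypothetical.

```python
def leaves(node):
    """Return the set of gene names found below a node."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return leaves(left) | leaves(right)


def relationship(tree, gene_a, gene_b):
    """Classify two genes by the event at their most recent common ancestor."""
    if gene_a == gene_b or not {gene_a, gene_b} <= leaves(tree):
        raise ValueError("need two different genes that are both in the tree")
    event, left, right = tree
    for child in (left, right):
        child_leaves = leaves(child)
        if gene_a in child_leaves and gene_b in child_leaves:
            # Both genes sit below this child, so their common ancestor is deeper.
            return relationship(child, gene_a, gene_b)
    return "orthologs" if event == "speciation" else "paralogs"


# Toy gene tree (made up): human and mouse split, then the gene duplicated in mouse.
toy_tree = ("speciation", "human_OR", ("duplication", "mouse_OR_a", "mouse_OR_b"))
print(relationship(toy_tree, "human_OR", "mouse_OR_a"))
print(relationship(toy_tree, "mouse_OR_a", "mouse_OR_b"))
```

  • In practice the event labels come from reconciling a gene tree with the species tree; here they are simply written in by hand.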

Globin Gene Family

  • Illustration of the globin gene family across different species, including hemoglobin and myoglobin.

Chemoreceptor Gene Family

  • Another illustration uses the chemoreceptor gene family in bees to show how gene relationships and species relationships intertwine. For example, a bee gene for bitter taste was duplicated in some species after they split from the others.
  • The different colors signify the genes that different species have (e.g., honeybees, bumblebees, and stingless bees).
  • Some genes are duplicated in honeybees after speciation, but not in bumblebees or stingless bees.
  • This creates variation among genomes in gene number and copy number, which provides raw material for evolution and adaptation.

Evolutionary Mechanisms: Non-Coding vs. Coding Regions

  • Non-coding regions: mutations in sequence that is not translated into protein often have little effect on fitness, so these regions accumulate mutations more rapidly because selection does not remove them (functional non-coding elements such as regulatory sequences are an exception).
  • Coding regions: a deleterious mutation in an important gene can affect the phenotype and fitness of the individual, so selection changes the frequency of the new allele (usually removing it).
  • Synonymous mutations: the mutated codon still encodes the same amino acid.
  • Non-synonymous mutations: the mutated codon encodes a different amino acid or a stop signal. Example: AAG (lysine) to TAG creates a premature stop codon.
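  • The distinction can be checked mechanically by translating the codon before and after the mutation with the standard genetic code. The sketch below is illustrative only: the function name classify_mutation and its output labels are assumptions, and it uses the standard code, which does not apply to every genome (for example, mitochondrial genomes use variants).

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_mutation(codon_before, codon_after):
    """Label a codon change as synonymous or as a kind of non-synonymous change."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    if before == after:
        return "synonymous"
    if after == "*":  # '*' marks a stop codon
        return "non-synonymous (nonsense: premature stop)"
    if before == "*":
        return "non-synonymous (stop codon lost)"
    return "non-synonymous (missense: amino acid changed)"


print(classify_mutation("AAG", "TAG"))  # the example from the notes
print(classify_mutation("AAA", "AAG"))  # both codons encode lysine
```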

Genetic Drift and Natural Selection

  • Genetic drift is a primary driver of sequence variation, which is why orthologs differ in sequence between species.
  • Many variants in a gene sequence are effectively neutral: they are neither selected for nor against.
  • Possible trajectories of a gene sequence:
    • Purifying Selection: The protein sequence stays the same over time. Variants that change it are eliminated by selection, so only synonymous substitutions accumulate and little variation is detectable.
    • Drift: Different variants of the gene rise in frequency over time by random chance (typical when the gene is not particularly important for fitness).
    • Beneficial Mutation: Some mutations increase fitness and therefore spread to fixation within relatively few generations. An example would be a smell receptor that now allows bees to sense citrus and lime, not just cilantro.

Measuring Selection and Drift: DN/DS Ratios

  • DN/DS ratios are calculated by comparing orthologs (or other homologous genes) between different species.
  • D_N = number of non-synonymous differences per non-synonymous site (nucleotide changes that alter the encoded amino acid).
  • D_S = number of synonymous differences per synonymous site (nucleotide changes that leave the encoded amino acid unchanged).
  • DN/DS = 1 indicates neutral evolution (genetic drift).
  • DN/DS < 1 indicates negative selection (purifying selection).
  • DN/DS > 1 indicates positive selection.
  • Rates of non-synonymous and synonymous substitution differ between species.
  • Example: In Drosophila, roughly half of the amino acid differences between species are estimated to have evolved via positive selection.
  • Example: In humans, the estimate is less than 15%.
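  • A rough sketch of how D_N and D_S can be counted for two aligned coding sequences is shown below. It follows the general idea of the Nei-Gojobori counting approach but is deliberately simplified: it applies no correction for multiple substitutions at the same site, it refuses codons that differ at more than one position, and it assumes uppercase, gap-free, in-frame sequences with at least one synonymous site. The function names and the toy sequences are made up for illustration.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Count synonymous sites in a codon: for each position, the share of the
    three possible base changes that leave the amino acid unchanged."""
    unchanged = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                unchanged += CODON_TABLE[mutant] == CODON_TABLE[codon]
    return unchanged / 3


def dn_ds(seq_a, seq_b):
    """Return (D_N, D_S) for two aligned, in-frame coding sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        # Average the site counts of the two codons being compared.
        syn_sites += (synonymous_sites(codon_a) + synonymous_sites(codon_b)) / 2
        changes = sum(x != y for x, y in zip(codon_a, codon_b))
        if changes > 1:
            raise ValueError("this sketch only handles one change per codon")
        if changes == 1:
            if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    nonsyn_sites = len(seq_a) - syn_sites
    return nonsyn_diffs / nonsyn_sites, syn_diffs / syn_sites


def interpret(d_n, d_s):
    """Map a DN/DS ratio onto the three categories used in the notes."""
    if d_s == 0:
        return "undefined (no synonymous differences)"
    ratio = d_n / d_s
    if ratio < 1:
        return "purifying (negative) selection"
    if ratio > 1:
        return "positive selection"
    return "neutral evolution"


# Toy orthologs (made up): one synonymous change (AAA/AAG) and one
# non-synonymous change (CTT/CCT, leucine to proline).
toy_d_n, toy_d_s = dn_ds("ATGAAAGGGCTT", "ATGAAGGGGCCT")
print(toy_d_n, toy_d_s, interpret(toy_d_n, toy_d_s))
```

  • Real analyses use dedicated software with statistical tests, because a ratio computed from a handful of differences like this one is very noisy.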

Genome Size and Protein Coding Sequences

  • Humans: more than 98% of the genome does not code for protein.
  • Bacteria and viruses: genomes are mostly coding sequences.

Non-Coding Regions and Molecular Clocks

  • Non-coding regions are valuable for estimating when species diverged: neutral changes, which are not driven by selection or adaptation, accumulate at a roughly constant rate and can therefore be used as a clock.
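  • Under the simplest (strict) molecular clock, the time since two species diverged is the neutral sequence divergence divided by twice the substitution rate, because both lineages accumulate changes independently after the split. The sketch below assumes a constant rate; the function name divergence_time and the input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def divergence_time(substitutions_per_site, rate_per_site_per_year):
    """Estimate years since two lineages split under a strict molecular clock.

    Both lineages accumulate changes independently, hence the factor of 2.
    """
    return substitutions_per_site / (2 * rate_per_site_per_year)


# Hypothetical values: 2% neutral divergence and 1e-9 substitutions per site per year.
toy_years = divergence_time(0.02, 1e-9)
print(f"Estimated divergence: about {toy_years / 1e6:.0f} million years ago")
```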

Example: Gene Duplication and Speciation

  • In orchid bees, divergence in a smell-receptor gene has been linked to speciation.
  • Male orchid bees collect scents from their environment and use them to attract females.
  • The olfactory receptor gene OR41 differs between two closely related species (Euglossa viridissima and Euglossa dilemma).
  • These differences led to different smell-detection capabilities, which affect mating.
  • This shows how the duplication and divergence of genes can lead to changes in phenotypes, resulting in speciation.