Genomes and Their Evolution Notes

Sequences Related to Transposable Elements

Multiple copies of transposable elements are scattered throughout eukaryotic genomes.
A single unit is hundreds to thousands of base pairs long.
Dispersed copies are similar but not identical.
Some are transposable elements that can move using enzymes encoded by themselves or other transposable elements.
Others are related sequences that have lost the ability to move.
Transposable elements and related sequences make up 25-50% of most mammalian genomes and even higher percentages in amphibians and many plants.
Extra transposable elements can account for the large size of some plant genomes; for example, transposable elements make up 85% of the corn genome.
In humans and other primates, Alu elements make up a large portion of transposable element-related DNA.
- Alu elements are about 300 nucleotides long and do not code for any protein.
- Approximately 10% of the human genome consists of Alu elements.
- Many Alu elements are transcribed into RNA, which may help regulate gene expression.
LINE-1 (L1) retrotransposons also make up a large percentage of the human genome (17%).
- L1 sequences are about 6,500 base pairs long and typically have a very low rate of transposition.
- L1 retrotransposons may be more active in cells of the developing brain and contribute to neuronal cell diversity.
Although many transposable elements encode proteins, these proteins do not carry out normal cellular functions, so transposable elements are usually included in the "noncoding" DNA category along with other repetitive sequences.

Other Repetitive DNA, Including Simple Sequence DNA

Repetitive DNA that is not related to transposable elements probably arose from mistakes during DNA replication or recombination.
Such DNA accounts for about 14% of the human genome.
About a third of this (5-6% of the human genome) consists of duplications of long stretches of DNA, with each unit ranging from 10,000 to 300,000 base pairs.
These long segments seem to have been copied from one chromosomal location to another site on the same or a different chromosome and probably include some functional genes.
Stretches of DNA known as simple sequence DNA contain many copies of tandemly repeated short sequences.
- Example: …GTTACGTTACGTTACGTTACGTTACGTTAC…
- The repeated unit (GTTAC) consists of 5 nucleotides.
- Repeated units may contain as many as 500 nucleotides, but often contain fewer than 15 nucleotides.
When the unit contains 2-5 nucleotides, the series of repeats is called a short tandem repeat, or STR.
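The idea of a tandemly repeated unit can be sketched in a few lines of Python. This is only an illustration, not a genomics tool: the function names `longest_tandem_run` and `is_str_unit` are made up for this note, and the input is the GTTAC example from above.

```python
import re

def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    # (?:unit)+ matches one or more consecutive copies of the repeated unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

def is_str_unit(unit: str) -> bool:
    """Apply the 2-5 nucleotide definition of a short tandem repeat unit."""
    return 2 <= len(unit) <= 5

example = "GTTACGTTACGTTACGTTACGTTACGTTAC"  # the example sequence from the notes
print(longest_tandem_run(example, "GTTAC"))  # 6
print(is_str_unit("GTTAC"))                  # True
```

The repeat count returned here (6 for the example) is the quantity that varies from person to person in STR analysis.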

Short Tandem Repeat (STR) Analysis

STR analysis is used in preparing genetic profiles.
The number of copies of the repeated unit can vary from site to site within a given genome.
The repeat number varies from person to person.
Since humans are diploid, each person has two alleles per site, which can differ in repeat number.
This diversity produces the variation represented in the genetic profiles that result from STR analysis.
Altogether, simple sequence DNA makes up 3% of the human genome.
Much of a genome's simple sequence DNA is located at chromosomal telomeres and centromeres, suggesting that this DNA plays a structural role for chromosomes.
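The genotype comparison behind an STR-based genetic profile can be sketched as follows. This is a simplified illustration under stated assumptions: the site names (`site_A`, `site_B`) and repeat counts are hypothetical, and real profiling involves laboratory steps and statistics not modeled here.

```python
from typing import Dict, Tuple

Genotype = Tuple[int, int]      # repeat counts of the two alleles at one site
Profile = Dict[str, Genotype]   # site name -> genotype (names are hypothetical)

def normalize(genotype: Genotype) -> Tuple[int, ...]:
    """Order the two alleles so that (12, 9) and (9, 12) compare equal."""
    return tuple(sorted(genotype))

def profiles_match(a: Profile, b: Profile) -> bool:
    """True if both profiles cover the same sites with the same allele pairs."""
    if a.keys() != b.keys():
        return False
    return all(normalize(a[site]) == normalize(b[site]) for site in a)

# Hypothetical data: each diploid person carries two alleles per site.
sample_1 = {"site_A": (9, 12), "site_B": (15, 15)}
sample_2 = {"site_A": (12, 9), "site_B": (15, 15)}
sample_3 = {"site_A": (9, 11), "site_B": (15, 16)}

print(profiles_match(sample_1, sample_2))  # True
print(profiles_match(sample_1, sample_3))  # False
```

Allele order is ignored because the two alleles at a site are not distinguished from each other in this simplified model.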

Role of Simple Sequence DNA

Centromeric DNA is essential for the separation of chromatids in cell division.
Centromeric DNA, along with simple sequence DNA located elsewhere, may also help organize the chromatin within the interphase nucleus.
The simple sequence DNA located at telomeres prevents genes from being lost as the DNA shortens with each round of replication.
Telomeric DNA also binds proteins that protect the ends of a chromosome from degradation and from joining to other chromosomes.

Challenges in Genome Sequencing

Short repetitive sequences like those described here provide a challenge for whole-genome shotgun sequencing because the presence of many short repeats hinders accurate reassembly of fragment sequences by computers.
Regions of simple sequence DNA account for much of the uncertainty present in estimates of whole-genome sizes and are the reason some sequences are considered "permanent drafts."
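A toy example shows why repeats hinder reassembly: a fragment taken from inside a tandem repeat matches the genome at many positions, so its true location cannot be determined from sequence alone. The genome string, the reads, and the function name `placements` below are all invented for illustration, and real assemblers work from overlaps between reads rather than from a known genome.

```python
from typing import List

def placements(genome: str, read: str) -> List[int]:
    """Return every start index where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        # Search again one base further along, so overlapping matches count.
        start = genome.find(read, start + 1)
    return positions

# Hypothetical genome: unique flanks around 8 tandem copies of GTTAC.
genome = "ACCGTA" + "GTTAC" * 8 + "TTGCAG"
unique_read = "ACCGTAG"      # spans the left flank and the start of the repeat
repeat_read = "GTTACGTTAC"   # lies entirely inside the repeat

print(placements(genome, unique_read))  # [0]
print(placements(genome, repeat_read))  # [6, 11, 16, 21, 26, 31, 36]
```

The read from the unique region has a single placement, while the read from the repeat has seven equally good ones, which is the ambiguity that leaves repeat-rich regions unresolved.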

Genes and Multigene Families

DNA sequences that code for proteins or give rise to tRNA or rRNA compose a mere 1.5% of the human genome.
If we include introns and regulatory sequences associated with genes, the total amount of DNA that is gene-related (coding and noncoding) constitutes about 25% of the human genome.
Only about 6% (1.5% out of 25%) of the length of the average gene is represented in the final gene product.
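The 6% figure follows directly from the two percentages above. A minimal check, with the helper name `share` chosen only for this note:

```python
def share(part_percent: float, whole_percent: float) -> float:
    """Fraction of `whole_percent` that is accounted for by `part_percent`."""
    return part_percent / whole_percent

coding_percent = 1.5         # protein-coding, tRNA, and rRNA sequences (% of genome)
gene_related_percent = 25.0  # coding plus introns and regulatory sequences (% of genome)

print(f"{share(coding_percent, gene_related_percent):.0%}")  # 6%
```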
Like the genes of bacteria, many eukaryotic genes are present as unique sequences, with only one copy per haploid set of chromosomes.
However, such unique genes make up less than half of the total gene-related DNA in the human genome and the genomes of many other animals and plants.
The rest occur in multigene families, collections of two or more identical or very similar genes.

Multigene Families

In multigene families that consist of identical DNA sequences, those sequences are usually clustered tandemly and, with the notable exception of the genes for histone proteins, have RNAs as their final products.
- An example is the family of identical DNA sequences that each include the genes for the three largest rRNA molecules.
- These rRNA molecules are transcribed from a single transcription unit that is repeated tandemly hundreds to thousands of times in one or several clusters in the genome of a multicellular eukaryote.
- The many copies of this rRNA transcription unit help cells to quickly make the millions of ribosomes needed for active protein synthesis.
- The primary transcript is cleaved to yield three rRNA molecules, which combine with proteins and one other kind of rRNA (5S rRNA) to form ribosomal subunits.
The classic examples of multigene families of nonidentical genes are two related families of genes that encode globins, a group of proteins that include the α and β polypeptide subunits of hemoglobin.

Globin Gene Families

One family, located on chromosome 16 in humans, encodes various forms of α-globin; the other, on chromosome 11, encodes forms of β-globin.
The different forms of each globin subunit are expressed at different times in development, allowing hemoglobin to function effectively in the changing environment of the developing animal.
In humans, for example, the embryonic and fetal forms of hemoglobin have a higher affinity for oxygen than the adult forms, ensuring the efficient transfer of oxygen from mother to fetus.
Also found in the globin gene family clusters are several pseudogenes, nonfunctional nucleotide sequences that are very similar to the functional genes.

Duplication, Rearrangement, and Mutation of DNA Contribute to Genome Evolution

The basis of change at the genomic level is mutation, which underlies much of genome evolution.
It seems likely that the earliest forms of life had a minimal number of genes: those necessary for survival and reproduction.
If this were indeed the case, one aspect of evolution must have been an increase in the size of the genome, with the extra genetic material providing the raw material for gene diversification.

Duplication of Entire Chromosome Sets

An accident in meiosis, such as failure to separate homologs during meiosis I, can result in one or more extra sets of chromosomes, a condition known as polyploidy.
Although such accidents would most often be lethal, in rare cases they could facilitate the evolution of genes.
In a polyploid organism, one set of genes can provide essential functions for the organism.
The genes in the one or more extra sets can diverge by accumulating mutations; these variations may persist if the organism carrying them survives and reproduces.
In this way, genes with novel functions can evolve.
As long as one copy of an essential gene is expressed, the divergence of another copy can lead to its encoded protein acting in a novel way, thereby changing the organism's phenotype.
The outcome of this accumulation of mutations may eventually be the branching off of a new species.
While polyploidy is rare among animals, it is relatively common among plants, especially flowering plants.
Some botanists estimate that as many as 80% of the plant species that are alive today show evidence of polyploidy having occurred among their ancestral species.

Alterations of Chromosome Structure

With the recent explosion in genomic sequence information, we can now compare the chromosomal organizations of many different species in detail.
This information allows us to make inferences about the evolutionary processes that shape chromosomes and may drive speciation.
Sometime in the last 6 million years, when the ancestors of humans and chimpanzees diverged as species, the fusion of two ancestral chromosomes in the human line led to different haploid numbers for humans ($n = 23$) and chimpanzees ($n = 24$).
The banding patterns in stained chromosomes suggested that the ancestral versions of current chimpanzee chromosomes 12 and 13 fused end to end, forming chromosome 2 in an ancestor of the human lineage.
Sequencing and analysis of human chromosome 2 during the Human Genome Project provided very strong supporting evidence for the model we have just described.
Large blocks of genes on human chromosome 16 are found on four mouse chromosomes, indicating that the genes in each block stayed together in both the mouse and the human lineages during their divergent evolution from a common ancestor.
Comparison of chromosomes of humans and seven other mammalian species allowed researchers to reconstruct the evolutionary history of chromosomal rearrangements in these eight species.
- They found many duplications and inversions of large portions of chromosomes, the result of errors during meiotic recombination in which the DNA was broken and rejoined incorrectly.
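Inversions and duplications of large chromosome portions can be pictured by treating a chromosome as an ordered list of gene blocks. The sketch below is a simplified model: the block names are hypothetical, and the duplicated copy is placed right next to the original only for convenience (a copy can also land elsewhere on the same or a different chromosome).

```python
from typing import List, Tuple

Block = Tuple[str, str]  # (block name, orientation "+" or "-")

def invert(chrom: List[Block], start: int, end: int) -> List[Block]:
    """Reverse the order of blocks chrom[start:end] and flip each orientation."""
    flipped = [(name, "-" if strand == "+" else "+")
               for name, strand in reversed(chrom[start:end])]
    return chrom[:start] + flipped + chrom[end:]

def duplicate(chrom: List[Block], start: int, end: int) -> List[Block]:
    """Insert a second copy of chrom[start:end] immediately after the original."""
    return chrom[:end] + chrom[start:end] + chrom[end:]

chrom = [("A", "+"), ("B", "+"), ("C", "+"), ("D", "+")]
print(invert(chrom, 1, 3))     # [('A', '+'), ('C', '-'), ('B', '-'), ('D', '+')]
print(duplicate(chrom, 1, 3))  # A, B, C, B, C, D (all in "+" orientation)
```

Comparing block order and orientation between species in this way is one simple picture of how rearrangement histories are inferred.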

Rate of Chromosomal Rearrangements

The rate of these events seems to have begun accelerating about 100 million years ago, around 35 million years before large dinosaurs became extinct and the number of mammalian species began rapidly increasing.
The apparent coincidence is interesting because chromosomal rearrangements are thought to contribute to the generation of new species.
Although two individuals with different arrangements could still mate and produce offspring, the offspring would have two nonequivalent sets of chromosomes, making meiosis inefficient or even impossible.
Thus, chromosomal rearrangements would lead to two populations that could not successfully mate with each other, a step on the way to their becoming two separate species.
Analysis of the chromosomal breakage points associated with the rearrangements showed that specific sites were used over and over again.
A number of these recombination "hot spots" correspond to locations of chromosomal rearrangements within the human genome that are associated with congenital diseases.

Duplication and Divergence of Gene-Sized Regions of DNA

Errors during meiosis can also lead to the duplication of chromosomal regions that are smaller than the ones we've just discussed, including segments the length of individual genes.
Unequal crossing over during prophase I of meiosis, for instance, can result in one chromosome with a deletion and another with a duplication of a particular gene.
Transposable elements can provide homologous sites where nonsister chromatids can cross over, even when other chromatid regions are not correctly aligned.
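Unequal crossing over can be modeled with two identical chromatids, each written as a list of segments, that break at different positions and swap ends. In this sketch the segment labels (`left`, `TE`, `gene`, `right`) and the function name are illustrative assumptions; the misalignment pairs the second transposable element of one chromatid with the first transposable element of the other.

```python
from typing import List, Tuple

def unequal_crossover(chromatid_a: List[str], chromatid_b: List[str],
                      break_a: int, break_b: int) -> Tuple[List[str], List[str]]:
    """Exchange chromatid ends at two (possibly different) break positions."""
    recombinant_1 = chromatid_a[:break_a] + chromatid_b[break_b:]
    recombinant_2 = chromatid_b[:break_b] + chromatid_a[break_a:]
    return recombinant_1, recombinant_2

# Two nonsister chromatids with a gene flanked by copies of a transposable element.
chromatid_a = ["left", "TE", "gene", "TE", "right"]
chromatid_b = ["left", "TE", "gene", "TE", "right"]

# Break chromatid_a after its second TE (index 4) and chromatid_b after its first TE (index 2).
dup, deleted = unequal_crossover(chromatid_a, chromatid_b, break_a=4, break_b=2)
print(dup)      # ['left', 'TE', 'gene', 'TE', 'gene', 'TE', 'right']
print(deleted)  # ['left', 'TE', 'right']
```

One product carries two copies of the gene and the other carries none, matching the duplication and deletion described above. With equal break positions the two products would be unchanged in gene content.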
Also, slippage can occur during DNA replication, such that the template shifts with respect to the new complementary strand, and a part of the template strand is either skipped by the replication machinery or used twice as a template.
As a result, a segment of DNA is deleted or duplicated.
It is easy to imagine how such errors could occur in regions of repeats.
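Template slippage in a repeat region can be sketched by copying a list of repeat units and letting the copying step either skip one unit or use it twice. This is a deliberately simplified model: the new strand is written as a copy of the template rather than as its complement, and the function and parameter names are hypothetical.

```python
from typing import List, Optional

def replicate_with_slippage(units: List[str],
                            slip_at: Optional[int] = None,
                            mode: Optional[str] = None) -> List[str]:
    """Copy `units`; at index `slip_at`, "skip" omits the unit and "reuse" copies it twice."""
    copy = []
    for i, unit in enumerate(units):
        if i == slip_at and mode == "skip":
            continue              # this part of the template is skipped
        copy.append(unit)
        if i == slip_at and mode == "reuse":
            copy.append(unit)     # this part of the template is used twice
    return copy

template = ["GTTAC"] * 6
print(len(replicate_with_slippage(template)))                             # 6
print(len(replicate_with_slippage(template, slip_at=2, mode="skip")))     # 5
print(len(replicate_with_slippage(template, slip_at=2, mode="reuse")))    # 7
```

Because every unit in a repeat region looks the same, a shift of one unit still leaves the strands paired, which is why such errors are easy to imagine there.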

Multigene Families and Globin Gene Evolution

The variable number of repeated units of simple sequence DNA at a given site, used for STR analysis, is probably due to errors such as unequal crossing over and template slippage during DNA replication.
Evidence that unequal crossing over and template slippage during DNA replication lead to duplication of genes is found in the existence of multigene families, such as the globin family.
A comparison of gene sequences within a multigene family can suggest the order in which the genes arose.
Re-creating the evolutionary history of the globin genes using this approach indicates that they all evolved from one common ancestral globin gene that underwent duplication and divergence into the α-globin and β-globin ancestral genes about 450-500 million years ago.
Each of these genes was later duplicated several times, and the copies then diverged from each other in sequence, yielding the current family members.
In fact, the common ancestral globin gene also gave rise to the oxygen-binding muscle protein myoglobin and to the plant protein leghemoglobin.
After the duplication events, the differences between the genes in the globin families undoubtedly arose from mutations that accumulated in the gene copies over many generations.
The current model is that the necessary function provided by an α-globin protein, for example, was fulfilled by one gene, while other copies of the α-globin gene accumulated random mutations.
Many mutations may have had an adverse effect on the organism, and others may have had no effect.
However, a few mutations must have altered the function of the protein product in a way that benefitted the organism at a particular life stage without substantially changing the protein's oxygen-carrying function.
Presumably, natural selection acted on these altered genes, maintaining them in the population.
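Comparing sequences within a multigene family can be illustrated by counting the positions at which aligned copies differ. The sketch below uses three short made-up sequences (not real globin sequences) and assumes they are already aligned and equal in length. Reading fewer differences as a more recent duplication further assumes that mutations accumulate at a roughly similar rate in each copy.

```python
from itertools import combinations
from typing import Dict, Tuple

def differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

def pairwise_differences(family: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return the difference count for every pair of family members."""
    return {(x, y): differences(family[x], family[y])
            for x, y in combinations(sorted(family), 2)}

# Hypothetical toy family of three gene copies.
toy_family = {
    "copy_1": "ATGGTGCACCTGACT",
    "copy_2": "ATGGTGCACCTGACA",
    "copy_3": "ATGCTGTACGTGTCT",
}

for pair, count in pairwise_differences(toy_family).items():
    print(pair, count)
# ('copy_1', 'copy_2') 1
# ('copy_1', 'copy_3') 4
# ('copy_2', 'copy_3') 5
```

In this toy data, copy_1 and copy_2 differ least, which would suggest that they arose from the most recent duplication, with copy_3 having split off earlier.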

Evolution of Genes with Novel Functions

In the evolution of the globin gene families, gene duplication and subsequent divergence produced family members whose protein products performed functions similar to each other (oxygen transport).
However, an alternative scenario is that one copy of a duplicated gene can undergo alterations that lead to a completely new function for the protein product.
The genes for lysozyme and α-lactalbumin are a good example.
Lysozyme is an enzyme that helps protect animals against bacterial infection by hydrolyzing bacterial cell walls; α-lactalbumin is a nonenzymatic protein that plays a role in milk production in mammals.
The two proteins are quite similar in their amino acid sequences and three-dimensional structures.
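Both genes are found in mammals, but only the lysozyme gene is present in birds.
This finding suggests that the lysozyme gene was duplicated in the mammalian lineage after it separated from the lineage leading to birds, and that one copy later evolved into the gene encoding α-lactalbumin.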
The presence of introns may have promoted the evolution of new proteins by facilitating the duplication or shuffling of exons, as we'll discuss next.

Rearrangements of Parts of Genes: Exon Duplication and Exon Shuffling

Recall from Concept 17.3 that an exon often codes for a protein domain, a distinct structural and functional region of a protein molecule.
Unequal crossing over during meiosis can lead to duplication of a gene on one chromosome and its loss from the homologous chromosome.
By a similar process, a particular exon within a gene could be duplicated on one chromosome and deleted from the other.
The gene with the duplicated exon would code for a protein containing a second copy of that domain.
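Exon duplication can be sketched by writing a gene as an ordered list of exons, each coding for one protein domain. The exon and domain names below are hypothetical, and the model ignores introns and splicing details.

```python
from typing import Dict, List

def duplicate_exon(exons: List[str], index: int) -> List[str]:
    """Return a new exon list with a second copy of exons[index] right after it."""
    return exons[:index + 1] + [exons[index]] + exons[index + 1:]

def delete_exon(exons: List[str], index: int) -> List[str]:
    """Return a new exon list with exons[index] removed (the reciprocal product)."""
    return exons[:index] + exons[index + 1:]

def domains_encoded(exons: List[str], exon_to_domain: Dict[str, str]) -> List[str]:
    """List the protein domains encoded by the exons, in order."""
    return [exon_to_domain[exon] for exon in exons]

# Hypothetical gene with three exons, each coding for one domain.
gene = ["exon_1", "exon_2", "exon_3"]
exon_to_domain = {"exon_1": "domain_X", "exon_2": "domain_Y", "exon_3": "domain_Z"}

with_duplication = duplicate_exon(gene, 1)
print(domains_encoded(with_duplication, exon_to_domain))
# ['domain_X', 'domain_Y', 'domain_Y', 'domain_Z']
print(domains_encoded(delete_exon(gene, 1), exon_to_domain))
# ['domain_X', 'domain_Z']
```

The gene with the duplicated exon encodes a protein with a second copy of domain_Y, while the reciprocal product lacks that domain.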