Evidence for large-scale evolution (macroevolution) comes from anatomy and embryology, molecular biology, biogeography, and fossils.
Similar anatomy found in different species may be homologous (shared due to ancestry) or analogous (shared due to similar selective pressures).
Molecular similarities provide evidence for the shared ancestry of life. DNA sequence comparisons can show how different species are related.
Biogeography, the study of the geographical distribution of organisms, provides information about how and when species may have evolved.
Fossils provide evidence of long-term evolutionary changes, documenting the past existence of species that are now extinct.
We can sometimes directly see small-scale evolution, or microevolution, taking place (for example, in the case of drug-resistant bacteria or pesticide-resistant insects). However, many of the most fascinating evolutionary events – such as the divergence, or splitting, of plant and animal lineages from a common ancestor – happened far in the past. Not only that, but they occurred over very long time periods, not on the days-to-weeks timescales of bacterial and viral evolution. This large-scale evolution is sometimes called macroevolution.
We can't directly observe evolutionary events that happened in the past. However, we often want to understand them. For instance, we may want to know whether two present-day species are closely related. Or we may have a group of species, and want to understand the evolutionary relationships among them. How can we answer these kinds of questions?
In this article, we'll look at several types of information biologists use to trace and reconstruct evolutionary histories of organisms across long timescales.
Anatomy and embryology. Anatomical features shared between organisms (including ones that are visible only during embryonic development) can indicate a shared evolutionary ancestry.
Molecular biology. Similarities and differences between the "same" gene in different organisms (that is, a pair of homologous genes) can help us determine how closely related the organisms are.
Biogeography. The geographical distribution of species can help us reconstruct their evolutionary histories.
Fossils. The fossil record is not a complete record of evolutionary history, but it confirms the existence of now-extinct species and sometimes captures potential "in-between" forms on the path to present-day species.
Let's take a closer look at these strategies for reconstructing evolutionary histories over long time periods.
Darwin thought of evolution as "descent with modification," a process in which species change and give rise to new species over many generations. He proposed that the evolutionary history of life forms a branching tree with many levels, in which all species can be traced back to an ancient common ancestor.
Image credit: "Darwin's tree of life, 1859," by Charles Darwin (public domain).
In this tree model, more closely related groups of species have more recent common ancestors, and each group will tend to share features that were present in its last common ancestor. We can use this idea to "work backwards" and figure out how organisms are related based on their shared features.
If two or more species share a unique physical feature, such as a complex bone structure or a body plan, they may all have inherited this feature from a common ancestor. Physical features shared due to evolutionary history (a common ancestor) are said to be homologous.
To give one classic example, the forelimbs of whales, humans, and birds look quite different on the outside. That's because they're adapted to function in different environments. However, if you look at the bone structure of the forelimbs, you'll find that the organization of the bones is remarkably similar across species. It's unlikely that such similar structures would have evolved independently in each species, and more likely that the basic layout of bones was already present in a common ancestor of whales, humans, and birds.
Image modified from "Homology vertebrates-en.svg" by Волков Владислав Петрович (CC BY-SA 4.0). The modified image is licensed under a CC BY-SA 4.0 license.
Some homologous structures can be seen only in embryos. For instance, did you know that you once had a tail and gill slits? All vertebrate embryos, from humans to chickens to fish, share these features during early development. Of course, the developmental patterns of these species become increasingly different later on (which is why your embryonic tail is now your tailbone, and your gill slits have turned into your jaw and inner ear)[^1]. However, the shared embryonic features are still homologous structures, and they reflect that the developmental patterns of vertebrates are variations on an ancestral program.
Image modified from "Rudimentary hindlegs spurs in Boa constrictor snake.jpg" by Stefan3345 (CC BY-SA 4.0). The modified image is licensed under a CC BY-SA 4.0 license.
Vestigial structures are reduced or non-functional versions of features, ones that serve little or no present purpose for an organism. The human tail, which is reduced to the tailbone during development, is one example. Vestigial structures are homologous to useful structures found in other organisms, and they can provide insights into an organism's ancestry. For instance, the tiny vestigial legs found in some snakes, like the boa constrictor at right, reflect that snakes had a four-legged ancestor[^2].
To make things a little more interesting and complicated, not all physical features that look alike are marks of common ancestry. Instead, some physical similarities are analogous: they evolved independently in different organisms because the organisms lived in similar environments or experienced similar selective pressures. This process is called convergent evolution. (To converge means to come together, like two lines meeting at a point.)
For example, two distantly related species that live in the Arctic, the arctic fox and the ptarmigan (a bird), both undergo seasonal changes of color from dark to snowy white. This shared feature doesn’t reflect common ancestry – i.e., it's unlikely that the last common ancestor of the fox and ptarmigan changed color with the seasons. Instead, this feature was favored separately in both species due to similar selective pressures. That is, the genetically determined ability to switch to light coloration in winter helped both foxes and ptarmigans survive and reproduce in a place with snowy winters and sharp-eyed predators.
Image credit: "Understanding evolution: Figure 7," by OpenStax College, Biology, CC BY 4.0.
How can we tell if features are homologous or analogous?
In general, biologists don't draw conclusions about how species are related on the basis of any single feature they think is homologous. Instead, they study a large collection of features (often, both physical features and DNA sequences) and draw conclusions about relatedness based on these features as a group. We will explore this idea further when we examine phylogenetic trees.
Like structural homologies, similarities between biological molecules can reflect shared evolutionary ancestry. At the most basic level, all living organisms share:
The same genetic material (DNA)
The same, or highly similar, genetic codes
The same basic process of gene expression (transcription and translation)
These shared features suggest that all living things are descended from a common ancestor, and that this ancestor had DNA as its genetic material, used the genetic code, and expressed its genes by transcription and translation. Present-day organisms all share these features because they were "inherited" from the ancestor (and because any big changes in this basic machinery would have broken the basic functionality of cells).
Although they're great for establishing the common origins of life, features like having DNA or carrying out transcription and translation are not so useful for figuring out how related particular organisms are. If we want to determine which organisms in a group are most closely related, we need to use different types of molecular features, such as the nucleotide sequences of genes.
Biologists often compare the sequences of related genes found in different species (often called homologous or orthologous genes) to figure out how those species are evolutionarily related to one another.
The basic idea behind this approach is that two species have the "same" gene because they inherited it from a common ancestor. For instance, humans, cows, chickens, and chimpanzees all have a gene that encodes the hormone insulin, because this gene was already present in their last common ancestor.
In general, the more DNA differences in homologous genes between two species, the more distantly the species are related. For instance, human and chimpanzee insulin genes are much more similar (about 98% identical) than human and chicken insulin genes (about 64% identical), reflecting that humans and chimpanzees are more closely related than humans and chickens[^c].
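The percent-identity figures quoted above come from comparing aligned sequences position by position. A minimal sketch of that calculation follows, assuming the sequences have already been aligned to the same length. The function name and the short sequences are made up for illustration and are not real insulin gene data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Made-up 10-base fragments (assumption: illustrative only, not real genes).
reference = "ATGGCCCTGT"
close_relative = "ATGGCCCTGA"    # differs from the reference at 1 position
distant_relative = "ATGACTCTTA"  # differs from the reference at 4 positions

print(percent_identity(reference, close_relative))    # 90.0
print(percent_identity(reference, distant_relative))  # 60.0
```

Real comparisons first align the sequences to account for insertions and deletions, and they use whole genes (or many genes) rather than a handful of bases.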
The geographic distribution of organisms on Earth follows patterns that are best explained by evolution, in combination with the movement of tectonic plates over geological time. For example, broad groupings of organisms that had already evolved before the breakup of the supercontinent Pangaea (about 200 million years ago) tend to be distributed worldwide. In contrast, broad groupings that evolved after the breakup tend to appear uniquely in smaller regions of Earth. For instance, there are unique groups of plants and animals on northern and southern continents that can be traced to the split of Pangaea into two supercontinents (Laurasia in the north, Gondwana in the south).
Image credit: "Marsupial collage" by Aushulz, CC BY-SA 3.0.
The evolution of unique species on islands is another example of how evolution and geography intersect. For instance, most of the mammal species in Australia are marsupials (carry young in a pouch), while most mammal species elsewhere in the world are placental (nourish young through a placenta). Australia’s marsupial species are very diverse and fill a wide range of ecological roles. Because Australia was isolated by water for millions of years, these species were able to evolve without competition from (or exchange with) mammal species elsewhere in the world.
The marsupials of Australia, Darwin's finches in the Galápagos, and many species on the Hawaiian Islands are unique to their island settings, but have distant relationships to ancestral species on mainlands. This combination of features reflects the processes by which island species evolve. They often arise from mainland ancestors – for example, when a landmass breaks off or a few individuals are blown off course during a storm – and diverge (become increasingly different) as they adapt in isolation to the island environment.
Fossils are the preserved remains of previously living organisms or their traces, dating from the distant past. The fossil record is not, alas, complete or unbroken: most organisms never fossilize, and even the organisms that do fossilize are rarely found by humans. Still, the fossils we have been lucky enough to find offer unique insights into evolution over long timescales.
Image credit: "Rock strata, E ridge of Garish," by Chris Eilbeck, CC BY-SA 2.0.
To interpret fossils accurately, we need to know how old they are. Fossils are often contained in rocks that build up in layers called strata, and the strata provide a sort of timeline, with layers near the top being newer and layers near the bottom being older. Fossils found in different strata at the same site can be ordered by their positions, and "reference" strata with unique features can be used to compare the ages of fossils across locations. In addition, scientists can roughly date fossils using radiometric dating, a process that measures the radioactive decay of certain elements.
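As a rough illustration of the arithmetic behind radiometric dating: if a fraction f of the original radioactive isotope remains and the isotope's half-life is T, the elapsed time is T × log2(1/f). The sketch below assumes a closed sample with a known starting amount. The function name and the 1,000-year half-life are hypothetical and do not describe any particular isotope.

```python
import math


def age_from_fraction_remaining(fraction_remaining: float,
                                half_life_years: float) -> float:
    """Elapsed time implied by the fraction of the parent isotope still present."""
    if not 0.0 < fraction_remaining <= 1.0:
        raise ValueError("fraction_remaining must be in the interval (0, 1]")
    if half_life_years <= 0.0:
        raise ValueError("half_life_years must be positive")
    # Each half-life halves the remaining amount, so the number of
    # half-lives elapsed is log2(1 / fraction_remaining).
    return half_life_years * math.log2(1.0 / fraction_remaining)


# Hypothetical isotope with a 1,000-year half-life (assumption for illustration).
print(age_from_fraction_remaining(0.5, 1000.0))   # one half-life has passed
print(age_from_fraction_remaining(0.25, 1000.0))  # two half-lives have passed
```

In practice, the isotope is chosen so that its half-life suits the age range being measured, which is part of why the resulting dates are estimates with stated uncertainties.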
Fossils document the existence of now-extinct species, showing that different organisms have lived on Earth during different periods of the planet's history. They can also help scientists reconstruct the evolutionary histories of present-day species. For instance, some of the best-studied fossils are of the horse lineage. Using these fossils, scientists have been able to reconstruct a large, branching "family tree" for horses and their now-extinct relatives. Changes in the lineage leading to modern-day horses, such as the reduction of toed feet to hooves, may reflect adaptation to changes in the environment.
Image credit: "Equine evolution," by H. Zell, CC BY-SA 3.0.
Biologists use multiple types of evidence to trace evolutionary changes that occur over long time periods. For example:
Homologous physical features shared between species can provide evidence for common ancestry (but we have to be sure they are really homologous, and not the result of convergent evolution).
Similarities and differences among biological molecules (e.g., in the DNA sequence of genes) can be used to determine species' relatedness.
Biogeographical patterns provide clues about how species, both alive and extinct, are related to each other.
The fossil record, though incomplete, provides valuable information about what species existed at particular times in Earth’s history.
Species go extinct all the time. Scientists estimate that at least 99.9 percent of all species of plants and animals that have ever lived are now extinct. So the demise of dinosaurs like T. rex and Triceratops some 66 million years ago wouldn't be especially noteworthy, except for the fact that around 80 percent of all life alive at the time also died out, in what scientists call a mass extinction.
Mass extinctions, in which at least 70 percent of all species die out in a relatively short time, have happened a handful of times over the course of our planet's history. The largest mass extinction event occurred around 251 million years ago, when perhaps 95 percent of all species went extinct.
Ordovician-Silurian Extinction: Small marine organisms died out. (442 mya)
Devonian Extinction: Many tropical marine species went extinct. (365 mya)
Permian-Triassic Extinction: The largest mass extinction event in Earth's history affected a range of species, including many vertebrates. (251 mya)
Triassic-Jurassic Extinction: The extinction of other vertebrate species on land allowed dinosaurs to flourish. (201 mya)
Cretaceous-Paleogene Extinction: (65.5 mya)
Scientists refer to the major extinction that wiped out non-avian dinosaurs as the K-Pg event (formerly called the K-T extinction) because it happened at the end of the Cretaceous Period and the beginning of the Paleogene. Why not C-Pg? Geologists use "K" as a shorthand for Cretaceous based on the German word “kreide”, for which the Cretaceous Period with its chalky sediments is named.
The K-Pg event is so striking that it signals a major turning point in Earth's history, marking the end of the geologic period known as the Cretaceous and the beginning of the Paleogene period. We can see the impact of this event in the fossil record. Fossils that are abundant in earlier rock layers are simply not present in later rock layers, after the event. A wide range of animals and plants suddenly died out, from tiny marine organisms to large dinosaurs.
What happened to cause such widespread devastation 66 million years ago? Scientists agree that species go extinct primarily as a result of changes in their environment. The extinction of many species around the world at one time reflects large-scale changes in the global environment.
To explain what caused this mass extinction, scientists have focused on events that would have altered our planet's climate in dramatic, powerful ways. The leading theory is that a huge asteroid slammed into Earth 66 million years ago, blocking sunlight, changing the climate and setting off global wildfires. In recent years, researchers have also been investigating whether other forces, including massive volcanic eruptions and changes in sea level, may have contributed to the environmental changes.
Life on Earth has experienced repeated periods when a large portion of its species died off, followed by a recovery and the emergence of a newly shaped tree of life.
Dinosaur skeleton in the desert © Mark Garlick/Science Photo Library/CORBIS
Extinction events are periods in Earth’s history during which a sharp decrease in the diversity and abundance of living organisms occurs. This is measured by the easily observable life forms, and does not include the bacterial ones (which constitute a great portion, perhaps even the majority, of Earth’s biodiversity and biomass). During these periods, the rate of extinctions greatly exceeds the normal, slow pace that regularly occurs as new species emerge.
The people who study extinctions are geologists and paleontologists; they examine the history of our planet as recorded in sedimentary rocks. They use fossils as evidence, especially marine fossils, since those are the most abundant. Only since the 1970s have scientists agreed that numerous extinction events have occurred, and only since the early 1980s have they agreed on what the five major ones were.
Sarychev Peak eruption, Kuril Islands, Russia Image Science & Analysis Laboratory, Johnson Space Center/NASA
One fine day about 65.5 million years ago, while dinosaurs were grazing and hunting around the world, an object the size of Mount Everest came hurtling through space. Only a seven-minute window existed during which the object’s path could intersect with Earth’s orbit around the Sun.
Although the chances seem to have been slight, the object hit Earth. (It may have been a comet, made of dirty ice, or an asteroid, made of rock.) The object landed just off the coast of what is now the Yucatán Peninsula in Mexico, at an estimated velocity 150 times the speed of a jet airliner.
The impact made a hole the size of Belgium, throwing up debris that rose high into the atmosphere and circled around the Earth. The collision generated so much initial heat that continental forests burned, putting more particulates in the atmosphere. With the Sun’s rays blocked by smoke and debris, photosynthesis slowed or stopped, the temperature cooled, and the amount of rainfall decreased significantly for a few months at least. Plants and animals died. All the dinosaurs, except some avian dinosaurs, which were on their way toward evolving into birds, died. An estimated 75 percent of all species disappeared. Among the survivors were crocodiles, turtles, and small, rodent-like mammals, which were our ancestors.
Geologists call this extinction event the “K-T event” because it marked the end of one geologic period, the Cretaceous (spelled with a “K” in German), and the beginning of the next, the Tertiary.
Artwork of the Chicxulub crater off the Yucatán Peninsula, Mexico © Detlev van Ravenswaay/Photo Researchers, Inc.
The story of the K-T event is quite well understood after years of patient detective work. It began in the mid-1970s with a young geologist, Walter Alvarez, in the mountains of Italy, near the town of Gubbio. There he found a thin layer of clay a centimeter thick between the layers of Cretaceous and Tertiary limestone; the Cretaceous layer contained many fossilized single-celled marine organisms, while very few appeared in the Tertiary layer. In the stratum between, Alvarez’s associates found iridium, an element extremely rare in the Earth’s crust but more common in meteorites. This suggested an impact by an asteroid or comet around the date of the extinction. In 1980 the Alvarez team presented its hypothesis that an asteroid/comet had hit and had caused massive, rapid extinction by altering the air and water. Further research around the world showed that high levels of iridium existed in the rock record at other K-T boundary sites.
Within two years the evidence persuaded most geologists to accept this hypothesis. Others were unsure. If a massive asteroid/comet had hit, where was the crater? No known depression on land seemed large enough for such a massive object; hence, the crater must be under water. Large objects that hit water create huge tsunami waves, which leave telltale signs in the rock record, sometimes well inland from the coast. A worldwide search turned up evidence of such a large tsunami on the shores of Texas, across the Gulf of Mexico from the Yucatán Peninsula.
Much earlier, in 1950, geologists working for the Mexican national oil company, PEMEX, had mapped a 120-mile crater underwater, off the coast of the Yucatán Peninsula. To find this crater, they had charted tiny variations in the pull of gravity, which reflected variations in rock density. From these maps, geologists could tell where the dense and light rocks were located beneath the sea. But not until 1991 did the K-T researchers get together with the PEMEX geologists, who tended not to publish their information, and realize that the “crater of doom” had been found. They named it Chicxulub (a Mayan word pronounced cheek-shoe-lube), after the small coastal town nearby.
Paleontologists and geologists have identified four other major extinction events, all of which predate the K-T extinction. Named for the geologic times they correspond to, they are the End-Triassic, the End-Permian, the Late Devonian, and the Ordovician.
Of the five major extinctions, the End-Permian proved to be the most massive — the mother of all extinction events. An estimated 95 percent of marine species and 70 percent of land species were lost. This dying-off lasted for about 165,000 years and included both gradual and sudden environmental changes that greatly altered conditions on the Earth.
Very few creatures made it through the End-Permian extinction. Cockroaches did — and ginkgo trees and horseshoe crabs. So did our ancestors, small protomammals that had evolved from reptiles: they were furry and warm-blooded, but still laid eggs.
Once most geologists and paleontologists agreed that the cause of the K-T extinction was an asteroid/comet hitting Earth, many of them first hypothesized that objects from space had caused all the major extinctions. That proved false when studies of fossil layers from the times of earlier extinctions showed that life forms had disappeared gradually, not suddenly as they did in the layers of sediment dated 65.5 million years ago.
The discussion about what causes mass extinctions continues. Scientists do not yet fully understand the reasons for them. Some possible explanations are:
Sudden massive volcanic activity, as evidenced by vast areas of lava plains that date to coincide with extinction events. (Volcanoes emit carbon dioxide, which results in global warming; they also emit dust and aerosols that inhibit photosynthesis, causing food chains to collapse.)
Rapidly changing climate
Impact or multiple-impact events
Anoxic events (the middle or lower layers of ocean becoming deficient or lacking in oxygen)
Ever-changing position of oceans and continents (plate tectonics)
It seems likely that some combination of these possible causes may have taken place at certain times. One reputable paleontologist, Peter Ward, made the following hypothesis in 2006 to explain the four major extinctions other than the K-T event:
A “sudden” increase of carbon dioxide and methane in the atmosphere occurred, caused by vast volcanic lava beds. The warmer world disrupted ocean circulation patterns and the position of the currents that convey downward warm surface water with oxygen and upward the cold bottom water with less oxygen. Without the mixing of the ocean layers, the bottom water became anoxic, without oxygen. This allowed green sulfur bacteria, which live on sulfur not oxygen, to expand. They produced hydrogen sulfide, which bubbled up, killing much of life and destroying the ozone layer, which protected life against ultraviolet rays from the Sun.
Ward’s discussion, and the conclusions of some other scientists, suggests that humans must reduce the carbon dioxide that we are emitting, or we may set off a similar chain of events.
Many biologists agree that a sixth major extinction is currently underway. This one is unique because it is the result of humans degrading and destroying the habitats of other life forms. This extinction apparently began about 50,000 years ago when humans moved into Australia and the Americas, causing the disappearance of many species.
No one knows how many species currently exist on Earth. The best estimate is about 8.7 million, not counting microorganisms. To date, only a small fraction of these estimated species have been identified, but new ones are constantly discovered and named.
This steady stream of discoveries can give the impression that new species are appearing as fast as old ones are disappearing. That impression is misleading: a 2003 study by the World Conservation Union suggested that one in four known mammal species is threatened with extinction in the next several decades, while one in eight known bird species is at risk.
If the present trend continues, biologists fear that we could lose 50 percent of all known living species by the end of this century.
By Cynthia Stokes Brown
Alvarez, Walter. T. Rex and the Crater of Doom. Princeton, NJ: Princeton University Press, 1997.
Erwin, Douglas H. Extinction: How Life on Earth Nearly Ended 250 Million Years Ago. Princeton, NJ, and Oxford, UK: Princeton University Press, 2006.
Ward, Peter D. Under a Green Sky: Global Warming, the Mass Extinctions of the Past, and What They Can Tell Us About Our Future. New York: Smithsonian/HarperCollins, 2007.
A phylogenetic tree is a diagram that represents evolutionary relationships among organisms. Phylogenetic trees are hypotheses, not definitive facts.
The pattern of branching in a phylogenetic tree reflects how species or other groups evolved from a series of common ancestors.
In trees, two species are more related if they have a more recent common ancestor and less related if they have a less recent common ancestor.
Phylogenetic trees can be drawn in various equivalent styles. Rotating a tree about its branch points doesn't change the information it carries.
Humans as a group are big on organizing things. Not necessarily things like closets or rooms; I personally score low on the organization front for both of those things. Instead, people often like to group and order the things they see in the world around them. Starting with the Greek philosopher Aristotle, this desire to classify has extended to the many and diverse living things of Earth.
Most modern systems of classification are based on evolutionary relationships among organisms – that is, on the organisms’ phylogeny. Classification systems based on phylogeny organize species or other groups in ways that reflect our understanding of how they evolved from their common ancestors.
In this article, we'll take a look at phylogenetic trees, diagrams that represent evolutionary relationships among organisms. We'll see exactly what we can (and can't!) infer from a phylogenetic tree, as well as what it means for organisms to be more or less related in the context of these trees.
When we draw a phylogenetic tree, we are representing our best hypothesis about how a set of species (or other groups) evolved from a common ancestor[^1]. As we'll explore further in the article on building trees, this hypothesis is based on information we’ve collected about our set of species – things like their physical features and the DNA sequences of their genes.
In a phylogenetic tree, the species or groups of interest are found at the tips of lines referred to as the tree's branches. For example, the phylogenetic tree below represents relationships between five species, A, B, C, D, and E, which are positioned at the ends of the branches:
Image modified from Taxonomy and phylogeny: Figure 2 by Robert Bear et al., CC BY 4.0
The pattern in which the branches connect represents our understanding of how the species in the tree evolved from a series of common ancestors. Each branch point (also called an internal node) represents a divergence event, or splitting apart of a single group into two descendant groups.
At each branch point lies the most recent common ancestor of all the groups descended from that branch point. For instance, at the branch point giving rise to species A and B, we would find the most recent common ancestor of those two species. At the branch point right above the root of the tree, we would find the most recent common ancestor of all the species in the tree (A, B, C, D, E).
Image modified from Taxonomy and phylogeny: Figure 2 by Robert Bear et al., CC BY 4.0
Each horizontal line in our tree represents a series of ancestors, leading up to the species at its end. For instance, the line leading up to species E represents the species' ancestors since it diverged from the other species in the tree. Similarly, the root represents a series of ancestors leading up to the most recent common ancestor of all the species in the tree.
In a phylogenetic tree, the relatedness of two species has a very specific meaning. Two species are more related if they have a more recent common ancestor, and less related if they have a less recent common ancestor.
We can use a pretty straightforward method to find the most recent common ancestor of any pair or group of species. In this method, we start at the branch ends carrying the two species of interest and “walk backwards” in the tree until we find the point where the species’ lines converge.
For instance, suppose that we wanted to say whether A and B or B and C are more closely related. To do so, we would follow the lines of both pairs of species backward in the tree. Since A and B converge at a common ancestor first as we move backwards, and B only converges with C after its junction point with A, we can say that A and B are more related than B and C.
Image modified from Taxonomy and phylogeny: Figure 2 by Robert Bear et al., CC BY 4.0
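The "walk backwards" method can be sketched in a few lines of code. The tree below is a hypothetical five-species example stored as a child-to-parent map. Its topology and the internal node names (n1, n2, n3, root) are assumptions made for illustration and are not taken from the figure.

```python
# Hypothetical tree, stored as child -> parent. Topology: (((A, B), C), (D, E)).
PARENT = {
    "A": "n1", "B": "n1",        # A and B split at branch point n1
    "n1": "n2", "C": "n2",       # the A/B lineage and C split at n2
    "D": "n3", "E": "n3",        # D and E split at n3
    "n2": "root", "n3": "root",  # the two main lineages split at the root
}


def ancestors(node):
    """Branch points met when walking from a node back to the root, in order."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def mrca(x, y):
    """Most recent common ancestor: the first point where the two walks meet."""
    seen = set(ancestors(x))
    for node in ancestors(y):
        if node in seen:
            return node
    raise ValueError("the two nodes share no ancestor in this tree")


def more_closely_related(pair_1, pair_2):
    """Return the pair with the more recent common ancestor, or None when the
    two ancestors are identical or lie on different lineages (not comparable)."""
    m1, m2 = mrca(*pair_1), mrca(*pair_2)
    if m1 == m2:
        return None
    if m2 in ancestors(m1):
        return pair_1
    if m1 in ancestors(m2):
        return pair_2
    return None


print(mrca("A", "B"))                                  # n1
print(mrca("B", "C"))                                  # n2
print(more_closely_related(("A", "B"), ("B", "C")))    # ('A', 'B')
print(more_closely_related(("A", "B"), ("D", "E")))    # None
```

In this hypothetical tree, the pairs A/B and D/E have common ancestors on different lineages, so the sketch returns None for that comparison rather than guessing from how the tree happens to be drawn.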
Importantly, there are some species whose relatedness we can't compare using this method. For instance, we can't say whether A and B are more closely related than C and D. That’s because, by default, the horizontal axis of the tree doesn't represent time in a direct way. So, we can only compare the timing of branching events that occur on the same lineage (same direct line from the root of the tree), and not those that occur on different lineages.
You may see phylogenetic trees drawn in many different formats. Some are blocky, like the tree at left below. Others use diagonal lines, like the tree at right below. You may also see trees of either kind oriented vertically or flipped on their sides, as shown for the blocky tree.
Image modified from Taxonomy and phylogeny: Figure 2 by Robert Bear et al., CC BY 4.0
The three trees above represent identical relationships among species A, B, C, D, and E. You may want to take a moment to convince yourself that this is really the case – that is, that no branching patterns or recent-ness of common ancestors differ among the trees. The identical information in these different-looking trees reminds us that it's the branching pattern (and not the lengths of branches) that's meaningful in a typical tree.
Another critical point about these trees is that if you rotate the structures, using one of the branch points as a pivot, you don’t change the relationships. So just like the trees above, which show the same relationships even though they are formatted differently, all of the trees below show the same relationships among four species:
Image modified from Taxonomy and phylogeny: Figure 3 by Robert Bear et al., CC BY 4.0
If you don’t see right away how that is true (and I didn’t, on first read!), just concentrate on the relationships and the branch points rather than on the ordering of species (W, X, Y, and Z) across the tops of the diagrams. That ordering actually doesn’t give us useful information. Instead, it’s the branch structure of each diagram that tells us what we need to understand the tree.
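One way to convince yourself that rotation doesn't matter is to describe each tree in a form that ignores left-to-right order and then compare. The sketch below does this in Python with nested tuples. The particular four-species trees and the helper name `canonical` are assumptions made for illustration, not the trees in the figure.

```python
def canonical(tree):
    """Convert a nested-tuple tree into a form that ignores left/right order."""
    if isinstance(tree, str):          # a tip (species name)
        return tree
    return frozenset(canonical(child) for child in tree)

tree_1 = ((("W", "X"), "Y"), "Z")
tree_2 = ("Z", ("Y", ("X", "W")))      # tree_1 with every branch point rotated
tree_3 = ((("W", "Y"), "X"), "Z")      # a genuinely different branching pattern

print(canonical(tree_1) == canonical(tree_2))   # True
print(canonical(tree_1) == canonical(tree_3))   # False
```

Because a `frozenset` has no order, two trees compare equal exactly when they group the same species at every branch point, regardless of how the tips are arranged across the top.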
So far, all the trees we've looked at have had nice, clean branching patterns, with just two lineages (lines of descent) emerging from each branch point. However, you may see trees with a polytomy (poly, many; tomy, cuts), meaning a branch point that has three or more different species coming off of it [^2]. In general, a polytomy shows where we don't have enough information to determine branching order.
Image modified from Taxonomy and phylogeny: Figure 2 by Robert Bear et al., CC BY 4.0
If we later get more information about the species in a tree, we may be able to resolve a polytomy using the new information.
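If a tree is written as nested groups, a polytomy is simply a branch point with three or more children. The following Python sketch finds such branch points. The example trees and the function names `tips` and `find_polytomies` are hypothetical and only meant to illustrate the definition.

```python
def tips(tree):
    """List the species at the tips below a branch point."""
    if isinstance(tree, str):
        return [tree]
    return [t for child in tree for t in tips(child)]

def find_polytomies(tree):
    """Return the tips under every branch point that has three or more children."""
    if isinstance(tree, str):
        return []
    found = []
    if len(tree) >= 3:
        found.append(sorted(tips(tree)))
    for child in tree:
        found.extend(find_polytomies(child))
    return found

unresolved = (("A", "B", "C"), ("D", "E"))    # A, B, and C leave one branch point
resolved = ((("A", "B"), "C"), ("D", "E"))    # new data splits the polytomy in two

print(find_polytomies(unresolved))  # [['A', 'B', 'C']]
print(find_polytomies(resolved))    # []
```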
To generate a phylogenetic tree, scientists often compare and analyze many characteristics of the species or other groups involved. These characteristics can include external morphology (shape/appearance), internal anatomy, behaviors, biochemical pathways, DNA and protein sequences, and even the characteristics of fossils.
To build accurate, meaningful trees, biologists will often use many different characteristics (reducing the chances of any one imperfect piece of data leading to a wrong tree). Still, phylogenetic trees are hypotheses, not definitive answers, and they can only be as good as the data available when they're made. Trees are revised and updated over time as new data becomes available and can be added to the analysis. This is particularly true today, as DNA sequencing increases our ability to compare genes between species.
In the next article on building a tree, we’ll see concrete examples of how different types of data are used to organize species into phylogenetic trees.
Phylogenetic trees represent hypotheses about the evolutionary relationships among a group of organisms.
A phylogenetic tree may be built using morphological (body shape), biochemical, behavioral, or molecular features of species or other groups.
In building a tree, we organize species into nested groups based on shared derived traits (traits different from those of the group's ancestor).
The sequences of genes or proteins can be compared among species and used to build phylogenetic trees. Closely related species typically have few sequence differences, while less related species tend to have more.
We're all related, and I don't just mean us humans, though that's most definitely true! Instead, all living things on Earth can trace their descent back to a common ancestor. Any smaller group of species can also trace its ancestry back to a common ancestor, often a much more recent one.
Given that we can't go back in time and see how species evolved, how can we figure out how they are related to one another? In this article, we'll look at the basic methods and logic used to build phylogenetic trees, or trees that represent the evolutionary history and relationships of a group of organisms.
In a phylogenetic tree, the species of interest are shown at the tips of the tree's branches. The branches themselves connect up in a way that represents the evolutionary history of the species—that is, how we think they evolved from a common ancestor through a series of divergence (splitting-in-two) events. At each branch point lies the most recent common ancestor shared by all of the species descended from that branch point. The lines of the tree represent long series of ancestors that extend from one species to the next.
Image modified from Taxonomy and phylogeny: Figure 2, by Robert Bear et al., CC BY 4.0
For a more detailed explanation, check out the article on phylogenetic trees.
Even once you feel comfortable reading a phylogenetic tree, you may have the nagging question: How do you build one of these things? In this article, we'll take a closer look at how phylogenetic trees are constructed.
How do we build a phylogenetic tree? The underlying principle is Darwin’s idea of “descent with modification.” Basically, by looking at the pattern of modifications (novel traits) in present-day organisms, we can figure out—or at least, make hypotheses about—their path of descent from a common ancestor.
As an example, let's consider the phylogenetic tree below (which shows the evolutionary history of a made-up group of mouse-like species). We see three new traits arising at different points during the evolutionary history of the group: a fuzzy tail, big ears, and whiskers. Each new trait is shared by all of the species descended from the ancestor in which the trait arose (shown by the tick marks), but absent from the species that split off before the trait appeared.
When we are building phylogenetic trees, traits that arise during the evolution of a group and differ from the traits of the ancestor of the group are called derived traits. In our example, a fuzzy tail, big ears, and whiskers are derived traits, while a skinny tail, small ears, and lack of whiskers are ancestral traits. An important point is that a derived trait may appear through either loss or gain of a feature. For instance, if there were another change on the E lineage that resulted in loss of a tail, taillessness would be considered a derived trait.
Derived traits shared among the species or other groups in a dataset are key to helping us build trees. As shown above, shared derived traits tend to form nested patterns that provide information about when branching events occurred in the evolution of the species.
When we are building a phylogenetic tree from a dataset, our goal is to use shared derived traits in present-day species to infer the branching pattern of their evolutionary history. The trick, however, is that we can’t watch our species of interest evolving and see when new traits arose in each lineage.
Instead, we have to work backwards. That is, we have to look at our species of interest – such as A, B, C, D, and E – and figure out which traits are ancestral and which are derived. Then, we can use the shared derived traits to organize the species into nested groups like the ones shown above. A tree made in this way is a hypothesis about the evolutionary history of the species – typically, one with the simplest possible branching pattern that can explain their traits.
If we were biologists building a phylogenetic tree as part of our research, we would have to pick which set of organisms to arrange into a tree. We'd also have to choose which characteristics of those organisms to base our tree on (out of their many different physical, behavioral, and biochemical features).
If we're instead building a phylogenetic tree for a class (which is probably more likely for readers of this article), odds are that we'll be given a set of characteristics, often in the form of a table, that we need to convert into a tree. For example, this table shows presence (+) or absence (0) of various features:
Feature | Lamprey | Antelope | Bald eagle | Alligator | Sea bass |
Lungs | 0 | + | + | + | 0 |
Jaws | 0 | + | + | + | + |
Feathers | 0 | 0 | + | 0 | 0 |
Gizzard | 0 | 0 | + | + | 0 |
Fur | 0 | + | 0 | 0 | 0 |
Table modified from Taxonomy and phylogeny: Figure 4, by Robert Bear et al., CC BY 4.0
Next, we need to know which form of each characteristic is ancestral and which is derived. For example, is the presence of lungs an ancestral trait, or is it a derived trait? As a reminder, an ancestral trait is what we think was present in the common ancestor of the species of interest. A derived trait is a form that we think arose somewhere on a lineage descended from that ancestor.
Without the ability to look into the past (which would be handy but, alas, impossible), how do we know which traits are ancestral and which derived?
In the context of homework or a test, the question you are solving may tell you which traits are derived vs. ancestral.
If you are doing your own research, you may have knowledge that allows you to identify ancestral and derived traits (e.g., based on fossils).
You may be given information about an outgroup, a species that's more distantly related to the species of interest than they are to one another.
If we are given an outgroup, the outgroup can serve as a proxy for the ancestral species. That is, we may be able to assume that its traits represent the ancestral form of each characteristic.
For instance, in our example (data repeated below for convenience), the lamprey, a jawless fish that lacks a true skeleton, is our outgroup. As shown in the table, the lamprey lacks all of the listed features: it has no lungs, jaws, feathers, gizzard, or fur. Based on this information, we will assume that absence of these features is ancestral, and that presence of each feature is a derived trait.
Feature | Lamprey | Antelope | Bald eagle | Alligator | Sea bass |
Lungs | 0 | + | + | + | 0 |
Jaws | 0 | + | + | + | + |
Feathers | 0 | 0 | + | 0 | 0 |
Gizzard | 0 | 0 | + | + | 0 |
Fur | 0 | + | 0 | 0 | 0 |
Table modified from Taxonomy and phylogeny: Figure 4, by Robert Bear et al., CC BY 4.0
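The outgroup reasoning can be written out as a short Python sketch. It encodes the table with 1 for presence (+) and 0 for absence, treats whatever state the lamprey shows as ancestral, and lists the species carrying the other (derived) state. The dictionary layout and the function name `derived_states` are assumptions made for this illustration.

```python
# The feature table: 1 = present (+), 0 = absent.
TRAITS = {
    "lungs":    {"lamprey": 0, "antelope": 1, "bald eagle": 1, "alligator": 1, "sea bass": 0},
    "jaws":     {"lamprey": 0, "antelope": 1, "bald eagle": 1, "alligator": 1, "sea bass": 1},
    "feathers": {"lamprey": 0, "antelope": 0, "bald eagle": 1, "alligator": 0, "sea bass": 0},
    "gizzard":  {"lamprey": 0, "antelope": 0, "bald eagle": 1, "alligator": 1, "sea bass": 0},
    "fur":      {"lamprey": 0, "antelope": 1, "bald eagle": 0, "alligator": 0, "sea bass": 0},
}
OUTGROUP = "lamprey"

def derived_states(traits, outgroup):
    """Treat the outgroup's state as ancestral; list species showing the other state."""
    result = {}
    for feature, states in traits.items():
        ancestral = states[outgroup]
        result[feature] = sorted(sp for sp, s in states.items() if s != ancestral)
    return result

for feature, species in derived_states(TRAITS, OUTGROUP).items():
    print(feature, "->", species)
# lungs -> ['alligator', 'antelope', 'bald eagle']
# jaws -> ['alligator', 'antelope', 'bald eagle', 'sea bass']
# feathers -> ['bald eagle']
# gizzard -> ['alligator', 'bald eagle']
# fur -> ['antelope']
```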
Now, we can start building our tree by grouping organisms according to their shared derived features. A good place to start is by looking for the derived trait that is shared between the largest number of organisms. In this case, that's the presence of jaws: all the organisms except the outgroup species (lamprey) have jaws. So, we can start our tree by drawing the lamprey lineage branching off from the rest of the species, and we can place the appearance of jaws on the branch carrying the non-lamprey species.
Image based on Taxonomy and phylogeny: Figure 6, by Robert Bear et al., CC BY 4.0
Next, we can look for the derived trait shared by the next-largest group of organisms. This would be lungs, shared by the antelope, bald eagle, and alligator, but not by the sea bass. Based on this pattern, we can draw the lineage of the sea bass branching off, and we can place the appearance of lungs on the lineage leading to the antelope, bald eagle, and alligator.
Image based on Taxonomy and phylogeny: Figure 6, by Robert Bear et al., CC BY 4.0
Following the same pattern, we can now look for the derived trait shared by the next-largest number of organisms. That would be the gizzard, which is shared by the alligator and the bald eagle (and absent from the antelope). Based on this data, we can draw the antelope lineage branching off from the alligator and bald eagle lineage, and place the appearance of the gizzard on the latter.
Image based on Taxonomy and phylogeny: Figure 6, by Robert Bear et al., CC BY 4.0
What about our remaining traits of fur and feathers? These traits are derived, but they are not shared, since each is found only in a single species. Derived traits that aren't shared don't help us build a tree, but we can still place them on the tree in their most likely location. For feathers, this is on the lineage leading to the bald eagle (after divergence from the alligator). For fur, this is on the antelope lineage, after its divergence from the alligator and bald eagle.
Image based on Taxonomy and phylogeny: Figure 6, by Robert Bear et al., CC BY 4.0
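The grouping procedure we just walked through (largest shared derived group first, then the next largest, and so on) can be sketched in Python. This is a simplified illustration that only works when the shared derived traits nest neatly inside one another, as they do in this dataset. The names `branching_order` and `to_nested` are hypothetical.

```python
# Species showing the derived (present) state of each feature, from the table.
DERIVED = {
    "jaws":     {"antelope", "bald eagle", "alligator", "sea bass"},
    "lungs":    {"antelope", "bald eagle", "alligator"},
    "gizzard":  {"bald eagle", "alligator"},
    "feathers": {"bald eagle"},
    "fur":      {"antelope"},
}
ALL_SPECIES = {"lamprey", "antelope", "bald eagle", "alligator", "sea bass"}

def branching_order(all_species, derived):
    """Peel off lineages from the largest shared derived group to the smallest."""
    shared = sorted(
        ((feature, group) for feature, group in derived.items() if len(group) > 1),
        key=lambda item: len(item[1]),
        reverse=True,
    )
    remaining = set(all_species)
    steps = []
    for feature, group in shared:
        if not group <= remaining:
            raise ValueError(f"{feature} does not nest inside the previous group")
        steps.append((sorted(remaining - group), feature))  # who splits off, what arises
        remaining = set(group)
    return steps, sorted(remaining)

def to_nested(steps, innermost):
    """Turn the branching steps into a nested-tuple tree."""
    tree = tuple(innermost)
    for split_off, _feature in reversed(steps):
        tree = tuple(split_off) + (tree,)
    return tree

steps, innermost = branching_order(ALL_SPECIES, DERIVED)
print(steps)
# [(['lamprey'], 'jaws'), (['sea bass'], 'lungs'), (['antelope'], 'gizzard')]
print(to_nested(steps, innermost))
# ('lamprey', ('sea bass', ('antelope', ('alligator', 'bald eagle'))))
```

Feathers and fur are skipped when the branching order is worked out because each occurs in only one species, which matches the point that unshared derived traits don't help us build the tree.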
When we were building the tree above, we used an approach called parsimony. Parsimony essentially means that we are choosing the simplest explanation that can account for our observations. In the context of making a tree, it means that we choose the tree that requires the fewest independent genetic events (appearances or disappearances of traits) to take place.
For example, we could have also explained the pattern of traits we saw using the following tree:
Image based on Taxonomy and phylogeny: Figure 6, by Robert Bear et al., CC BY 4.0
This series of events also provides an evolutionary explanation for the traits we see in the five species. However, it is less parsimonious because it requires more independent changes in traits to take place. Because of where we've put the sea bass, we have to hypothesize that jaws independently arose two separate times (once in the sea bass lineage, and once in the lineage leading to antelopes, bald eagles, and alligators). This gives the tree a total of 6 tick marks, or trait change events, versus 5 in the more parsimonious tree above.
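Counting tick marks by hand can also be done by a small program. The Python sketch below finds, for each feature, the smallest number of changes a tree requires when the common ancestor is assumed to lack the feature (the same assumption we made from the outgroup). The layout of the alternative tree, with the sea bass splitting off first, is an assumption based on the description above rather than a copy of the figure, and the function names are hypothetical.

```python
import math

# The feature table: 1 = present (+), 0 = absent.
TRAITS = {
    "lungs":    {"lamprey": 0, "antelope": 1, "bald eagle": 1, "alligator": 1, "sea bass": 0},
    "jaws":     {"lamprey": 0, "antelope": 1, "bald eagle": 1, "alligator": 1, "sea bass": 1},
    "feathers": {"lamprey": 0, "antelope": 0, "bald eagle": 1, "alligator": 0, "sea bass": 0},
    "gizzard":  {"lamprey": 0, "antelope": 0, "bald eagle": 1, "alligator": 1, "sea bass": 0},
    "fur":      {"lamprey": 0, "antelope": 1, "bald eagle": 0, "alligator": 0, "sea bass": 0},
}

def state_costs(tree, states):
    """Fewest changes below `tree` for each possible state (0 or 1) at its top node."""
    if isinstance(tree, str):  # a tip: only its observed state is allowed
        return {s: (0 if states[tree] == s else math.inf) for s in (0, 1)}
    child_costs = [state_costs(child, states) for child in tree]
    return {
        # For each child, pick its cheapest state, paying 1 if it differs from s.
        s: sum(min(c[t] + (s != t) for t in (0, 1)) for c in child_costs)
        for s in (0, 1)
    }

def count_changes(tree, traits, root_state=0):
    """Total trait changes, assuming the root ancestor has `root_state` for every feature."""
    return sum(state_costs(tree, states)[root_state] for states in traits.values())

parsimonious = ("lamprey", ("sea bass", ("antelope", ("bald eagle", "alligator"))))
alternative = ("sea bass", ("lamprey", ("antelope", ("bald eagle", "alligator"))))  # assumed layout

print(count_changes(parsimonious, TRAITS))  # 5
print(count_changes(alternative, TRAITS))   # 6
```

The extra change in the alternative tree comes entirely from jaws, which need two events instead of one when the sea bass branches off before the lamprey.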
In this example, it may seem fairly obvious that there is one best tree, and counting up the tick marks may not seem very necessary. However, when researchers make phylogenies as part of their work, they often use a large number of characteristics, and the patterns of these characteristics rarely agree 100% with one another. Instead, there are some conflicts, where one tree would fit better with the pattern of one trait, while another tree would fit better with the pattern of another trait. In these cases, the researcher can use parsimony to choose the one tree (hypothesis) that fits the data best.
You may be wondering: Why don't the trees all agree with one another, regardless of what characteristics they're built on? After all, the evolution of a group of species did happen in one particular way in the past. The issue is that, when we build a tree, we are reconstructing that evolutionary history from incomplete and sometimes imperfect data. For instance:
We may not always be able to distinguish features that reflect shared ancestry (homologous features) from features that are similar but arose independently (analogous features arising by convergent evolution).
Traits can be gained and lost multiple times over the evolutionary history of a species. A species may have a derived trait, but then lose that trait (revert back to the ancestral form) over the course of evolution.
Biologists often use many different characteristics to build phylogenetic trees because of sources of error like these. Even when all of the characteristics are carefully chosen and analyzed, there is still the potential for some of them to lead to wrong conclusions (because we don't have complete information about events that happened in the past).
A tool that has revolutionized, and continues to revolutionize, phylogenetic analysis is DNA sequencing. With DNA sequencing, rather than using physical or behavioral features of organisms to build trees, we can instead compare the sequences of their orthologous (evolutionarily related) genes or proteins.
The basic principle of such a comparison is similar to what we went through above: there's an ancestral form of the DNA or protein sequence, and changes may have occurred in it over evolutionary time. However, a gene or protein doesn't just correspond to a single characteristic that exists in two states.
Instead, each nucleotide of a gene or amino acid of a protein can be viewed as a separate feature, one that can flip to multiple states (e.g., A, T, C, or G for a nucleotide) via mutation. So, a gene with 300 nucleotides in it could represent 300 different features, each existing in one of 4 states! The amount of information we get from sequence comparisons (and thus the resolution we can expect to get in a phylogenetic tree) is much higher than when we're using physical traits.
To analyze sequence data and identify the most probable phylogenetic tree, biologists typically use computer programs and statistical algorithms. In general, though, when we compare the sequences of a gene or protein between species:
A larger number of differences corresponds to less related species
A smaller number of differences corresponds to more related species
For example, suppose we compare the beta chain of hemoglobin (the oxygen-carrying protein in blood) between humans and a variety of other species. If we compare the human and gorilla versions of the protein, we'll find only 1 amino acid difference. If we instead compare the human and dog proteins, we'll find 15 differences. With human versus chicken, we're up to 45 amino acid differences, and with human versus lamprey (a jawless fish), we see 127 differences [^1]. These numbers reflect that, among the species considered, humans are most closely related to the gorilla and least closely related to the lamprey.
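The idea of counting sequence differences can be sketched in Python. The short sequences below are made-up toy data (not real hemoglobin genes) and are assumed to be already aligned. Real analyses rely on alignment software and statistical models rather than a raw count like this. The function name `count_differences` is hypothetical.

```python
def count_differences(seq_a, seq_b):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

# Made-up, already-aligned toy sequences.
toy = {
    "species_1": "ATGGTGCACCTG",
    "species_2": "ATGGTGCATCTG",
    "species_3": "ATGCTGAATTTG",
}
print(count_differences(toy["species_1"], toy["species_2"]))  # 1
print(count_differences(toy["species_1"], toy["species_3"]))  # 4

# Differences from the human hemoglobin beta chain, as quoted in the text.
differences_from_human = {"gorilla": 1, "dog": 15, "chicken": 45, "lamprey": 127}
ranked = sorted(differences_from_human, key=differences_from_human.get)
print(ranked)  # ['gorilla', 'dog', 'chicken', 'lamprey'] (most to least closely related)
```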
You can see Sal working through an example involving phylogenetic trees and sequence data in this AP biology free response question video.