Although it is now hard to imagine, it was once thought that DNA was too simple a molecule to store genetic information. With only four different nucleotides, it seemed that a molecule of much greater complexity must house the genetic information of a cell. It was argued that proteins, being composed of 20 different amino acids, were the better candidate for this important cellular function.
The early work of Fred Griffith in 1928 on the transfer of virulence (i.e., the ability to cause disease) in the pathogen Streptococcus pneumoniae, commonly called pneumococcus, set the stage for research showing that DNA was indeed the genetic material. Griffith found that if he boiled virulent bacteria and injected them into mice, the mice were not affected and no pneumococci could be recovered from the animals (figure 13.1). When he injected a combination of killed virulent bacteria and a living nonvirulent strain, the mice died; moreover, he could recover living virulent bacteria from the dead mice. Griffith called this change of nonvirulent bacteria into virulent pathogens transformation.
Oswald Avery and his colleagues then set out to discover which constituent in the heat-killed virulent pneumococci was responsible for transformation. These investigators used enzymes to selectively destroy DNA, RNA, or protein in purified extracts of virulent pneumococci (S strain). They then exposed nonvirulent pneumococcal strains (R strains) to the treated extracts. Transformation of the nonvirulent bacteria was blocked only if DNA was destroyed, suggesting that DNA was carrying the information required for transformation (figure 13.2). The publication of these studies by Avery, C. M. MacLeod, and M. J. McCarty in 1944 provided the first evidence that DNA carried genetic information.
Eight years later, Alfred Hershey and Martha Chase wanted to know if protein or DNA carried the genetic information of a bacterial virus called T2 bacteriophage. They performed experiments in which they made the virus’s DNA radioactive with 32P, or they labeled its protein coat with 35S. They mixed radioactive virions (either 32P-labeled or 35S-labeled) with Escherichia coli and incubated the mixture for a few minutes. This allowed the virions to attach to E. coli and initiate the infection process (see figure 6.15). The culture was then centrifuged. This separated the cells, which formed a pellet at the bottom of the tube, from any unadsorbed phage particles, which remained suspended in the liquid supernatant. The supernatant was discarded. The infected cells in the resulting pellet were resuspended and the suspension was agitated violently in a blender. The blender treatment sheared off bacteriophage particles adsorbed to E. coli cells (figure 13.3). Some of the suspension was used in a plaque assay (see figure 6.17) to determine if the blender treatment affected the ability of the phage to multiply within host cells. The remaining suspension was centrifuged to separate cells from the sheared-off phage particles. After centrifugation, radioactivity in the supernatant (where the phage particles remained) versus that in the bacterial cells in the pellet was determined.
The results of their experiments showed that blender treatment did not disrupt the infection process, because progeny phages were produced. They also demonstrated that when 35S-labeled T2 was used in the experiment, the majority of the radioactive protein was in the supernatant, whereas when 32P-labeled T2 was used, the radioactive DNA was in the bacterial cells that formed the pellet. Because DNA entered the cells and the protein did not, the phage DNA must have been carrying the genetic information needed to complete the infection process. Some luck was involved in their discovery, for the genetic material of many viruses is RNA and the researchers happened to select a DNA virus for their studies. Imagine the confusion if T2 had been an RNA virus! The controversy surrounding the nature of genetic information might have lasted considerably longer than it did.
DNA, RNA, and proteins are often called informational molecules. The information exists as the sequence of monomers from which they are built. Here we describe the monomers and how they are linked together to form these important macromolecules.
Deoxyribonucleic acid (DNA) is a polymer of deoxyribonucleotides (figure 13.4) linked together by phosphodiester bonds (figure 13.5a). It contains the bases adenine, guanine, cytosine, and thymine. DNA molecules are usually composed of two polynucleotide chains coiled together to form a double helix that is many times longer than it is wide. The monomers of DNA are called deoxyribonucleotides because the sugar found in them is deoxyribose (figure 13.4b). The bond that links the monomers together to form the polymer is called a phosphodiester bond because it consists of a phosphate that forms ester linkages between the 3′-hydroxyl of one sugar and the 5′-hydroxyl of an adjacent sugar. Purine and pyrimidine bases are attached to the 1′-carbon of the deoxyribose sugars, and the bases extend toward the middle of the cylinder formed by the two chains. (The numbers designating the carbons in the sugars are given a prime notation to distinguish them from the numbers designating the carbons and nitrogens in the nitrogenous bases.) The bases from each strand interact with those of the other strand, forming base pairs. The base pairs are stacked on top of each other in the center, like the rungs of a ladder. The purine adenine (A) of one strand is always paired with the pyrimidine thymine (T) of the opposite strand by two hydrogen bonds. The purine guanine (G) pairs with cytosine (C) by three hydrogen bonds. This AT and GC base pairing means that the two strands in a DNA double helix are complementary. In other words, the bases in one strand match up with those of the other according to specific base-pairing rules. Because the sequences of bases in these strands encode genetic information, considerable effort has been devoted to determining the base sequences of DNA and RNA from many organisms.
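The base-pairing rules just described can be illustrated with a short Python sketch. The function names and the six-base example sequence are hypothetical, chosen only for illustration. The sketch pairs each base with its partner and totals the hydrogen bonds (two per AT pair, three per GC pair).

```python
# Base-pairing rules from the text: A pairs with T, and G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by the pair each base forms: 2 for AT, 3 for GC.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complement(strand: str) -> str:
    """Return the partner base for each position, in the same positional order."""
    return "".join(PAIR[base] for base in strand.upper())


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding this strand to its complement."""
    return sum(H_BONDS[base] for base in strand.upper())


seq = "ATGCGT"  # hypothetical example sequence
print(complement(seq))      # TACGCA
print(hydrogen_bonds(seq))  # 15
```

Because the partner strand is returned position by position, it is written in the opposite chemical orientation to the input strand.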
The two polynucleotide strands of DNA fit together much like the pieces in a jigsaw puzzle. Inspection of figure 13.5b,c shows that the two strands are not positioned directly opposite one another. Therefore when the strands twist about one another, a wide major groove and narrower minor groove are formed by the backbone. There are 10.5 base pairs per turn of the helix. The helix is right-handed; that is, the chains turn counterclockwise as they approach a viewer looking down the longitudinal axis. The two backbones are antiparallel, which means they run in opposite directions with respect to the orientation of their sugars. Thus the 5′ end of one strand is paired with the 3′ end of the complementary strand; or stated another way, one strand is oriented 5′ to 3′ and its complement, 3′ to 5′ (figure 13.5b).
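Because the strands are antiparallel, writing the complementary strand in the conventional 5′ to 3′ direction requires reversing it as well as pairing each base. The following Python sketch shows this, and also estimates the number of helical turns from the figure of 10.5 base pairs per turn. The function names and example values are illustrative assumptions.

```python
# Translation table implementing the A-T and G-C pairing rules.
PAIR = str.maketrans("ATGC", "TACG")

BP_PER_TURN = 10.5  # B-form DNA, as stated in the text


def reverse_complement(strand_5to3: str) -> str:
    """Return the complementary strand, also written 5' to 3'."""
    return strand_5to3.upper().translate(PAIR)[::-1]


def helical_turns(n_base_pairs: int) -> float:
    """Approximate number of helical turns in a B-form duplex."""
    return n_base_pairs / BP_PER_TURN


print(reverse_complement("ATGCGT"))  # ACGCAT
print(helical_turns(21))             # 2.0
```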
The structure of DNA just described is that of the B form, the most common form in cells. The A form, an alternative DNA structure, primarily differs from the B form in that it is slightly wider, with 11 base pairs per helical turn, rather than 10.5. The A form DNA is found in bacterial endospores, and is partially responsible for protecting the DNA from UV damage. In addition, some virus genomes have A form DNA. Endospore resistance (section 3.10)
Supercoiling is another property of DNA. DNA is helical; that is, it is a coil. Whenever the rotation of a coil is restrained in some way, the coil coils on itself. The coiling of a coil is supercoiling and is readily observed by twisting a rubber band. Recall that most bacterial chromosomes are closed, circular double-stranded DNA molecules. In this state, the two strands are unable to rotate freely relative to each other, and the molecule is said to be constrained. The strain is relieved by supercoiling. There are two types of supercoiling: positive and negative. For DNA, these are defined by the change in number of base pairs per turn in the double helix. Supercoiling that arises when the helix is underwound, which increases the number of base pairs per turn, is said to be negative supercoiling. Likewise, supercoiling that arises when the helix is overwound, which decreases the number of base pairs per turn, is called positive supercoiling.
Supercoiling is another level of DNA structure that is critical for its function. Although bacterial chromosomes are negatively supercoiled overall, smaller regions can vary depending on how the chromosome is used at a given time. Importantly for this chapter, supercoiling “loosens” up the DNA, making it easier to separate the two strands. Strand separation is an important early step in both DNA replication and transcription, as we discuss in sections 13.3 and 13.5, respectively.
RNA Is a Polymer of Ribonucleotides
Ribonucleic acid (RNA) is a polymer of ribonucleotides containing the sugar ribose and the bases adenine, guanine, cytosine, and uracil. In RNA, uracil (U) replaces thymine. The nucleotides are joined by a phosphodiester bond, just as in DNA. In cells, RNA molecules are single stranded. However, an RNA strand can coil back on itself to form secondary structures such as hairpins with complementary base pairing and helical organization. The formation of double-stranded regions in RNA is often critical to its function. Attenuation and riboswitches can stop transcription prematurely (section 14.3); Translational riboswitches (section 14.4)
Proteins Are Polymers of Amino Acids
Proteins are polymers of amino acids linked by peptide bonds; thus they are also called polypeptides. An amino acid is defined by the presence of a central carbon (the α carbon) with an attached proton, a carboxyl group, an amino group, and a side chain (figure 13.6). Twenty amino acids are normally used to form proteins. Amino acids differ in their side chains. Depending on the chemical structure of the side chain, the amino acid is described as nonpolar, polar, or charged. The peptide bonds linking individual amino acids are carbon-nitrogen (C–N) bonds formed by a reaction between the carboxyl group of one amino acid and the amino group of the next amino acid in the protein (figure 13.7). A polypeptide has polarity just as DNA and RNA do. At one end of the chain is an amino group, and at the other end is a carboxyl group. Thus a polypeptide has an amino or N terminus and a carboxyl or C terminus.
Proteins do not typically exist as extended chains of amino acids. Rather, they fold to form three-dimensional structures whose final shape is determined to a large extent by the sequence of amino acids in the polypeptide. This sequence is called the primary structure. Secondary and tertiary structures result from the folding of the chain. Finally, two or more polypeptide strands can interact to form the final, functional protein. This level of structure is called quaternary structure. These higher levels of structure are stabilized by intra- (and inter-) chain bonds. Protein structure is described in more detail in appendix I.
DNA replication is an extraordinarily important and complex process upon which all life depends. During DNA replication, the two strands of the double helix are separated; each then serves as a template for the synthesis of a complementary strand according to the base-pairing rules. Each of the two progeny DNA molecules consists of one new strand and one old strand, and DNA replication is said to be semiconservative (figure 13.8). DNA replication is also extremely accurate; E. coli makes errors with a frequency of only 10⁻⁹ or 10⁻¹⁰ per base pair replicated (or about one in a million [10⁻⁶] per gene per generation). Despite its complexity and accuracy, replication is very rapid. In bacteria, replication rates approach 1,000 base pairs per second. Most of our discussion in this section is based on studies of chromosomal DNA replication in E. coli.
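To put the error frequency in perspective, the short Python sketch below estimates how many errors are expected each time a chromosome is copied. The chromosome size of about 4.6 million base pairs is an assumption introduced here (it is not given in the text), and the function name is hypothetical.

```python
# Assumed approximate size of the E. coli chromosome; not stated in the text.
GENOME_BP = 4_600_000


def expected_errors(genome_bp: int, errors_per_bp: float) -> float:
    """Expected number of replication errors per chromosome copy."""
    return genome_bp * errors_per_bp


# Error frequencies per base pair quoted in the text.
for rate in (1e-9, 1e-10):
    errors = expected_errors(GENOME_BP, rate)
    print(f"error frequency {rate}: about {errors:.5f} errors per copy")
```

Under these assumptions, far fewer than one error is expected per round of replication, so most daughter chromosomes are exact copies.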
Bacterial DNA Replication Initiates from a Single Origin of Replication
The replication of chromosomal DNA begins at a single point, the origin of replication, often termed oriC. It’s important for bacteria to have just a single origin of replication so that replication is coordinated with other cellular events. Recall that chromosome partitioning is carried out by proteins that bind near the origin and position the newly replicated chromosomes at opposite cell poles (see figure 7.4). The E. coli oriC is a 260-bp region with an array of sites for protein binding where strand separation and replication initiation occur (figure 13.9). The initiator protein, DnaA, binds to a 9-bp sequence that is repeated 12 times in oriC. Multiple DnaA proteins form a filament that distorts the DNA, a process also promoted by the binding of the nucleoid-associated protein IHF. The adjacent region termed the DNA unwinding element (DUE) is enriched in AT base pairs, and at this site, the two strands separate. Recall that adenines pair with thymines using only two hydrogen bonds, so AT-rich segments become single stranded more readily than GC-rich regions. The separation at the origin, often termed a replication bubble, is enlarged by the action of several enzymes. This exposes the nitrogenous bases that form the template for copying each strand.
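The idea that AT-rich segments separate more readily can be explored with a simple sliding-window scan, sketched below in Python. This is only an illustration: the toy sequence, the window size, and the function names are assumptions, and the sequence is not the real oriC.

```python
def at_fraction(seq: str) -> float:
    """Fraction of bases in seq that are A or T."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def most_at_rich_window(seq: str, window: int) -> tuple[int, str]:
    """Return (start index, subsequence) of the first window with the highest AT content."""
    starts = range(len(seq) - window + 1)
    best = max(starts, key=lambda i: at_fraction(seq[i:i + window]))
    return best, seq[best:best + window]


# Hypothetical toy sequence: GC-rich flanks around an AT-rich core.
toy = "GCGCGGCCATATTTAAATGCGCCG"
print(most_at_rich_window(toy, 8))  # (8, 'ATATTTAA')
```

In this toy example the scan picks out the AT-rich core, the kind of region where strand separation would be expected to begin.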
DNA synthesis occurs at the replication fork, the place at which the parental DNA helix is unwound and the two strands are replicated (figure 13.9). Two replication forks move outward from the origin until they have copied the whole replicon, the portion of the genome that contains an origin and is replicated as a unit. When the replication forks move around the circular chromosome, a structure shaped like the Greek letter theta (θ) is formed (figure 13.9). Because the bacterial chromosome is a single replicon, the forks meet on the other side and two separate chromosomes are released.
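With two forks copying a single replicon, the time needed to replicate a chromosome can be estimated with simple arithmetic. The Python sketch below assumes a chromosome of about 4.6 million base pairs (a value not given in the text) and a fork rate of 1,000 base pairs per second, the upper rate mentioned earlier. The function name is hypothetical.

```python
def replication_minutes(genome_bp: int, bp_per_second: float = 1000, forks: int = 2) -> float:
    """Minutes needed to copy a replicon when `forks` forks each move at bp_per_second."""
    seconds = genome_bp / (bp_per_second * forks)
    return seconds / 60


# Assumed chromosome size of about 4.6 million base pairs.
print(round(replication_minutes(4_600_000), 1))  # 38.3
```

The estimate is rough because fork speed is treated as constant and no time is allowed for initiation or termination.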
DNA replication is essential to organisms, and a great deal of effort has been devoted to understanding its mechanism. The replication of E. coli DNA requires at least 12 proteins that form a complex at the replication fork called the replisome. Two replisomes move in opposite directions away from the origin. To separate (or denature) the parental DNA strands beyond the DUE, an enzyme termed helicase is required (figure 13.10). Helicase is a six-membered ring that encircles one DNA strand. Its movement, powered by ATP, disrupts the hydrogen bonds holding the parental DNA strands together. Thus helicases provide the force to move the replisomes.
Within the replisome, enzymes called DNA polymerases catalyze DNA synthesis. DNA synthesis occurs in the 5′ to 3′ direction, and the nucleotide to be added is a deoxyribonucleoside triphosphate (dNTP). Deoxynucleotides are linked by phosphodiester bonds formed by a reaction between the hydroxyl group at the 3′ end of the growing DNA strand and the phosphate closest to the 5′ carbon (the α-phosphate) of the incoming deoxynucleotide (figure 13.11). The energy needed to form the phosphodiester bond is provided by release of the terminal two phosphates as pyrophosphate (PPi) from the nucleotide that is added. The PPi is subsequently hydrolyzed to two separate phosphates (Pi). Thus the deoxynucleoside triphosphates dATP, dTTP, dCTP, and dGTP serve as DNA polymerase substrates while deoxynucleoside monophosphates (dNMPs: dAMP, dTMP, dCMP, dGMP) are incorporated into the growing chain.
Like all enzymes, DNA polymerase has a specific substrate requirement. It needs both a template (the parental DNA strand) and a 3′-OH group from the growing nucleic acid chain to add nucleotides onto. DNA polymerase cannot copy DNA with only the template. To get DNA polymerase started, an enzyme termed primase synthesizes a short 10-base RNA molecule complementary to the template. This short primer is made from RNA rather than DNA because RNA polymerases (such as primase) can initiate RNA synthesis without an existing 3′-OH. The multiprotein complex that includes helicase and primase is termed the primosome.
E. coli has five different DNA polymerases (DNA polymerase I–V). DNA polymerase III plays the major role in replication, although it is assisted by DNA polymerase I. Each replisome has two DNA polymerases, and each core polymerase binds one strand of DNA. The two DNA polymerases sandwich the primosome between them, and are oriented such that one replicates the strand that passes through the helicase and the other replicates the strand that passes over it (figure 13.12). A donut-shaped protein termed a clamp attaches to each polymerase and stabilizes the enzyme on the DNA template. Finally a clamp loader complex mediates the polymerase attachments to the primosome.
Two additional types of proteins found in the replisome are single-stranded DNA binding protein, and topoisomerases (figure 13.13). Single-stranded DNA binding proteins (SSBs), as their name implies, coat single-stranded DNA to protect it from damage. Topoisomerases relieve the twist generated by the rapid unwinding of the double helix. This is important because rapid unwinding leads to excessive supercoiling in the helix ahead of the replication fork, which can impede helicase if not removed. Topoisomerases transiently break and reseal one or both strands without altering the nucleotide sequence. These are critical enzymes to prevent tangled chromosomes.
As noted, DNA polymerases synthesize DNA in the 5′ to 3′ direction. Therefore one of the DNA polymerase core enzymes moves in the same direction as the replication fork and synthesizes DNA continuously as it is denatured ahead of the replisome. This strand is called the leading strand (figure 13.13). The other strand, called the lagging strand, cannot be extended in the same direction as the movement of the replication fork because there is no free 3′-OH to which a nucleotide can be added. As a result, the lagging strand is synthesized discontinuously in the 5′ to 3′ direction (i.e., in the direction opposite of the movement of the replication fork) and produces a series of fragments called Okazaki fragments, after their discoverer, Reiji Okazaki (1930–1975). The lagging strand passes through the central channel of helicase, and primase is positioned to make many RNA primers along the template strand. Thus while the leading strand requires only one RNA primer to initiate synthesis, the lagging strand has many RNA primers that must eventually be removed. Okazaki fragments are about 1,000 to 3,000 nucleotides long in bacteria.
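The fragment lengths given above imply that lagging strand synthesis requires a very large number of primers. The following Python sketch makes a rough estimate for one replication fork. The template length (half of an assumed 4.6 million base pair chromosome) is an assumption not taken from the text, and the function name is hypothetical.

```python
import math


def okazaki_fragment_count(lagging_template_nt: int, fragment_nt: int) -> int:
    """Fragments (and therefore RNA primers) needed to cover the lagging strand template."""
    return math.ceil(lagging_template_nt / fragment_nt)


# Assumed template length: half of a ~4.6 million bp chromosome, copied by one fork.
TEMPLATE_NT = 2_300_000

# Fragment lengths of 1,000 and 3,000 nucleotides bracket the range in the text.
for fragment_nt in (1_000, 3_000):
    count = okazaki_fragment_count(TEMPLATE_NT, fragment_nt)
    print(f"{fragment_nt}-nt fragments: about {count} fragments and primers")
```

By contrast, the leading strand at the same fork needs only a single primer.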
After most of the lagging strand has been synthesized by the formation of Okazaki fragments, DNA polymerase I removes the RNA primers using its ability to snip off nucleotides one at a time starting at the 5′ end. This ability is referred to as 5′ to 3′ exonuclease activity. DNA polymerase I begins its exonuclease activity at the free 5′ end of each RNA primer. With the removal of each ribonucleotide, the adjacent 3′-OH from the deoxynucleotide is used by DNA polymerase I to fill the gap between Okazaki fragments (figure 13.14). Finally, the Okazaki fragments are joined by the enzyme DNA ligase, which forms a phosphodiester bond between the 3′-OH of the growing strand and the 5′-phosphate of an Okazaki fragment (figure 13.15).
Amazingly, DNA polymerase III has an additional critically important function: proofreading. Proofreading is the removal of a mismatched base immediately after it has been added; its removal must occur before the next base is incorporated. The ε subunit of DNA polymerase III has 3′ to 5′ exonuclease activity. This activity enables the polymerase core enzyme to check each newly incorporated base to see that it forms stable hydrogen bonds with the template. In this way, mismatched bases can be detected. If the wrong base has been mistakenly added, the exonuclease activity removes it, but only as long as it is still at the 3′ end of the growing strand (figure 13.16). Once it is removed, the holoenzyme backs up and adds the proper nucleotide in its place. DNA proofreading is not 100% efficient, and as discussed in chapter 16, the mismatch repair system is the cell’s second line of defense against the potential harm caused by the incorporation of the incorrect nucleotide.
We present replication as a series of discrete steps, but in the cell these events occur quickly and simultaneously on both the leading and lagging strands. Lagging strand synthesis is particularly amazing because of the gymnastic feats performed by the holoenzyme. It must discard old clamps, load new clamps, and tether the template to the core enzyme with each new round of Okazaki fragment synthesis. All of this occurs as DNA polymerase III is synthesizing DNA.
In E. coli, DNA replication stops when the replisome reaches a termination site (ter) on the DNA. A protein called Tus binds to the ter sites and halts progression of the forks. In many other bacteria, replication stops spontaneously when the forks meet. Regardless of how fork movement is stopped, there are two problems that must be solved by the replisome. The first is the formation of interlocked chromosomes called catenanes (figure 13.17a). Catenanes are produced when topoisomerases break and rejoin DNA strands to ease supercoiling ahead of the replication fork. The two daughter DNA molecules are separated by the action of other topoisomerases that break both strands of one molecule, pass the other DNA molecule through the break, and then rejoin the strands. The second problem is a dimerized chromosome—two chromosomes joined together to form a single chromosome twice as long (figure 13.17b). Dimerized chromosomes result from DNA recombination that sometimes occurs between two daughter molecules during DNA replication. Recombinase enzymes (e.g., XerCD in E. coli) catalyze an intramolecular crossover that separates the two chromosomes.
DNA replication allows genetic information to be passed from one generation to the next. But how is the information used? To answer that question, we must first look at how genetic information is organized. The basic unit of genetic information is the gene. Genes have been regarded in several ways. At first, it was thought that a gene contained information for the synthesis of one enzyme—the one gene–one enzyme hypothesis. This was modified to the one gene–one polypeptide hypothesis because of the existence of enzymes and other proteins composed of two or more different polypeptide chains (subunits) coded for by separate genes. A segment of DNA that encodes a single polypeptide is sometimes termed a cistron. However, not all genes encode proteins; some code instead for ribosomal RNA (rRNA) and transfer RNA (tRNA) (figure 13.18). Synthesis of RNA from a DNA template is called transcription, and the RNA product has a sequence complementary to the DNA template directing its synthesis. Transcription generates three major kinds of RNA. tRNA carries amino acids during protein synthesis, and rRNA molecules are components of ribosomes. Messenger RNA (mRNA) bears the message for protein synthesis. Thus a gene might be defined as a polynucleotide sequence that codes for one or more functional products (i.e., a polypeptide, tRNA, or rRNA). In this section, we consider the structure of each of these three types of genes.
Most of the genes found in bacterial genomes encode proteins; these are called structural genes. However, DNA does not serve directly as the template for protein synthesis. Rather, the genetic information in the gene is transcribed to give rise to a messenger RNA (mRNA), which is translated (section 13.7) into a protein (figure 13.18). For this to occur, protein-coding genes must contain signals that indicate where transcription should start and stop, and signals in the resulting mRNA that indicate where translation should start and stop. As we describe in more detail in section 13.5, during transcription only one strand of a gene directs mRNA synthesis. This strand is called the template strand, and the complementary DNA strand is known as the sense strand (figure 13.19). Messenger RNA is synthesized from the 5′ to the 3′ end in a manner similar to DNA synthesis. Because the template strand and the mRNA are complementary to one another, the sequence of the mRNA is the same as the sense strand, with uracil substituted for thymine. In other words, a human reading the sense strand can read the nucleotide sequence of the gene in the correct order, even though this is not the strand used as the template.
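The relationship among the template strand, the sense strand, and the mRNA can be checked with a few lines of Python. In this sketch the function names and the nine-base example are hypothetical. The template is written 3′ to 5′, so the mRNA produced from it reads 5′ to 3′.

```python
# Pairing used during transcription: the RNA carries U where DNA would carry T.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Build the mRNA (5' to 3') complementary to a template strand written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def sense_as_rna(sense_5to3: str) -> str:
    """Rewrite the sense strand with uracil substituted for thymine."""
    return sense_5to3.upper().replace("T", "U")


template = "TACGGATTC"  # hypothetical template strand, 3' to 5'
sense = "ATGCCTAAG"     # its complementary sense strand, 5' to 3'

print(transcribe(template))                         # AUGCCUAAG
print(transcribe(template) == sense_as_rna(sense))  # True
```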
An important site called the promoter is located at the start of the gene. The promoter is the binding site for RNA polymerase, the enzyme that synthesizes RNA. The promoter is neither transcribed nor translated; it functions strictly to orient RNA polymerase so it is a specific distance from the first DNA nucleotide that will serve as a template for RNA synthesis. The promoter thus specifies which strand is to be transcribed and where transcription should begin. As we discuss in chapter 14, the sequences near the promoter often are very important in regulating when and at what rate a gene is transcribed. Regulation of transcription initiation saves considerable energy and materials (section 14.2)
The transcription start site (labeled +1 in figure 13.19) represents the first nucleotide in the mRNA synthesized from the gene. However, the initially transcribed portion of the gene does not code for amino acids. Instead, it is a leader that is transcribed into mRNA but is not translated into amino acids. In bacteria, the leader includes a region called the Shine-Dalgarno sequence, which is important in the initiation of translation. The leader sometimes is also involved in regulation of transcription and translation. Attenuation and riboswitches stop transcription prematurely (section 14.3); RNA secondary structures control translation (section 14.4)
Immediately next to (and downstream of) the leader is the most important part of the gene, the coding region (figure 13.19). The coding region typically begins with the template DNA sequence 3′-TAC-5′. This is transcribed into the start codon, 5′-AUG-3′, which codes for the first amino acid of the polypeptide encoded by the gene. The remainder of the coding region specifies the sequence of amino acids for the rest of the protein. The coding region ends with a sequence that signals the end of the protein and stops the ribosome during translation. The stop signal is immediately followed by the trailer, which is transcribed but not translated. The trailer contains sequences that prepare RNA polymerase for release from the template strand. Indeed, just beyond the trailer (and sometimes slightly overlapping it) is the terminator. The terminator is a sequence that signals RNA polymerase to stop transcription.
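The arrangement of leader, coding region, and trailer can be sketched in Python by splitting a toy mRNA at its start and stop codons. This is a simplification under stated assumptions: the first AUG is taken as the start codon, the three standard stop codons (UAA, UAG, and UGA, which are not listed in the text) mark the end of the coding region, and the sequence and function name are hypothetical.

```python
# Standard stop codons; an addition here, since the text does not list them.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str):
    """Split an mRNA into (leader, coding region, trailer), or return None.

    Simplification: the first AUG is treated as the start codon, and the
    coding region ends at the first in-frame stop codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    return None


toy_mrna = "GGAGGACAUGAAACCCUAAGCUU"  # hypothetical transcript
print(split_mrna(toy_mrna))  # ('GGAGGAC', 'AUGAAACCCUAA', 'GCUU')
```

In a real transcript the start codon is identified with the help of the Shine-Dalgarno sequence in the leader, which this sketch ignores.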
Actively growing cells need a ready supply of tRNA and rRNA molecules for protein synthesis. To ensure this, bacterial cells often have more than one gene for each of these molecules. Furthermore, it is important that the number of each tRNA or rRNA relative to other tRNAs or rRNAs be controlled. This is accomplished in part by having several tRNA or rRNA genes transcribed together from a single promoter.
In bacteria, genes for tRNA consist of a promoter, leader, tRNA coding region, and trailer. This is the same as for protein-coding genes, except that the coding region is not translated. When more than one tRNA is transcribed from the promoter, the coding regions are separated by short spacer sequences (figure 13.20a). Whether the gene encodes a single tRNA or multiple tRNAs, the initial transcript must be processed to remove the noncoding sequences (i.e., leader, trailer, and spacers, if present). This is called posttranscriptional modification, and it is accomplished by ribonucleases—enzymes (and in some cases ribozymes) that cut RNA.
Bacterial cells usually also contain more than one rRNA gene. Each gene has a promoter, trailer, and terminator (figure 13.20b). As seen for tRNA genes, the transcript from an rRNA gene is a single, large precursor molecule cut by ribonucleases to yield the final rRNA products. Interestingly, in many bacteria, the trailer regions and the spacers often contain tRNA genes. Thus the precursor rRNA encodes both tRNA and rRNA.
So far we have discussed genes as discrete units bounded by a promoter and terminator. However, not all bacterial genes are organized in this way. Bacterial genes encoding proteins involved in a related process (e.g., encoding enzymes for synthesis of an amino acid) are often located close to each other and are transcribed from a single promoter. Multiple genes controlled by a single promoter constitute an operon. Transcription of an operon yields an mRNA consisting of a leader followed by one coding region, which is separated from the second coding region by a spacer, and so on, with the final sequence of nucleotides being the trailer. Such mRNAs are said to be polycistronic mRNAs (figure 13.21a). Each coding region in the polycistronic mRNA is defined by a start and stop codon. Thus each coding region is translated separately to give rise to a separate polypeptide. Many archaeal genes are also organized in operons. However, operons are rare in eukaryotes. Instead, their mRNAs are usually monocistronic (figure 13.21b), containing the information of a single gene.
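The way a polycistronic mRNA carries several separately translated coding regions can be illustrated with a short Python sketch. The toy transcript below is assembled from hypothetical leader, coding, spacer, and trailer segments. The scan assumes that each AUG found after the previous stop codon begins a new coding region and that the standard stop codons (UAA, UAG, and UGA) end one.

```python
# Standard stop codons; an addition here, since the text does not list them.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def coding_regions(mrna: str) -> list[str]:
    """Return each start-to-stop coding region found along the mRNA, in order."""
    regions = []
    pos = 0
    while True:
        start = mrna.find("AUG", pos)
        if start == -1:
            break
        end = None
        # Read codon by codon from the start codon until an in-frame stop codon.
        for i in range(start, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                end = i + 3
                break
        if end is None:
            break
        regions.append(mrna[start:end])
        pos = end  # continue scanning in the spacer that follows
    return regions


# Hypothetical transcript: leader + coding region 1 + spacer + coding region 2 + trailer.
operon_mrna = "GGAGG" + "AUGAAAUAA" + "CCGGAGG" + "AUGCCCGGGUGA" + "CUU"
print(coding_regions(operon_mrna))  # ['AUGAAAUAA', 'AUGCCCGGGUGA']
```

A monocistronic mRNA passed to the same function would yield a list with a single coding region.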
RNA is synthesized by enzymes called RNA polymerases. Most bacterial RNA polymerases contain five types of polypeptide chains: α, β, β′, ω, and σ (figure 13.22). The RNA polymerase core enzyme is composed of five polypeptides (two α subunits, β, β′, and ω) and catalyzes RNA synthesis. The sigma factor (σ) has no catalytic activity but instead functions as a transcription factor by helping the core enzyme recognize the promoter. When sigma is bound to the core enzyme, the six-subunit complex is termed RNA polymerase holoenzyme (figure 13.22). Only holoenzyme can begin transcription, but the core enzyme completes RNA synthesis once it has been initiated.
Transcription involves three separate processes: initiation, elongation, and termination, which together are referred to as the transcription cycle (figure 13.23). The transcription factor sigma is critical to the initiation process. As part of the RNA polymerase holoenzyme, it helps position the core enzyme at the promoter (figure 13.22). Bacterial promoters have several characteristic features. Two of particular note are a sequence of six bases (often TTGACA) about 35 base pairs before (upstream of) the transcription starting point and a TATAAT sequence, usually about 10 base pairs upstream of the transcriptional start site (figure 13.24; also figure 13.19). These regions are called the −35 and −10 sites, respectively, because these are their approximate distances in nucleotides upstream of the first nucleotide to be transcribed (i.e., the +1 site). Sigma factor first recognizes the −35 sequence, directing the holoenzyme to “settle down” on that region of the promoter. Sigma and the core enzyme undergo conformational changes that cause the DNA strands in the AT-rich −10 region to separate. Sigma then interacts with one of the strands and stabilizes the interaction of RNA polymerase with the denatured DNA. The resulting complex of RNA polymerase holoenzyme and DNA is called the open complex.
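A very simple way to look for promoter-like sequences is to search for the two hexamers at a suitable distance from each other, as sketched below in Python. The sketch rests on several assumptions: it requires exact matches to TTGACA and TATAAT, whereas real promoters often differ from these sequences at some positions; the allowed spacer of 15 to 19 bases between the hexamers is an assumed range not given in the text; and the toy sequence and function name are hypothetical.

```python
MINUS_35 = "TTGACA"  # -35 hexamer from the text
MINUS_10 = "TATAAT"  # -10 hexamer from the text


def find_promoter_like_sites(dna: str, spacer_range=(15, 19)) -> list[tuple[int, int]]:
    """Return (start of -35 hexamer, start of -10 hexamer) for each exact match.

    spacer_range is an assumed window for the number of bases separating
    the two hexamers; it is not taken from the text.
    """
    hits = []
    start35 = dna.find(MINUS_35)
    while start35 != -1:
        end35 = start35 + len(MINUS_35)
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            start10 = end35 + spacer
            if dna[start10:start10 + len(MINUS_10)] == MINUS_10:
                hits.append((start35, start10))
        start35 = dna.find(MINUS_35, start35 + 1)
    return hits


# Hypothetical sequence with a 17-base spacer between the two hexamers.
toy = "GGC" + "TTGACA" + "ACGTACGTACGTACGTA" + "TATAAT" + "GCATGCA"
print(find_promoter_like_sites(toy))  # [(3, 26)]
```

A more realistic search would score partial matches to each hexamer instead of demanding identity.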
At this point in our discussion, it is worth noting that bacterial cells produce more than one type of sigma factor. Each sigma factor preferentially directs RNA polymerase to a distinct set of promoters. For instance, in E. coli, most genes have promoters recognized by a sigma factor called σ70. This sigma factor recognizes promoters having the −10 and −35 sequences shown in figure 13.24. These are the consensus sequences within promoters recognized by σ70. Promoters recognized by other sigma factors have different consensus sequences. The use of different sigma factors to initiate transcription is a common regulatory mechanism, as we describe in chapter 14. Our focus here is on transcription of genes recognized by σ70.
Once the open complex is formed, transcription can begin. Within the open complex is a region of denatured DNA equivalent to about 16 to 20 base pairs. This is the transcription bubble, and it moves with the RNA polymerase as it synthesizes mRNA during elongation (figure 13.22). Within the transcription bubble, a temporary RNA:DNA hybrid is formed as RNA is synthesized complementary to the DNA template. As RNA polymerase holoenzyme progresses along the DNA template, the sigma factor dissociates from the open complex and can direct another RNA polymerase core enzyme to initiate transcription (figure 13.23).
The reaction catalyzed by RNA polymerase is quite similar to that catalyzed by DNA polymerase (figure 13.11). ATP, GTP, CTP, and UTP serve as the substrates for synthesis of RNA that is complementary to the DNA template. Pyrophosphate is released as each ribonucleoside monophosphate is incorporated into the growing RNA chain, and its release helps drive the reaction. RNA synthesis proceeds in a 5′ to 3′ direction with new ribonucleotides added to the 3′ end of the growing chain, making the RNA complementary and antiparallel to the template DNA. As elongation continues, mRNA is released through the enzyme’s exit tunnel, and the two strands of DNA behind the transcription bubble resume their double helical structure. As shown in figure 13.23, RNA polymerase is a remarkable enzyme capable of several activities, including separating the DNA strands, moving along the template, and synthesizing phosphodiester bonds in RNA. As is the case with DNA replication, DNA denaturation for transcription introduces supercoiling. A transcribing RNA polymerase requires a topoisomerase to relieve the torsion introduced into the chromosome.
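The complementary, antiparallel relationship between template and transcript can be made concrete with a short Python sketch. The function names are hypothetical, and the code models only base pairing, not the enzyme itself.

```python
# Base-pairing rules for transcription: template DNA base -> RNA base.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Build the RNA 5'->3' from a template strand written 3'->5'."""
    rna = []
    for base in template_3_to_5.upper():
        rna.append(COMPLEMENT[base])  # each new base joins the 3' end
    return "".join(rna)


def transcribe_from_5_to_3(template_5_to_3):
    """Same result for a template written in the usual 5'->3' order."""
    # Reversing the string lets the template be read 3'->5',
    # which is what makes the product antiparallel.
    return transcribe(template_5_to_3[::-1])


print(transcribe("TACGGT"))              # template written 3'->5'
print(transcribe_from_5_to_3("TGGCAT"))  # same strand written 5'->3'
```

Both calls describe the same template strand, so they yield the same transcript.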
An elongating RNA polymerase pauses briefly every 100 to 200 bases. During a pause, the template base slips in the active site, halting the enzyme. The enzyme quickly reengages to continue transcription. Pausing allows the enzyme to interact with sequence-specific regulatory signals that are described more fully in chapter 14.
Transcription termination occurs when the core RNA polymerase dissociates from the template DNA. There are three kinds of termination mechanisms. The first, intrinsic termination, involves signals within the RNA transcript itself and is summarized in figure 13.26. The other two types of termination are factor-dependent, meaning that they require the aid of proteins. Termination factor rho (ρ) requires ATP to power its movement along a transcript and physically pull RNA out of the open complex, as shown in figure 13.27. The third mechanism is discussed in section 13.8.
The final step in expression of protein-coding genes is translation. Protein synthesis is called translation because the information encoded in the language of nucleic acids must be rewritten in the language of proteins. During translation, the sequence of nucleotides is read in discrete sets of three nucleotides, each set being a codon. Each codon specifies a single amino acid. The sequence of codons is read in only one way—the reading frame (figure 13.28)—to give rise to the amino acid sequence of a polypeptide. Deciphering the genetic code was one of the great achievements of the twentieth century. Here we examine the nature of the genetic code.
The genetic code, presented in RNA form, is summarized in table 13.1. Close inspection of the code reveals several features related not only to the way cells use DNA to store information but also to why it is valuable for storing data, as described in the chapter opening story. One feature is that the code words (codons) are three letters (bases) long; thus one small word conveys a significant amount of information. Each codon is recognized by an anticodon present on a tRNA molecule. Another feature is that the code has punctuation. One codon, AUG, is almost always the first codon in the protein-coding portion of mRNA molecules. It is called the start codon because it serves as the start site for translation by coding for the initiator tRNA. Three other codons (UGA, UAG, and UAA) terminate translation and are called stop or nonsense codons. These codons do not encode an amino acid and therefore do not have a tRNA bearing their anticodon. Thus only 61 of the 64 codons in the code, the sense codons, direct amino acid incorporation into protein. Finally, the genetic code exhibits code degeneracy (also called redundancy); that is, there are up to six different codons for a given amino acid.
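The features of the code listed above can be checked directly. The following Python sketch builds the standard codon table from the conventional 64-letter summary string (bases ordered U, C, A, G; one-letter amino acid symbols; `*` for stop) and then reads a message in one reading frame. The helper name `translate` and the example message are illustrative assumptions.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order, one letter per codon ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

STOPS = sorted(c for c, aa in CODE.items() if aa == "*")
SENSE = {c: aa for c, aa in CODE.items() if aa != "*"}
DEGENERACY = Counter(SENSE.values())  # number of codons per amino acid


def translate(mrna):
    """Translate from the first AUG, three bases at a time, until a stop."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODE), len(SENSE), STOPS)
print(max(DEGENERACY.values()))       # most codons for any one amino acid
print(translate("GGAUGUUUGGCUAAAC"))  # hypothetical message
```

Shifting the starting position by one or two bases would group the same nucleotides into entirely different codons, which is why the reading frame matters.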
Despite the existence of 61 sense codons, there are fewer than 61 different tRNAs. It follows that not all codons have a corresponding tRNA. Cells can successfully translate mRNA using fewer tRNAs because loose pairing between the 5′ base in the anticodon and the 3′ base of the codon is tolerated. Thus as long as the first and second bases in the codon correctly base pair with an anticodon, the tRNA bearing the correct amino acid will bind to the mRNA during translation. This is evident on inspection of the code. Note that the codons for a particular amino acid most often differ at the third position (table 13.1). This somewhat loose base pairing is known as wobble, and it relieves cells of the need to synthesize so many tRNAs (figure 13.29). Wobble also decreases the effects of some mutations. See section 16.1 (Mutations: heritable changes in a genome).
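The third-position pattern can be verified by grouping codons that share their first two bases. This Python sketch uses the standard codon table; the grouping into two-base "boxes" and the variable names are conventions chosen here for illustration, and the sketch does not model actual anticodon pairing rules.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order, one letter per codon ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# A "box" is the set of four codons sharing the same first two bases.
boxes = ["".join(p) for p in product(BASES, repeat=2)]

# Boxes in which the third base does not matter at all.
fourfold = [b for b in boxes if len({CODE[b + x] for x in BASES}) == 1]

# Boxes in which U and C at the third position are interchangeable.
pyrimidine_pairs = [b for b in boxes if CODE[b + "U"] == CODE[b + "C"]]

print(len(fourfold), fourfold)
print(len(pyrimidine_pairs))
```

In the standard code, half of the boxes ignore the third base entirely, and in every box a codon ending in U specifies the same amino acid as its partner ending in C.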
The description of the genetic code just provided is of the universal genetic code. There are exceptions to the code, although they are limited to organelles and microbes with reduced genomes. The first exceptions discovered were stop codons that encode one of the 20 amino acids. For instance, mycoplasma bacteria use the stop codon UGA to code for tryptophan. More dramatic deviations from the code have also been discovered. Members of all three domains of life encode proteins containing the amino acid selenocysteine, the twenty-first amino acid (figure 13.30a). Most selenocysteine-containing enzymes catalyze redox reactions. Pyrrolysine, the twenty-second amino acid, can be found in the proteins from multiple archaea (figure 13.30b). Selenocysteine is inserted at certain UGA codons, whereas pyrrolysine is inserted at UAG codons.
Translation involves decoding mRNA and covalently linking amino acids together to form a polypeptide; this occurs within the ribosome. Translation begins when a ribosome binds mRNA and is positioned so that translation will yield the correct amino acid sequence in the polypeptide chain. Transfer RNA molecules carry amino acids to the ribosome so that they can be added to the polypeptide chain as the ribosome moves down the mRNA molecule. Just as DNA and RNA synthesis proceeds in one direction, so too does protein synthesis. Polypeptide synthesis begins with the amino acid at the end of the chain with a free amino group (the N-terminal) and moves in the C-terminal direction. Thus translation is said to occur in the amino terminus to carboxyl terminus direction.
For translation to occur, a ready supply of tRNA molecules bearing the correct amino acid must be available. Thus a preparatory step for protein synthesis is amino acid activation, the process in which amino acids are attached to tRNA molecules. Before we discuss this process, we need to examine the structure of tRNA molecules.
Transfer RNA molecules are about 70 to 95 nucleotides long and possess several characteristic structural features. These features become apparent when the tRNA is folded so that base pairing within the tRNA strand is maximized. When represented two dimensionally, this base pairing causes the tRNA to assume a cloverleaf conformation (figure 13.31a). However, the three-dimensional structure looks like the letter L (figure 13.31b). One important feature of tRNAs is the acceptor stem, which holds the activated amino acid. The 3′ end of all tRNAs has the same CCA sequence, and in all cases the amino acid is attached to the 3′-hydroxyl of the adenosine nucleotide. Another important feature of a tRNA is the anticodon. The anticodon is complementary to an mRNA codon and is located on the anticodon arm.
Enzymes called aminoacyl-tRNA synthetases catalyze amino acid activation (figure 13.32). As is true of DNA and RNA synthesis, the reaction is driven to completion when ATP is hydrolyzed to release pyrophosphate. The amino acid is attached to the tRNA by a high-energy bond. The storage of energy in this bond provides the fuel needed to generate the peptide bond when the amino acid is added to the growing peptide chain.
There are at least 20 aminoacyl-tRNA synthetases, each specific for a single amino acid and its tRNAs (cognate tRNAs). It is critical that each tRNA attach the corresponding amino acid because if an incorrect amino acid is attached to a tRNA, it will be incorporated into a polypeptide in place of the correct amino acid. The protein synthetic machinery recognizes only the anticodon of the aminoacyl-tRNA and cannot tell whether the correct amino acid is attached. Some aminoacyl-tRNA synthetases proofread just like DNA polymerases. If the wrong amino acid is attached to tRNA, the enzyme hydrolyzes the amino acid from the tRNA, rather than release the incorrect product.
Protein synthesis takes place within ribosomes that serve as workbenches, with mRNA acting as the blueprint. Recall that ribosomes are formed from two subunits, the large subunit and the small subunit, and each contains one or more rRNA molecules and numerous proteins. A bacterial ribosome and its components are shown in figure 13.33. Three sites are found within the ribosome for binding tRNAs: A, P, and E sites. The A (aminoacyl or acceptor) site receives tRNAs carrying an amino acid to be added to the protein being synthesized. The P (peptidyl or donor) site holds a tRNA attached to the growing polypeptide. The E (exit) site is the location from which empty tRNAs leave the ribosome.
Ribosomal RNA has three roles. (1) All three rRNA molecules contribute to ribosome structure. (2) The 16S rRNA of the 30S subunit is needed for initiation of protein synthesis because its 3′ end binds to a site on the leader of the mRNA called the Shine-Dalgarno sequence; thus the Shine-Dalgarno sequence is part of the ribosome-binding site (RBS). This positions the mRNA on the ribosome. The 16S rRNA also binds a protein needed to initiate translation and the 3′ CCA end of aminoacyl-tRNA. (3) The 23S rRNA is a ribozyme that catalyzes peptide bond formation.
Like transcription and DNA replication, protein synthesis is divided into three stages: initiation, elongation, and termination. Bacteria begin protein synthesis with a modified aminoacyl-tRNA, N-formylmethionyl-tRNAfMet (fMet-tRNA), which is called the initiator tRNA and is coded for by the start codon AUG (figure 13.34). The amino acid of the initiator tRNA has a formyl group covalently bound to the amino group and can be used only for initiation because the formyl group blocks peptide bond formation. When methionine is to be added to a growing polypeptide chain (i.e., at an AUG codon in the middle of the mRNA), a normal methionyl-tRNAMet is employed. Although bacteria start protein synthesis with N-formylmethionine, the formyl group is not retained but is hydrolytically removed (section 13.9).
Protein synthesis in bacteria begins with formation of the 30S initiation complex, consisting of the initiator tRNA, the mRNA to be translated, and the 30S ribosomal subunit; two protein initiation factors (IF-1 and IF-2) are involved (figure 13.35). Positioning of the initiator fMet-tRNA on the mRNA is crucial for proper translation of the mRNA. This is accomplished with the help of the 16S rRNA within the 30S subunit, which is complementary to and binds the Shine-Dalgarno sequence in the leader sequence of the mRNA. By aligning the Shine-Dalgarno sequence with the 16S rRNA, the start codon (AUG or sometimes GUG) specifically binds with the fMet-tRNA anticodon. This ensures that the start codon will be translated first.
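The way the Shine-Dalgarno sequence singles out the correct start codon can be sketched as a simple search. Several things in this Python example are assumptions made for illustration: the exact-match consensus AGGAGG (real sites vary and are only partly complementary to the 16S rRNA), the 5 to 10 nucleotide spacing window, and the function name `find_start`.

```python
import re

SD_CORE = "AGGAGG"              # assumed consensus; real sites vary
START_CODONS = ("AUG", "GUG")   # start codons named in the text


def find_start(mrna, min_gap=5, max_gap=10):
    """Return the index of the first start codon that lies min_gap to
    max_gap nucleotides downstream of an exact SD_CORE match, else None."""
    for match in re.finditer(SD_CORE, mrna):
        for gap in range(min_gap, max_gap + 1):
            i = match.end() + gap
            if mrna[i:i + 3] in START_CODONS:
                return i
    return None


# Hypothetical leader: a decoy AUG at position 0 has no upstream
# Shine-Dalgarno sequence, so it is not chosen as the start codon.
leader = "AUGCC" + SD_CORE + "UAACAUU" + "AUGAAACGC"
print(find_start(leader))
```

The decoy AUG at the 5′ end is ignored, which mirrors how alignment with the 16S rRNA ensures that the intended start codon is translated first.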
Once the 30S initiation complex is formed, it binds the 50S ribosomal subunit, forming the 70S initiation complex. The fMet-tRNA is positioned at the peptidyl or P site. At this juncture, you may be wondering what kept the 30S and 50S subunits from binding each other earlier in the initiation stage. The answer is the third initiation factor (IF-3), as illustrated in figure 13.35. Also revealed in this figure is the energy cost of initiation. GTP, like ATP, is a high-energy molecule. Hydrolysis of GTP to GDP provides the energy needed to accomplish initiation.
Every addition of an amino acid to a growing polypeptide chain is the result of an elongation cycle composed of three phases: aminoacyl-tRNA binding, the transpeptidation reaction, and translocation. The process is aided by proteins called elongation factors (EF). In each turn of the cycle, an amino acid corresponding to the proper mRNA codon is added to the C-terminal end of the polypeptide chain as the ribosome moves down the mRNA in the 5′ to 3′ direction.
At the beginning of an elongation cycle, the P site is filled with either the initiator fMet-tRNA or a tRNA bearing a growing polypeptide chain (peptidyl-tRNA), and the A and E sites are empty (figure 13.36). Messenger RNA is bound to the ribosome in such a way that the proper codon interacts with the P site tRNA (e.g., an AUG codon for fMet-tRNA). The next codon is located within the A site and is ready to accept an aminoacyl-tRNA.
In the aminoacyl-tRNA binding phase, the first phase of the cycle, the aminoacyl-tRNA corresponding to the codon in the A site is inserted so its anticodon is aligned with the codon on the mRNA. In bacterial cells, this is aided by two elongation factors and requires the expenditure of one GTP (figure 13.36). Once the proper aminoacyl-tRNA is in the A site, the transpeptidation reaction occurs (figure 13.36 and figure 13.37). Transpeptidation is catalyzed by the peptidyl transferase activity of the 23S rRNA ribozyme, which is part of the 50S ribosomal subunit. Transpeptidation results in the transfer of the peptide chain from the tRNA in the P site to the tRNA in the A site, as a peptide bond is formed. No extra energy source is required for peptide bond formation because breaking the bond that links an amino acid to tRNA releases enough energy to drive the reaction (figure 13.32).
The final phase in the elongation cycle is translocation. Three things happen simultaneously: (1) the peptidyl-tRNA moves from the A site to the P site; (2) the ribosome moves one codon along mRNA so that a new codon is positioned in the A site; and (3) the empty tRNA moves from the P site to the E site and subsequently leaves the ribosome. Translocation involves rotations of the 30S and 50S subunits relative to each other. In addition, the head portion of the 30S subunit swivels. These changes in ribosome structure move the tRNAs into their new locations; the codon-anticodon interactions between the tRNAs and the mRNA move the mRNA as the tRNAs move. One elongation factor participates and one GTP is hydrolyzed during this intricate process.
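The three phases of the elongation cycle can be followed step by step in a small simulation. This Python sketch tracks only what occupies the A, P, and E sites and how many GTP molecules are hydrolyzed; the function name, the list-based representation of a tRNA's cargo, and the tiny decoding table are assumptions made for illustration.

```python
def elongate(codons, decode):
    """Run elongation cycles over codons (the first is the start codon).

    decode maps sense codons to amino acids; any codon missing from it is
    treated as a stop codon. Returns (peptide, gtp_hydrolyzed).
    """
    # Start of the first cycle: initiator tRNA in P, with A and E empty.
    sites = {"E": None, "P": ["fMet"], "A": None}
    gtp = 0
    for codon in codons[1:]:
        amino_acid = decode.get(codon)
        if amino_acid is None:   # no cognate tRNA, so elongation halts
            break
        # Phase 1: aminoacyl-tRNA binding in the A site (one GTP).
        sites["A"] = [amino_acid]
        gtp += 1
        # Phase 2: transpeptidation moves the chain from the P-site tRNA
        # onto the amino acid held by the A-site tRNA (no extra energy).
        sites["A"] = sites["P"] + sites["A"]
        sites["P"] = []          # the P-site tRNA is now empty
        # Phase 3: translocation (one GTP): A -> P, P -> E, then E empties.
        sites["E"], sites["P"], sites["A"] = sites["P"], sites["A"], None
        gtp += 1
        sites["E"] = None
    return sites["P"], gtp


DECODE = {"UUU": "Phe", "GGC": "Gly"}  # hypothetical partial table
peptide, gtp_used = elongate(["AUG", "UUU", "GGC", "UAA"], DECODE)
print(peptide, gtp_used)
```

Each amino acid added after the initiator costs two GTP, one for aminoacyl-tRNA binding and one for translocation, and the new residue always joins the C-terminal end of the list.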
Protein synthesis stops when the ribosome reaches one of three stop codons: UAA, UAG, and UGA. The stop codon is found on the mRNA immediately before the trailer. Three protein release factors (RF-1, RF-2, and RF-3) aid the ribosome in recognizing these codons. Because there is no cognate tRNA for a stop codon, the ribosome halts. Peptidyl transferase hydrolyzes the bond linking the polypeptide to the tRNA in the P site, and the polypeptide and the empty tRNA are released. Ribosome recycling factor disassembles the translational complex. The mRNA is released and the two ribosomal subunits separate.
Insertion of the unusual amino acids selenocysteine and pyrrolysine during translation occurs by two distinctive mechanisms. Selenocysteine is synthesized from serine after it has been attached to certain tRNAs. The enzyme catalyzing the conversion is selenocysteine synthase. Once formed, the amino acid is recognized by a specific elongation factor and is incorporated when a UGA stop codon is encountered in association with nucleotide sequences called cis-acting selenocysteine insertion sequence elements (SECIS). In bacteria, SECIS are found immediately after the UGA stop codon.
Pyrrolysine insertion differs from that of selenocysteine in several ways. Pyrrolysine is synthesized from lysine before being attached to a tRNA. Organisms that use pyrrolysine make an unusual tRNA with a CUA anticodon; the pyrrolysine is attached by a specific aminoacyl-tRNA synthetase. Pyrrolysine is inserted at UAG stop codons located near a sequence element called pyrrolysine insertion sequence (PYLIS). Both SECIS and PYLIS form stem-loop structures that prevent translation termination.
Protein synthesis is a very expensive process. Two ATP high-energy bonds are required for amino acid activation, initiation consumes one GTP, two GTP molecules are used during each elongation cycle, and another GTP is hydrolyzed when protein synthesis terminates (figures 13.32, 13.35, 13.36, and 13.38). Presumably this large energy expenditure is required to ensure the fidelity of protein synthesis. Fidelity is assessed both before and after formation of the peptide bond. When an aminoacyl-tRNA enters the A site, correct pairing of the anticodon and codon causes conformational changes in components of the ribosome such that the aminoacyl-tRNA is locked into place in a manner that facilitates peptide bond formation. These conformational changes do not occur for an incorrect aminoacyl-tRNA and the tRNA is ejected. However, on rare occasions the incorrect aminoacyl-tRNA is selected and a peptide bond is formed between the growing polypeptide and the wrong amino acid. The presence of an incorrect amino acid is recognized by release factors. This leads to hydrolysis of the aberrant polypeptide from the tRNA, its release from the ribosome, and termination of translation.
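The bookkeeping in this paragraph can be turned into a rough estimate of the energy spent on a single polypeptide. In this Python sketch the function name is hypothetical, and we assume that a chain of n residues requires n amino acid activations but only n − 1 elongation cycles, because the first residue arrives with the initiator tRNA. Costs such as proofreading, mRNA synthesis, and protein folding are ignored.

```python
def translation_cost(n_residues):
    """Estimate high-energy bonds used to make one polypeptide."""
    if n_residues < 1:
        raise ValueError("a polypeptide needs at least one residue")
    cost = {
        "activation_ATP_bonds": 2 * n_residues,  # two per aminoacyl-tRNA
        "initiation_GTP": 1,
        "elongation_GTP": 2 * (n_residues - 1),  # assumed n - 1 cycles
        "termination_GTP": 1,
    }
    cost["total"] = sum(cost.values())
    return cost


print(translation_cost(300))
```

Under these assumptions the total works out to four high-energy bonds per residue, which gives a sense of why the process is described as very expensive.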
In sections 13.3, 13.5, and 13.7, we described the mechanistic details of DNA replication, transcription, and translation. We must now acknowledge that these processes don’t take place in isolation but occur coordinately within the cell. Replication and transcription both denature the double-stranded DNA in a short region to use it as a template. Transcription and translation are physically linked on mRNA. In this section, we examine the physical and regulatory strategies that enable these processes to co-exist within the bacterial cell.
Chromosomes have 2–4 replisomes, the DNA-protein complexes at the replication fork. In contrast, transcribing RNA polymerases are significantly more abundant on the chromosome, numbering in the thousands. A replisome, then, must be able to navigate an obstacle course of RNA polymerases in competition for the same template.
Replisomes move in a specified direction from the origin. In contrast, genes can have either DNA strand act as the template for transcription—that is, they can be oriented in the same or the opposite direction of replisome movement (figure 13.38). Gene orientation is not random, however. The sequences of thousands of genomes illustrate that most genes, and especially the genes that are highly transcribed, are read in the same direction as the replisome moves.
The outcome of a conflict between the replisome and RNA polymerase is dictated by gene orientation, with the most severe effects from head-on collisions (figure 13.38a). These collisions stall the replisome, with three consequences. First, the replisome may dissociate from the DNA template, which then requires accessory proteins to reassemble at sites other than oriC. Second, replication fidelity is compromised, leading to errors. Third, increased supercoiling in front of moving polymerases (both DNA and RNA) builds to an extreme level as the enzymes approach one another (figure 13.38a). See section 16.1 (Spontaneous mutations).
In contrast, co-directional conflicts, or rear-end collisions (figure 13.38b), are more easily resolved. In addition to the helicase that propels the replisome, alternative helicases in the cell promote replication fork movement through barriers like RNA polymerases. It is believed that the replisome may use the RNA transcript to re-prime leading strand synthesis. In addition, the protein Mfd is considered a transcriptional terminator because it travels along DNA to evict paused or stalled RNA polymerases.
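Whether a given gene risks head-on or co-directional conflicts follows from its position and orientation. The Python sketch below rests on simplifying assumptions that are not from the text: a circular chromosome whose coordinates start at oriC, a terminus exactly opposite the origin, and two replisomes that leave the origin in opposite directions. The function name and the example coordinates are hypothetical.

```python
def conflict_type(position, strand, genome_len):
    """Classify a gene as 'co-directional' or 'head-on'.

    position: gene coordinate, with oriC assumed to be at 0.
    strand: '+' if transcribed toward increasing coordinates, else '-'.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the first half of the circle the fork moves toward increasing
    # coordinates; on the second half it moves toward decreasing ones.
    on_first_half = (position % genome_len) < genome_len / 2
    fork_direction = "+" if on_first_half else "-"
    return "co-directional" if strand == fork_direction else "head-on"


GENOME = 4_600_000  # hypothetical chromosome length in base pairs
print(conflict_type(1_000_000, "+", GENOME))
print(conflict_type(4_000_000, "+", GENOME))
```

Applied across an annotated genome, a tally like this would be one way to see the bias toward co-directional orientation described above.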
To achieve rapid rates of protein synthesis, mRNAs are often complexed with several ribosomes at once, each ribosome reading the mRNA message and synthesizing a polypeptide. At maximal translation rates, there may be a ribosome every 80 nucleotides or as many as 20 ribosomes simultaneously reading an mRNA that codes for a 50,000 dalton polypeptide. A complex of mRNA with several ribosomes is called a polyribosome or polysome (figure 13.39). Polysomes are present in all organisms. Bacteria can further increase the efficiency of gene expression by coupling transcription and translation (figure 13.39b). While RNA polymerase is synthesizing an mRNA, ribosomes can already be attached to the mRNA so that transcription and translation occur simultaneously. Coupled transcription and translation is possible in bacterial cells because a nuclear envelope does not separate the translation machinery from DNA, as it does in eukaryotes.
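The ribosome count quoted for a 50,000 dalton polypeptide can be checked with a quick calculation. This Python sketch assumes an average mass of about 110 daltons per amino acid residue, a common rule of thumb that is not given in the text, and it counts only the coding region of the mRNA.

```python
AVG_RESIDUE_DA = 110  # assumed average mass of one amino acid residue


def max_ribosomes(protein_da, spacing_nt=80):
    """Return (residues, coding_nucleotides, ribosomes at full packing)."""
    residues = round(protein_da / AVG_RESIDUE_DA)
    coding_nt = 3 * residues             # three nucleotides per codon
    return residues, coding_nt, coding_nt // spacing_nt


print(max_ribosomes(50_000))
```

Under these assumptions the coding region holds roughly 17 ribosomes, which is consistent with the upper figure of about 20 once the leader and trailer regions and variation in residue mass are taken into account.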
DNA replication and transcription use the same substrate: the chromosome located in the nucleoid. Translation, however, occurs in the cytoplasm. It is believed that a transcribing RNA polymerase moves toward the nucleoid periphery where its transcript interacts with ribosomal subunits. Recent work has demonstrated that the transcription elongation factor NusG bridges RNA polymerase and the 30S ribosomal subunit. It appears to keep the mRNA single stranded for a smooth transition to the ribosome. This assembly of enzymes and the ribosome has been termed the expressome.
For a polypeptide to assume its cellular function, more is required than simply having the correct sequence of amino acids linked together on the ribosome. Some amino acids must be modified, and proteins must be properly folded and in some cases associated with other protein subunits to generate a functional enzyme (e.g., DNA and RNA polymerases are multimeric proteins). In addition, proteins must be delivered to the proper subcellular or extracellular site. Because these three events—maturation, folding, and secretion—are often linked in protein complexes, we discuss them together.
The information for protein maturation is encoded in the protein’s primary and secondary structures. Decoding that information occurs as the protein emerges from the ribosome. Because protein synthesis is initiated with N-formylmethionine, the N-terminus of a polypeptide matures with the removal of the N-formyl group. Some proteins require the additional removal of the methionine, while others remove several additional N-terminal amino acids. This is specified by the identity of the second amino acid in the polypeptide. The enzymes that catalyze these reactions associate with the ribosome near the polypeptide exit tunnel in the large subunit. These reactions occur cotranslationally, i.e., during protein synthesis.
Polypeptides begin to assume their secondary structures, α-helices, β-strands, and coils, as they pass through the ribosome exit tunnel. Assembly of a functional tertiary structure requires the correct interaction of the secondary structural elements. The linear format of protein synthesis can sometimes result in improper folding or aggregation as polypeptides enter the crowded cytoplasm. Cells use proteins called chaperones to suppress incorrect folding and in some cases to correct any misfolding that may have occurred. Chaperones are so critical to protein structure that they are present in all three domains of life.
A chaperone called trigger factor (TF) associates with the ribosome and successfully folds many cytoplasmic proteins. TF binds to the growing polypeptide chain and is thought to mask hydrophobic regions so they don’t interact with each other prematurely or with other proteins. Recall that hydrophobic regions are generally in the protein interior, so the presence of a hydrophobic surface region indicates that the polypeptide may need help to assume its functional shape. TF also acts as an isomerase on the peptide bond that precedes proline residues in the polypeptide. The proline cis/trans isomerization is a critical step in protein folding, and several different chaperones have this enzymatic activity.
About 25% of proteins need additional help folding, either cotranslationally or posttranslationally. For these proteins, other chaperones like DnaJ/DnaK/GrpE or GroES/GroEL complete the folding process. These chaperones are cytoplasmic and require ATP. Among its many roles, DnaJ/DnaK/GrpE assists proteins with iron-sulfur centers to insert and chemically coordinate the iron atom. GroES/GroEL forms a cage in which misfolded proteins are uncoiled and refolded, away from cytoplasmic crowding. Large proteins with complex topologies are most likely to require GroES/GroEL for proper maturation.
It has been estimated that more than one-third of the proteins synthesized leave the cytoplasm. They can have several destinations including the plasma membrane, the external milieu, or in the case of Gram negatives, the periplasm or the outer membrane. Therefore it is not surprising that multiple systems for moving proteins have evolved. Some of these are found in all domains of life, while others are unique to bacterial cells, and still others are restricted to only Gram-negative or Gram-positive bacteria. When proteins are moved from the cytoplasm to or across the plasma membrane, the movement is called translocation. Protein secretion refers to the movement of proteins from the cytoplasm to the external environment. All protein translocation and secretion systems described here require the expenditure of energy in the form of ATP or the proton motive force. In addition, the protein complexes responsible for movement across the plasma membrane are localized to functional membrane microdomains. See section 3.3 (Plasma membrane structure is dynamic).
The variations in cell envelope structure pose different challenges for protein secretion. In Gram-positive bacteria, the proteins must be translocated across the plasma membrane. Once across the plasma membrane, the protein either passes through the relatively porous peptidoglycan into the external environment or becomes attached to the peptidoglycan. Gram-negative bacteria also transport proteins across the plasma membrane, but then must move them across the outer membrane.
Three translocation systems—the Sec system, the Tat system, and YidC—are observed in both Gram-negative and Gram-positive bacteria. The Sec system, sometimes called the general secretion pathway, is highly conserved, having been identified in all three domains of life (figure 13.40). It translocates unfolded proteins across the plasma membrane. YidC is dedicated to folding and translocating plasma membrane proteins, and often coordinates with the Sec system.
Before the Sec system translocates proteins, they must be sorted and targeted to their ultimate destination. The N-terminal polypeptide region, the first part to exit the ribosome, is termed the signal peptide or signal sequence. This region of the protein signals factors associated with the ribosome to direct it to one of several routes. Signal recognition particle (SRP) is a protein-RNA complex that surveys actively translating ribosomes at the same time as trigger factor (TF). Whereas TF folds and releases cytoplasmic proteins, SRP identifies membrane proteins with extremely hydrophobic signal peptides. Membrane proteins in particular have a transmembrane α-helix that SRP recognizes as it probes the ribosome exit tunnel. SRP escorts the entire ribosome to the membrane.
An alternative route to the membrane is via the SecA protein. SecA recognizes signal peptides that are less hydrophobic than those preferred by SRP. SecA transport to the membrane may occur cotranslationally or posttranslationally. SecA does not function alone. Three proteins (SecY, SecE, and SecG) form a channel in the membrane. SecA threads the signal peptide and the preprotein passes through the channel (figure 13.40). SecA acts as a motor, using the energy from ATP hydrolysis, and two other proteins (SecDF) use the proton motive force to translocate the preprotein. The SecYEG channel is unique in that it opens in two ways, based on the delivery vehicle and the protein destination. SRP-ribosome complexes align the ribosome exit tunnel with the SecYEG channel, and SecYEG opens laterally for membrane proteins to be inserted.
The Tat system (figure 13.41) is also widespread in both Gram-positive and Gram-negative bacteria, although it translocates only a few dozen proteins. Tat is an acronym for twin-arginine translocase, referring to two consecutive arginines, a distinctive feature in the signal peptide of proteins translocated by this system. Tat-secreted proteins must completely fold in the cytoplasm; thus unlike the Sec system, the Tat system secretes only folded proteins. Some proteins have dedicated chaperones to insert cofactors and assist in folding. A protein docking complex recognizes the signal peptide and escorts the protein to the pore complex in the membrane. Secretion by Tat relies on the proton motive force for energy.
Following polypeptide translocation, two additional maturation steps occur. Regardless of the secretion system used, the signal peptide must be removed from the preprotein. Signal peptidase is a membrane protein that recognizes the signal peptide and hydrolyzes the peptide bond just after it. Finally, for proteins that require disulfide bonds, the enzyme protein disulfide isomerase (PDI) joins cysteine residues (with -SH groups) to form disulfides (-S-S-). Because the cytoplasm is a reducing environment, it does not support this oxidation reaction. In Gram-negative bacteria, it occurs in the periplasm, catalyzed by enzymes abbreviated Dsb (disulfide bond). DsbA and DsbB act on proteins as they are released from SecYEG, and DsbA oxidizes cysteines in order from the N-terminus. Not all proteins fold in this pattern, and DsbC acts as both a chaperone and PDI to correct any misformed disulfides. Gram-positive bacteria, in general, do not rely on disulfide bond formation. Although some secreted proteins contain disulfides, their maturation pathways remain to be determined.
In Gram-positive bacteria, a few proteins are anchored to the cell surface by covalent attachment to peptidoglycan. These proteins have a second signal peptide, termed a sorting signal, at the C-terminus that directs them to sortase enzymes embedded in the membrane. Sortase catalyzes a reaction between the sorting signal and Lipid II, the peptidoglycan precursor, and these surface protein–Lipid II molecules are inserted into the growing cell envelope. See section 12.4 (Synthesis of peptidoglycan occurs in the cytoplasm, at the plasma membrane, and in the periplasmic space).
Proteins produced in Gram-negative bacteria destined for secretion or surface display must pass through the plasma membrane, the periplasmic space, and the outer membrane. Some proteins cross in a two-step process. The first step is translocation to the periplasm by the Sec or Tat system, as described, and the second step is secretion across the outer membrane. In contrast, one-step processes involve structures composed of multiple polypeptides that completely span the periplasm (figure 13.41). Secretion systems are numbered (Type I secretion system, Type II secretion system, etc.) in order of discovery. Three of these secretion systems, Types II, V, and IX, comprise the second step in two-step processes; and the remainder constitute one-step systems. Although Type VII secretion systems are found in Gram-positive bacteria, we discuss them here because genera possessing Type VII secretion systems have a diderm cell wall structure. Several secretion systems are best known for their roles in pathogenicity, where the proteins that they secrete, termed effectors, contribute to host injury.
Type V secretion systems (T5SS) are mechanistically simple in that the protein to be exported passes from the periplasm through a barrel structure in the outer membrane (figure 13.41). The barrel may be either a part of the protein being translocated or a separate polypeptide that is coexpressed and cotranslocated to the periplasm. Chaperones accompany the barrel or barrel domain through the aqueous periplasm to the membrane for insertion and folding. T5SS secrete proteases involved in virulence and adhesins that mediate the attachment of bacterial cells to substrates.
Type II secretion systems (T2SS) are found only in Proteobacteria. A cell may have multiple T2SSs, as each system is dedicated to one or a few proteins. The T2SS is anchored in both membranes by complexes composed of many proteins, and the two membrane complexes are connected by a pseudopilus (figure 13.41). Proteins for secretion enter from the periplasm, and the pseudopilus acts like a piston to push the proteins through the outer membrane. Typical cargo for a T2SS includes degradative enzymes such as proteases, lipases, and cellulases. Synthesis of the T2SS is often initiated by quorum sensing. Cell-cell communication within microbial populations (section 7.6); Proteobacteria (chapter 21)
Type IX secretion systems (T9SS) also have a narrow phylogenetic distribution. They are limited to the Bacteroidota, where they are responsible for secreting the gliding motility proteins in Flavobacterium and the virulence proteases in the dental pathogen Porphyromonas. These proteins are translocated by the Sec system and fold in the periplasm. A C-terminal signal peptide tags the proteins for outer membrane localization or surface display. At the outer membrane, a sortase removes the signal peptide and attaches the protein to the lipopolysaccharide.
Type I secretion systems (T1SS) are the simplest of the one-step secretion systems, and a bacterial cell may contain many of these systems for the export of different proteins (figure 13.41). Proteins exported by T1SS include S-layer subunits, degradative enzymes, and proteins important in pathogenesis. Each T1SS has three components: an ABC (ATP binding cassette) transporter in the plasma membrane (see figure 3.13), a membrane fusion protein that bridges the periplasm, and a barrel structure that spans the outer membrane. The ABC component is specific for a single substrate, whereas the outer membrane barrel is common to multiple T1SS. The three components assemble only when the substrate binds to the ABC transporter.
Type III secretion systems (T3SS), also called injectosomes, are molecular syringes (see figure 38.12) used by pathogens to inject proteins into eukaryotic host cells. The injected proteins are called effector proteins and promote pathogenesis by altering the cell cytoskeleton, signal transduction pathways, or other cellular processes. T3SS have a basal body embedded in the plasma membrane, and an extracellular needle. The basal body includes a channel that directs effector proteins through the periplasm to the hollow needle for delivery to a host cell. The T3SS structure uses energy from both the proton motive force and ATP. T3SS are structurally complex, and the assembly of multiple polypeptides must be carefully regulated. Proteins secreted by a T3SS include first the distal portions of the T3SS needle, and later the effector proteins targeted to the eukaryotic host. Assembly is initiated in response to both bacterial and host factors, and once assembled, the T3SS remains poised to act until an appropriate target is contacted. The nature of the signal to secrete effectors into the host cell varies among species. In some cases, chemical signals such as a change in pH or calcium concentration initiate secretion, whereas in others, the signal appears to be a consequence of mechanical sensing. Plague (section 38.2)
Type IV secretion systems (T4SS) are most commonly used for DNA transfer, although some are dedicated to the secretion of DNA-protein complexes or proteins. They evolved from the conjugative machinery used to transfer DNA (see figure 16.20). Assembly of the T4SS is believed to occur at the cell poles and includes integral membrane proteins in both inner and outer membranes, ATPases in the cytoplasm to provide the energy for secretion, and a pilus. Pili are produced constitutively, but effector secretion requires a host cell for delivery. Understanding how pilus biogenesis and secretion of effector molecules are coupled is an active area of investigation.
Type VI secretion systems (T6SS) are contractile weapons used by bacteria to deliver toxins to neighboring cells, both bacterial and eukaryotic. T6SS are composed of a baseplate in the plasma membrane and a membrane-spanning complex in the periplasm. Two concentric cylindrical structures, an outer sheath and an inner tube, form in the cytoplasm and then dock onto the membrane complex (figure 13.41). The sheath and tube assembly may be up to 1 μm long. In response to a contact signal, the inner tube is ejected, puncturing the target cell and delivering multiple diverse effector proteins to the target cell’s cytoplasm. Related bacteria have immunity proteins and are not affected by the secreted toxins. Thus, a population of related cells can dominate a niche, as exemplified by Vibrio cholerae. In aquatic environments, chitin is a preferred nutrient, and V. cholerae competes with other bacteria and predatory amoebae by delivering toxins via the T6SS. Should humans ingest V. cholerae via contaminated water, the same T6SS is used to compete with the gut microbiome to establish the infection known as cholera. Bacteriophage T4: a virulent bacteriophage (section 26.2); A functional core microbiome is required for host homeostasis (section 33.3); Cholera (section 38.4)
Type VII secretion systems (T7SS) were initially described in Mycobacterium spp. Although these organisms are considered Gram positive, they have a unique cell envelope composed of a plasma membrane and an outer mycomembrane that features mycolic acids (see figure 22.6). Protein secretion across both membranes necessitates a secretion system, as in Gram-negative bacteria. M. tuberculosis uses a T7SS to deliver important virulence factors to host immune cells. Mycobacterium tuberculosis