CHAPTER 7 DNA: Genetic Material, Structure, and Replication

7.1 DNA Is The Genetic Material

Historical Context: What was known before Watson and Crick

Genes (Mendel's 'hereditary factors'): Were known to be associated with specific traits, but their physical nature remained unknown.
Mutations: Were understood to alter gene function, but their precise chemical nature was a mystery.
One-gene-one-enzyme hypothesis (Chapter 5): Postulated that genes are responsible for determining the structure of proteins.
Location of Genes: Genes were firmly established to be carried on chromosomes.
Chromosome Composition: Chromosomes were known to be composed of both DNA and protein.
Pioneering Experiments: Studies initiated in the 1920s provided the initial evidence identifying DNA as the genetic material.

The Discovery of Bacterial Transformation: The Griffith Experiment (1928)

Observation: Frederick Griffith observed that the genotype and phenotype of a live bacterial strain could be changed, or "transformed," by mixing it with a different, heat-killed bacterial strain.
Bacterium Used: Streptococcus pneumoniae, which causes pneumonia in humans and is lethal in mice.
Strains Employed: Griffith used two distinguishable strains:
- S strain: Normal, virulent type. Cells possess a polysaccharide capsule, giving colonies a smooth appearance. This strain is lethal to most laboratory animals (mice).
- R strain: Mutant, non-virulent type. Lacks the polysaccharide capsule, resulting in rough-appearing colonies. This strain grows in mice but is not lethal.
Experimental Design and Results (Summarized in Figure 7-2):
- (a) Live S strain injection: Mouse dies. Virulent S cells are deadly.
- (b) Live R strain injection: Mouse lives. Non-virulent R cells are harmless.
- (c) Heat-killed S strain injection: Mouse lives. Boiling virulent S cells kills them, and their carcasses alone do not cause death.
- (d) Mixture of heat-killed S strain and live R strain injection: Mouse dies. Significantly, live S cells were recovered from the dead mice.
Conclusion: The cell debris from the heat-killed S cells somehow converted some of the live R cells into live S cells. This indicated that the live R cells were "transformed" by picking up a chemical component from the dead S cells. This process is called transformation (also discussed in Chapter 6).

Evidence that DNA is the Genetic Material in Bacteria: The Avery, MacLeod, and McCarty Experiments (1944)

Objective: To identify the specific chemical component from dead S cells responsible for the transforming ability observed by Griffith, as this molecule was a strong candidate for hereditary material.
Approach: Oswald Avery, Colin MacLeod, and Maclyn McCarty systematically destroyed major categories of chemicals (polysaccharides, lipids, RNAs, proteins, and DNA) in an extract from dead S cells, then tested the extract's ability to transform live R cells.
Results:
- Destroying polysaccharides, lipids, RNAs, or proteins in the S cell extract did not abolish its transforming ability.
- However, when the S cell extract was treated with the enzyme deoxyribonuclease (DNase), which specifically degrades DNA, the mixture lost its ability to transform R cells into S cells.
Conclusion: These results provided strong evidence that DNA is the genetic material.
Mechanism (now known): Fragments of the transforming DNA (carrying genes for virulence) enter the recipient bacterial chromosome and replace their non-virulent counterparts.
Key Concept: This scientific demonstration that DNA is the transforming agent was the first direct evidence that genes, the hereditary material, are composed of DNA.

Evidence that DNA is the Genetic Material in Phage: The Hershey-Chase Experiment (1952)

Context: Despite Avery's definitive findings, many scientists remained skeptical, questioning how a seemingly "low-complexity molecule" like DNA could encode the vast diversity of life, favoring proteins as a more complex candidate.
Organism: Bacteriophage T2 (or phage T2), a virus that specifically infects bacteria.
Rationale: Hershey and Chase reasoned that the infecting phage must inject the specific genetic information into the bacterium that directs the production of new viral particles. Identifying this injected material would determine the genetic material of phages.
Phage Composition: T2 phage is primarily composed of protein (forming the structural parts) and DNA (contained within the protein sheath of its "head").
Experimental Method: Radioisotopic Labeling:
- They used radioisotopes to uniquely label phage DNA and protein.
- DNA Labeling: Since phosphorus is present in DNA but absent in amino acids (protein building blocks), they incorporated ${}^{32}\text{P}$ into the DNA of one phage culture.
- Protein Labeling: Since sulfur is found in proteins but not in the nucleotide building blocks of DNA, they incorporated ${}^{35}\text{S}$ into the proteins of a separate phage culture.
- Radioisotopes are unstable and emit radiation detectable by instruments such as a scintillation counter or Geiger counter, or by autoradiography.
Infection and Separation:
- Two E. coli cultures were infected: one with ${}^{32}\text{P}$-labeled phages, the other with ${}^{35}\text{S}$-labeled phages.
- After allowing sufficient time for infection, the cultures were sheared in a kitchen blender to dislodge empty phage carcasses (called "ghosts") from the bacterial cells.
- Centrifugation was then used to separate the heavier bacterial cells (forming a solid pellet) from the lighter phage ghosts (remaining in the liquid supernatant).
Results:
- ${}^{32}\text{P}$-labeled phages: The radioactivity (DNA) was found primarily inside the bacterial cells (pellet). Furthermore, the progeny phages produced from this infection also contained ${}^{32}\text{P}$.
- ${}^{35}\text{S}$-labeled phages: The radioactive material (protein) was found predominantly in the phage ghosts (supernatant), outside the bacterial cells. The progeny phages produced were not labeled with ${}^{35}\text{S}$.
Conclusion: These data definitively demonstrated that DNA, not protein, is the hereditary material injected into bacterial cells by phages. Phage proteins merely serve as structural packaging that is discarded after delivering the viral DNA to the host cell.

7.2 DNA Structure

Requirements for Hereditary Material

Even before its structure was fully understood, genetic studies indicated that the hereditary material must possess three fundamental properties:

Accurate Replication: Given that nearly every cell in an organism's body contains the same genetic information, the genetic material must be capable of highly accurate replication during every cell division. Therefore, its structural features must facilitate precise copying.
Informational Content: The genetic material must encode the entire collection of proteins expressed by an organism. Its structural features must allow for the storage and expression of this vast amount of information.
Capacity for Change (Mutation): Hereditary changes, known as mutations, provide the essential raw material for evolutionary selection. Thus, the genetic material must be able to change on rare occasions. Concurrently, its structure must be stable enough to reliably carry its encoded information without excessive alteration.

DNA Structure Before Watson and Crick

Watson and Crick's elucidation of the double-helical structure of DNA was akin to solving a complex three-dimensional puzzle, achieved through "model building" by integrating existing experimental data.

The Building Blocks of DNA

Basic Components: Chemically, DNA is relatively simple, consisting of three primary components:
1. Phosphate group.
2. A sugar called deoxyribose.
3. Four nitrogenous bases: Adenine (A), Guanine (G), Cytosine (C), and Thymine (T).
Deoxyribose vs. Ribose: The sugar in DNA is "deoxyribose" because it is a variant of ribose (found in RNA) that is missing an oxygen atom. Specifically, deoxyribose has a hydrogen atom ( $H$ ) at the $2'$ -carbon position, unlike ribose which has a hydroxyl ( $OH$ ) group there (Figure 7-5).
Types of Bases:
- Purines: Adenine and Guanine, characterized by a double-ring structure.
- Pyrimidines: Cytosine and Thymine, which have a single-ring structure.
Nomenclature: Carbon and nitrogen atoms within the rings of the bases are assigned numbers. Carbon atoms in the sugar group are assigned prime-numbered positions ( $1', 2',$ etc.).
Nucleotides (Deoxynucleotides): The fundamental chemical subunits of DNA. Each deoxynucleotide is composed of a phosphate group, a deoxyribose sugar molecule, and one of the four bases.
- For convenience, nucleotides are referred to by the first letter of their base (A, G, C, T).
- Example: The nucleotide with an adenine base is deoxyadenosine $5'$ -monophosphate, abbreviated dAMP. The $5'$ refers to the carbon atom position on the sugar where the single phosphate group is attached.
Key Concept: DNA contains four bases: two purines (adenine and guanine) and two pyrimidines (cytosine and thymine). DNA nucleotides (deoxynucleotides) are composed of a phosphate, a deoxyribose sugar, and a purine or pyrimidine base.

Chargaff's Rules of Base Composition

Erwin Chargaff's work, conducted years before Watson and Crick, provided crucial empirical rules about the amounts of each nucleotide type in DNA, observed across a wide range of organisms (Table 7-1):

Purine-Pyrimidine Equivalence: The total amount of purine nucleotides ( $A + G$ ) always equals the total amount of pyrimidine nucleotides ( $T + C$ ).
Specific Base Pairing Ratios: The amount of Adenine (A) always approximately equals the amount of Thymine (T), and the amount of Guanine (G) always approximately equals the amount of Cytosine (C). This means the A/T and G/C ratios are close to $1.0$ , regardless of the DNA source.
Organismal Variation in (A+T)/(G+C) Ratio: The total amount of A + T is not necessarily equal to the total amount of G + C. The $(A + T)/(G + C)$ ratio varies significantly among different organisms. For instance:
- Sea urchins have a ratio of $1.85$ , indicating an AT-rich genome (nearly twice as much A+T as G+C).
- Mycobacterium tuberculosis has a ratio of $0.42$ , indicating a GC-rich genome (about twice as much G+C as A+T).
Tissue Invariance: The $(A + T)/(G + C)$ ratio is virtually identical in different tissues of the same organism (e.g., human thymus, liver, sperm), reinforcing the concept that all somatic cells within an organism share the same genomic DNA sequence.

Key Concept: DNA contains equal amounts of A and T nucleotides, and G and C nucleotides. While organisms vary in their overall A+T versus G+C content, different tissues within the same organism maintain consistent ratios.
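
Chargaff's ratios are simple arithmetic on measured base amounts, so they can be checked with a few lines of code. The following is a minimal sketch, not Chargaff's procedure: the function name `chargaff_ratios` is hypothetical, and the counts in `example` are invented numbers chosen to resemble an AT-rich genome, not data from Table 7-1.

```python
def chargaff_ratios(counts):
    """Compute Chargaff-style ratios from base amounts.

    `counts` maps each of "A", "T", "G", "C" to a positive amount
    (any consistent unit, e.g. moles or nucleotide counts).
    """
    a, t, g, c = (counts[base] for base in "ATGC")
    return {
        "A/T": a / t,                                 # expected close to 1.0
        "G/C": g / c,                                 # expected close to 1.0
        "purines/pyrimidines": (a + g) / (t + c),     # expected close to 1.0
        "(A+T)/(G+C)": (a + t) / (g + c),             # varies between organisms
    }


# Invented, illustrative amounts (assumption: not measured data).
example = {"A": 325, "T": 325, "G": 175, "C": 175}
ratios = chargaff_ratios(example)

for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

With these invented amounts, the first three ratios come out at 1.0 while the (A+T)/(G+C) ratio is well above 1, which is the pattern the rules describe for an AT-rich genome.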

X-ray Diffraction Analysis of DNA: Rosalind Franklin

Method: Rosalind Franklin (Figure 7-6b) utilized X-ray diffraction, where X-rays were aimed at fibers of purified DNA extracted from cells.
Observation: The scatter of X-rays from the fibers was detected as spots on photographic film (Figure 7-6a). The angle of scatter for each spot provides information about the position of atoms or groups of atoms within the DNA structure. Darker spots result from multiple X-ray hits from repeating DNA motifs, such as nucleotide bases.
Interpretation: While complex, the data strongly suggested that DNA is a long, skinny molecule composed of two similar parts that run parallel to each other along its length. Crucially, the X-ray patterns indicated that DNA possessed a helical (spiral staircase-like) structure.
Significance: Franklin's best X-ray photograph (Figure 7-6a) was shown to Watson and Crick and proved to be the "crucial piece of the puzzle" that enabled them to deduce the precise three-dimensional structure of DNA.

Watson and Crick's DNA Model: The Double Helix Structure (1953)

Watson and Crick's landmark paper, published in Nature in $1953$ , heralded a new era in biology, proposing a novel DNA structure with profound biological implications. Their model successfully satisfied the requirements for a hereditary molecule: information storage, replication fidelity, and mutability.

Model Building and Insights: By constructing physical models and synthesizing existing data, Watson and Crick made critical observations:
- Diameter Constraint: The observed diameter of the double helix (from X-ray data) was best explained if a purine base (double-ring) always paired with a pyrimidine base (single-ring). Pairing two purines would make the DNA too wide, while two pyrimidines would make it too skinny (Figure 7-8).
- Chargaff's Rules Integration: This purine-pyrimidine pairing could account for the $(A + G) = (T + C)$ regularity. However, Chargaff's more specific observation of $A=T$ and $G=C$ guided their final deduction.
Complementary Base Pairing: Watson and Crick concluded that each base pair consists of one purine and one pyrimidine, paired according to strict complementarity:
- Guanine (G) always pairs with Cytosine (C) (G-C).
- Adenine (A) always pairs with Thymine (T) (A-T).
Validation: This double helix model elegantly accounted for both Franklin's X-ray diffraction data and Chargaff's base composition rules.
Key Concept: The two strands of DNA contain complementary base pairs—G pairs with C, and A pairs with T.

Detailed Structural Features of the Double Helix (Figure 7-9)

Strands and Helical Shape: The structure consists of two side-by-side nucleotide chains, or "strands," twisted into a double helix.
Turns per Helix: Each complete turn of the helix contains $10$ base pairs.
Handedness: DNA is a right-handed helix, meaning it twists clockwise, analogous to a standard screw.
Stabilization: The two strands are held together by hydrogen bonds formed between the complementary purine and pyrimidine bases.
Sugar-Phosphate Backbone: The outer structure of each strand (the side rails of the "spiral staircase," with the base pairs as the steps) is formed by alternating phosphate and deoxyribose sugar units, connected by phosphodiester linkages.
- Phosphodiester Linkage: Connects the $5'$ -carbon atom of one deoxyribose sugar to the $3'$ -carbon atom of the adjacent deoxyribose sugar.
Polarity (Directionality): Each sugar-phosphate backbone therefore has a distinct $5'$ -to- $3'$ polarity or direction.
Antiparallel Orientation: The two strands of DNA run in opposite directions; one is oriented $5'\text{-to-}3'$ and the other is oriented $3'\text{-to-}5'$. This antiparallel arrangement is crucial for DNA function.
Base Attachment: Each base is attached to the $1'$ -carbon atom of its deoxyribose sugar and points inward, interacting with a base on the opposing strand.
Hydrogen Bonds and Stability:
- G-C base pairs form three hydrogen bonds.
- A-T base pairs form two hydrogen bonds.
- Prediction and Confirmation: DNA with a higher G-C content is more stable and requires higher temperatures to "melt" (separate the strands, a process called DNA denaturation).
Key Concept: A-T base pairs have two hydrogen bonds, while G-C base pairs have three.
Base Stacking: The flat planar base pairs stack on top of one another in the center of the double helix (Figure 7-10a). This stacking interaction contributes significantly to DNA stability by excluding water molecules from the spaces between base pairs. A single strand of nucleotides on its own does not form a helix; the helical shape depends entirely on the pairing and stacking of bases within the antiparallel strands.
Grooves: The geometry of the base pairs creates two distinct helical grooves along the DNA molecule:
- Major grooves: Shallow and wide regions where the sugar-phosphate backbones are farther apart.
- Minor grooves: Narrow and deep regions where the sugar-phosphate backbones are closer together.
- These grooves are vital features recognized by proteins that bind to DNA.
Key Concept: The geometry of base pairs creates shallow, wide major grooves and narrow, deep minor grooves along the DNA helix, which are essential features for protein binding.
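
Complementary pairing, antiparallel orientation, and the two-versus-three hydrogen-bond rule described above can be combined in a short sketch. It assumes sequences are written 5'-to-3' using only the letters A, T, G, and C; the names `partner_strand` and `hydrogen_bonds` are hypothetical helpers, not functions from any library.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS_PER_PAIR = {"A": 2, "T": 2, "G": 3, "C": 3}


def partner_strand(strand_5_to_3):
    """Return the complementary strand, also written 5'-to-3'.

    Each base is complemented, and the order is reversed because the
    two strands of the double helix are antiparallel.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3.upper()))


def hydrogen_bonds(strand_5_to_3):
    """Count the hydrogen bonds in the duplex formed by this strand and its partner."""
    return sum(H_BONDS_PER_PAIR[base] for base in strand_5_to_3.upper())


top = "ATGC"
print(partner_strand(top))   # complementary strand, read 5'-to-3'
print(hydrogen_bonds(top))   # 2 + 2 + 3 + 3
```

Comparing `hydrogen_bonds` for two sequences of equal length gives a rough feel for why GC-rich DNA needs a higher temperature to denature, although real melting behavior also depends on base stacking.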

The Double Helix Model Fulfills Hereditary Material Requirements

The structure of DNA is considered a pivotal $20^{\text{th}}$-century biological discovery because it not only aligned with existing data but also provided a clear framework for how DNA could function as hereditary material:

Determines Protein Structure (Information Storage): The double-helical structure suggested that the sequence of nucleotides in DNA directly dictates the sequence of amino acids in a protein. This implied the existence of a "genetic code" for translating nucleotide sequences into protein sequences (further discussed in Chapter 9).
Mechanism for Replication: Watson and Crick famously noted in their $1953$ paper, "It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material." This statement proposed the semiconservative replication mechanism.
Basis for Mutation: If the nucleotide sequence specifies amino acid sequence, then mutations could arise from the substitution of one nucleotide for another at specific positions (discussed in Chapter 16).

7.3 DNA Replication Is Semiconservative

Semiconservative Replication: Watson and Crick's Hypothesis

Mechanism: In this hypothesized model, the DNA double helix unwinds, and each original DNA strand serves as a template. New complementary bases are assembled onto each template strand following the A-T and G-C base-pairing rules.
Outcome: This process creates two new double helices, both identical to the original. Each new double helix is composed of one original ("parental") strand and one newly synthesized ("daughter") strand.
Term: This mode is termed semiconservative because each new helix conserves half of the original (parental) molecule.

Alternative Hypotheses for DNA Replication

Before the Meselson-Stahl experiment, two other modes of DNA replication were also considered:

Conservative Replication: The parent DNA double helix would be entirely conserved, and a completely new daughter double helix would be produced, consisting solely of two newly synthesized strands (Figure 7-11b).
Dispersive Replication: Two new DNA double helices would be produced, but each strand within these helices would be a mosaic, containing segments of both parental DNA and newly synthesized daughter DNA (Figure 7-11c).

Evidence that DNA Replication is Semiconservative: The Meselson-Stahl Experiment (1958)

Objective: Matthew Meselson and Franklin Stahl designed an experiment to definitively determine which of the three proposed DNA replication models (semiconservative, conservative, or dispersive) was correct.
Key Idea: They would allow parental DNA, labeled with a distinct density, to replicate using nucleotides of a different density. The density profiles of the newly replicated DNA after one and two rounds of replication would distinguish the models.
Methodology:
1. Labeling Parental DNA: E. coli cells were grown for many generations in a liquid medium containing the heavy isotope of nitrogen, ${}^{15}\text{N}$, instead of the common light isotope, ${}^{14}\text{N}$. Nitrogen is a component of all nitrogenous bases, so the cells incorporated ${}^{15}\text{N}$ into their newly synthesized DNA strands. After many divisions, the DNA was almost entirely labeled with ${}^{15}\text{N}$ (heavy).
2. Shift to Light Medium: The ${}^{15}\text{N}$-labeled cells were transferred to a medium containing only ${}^{14}\text{N}$ and allowed to undergo DNA replication and cell division for one and two generations.
3. DNA Isolation and Analysis: After each generation, DNA was isolated from the cells.
4. Density Separation: DNA samples of different densities were separated using cesium chloride (CsCl) gradient centrifugation. When CsCl is spun at high speeds ( $50{,}000\text{ rpm}$ ) for many hours, a density gradient of $\text{Cs}^{+}$ and $\text{Cl}^{-}$ ions forms. DNA, when centrifuged with CsCl, migrates and forms a band at the position corresponding to its own density.
Results and Interpretation (Figure 7-11):
- Initial DNA ( ${}^{15}\text{N}$-labeled): Showed a single band of high density (heavy DNA).
- After 1st Generation ( ${}^{14}\text{N}$ medium): The DNA showed a single band of intermediate density. This finding:
  - Disproved the conservative model (which predicted two distinct bands: one heavy parental, one light daughter).
  - Was consistent with both the semiconservative and dispersive models (both predicted a hybrid or mixed-density product).
- After 2nd Generation ( ${}^{14}\text{N}$ medium): Two distinct bands were observed: one band of intermediate density and one band of low density (light DNA).
  - This strongly supported the semiconservative model (which predicted one hybrid molecule and one entirely light molecule per original hybrid molecule from the first generation).
  - This disproved the dispersive model (which predicted that all DNA molecules would still be of intermediate density, as parental segments would be spread across all strands).
Conclusion: The Meselson-Stahl experiment conclusively demonstrated that DNA replication occurs through a semiconservative mechanism.
Key Concept: DNA is replicated semiconservatively by unwinding the two strands of the double helix and building a new complementary strand on each of the separated original strands.
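
The band patterns that distinguish the three models can be reproduced with a small bookkeeping simulation. The sketch below is an illustration under simplifying assumptions, not a reconstruction of Meselson and Stahl's analysis: each strand is represented only by the fraction of heavy nitrogen (15N) it carries, the dispersive model is assumed to spread parental material evenly over all strands, and the names `replicate`, `bands`, and `run` are hypothetical.

```python
from collections import Counter


def replicate(duplexes, model):
    """One round of replication in light (14N) medium.

    Each duplex is a pair (s1, s2); each value is the fraction of 15N
    in that strand (1.0 = fully heavy, 0.0 = fully light).
    """
    daughters = []
    for s1, s2 in duplexes:
        if model == "semiconservative":
            # Each old strand pairs with a brand-new light strand.
            daughters += [(s1, 0.0), (s2, 0.0)]
        elif model == "conservative":
            # The parental duplex stays intact; the new duplex is all light.
            daughters += [(s1, s2), (0.0, 0.0)]
        elif model == "dispersive":
            # Assumption: parental material is divided evenly among 4 strands.
            share = (s1 + s2) / 4
            daughters += [(share, share), (share, share)]
        else:
            raise ValueError(f"unknown model: {model}")
    return daughters


def bands(duplexes):
    """Group duplexes by their overall heavy fraction (a stand-in for band position)."""
    return dict(Counter((s1 + s2) / 2 for s1, s2 in duplexes))


def run(model, generations):
    """Start from one fully heavy duplex and replicate for the given generations."""
    population = [(1.0, 1.0)]
    for _ in range(generations):
        population = replicate(population, model)
    return bands(population)


for model in ("semiconservative", "conservative", "dispersive"):
    print(model, run(model, 1), run(model, 2))
```

In the printed dictionaries, the key is the heavy fraction of a duplex (1.0 heavy, 0.5 hybrid, 0.0 light) and the value is how many duplexes fall in that class. Only the semiconservative model gives a single hybrid class after one generation and equal hybrid and light classes after two, matching the observed bands.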

Evidence for a Replication Fork: The Cairns Experiment (1963)

Objective: John Cairns sought to determine the initiation sites and directionality of DNA replication on the bacterial chromosome (e.g., if replication started at one or many sites, and if these sites were random or defined).
Methodology: Autoradiography:
1. Radioactive Labeling: Cairns allowed replicating DNA in bacterial cells to incorporate tritiated thymidine ( $[{}^{3}\text{H}]\text{thymidine}$ ), which is the nucleoside thymidine labeled with tritium ( ${}^{3}\text{H}$ ), a radioactive hydrogen isotope. In the cell, thymidine is phosphorylated to a nucleotide and incorporated into newly synthesized DNA.
2. DNA Isolation and Emulsion Exposure: After varying numbers of replication cycles in the presence of ${}^{3}\text{H}$-thymidine, Cairns carefully isolated the DNA and covered it with a photographic emulsion for several weeks. Autoradiography detects the location of ${}^{3}\text{H}$ in the DNA; as ${}^{3}\text{H}$ decays, it emits a beta particle (energetic electron) that creates a black spot on the film where it strikes the emulsion. Each incorporated ${}^{3}\text{H}$-thymidine thus appears as a black spot.
Results and Interpretation:
- After one replication cycle in ${}^{3}\text{H}$-thymidine: The autoradiograph showed a prominent ring of black spots (Figure 7-12a). This was interpreted as a newly formed, radioactive strand within a circular daughter DNA molecule. This observation supported the semiconservative replication model and confirmed that the bacterial chromosome is circular.
- During a second replication cycle: Chromosomes captured in the midst of replication displayed a structure resembling the Greek letter theta ( $\theta$ ) (Figure 7-12b). This structure consisted of a thin circle of dots (representing a single radioactive strand) and a thicker curve of dots cutting through the interior (representing two radioactive strands).
- Replication Forks: The ends of the thick curve of dots clearly defined two sites where DNA replication was actively occurring. These sites were termed replication forks.
Conclusion: The presence of various sizes of theta patterns suggested that bacterial DNA replication initiates at a single, specific site and that these two replication forks progressively move around the circular chromosome. Subsequent experiments further confirmed that replication initiates at a single, specific DNA sequence and proceeds bidirectionally (in opposite directions) from this origin, with both DNA strands being simultaneously replicated. This process is often referred to as theta replication.

7.4 DNA Replication in Bacteria

Overview of DNA Replication

DNA replication in bacteria is a highly coordinated process carried out by a multi-protein molecular machine called the replisome. Eukaryotic DNA replication follows similar steps and utilizes analogous enzymes (Table 7-2).

Unwinding the DNA Double Helix

Challenge of Unwinding: A major initial objection to the double helix model was the logistical challenge of unwinding the long, twisted DNA molecule and breaking its numerous hydrogen bonds quickly, without causing severe overwinding and tangling ahead of the replication point.
Molecular Solutions: The replisome contains specialized proteins to manage this challenge:
- Helicases: These enzymes are responsible for disrupting the hydrogen bonds that hold the two DNA strands together, effectively "unzipping" the double helix.
  - In E. coli, the DNA replication helicase (DnaB) is a ring-shaped homohexamer (a complex of six identical DnaB proteins). It encircles one of the single DNA strands at the replication fork.
  - Helicases utilize energy derived from ATP hydrolysis to rapidly unwind the double helix ahead of the progressing DNA synthesis (Figure 7-13b, step 1).
- Single-strand DNA-binding (SSB) proteins: After unwinding, these proteins bind specifically to the newly separated single-stranded DNA. Their role is to stabilize these strands and prevent them from reannealing (re-forming the double helix) prematurely.
- Topoisomerases (e.g., DNA gyrase): The unwinding action of helicases generates positive supercoils and extra twisting ahead of the replication forks (Figure 7-13b, step 1). Topoisomerases are enzymes that relieve this torsional strain by relaxing the supercoiled DNA.
  - They achieve this by temporarily breaking either a single DNA strand or both strands.
  - This break allows the DNA to rotate, releasing the strain.
  - Topoisomerases then religate (rejoin) the DNA strands, completing the relaxation process (Figure 7-13b, steps 2, 3, 4).
Key Concept: Helicases, topoisomerases, and single-strand-binding proteins work in concert to generate and maintain the single-stranded DNA templates necessary for DNA replication.

Assembling the Replisome: Replication Initiation

Replisome assembly is a highly regulated process that begins at a precise chromosomal location known as the origin of replication (or simply origin).

E. coli Origin (oriC): E. coli replication starts from a single origin, oriC, and then proceeds bidirectionally (with two replication forks moving in opposite directions) until the forks meet and replication is complete (as seen in Figure 7-12b).
- oriC is $245$ base pairs long.
- It contains five copies of $9$ -base-pair sequences called DnaA boxes, and an adjacent AT-rich DNA unwinding element.
Steps of Initiation (Figure 7-14):
1. DnaA Binding: The protein DnaA initially binds to the DnaA boxes within oriC. This binding facilitates the oligomerization (binding of additional DnaA copies) at the origin.
2. Helix Unwinding: Subsequent binding of DnaA proteins to the adjacent AT-rich region promotes the unwinding of the double helix at this site, forming a single-stranded DNA bubble.
  - Rationale for AT-rich region: A-T base pairs are held together by only two hydrogen bonds, making them easier to separate (melt) compared to G-C base pairs which have three hydrogen bonds.
3. Further Unwinding and Helicase Loading: More DnaA proteins bind to the newly unwound single-stranded regions. Then, two DnaB helicases are loaded and begin to slide in a $5'\text{-to-}3'$ direction, actively unzipping the helix at the growing replication forks.
4. DnaA Displacement: While DnaA is essential for initiating replisome assembly and bringing the machinery to the correct origin, it is not part of the core replication machinery and is displaced from the DNA as replication progresses.
Key Concept: The precise location and timing of DNA replication are strictly controlled by the ordered assembly of the replisome at a specific site called the origin.
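
The rationale for an AT-rich unwinding element can be illustrated by scanning a sequence for its most AT-rich window. This is only a toy sketch: the sequence below is invented and is not the oriC sequence, the function names `at_fraction` and `most_at_rich_window` are hypothetical, and real origin recognition depends on DnaA binding rather than base composition alone.

```python
def at_fraction(seq):
    """Fraction of positions in `seq` that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def most_at_rich_window(seq, window):
    """Return (start_index, at_fraction) for the first window with the highest A+T fraction."""
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    starts = range(len(seq) - window + 1)
    # max() returns the first start position among ties.
    best = max(starts, key=lambda i: at_fraction(seq[i:i + window]))
    return best, at_fraction(seq[best:best + window])


# Invented toy sequence (assumption: for illustration only).
toy = "GCGCGCATATATATGCGC"
print(most_at_rich_window(toy, 6))
```

A window made only of A-T pairs contains two hydrogen bonds per pair, so it is the stretch that would be expected to melt most easily under the reasoning given above.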

DNA Polymerases Catalyze DNA Chain Elongation

Arthur Kornberg's work in $1959$ led to the isolation of the first DNA polymerase (DNA pol I) from E. coli, confirming enzymatic DNA synthesis.

General Polymerase Function: DNA polymerases extend a nucleotide chain by adding deoxyribonucleotides to the $3'$ end of a growing strand.
- They use an exposed single strand of DNA as a template.
Substrates: The substrates are the triphosphate forms of deoxyribonucleotides: dATP, dGTP, dCTP, and dTTP (collectively referred to as dNTPs).
Energy for Synthesis: The addition of each base to the growing polymer is coupled with the removal of two phosphates as pyrophosphate ( $PP_i$ ). The subsequent hydrolysis of pyrophosphate into two inorganic phosphate molecules ( $PP_i \rightarrow 2P_i$ ) releases energy that helps drive the DNA polymerization process.
DNA Polymerase I (DNA Pol I):
- Three Catalytic Activities:
  1. Polymerase activity: Synthesizes DNA in the $5'\text{-to-}3'$ direction.
  2. $3'\text{-to-}5'$ exonuclease activity: Removes mismatched nucleotides (proofreading function).
  3. $5'\text{-to-}3'$ exonuclease activity: Degrades single strands of DNA or RNA.
- Initial Suspicions and Later Clarification: While DNA pol I was the first discovered, scientists suspected it wasn't the primary replicative polymerase due to its slow speed (approximately $20$ nucleotides per second, which would take about $30$ hours to replicate the E. coli genome), high abundance (approximately $400$ molecules per cell, more than needed for two forks), and low processivity (it dissociated after adding only $20$ to $50$ nucleotides).
DNA Polymerase III (DNA Pol III):
- John Cairns and Paula DeLucia's Experiment (1969): They demonstrated that an E. coli strain with a mutation in the DNA pol I gene, resulting in less than $1\%$ of its activity, could still grow and replicate its DNA normally.
- Conclusion: This indicated that another DNA polymerase must be responsible for the bulk of DNA synthesis at the replication fork. This enzyme was later identified as DNA polymerase III.
Key Concept: DNA polymerases synthesize DNA in the $5'\text{-to-}3'$ direction, using a single-stranded DNA template.
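
The 30-hour figure quoted for DNA pol I can be checked with back-of-the-envelope arithmetic. The sketch below assumes an E. coli genome of roughly 4.6 million base pairs (a value not stated in this chapter) and two replication forks each advancing at the polymerase's synthesis rate; the function name `replication_time_hours` is hypothetical.

```python
def replication_time_hours(genome_bp, nt_per_second, forks=2):
    """Hours needed to copy a genome if each fork advances at `nt_per_second`."""
    seconds = genome_bp / (nt_per_second * forks)
    return seconds / 3600


GENOME_BP = 4_600_000   # assumption: approximate E. coli genome size
POL_I_RATE = 20         # nucleotides per second, as quoted in the text

pol_i_hours = replication_time_hours(GENOME_BP, POL_I_RATE)
print(f"DNA pol I alone: about {pol_i_hours:.0f} hours")
```

Under these assumptions the estimate lands close to the figure in the text, which helps show why a much faster enzyme was suspected to do the bulk of replication.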

DNA Replication is Semidiscontinuous

The Primer Problem: A fundamental challenge in DNA replication is that DNA polymerases can only extend an existing chain; they cannot initiate a new one. Therefore, synthesis must begin with a primer.
Primers: Short chains of nucleotides that form a segment of duplex nucleic acid, providing a free $3'-OH$ group for the DNA polymerase to extend from (Figure 7-16).
- Primase: Primers are synthesized by an RNA polymerase called primase, which is a central component of the primosome protein complex.
- Primase copies the template DNA, synthesizing a short RNA primer (typically about $11$ nucleotides long) in the $5'\text{-to-}3'$ direction.
- DNA pol III then takes over, extending off the $3'$ end of this RNA primer.
Leading and Lagging Strands (Figure 7-17): Because DNA polymerases only synthesize in the $5'\text{-to-}3'$ direction, and the DNA strands are antiparallel, replication proceeds differently on the two template strands:
- Leading Strand Synthesis: One of the two template strands is oriented such that its replication in the $5'\text{-to-}3'$ direction proceeds continuously, in the same direction as the movement of the replication fork. This is a smooth, uninterrupted process.
- Lagging Strand Synthesis: The other template strand runs in the opposite direction relative to the replication fork movement. Therefore, DNA synthesis on this strand must occur in short, discontinuous segments.
  - As the replication fork opens, new template DNA is exposed. Primase synthesizes an RNA primer for each segment.
  - DNA pol III then extends this primer in the $5'\text{-to-}3'$ direction.
  - These short, newly synthesized DNA stretches are called Okazaki fragments (named after Reiji Okazaki) and are typically $1000$ to $2000$ nucleotides long.
Semidiscontinuous Replication: Due to the continuous synthesis on the leading strand and discontinuous synthesis on the lagging strand, the overall DNA replication process is described as semidiscontinuous.
Processing of Okazaki Fragments:
1. RNA Primer Removal: DNA pol I (Kornberg's enzyme) removes the RNA primers from the $5'$ end of each Okazaki fragment using its $5'\text{-to-}3'$ exonuclease activity.
2. Gap Filling: Immediately after primer removal, DNA pol I fills in the resulting gaps with DNA using its $5'\text{-to-}3'$ polymerase activity.
3. Fragment Ligation: Finally, DNA ligase catalyzes the formation of a phosphodiester bond, joining the $3'$ end of the newly synthesized DNA (filling the primer gap) to the $5'$ end of the adjacent, downstream Okazaki fragment. This seals the nicks and creates a continuous DNA strand.
Key Concept: DNA synthesis by DNA polymerase III requires an RNA primer, which is synthesized by the primase enzyme (an RNA polymerase).
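
A rough count shows how much primer processing the lagging strand demands. The Python sketch below assumes that about one genome length of new DNA is made discontinuously (roughly half of the new DNA at each of two bidirectional forks) and uses the approximate E. coli chromosome size of 5 million base pairs quoted later in this chapter. The function name `okazaki_fragments` is hypothetical, and the result is an order-of-magnitude estimate only.

```python
import math

GENOME_NT = 5_000_000  # approximate E. coli chromosome size used in this chapter


def okazaki_fragments(lagging_template_nt, fragment_nt):
    """Number of fragments needed to cover the lagging-strand template."""
    return math.ceil(lagging_template_nt / fragment_nt)


# Typical fragment lengths from the text: 1000 to 2000 nucleotides.
fewest = okazaki_fragments(GENOME_NT, 2000)
most = okazaki_fragments(GENOME_NT, 1000)

print(fewest, most)  # 2500 5000
```

Under these assumptions, each round of replication involves a few thousand primers, each of which must be removed, replaced with DNA, and ligated.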

DNA Replication is Accurate and Rapid

Accuracy (Fidelity): DNA replication is remarkably accurate, with an error rate typically less than one error per ${10}^{10}$ nucleotides.
- Proofreading Activity: A major contributor to this fidelity is the $3'\text{-to-}5'$ exonuclease activity possessed by both DNA pol I and DNA pol III (Figure 7-18).
  - This function acts as a "proofreader," excising any incorrectly inserted, mismatched bases immediately after they are added.
  - Once the mismatched base is removed, the polymerase has another opportunity to incorporate the correct complementary base.
  - Implication: Mutant strains lacking this $3'\text{-to-}5'$ exonuclease activity exhibit a significantly higher mutation rate.
- Primase Lacks Proofreading: Unlike DNA polymerases, primase does not have a proofreading function, making RNA primers more prone to errors. This is a critical reason why RNA primers must be removed and replaced with DNA by DNA pol I, ensuring high fidelity of the final DNA sequence.
- Post-replication Repair: Mismatches that escape the proofreading mechanisms are subsequently corrected by dedicated DNA repair pathways (discussed in Chapter 15).
Key Concept: DNA polymerases I and III have proofreading ($3'\text{-to-}5'$ exonuclease) activity, but primase does not.
Speed: DNA replication is also incredibly fast. For example, E. coli replicates its chromosome (approximately $5$ million base pairs) in about $40$ minutes.
- This translates to a replication rate of about $2000$ nucleotides per second for the entire genome.
- Since E. coli uses two replication forks, each fork must move at a rate of about $1000$ nucleotides per second.
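
The arithmetic behind these figures can be checked with a short Python sketch. The function and variable names are illustrative, and the inputs are the approximate values quoted above, so the results are rounded estimates.

```python
GENOME_BP = 5_000_000      # approximate E. coli chromosome size
REPLICATION_MINUTES = 40   # approximate time to copy the chromosome
FORKS = 2                  # bidirectional replication from one origin


def replication_rates(genome_bp, minutes, forks):
    """Return (whole-genome rate, per-fork rate) in nucleotides per second."""
    total = genome_bp / (minutes * 60)
    return total, total / forks


total_rate, per_fork_rate = replication_rates(GENOME_BP, REPLICATION_MINUTES, FORKS)
print(round(total_rate), round(per_fork_rate))  # 2083 1042
```

These values round to the "about $2000$" and "about $1000$" nucleotides per second stated in the text.
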
Maintaining Speed and Accuracy: The Replisome as a Molecular Machine: The remarkable feat of maintaining both speed and accuracy simultaneously is achieved through the replisome, a large, highly coordinated multi-protein complex.
- DNA pol III Holoenzyme: At the replication fork, the catalytic core of DNA pol III is part of a much larger complex called the DNA pol III holoenzyme, which comprises two catalytic cores and several accessory proteins.
  - One catalytic core is dedicated to synthesizing the leading strand, while the other handles the lagging strand.
  - The lagging strand template loops around, allowing both polymerase units to move in the same direction as the replication fork while coordinating synthesis (Figure 7-19).
  - Accessory proteins (not all visible in Figure 7-19) bridge the two catalytic cores, ensuring synchronized synthesis of both strands.
- Processivity through the $\beta$ clamp: DNA pol III's attachment to the DNA template is maintained by additional accessory proteins:
  - $\beta$ clamp (sliding clamp): This protein encircles the DNA like a donut.
  - Clamp loader ($\tau$ complex): This protein complex assembles the $\beta$ clamps onto the DNA.
  - The $\beta$ clamp transforms DNA pol III from a distributive enzyme (which adds only about $10$ nucleotides before dissociating) into a processive enzyme (which remains associated with the moving fork and adds tens of thousands of nucleotides).
- Primase's Distributive Action: In contrast, primase, which synthesizes RNA primers, acts as a distributive enzyme, adding only a few ribonucleotides before dissociating. This is functionally appropriate since primers only need to be short starting points for DNA pol III.
Key Concept: The $\beta$ clamp converts DNA polymerase III from a distributive enzyme to a processive enzyme, allowing for rapid and continuous synthesis.
Key Concept: DNA synthesis is carried out by a molecular machine called the replisome, which includes two DNA polymerase units, coordinates unwinding, stabilizes single strands, and processes RNA primers.

7.5 DNA Replication in Eukaryotes

Comparison with Bacterial DNA Replication

While eukaryotic DNA replication shares fundamental principles with bacterial replication (semiconservative mechanism, leading and lagging strand synthesis), significant differences arise due to the larger genomes and linear chromosomes of eukaryotes.

Similarities: Both systems employ analogous enzymes and follow the same basic semiconservative mode of replication, utilizing leading and lagging strand synthesis (Table 7-2).
Key Differences & Complexities:
- Genome Size & Chromosome Structure: Eukaryotic genomes are vastly larger and organized into multiple linear chromosomes, unlike smaller, typically single, circular bacterial chromosomes.
- Replication Time: E. coli can replicate its chromosome in about $40$ minutes. Eukaryotic replication, however, can range from a few minutes to many hours, depending on genome size, the number of replication origins, and the cell type.
- Coordination: Eukaryotes must coordinate the replication of numerous chromosomes simultaneously.

Eukaryotic Origins of Replication

Yeast (Saccharomyces cerevisiae) Origins: Studies in yeast have been crucial for understanding eukaryotic origins. Yeast origins of replication are known as Autonomously Replicating Sequences (ARSs).
- ARSs are $100$ to $200$ base pairs long and contain conserved DNA sequence elements, including an AT-rich element that melts upon initiator protein binding.
- They are functionally similar to oriC in E. coli.
Multiple Origins Strategy: To efficiently replicate their much larger genomes, eukaryotic chromosomes possess multiple replication origins.
- Yeast: Approximately $400$ ARSs are dispersed across its $16$ chromosomes.
- Humans: Have an estimated $40,000$ to $80,000$ origins distributed among their $23$ chromosomes.
- Replication proceeds bidirectionally from each of these numerous origins, with the elongating replication bubbles eventually merging to complete DNA synthesis across the entire chromosome (Figure 7-20).
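
The benefit of multiple origins can be estimated with a simple Python sketch. It assumes an idealized case in which all origins are evenly spaced and fire at the same moment, which real chromosomes do not do. The E. coli inputs come from this chapter; the yeast genome size (about 12 million base pairs) and the eukaryotic fork speed (50 nucleotides per second) are placeholder values that are not given in the text and are used only to show the scaling. The function name is hypothetical.

```python
def replication_minutes(genome_bp, origins, fork_nt_per_s):
    """Idealized replication time: every origin fires at once, two forks each."""
    forks = 2 * origins
    seconds = genome_bp / (forks * fork_nt_per_s)
    return seconds / 60


# E. coli, using this chapter's figures: one origin, about 1000 nt/s per fork.
ecoli = replication_minutes(5_000_000, origins=1, fork_nt_per_s=1000)

# Yeast, with ASSUMED genome size and fork speed (not from the text).
yeast_one_origin = replication_minutes(12_000_000, origins=1, fork_nt_per_s=50)
yeast_400_origins = replication_minutes(12_000_000, origins=400, fork_nt_per_s=50)

print(round(ecoli, 1))      # 41.7
print(yeast_one_origin)     # 2000.0
print(yeast_400_origins)    # 5.0
```

Whatever the exact fork speed, the idealized time falls in proportion to the number of origins, which is why large genomes depend on many of them.
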
Higher Eukaryote Origins: Origins in higher eukaryotes (like humans) exhibit greater complexity:
- They are considerably longer, potentially tens or hundreds of thousands of base pairs.
- They display limited sequence similarity, unlike the conserved sequences found in yeast ARSs or bacterial oriC.
- This lack of conserved sequence makes isolating and studying higher eukaryotic origins more challenging.
ORC Recognition in Higher Eukaryotes: While the Origin Recognition Complex (ORC) proteins are conserved across eukaryotes, the exact mechanism for ORC recognition of origins in higher eukaryotes is not fully understood.
- It is hypothesized that higher eukaryotic ORCs interact indirectly with origins, by associating with other protein complexes already bound to chromosomes, rather than recognizing specific DNA sequences directly.
- This indirect recognition mechanism might enable the regulation of the timing of DNA replication during S phase.
  - Euchromatin (gene-rich regions): These regions typically replicate early in S phase.
  - Heterochromatin (gene-poor, densely packed regions): These regions tend to replicate later in S phase (Chapter 12).
Key Concept: Yeast origins, like bacterial origins, have conserved DNA sequences recognized by ORC and other replisome assembly proteins. Higher eukaryotic origins are longer, more complex, lack conserved sequences, and are harder to study; their ORCs likely recognize origins indirectly.

DNA Replication and the Yeast Cell Cycle

Cell Cycle Restriction: In eukaryotes, DNA synthesis is strictly confined to the S (synthesis) phase of the cell cycle (Figure 7-21).
Control Mechanism: The onset of DNA synthesis is regulated by linking replisome assembly to specific cell cycle phases.
Replisome Assembly in Yeast (Figure 7-22):
1. ORC Binding: The Origin Recognition Complex (ORC) initially binds to specific DNA sequences at yeast origins (analogous to DnaA in E. coli).
2. Cdc6 Recruitment (Early G1): ORC serves as a landing platform to recruit Cdc6 protein to the origins during early G1 phase.
3. Helicase Loading: Together, ORC and Cdc6 then load a complex of Cdt1 and helicase onto the DNA. A second helicase-Cdt1 complex is subsequently recruited.
4. Activation and Polymerase Loading (Early S phase): Once the helicases are loaded onto the DNA in early S phase, Cdc6 and Cdt1 are released, and DNA polymerases are recruited and loaded.
Cell Cycle Linkage: This tight regulation is achieved through the availability of Cdc6 and Cdt1 proteins. They are synthesized during late mitosis (M phase) and G1 phase but are subsequently destroyed by proteolysis at the beginning of S phase. This ensures that new replisomes can only be assembled before S phase. Once replication has commenced, further replisome formation at origins is prevented because Cdc6 and Cdt1 are no longer available.
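
The logic of this control can be sketched as a toy state model in Python. It is an illustration only: the phase labels, the function name `run_cell_cycle`, and the treatment of a loaded helicase as a single "licensed" flag that is consumed when the origin fires are simplifying assumptions, not part of the described mechanism.

```python
# Phases in which Cdc6 and Cdt1 are available, following the text:
# synthesized in late M and G1, destroyed at the beginning of S.
LICENSING_PHASES = {"late M", "G1"}


def run_cell_cycle(phases=("G1", "S", "G2", "late M")):
    """Track one origin through one cycle; return (history, times fired)."""
    licensed = False
    fired_count = 0
    history = []
    for phase in phases:
        if phase in LICENSING_PHASES and not licensed:
            licensed = True      # ORC, Cdc6, and Cdt1 load the helicases
        fired = False
        if phase == "S" and licensed:
            licensed = False     # the loaded helicases are used when the origin fires
            fired = True
            fired_count += 1
        history.append((phase, licensed, fired))
    return history, fired_count


history, fired_count = run_cell_cycle()
for entry in history:
    print(entry)
print(fired_count)  # 1
```

Because the licensing factors are absent in S and G2, the origin in this model cannot be reloaded after it fires, so it fires only once per cycle.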

Telomeres and Telomerase: Replication Termination

The End-Replication Problem for Linear Chromosomes: Replication of linear eukaryotic DNA proceeds bidirectionally from multiple origins (Figure 7-20), replicating most of the chromosome. However, an inherent problem arises at the very ends of linear DNA molecules, regions called telomeres.
- Leading Strand: Can be synthesized continuously right up to the very tip of its template.
- Lagging Strand: Requires an RNA primer to initiate synthesis. When the last RNA primer at the absolute end of the lagging strand template is removed, there is no available $3'\text{-OH}$ group upstream for a DNA polymerase to extend from and fill the gap (Figure 7-23, "terminal gap").
- Consequence: This results in a short, unreplicated segment at the $5'$ end of the newly synthesized lagging strand. With each subsequent round of DNA replication, the chromosome would progressively shorten, leading to the loss of essential genetic information.
Eukaryotic Solution: Cells have evolved a specialized two-part system to prevent this loss:
1. Telomeres: The ends of chromosomes consist of simple, non-coding, repetitive DNA sequences (e.g., TTGGGG repeats in Tetrahymena; TTAGGG repeats in humans, spanning about $10$ to $15\ \text{kb}$). These repeats are lost with each replication instead of vital coding information.
2. Telomerase: An enzyme that adds these repeated sequences back to the chromosome ends, counteracting the shortening.
Key Concept: Telomeres stabilize chromosomes by preventing the loss of genomic information after each round of DNA replication.
Discovery of Telomerase: Discovered by Elizabeth Blackburn and Carol Greider (with Jack Szostak) in Tetrahymena (a ciliate with many telomeres). They identified it as an enzyme that specifically adds short repeats to $3'$ DNA ends.
Telomerase Composition and Mechanism (Figure 7-24):
- Ribonucleoprotein (RNP): Telomerase is an RNA-protein complex.
- Protein Component: A special type of DNA polymerase known as reverse transcriptase, which uses an RNA template to synthesize DNA.
- RNA Component: Telomerase carries its own internal RNA molecule that serves as the template for synthesizing the telomeric DNA repeats (e.g., in humans, the telomerase RNA contains the sequence $3'\text{-AAUCCC-}5'$ to template the $5'\text{-TTAGGG-}3'$ repeat).
- Steps:
  1. The telomerase RNA component anneals to the complementary $3'$ single-stranded DNA overhang at the chromosome end.
  2. Using its built-in RNA template and reverse transcriptase activity, telomerase extends the $3'$ DNA end (polymerization).
  3. The telomerase then translocates (moves) along the DNA, repositioning its RNA template.
  4. This cycle of elongation and translocation is repeated multiple times, progressively lengthening the $3'$ overhang.
  5. Once the $3'$ overhang is sufficiently extended, primase and conventional DNA polymerases use this extended overhang as a template to synthesize the complementary lagging strand, filling in the previous gap.
  6. The RNA primer is then removed, and DNA ligase seals any remaining nicks.
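
The templating relationship in these steps can be checked with a minimal Python sketch. It simplifies the mechanism by adding one complete repeat per elongation and translocation cycle and by ignoring how the RNA anneals to the overhang; the function names and the starting overhang are hypothetical.

```python
# RNA template base -> DNA base that pairs with it.
TEMPLATE_PAIRING = {"A": "T", "U": "A", "C": "G", "G": "C"}


def repeat_from_template(rna_3to5):
    """DNA repeat (5'->3') specified by an RNA template written 3'->5'."""
    return "".join(TEMPLATE_PAIRING[base] for base in rna_3to5)


def extend_overhang(overhang_5to3, rna_3to5, cycles):
    """Add one repeat to the 3' end per elongation/translocation cycle."""
    return overhang_5to3 + repeat_from_template(rna_3to5) * cycles


repeat = repeat_from_template("AAUCCC")
longer = extend_overhang("TTAGGGTTAGGG", "AAUCCC", cycles=3)

print(repeat)  # TTAGGG
print(longer)  # TTAGGGTTAGGGTTAGGGTTAGGGTTAGGG
```

Because the template is written $3'$ to $5'$ and the product $5'$ to $3'$, each position pairs directly with the one above it and no reversal is needed.
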
Telomeric Loop (t-loop): Telomeres protect chromosomal integrity by associating with specific proteins (e.g., WRN, TRF1, TRF2) to form a telomeric loop (t-loop) structure (Figure 7-25).
- This loop sequesters the $3'$ single-stranded overhang (which can be up to $100$ nucleotides long).
- Protective Role: Without t-loops, the cell's DNA repair machinery would mistakenly identify chromosome ends as dangerous double-strand breaks. Cells respond to double-strand breaks by fusing the broken ends, arresting the cell cycle (senescence), or initiating programmed cell death (apoptosis). These responses normally guard against chromosomal instability, which can lead to cancer and aging phenotypes (Chapter 15).
Key Concept: Telomeres stabilize chromosomes by associating with proteins to form a structure that "hides" chromosome ends from the cell's DNA repair machinery.

Telomeres and Telomerase: Associations with Aging and Cancer

Telomerase Activity in Somatic vs. Germ Cells: Most human somatic cells produce very little or no functional telomerase. In contrast, germ cells (sperm and egg cells) typically have ample telomerase activity.
Telomere Shortening and Senescence: Because somatic cells lack sufficient telomerase, their chromosomes progressively shorten with each cell division. Eventually, this shortening can trigger crucial cell-cycle checkpoints, leading the cell to stop dividing altogether and enter a state of senescence.
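
The idea of a limited number of divisions can be made concrete with a small Python sketch. Only the starting telomere lengths ($10$ to $15\ \text{kb}$ in humans) come from this chapter. The loss per division (100 base pairs) and the critical length at which checkpoints are triggered (5000 base pairs) are hypothetical placeholder values chosen purely for illustration, as is the function name.

```python
import math


def divisions_until_senescence(start_bp, critical_bp, loss_per_division_bp):
    """Divisions needed for a telomere to shrink to the critical length."""
    if loss_per_division_bp <= 0:
        raise ValueError("loss per division must be positive")
    return max(0, math.ceil((start_bp - critical_bp) / loss_per_division_bp))


LOSS_BP = 100        # ASSUMED loss per division, not from the text
CRITICAL_BP = 5000   # ASSUMED checkpoint threshold, not from the text

short_start = divisions_until_senescence(10_000, CRITICAL_BP, LOSS_BP)
long_start = divisions_until_senescence(15_000, CRITICAL_BP, LOSS_BP)

print(short_start, long_start)  # 50 100
```

The specific counts depend entirely on the assumed values; the point is that without telomerase the number of divisions is finite and set by the starting telomere length.
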
Premature Aging Syndromes: Evidence suggests a strong link between telomere shortening and aging.
- Werner syndrome: Individuals with this premature-aging phenotype experience early onset of age-related conditions (e.g., skin wrinkling, cataracts, osteoporosis, graying hair, cardiovascular disease) (Figure 7-26). They have shorter telomeres due to mutations in the WRN gene, which encodes a helicase that associates with telomeric loop proteins (like TRF2). This disruption leads to chromosomal instability and signs of accelerated aging.
- Dyskeratosis congenita: Patients with this syndrome also exhibit shorter telomeres and harbor mutations in genes essential for telomerase activity.
Telomerase and Cancer: Geneticists are highly interested in the connection between telomeres and cancer.
- Cancer Cell Immortality: Unlike normal somatic cells, approximately $80\%$ of cancer cells have active telomerase. This ability to maintain telomere length is believed to be a key factor enabling cancer cells to divide indefinitely in cell culture, which is why they are described as "immortal."
- Therapeutic Target: This distinction makes telomerase an attractive target for anti-cancer drug development. Many pharmaceutical companies are actively researching drugs that selectively inhibit telomerase activity in cancerous cells, aiming to trigger senescence or apoptosis specifically in these cells.
Key Concept: Telomeres and telomerase are significantly associated with processes of aging and the development of cancer.