Reverse Transcription and Integration - Vocabulary Flashcards

Introduction

  • Topic: Reverse Transcription and Integration in retroviruses; essential steps for replication and targets for antiretroviral therapy.
  • Nobel Prize context: The 1975 Nobel Prize in Physiology or Medicine was awarded jointly to Baltimore, Dulbecco, and Temin "for their discoveries concerning the interaction between tumour viruses and the genetic material of the cell"; Temin and Baltimore independently discovered RNA-dependent DNA synthesis (reverse transcriptase), and Dulbecco's work on tumour viruses underpinned the study of how retroviruses interact with the host genome.
  • Proviral DNA and reservoirs: Integrated proviral DNA can be transcriptionally silent for extended periods, creating viral reservoirs that can reactivate, underpinning persistence and challenges to cure (e.g., HIV-1).
  • Two key retroviral activities highlighted: reverse transcription (RT) and integration by integrase (IN).

Retroviridae and Genome Organization

  • Family name origin: Retroviruses use the enzyme reverse transcriptase (RT) to convert RNA genome to DNA (RNA→DNA), reversing the central dogma direction.
  • Retroviral genome basics:
    • All retroviruses carry three essential genes: gag (structural proteins), pol (enzymes; includes RT, integrase IN, protease PR), and env (envelope).
    • Simple retroviruses (e.g., avian leukosis virus, ALV) have only gag, pol, env.
    • Complex retroviruses (e.g., HIV-1) include these three plus additional non-structural genes; these extra products can influence replication and pathogenesis.
  • Gene expression:
    • Viral mRNAs and proteins produced via host transcription/translation machinery.
    • Gag products are expressed from unspliced transcripts as precursors and are proteolytically cleaved into mature proteins.
    • Env is expressed from spliced transcripts and cleaved into surface (SU) and transmembrane (TM) components.
  • Retrovirus structure:
    • Enveloped viruses; envelope contains SU and TM proteins attached to a lipid bilayer.
    • Inside the envelope: matrix (MA), nucleocapsid (NC), capsid (CA), enzymes RT, IN, PR, and the viral RNA genome.
    • Each viral protein is labeled with a two-letter designation.
  • Genome organization in particles:
    • Each virion contains two copies of the single-stranded, positive-sense RNA genome, which associate as a 70S dimer via base pairing near their 5’ ends.
    • Although two genome copies are packaged, each infecting virion typically gives rise to only one proviral DNA copy during productive replication (pseudodiploidy).
  • HIV-1 and complex retroviruses have additional gene content beyond basic gag-pol-env; reviews cover these extra genes and products.

Retrovirus Replication Overview

  • Core replication steps:
    • Binding/entry: virus fuses with host cell membrane via envelope–receptor interactions.
    • Reverse transcription: RNA genome converted to double-stranded proviral DNA by RT within the preintegration complex.
    • Nuclear import and integration: proviral DNA integrated into host genome by integrase (IN).
    • Transcription/translation: host machinery expresses viral mRNAs and proteins.
    • Genome packaging and assembly: progeny virions assemble at the cell membrane and bud.
  • Key implication: Integration into host genome is central to persistence and pathogenesis; integrated proviral DNA can serve as a reservoir for reactivation.
  • Short note on pathology: HIV-1 reservoirs contribute to immune evasion and chronic infection; eradication/cure remains challenging.

Retroviral Reverse Transcription (Section 2: Retrovirus Reverse Transcription)

  • Reverse transcription purpose: RT converts the single-stranded RNA genome into double-stranded proviral DNA.
  • Background: Viruses in Retroviridae use RT; retroviruses are named for this reverse transcription step.
  • Basic RT components:
    • Viral RNA genome (template).
    • Primer: a host cell tRNA bound to a specific primer binding site (PBS) on the viral genome; the tRNA serves as the primer for first-strand DNA synthesis.
    • RT enzyme: delivered in roughly 50 to 100 copies per virion; performs RNA→DNA synthesis and has RNase H activity.
  • Genome packaging and primer details:
    • Each virion carries 2 copies of the RNA genome, which dimerize via 5’ end interactions to form a 70S complex; NC (nucleocapsid) proteins coat the RNA and promote template exchanges during RT.
    • tRNA primers are partially denatured and bound to the PBS; specific tRNAs (e.g., tRNA^Trp in ASLV) are used as primers in a sequence-specific manner.
  • RT activities within the preintegration complex:
    • RNA-dependent DNA synthesis (primary activity).
    • Helicase-like unwinding to relieve torsional stress during DNA synthesis.
    • RNase H activity degrades RNA in RNA:DNA hybrids, enabling synthesis of the second DNA strand and template exchanges.
  • Reverse transcription is multi-step and involves template switches between the two RNA genomes in the virion (pseudodiploidy) to allow complementation if one template is damaged.
  • Structure of RT across retroviruses:
    • RT enzymes vary by virus: avian sarcoma/leukosis virus (ASLV) RT is a heterodimer with polymerase and RNase H domains in both subunits (the larger subunit also retains the integrase domain); HIV-1 RT is a heterodimer in which only one subunit has the RNase H domain and neither subunit has integrase activity; murine leukemia virus (MLV) RT is a monomer with an RNase H domain but no integrase activity.
  • Kinetics and fidelity of RT:
    • RT polymerization rate is slow relative to cellular DNA polymerases: about $1$ to $1.5\ \text{nt s}^{-1}$, leading to a total synthesis time of roughly 4 hours for HIV-1 proviral DNA, given a genome length of about 9,000 bases that must be copied into two DNA strands.
    • Fidelity is low: RT has no proofreading/editing activity, and misincorporations occur at roughly 1 error per $10^{4}$ to $10^{6}$ nucleotides. For an HIV-1 genome (~$10^{4}$ bases), the upper end of this range yields about one error per genome per replication cycle, driving rapid evolution and the emergence of quasispecies.
    • This high mutation rate underpins drug resistance: mutations arise that blunt the effect of RT inhibitors while preserving enzymatic activity.
  • Bottom-line features of RT in retroviruses:
    • RT is the central enzyme for converting RNA to DNA; it has DNA polymerase, RNase H, and helicase-like activities.
    • RT fidelity is low, generating genetic diversity that fuels adaptation and drug resistance.
    • RT is a major target for antiretroviral therapy (ART).
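
Worked example (illustrative sketch): the kinetics and fidelity figures above can be checked with a few lines of Python. The function name and default values are assumptions chosen for illustration: a 9,000-nt genome, a mid-range rate of 1.25 nt/s, two DNA strands synthesized one after the other, the upper-end error rate of 1 in 10^4, and independent (Poisson-distributed) errors.

```python
import math


def reverse_transcription_estimates(genome_nt=9_000,
                                    rate_nt_per_s=1.25,
                                    error_rate=1e-4):
    """Rough estimates only; every default value is an illustrative assumption."""
    # Assumption: both DNA strands (about one genome length each) are made
    # sequentially at the same average rate.
    total_nt = 2 * genome_nt
    hours = total_nt / rate_nt_per_s / 3600

    # Mean number of misincorporations per genome copy.
    expected_errors = genome_nt * error_rate

    # Assumption: errors are independent, so their count is Poisson-distributed.
    p_at_least_one = 1 - math.exp(-expected_errors)
    return hours, expected_errors, p_at_least_one


hours, errors, p_mutant = reverse_transcription_estimates()
print(f"synthesis time: {hours:.1f} h")
print(f"expected errors per genome: {errors:.2f}")
print(f"P(at least one error): {p_mutant:.2f}")
```

With these defaults the estimate is about 4 hours and about 0.9 expected errors per genome, in line with the figures quoted above; at the low end of the error range (1 in 10^6) the expected number of errors drops a hundredfold.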

The Reverse Transcription Process: Stepwise View (Steps 1–4)

  • Step 1: Initiation of (−) strand DNA synthesis
    • Primer: host tRNA binds to PBS at the 5’ end of the RNA genome; RT copies the 5’ end to generate a short DNA strand.
    • RNase H degrades the RNA template portion that has been copied, producing the initial short DNA product; this is called the negative strong-stop DNA (≈100 nucleotides).
  • Step 2: First Template Exchange
    • The 3' end of the nascent (−) DNA binds to the 3' end of the viral RNA (RNA template) at a complementary R region.
    • Copying continues toward the 5’ end of the genome; the RNA template is degraded as synthesis proceeds by RNase H.
  • Step 3: Initiation of (+) strand DNA synthesis
    • RNase H leaves behind a short RNA segment called the polypurine tract (PPT), about 13 to 15 bases, which serves as the primer for (+) strand DNA synthesis.
    • The (+) strand DNA is extended to the end of the (−) strand DNA and on into the attached tRNA primer, copying about 18 nucleotides of the tRNA; elongation is then blocked by a modified tRNA base, forming the positive strong-stop DNA.
    • The PPT and tRNA primers are removed by RNase H; (−) strand DNA synthesis continues through the PBS region, producing a PBS sequence that can pair with its complement on the newly synthesized positive strong-stop DNA.
    • A second template exchange is prepared as the newly synthesized DNA ends anneal to the complementary PBS region on the positive strand.
  • Step 4: Second Template Exchange and Completion
    • The circular DNA construct formed by negative strand DNA and the positive strong-stop DNA is used as the starting point.
    • Positive strand synthesis proceeds from the PBS, while the negative strand is completed by displacement from the strong-stop DNA.
    • Result: a linear proviral DNA with identical long terminal repeats (LTRs) at both ends.
    • LTRs: multi-functional elements; the 5’ LTR acts as the promoter; the 3’ LTR provides the polyadenylation signal; the LTR ends serve as recognition sites for integrase during integration.
  • Outcome: production of a provirus capable of integration into the host genome and subsequent transcription/translation of viral genes.
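
Symbolic sketch of LTR formation: the two template exchanges described above can be traced on a list of named genome segments to show why the provirus ends up with identical LTRs. This is a bookkeeping model only; base pairing and RNase H are not simulated, and every strand is written in plus-sense order. The segment names U5 and U3 (the unique regions next to R at the 5' and 3' ends of the RNA) follow the common retroviral convention and are not defined in these notes, so treat them, and the hypothetical function name `provirus_layout`, as assumptions.

```python
def provirus_layout(rna_segments):
    """Return the segment order of the linear proviral DNA (plus-sense order)."""
    if rna_segments[0] != "R" or rna_segments[-1] != "R":
        raise ValueError("genomic RNA must start and end with R")
    pbs = rna_segments.index("PBS")
    ppt = rna_segments.index("PPT")

    # Step 1: tRNA-primed synthesis copies everything 5' of the PBS
    # (negative strong-stop DNA).
    minus_strong_stop = rna_segments[:pbs]                      # R, U5

    # Step 2: first template exchange. The R copy pairs with the 3' R of the
    # RNA, and (-) strand synthesis continues back to the PBS.
    minus_strand = rna_segments[pbs:-1] + minus_strong_stop     # PBS ... U3, R, U5

    # Step 3: (+) strand synthesis starts at the PPT primer, copies the rest
    # of the (-) strand, then copies the PBS from the attached tRNA.
    plus_strong_stop = minus_strand[ppt - pbs + 1:] + ["PBS"]   # U3, R, U5, PBS

    # Step 4: second template exchange. The two PBS copies pair and both
    # strands are extended to the ends of their templates.
    return plus_strong_stop[:-1] + minus_strand


genome = ["R", "U5", "PBS", "gag", "pol", "env", "PPT", "U3", "R"]
provirus = provirus_layout(genome)
print(" - ".join(provirus))
```

In the result, the same three segments (U3, R, U5) appear at both ends: these are the two LTRs, each longer than the terminal R repeat of the RNA genome.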

Proviral DNA and Integration (Section 3: Retrovirus Integration)

  • Purpose of integration: covalent insertion of proviral DNA into the host genome by integrase; essential for persistent replication and transcriptional control by host RNA polymerase II.
  • Integrase (IN): encoded in the pol gene; packaged in virions with RT and PR; IN performs the covalent integration step.
  • Preintegration complex (PIC): includes proviral DNA, RT, integrase, capsid proteins, nucleocapsid proteins; responsible for transporting proviral DNA into the nucleus for integration.
  • Equimolar packaging: in productive replication, virions contain equimolar amounts of RT, IN, and PR.
  • Early clues about integration: end-terminal dinucleotide sequences of integrated proviruses (5’ TG and 3’ CA ends) point to specific processing by integrase and a conserved mechanism across retroviruses; homologous relationships to bacterial transposons and transposases supported an integrase-like activity.
  • In vitro integration assays: cell-free systems with purified integrase, cofactors (e.g., Mg^{2+}), short target DNA, and proviral DNA surrogates; used to study structure–function relationships and sequence requirements; separate assays examine the individual steps (processing, joining) and concerted integration of both ends.
  • The integration reaction is sequential and virus-specific; the preintegration complex must reach the nucleus for integration to occur.

The Integration Process: Stepwise Reactions (Steps 1–3)

  • Step 1: Processing reaction
    • Integrase removes a dinucleotide (two nucleotides) from the 3' end of each strand of the linear proviral DNA, leaving recessed 3' ends and two-nucleotide 5' overhangs; only intact proviral DNA can be processed and integrated; this step requires both ends.
    • The processing depends on terminal LTR sequences, particularly the conserved 5'-CA-3' dinucleotide at the ends; short base-pair preferences exist within the first ~20 base pairs of LTR termini; sequence conservation varies by retrovirus (e.g., HIV-1 vs MLV vs ASLV).
  • Step 2: Joining reaction
    • Integrase makes staggered cuts in the host DNA at sites separated by about 4 to 6 base pairs; the processed 3' ends of the proviral DNA are joined to the host DNA, leaving the 5' ends of the proviral DNA unjoined and flanked by short single-stranded gaps (a gapped intermediate).
    • Integrase completes its catalytic role; following this, the enzyme dissociates from the DNA.
  • Step 3: Repair
    • Cellular DNA repair machinery fills in the single-stranded gaps and ligates the remaining ends, restoring a continuous double-stranded proviral-host DNA junction.
    • This repair is analogous to nonhomologous end-joining processes that follow double-strand breaks.
  • Proviral DNA organization after integration: proviral ends flanked by host DNA, with LTRs at both ends; integrated proviral DNA now relies on host transcriptional machinery to produce viral RNAs and proteins.
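
Sketch of the net outcome of processing, joining, and repair: the model below tracks only the top strand of each DNA, so strand chemistry is not represented. It assumes that repair trims the two unpaired nucleotides left at each viral 5' end and fills the gaps created by the staggered cut, which duplicates the host bases between the two cut positions on either side of the provirus. That duplication is a consequence of the staggered cut rather than something stated in these notes. The function name `integrate`, the default stagger of 5, and the sequences are hypothetical.

```python
def integrate(host, site, viral_dna, stagger=5):
    """Top-strand model: return (integrated sequence, duplicated host bases)."""
    if not 4 <= stagger <= 6:
        raise ValueError("stagger is expected to be 4 to 6 base pairs")
    if site < 0 or site + stagger > len(host):
        raise ValueError("integration site lies outside the host sequence")

    # Processing (plus later trimming of the 5' overhangs during repair):
    # the integrated provirus lacks the two terminal base pairs at each end.
    provirus = viral_dna[2:-2]
    if not (provirus.startswith("TG") and provirus.endswith("CA")):
        raise ValueError("processed ends should read 5'-TG ... CA-3'")

    # Joining and repair: the host bases between the staggered cuts end up
    # on both sides of the provirus once the gaps are filled in.
    duplicated = host[site:site + stagger]
    integrated = host[:site + stagger] + provirus + host[site:]
    return integrated, duplicated


host = "GATTACAGGCTTAACCGGTT"                    # synthetic host sequence
viral = "AATG" + "CCCGGGAAATTT" + "CATT"          # synthetic viral DNA
integrated, dup = integrate(host, 6, viral)
print(dup)
print(integrated)
```

The integrated sequence is longer than the host by the length of the trimmed provirus plus the stagger, which is one way to see that no host bases are lost in this simplified picture.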

Integration Site Preferences and Host Interactions

  • Integration site determinants:
    • Integration site selection is only loosely dependent on primary host DNA sequence; there is some preference for weak palindromic motifs near ends of LTRs, variable by retrovirus identity.
    • Chromatin structure and DNA topology influence site selection: bends, underwinding, and nucleosome positioning influence where integration occurs.
  • Gene structure preferences:
    • HIV-1 tends to integrate within genes; MLV tends to integrate near transcription start sites.
    • Cellular factors can modulate integration preferences; for example, LEDGF/p75 (LEDGF) contributes to integration targeting toward active genes; loss of LEDGF reduces bias toward active genes.
  • Integrase structure and function:
    • IN is ~300 amino acids long and comprises three domains:
      • N-terminal domain (NTD): ~first 50 aa, with zinc-chelating His and Cys residues; supports DNA binding.
      • Catalytic core domain (CCD): central region (~150 aa) containing the invariant catalytic motif D,D(35)E, essential for catalysis.
      • C-terminal domain (CTD): last ~80–100 aa; important for DNA binding and multimerization.
    • Multimeric forms:
      • Dimers can complete processing in vitro but cannot perform all steps of integration alone.
      • Tetramers (intasomes) are required for concerted integration (both viral ends into target DNA).
      • Inner IN molecules are catalytically active in the tetramer; outer subunits’ roles are less clearly defined.
  • Cellular proteins participating in integration (~100 identified):
    • Partners include barrier-to-autointegration factor (BAF), LEDGF/p75 (LEDGF) and BRD family members, and other proteins involved in transcription, chromatin remodeling, and DNA repair.
    • BAF helps prevent autointegration (integration of the proviral DNA into itself), acting as a pro-replication factor.
  • In vitro vs in vivo considerations:
    • In vitro assays provide mechanistic insights; in vivo integration is influenced by chromatin context, host factors, and nuclear trafficking of the preintegration complex.
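
Sketch of reading the D,D(35)E notation: the helper below scans a protein sequence for two aspartates (D) followed by a glutamate (E) exactly 35 residues after the second aspartate. The spacing between the two aspartates is treated as variable, which is the usual reading of the notation but an assumption relative to these notes. The function name `find_dd35e` is hypothetical, positions are 0-based, and the test sequence is synthetic, not a real integrase.

```python
import re


def find_dd35e(protein):
    """Return (first_D, second_D, E) index triples matching D,D(35)E."""
    hits = []
    # Lookahead keeps overlapping candidates: a D, any 35 residues, then E.
    for match in re.finditer(r"(?=D.{35}E)", protein):
        second_d = match.start()
        glu = second_d + 36
        # Assumption: pair the match with the nearest upstream aspartate.
        upstream = [i for i in range(second_d) if protein[i] == "D"]
        if upstream:
            hits.append((upstream[-1], second_d, glu))
    return hits


# Synthetic sequence: D at index 10, D at index 31, E at index 67.
toy = "M" + "A" * 9 + "D" + "A" * 20 + "D" + "A" * 35 + "E" + "A" * 5
print(find_dd35e(toy))
```

A pattern match of this kind only flags candidate catalytic triads; it says nothing about whether the residues are positioned correctly in the folded catalytic core domain.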

HIV-1 Reservoirs and Pathogenesis Implications

  • Proviral reservoirs: integrated proviral genomes can persist in resting or latent cellular states; reactivation can occur under certain cellular conditions, leading to renewed viral replication even after apparent control.
  • Integration targeting and gene disruption: insertion into or near host genes can affect gene expression and contribute to pathogenesis or clonal expansion of infected cells.
  • Therapeutic implications: targeting integrase (IN) activity is a core component of antiretroviral therapy; understanding integration site preferences informs strategies to eradicate reservoirs and prevent persistence.

RT and Integration: Additional Context and Variants

  • Non-retroviral reverse transcription (retroid viruses):
    • Hepadnaviruses (e.g., hepatitis B virus) use reverse transcription to synthesize DNA from an RNA intermediate; these are non-retroviruses but share RT-based replication elements.
    • Endogenous proviruses in host genomes represent replication-defective vestiges of past retroviral infections.
  • Practical takeaway: RT and IN are hallmark retroviral enzymes (although reverse transcription also occurs in other retroid viruses such as hepadnaviruses) and a central focus for therapy; understanding them also informs broader questions about genome evolution and host interactions.

Summary of Key Points

  • Retroviruses rely on two essential enzymatic activities: reverse transcription (RT) and integration (IN).
  • RT converts the RNA genome into double-stranded proviral DNA; the process is multi-step and involves template switching, RNase H activity, PPT, tRNA priming, and LTR formation.
  • RT fidelity is low, driving high mutation rates and the emergence of viral quasispecies and drug resistance.
  • Proviral DNA produced by RT is integrated into the host genome by integrase, forming an enduring provirus flanked by LTRs and leading to viral gene expression via host transcription machinery.
  • Integration is a multi-step, enzyme-mediated process (processing, joining, repair) and is influenced by virus-specific end sequences, chromatin context, and host factors (e.g., LEDGF, BAF).
  • The proviral reservoir and integration targeting heavily influence HIV pathogenesis, persistence, and challenges to eradication.

Important Numbers and Formulas (LaTeX)

  • Genome length for HIV-1: $L \approx 9{,}000 \text{ bases}$
  • RT polymerization rate: $r \approx 1 \text{ to } 1.5\ \frac{\text{nucleotides}}{\text{second}}$
  • Time to complete reverse transcription (HIV-1): $t \approx 4 \text{ hours}$
  • Misincorporation rate: $\text{error rate per nucleotide} \approx 10^{-4} \text{ to } 10^{-6}$
  • Proviral DNA ends after processing: recessed 3' ends with 5' overhangs (requires intact proviral DNA)
  • PBS (primer binding site): conserved region that binds the tRNA primer
  • PPT-derived primer length: $\approx 13\text{--}15 \text{ nt}$
  • tRNA primer region copied into the positive strong-stop DNA: $\approx 18 \text{ nt}$ (elongation is then blocked by a modified tRNA base)
  • End dinucleotides of integrated proviruses: 5'-TG and 3'-CA
  • Host DNA cleavage during joining: staggered cuts $4 \text{ to } 6 \text{ base pairs}$ apart
  • Integrase size: $\approx 300 \text{ amino acids}$
  • Number of copies of RT per virion: $\approx 50 \text{ to } 100$ copies
  • Viral genome copies per virion: 2
  • Dimerization: the two genome copies form a 70S RNA dimer in the virion (70S is a sedimentation coefficient)
  • Integrase multimerization: tetramer (intasome) required for concerted integration

References and Further Reading

  • Primary sources: Nobel Prize 1975 for Temin and Baltimore work on tumor viruses and genetic material interaction.
  • Textbook reference: Principles of Virology, Volume I, Chapter 10 (5th Edition) and related reviews for HIV-1 and retroviral integration.
  • Online references cited in the source material: Nobel Prize summary (1975); RT and mutation references on general RT biology.