Chapter 12: Genomes Overview

Core Concepts

  • Reflect on the Core Concepts of the chapter to deepen understanding.

12.3 DNA and Genome Sequencing

  • DNA replication principles: DNA sequencing involves processes similar to DNA replication but occurs in vitro.

    • Key components needed:
    • DNA polymerase
    • Primers
    • Deoxynucleotides (dNTPs)
    • Helicase and other factors from in vivo replication are not required.
  • Dideoxynucleotides (ddNTPs): Chain terminators used in DNA sequencing.

    • Differences between normal deoxynucleotides and ddNTPs:
    • Structure: ddNTPs lack a 3' OH group, preventing further nucleotide addition.
    • Functionality: Their incorporation stops the elongation of the DNA strand.
  • Example DNA strand:

    • Primer: 5'-CTTCAG-3'
    • Template: 3'-GAAGTCACCTCCCCTGAAACGAGGAA-5'
  • Primer Extension Scenarios (nucleotides added to the 6-nucleotide primer before the first ddNTP terminates the strand):

    1. If A, G, T are dNTPs & C is ddC: 10 nucleotides are added (16 including the primer).
    2. If C, G, T are dNTPs & A is ddA: 4 nucleotides are added.
    3. If A, C, T are dNTPs & G is ddG: 2 nucleotides are added.
    4. If A, C, G are dNTPs & T is ddT: 1 nucleotide is added.
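  • Daughter strand lengths: When a base is supplied as a mix of dNTP and ddNTP, termination can occur at any position where that base is added. E.g., for G with 85% dG and 15% ddG, strands of 8, 9, 11, 12, 13, 14, or 20 nucleotides (counting the primer) end in ddG, and strands that never incorporate ddG reach the full 26 nucleotides.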

  • Essential Ratio of ddNTPs to dNTPs: ddNTPs must be kept at a low proportion. If the ratio is too high, most strands terminate near the primer and longer fragments are underrepresented, so bases far from the primer cannot be read.

  • Fluorescent ddNTPs: Each of the four ddNTPs carries a different fluorescent tag. A daughter fragment carries only one color, because the first ddNTP incorporated terminates the strand; that color identifies the fragment's final base.

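  • Purpose of Size Separation: Sorting fragments by length orders them by the position at which they terminated. Reading the terminal color from the shortest fragment to the longest gives the sequence; shorter fragments terminated closer to the primer.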

  • Sanger Sequencing Length: A single reaction can typically read up to about 1000 nucleotides.
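
The primer-extension counts and fragment lengths in the example above can be checked with a short Python sketch. It is only an illustration under simplifying assumptions: the function names are invented for these notes, scenarios 1-4 assume the named base is present only as a ddNTP, and the mixture case simply lists every position where termination is possible rather than simulating probabilities.

```python
# Primer and template from the example strand in these notes.
PRIMER = "CTTCAG"                               # written 5'->3'
TEMPLATE_3_TO_5 = "GAAGTCACCTCCCCTGAAACGAGGAA"  # written 3'->5'

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def full_extension(primer, template_3_to_5):
    """Bases (5'->3') added to the primer when no ddNTP is present."""
    unpaired = template_3_to_5[len(primer):]
    return "".join(COMPLEMENT[base] for base in unpaired)


def added_before_stop(primer, template_3_to_5, dd_base):
    """Bases added when dd_base is supplied ONLY as a ddNTP (scenarios 1-4)."""
    added = full_extension(primer, template_3_to_5)
    stop = added.find(dd_base)
    return len(added) if stop == -1 else stop + 1


def terminated_lengths(primer, template_3_to_5, dd_base):
    """Strand lengths (primer included) that end in dd_base in a dNTP/ddNTP mix."""
    added = full_extension(primer, template_3_to_5)
    return [len(primer) + i + 1 for i, base in enumerate(added) if base == dd_base]


def read_sequence(primer, template_3_to_5):
    """Sort all terminated fragments by length and read each terminal base."""
    fragments = [
        (length, dd_base)
        for dd_base in "ACGT"
        for length in terminated_lengths(primer, template_3_to_5, dd_base)
    ]
    return "".join(base for _, base in sorted(fragments))


for dd in "CAGT":
    print(dd, added_before_stop(PRIMER, TEMPLATE_3_TO_5, dd))  # C 10, A 4, G 2, T 1

print(terminated_lengths(PRIMER, TEMPLATE_3_TO_5, "G"))  # [8, 9, 11, 12, 13, 14, 20]
print(read_sequence(PRIMER, TEMPLATE_3_TO_5))             # TGGAGGGGACTTTGCTCCTT
```

The last call mimics size separation: ordering the fragments from shortest to longest and reading the terminal base of each recovers the newly synthesized strand.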

Genome Assembly Challenges

  • Definition of Genome: All of an organism's genetic material. In a diploid organism, somatic cells contain two copies of the genome and gametes contain one.

  • Sequencing challenge: Eukaryotic genomes are large (they can exceed 10^9 base pairs), but methods like Sanger sequencing yield only a few hundred to about 1000 nucleotides per read, so the genome sequence must be assembled from many short reads.

  • Shotgun Sequencing: A method to piece together complete genomes:

    • Break DNA into random fragments.
    • Sequence each fragment and assemble overlapping sequences to create consensus sequences.
  • Importance of Random Fragments: Overlapping sequences are necessary to reconstruct longer sequences accurately.
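
The role of overlaps in shotgun assembly can be illustrated with a toy greedy merger. This is a minimal sketch, not how production assemblers work: the 20-nucleotide sequence and its four reads are made up, reads are assumed error-free and on the same strand, no read is contained in another, and `overlap` and `greedy_assemble` are hypothetical names.

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences sharing the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap(left, right, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more: the contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Four 8-nt reads covering a made-up 20-nt sequence, given in scrambled order.
reads = ["CCTGAAGT", "ATGGCGTA", "AAGTTCAG", "CGTACCTG"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGAAGTTCAG']
```

If the fragments did not overlap (for example, if the DNA were always cut at the same positions), the loop would stop with several unconnected contigs, which is why random fragmentation matters.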

Sequencing Depth

  • Sequencing multiple times (10-50x): Sequencing each region 10 to 50 times increases confidence in the consensus, because occasional errors in individual reads are outvoted.

  • Read Depth: The number of times a specific region is sequenced; it varies across the genome.

  • Improved Assembly with Related Genomes: An already assembled genome from a related species can serve as a reference, providing anchor points that simplify assembly.
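
Average depth and per-position read depth can be estimated with simple arithmetic. The sketch below assumes the common approximation that mean depth equals (number of reads × read length) / genome length; the function names and the example numbers are illustrative, not taken from the chapter.

```python
import math


def mean_coverage(num_reads, read_length, genome_length):
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_length


def reads_needed(target_coverage, read_length, genome_length):
    """Reads required to reach a target average depth."""
    return math.ceil(target_coverage * genome_length / read_length)


def depth_per_position(genome_length, read_starts, read_length):
    """Read depth at every position, given where each read starts."""
    depth = [0] * genome_length
    for start in read_starts:
        for pos in range(start, min(start + read_length, genome_length)):
            depth[pos] += 1
    return depth


# Toy case: four 8-nt reads tiling a 20-nt sequence.
print(mean_coverage(4, 8, 20))                     # 1.6
print(depth_per_position(20, [0, 4, 8, 12], 8))    # ends are covered once, the middle twice

# Illustrative: 10x depth of a 10^9 bp genome with 1000-nt reads.
print(reads_needed(10, 1000, 10**9))               # 10000000
```

The per-position list shows why read depth varies: even evenly spaced reads leave the ends less covered than the middle, and random fragments are more uneven still.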

Repeated Sequences

  • Nature of Repeats: Repeated sequences complicate assembly because reads from different copies look alike, making the length (copy number) and location of each repeat hard to determine.
  • Types of Repeats:
    • Dispersed: Copies scattered across the genome.
    • Tandem: Copies lying directly next to one another.
    • Short repeats: Brief sequences repeated throughout the genome.
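
Tandem repeats of a short unit can be spotted with a regular-expression backreference. This is a rough sketch only: the unit size (2-6 nucleotides), the minimum of three copies, the test sequence, and the function name are all assumptions chosen for illustration, and the pattern finds exact copies only.

```python
import re

# A unit of 2-6 bases followed by at least two more exact copies of itself.
TANDEM = re.compile(r"([ACGT]{2,6}?)\1{2,}")


def find_tandem_repeats(dna):
    """Return (start, unit, copies) for each exact tandem repeat found."""
    hits = []
    for match in TANDEM.finditer(dna.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Made-up sequence containing CA x4 and ATCG x3.
seq = "TTGCACACACATTATCGATCGATCGAA"
print(find_tandem_repeats(seq))  # [(3, 'CA', 4), (13, 'ATCG', 3)]
```

A read that falls entirely inside one of these runs looks the same wherever it came from within the run, which is the ambiguity that makes repeat copy number hard to assemble.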

Genome Annotation

  • Functionality: Genome annotation categorizes sequences into functional elements, including genes and regulatory sequences.

    • Gene Structure: Typically includes promoters, exons, and introns.
    • Non-coding RNAs: Some genes produce functional RNAs that do not code for proteins, including RNAs essential for translation such as tRNAs and rRNAs.
  • Patterns for Recognition: Sequence motifs can identify important regions, such as promoters and open reading frames (ORFs). An ORF is a stretch of DNA, uninterrupted by stop codons, that may code for a protein; it is treated as a gene only when additional evidence confirms it.

  • Transcriptome: The set of mRNA sequences reveals which genes are expressed; comparing mRNA with genomic DNA shows where exons and introns lie, because intron sequences are absent from mature mRNA.
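
The idea of recognizing ORFs by pattern can be sketched as a scan for a start codon followed, in the same reading frame, by a stop codon. This is a simplified illustration: it assumes the standard start (ATG) and stop (TAA, TAG, TGA) codons, searches only the given strand, ignores introns, and uses a made-up sequence and a hypothetical `find_orfs` function.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ORF; `end` is exclusive."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three reading frames on this strand
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                     # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None                    # in-frame stop closes the ORF
    return sorted(orfs)


print(find_orfs("GGATGAAACCCTGGTAACC"))  # [(2, 17, 'ATGAAACCCTGGTAA')]
```

A hit from a scan like this is only a candidate: as noted above, an ORF still needs supporting evidence, such as a matching mRNA, before it is annotated as a gene.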

Genome Size & Complexity

  • Gene number vs. complexity: Having more genes does not necessarily mean greater biological complexity, and genomes differ vastly in size.
  • C-value Paradox: Refers to the lack of correlation between genome size and organismal complexity.
  • Polyploidy: Common in plants; having more than two sets of chromosomes enlarges the genome without adding new kinds of genes, which helps explain why genome size does not track complexity.