The Organization of the Genome

Basics of DNA and RNA Structure

  • Nucleotide Components: DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) are macromolecules composed of nucleotides. Each nucleotide consists of:
    * A phosphate group.
    * A sugar (ribose in RNA; deoxyribose in DNA).
    * A nitrogenous base.
  • Molecular Numbering and Bonding:
    * Carbon atoms of the ribose/deoxyribose sugar are numbered 1′ to 5′.
    * Covalent bonds form between the phosphate linked to the 5′ carbon of one nucleotide and the 3′ carbon of the sugar of the next nucleotide.
    * This defines the directionality of the strand: a 3′ end and a 5′ end.
  • Nitrogenous Bases:
    * DNA: Adenine (A), Thymine (T), Guanine (G), Cytosine (C).
    * RNA: Adenine (A), Uracil (U), Guanine (G), Cytosine (C).
  • Strand Characteristics:
    * DNA: Usually double-stranded. Hydrogen bonds form between complementary bases (A-T and C-G), resulting in a double helix structure.
    * RNA: Usually single-stranded, but molecules can contain internal base pairs.
  • Genome Size: The human genome contains approximately 3 billion base pairs (bp) of DNA.
  • Information Flow Example:
    * DNA Sequence: 5′ GAATGCCTCGGTACG 3′.
    * Complementary DNA: 3′ CTTACGGAGCCATGC 5′.
    * RNA Transcription: 5′ GAAUGCCUCGGUACG 3′.
    * Translation (Met-Pro-Arg-Tyr): N-terminus M P R Y C-terminus, read from the AUG start codon.
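
The information-flow example above can be reproduced in a few lines of Python. This is a minimal sketch rather than a bioinformatics tool: the function names (`complement`, `transcribe`, `translate`) are illustrative, the given DNA strand is assumed to be the coding strand, and translation is assumed to start at the first AUG and to use the standard genetic code.

```python
# Standard genetic code, with codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def complement(dna: str) -> str:
    """Base-by-base complement; reads 3'->5' if the input reads 5'->3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))


def transcribe(coding_strand: str) -> str:
    """The mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "GAATGCCTCGGTACG"
print(complement(dna))             # CTTACGGAGCCATGC
print(transcribe(dna))             # GAAUGCCUCGGUACG
print(translate(transcribe(dna)))  # MPRY
```

The trailing G of the transcript is left over because it does not complete a codon.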

Human Chromosomes: Number and Structure

  • General Chromosomal State:
    * Every normal human body cell is diploid (2n), containing 2 × 23 chromosomes.
    * This consists of 2 × 22 autosomal chromosomes (autosomes) and two sex chromosomes (XX for females, XY for males).
  • Exceptions to Diploidy:
    * Erythrocytes (no nucleus/DNA).
    * Megakaryocytes (polyploid).
    * Cancer cells (often aneuploid or polyploid).
  • Homologous Chromosomes: A pair of chromosomes where one is inherited from the father and one from the mother.
  • Definitions of Ploidy:
    * Haploid (n): Each chromosome present once (e.g., germ cells like egg and sperm). The haploid human genome is roughly 3 billion bp.
    * Diploid (2n): Each chromosome present twice (e.g., somatic/body cells).
    * Polyploid: Terms include tetraploid, hexaploid, etc. Example: Epulopiscium fishelsoni possesses up to 200,000n.
  • C-value: The amount of DNA in a species' haploid genome.
  • Mitotic Chromosome Structure:
    * Chromatid: One of the two identical copies (sister chromatids) of a duplicated chromosome.
    * Telomere: The end of the chromosome, featuring a special protective structure.
    * Centromere: The constricted region where microtubules attach via the kinetochore during cell division.

Chromosomal Aberrations and Nomenclature

  • Gene Dosage Effect: Trisomies and deletions are detrimental because the lower or higher expression of genes leads to a biological imbalance.
  • Abnormal Chromosome Numbers:
    * Triploidy (69 chromosomes): 69,XXX; 69,XXY; 69,XYY. Survival to term is extremely rare.
    * Aneuploidy (numerical variations):
        * Trisomy 21 (Down syndrome): Survival may reach approximately age 40.
        * Trisomy 13 and 18: May survive to term.
        * 47,XXX: Female.
        * 47,XXY: Male (Klinefelter syndrome).
        * 47,XYY: Male.
        * 45,X: Female (Turner syndrome).
  • Karyotyping and Visualization:
    * Karyotype: An organized profile of a person's chromosomes.
    * Process: Samples (blood, amniotic fluid, tissue) are cultured. Colchicine is added to break down the mitotic spindle, arresting the cells in metaphase. Cells are fixed, treated with trypsin, and stained.
    * G-banding: The most common staining method, using Giemsa. The resulting banding pattern is characteristic and conserved.
  • Position Nomenclature (ISCN):
    * Example: 7q31.2 denotes chromosome 7, long arm (q), region 3, band 1, sub-band 2.
  • Advanced Mapping Techniques:
    * Spectral Karyotyping (SKY): Chromosomes are labeled with distinct fluorescent probes (different colors), allowing high-resolution detection of translocations and deletions.
    * Chromosome Territories: SKY reveals that chromosomes occupy distinct, mutually exclusive places in the interphase nucleus.
    * Array CGH (Comparative Genomic Hybridization): Compares healthy vs. tumor DNA; used in prenatal diagnosis and SNP microarrays.
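
The karyotype strings and the band position used above follow simple patterns that can be unpacked mechanically. The sketch below is illustrative only: the function names and the regular expression are assumptions made for this example, it handles only simple cases such as `47,XXY` and `7q31.2` (not the full ISCN grammar), and the rule "a Y chromosome gives a male phenotype" simply mirrors the karyotypes listed in these notes.

```python
import re

# chromosome, arm (p = short, q = long), region digit, band digit, optional sub-band
BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")


def parse_band(position: str) -> dict:
    """Split a position such as '7q31.2' into its named parts."""
    match = BAND_RE.match(position)
    if match is None:
        raise ValueError(f"not a recognised band position: {position!r}")
    chromosome, arm, region, band, subband = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "short" if arm == "p" else "long",
        "region": int(region),
        "band": int(band),
        "subband": subband,  # kept as text; None when absent
    }


def parse_karyotype(karyotype: str) -> dict:
    """Split a karyotype such as '47,XXY' into chromosome count and sex chromosomes."""
    count, sex_chromosomes = karyotype.split(",")[:2]
    return {
        "count": int(count),
        "sex_chromosomes": sex_chromosomes,
        "sex": "male" if "Y" in sex_chromosomes else "female",
    }


print(parse_band("7q31.2"))
print(parse_karyotype("45,X"))
```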

Extranuclear and Extrachromosomal DNA

  • Endosymbiotic Theory: Mitochondria and chloroplasts are descendants of free-living bacteria.
  • Mitochondrial DNA (mtDNA):
    * Found in all eukaryotes; circular structure.
    * The human mitochondrion contains 1 circular chromosome of approx. 17 kb.
    * Encodes 37 genes: 13 proteins, 22 tRNAs, and 2 rRNAs.
    * Exists in 2-10 copies per mitochondrion.
    * Inheritance: Exclusively maternal. Useful for evolutionary research.
    * Disease: Mutations can lead to genetic diseases such as Leber hereditary optic neuropathy (LHON).
  • Chloroplast DNA (cpDNA): Found in plants; circular structure.

DNA Packaging and Chromatin

  • Physical Constraints: DNA diameter is 2 nm; the length of haploid DNA is approx. 1 m per cell. Packaging is required for stabilization and regulation.
  • Chromatin Composition:
    * 1/3 DNA.
    * 1/3 Histones.
    * 1/3 Non-histone proteins.
  • Histones:
    * Group of basic proteins: H2A, H2B, H3, H4 (core histones) and H1 (linker histone).
    * Highly conserved and rich in positively charged arginine and lysine.
    * Interact with negatively charged DNA to coil it.
  • The Nucleosome:
    * Formed by an octamer of 8 histones (2 each of H2A, H2B, H3, H4).
    * DNA wraps around the histone octamer ("beads on a string"), providing 6× compaction.
    * 30 nm Fiber: Histone H1 induces tighter wrapping to form this fiber.
  • Higher-Order Packaging:
    * Loops are formed and bound to a protein scaffold (containing topoisomerases and condensins).
    * Scaffold Associated Regions (SARs): Specific DNA sequences that determine attachment to the scaffold.
  • Chromatin States:
    * Heterochromatin: Densely stained, DNA is highly condensed, and gene expression is low.
    * Euchromatin: Less condensed, DNA is accessible, and gene expression is higher.
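
The figure of roughly 1 m of DNA per haploid genome can be checked with simple arithmetic. The sketch below assumes a rise of 0.34 nm per base pair for B-form DNA, a value that is not stated in these notes; the genome size and the 6× nucleosome compaction are taken from the text above.

```python
HAPLOID_BP = 3e9           # base pairs in the haploid human genome (from the notes)
RISE_PER_BP_NM = 0.34      # assumption: length per base pair of B-form DNA, in nm
NUCLEOSOME_COMPACTION = 6  # "beads on a string" compaction factor (from the notes)


def contour_length_m(base_pairs: float) -> float:
    """Stretched-out length of a DNA molecule in metres."""
    return base_pairs * RISE_PER_BP_NM * 1e-9


haploid_m = contour_length_m(HAPLOID_BP)
print(f"haploid genome: {haploid_m:.2f} m")      # 1.02 m
print(f"diploid cell:   {2 * haploid_m:.2f} m")  # 2.04 m
print(f"haploid, wrapped on nucleosomes: {haploid_m / NUCLEOSOME_COMPACTION:.2f} m")  # 0.17 m
```

Even after nucleosome wrapping the DNA is still far longer than a cell nucleus is wide, which is why the 30 nm fiber and higher-order loops are needed.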

X-Chromosome Inactivation

  • Function: Overcomes the gene dosage effect in females.
  • Barr Body: The inactivated X-chromosome appears as a dense mass.
  • Mechanism:
    * The Xist (X-inactivation specific transcript) gene encodes an RNA molecule.
    * This RNA binds to the X-chromosome, leading to histone modification and heterochromatin formation.
  • Characteristics: Occurs randomly in mammals during the early embryo stage.
  • Example: The tortoiseshell cat, in which different alleles (orange vs. black fur) on the two X-chromosomes are expressed in different patches of cells due to random inactivation.
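
The patchy pattern follows from two properties described above: the choice of which X to silence is random in each early embryonic cell, and all descendants of that cell keep the same choice. A minimal simulation, assuming a hypothetical cat with the orange allele on the maternal X and the black allele on the paternal X (the function name, cell counts, and seed are all illustrative):

```python
import random

# Hypothetical genotype: which fur allele sits on which X-chromosome.
FUR_ALLELE = {"maternal": "orange", "paternal": "black"}


def simulate_patches(founder_cells: int, divisions: int, seed: int = 0) -> list:
    """Return one list of fur colours per founder cell (one list = one patch)."""
    rng = random.Random(seed)
    patches = []
    for _ in range(founder_cells):
        # Random inactivation: the X that stays active is chosen once per founder cell.
        active_x = rng.choice(["maternal", "paternal"])
        # Clonal inheritance: every descendant expresses the same active X.
        clone_size = 2 ** divisions
        patches.append([FUR_ALLELE[active_x]] * clone_size)
    return patches


for patch in simulate_patches(founder_cells=8, divisions=3):
    print(patch[0], "patch of", len(patch), "cells")
```

Each patch is uniform in colour, while the colour varies from patch to patch.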

Genomic Content: Coding DNA

  • Definition of a Gene: Traditionally, a segment of DNA coding for a protein. More specifically, a region of DNA transcribed as a single unit.
  • Colinearity: The number of nucleotides in a gene is proportional to the number of amino acids in the protein, provided there is a continuous sequence.
  • Gene Structure Components:
    * Promoter: DNA sequence where proteins bind to initiate transcription (upstream).
    * RNA-coding region: The sequence transcribed into RNA.
    * Terminator: The site where transcription ends (downstream).
  • RNA Genes: Some genes code for functional RNAs (rRNA, tRNA) that are never translated into proteins.
  • Splicing:
    * Introns: Non-coding regions within a gene; transcribed into pre-mRNA but spliced out in the nucleus.
    * Exons: Coding sequences that remain to make mature mRNA.
    * Purposes of Splicing:
        1. Alternative Splicing: Produces several different proteins from a single gene.
        2. Evolution: Exon shuffling creates new protein combinations.
        3. Regulation: Introns often contain regulatory sequences.
  • Regulatory DNA Regions:
    * Enhancers/Activators: Increase transcription.
    * Silencers/Repressors: Decrease/block transcription.
    * Core Promoter: Contains the TATA box; binds TATA-binding protein and basal factors.
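
Splicing and alternative splicing, as described above, amount to joining selected exon segments of the pre-mRNA. The sketch below uses a made-up toy transcript (uppercase exons, lowercase introns); the sequences, the `splice` function, and its `skip` parameter are illustrative assumptions, and real splice-site recognition is not modelled.

```python
# Toy pre-mRNA assembled from labelled segments (sequences are invented).
SEGMENTS = [
    ("exon", "AUGGCU"),
    ("intron", "guaaguuuucag"),
    ("exon", "GAAUUC"),
    ("intron", "guaugcacag"),
    ("exon", "CCGUAA"),
]

pre_mrna = "".join(sequence for _, sequence in SEGMENTS)

# Record exon coordinates (0-based, end-exclusive) while walking the segments.
exons = []
position = 0
for kind, sequence in SEGMENTS:
    if kind == "exon":
        exons.append((position, position + len(sequence)))
    position += len(sequence)


def splice(transcript: str, exon_coords: list, skip: tuple = ()) -> str:
    """Join exons in order; exon indices listed in `skip` are left out."""
    return "".join(
        transcript[start:end]
        for index, (start, end) in enumerate(exon_coords)
        if index not in skip
    )


print(splice(pre_mrna, exons))             # AUGGCUGAAUUCCCGUAA (all three exons)
print(splice(pre_mrna, exons, skip=(1,)))  # AUGGCUCCGUAA (exon 2 skipped)
```

Both mature mRNAs come from the same gene, which is how alternative splicing yields several proteins from one transcription unit.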

Genomic Content: Non-Coding DNA

  • The Paradox: Higher organisms have a smaller percentage of protein-coding DNA.
    * Human Genome: Only 1-1.5% codes for proteins.
    * Conservation: 80% of non-coding DNA is conserved between humans and mice, suggesting functional importance.
    * Transcription: Approx. 80% of the total genome is transcribed despite only 1.5% being protein-coding.
  • Types of Non-Coding DNA:
    1. Introns and Regulatory Sequences (24%).
    2. Repetitive DNA (59%):
        * Transposable Elements (44%): Includes Alu elements (10%).
        * Repetitive DNA unrelated to transposons (15%): Includes simple sequence DNA (3%) and large-segment duplications (5-6%).
    3. Unique Non-coding DNA (15%).
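
As a quick consistency check, the percentages above can be tallied as a nested breakdown. The dictionary layout and the `total_percent` helper are illustrative; the protein-coding share is taken at the upper end of the quoted 1-1.5% range.

```python
GENOME_COMPOSITION = {
    "protein-coding": 1.5,  # upper end of the 1-1.5% range
    "introns and regulatory sequences": 24,
    "repetitive DNA": {
        "transposable elements": 44,
        "unrelated to transposons": 15,
    },
    "unique non-coding DNA": 15,
}


def total_percent(node) -> float:
    """Sum a nested breakdown of percentages."""
    if isinstance(node, dict):
        return sum(total_percent(child) for child in node.values())
    return node


print(total_percent(GENOME_COMPOSITION["repetitive DNA"]))  # 59
print(total_percent(GENOME_COMPOSITION))                    # 99.5
```

The categories add up to about 100%, with the small remainder attributable to rounding in the quoted figures.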

Functional and Regulatory RNAs

  • Protein Synthesis/Replication: rRNA, tRNA, SRP-RNA, snRNAs (splicing), snoRNAs, and telomerase RNA.
  • Regulatory RNAs:
    * Long noncoding RNAs (lncRNA).
    * microRNAs (miRNA).
    * Small interfering RNAs (siRNA).
    * Piwi-interacting RNAs (piRNA).
    * Antisense RNA.
  • RNA Interference (RNAi):
    * A method to destroy mRNA with complementary RNA, preventing protein synthesis (gene silencing).
    * Mechanism: Long dsRNA is cleaved by the Dicer enzyme into siRNA. siRNA is loaded into the RISC complex to target and cleave mRNA.
    * Nobel Prize (2006): Awarded to Andrew Z. Fire and Craig C. Mello for this discovery.
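
The targeting step of RNAi rests on base complementarity: the guide strand held by RISC pairs with the stretch of mRNA that is its reverse complement. The sketch below illustrates only that matching step. The sequences are invented, the 21-nucleotide guide length is an assumption about typical Dicer products, and real targeting rules (such as tolerated mismatches) are ignored.

```python
def reverse_complement_rna(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def find_target_sites(mrna: str, guide: str) -> list:
    """0-based start positions in the mRNA that pair perfectly with the guide."""
    site = reverse_complement_rna(guide)
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions


mrna = "GGAUGCCUCGGUACGAUUCAGGCUAAGC"   # made-up mRNA, 5'->3'
target = mrna[5:26]                     # a 21-nucleotide stretch to be silenced
guide = reverse_complement_rna(target)  # guide strand complementary to the target

print(find_target_sites(mrna, guide))      # [5]
print(find_target_sites("AAAAAAAAAAAAAAAAAAAAAAAA", guide))  # []
```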

Repetitive DNA and Forensic Applications

  • Simple Sequence Repeats (SSRs) / Short Tandem Repeats (STRs):
    * Repeats of 1-100 bp (usually 2-7 bp sequence motifs).
    * There are over 10,000 STR loci in the human genome.
  • Polymorphisms:
    * Sequence polymorphism: Differences in the actual nucleotide sequence.
    * Length polymorphism: Differences in the number of times a motif is repeated.
  • DNA Fingerprinting:
    * Based on different repetition numbers (alleles) per person.
    * Single alleles are shared by 5-20% of people, but testing 8-15 loci makes the probability of a match between two unrelated individuals roughly $1:10^{12}$ to $1:10^{18}$.
    * Exception: Monozygotic (identical) twins.
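
Two ideas from this section lend themselves to a short sketch: counting how often a motif is repeated (length polymorphism), and multiplying per-locus match probabilities across loci. The sequences, the GATA motif, and the per-locus genotype match probability of 0.05 are illustrative assumptions, and the multiplication assumes the loci are inherited independently.

```python
import re


def max_tandem_repeats(sequence: str, motif: str) -> int:
    """Number of repeat units in the longest uninterrupted run of `motif`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


def random_match_probability(per_locus: float, loci: int) -> float:
    """Chance that two unrelated people match at every locus (independent loci)."""
    return per_locus ** loci


# Two hypothetical alleles of one STR locus, differing only in repeat number.
allele_a = "CCTA" + "GATA" * 7 + "TTGC"
allele_b = "CCTA" + "GATA" * 10 + "TTGC"
print(max_tandem_repeats(allele_a, "GATA"))  # 7
print(max_tandem_repeats(allele_b, "GATA"))  # 10

# Assumed per-locus genotype match probability of 0.05, tested at 13 loci.
print(f"{random_match_probability(0.05, 13):.1e}")  # 1.2e-17
```

With these assumed inputs the combined probability falls inside the range quoted above; real casework uses measured allele frequencies for each locus.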

Transposons and Genome Duplication

  • Transposons ("Jumping Genes"):
    * DNA sequences that move within the genome (45% of the human genome).
    * Discovered by Barbara McClintock (Nobel Prize 1983).
    * DNA Transposons: "Cut and paste" mechanism; mostly silent in humans.
    * Retrotransposons: "Copy and paste" via an RNA intermediate.
        * LINE (Long Interspersed Nuclear Element): Encodes reverse transcriptase; comprises 20% of the human genome (e.g., LINE-1).
        * SINE (Short Interspersed Nuclear Element): Requires LINEs for movement (e.g., Alu elements).
  • Genome Duplication and Pseudogenes:
    * Genome parts duplicate during evolution. Duplicated genes can acquire new functions or become inactivated.
    * Pseudogene ($\psi$ gene): A DNA sequence that resembles an ancestral gene but no longer encodes a functional protein.
    * Gene Family: A group of related genes. Example: The globin gene family includes alpha, beta, gamma, and delta chains, as well as myoglobin and neuroglobin.

DNA Replication Process

  • Timing: Occurs during the S-phase of the cell cycle.
  • Key Enzymes:
    * Helicase: Unwinds the DNA double helix.
    * Topoisomerase: Relieves over-winding strain ahead of the replication fork.
    * Primase: Synthesizes an RNA primer; necessary because DNA polymerase cannot start de novo.
    * DNA Polymerase: Adds complementary dNTPs to the 3′ end of an existing strand; works exclusively in the 5′ → 3′ direction.
    * Ligase: Joins DNA fragments (e.g., Okazaki fragments) together.
  • Leading vs. Lagging Strand:
    * Leading strand: Synthesized continuously in the direction of the fork.
    * Lagging strand: Synthesized discontinuously as Okazaki fragments in the opposite direction.
  • Centromeres and Cohesins: Replicated chromosomes (sister chromatids) are held together by cohesin proteins until separation in anaphase.
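
The difference between the leading and lagging strand can be sketched with strings. This is a toy model under stated assumptions: both parental strands are written aligned left to right, the fork is assumed to move left to right, primers and their removal are ignored, and the fragment length of 4 is chosen only for display (it is not a real Okazaki fragment size). All function names are illustrative.

```python
def complement(dna: str) -> str:
    """Base-by-base complement, kept in the same left-to-right alignment."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))


def leading_strand(template: str) -> str:
    """Template reads 3'->5' left to right, so the new strand grows 5'->3'
    in the same direction as the fork: one continuous product."""
    return complement(template)


def okazaki_fragments(template: str, fragment_length: int) -> list:
    """Template reads 5'->3' left to right, so each exposed window has to be
    copied right to left, away from the fork. Fragments are returned aligned
    to the template, in the order the fork exposes them."""
    return [
        complement(template[i:i + fragment_length])
        for i in range(0, len(template), fragment_length)
    ]


def ligate(fragments: list) -> str:
    """Ligase joins neighbouring fragments into one continuous strand."""
    return "".join(fragments)


top = "GAATGCCTCGGTACG"   # parental strand, 5'->3' left to right
bottom = complement(top)  # parental strand, 3'->5' left to right

print(leading_strand(bottom))      # continuous copy, identical to `top`
print(okazaki_fragments(top, 4))   # ['CTTA', 'CGGA', 'GCCA', 'TGC']
print(ligate(okazaki_fragments(top, 4)) == bottom)  # True
```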

Telomeres and Telomerase

  • The End-Replication Problem: DNA polymerase cannot replicate the very ends of linear chromosomes. Telomeres shorten by 100-200 bp per replication.
  • Senescence: After 50-100 divisions (the Hayflick limit), cells stop dividing (regulated by p53 and Rb).
  • Telomere Structure:
    * Repeated sequence: TTAGGG (3-20 kb in length).
    * Folds into a loop structure associated with protective proteins.
  • Telomerase:
    * An enzyme expressed in embryonic cells, germ cells (testis), and certain proliferating cells (bone marrow, skin, hair follicles, GI tract).
    * A ribonucleoprotein: a reverse transcriptase that carries an internal RNA template complementary to the telomere repeats, allowing it to extend chromosome ends.
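
The numbers in this section fit together arithmetically: a telomere of a given length that loses a fixed amount per division supports only a limited number of divisions. The sketch below assumes a starting length of 10 kb (within the 3-20 kb range above), no telomerase activity, and, as a simplification, that division continues until the telomere is used up entirely.

```python
def divisions_until_critical(start_bp: int, loss_per_division_bp: int,
                             critical_bp: int = 0) -> int:
    """Whole divisions possible before the telomere drops below `critical_bp`."""
    return max(0, (start_bp - critical_bp) // loss_per_division_bp)


START_BP = 10_000  # assumed starting telomere length

print(divisions_until_critical(START_BP, 100))  # 100
print(divisions_until_critical(START_BP, 200))  # 50
```

With losses of 100-200 bp per replication, this simple model gives 50-100 divisions, in line with the Hayflick limit quoted above.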

Regulation of the Cell Cycle

  • Checkpoints:
    * Restriction Point (G1/S): Influenced by growth factors, nutrients, cell size, and DNA damage.
    * G2/M Transition: Influenced by cell size, DNA damage, and successful completion of replication.
    * Metaphase-Anaphase Transition: Influenced by proper chromosome attachment to the spindle.
  • Molecular Regulators:
    * Cyclins: Phase-specific proteins.
    * Cdks (Cyclin-dependent kinases): Activated by binding to cyclins; they phosphorylate target proteins to drive the cell cycle.
  • Correction: If DNA is damaged, the cell cycle arrests to allow repair. Once the damage is fixed, the cycle resumes and mitosis begins with chromosome condensation (prophase).
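
The checkpoint logic above can be summarised as a set of conditions that must all hold before a transition is allowed. The sketch below is a simplified illustration: the condition names and the `blocking_conditions` function are invented for this example, and the molecular machinery (cyclins, Cdks) is not modelled.

```python
# Conditions checked at each transition, taken from the checkpoint list above.
CHECKPOINTS = {
    "G1/S": ("growth_factors", "nutrients", "cell_size_ok", "dna_undamaged"),
    "G2/M": ("cell_size_ok", "dna_undamaged", "replication_complete"),
    "metaphase/anaphase": ("chromosomes_attached",),
}


def blocking_conditions(transition: str, cell: dict) -> list:
    """Conditions that currently block the transition (empty list = proceed)."""
    return [name for name in CHECKPOINTS[transition] if not cell.get(name, False)]


cell = {
    "cell_size_ok": True,
    "dna_undamaged": False,  # damage detected: the cycle should arrest
    "replication_complete": True,
}
print(blocking_conditions("G2/M", cell))  # ['dna_undamaged']

cell["dna_undamaged"] = True              # damage repaired
print(blocking_conditions("G2/M", cell))  # []
```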

DNA Repair and Meiotic Recombination

  • Double-Strand Break (DSB) Repair Mechanisms:
    * Non-homologous end joining (NHEJ): Direct ligation of broken ends.
    * Homologous recombination repair (HRR): Uses a template; occurs only during S and G2 phases. Involves exonucleases, strand invasion, and branch migration (Holliday junctions).
  • Meiosis I: Homologous recombination occurs (crossing over) to create new combinations of genes on chromosomes.