The Organization of the Genome
Basics of DNA and RNA Structure
- Nucleotide Components: DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) are macromolecules composed of nucleotides. Each nucleotide consists of:
* A phosphate group.
* A sugar (Ribose in RNA; Deoxyribose in DNA).
* A nitrogenous base.
- Molecular Numbering and Bonding:
* Carbon atoms of the ribose/deoxyribose sugar are numbered 1′ to 5′.
* Covalent phosphodiester bonds link the phosphate on the 5′ carbon of one nucleotide to the 3′ carbon of the sugar of the next nucleotide.
* This gives each strand a direction, with a 5′ end and a 3′ end.
- Nitrogenous Bases:
* DNA: Adenine (A), Thymine (T), Guanine (G), Cytosine (C).
* RNA: Adenine (A), Uracil (U), Guanine (G), Cytosine (C).
- Strand Characteristics:
* DNA: Usually double-stranded. Hydrogen bonds form between complementary bases (A−T and C−G), resulting in a double helix structure.
* RNA: Usually single-stranded, but molecules can contain internal base pairs.
- Genome Size: The human genome contains approximately 3 billion base pairs (bp) of DNA.
- Information Flow Example:
* DNA Sequence: 5′ GAATGCCTCGGTACG 3′.
* Complementary DNA: 3′ CTTACGGAGCCATGC 5′.
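* RNA Transcription (the complementary strand serves as the template, so the RNA matches the first strand with U in place of T): 5′ GAAUGCCUCGGUACG 3′.
* Translation (reading from the start codon AUG: Met-Pro-Arg-Tyr): N− M P R Y −C.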
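The information flow example above can be reproduced in a few lines of Python. This is a minimal sketch rather than a full bioinformatics tool: the function names are hypothetical, the codon table is the standard genetic code, and translation is assumed to start at the first AUG and to stop at a stop codon or when no complete codon is left.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def complement(dna: str) -> str:
    """Return the base-by-base complement (written in the opposite polarity)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))


def transcribe(coding_strand: str) -> str:
    """The RNA has the same sequence as the coding strand, with U replacing T."""
    return coding_strand.replace("T", "U")


def translate(rna: str) -> str:
    """Translate from the first AUG, in one-letter amino acid code."""
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "GAATGCCTCGGTACG"          # 5'->3', sequence from the example above
print(complement(dna))           # read 3'->5'
print(transcribe(dna))
print(translate(transcribe(dna)))
```

The trailing G of the example sequence is ignored because it does not form a complete codon.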
Human Chromosomes: Number and Structure
- General Chromosomal State:
* Every normal human body cell is diploid (2n), containing 2×23 chromosomes.
* This consists of 2×22 autosomal chromosomes (autosomes) and sex chromosomes (XX for females, XY for males).
- Exceptions to Diploidy:
* Erythrocytes (no nucleus/DNA).
* Megakaryocytes (polyploid).
* Cancer cells (often aneuploid or polyploid).
- Homologous Chromosomes: A pair of chromosomes where one is inherited from the father and one from the mother.
- Definitions of Ploidy:
* Haploid (n): Each chromosome present once (e.g., gametes such as egg and sperm). The haploid human genome is roughly 3 billion bp.
* Diploid (2n): Each chromosome present twice (e.g., somatic/body cells).
* Polyploid: Terms include tetraploid, hexaploid, etc. Example: Epulopiscium fishelsoni possesses up to 200,000n.
- C-value: The amount of DNA in a species' haploid genome.
- Mitotic Chromosome Structure:
* Chromatid: One of the two identical copies (sister chromatids) of a duplicated chromosome.
* Telomere: The end of the chromosome featuring a special protective structure.
* Centromere: The constricted region (not always in the middle) where microtubules attach via the kinetochore during cell division.
Chromosomal Aberrations and Nomenclature
- Gene Dosage Effect: Trisomies and deletions are detrimental because the higher (trisomy) or lower (deletion) expression of the affected genes leads to a biological imbalance.
- Abnormal Chromosome Numbers:
* Triploidy (69 chromosomes): 69,XXX; 69,XXY; 69,XYY. Extremely rare survival to term.
* Aneuploidy (Numerical variations):
* Trisomy 21 (Down syndrome): Survival may reach approximately age 40.
* Trisomy 13 and 18: May survive to term.
* 47,XXX: Female.
* 47,XXY: Male (Klinefelter syndrome).
* 47,XYY: Male.
* 45,X: Female (Turner syndrome).
- Karyotyping and Visualization:
* Karyotype: An organized profile of a person's chromosomes.
* Process: Samples (blood, amniotic fluid, tissue) are cultured. Colchicine is added to break down the mitotic spindle, arresting dividing cells in metaphase. Cells are then fixed, treated with trypsin, and stained.
* G-banding: The most common staining method using Giemsa. The resulting pattern is characteristic and conserved.
- Position Nomenclature (ISCN):
* Example: 7q31.2 denotes Chromosome 7, long arm (q), region 3, band 1, subband 2.
- Advanced Mapping Techniques:
* Spectral Karyotyping (SKY): Chromosomes are labeled with distinct fluorescent probes (different colors), allowing high-resolution detection of translocations and deletions.
* Chromosome Territories: SKY reveals that chromosomes occupy distinct, mutually exclusive places in the interphase nucleus.
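* Array CGH (Comparative Genomic Hybridization): Compares test DNA (e.g., tumor) against reference DNA (e.g., healthy) to detect copy-number gains and losses; used in tumor analysis and prenatal diagnosis. SNP microarrays are a related array-based method.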
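The ISCN position notation described above (e.g., 7q31.2) is regular enough to be split into its parts with a short parser. The sketch below is illustrative only: the function name and the returned dictionary keys are hypothetical, and it handles just the simple chromosome/arm/region/band/sub-band form, not the full ISCN grammar.

```python
import re

# Chromosome (1-22, X or Y), arm (p or q), one-digit region, one-digit band,
# and an optional sub-band after the dot.
ISCN_BAND = re.compile(
    r"^(?P<chrom>1[0-9]|2[0-2]|[1-9]|X|Y)"
    r"(?P<arm>[pq])"
    r"(?P<region>\d)(?P<band>\d)"
    r"(?:\.(?P<subband>\d+))?$"
)


def parse_band(position: str) -> dict:
    """Split a simple ISCN band position such as '7q31.2' into its parts."""
    match = ISCN_BAND.match(position)
    if match is None:
        raise ValueError(f"not a simple ISCN band position: {position!r}")
    return {
        "chromosome": match["chrom"],
        "arm": "long" if match["arm"] == "q" else "short",
        "region": int(match["region"]),
        "band": int(match["band"]),
        "subband": match["subband"],  # None when no sub-band is given
    }


print(parse_band("7q31.2"))
```

The sub-band is kept as a string because it can have more than one digit.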
Extranuclear and Extrachromosomal DNA
- Endosymbiotic Theory: Mitochondria and chloroplasts are descendants of free-living bacteria.
- Mitochondrial DNA (mtDNA):
* Found in all eukaryotes; circular structure.
* The human mitochondrion contains 1 circular chromosome of approx. 16.6 kb (16,569 bp).
* Encodes 37 genes: 13 proteins, 22 tRNAs, and 2 rRNAs.
* Exists in 2−10 copies per mitochondrion.
* Inheritance: Exclusively maternal. Useful for evolutionary research.
* Disease: Mutations can lead to genetic diseases like Leber hereditary optic neuropathy (LHON).
- Chloroplast DNA (cpDNA): Found in plants; circular structure.
DNA Packaging and Chromatin
- Physical Constraints: The DNA double helix is 2 nm in diameter, yet the haploid genome is approx. 1 m long (about 2 m per diploid cell). Packaging is required to fit the DNA into the nucleus and for stabilization and regulation.
- Chromatin Composition:
* 1/3 DNA.
* 1/3 Histones.
* 1/3 Non-histone proteins.
- Histones:
* Group of basic proteins: H2A, H2B, H3, H4 (core histones) and H1 (linker histone).
* Highly conserved and rich in positively charged Arginine and Lysine.
* Interact with negatively charged DNA to coil it.
- The Nucleosome:
* Formed by a histone octamer (2 each of H2A, H2B, H3, H4).
* DNA wraps around the octamer ("beads on a string"), providing roughly 6× compaction.
* 30 nm Fiber: Histone H1 induces tighter wrapping to form this fiber.
- Higher-Order Packaging:
* Loops are formed and bound to a protein scaffold (containing topoisomerases and condensins).
* Scaffold Associated Regions (SARs): Specific DNA sequences that determine attachment to the scaffold.
- Chromatin States:
* Heterochromatin: Densely stained, DNA is highly condensed, and gene expression is low.
* Euchromatin: Less condensed, DNA is accessible, and gene expression is higher.
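The figure of roughly 1 m of haploid DNA quoted under Physical Constraints can be checked with simple arithmetic. The sketch below assumes a rise of 0.34 nm per base pair (the usual value for B-form DNA, which is not stated in these notes) and uses the 3 billion bp genome size given earlier; the constant and function names are hypothetical.

```python
HAPLOID_GENOME_BP = 3_000_000_000  # genome size from the notes
RISE_PER_BP_NM = 0.34              # assumption: B-form DNA, nm per base pair


def dna_length_m(base_pairs: int, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Contour length in metres of a stretched-out double helix."""
    return base_pairs * rise_nm * 1e-9  # nm -> m


haploid_m = dna_length_m(HAPLOID_GENOME_BP)
diploid_m = 2 * haploid_m  # a diploid cell carries two copies

print(f"haploid: {haploid_m:.2f} m, diploid: {diploid_m:.2f} m")
```

Under these assumptions the haploid genome comes out at about 1 m and the DNA of a diploid cell at about 2 m, which is why several levels of packaging are needed.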
X-Chromosome Inactivation
- Function: Overcomes the gene dosage effect in females.
- Barr Body: The inactivated X-chromosome appears as a dense mass.
- Mechanism:
* The Xist (X-inactivation specific transcript) gene encodes an RNA molecule.
* RNA binds to the X-chromosome, leading to histone modification and heterochromatin formation.
- Characteristics: Occurs randomly in mammals during the early embryo stage.
- Example: The tortoiseshell cat, in which different alleles (orange vs. black fur) on the two X-chromosomes are expressed in different cell patches due to random inactivation.
Genomic Content: Coding DNA
- Definition of a Gene: Traditionally, a stretch of DNA that codes for a protein. More precisely, a region of DNA that is transcribed as a single unit.
- Colinearity: The number of nucleotides in a gene is proportional to the number of amino acids in the protein, provided there is a continuous sequence.
- Gene Structure Components:
* Promoter: DNA sequence where proteins bind to initiate transcription (upstream).
* RNA-coding region: The sequence transcribed into RNA.
* Terminator: The site where transcription ends (downstream).
- RNA Genes: Some genes code for functional RNAs (rRNA, tRNA) that are never translated into proteins.
- Splicing:
* Introns: Non-coding regions within a gene; transcribed into pre-mRNA but spliced out in the nucleus.
* Exons: Coding sequences that remain to make mature mRNA.
* Purposes of Splicing:
1. Alternative Splicing: Produces several different proteins from a single gene.
2. Evolution: Exon shuffling creates new protein combinations.
3. Regulation: Introns often contain regulatory sequences.
- Regulatory DNA Regions:
* Enhancers: DNA sequences that increase transcription when bound by activator proteins.
* Silencers: DNA sequences that decrease or block transcription when bound by repressor proteins.
* Core Promoter: Contains the TATA box; binds TATA-binding protein and basal factors.
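Splicing and alternative splicing, as described above, amount to joining exons while dropping introns and, optionally, some exons. The sketch below illustrates this on a toy pre-mRNA. The sequences, the segment representation, and the function name are all hypothetical; real splicing depends on splice-site recognition by the spliceosome, which is not modeled here.

```python
# Toy pre-mRNA as an ordered list of (kind, sequence) segments.
# The sequences are made up; the introns merely follow the common
# pattern of starting with GU and ending with AG.
PRE_MRNA = [
    ("exon", "AUGGCC"),
    ("intron", "GUAAGUUUUCAG"),
    ("exon", "AAAGGG"),
    ("intron", "GUAUGUCCCCAG"),
    ("exon", "UUUUAA"),
]


def splice(segments, skipped_exons=frozenset()):
    """Join exons in order, dropping introns and any skipped exons.

    Exons are numbered from 1 in the order they appear.
    """
    mature = []
    exon_number = 0
    for kind, sequence in segments:
        if kind != "exon":
            continue  # introns are removed
        exon_number += 1
        if exon_number not in skipped_exons:
            mature.append(sequence)
    return "".join(mature)


print(splice(PRE_MRNA))                     # all three exons
print(splice(PRE_MRNA, skipped_exons={2}))  # exon 2 skipped
```

Skipping exon 2 yields a shorter mature mRNA from the same gene, which is the basic idea behind one gene giving rise to several proteins.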
Genomic Content: Non-Coding DNA
- The Paradox: Higher organisms have a smaller percentage of protein-coding DNA.
* Human Genome: Only 1−1.5% codes for proteins.
* Conservation: 80% of non-coding DNA is conserved between humans and mice, suggesting functional importance.
* Transcription: Approx. 80% of the total genome is transcribed despite only 1.5% being protein-coding.
- Types of Non-Coding DNA:
1. Introns and Regulatory Sequences (24%).
2. Repetitive DNA (59%):
* Transposable Elements (44%): Includes Alu elements (10%).
* Repetitive DNA unrelated to transposons (15%): Includes Simple sequence DNA (3%) and Large-segment duplications (5−6%).
3. Unique Non-coding DNA (15%).
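The percentages listed above can be tallied to confirm that they account for essentially the whole genome. The sketch below uses only the figures from these notes (taking 1.5% as the protein-coding share) and converts shares to approximate base-pair counts for a 3 billion bp genome; the dictionary keys and the helper name are hypothetical.

```python
GENOME_BP = 3_000_000_000  # haploid genome size from the notes

# Shares of the genome in percent, as given in the notes.
COMPOSITION = {
    "protein-coding": 1.5,
    "introns and regulatory sequences": 24.0,
    "transposable elements": 44.0,
    "other repetitive DNA": 15.0,
    "unique non-coding DNA": 15.0,
}


def percent_to_bp(percent: float, genome_bp: int = GENOME_BP) -> int:
    """Approximate number of base pairs for a given share of the genome."""
    return round(genome_bp * percent / 100)


total_percent = sum(COMPOSITION.values())
repetitive_percent = (
    COMPOSITION["transposable elements"] + COMPOSITION["other repetitive DNA"]
)

for name, percent in COMPOSITION.items():
    print(f"{name:35s} {percent:5.1f}%  ~{percent_to_bp(percent):,} bp")
print(f"total: {total_percent}%  (repetitive: {repetitive_percent}%)")
```

The categories add up to 99.5%, and the two repetitive classes together give the 59% quoted for repetitive DNA.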
Functional and Regulatory RNAs
- Protein Synthesis/Replication: rRNA, tRNA, SRP-RNA, snRNAs (splicing), snoRNAs, and telomerase RNA.
- Regulatory RNAs:
* Long noncoding RNAs (lncRNA).
* microRNAs (miRNA).
* Small interfering RNAs (siRNA).
* Piwi interacting RNA (piRNA).
* Antisense RNA.
- RNA Interference (RNAi):
* A mechanism in which mRNA is destroyed with the help of complementary RNA, preventing protein synthesis (gene silencing).
* Mechanism: Long dsRNA is cleaved by the Dicer enzyme into siRNA. siRNA is loaded into the RISC complex to target and cleave mRNA.
* Nobel Prize (2006): Awarded to Andrew Z. Fire and Craig C. Mello for this discovery.
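The targeting step of RNA interference rests on base complementarity between the siRNA guide strand and the mRNA. The sketch below only checks for a perfectly complementary site. The sequences and function names are hypothetical, the guide is shorter than a real siRNA for readability, and the roles of Dicer and the RISC proteins are not modeled.

```python
def reverse_complement_rna(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def find_target_site(mrna: str, guide: str) -> int:
    """Index in the mRNA where the guide pairs perfectly, or -1 if none.

    The guide binds antiparallel to the mRNA, so the site to look for
    is the reverse complement of the guide.
    """
    return mrna.find(reverse_complement_rna(guide))


mrna = "GGAUGCCUCGGUACGAA"  # toy mRNA, 5'->3'
guide = "GUACCGAGG"         # toy guide strand, 5'->3'

site = find_target_site(mrna, guide)
print(site, mrna[site:site + len(guide)])
```

A return value of -1 means the toy guide has no perfectly complementary site, so this mRNA would not be targeted under the simplified rule used here.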
Repetitive DNA and Forensic Applications
- Simple Sequence Repeats (SSRs) / Short Tandem Repeats (STRs):
* Repeats of 1−100bp (usually 2−7bp sequence motifs).
* There are over 10,000 STR loci in the human genome.
- Polymorphisms:
* Sequence polymorphism: Differences in the actual nucleotide sequence.
* Length polymorphism: Differences in the number of times a motif is repeated.
- DNA Fingerprinting:
* Based on different repetition numbers (alleles) per person.
* Single alleles are shared by 5−20% of people, but testing 8−15 loci makes the probability of a match between two unrelated individuals roughly $1:10^{12}$ to $1:10^{18}$.
* Exception: Monozygotic (identical) twins.
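The very small match probabilities quoted above follow from multiplying per-locus probabilities. The sketch below assumes that the loci are inherited independently (the product rule) and uses a hypothetical per-locus match probability of 0.1 purely for illustration; real casework uses population-specific genotype frequencies, and the function name is not taken from any forensic software.

```python
import math


def random_match_probability(per_locus_probabilities) -> float:
    """Probability that an unrelated person matches at every tested locus.

    Assumes the loci are independent, so the per-locus probabilities
    are simply multiplied.
    """
    return math.prod(per_locus_probabilities)


# Hypothetical example: 13 loci, each matching by chance with probability 0.1.
probability = random_match_probability([0.1] * 13)
print(f"about 1 in {1 / probability:.0e}")
```

With these illustrative numbers the result is about 1 in 10^13, which lies within the range given in the notes. Monozygotic twins violate the assumption of unrelated individuals, so the product rule does not apply to them.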
Transposons and Genome Duplication
- Transposons ("Jumping Genes"):
* DNA sequences that move within the genome (about 44−45% of the human genome).
* Discovered by Barbara McClintock (Nobel Prize 1983).
* DNA Transposons: "Cut and paste" mechanism; mostly silent in humans.
* Retrotransposons: "Copy and paste" via RNA intermediate.
* LINE (Long Interspersed Nuclear Element): Encodes reverse transcriptase; comprises 20% of human genome (e.g., LINE-1).
* SINE (Short Interspersed Nuclear Element): Requires LINEs for movement; (e.g., Alu elements).
- Genome Duplication and Pseudogenes:
* Genome parts duplicate during evolution. Duplicated genes can acquire new functions or become inactivated.
* Pseudogene ($\psi$ gene): A DNA sequence that resembles an ancestral gene but no longer encodes a functional protein.
* Gene Family: A group of related genes. Example: The Globin gene family includes alpha, beta, gamma, and delta chains, as well as myoglobin and neuroglobin.
DNA Replication Process
- Timing: Occurs during the S-phase of the cell cycle.
- Enzymes Involved:
* Helicase: Unwinds the DNA double helix.
* Topoisomerase: Relieves over-winding strain ahead of the replication fork.
* Primase: Synthesizes an RNA primer; necessary because DNA polymerase cannot start de novo.
* DNA Polymerase: Adds complementary dNTPs to the 3′ end of an existing strand; works exclusively in the 5′→3′ direction.
* Ligase: Joins DNA fragments (e.g., Okazaki fragments) together.
- Leading vs. Lagging Strand:
* Leading strand: Synthesized continuously in the direction of the fork.
* Lagging strand: Synthesized discontinuously as Okazaki fragments in the opposite direction.
- Centromeres and Cohesins: Replicated chromosomes (sister chromatids) are held together by cohesin proteins until separation in anaphase.
Telomeres and Telomerase
- The End-Replication Problem: DNA polymerase cannot replicate the very ends of linear chromosomes. Telomeres shorten by 100−200bp per replication.
- Senescence: After 50−100 divisions (the Hayflick limit), cells stop dividing (regulated by p53 and Rb).
- Telomere Structure:
* Repeated sequence: TTAGGG (3−20kb length).
* Folds into a loop structure associated with protective proteins.
- Telomerase:
* An enzyme expressed in embryonic cells, germ cells (testis), and certain proliferating cells (bone marrow, skin, hair follicles, GI tract).
* A ribonucleoprotein (a reverse transcriptase with an RNA subunit) containing an internal RNA template complementary to telomere repeats, allowing it to extend chromosome ends.
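The link between telomere shortening and the limited number of divisions can be illustrated with a simple count. The sketch below assumes a constant loss per division and a fixed critical length at which the cell stops dividing. The critical length of 2,500 bp is a hypothetical value chosen for illustration, the starting length and loss are picked from within the ranges given above, and telomerase activity is ignored.

```python
def divisions_until_senescence(start_bp: int,
                               loss_per_division_bp: int,
                               critical_bp: int) -> int:
    """Count divisions possible before the telomere would fall below
    the critical length, assuming a constant loss per division."""
    divisions = 0
    length = start_bp
    while length - loss_per_division_bp >= critical_bp:
        length -= loss_per_division_bp
        divisions += 1
    return divisions


# Illustrative values: 10 kb telomere, 150 bp lost per division,
# hypothetical critical length of 2.5 kb.
print(divisions_until_senescence(10_000, 150, 2_500))
```

With these illustrative values the count is 50 divisions, at the lower end of the range quoted for the Hayflick limit in these notes.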
Regulation of the Cell Cycle
- Checkpoints:
* Restriction Point (G1/S): Influenced by growth factors, nutrients, cell size, and DNA damage.
* G2/M Transition: Influenced by cell size, DNA damage, and successful replication completion.
* Metaphase-Anaphase Transition: Influenced by proper chromosome attachment to the spindle.
- Molecular Regulators:
* Cyclins: Phase-specific proteins.
* Cdks (Cyclin-dependent kinases): Activated by binding to cyclins; they phosphorylate target proteins to drive the cell cycle.
- Correction: If DNA is damaged, the cell cycle arrests to allow repair. Once the damage is repaired, the cycle resumes, and mitosis begins with chromosome condensation (prophase).
DNA Repair and Meiotic Recombination
- Double Strand Break (DSB) Repair mechanisms:
* Non-homologous end joining (NHEJ): Direct ligation of broken ends.
* Homologous recombination repair (HRR): Uses a template; occurs only during S and G2 phases. Involves exonucleases, strand invasion, and branch migration (Holliday junctions).
- Meiosis I: During prophase I, homologous recombination (crossing over) creates new combinations of alleles on chromosomes.