RNA-Based Gene Regulation
mRNA Secondary Structures
RNA molecules can fold into secondary structures, which can regulate gene expression. Examples include:
- Terminator sequence in the 5' UTR of the trp operon (attenuation).
- Riboswitches.
Some structures can be unwound by helicases, while others directly influence translation or transcription.
Riboswitches
- Found in bacteria, archaea, fungi, and plants (not animals).
- Located in the 5' UTR of an mRNA.
- Bind a ligand (regulatory molecule like ions, nucleotides, or amino acids).
- Upon binding, the structure of the mRNA changes (refolds), affecting:
- Transcription: May cause premature termination (in prokaryotes).
- Translation: May hide or expose the ribosome binding site (in prokaryotes and eukaryotes).
- Riboswitches contain an aptamer domain that binds the ligand.
Antisense RNA (asRNA)
- Non-coding RNAs that are complementary to a target mRNA (about 50 to more than 1,000 nucleotides long).
- Bind to the mRNA to prevent translation.
- May originate from another gene or the non-template strand of the target gene.
- Found in both prokaryotes and eukaryotes.
RNA Interference (RNAi)
- Evolved as a defense against double-stranded RNA viruses.
- Key enzymes and complexes:
- Dicer: Cuts dsRNA into small interfering RNAs (siRNAs) or microRNAs (miRNAs).
- RISC (RNA-induced silencing complex): Uses siRNAs/miRNAs to bind complementary mRNA.
- Binding results in mRNA degradation or translation inhibition.
miRNA (microRNA)
- ~21–25 nucleotides long, single-stranded RNA.
- Transcribed from different genes (not from the target).
- Binds to mRNA imperfectly, blocking translation without degradation.
- Functions in gene regulation.
siRNA (small interfering RNA)
- ~21–25 nucleotides, derived from the target gene itself.
- Binds perfectly to the target mRNA.
- Causes mRNA degradation, leading to no translation.
- Can also act via RITS (RNA-induced transcriptional silencing) to:
- Recruit methyltransferase enzymes.
- Methylate DNA → epigenetic silencing.
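The contrast between imperfect (miRNA) and perfect (siRNA) pairing can be illustrated with a small sketch. The function names, the example guide sequence, and the rule "zero mismatches means degradation" are illustrative assumptions based on the simplification in these notes; real RISC targeting is more nuanced.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def count_mismatches(small_rna, target_site):
    """Count positions where the small RNA fails to pair with the target site.

    Both sequences are written 5'->3' and must have equal length.
    """
    if len(small_rna) != len(target_site):
        raise ValueError("sequences must have the same length")
    ideal_target = reverse_complement(small_rna)
    return sum(1 for a, b in zip(ideal_target, target_site) if a != b)

def predicted_outcome(small_rna, target_site):
    """Apply the simplified rule from the notes (hypothetical helper)."""
    if count_mismatches(small_rna, target_site) == 0:
        return "perfect pairing: mRNA degradation (siRNA-like)"
    return "imperfect pairing: translation blocked (miRNA-like)"

guide = "UAGCUUAUCAGACUGAUGUUGA"           # example 22-nt guide strand
perfect_site = reverse_complement(guide)    # fully complementary target
# Change one base (U -> A at index 10) to create a single mismatch.
imperfect_site = perfect_site[:10] + "A" + perfect_site[11:]

print(predicted_outcome(guide, perfect_site))
print(predicted_outcome(guide, imperfect_site))
```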
Experimental RNAi
- Scientists can artificially introduce dsRNA into cells or tissues.
- Dicer processes this RNA, which activates the RNAi pathway.
- Used for targeted gene knockdown in research or medicine.
- Effects are temporary and localized to the treated area.
Long Noncoding RNA (lncRNA)
- Transcripts ≥200 nucleotides (can be over 10 kb).
- Can bind to DNA, RNA, or proteins to regulate gene expression.
- Example: Xist (X-inactive specific transcript).
- Coats one X chromosome in females.
- Leads to heterochromatin formation → X inactivation.
- Inactivation is random (Lyon hypothesis) and happens in early development.
- Explains mosaic expression of X-linked genes.
Chromosomal Variants and Rearrangements
General Concepts
- Chromosomal variants are larger-scale mutations compared to point mutations.
- Can be:
- Balanced (no net gain/loss of genetic info).
- Unbalanced (deletion/duplication = gene dosage change).
- Common forms include duplications, deletions, inversions, and translocations.
Duplication
- A chromosomal segment is copied and inserted.
- Types:
- Tandem: adjacent to the original.
- Dispersed: elsewhere on the chromosome or genome.
- Can create copy number variants (CNVs) and paralogous genes.
- Increases gene dosage, which may alter phenotype.
- Paralogs: duplicated genes within a species that can diverge and evolve new functions.
Deletion
- Loss of a chromosome segment.
- May be:
- Visible on a karyotype if large.
- Lethal if homozygous.
- May expose recessive alleles in heterozygotes (pseudodominance).
Inversion
- A chromosome segment is flipped 180°.
- Types:
- Paracentric: does not include centromere.
- Pericentric: includes centromere.
- In heterozygotes:
- Crossing over within the inversion can produce:
- Unbalanced gametes (deletions/duplications).
- Infertility (3–5% of infertile couples carry inversions).
Translocation
- Movement of a segment to a nonhomologous chromosome.
- Types:
- Reciprocal: exchange between chromosomes.
- Non-reciprocal: one-way movement.
- Can create complex pairing during meiosis, leading to unbalanced gametes.
- Robertsonian translocation: occurs at or near centromeres of acrocentric chromosomes (common in humans).
Mechanisms of Rearrangement
- Chromosomal breaks and faulty repair.
- Unequal crossing over.
- Transposable elements (TEs):
- DNA transposons: cut and paste via transposase.
- Retrotransposons: copy and paste via reverse transcription.
Polyploidy
Autopolyploidy
- Multiple sets of chromosomes from one species (e.g., 3N, 4N).
- Common in plants; rare and often lethal in mammals.
- Leads to:
- Larger phenotypes due to gene dosage.
- Sterility, especially in triploids.
Allopolyploidy
- Combines chromosome sets from two or more species.
- Initial hybrids are often sterile but may become fertile if chromosome pairing is preserved.
- Requires similar chromosome number and gene order (synteny).
Genomics
Genomics is the study of the content, organization, function, and evolution of genetic material across entire genomes.
Types of Genomics
- Structural Genomics: Focuses on the sequence and arrangement of genes within a genome.
- Key questions: What is the full sequence of a genome? Where are genes located, and how are they arranged?
- Methods: DNA sequencing (e.g., Illumina), bioinformatics for assembly and annotation.
- Products: Assembled genomes, gene annotations.
- Functional Genomics: Explores how genetic variation influences phenotypic traits.
- Key questions: What proteins and RNAs are encoded by genes? When and where are they expressed?
- Methods: SNP analysis, RNA expression profiling, GWAS, statistical comparisons.
- Products: Candidate genes linked to traits for further testing.
- Comparative Genomics: Compares gene content and structure across species to understand evolutionary changes.
- Looks at similarities and differences in genes and their organization among organisms.
Genome Assembly
A genome assembly is the reconstructed sequence of an organism's entire genome, built from sequencing reads. Assemblies are revised over time, so the reference is the most current version.
Steps in Genome Assembly
- Shotgun Sequencing:
- Tissue is collected, and DNA is extracted and fragmented.
- Short DNA pieces are sequenced using high-throughput technologies like Illumina.
- Alignment of Reads:
- Short reads are aligned using sequence overlaps to build contigs (contiguous sequences).
- Regions with high read depth (number of times a base is sequenced) have greater alignment confidence.
- Repetitive sequences are difficult to align due to similar overlapping regions.
- Scaffold Construction:
- Contigs are aligned to known genetic markers and mapped onto chromosomes.
- Gaps in scaffold sequences are often denoted with “N”s.
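The read-overlap step can be sketched with a toy greedy assembler. This is a minimal illustration with made-up reads and a hypothetical `min_overlap` parameter; real assemblers use overlap or de Bruijn graphs and must handle sequencing errors and repeats, which this sketch ignores.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # no overlaps left; unmerged pieces stay separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

reads = ["ATGCGTAC", "GTACCTGA", "CTGATTAG"]  # toy reads with 4-base overlaps
contigs = greedy_assemble(reads)
print(contigs)  # ['ATGCGTACCTGATTAG']
```

Because merging relies only on matching ends, two reads from different copies of a repeat would look identical to this procedure, which is why repetitive regions are hard to assemble.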
Gene Annotation
Gene annotation identifies and describes genes and functional regions in the genome.
Ab Initio Prediction
- Uses bioinformatics to search for necessary gene components:
- Open reading frames (ORFs): sequences starting with ATG (AUG in the mRNA) and ending with TAA, TAG, or TGA, capable of coding proteins.
- Regulatory motifs:
- Prokaryotes: promoters, terminators, Shine-Dalgarno sequences.
- Eukaryotes: TATA box, regulatory promoters, splice site signals (5' and 3'), poly-A signals, and CpG islands (often at gene 5' ends).
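The ORF search can be sketched as a scan of the three forward reading frames. This is a simplified illustration: the function name and the `min_codons` threshold are assumptions, only the forward strand is scanned, and real gene finders also check the reverse strand and use statistical models of the motifs listed above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    min_codons counts the start codon but not the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None                     # look for the next ORF in this frame
    return orfs

sequence = "CCATGGCTGAATTTTAGGC"  # toy sequence: ATG GCT GAA TTT TAG in frame 2
print(find_orfs(sequence))  # [(2, 17, 'ATGGCTGAATTTTAG')]
```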
Homology-Based Annotation
- Uses known expressed genes or protein domains to predict new gene functions.
- Aligns genome sequence to mRNA/cDNA (e.g., from RNA-seq).
- Identifies only expressed genes in a given tissue.
- Can also detect conserved protein domains (e.g., zinc finger = DNA-binding protein).
Functional Genomics
Functional genomics identifies genetic variants associated with phenotypes.
Key Goals
- Associate traits (phenotypes) with specific genetic variants.
- Understand gene expression: what genes are turned on/off, when, where, and how much.
How Trait Association Works
- Choose contrasting groups (differ by phenotype):
- Family-based studies: within the same lineage.
- GWAS (Genome-Wide Association Studies): compare unrelated individuals.
- Quantitative breeding experiments: cross individuals with trait differences to study offspring.
- Characterize Genetic Variation:
- Study SNPs (single nucleotide polymorphisms).
- SNPs can be inherited or arise from mutations.
- Used as markers, even if not causative.
- Use Statistics to Find Associations:
- Determine if a SNP is significantly more common in affected individuals.
- Represented using Manhattan plots:
- X-axis: SNP position on genome.
- Y-axis: significance of association.
- SNPs in close proximity may be associated due to linkage (inherited together).
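The statistics step can be sketched for a single SNP as a chi-square test on allele counts in affected versus unaffected individuals. The counts below are made up for illustration, and the choice of an allelic 2x2 test is an assumption (real GWAS pipelines often use regression with covariates). The Y-axis of a Manhattan plot is conventionally -log10 of the p-value.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Each argument is (count of allele A, count of allele a).
    Returns (chi_square, p_value).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][k] + table[1][k] for k in range(2)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / grand_total
            chi_square += (table[r][c] - expected) ** 2 / expected
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical allele counts for one SNP (made-up numbers).
chi_square, p_value = allelic_chi_square(case_counts=(120, 80),
                                         control_counts=(90, 110))
manhattan_y = -math.log10(p_value)  # height of this SNP on a Manhattan plot
print(round(chi_square, 2), p_value, manhattan_y)
```

In a genome-wide scan this test is repeated for every SNP, so the significance threshold has to be made much stricter than 0.05 to account for multiple testing.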
Technologies to Detect Genetic Variants & Expression
DNA Microarrays (DNA Chips)
- Detect an individual's SNPs at thousands of known locations.
- Method:
- Single-stranded DNA probes fixed on a glass slide.
- Sample DNA is fragmented, amplified, denatured, and hybridized to probes.
- Fluorescently labeled nucleotides bind to reveal SNP identity.
- Example: Affymetrix 6.0 DNA chip (906,600 SNPs on autosomes, X/Y, mitochondria).
Genomic Resequencing
- No prior SNP knowledge needed.
- Whole-genome sequencing of an individual using Illumina.
- Align to a reference genome to find new SNPs and CNVs (Copy Number Variants) based on read depth.
RNA Microarrays
- Measure gene expression levels across samples.
- Steps:
- Extract RNA → reverse transcribe to cDNA.
- cDNA labeled and hybridized to probes.
- Fluorescent signal reflects transcription level.
- Competitive hybridization can be used to compare expression between two samples.
RNA Sequencing (RNA-seq)
- Provides a quantitative and comprehensive look at transcription.
- Steps:
- RNA → cDNA via reverse transcription.
- cDNA sequenced via Illumina.
- Aligned to genome:
- Shows which genes are expressed.
- Reveals alternative splicing and exon usage.
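- Read depth reflects expression level: more reads mapped to a gene indicate more transcript, once total sequencing depth is accounted for.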
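Raw read counts also depend on how deeply each sample was sequenced, so counts are usually normalized before comparing samples. A minimal sketch using counts per million (CPM) follows; the gene names and counts are hypothetical, and CPM does not correct for gene length (measures such as TPM do).

```python
def counts_per_million(read_counts):
    """Scale raw per-gene read counts so each sample sums to one million."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: count * 1_000_000 / total for gene, count in read_counts.items()}

# Hypothetical read counts for the same three genes in two samples.
sample_a = {"geneX": 500, "geneY": 1500, "geneZ": 8000}     # 10,000 reads total
sample_b = {"geneX": 1000, "geneY": 3000, "geneZ": 16000}   # 20,000 reads total

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
print(cpm_a["geneX"], cpm_b["geneX"])  # 50000.0 50000.0
```

Sample B has twice the raw reads for every gene, but after normalization the two samples show the same relative expression.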
Visualizing Gene Expression
Volcano Plots
- Each point = a gene.
- X-axis: log-fold change in expression between groups.
- Y-axis: statistical significance.
- Shows both magnitude and reliability of gene expression differences.
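The two volcano-plot coordinates can be computed per gene as sketched below. The mean expression values and p-values are made up, and the pseudocount and cutoffs are illustrative assumptions rather than fixed standards.

```python
import math

def volcano_point(mean_a, mean_b, p_value, pseudocount=1.0):
    """Return (log2 fold change of B over A, -log10 p) for one gene.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    log2_fc = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return log2_fc, -math.log10(p_value)

def classify(log2_fc, neg_log10_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene using illustrative cutoffs (assumed, not from the notes)."""
    if neg_log10_p < -math.log10(p_cutoff) or abs(log2_fc) < fc_cutoff:
        return "not significant"
    return "up in B" if log2_fc > 0 else "down in B"

# Hypothetical genes: (mean expression in A, mean expression in B, p-value).
genes = {
    "geneX": (99, 399, 0.001),
    "geneY": (200, 210, 0.4),
    "geneZ": (799, 99, 0.0001),
}
for name, (mean_a, mean_b, p_value) in genes.items():
    x, y = volcano_point(mean_a, mean_b, p_value)
    print(name, round(x, 2), round(y, 2), classify(x, y))
```

Genes far from zero on the X-axis and high on the Y-axis (upper left and upper right corners) are the ones with large, reliable expression differences.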
Heat Maps
- Grid format with color-coding of expression intensity.
- Columns: samples (grouped by condition).
- Rows: genes.
- Color scale reflects expression level per gene per sample.
- Useful for detecting expression patterns and clusters.
What Comes After Identifying Trait-Associated Variants
- Correlation ≠ Causation → test experimentally.
Experimental Validation Techniques
- CRISPR-Cas9: Edit SNP or sequence to test its role in phenotype.
- RNAi: Temporarily silence a gene to observe trait change.
- Transgenic/Mutant Models: Introduce mutations or genes into organisms (e.g., mice, cell lines) to assess effect on phenotype.
Genomics in Clinical Medicine
- Clinical genetic testing uses trait-associated variants to inform healthcare decisions.
Applications of Clinical Genomics
- Diagnosis of disease or identifying genetic basis of symptoms.
- Determining severity of conditions.
- Personalized medicine: Matching patients with treatments based on genetic profile.
- Risk prediction: Identifying individuals at increased risk for future disease.
Caveats in Genomic Medicine
- Most studies overrepresent individuals of European descent.
- Lack of diversity limits the ability to generalize findings to other populations.
- Important because allele frequencies, gene-environment interactions, and heterogeneity differ across populations.
Genetic Variation and Evolution
- Changes to genetic information (mutations) produce variation.
- Mutations accumulated across generations underlie current genetic diversity.
- Genetic variation patterns can be used to reconstruct evolutionary relationships.
Influence of Genetic Transmission, Expression, and Change on Populations
- Genetic study of populations involves:
- Tracking allele frequencies
- Understanding speciation
- Using genetics to explore evolutionary relationships
Population Genetics
- Population genetics studies how genetic composition changes over time in response to evolutionary forces.
- A population is a group of individuals of the same species capable of interbreeding.
- Populations may be physically isolated or may exist across a continuous distribution.
Genotype and Allele Frequencies
- Used to characterize genetic composition of populations.
- Comparing frequencies between populations allows detection of population differentiation.
- Tracking changes over generations helps detect evolution.
Quantifying Genetic Frequencies
Genotypic Frequency
- Determined from data when genotypes can be distinguished (e.g., codominance or using DNA sequences).
Allelic Frequency
- Can be calculated from observed genotypes.
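A minimal sketch of both calculations for one locus with two alleles (A and a), assuming every genotype can be distinguished. The function name and the genotype counts are made up for illustration.

```python
def genotype_and_allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (genotype frequencies, p, q) from observed genotype counts.

    p is the frequency of allele A and q the frequency of allele a.
    Each individual carries two alleles, so the allele total is 2N.
    """
    total = n_AA + n_Aa + n_aa
    genotype_freqs = {
        "AA": n_AA / total,
        "Aa": n_Aa / total,
        "aa": n_aa / total,
    }
    p = (2 * n_AA + n_Aa) / (2 * total)  # AA carries two A alleles, Aa carries one
    q = (2 * n_aa + n_Aa) / (2 * total)
    return genotype_freqs, p, q

# Hypothetical sample of 100 individuals.
genotype_freqs, p, q = genotype_and_allele_frequencies(n_AA=50, n_Aa=30, n_aa=20)
print(genotype_freqs, p, q)  # p = 0.65, q = 0.35
```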
Hardy-Weinberg Equilibrium (HWE)
A model that assumes an idealized population to predict genotype frequencies from allele frequencies.
Assumptions of HWE
- No migration
- Random mating
- No natural selection
- No mutation
- Infinitely large population
HWE Equations
- Allele frequency: p + q = 1
- Genotype frequencies: p^2 + 2pq + q^2 = 1
- p^2 = frequency of homozygous dominant genotype
- 2pq = frequency of heterozygous genotype
- q^2 = frequency of homozygous recessive genotype
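A common use of these equations is estimating the frequency of heterozygous carriers from the frequency of a recessive phenotype. The sketch below assumes the population is in HWE at the locus; the incidence of 1 in 2,500 is a hypothetical value chosen for easy arithmetic.

```python
import math

def carrier_frequency(recessive_phenotype_freq):
    """Estimate p, q, and the heterozygote frequency 2pq, assuming HWE holds."""
    q = math.sqrt(recessive_phenotype_freq)  # q^2 is the observed frequency
    p = 1 - q                                # because p + q = 1
    return p, q, 2 * p * q

# Hypothetical recessive trait seen in 1 of every 2,500 individuals.
p, q, carriers = carrier_frequency(1 / 2500)
print(q, p, carriers)  # approximately 0.02, 0.98, 0.0392
```

Even though the recessive phenotype is rare (0.04%), roughly 4% of individuals are expected to be carriers.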
Determining Whether a Population Is Evolving
- Define a null hypothesis: genotype frequencies are as predicted under HWE.
- Compare observed genotype counts with HWE expectations.
- Significant differences mean that at least one HWE assumption is violated, which suggests evolutionary change or non-random mating.
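The comparison of observed and expected genotype counts is typically done with a chi-square goodness-of-fit test. The sketch below uses made-up counts and assumes one degree of freedom (three genotype classes, minus one, minus one for estimating p from the data).

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Test observed genotype counts against HWE expectations.

    Returns (expected counts, chi-square, p-value). Assumes both alleles
    are present in the sample, so no expected count is zero.
    """
    total = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * total)   # allele frequency from the data
    q = 1 - p
    observed = [n_AA, n_Aa, n_aa]
    expected = [p * p * total, 2 * p * q * total, q * q * total]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return expected, chi_square, p_value

# Hypothetical sample: fewer heterozygotes than HWE predicts.
expected, chi_square, p_value = hwe_chi_square(n_AA=50, n_Aa=30, n_aa=20)
print([round(e, 2) for e in expected], round(chi_square, 2), p_value)
```

Here the expected counts are 42.25, 45.5, and 12.25, and the p-value falls below 0.05, so the null hypothesis of HWE would be rejected for this sample.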
Deviation from HWE in Real Populations
- Real populations may not meet all assumptions of HWE.
- Genomic analysis (e.g., in a Japanese population) showed that only 0.63% of SNPs deviated significantly from HWE.
Processes That Cause Deviation from HWE
Each factor represents a real-world evolutionary force that changes allele or genotype frequencies.
Non-Random Mating
- Mating is not random with respect to genotype at a given locus. Types:
- Assortative mating: individuals mate with others of similar phenotype/genotype.
- Disassortative mating: individuals mate with those of dissimilar phenotype/genotype.
- Inbreeding: mating between related individuals (a specific case of assortative mating).
Coefficient of Inbreeding (F)
- Also called the fixation index.
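- Represents the probability that the two alleles in an individual are identical by descent; as a probability, F ranges from 0 to 1.
- When estimated from heterozygosity (F = 1 - H_observed / H_expected), F can range from –1 to 1; negative values indicate an excess of heterozygotes.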
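One common way to estimate F from genotype data compares observed heterozygosity with the HWE expectation: F = 1 - H_observed / H_expected. This estimator is an assumption here (the notes do not specify one), and the counts are made up.

```python
def inbreeding_coefficient(n_AA, n_Aa, n_aa):
    """Estimate F as 1 - (observed heterozygosity / expected heterozygosity).

    Assumes both alleles are present, so expected heterozygosity is not zero.
    """
    total = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * total)
    expected_het = 2 * p * (1 - p)      # 2pq under HWE
    observed_het = n_Aa / total
    return 1 - observed_het / expected_het

print(inbreeding_coefficient(50, 30, 20))   # heterozygote deficit, F > 0
print(inbreeding_coefficient(25, 50, 25))   # matches HWE, F = 0
print(inbreeding_coefficient(10, 80, 10))   # heterozygote excess, F < 0
```

Positive values are consistent with inbreeding or assortative mating, while negative values are consistent with disassortative mating.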
Genetic Drift
- Random changes in allele frequencies due to sampling error in small populations.
- More pronounced in smaller populations.
- Founder events and bottleneck events are major causes of drift.
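The effect of population size on drift can be explored with a small simulation. This sketch assumes a Wright-Fisher style model (each generation's 2N allele copies are drawn at random from the previous generation's allele frequency), which the notes do not name; population sizes and the seed are arbitrary.

```python
import random

def simulate_drift(pop_size, start_freq, generations, seed=None):
    """Track one allele's frequency under random sampling alone.

    pop_size is the number of diploid individuals (2 * pop_size allele copies).
    Returns the list of frequencies, starting with start_freq.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    freq = start_freq
    trajectory = [freq]
    for _generation in range(generations):
        # Each allele copy in the next generation is drawn independently.
        count = sum(1 for _copy in range(copies) if rng.random() < freq)
        freq = count / copies
        trajectory.append(freq)
    return trajectory

small = simulate_drift(pop_size=10, start_freq=0.5, generations=50, seed=1)
large = simulate_drift(pop_size=1000, start_freq=0.5, generations=50, seed=1)
print("small population final frequency:", small[-1])
print("large population final frequency:", large[-1])
```

Running this with different seeds typically shows the small population wandering far from 0.5, often until the allele is lost (0) or fixed (1), while the large population stays close to its starting frequency.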
Natural Selection
- Occurs when individuals with different phenotypes have different survival or reproductive success. Key formulas:
- Relative Fitness (W): W = (avg. number of offspring of genotype) / (avg. number of offspring of most fit genotype)
- Values range from 0 to 1
- Higher W = greater fitness
- Selection Coefficient (s): s = 1 - W
- Measures strength of selection against a genotype
- Values range from 0 to 1
- Higher s = stronger selection
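A short worked example of both formulas, using made-up average offspring numbers for three genotypes. The function name is hypothetical.

```python
def fitness_and_selection(mean_offspring):
    """Return (W, s) dictionaries from average offspring number per genotype.

    W is scaled to the most fit genotype, and s = 1 - W.
    """
    best = max(mean_offspring.values())
    fitness = {genotype: n / best for genotype, n in mean_offspring.items()}
    selection = {genotype: 1 - w for genotype, w in fitness.items()}
    return fitness, selection

# Hypothetical average number of offspring per genotype.
offspring = {"AA": 10, "Aa": 8, "aa": 4}
W, s = fitness_and_selection(offspring)
for genotype in offspring:
    print(genotype, "W =", round(W[genotype], 2), "s =", round(s[genotype], 2))
```

Here AA is the most fit genotype (W = 1, s = 0), while aa has W = 0.4 and s = 0.6, meaning selection acts most strongly against aa.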
Migration (Gene Flow)
- Movement of alleles between populations. Effects:
- Alters allele frequencies.
- Prevents genetic divergence between populations.
- Increases genetic variation within populations.
- Absence of migration can lead to speciation (genetic isolation).
- Restoration of gene flow may:
- Reduce population uniqueness
- Help spread beneficial alleles (e.g., disease resistance)
Mutation
- A change in DNA that results in a new allele.
- Key points:
- Mutation is the ultimate source of all genetic variation.
- Can be:
- Beneficial
- Neutral
- Deleterious
Mutation rate (μ):
- Probability that an allele is altered by a new mutation.
- Typical range: μ = 10^-5 to 10^-6 per gene per generation
- Loss-of-function mutations are especially common, because many different changes can disrupt a gene's function.