RNA molecules can fold into secondary structures that regulate gene expression. Examples include the terminator sequence in the 5' UTR of the trp operon (attenuation) and riboswitches. Some structures can be unwound by helicases, but others influence translation or transcription directly.
Riboswitches are found in bacteria, archaea, fungi, and plants (not animals) and are located in the 5' UTR of an mRNA. They bind a ligand (a regulatory molecule such as an ion, nucleotide, or amino acid) through an aptamer domain. Upon binding, the mRNA changes structure (refolds), affecting transcription (may cause premature termination in prokaryotes) or translation (may hide or expose the ribosome binding site in prokaryotes and eukaryotes).
Antisense RNAs are non-coding RNAs (roughly 50 to over 1000 nucleotides long) that are complementary to a target mRNA. They bind the mRNA and prevent its translation. They may be transcribed from a separate gene or from the opposite strand of the target gene itself, using the target's non-template strand as their template. Found in both prokaryotes and eukaryotes.
RNA interference (RNAi) evolved as a defense against double-stranded RNA viruses. Key enzymes and complexes include:
Dicer: cuts dsRNA into small interfering RNAs (siRNAs) or microRNAs (miRNAs).
RISC (RNA-induced silencing complex): uses siRNAs/miRNAs to bind complementary mRNA, resulting in mRNA degradation or translation inhibition.
MicroRNAs (miRNAs) are single-stranded RNAs about 21–25 nucleotides long, transcribed from their own genes (not from the target). They pair imperfectly with target mRNAs, blocking translation without degrading the mRNA. They function in normal gene regulation.
Small interfering RNAs (siRNAs) are also about 21–25 nucleotides long but are derived from the target gene itself. They pair perfectly with the target mRNA, causing its degradation so that no protein is translated. siRNAs can also act through RITS (RNA-induced transcriptional silencing), which recruits methyltransferases that methylate histones or DNA at the target locus, leading to epigenetic silencing.
Scientists can artificially introduce dsRNA into cells or tissues, where Dicer processes it and activates the RNAi pathway. This is used for targeted gene knockdown in research and medicine. Effects are temporary and localized to the treated area.
Long non-coding RNAs (lncRNAs) are transcripts of at least 200 nucleotides (some exceed 10 kb) that bind DNA, RNA, or proteins to regulate gene expression. Example: Xist (X-inactive specific transcript) coats one X chromosome in females, triggering heterochromatin formation → X inactivation. Inactivation is random (Lyon hypothesis) and occurs early in development, which explains the mosaic expression of X-linked genes.
Chromosomal variants are larger-scale mutations compared to point mutations and can be balanced (no net gain/loss of genetic info) or unbalanced (deletion/duplication = gene dosage change). Common forms include duplications, deletions, inversions, and translocations.
In a duplication, a chromosomal segment is copied and inserted, either in tandem (adjacent to the original) or dispersed (elsewhere on the same chromosome or in the genome). Duplications can create copy number variants (CNVs) and paralogous genes, and the increased gene dosage may alter phenotype. Paralogs are duplicated genes within a species that may evolve new functions.
A deletion is the loss of a chromosome segment. Large deletions may be visible on a karyotype, and deletions are often lethal when homozygous. In heterozygotes, a deletion may expose recessive alleles on the homologous chromosome (pseudodominance).
In an inversion, a chromosome segment is flipped 180°. Paracentric inversions do not include the centromere; pericentric inversions do. In heterozygotes, crossing over within the inverted region can produce unbalanced gametes (with deletions/duplications) and reduced fertility (3–5% of infertile couples carry inversions).
A translocation is the movement of a segment to a nonhomologous chromosome, either reciprocal (segments exchanged between chromosomes) or non-reciprocal (one-way movement). Translocations create complex pairing configurations during meiosis, which can lead to unbalanced gametes. A Robertsonian translocation, common in humans, joins two acrocentric chromosomes at or near their centromeres.
Chromosomal breaks and faulty repair.
Unequal crossing over.
Transposable elements (TEs):
DNA transposons: cut and paste via transposase.
Retrotransposons: copy and paste via reverse transcription.
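Autopolyploids carry multiple sets of chromosomes from one species (e.g., 3N, 4N). Common in plants; rare and often lethal in mammals. Increased gene dosage often produces larger cells and organisms, while odd numbers of chromosome sets cause sterility, especially in triploids, because homologs cannot pair and segregate evenly in meiosis.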
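Allopolyploids combine chromosome sets from two or more species. Initial hybrids are often sterile but may become fertile if chromosome pairing in meiosis is restored (e.g., after chromosome doubling gives every chromosome a homologous partner). Requires similar chromosome number and gene order (synteny).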
Genomics is the study of the content, organization, function, and evolution of genetic material across entire genomes.
Structural Genomics: Focuses on the sequence and arrangement of genes within a genome.
Key questions: What is the full sequence of a genome? Where are genes located and how are they arranged?
Methods: DNA sequencing (e.g., Illumina), bioinformatics for assembly and annotation.
Products: Assembled genomes, gene annotations.
Functional Genomics: Explores how genetic variation influences phenotypic traits.
Key questions: What proteins and RNAs are encoded by genes? When and where are they expressed?
Methods: SNP analysis, RNA expression profiling, GWAS, statistical comparisons.
Products: Candidate genes linked to traits for further testing.
Comparative Genomics: Compares gene content and structure across species to understand evolutionary changes.
Looks at similarities and differences in genes and their organization among organisms.
A genome assembly is the most current version of the entire genome sequence of an organism.
Shotgun Sequencing: Tissue is collected, and DNA is extracted and fragmented. The short DNA pieces are sequenced using high-throughput technologies such as Illumina.
Alignment of Reads: Short reads are joined by their sequence overlaps to build contigs (continuous sequences). Regions with high read depth (the number of times a base is sequenced) are assembled with greater confidence. Repetitive sequences are difficult to assemble because reads from different copies of a repeat overlap equally well, making their placement ambiguous.
Scaffold Construction: Contigs are aligned to known genetic markers and mapped onto chromosomes. Gaps in scaffold sequences are often denoted with “N”s.
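To make the overlap idea in the assembly steps concrete, here is a toy greedy assembler. It is a teaching sketch under simplifying assumptions (error-free reads from one strand, a hypothetical minimum overlap of 3 bases, ties broken by list order), not how production assemblers for Illumina data work; the function names are illustrative. It repeatedly merges the pair of reads with the longest suffix-prefix overlap until no overlaps remain, and each remaining string is a contig.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of sequences until no overlap >= min_len is left."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best = (0, None, None)  # (overlap length, index of left piece, index of right piece)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs  # nothing left to merge: these are the contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs.append(merged)


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA"]))
```

With repetitive reads, several pairs tie for the best overlap and the choice among them is arbitrary, which is exactly the ambiguity that makes repeats hard to assemble.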
Gene annotation identifies and describes genes and functional regions in the genome.
Uses bioinformatics to search for necessary gene components:
Open reading frames (ORFs): sequences starting with ATG (AUG in the mRNA) and ending with TAA, TAG, or TGA, capable of coding proteins.
Regulatory motifs:
Prokaryotes: promoters, terminators, Shine-Dalgarno sequences.
Eukaryotes: TATA box, regulatory promoters, splice site signals (5' and 3'), poly-A signals, and CpG islands (often at gene 5' ends).
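A minimal ORF scan illustrates the first annotation criterion. This sketch searches only the given DNA strand in its three reading frames and reports each stretch from ATG to the first in-frame stop codon, with a hypothetical minimum length of 2 codons. It ignores the reverse strand, introns, and the regulatory motifs listed above, all of which real gene predictors take into account; the function name is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for ATG...stop ORFs in the three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:  # codons before the stop
                            orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))
```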
Annotation can also use known expressed genes or protein domains to predict new genes and their functions. The genome sequence is aligned to mRNA/cDNA (e.g., from RNA-seq), which identifies only genes expressed in the sampled tissue. Conserved protein domains can also suggest function (e.g., a zinc finger domain suggests a DNA-binding protein).
Functional genomics identifies genetic variants associated with phenotypes.
Associate traits (phenotypes) with specific genetic variants.
Understand gene expression: what genes are turned on/off, when, where, and how much.
Choose contrasting groups (differ by phenotype):
Family-based studies: within the same lineage.
GWAS (Genome-Wide Association Studies): compare unrelated individuals.
Quantitative breeding experiments: cross individuals with trait differences to study offspring.
Characterize Genetic Variation:
Study SNPs (single nucleotide polymorphisms).
SNPs can be inherited or arise from new mutations.
Used as markers, even if not causative.
Use Statistics to Find Associations:
Determine if a SNP is significantly more common in affected individuals.
Represented using Manhattan plots:
X-axis: SNP position on genome.
Y-axis: significance of association, usually plotted as -log10(p-value).
SNPs in close proximity may be associated due to linkage (inherited together).
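One simple way to ask whether a SNP allele is significantly more common in affected individuals is a Pearson chi-square test on allele counts in cases versus controls. The sketch below assumes a biallelic SNP and uses hypothetical counts; the p-value comes from the chi-square distribution with 1 degree of freedom, computed with the standard-library math.erfc. A real GWAS repeats such a test at every SNP and must correct for multiple testing and population structure, so this shows only the per-SNP core.

```python
import math


def allelic_chi2(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-square (1 df) on a 2x2 table of allele counts; returns (chi2, p)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [sum(r) for r in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical counts: alternate allele seen 60/100 times in cases, 40/100 in controls
chi2, p = allelic_chi2(60, 40, 40, 60)
print(chi2, p, -math.log10(p))  # the last value is the Manhattan-plot height
```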
Detect an individual's genotype at many thousands (up to around a million) of known SNP locations.
Method:
Single-stranded DNA probes fixed on a glass slide.
Sample DNA is fragmented, amplified, denatured, and hybridized to probes.
Fluorescently labeled nucleotides bind to reveal SNP identity.
Example: Affymetrix 6.0 DNA chip (906,600 SNPs on autosomes, X/Y, mitochondria).
No prior SNP knowledge is needed. An individual's whole genome is sequenced (e.g., with Illumina) and aligned to a reference genome to find new SNPs; CNVs (copy number variants) are detected from changes in read depth.
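Because read depth scales with copy number, a crude copy-number estimate for a genomic window is the ploidy times that window's depth divided by the genome-wide mean depth. This sketch assumes a diploid genome, uniform coverage, and hypothetical depth values; real CNV callers also correct for biases such as GC content and mappability.

```python
def estimate_copy_number(window_depths: list[float], genome_mean_depth: float,
                         ploidy: int = 2) -> list[int]:
    """Round ploidy * (window depth / genome-wide mean depth) to the nearest integer copy number."""
    return [round(ploidy * d / genome_mean_depth) for d in window_depths]


# Hypothetical mean depths in five consecutive windows, genome-wide mean of 30x
print(estimate_copy_number([30, 31, 15, 29, 60], genome_mean_depth=30.0))
```

A window at half the usual depth reads as one copy (a heterozygous deletion), and one at double depth as four copies (a duplication).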
Measures gene expression levels across samples.
Steps:
Extract RNA → reverse transcribe to cDNA.
cDNA labeled and hybridized to probes.
Fluorescent signal reflects transcription level.
Competitive hybridization can be used to compare expression between two samples.
Provides a quantitative and comprehensive look at transcription.
Steps:
RNA → cDNA via reverse transcription.
cDNA sequenced via Illumina.
Aligned to genome:
Shows which genes are expressed.
Reveals alternative splicing and exon usage.
Read depth reflects expression level (more reads = higher expression).
Each point = a gene. X-axis: log fold change (usually log2) in expression between groups. Y-axis: statistical significance, usually -log10(p-value). Shows both the magnitude and the reliability of gene expression differences.
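The coordinates of such a plot can be computed in a few lines. This is a sketch under assumptions: counts are normalized as counts per million (CPM) so libraries of different sizes are comparable, a hypothetical pseudocount of 1 avoids taking log(0), and the p-values are assumed to come from a separate statistical test. Dedicated RNA-seq tools use more careful normalization and statistics; the function names here are illustrative.

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: each gene's reads divided by the library size, times 1e6."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def volcano_point(expr_a: float, expr_b: float, p_value: float,
                  pseudocount: float = 1.0) -> tuple[float, float]:
    """(log2 fold change of group B over group A, -log10 p) for one gene."""
    lfc = math.log2((expr_b + pseudocount) / (expr_a + pseudocount))
    return lfc, -math.log10(p_value)


print(cpm({"geneX": 250, "geneY": 750}))
# Hypothetical mean CPM values and p-values for two genes
print(volcano_point(99.0, 399.0, 0.001))  # strongly up in B and significant
print(volcano_point(9.0, 9.0, 0.5))       # unchanged and not significant
```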
Grid format with color-coding of expression intensity. Columns: samples (grouped by condition). Rows: genes. Color scale reflects expression level per gene per sample. Useful for detecting expression patterns and clusters.
Correlation ≠ Causation → test experimentally.
CRISPR-Cas9: Edit SNP or sequence to test its role in phenotype.
RNAi: Temporarily silence a gene to observe trait change.
Transgenic/Mutant Models: Introduce mutations or genes into organisms (e.g., mice, cell lines) to assess effect on phenotype.
Clinical genetic testing uses trait-associated variants to inform healthcare decisions.
Diagnosis of disease or identifying genetic basis of symptoms.
Determining severity of conditions.
Personalized medicine: Matching patients with treatments based on genetic profile.
Risk prediction: Identifying individuals at increased risk for future disease.
Most studies overrepresent individuals of European descent.
Lack of diversity limits the ability to generalize findings to other populations.
Important because allele frequencies, gene-environment interactions, and heterogeneity differ across populations.
Changes to genetic information (mutations) produce variation. Mutations accumulated across generations underlie current genetic diversity. Genetic variation patterns can be used to reconstruct evolutionary relationships.
Genetic study of populations involves tracking allele frequencies, understanding speciation, and using genetics to explore evolutionary relationships.
Population genetics studies how genetic composition changes over time in response to evolutionary forces. A population is a group of individuals of the same species capable of interbreeding. Populations may be physically isolated or may exist across a continuous distribution.
Used to characterize the genetic composition of populations. Comparing frequencies between populations allows detection of population differentiation. Tracking changes over generations helps detect evolution.
Genotype frequencies are determined directly from data when all genotypes can be distinguished (e.g., codominant alleles or DNA sequence data).
Allele frequencies can be calculated from observed genotype counts.
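As a worked example, assume a codominant locus where AA, Aa, and aa individuals can all be counted (the counts below are hypothetical). Genotype frequencies are counts divided by the number of individuals N; allele frequencies count alleles, two per individual, so p = (2·n_AA + n_Aa) / 2N and q = (2·n_aa + n_Aa) / 2N.

```python
def genotype_frequencies(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float, float]:
    """Observed genotype frequencies: each count divided by the number of individuals."""
    n = n_AA + n_Aa + n_aa
    return n_AA / n, n_Aa / n, n_aa / n


def allele_frequencies(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Allele frequencies: homozygotes carry two copies of one allele, heterozygotes one of each."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = (2 * n_aa + n_Aa) / (2 * n)
    return p, q


print(genotype_frequencies(36, 48, 16))
print(allele_frequencies(36, 48, 16))
```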
A model that assumes an idealized population to predict genotype frequencies from allele frequencies.
No migration
Random mating
No natural selection
No mutation
Infinitely large population
Allele frequencies: p + q = 1 (p = frequency of the dominant allele, q = frequency of the recessive allele)
Genotype frequencies: p^2 + 2pq + q^2 = 1
p^2 = frequency of homozygous dominant genotype
2pq = frequency of heterozygous genotype
q^2 = frequency of homozygous recessive genotype
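The equations can be used in both directions. This sketch computes expected genotype frequencies from p, and, for a recessive trait, estimates the carrier (heterozygote) frequency from the frequency of affected individuals by taking q = √(q²), assuming the population is in HWE. The incidence of 1 in 10,000 is a hypothetical value; it gives q = 0.01, p = 0.99, and 2pq = 0.0198, so roughly 1 person in 50 is a carrier.

```python
import math


def hwe_genotype_frequencies(p: float) -> tuple[float, float, float]:
    """Expected (AA, Aa, aa) frequencies p^2, 2pq, q^2, with q = 1 - p."""
    q = 1 - p
    return p ** 2, 2 * p * q, q ** 2


def carrier_frequency_from_incidence(recessive_incidence: float) -> float:
    """Heterozygote frequency 2pq estimated from the frequency of aa individuals (q^2)."""
    q = math.sqrt(recessive_incidence)
    p = 1 - q
    return 2 * p * q


print(hwe_genotype_frequencies(0.7))
print(carrier_frequency_from_incidence(1 / 10000))
```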
Define a null hypothesis: genotype frequencies are as predicted under HWE.
Compare observed genotype counts with HWE expectations.
Significant differences suggest that one or more HWE assumptions are violated (e.g., an evolutionary force is acting on the locus).
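The comparison is usually a chi-square goodness-of-fit test. For a biallelic locus with three genotype classes, one degree of freedom remains after estimating p from the same data (3 classes - 1 - 1), so a chi-square value above about 3.84 corresponds to p < 0.05. The sketch below uses hypothetical counts and computes the 1-df p-value with math.erfc.

```python
import math


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Chi-square test of observed genotype counts against HWE expectations (1 df)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # allele frequency estimated from the sample
    q = 1 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_AA, n_Aa, n_aa]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


print(hwe_chi_square(50, 20, 30))  # too few heterozygotes: reject HWE
print(hwe_chi_square(36, 48, 16))  # matches p^2 : 2pq : q^2 for p = 0.6
```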
Real populations may not meet all assumptions of HWE. Genomic analysis (e.g., in a Japanese population) showed that only 0.63% of SNPs deviated significantly from HWE.
Each factor represents a real-world evolutionary force that changes allele or genotype frequencies.
Non-Random Mating
Mating is not random with respect to genotype at a given locus. Types include:
Assortative mating: individuals mate with others of similar phenotype/genotype.
Disassortative mating: individuals mate with those of dissimilar phenotype/genotype.
Inbreeding: mating between related individuals (a specific case of assortative mating).
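The inbreeding coefficient (F), also called the fixation index, is the probability that the two alleles at a locus in an individual are identical by descent. As a probability it ranges from 0 to 1, but estimates based on observed versus expected heterozygosity can range from –1 (excess heterozygotes) to 1.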
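F is commonly estimated from genotype counts as the proportional shortfall of heterozygotes, F = 1 - H_obs / H_exp, where H_obs is the observed heterozygote frequency and H_exp = 2pq is the HWE expectation. The counts below are hypothetical; a positive F indicates a heterozygote deficit (as expected under inbreeding) and a negative F a heterozygote excess.

```python
def inbreeding_coefficient(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Estimate F = 1 - H_obs / H_exp from genotype counts at one biallelic locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    h_exp = 2 * p * (1 - p)
    if h_exp == 0:
        raise ValueError("monomorphic locus: F is undefined")
    h_obs = n_Aa / n
    return 1 - h_obs / h_exp


print(inbreeding_coefficient(50, 20, 30))  # heterozygote deficit, F > 0
print(inbreeding_coefficient(10, 80, 10))  # heterozygote excess, F < 0
```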
Genetic Drift
Random changes in allele frequencies due to sampling error from one generation to the next. The effect is stronger in smaller populations. Founder events and bottleneck events are major causes of drift.
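Drift can be simulated with the Wright-Fisher model, a standard idealization (assumed here, not taken from these notes) in which each generation's 2N gene copies are drawn at random from the previous generation's allele frequency. The function name and parameters are illustrative; only the standard-library random module is used, and the seed makes runs reproducible.

```python
import random
from typing import List, Optional


def simulate_drift(p0: float, n_individuals: int, generations: int,
                   seed: Optional[int] = None) -> List[float]:
    """Wright-Fisher sketch: resample 2N gene copies each generation and track allele frequency."""
    rng = random.Random(seed)
    two_n = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each of the 2N copies independently carries allele A with probability p
        count = sum(1 for _ in range(two_n) if rng.random() < p)
        p = count / two_n
        trajectory.append(p)
        if p in (0.0, 1.0):  # allele lost or fixed; drift cannot change it further
            break
    return trajectory


small = simulate_drift(0.5, n_individuals=10, generations=100, seed=1)
large = simulate_drift(0.5, n_individuals=1000, generations=100, seed=1)
print("N=10:", small[-1], "after", len(small) - 1, "generations")
print("N=1000:", large[-1], "after", len(large) - 1, "generations")
```

Runs with 10 individuals typically show much larger swings, and often loss or fixation of the allele, compared with 1000 individuals.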
Natural Selection
Occurs when individuals with different phenotypes have different survival or reproductive success.
Relative Fitness (W): W = \frac{\text{avg. number of offspring of genotype}}{\text{avg. number of offspring of most fit genotype}}
Values range from 0 to 1
Higher W = greater fitness
Selection Coefficient (s): s = 1 - W
Measures strength of selection against a genotype
Values range from 0 to 1
Higher s = stronger selection
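The two formulas translate directly into code. The average offspring numbers below are hypothetical, chosen so that AA is the most fit genotype (W = 1, s = 0) and aa the least fit.

```python
def relative_fitness(mean_offspring: dict[str, float]) -> dict[str, float]:
    """W = average offspring of a genotype / average offspring of the most fit genotype."""
    best = max(mean_offspring.values())
    return {genotype: n / best for genotype, n in mean_offspring.items()}


def selection_coefficients(mean_offspring: dict[str, float]) -> dict[str, float]:
    """s = 1 - W for each genotype."""
    return {genotype: 1 - w for genotype, w in relative_fitness(mean_offspring).items()}


offspring = {"AA": 10.0, "Aa": 8.0, "aa": 2.0}  # hypothetical averages
print(relative_fitness(offspring))
print(selection_coefficients(offspring))
```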
Migration (Gene Flow)
Movement of alleles between populations. Effects: alters allele frequencies, prevents genetic divergence between populations, and increases genetic variation within populations. Absence of migration can lead to speciation (genetic isolation). Restoration of gene flow may reduce population uniqueness or help spread beneficial alleles (e.g., disease resistance).
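One-way gene flow is often modeled with the continent-island model (an assumption here, not stated in these notes): each generation a fraction m of the recipient population consists of migrants with allele frequency p_m, so p' = (1 - m)p + m·p_m, equivalently Δp = m(p_m - p). Iterating it shows migration pulling the recipient's allele frequency toward the source's, which is how gene flow prevents divergence. The frequencies and m = 0.1 below are hypothetical.

```python
def migrate(p_recipient: float, p_migrants: float, m: float) -> float:
    """One generation of one-way gene flow: p' = (1 - m) * p_recipient + m * p_migrants."""
    return (1 - m) * p_recipient + m * p_migrants


p = 0.2  # hypothetical starting frequency in the recipient population
for generation in range(50):
    p = migrate(p, p_migrants=0.8, m=0.1)
print(round(p, 3))  # close to the migrants' frequency of 0.8
```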
Mutation
A change in DNA that results in a new allele. Key points: mutation is the ultimate source of all genetic variation and can be beneficial, neutral, or deleterious.
Mutation rate (\mu): the probability that an allele is altered by a new mutation. Typical range: \mu = 10^{-5} \text{ to } 10^{-6} \text{ per gene per generation}. Loss-of-function mutations are especially common, because many different changes can disrupt a gene's function.
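Two quick calculations put the per-gene rate in perspective. They assume a diploid population of N individuals (2N gene copies) and, for the second function, forward mutation A → a only, with no back mutation, selection, or drift; the population size and time span are hypothetical. Even at μ = 10^-5, a population of a million gains about 20 new mutant copies of a gene per generation, yet mutation alone lowers the original allele's frequency by only about 1% over 1000 generations.

```python
def expected_new_mutations(n_individuals: int, mu: float) -> float:
    """Expected new mutant copies of one gene per generation: 2N gene copies times mu."""
    return 2 * n_individuals * mu


def freq_after_forward_mutation(p0: float, mu: float, generations: int) -> float:
    """Frequency of allele A under A -> a mutation only: p_t = p0 * (1 - mu) ** t."""
    return p0 * (1 - mu) ** generations


print(expected_new_mutations(1_000_000, 1e-5))
print(freq_after_forward_mutation(1.0, 1e-5, 1000))
```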