Genetics and Humans

Introduction to Human Genetics

Application of genetic concepts to human experiments, aimed at understanding human genetic variation, disease susceptibility, and inheritance patterns.
- Main Categories of Human Experiments:
1. Pedigrees - Used for tracking inheritance patterns within families and determining mutations underlying diseases by analyzing co-segregation.
2. Linkage disequilibrium - Utilized to identify specific genomic regions that are associated with a disease trait, inferring the proximity of disease-causing variants to known genetic markers.

Genetic Basics

Study of phenotypes based on genetic traits, examining how genes influence observable characteristics.
- Example:
- Purple eyes - associated with the recessive allele pr; short (vestigial) wings - associated with the recessive allele vg (both are Drosophila genes, used here as a model for the concepts)
- Wild-type: Red eyes (PR+), Normal wings (VG+) -- these represent the dominant alleles.
- Given the cross:
- Wild-type (PR+ VG+) X Purple eyes, short wings (pr vg)
- Expected Phenotypes in F2 Generation:
- Expected ratios (e.g., Mendelian 9:3:3:1 for independently assorting dominant/recessive traits, or 1:1:1:1 for a test cross) are calculated based on whether the genes are linked or assort independently. Deviations from these ratios can indicate linkage.
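The expected ratios under independent assortment can be checked by enumerating gametes. The sketch below is illustrative only: it uses generic allele names (A/a, B/b, with uppercase dominant) rather than the pr/vg notation above, and it assumes the two genes are unlinked.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def gametes(genotype):
    """All equally likely gametes for unlinked genes.

    genotype is a list of allele pairs, e.g. [("A", "a"), ("B", "b")].
    """
    return list(product(*genotype))

def phenotype(gamete1, gamete2):
    """Per gene, the dominant (uppercase) form shows if either allele is dominant."""
    return tuple(
        a.upper() if (a.isupper() or b.isupper()) else a.lower()
        for a, b in zip(gamete1, gamete2)
    )

def cross_ratios(parent1, parent2):
    """Phenotype frequencies among offspring, assuming independent assortment."""
    counts = Counter(
        phenotype(g1, g2)
        for g1 in gametes(parent1)
        for g2 in gametes(parent2)
    )
    total = sum(counts.values())
    return {pheno: Fraction(n, total) for pheno, n in counts.items()}

dihybrid = [("A", "a"), ("B", "b")]    # heterozygous at both genes
recessive = [("a", "a"), ("b", "b")]   # homozygous recessive tester

f2 = cross_ratios(dihybrid, dihybrid)           # F1 x F1 gives 9:3:3:1
test_cross = cross_ratios(dihybrid, recessive)  # test cross gives 1:1:1:1
```

Observed counts that depart strongly from these fractions are what would point toward linkage.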

Experiment Design

Questions:
- Why do or don't we look like our parents?
- Design an experiment to find the genetic cause of Coronary Artery Disease (CAD).
- Challenges in applying Mendelian genetics in humans:
- Ethical concerns: Restrictions on experimental crosses, gene manipulation, and data collection due to privacy and human rights considerations.
- Long generations: The human life cycle (generation time) is prolonged, making multi-generational studies difficult, time-consuming, and expensive.
- Limited control over mating: Researchers cannot dictate breeding pairs, relying instead on existing family structures for genetic analysis.

Types of Human Studies

Cross-sectional Studies:
- Analyze individuals at one point in time to determine the prevalence of a trait or disease and its association with genetic factors.
- Pros:
  - Cost-effective and time-efficient for studying prevalence and associations; large sample sizes increase statistical power.
- Cons:
  - Cannot establish cause-and-effect relationships; historical variability in exposures or environments is not accounted for; difficulty in distinguishing multiple influences or temporal precedence.
Longitudinal Studies (Cohort Studies):
- Track individuals over time, observing changes in genetic traits or disease development in response to genetic and environmental factors.
- Pros:
  - Can establish temporal relationships (exposure precedes outcome); accounts for historical variability, environmental, and developmental changes; valuable for studying disease incidence and progression.
- Cons:
  - Expensive and complex to conduct over long periods; high attrition rates; hard to control for many confounding factors that change over time; susceptible to bias from repeated measurements.
Randomized Controlled Trials (RCTs):
- Involve two groups (experimental and control), with participants randomly assigned to receive an intervention or a placebo, to assess the effect of a specific variable.
- Pros:
  - Considered the gold standard for isolating variables and establishing causality; minimization of confounding through random assignment; blinding can reduce bias.
- Cons:
  - High cost and logistical complexity; often not feasible or ethical for genetic studies (e.g., assigning a disease-causing gene); typically focuses on a single variable or intervention.
Natural Experiments / Case Studies:
- Use naturally occurring events or existing population structures (like families in pedigree analysis) to study the effects of genetic variations.
- Pros:
  - Allows for the isolation of variables by studying unique circumstances (e.g., families with rare diseases); can investigate rare diseases or specific inheritance patterns.
- Cons:
  - Small sample sizes limit the generalizability of findings; multiple influencing factors (genetic and environmental) can be difficult to control or isolate.

Pedigrees

Useful for:
- Determining inheritance patterns: Identifying whether a trait is autosomal dominant, autosomal recessive, X-linked (dominant or recessive), or mitochondrially inherited within a family.
- Probability of disease inheritance: Calculating recurrence risks for family members to inherit a genetic condition.
- Identifying genes linked to diseases: By tracking the co-segregation of a disease trait with known genetic markers across generations.
- Methods:
1. Identifying locations of mutations in the genome: Pinpointing specific chromosomal regions or genes where disease-causing variants potentially reside.
2. Establishing distances in the genome - Linkage Disequilibrium: Utilizing the non-random association of alleles at different loci to narrow down candidate regions for disease genes, based on their close physical proximity and shared inheritance over time.
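As an illustration of the recurrence-risk use of pedigrees listed above, the following minimal sketch computes offspring genotype probabilities for a single gene. It assumes simple Mendelian inheritance with full penetrance; the genotype strings ("Aa", "Dd") and function names are hypothetical choices for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(mother, father):
    """Genotype probabilities for a child, given two-letter parental genotypes."""
    counts = Counter(
        "".join(sorted(m + f)) for m, f in product(mother, father)
    )
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}

def recurrence_risk(mother, father, is_affected):
    """Probability that a child has a genotype for which is_affected(genotype) is true."""
    probs = offspring_genotypes(mother, father)
    return sum(p for g, p in probs.items() if is_affected(g))

# Autosomal recessive: two unaffected carriers, child affected only if "aa".
recessive_risk = recurrence_risk("Aa", "Aa", lambda g: g == "aa")

# Autosomal dominant: one heterozygous affected parent, child affected if it carries "D".
dominant_risk = recurrence_risk("Dd", "dd", lambda g: "D" in g)
```

Under these assumptions the recessive case gives a risk of 1/4 per child and the dominant case 1/2.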

Linkage Disequilibrium and Genetic Linkage

Genetic Linkage:
- Genes in close proximity on the same chromosome tend to be inherited together as a unit because recombination events between them are rare.
- This phenomenon is an exception to Mendel's law of independent assortment, which applies to genes located on different chromosomes or very far apart on the same chromosome.
Linkage Disequilibrium (LD):
- Refers to the non-random association of alleles at different polymorphic loci (e.g., a specific allele at one locus co-occurring with a specific allele at another locus more frequently than expected by chance).
- It is a measure of how tightly genetic markers and disease genes have been associated over evolutionary time and in a specific population.
- Useful in disease mapping: High LD between a marker and a disease allele suggests that they are physically close on a chromosome and have been inherited together for many generations without significant recombination. This allows researchers to use common genetic markers (like SNPs) to indirectly locate disease-causing genes.
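LD is usually quantified from haplotype and allele frequencies. The sketch below uses the standard coefficients D, D' and r^2 for two biallelic loci; the frequencies in the example calls are made-up values chosen for illustration, not data from any study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D: how far the observed haplotype frequency is from the value
    # expected if the two alleles were associated purely by chance.
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude possible for these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    # r^2 is the squared correlation between the alleles at the two loci.
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical frequencies: alleles co-occur more often than chance predicts.
d, d_prime, r2 = ld_stats(p_ab=0.4, p_a=0.5, p_b=0.5)

# Hypothetical frequencies at linkage equilibrium: D and r^2 are zero.
d0, _, r2_0 = ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5)
```

Values of r^2 near 1 mean a marker is a good proxy for the second locus, which is the property disease mapping relies on.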

Understanding Linkage and Recombination

Recombination Frequency: The proportion of recombinant gametes formed during meiosis. It is a direct measure of the genetic distance between two loci.
- If two genes are not linked: They assort independently, resulting in 50% parental and 50% recombinant gametes (recombination frequency = 0.5 or 50%). This represents the maximum observed recombination frequency for unlinked or very far apart genes.
- Linkage distance is measured in centiMorgans (cM).
- 1 cM corresponds to a 1% recombination frequency between two loci: on average, about 1 of every 100 gametes will be recombinant for loci separated by 1 cM. For short distances, map distance in cM is roughly proportional to physical distance (in humans, about 1 cM per megabase on average, though the rate varies across the genome).
Calculation:
- If two genes are 1 cM apart, it means there is a 1% chance of recombination occurring between them during meiosis, and thus a 99% probability of them being inherited together from a parent.
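The same calculation extends to several generations. The sketch below assumes that each meiosis is independent and that the distance is short enough for cM to convert directly to a recombination probability; the function name is a hypothetical one chosen for this example.

```python
def prob_inherited_together(distance_cm, generations=1):
    """Probability that two loci are never separated by recombination.

    distance_cm: map distance in centiMorgans (short distances only)
    generations: number of successive meioses considered
    """
    r = distance_cm / 100.0          # 1 cM ~ 1% recombination per meiosis
    return (1.0 - r) ** generations  # no recombination in any of the meioses

one_generation = prob_inherited_together(1)       # the 99% case from the text
ten_generations = prob_inherited_together(1, 10)  # the association erodes slowly
```

This gradual erosion over many generations is why closely spaced loci can remain associated in a population for a long time.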

Case Studies on Genetic Markers

Example of recombination in phenotypes (Test Cross):
- F1 Cross: Red eyes, normal wings (pr_vg/PR+_VG+ -- heterozygous for both traits, so the dominant wild-type phenotype shows) X Purple eyes, vestigial wings (pr_vg/pr_vg -- homozygous recessive for both traits)
- Resulting phenotype counts (Hypothetical for Independent Assortment):
  - Red eyes, normal wings (PR+_VG+): 709
  - Purple eyes, vestigial wings (pr_vg): 709
  - Red eyes, vestigial wings (PR+_vg): 709
  - Purple eyes, normal wings (pr_VG+): 709
  - If the observed ratios were approximately equal (as shown with 709 for each), this would strongly suggest independent assortment of the genes.
Non-Homogeneous Data:
- If there were significant deviations in these phenotype ratios (e.g., significantly higher counts for parental combinations and lower counts for recombinant types), this would indicate genetic linkage between the genes and allow for the calculation of recombination frequency.
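Whether a deviation is "significant" is commonly judged with a chi-square goodness-of-fit test against the 1:1:1:1 expectation. The sketch below is one way to do that in plain Python; the skewed counts are hypothetical values invented for illustration, and 7.815 is the standard critical value for 3 degrees of freedom at the 0.05 level.

```python
def chi_square_equal(counts):
    """Chi-square statistic against equal expected counts in every class."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)

# Critical value for 3 degrees of freedom (4 classes - 1) at alpha = 0.05.
CRITICAL_0_05_DF3 = 7.815

# Order: parental, parental, recombinant, recombinant.
balanced = [709, 709, 709, 709]   # the independent-assortment case above
skewed = [1300, 1200, 170, 166]   # hypothetical excess of parental types

balanced_stat = chi_square_equal(balanced)
skewed_stat = chi_square_equal(skewed)

# A statistic above the critical value rejects 1:1:1:1, consistent with linkage.
suggests_linkage = skewed_stat > CRITICAL_0_05_DF3
```

A large statistic only shows that the classes are unequal; the excess has to be in the parental classes for linkage to be the explanation.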

Genetic Mapping Calculation

Mapping Units (cM or Morgan):
- Mathematically: \text{Recombination Frequency (RF)} = \frac{\text{Number of Recombinant Progeny}}{\text{Total Number of Progeny}} \times 100\%
- Example of Calculations:
  - If a test cross yields 380 parental progeny (e.g., carrying original parental allele combinations AB or ab) and 20 recombinant progeny (e.g., carrying new combinations Ab or aB) out of a total of 400 individuals, the recombination frequency would be calculated as: (20 / 400) \times 100\% = 5\%. This result indicates that the two genes are approximately 5 cM apart on the chromosome.
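The worked example can be reproduced with a few lines of code. This is a minimal sketch; the function name is a hypothetical choice, and the conversion from RF to cM is the short-distance approximation used in these notes.

```python
def recombination_frequency(parental, recombinant):
    """Fraction of progeny that are recombinant (between 0 and 1)."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no progeny counted")
    return recombinant / total

# Counts from the worked example: 380 parental and 20 recombinant progeny.
rf = recombination_frequency(parental=380, recombinant=20)

# For short distances, map distance in cM is RF expressed as a percentage.
map_distance_cm = rf * 100
```

Because RF cannot exceed 50% for any pair of loci, this direct conversion underestimates distance once genes are far apart.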

Polymorphisms

Definition: Sites in the genome where the DNA sequence differs among individuals in a population. These variations contribute to individual genetic differences and serve as essential genetic markers.
- Types:
1. Restriction Fragment Length Polymorphisms (RFLPs): Variations in the presence or absence of a specific restriction enzyme recognition site, typically caused by a point mutation at that site. When DNA is cut with the enzyme, different fragment lengths are produced, detectable by Southern blotting.
2. Microsatellite Markers (SSLPs/STRs - Simple Sequence Length Polymorphisms/Short Tandem Repeats): Consist of short (2-6 base pair) DNA sequences repeated multiple times in tandem. The number of repeats is highly variable among individuals, making them highly polymorphic and useful in linkage analysis and forensic identification. Detected by PCR and gel electrophoresis.
3. Single Nucleotide Polymorphisms (SNPs): A variation at a single nucleotide position in a DNA sequence, where different individuals in a population may have different bases (e.g., A vs. G) at that specific site. SNPs are the most common type of genetic variation in humans and are extensively used in genome-wide association studies (GWAS) due to their abundance. Detected by various methods including DNA sequencing and microarrays.
Similar to alleles in functionality: Each variant form present at a polymorphic site can be considered an allele.
- Example of homozygous and heterozygous sequences for a SNP:
- Homozygous Sequence A (A/A): both chromosomal copies read \dots G G C \underline{A} T A C C G \dots ('A' at the SNP site on each copy)
- Heterozygous Sequence AB (A/T): one copy reads \dots G G C \underline{A} T A C C G \dots and the other reads \dots G G C \underline{T} T A C C G \dots (one copy has 'A', the other has 'T' at the SNP site)
- Homozygous Sequence B (T/T): both chromosomal copies read \dots G G C \underline{T} T A C C G \dots ('T' at the SNP site on each copy)
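Calling a genotype at a SNP amounts to comparing the base on each chromosomal copy. The sketch below is illustrative: the nine-base strings are hypothetical sequences modeled on the example, differing only at the SNP site, and the function name is an assumption.

```python
def snp_genotype(copy1, copy2, position):
    """Return (genotype, zygosity) at a 0-based position, given both chromosomal copies."""
    allele1, allele2 = copy1[position], copy2[position]
    genotype = "/".join(sorted((allele1, allele2)))
    zygosity = "homozygous" if allele1 == allele2 else "heterozygous"
    return genotype, zygosity

copy_a = "GGCATACCG"  # copy carrying 'A' at the SNP site
copy_t = "GGCTTACCG"  # copy carrying 'T' at the SNP site
SNP_INDEX = 3         # 0-based index of the variable base

hom_a = snp_genotype(copy_a, copy_a, SNP_INDEX)
het = snp_genotype(copy_a, copy_t, SNP_INDEX)
hom_t = snp_genotype(copy_t, copy_t, SNP_INDEX)
```

Each distinct base seen at the site plays the role of an allele, which is why SNP genotypes can be tracked through families like any other marker.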

Application of Findings

By utilizing genetic polymorphisms (such as SNPs or microsatellites) that show strong linkage disequilibrium with a disease, researchers can narrow down the specific region on a chromosome that is likely to contain the disease-causing mutation. This then allows for more targeted sequencing and functional studies to pinpoint the actual causal gene and variant, contributing to understanding disease mechanisms and developing potential therapeutic strategies.