Genetic Diseases and Mutations
Principles of Genetic Variation
Genetic variation is the difference between DNA sequences that are being compared
specific variants were found after the Human Genome Project
using libraries composed of DNA from a large number of people, including twins
mutations are a large source of genetic variation; they are generally thought to be caused by external sources, but most come from the endogenous cell environment
most are not inherited unless they occur in germline cells or very early in embryonic development
genome projects typically used a number of DNA donors; Craig Venter’s was the 1st individual whole genome
there are 2 main categories of genetic variation
changes that don’t affect the DNA content
there is no net gain or loss of nucleotides; think a single nucleotide replaced with another
changes in the copy number
think insertion or deletion of single or a small number of nucleotides
the most common changes are small-scale, but larger variations (>50 bp) have a larger effect
SNV (single nucleotide variant) → change in a single nucleotide with frequency <0.01; SNP → frequency >0.01
most common type of change in the genome
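As a minimal sketch of the frequency rule above, the hypothetical Python function below labels a single-nucleotide change from its minor allele frequency. Treating a frequency of exactly 0.01 as an SNP is an assumption, since the notes only give <0.01 and >0.01.

```python
def classify_variant(minor_allele_freq: float) -> str:
    """Label a single-nucleotide change using the 1% frequency cut-off (hypothetical helper)."""
    # Minor allele frequency can only range from 0 to 0.5 by definition.
    if not 0.0 <= minor_allele_freq <= 0.5:
        raise ValueError("minor allele frequency must be between 0 and 0.5")
    # Rarer than 1% of chromosomes in the population: a single nucleotide variant.
    if minor_allele_freq < 0.01:
        return "SNV"
    # Common enough (assumed >= 1%) to count as a polymorphism.
    return "SNP"


print(classify_variant(0.003))  # rare change
print(classify_variant(0.12))   # common change
```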
Indel → nucleotides present or absent, i.e. insertion or deletion variation (polymorphism); most are not new and come from evolutionary changes or ancestry
Variable tandem repeats → a sequence is repeated one copy after another; generally unstable
DNA variants that differ in the number of tandem repeats
aka variable number of tandem repeats (VNTR) polymorphism
Satellite DNA: Array of 20 kb – hundreds of kb
Located at centromeres and other heterochromatic regions
Minisatellite DNA: Array of 100 bp – 20 kb
Primarily at telomeres and subtelomeres
microsatellite DNA: arrays of <100 bp
repeat unit of ~2-4 nucleotides; widely distributed
aka short tandem repeats (STRs)
highly variable with several different alleles present in the population
used in forensics and high-throughput PCR
Strand slippage → happens during DNA replication, when repeat units of an STR are inserted or deleted
these errors often escape the proofreading (double-checking) mechanism because the slipped strand still pairs correctly within the repeats
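To make the link between repeat number and STR alleles concrete, here is a minimal Python sketch that counts the longest uninterrupted run of a repeat motif in a sequence. A slippage event that adds or removes one unit shows up as a count that differs by one. The function name and the example sequences are hypothetical.

```python
def longest_repeat_run(seq: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `seq`."""
    if not motif:
        raise ValueError("motif must not be empty")
    seq, motif = seq.upper(), motif.upper()
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        # Step forward one motif length at a time while the motif keeps matching.
        while seq.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


# Two hypothetical alleles of a GATA microsatellite that differ by one repeat unit.
allele_1 = "TT" + "GATA" * 4 + "CC"
allele_2 = "TT" + "GATA" * 5 + "CC"
print(longest_repeat_run(allele_1, "GATA"), longest_repeat_run(allele_2, "GATA"))
```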
Balanced structural variation→ Same DNA content but the sequences are located in different positions
think translocations, inversions
SNP is an example of balanced genetic change
Unbalanced changes → change the copy number; can result in disease or metabolic phenotypes
can also change drug dosing
think CNV (copy number variations) and deletion or duplication
most variation has a neutral effect because: many proteins are encoded by multiple loci; there are 2 copies of each chromosome; it’s hard to actually change an amino acid in a way that has an effect; only a small fraction of the genome is coding; and there is a large amount of redundancy in the genome
variation is beneficial when it increases survival or fitness
→ becomes prevalent
think brain development, cognitive function, etc. in humans vs. gorillas
negative (purifying) selection happens when a variant leads to lower fitness
different geographic regions, exposure to pathogens, and altered environments shaped the genome through selection
Infectious diseases are considered to be one of the strongest selective pressures in human evolution
think the Black Death: those with a certain gene variant were more likely to survive than those without it → genes ERAP1/ERAP2, variant has a C
the frequency of the protective variant tracked with survival: it was higher in those who survived than in those who died
the genes in the study were ERAP1 and ERAP2, which help T cells recognize pathogens and their antigens
changes in regulatory regions are what differ most between apes and humans; an exception is myosin heavy chain 16 (MYH16) → changes jaw structure, which increases skull capacity
context matters for natural selection
example: higher HLPC expression was selected to protect against animal pathogens, but people no longer have to worry about those, so it isn’t selected for now
this can also be the reason why people have allergies
UV radiation stimulates a photolytic reaction in the skin that makes vitamin D3, but it also causes DNA damage (sunburn = cell death), so melanated skin was needed
when people migrated north there was less sun, which means less vitamin D3; this “led” to selection for a mutation in the SLC24A5 gene (a calcium transporter that regulates melanin) that reduced skin pigmentation and became prevalent in Europeans
selective sweep→ positive selection for advantageous DNA can leave a signature in the genome
population variation is reduced at sequences immediately next to the gene variant
occurs when a mutation that is subject to positive selection increases in frequency to become the common allele, and the adjacent variants hitchhike with it
the chromosome carrying the mutation is not passed down as a whole unit because of recombination, so only nearby variants hitchhike
also seen in SLC24A5 gene
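A simple way to picture the rise of an advantageous allele during a sweep is a deterministic selection model. The sketch below assumes genic selection, where each copy of the allele gives relative fitness 1 + s, and ignores drift and recombination, so it tracks only the selected allele and not the hitchhiking neighbors. The starting frequency and s are made-up illustration values.

```python
def allele_trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Frequency of an advantageous allele under simple genic selection.

    Each copy of the allele has relative fitness 1 + s; the other allele has fitness 1.
    Drift, mutation, and recombination are ignored.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        w_bar = p * (1 + s) + (1 - p)  # mean fitness of allele copies this generation
        p = p * (1 + s) / w_bar        # selected allele's share of the next generation
        freqs.append(p)
    return freqs


# Illustrative values: start rare (0.1%) with a 5% advantage per generation.
trajectory = allele_trajectory(p0=0.001, s=0.05, generations=400)
first_common = next(g for g, p in enumerate(trajectory) if p > 0.5)
print(f"becomes the common allele after about {first_common} generations")
```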
changes to diet → high starch and lifelong milk consumption → the ancestral trait is to digest lactose for nutrition only as a kid; a change in the regulatory region of the lactase gene (C→T) keeps it turned on, so lactase is made all the time
amylase in saliva is made by AMY1A → breaks down starch; there is copy number variation: chimps have 2 copies, while humans have a diploid copy number of 2-15
more copies → higher enzyme concentration
used high-resolution FISH to see how many copies of the gene there are
gene duplication → over generations the copies may come to encode slightly different forms of a protein
positive selection means copy number stays the same over generations
the largest protein-coding gene family in humans is the olfactory receptor family, used for food rather than the environment
olfactory receptors differ between people; each person makes about 500 variants, which impacts food taste and preference
Single Gene Disorders: inheritance patterns, phenotype variation, and allele frequencies
a single locus can be the primary determinant → like Mendel, i.e. Mendelian or monogenic genetics
a particular genotype at a single locus is necessary and sufficient for a particular phenotype under normal circumstances
traits can be dominant or recessive
although rare, single gene disorders are important contributors to disease and can be visualized using a pedigree
pedigree symbols to know
highlight: consanguineous mating → 1st or 2nd cousins married; a diamond means sex unknown; carriers may not always be shown in a pedigree
autosomal dominant → affected people are almost always heterozygous; unaffected individuals usually don’t transmit the disease; both sexes are affected equally
heterozygosity is the default assumption because being homozygous usually means death or a very serious condition
think 2 people with dwarfism having kids: a homozygous child is severely affected
usually a parent is affected; seen in every generation
achondroplasia is the most common cause of dwarfism
caused by a mutation in FGFR3 that causes ligand-independent activation of FGFR3, i.e. it is overactive
normally FGFR3 signaling limits bone growth
typically a G→A change
homozygosity is lethal; heterozygotes have the disorder but survive because the other allele is still normal
de novo mutations → new mutations arising in a parent’s germline; they mostly come from the dad (for achondroplasia, almost always)
occur in the paternal germline and are associated with advanced paternal age
come from mutations in spermatogonial stem cells that accumulate with age
autosomal recessive inheritance → disease requires 2 mutant alleles at the locus
one comes from each parent
keep in mind: whether the parents are related matters a lot for autosomal recessive disorders; usually they are 1st or 2nd cousins, who share 1/8 to 1/32 of their genes, so there is a high chance of carrying the same allele
affects both sexes equally and two carriers have a 25% chance of having an affected child
for disorders with unknown inheritance, parental consanguinity is a strong indicator of autosomal recessive inheritance
consanguinity produces affected people with identical mutant alleles due to common ancestry
if a recessive disorder is frequent in a population, there are a lot of carriers
two unrelated parents can carry different mutant alleles
compound heterozygote → 2 different mutations present in the same gene
typically, compound heterozygotes are phenotypically similar to homozygotes unless the two mutations differ in severity
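The 25% figure for two carriers follows from a Punnett square. This minimal Python sketch enumerates one autosomal locus, with uppercase as the normal allele and lowercase as the mutant allele (a labelling convention assumed here), and returns exact fractions for each offspring genotype.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Punnett square for one autosomal locus, e.g. offspring_genotypes('Aa', 'Aa')."""
    counts = Counter()
    # Each parent passes on one of its two alleles with equal probability.
    for a, b in product(parent1, parent2):
        # Sorting puts the uppercase (normal) allele first, so 'aA' is counted as 'Aa'.
        counts["".join(sorted(a + b))] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


carriers = offspring_genotypes("Aa", "Aa")
for genotype in ("AA", "Aa", "aa"):
    print(genotype, carriers[genotype])
```

For a carrier × affected (Aa × aa) mating, the same function gives 1/2 Aa and 1/2 aa.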
Example of an autosomal recessive disorder → cystic fibrosis → mutations in the CFTR gene, which encodes a chloride channel; loss of CFTR function leads to abnormal fluid and electrolyte transport, which results in these common symptoms:
Failure to reabsorb chloride in sweat
Depletion of airway surface liquid in the lung
Defective secretion of pancreatic enzymes
reduced fertility
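The link between how common a recessive disorder is and how many carriers there are can be sketched with Hardy-Weinberg: incidence ≈ q², so q is its square root and the carrier frequency is 2pq. The input below, an incidence of 1 in 2,500, is an assumed illustrative value (a figure often quoted for cystic fibrosis in people of Northern European ancestry), and the model assumes random mating.

```python
import math


def carrier_frequency(incidence: float) -> float:
    """Carrier (heterozygote) frequency 2pq for a recessive disorder with incidence q**2.

    Assumes Hardy-Weinberg equilibrium: random mating, no selection or migration.
    """
    q = math.sqrt(incidence)  # frequency of the disease allele
    p = 1.0 - q               # frequency of the normal allele
    return 2 * p * q


# Assumed illustrative incidence: 1 affected birth in 2,500.
cf = carrier_frequency(1 / 2500)
print(f"carrier frequency = {cf:.4f}")
```

With that input the carrier frequency comes out near 0.039, roughly 1 in 25 people, even though only 1 in 2,500 is affected.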
X-linked recessive disorders → mostly males are affected, usually born to unaffected parents
the mother of an affected male is usually a carrier and may have affected male relatives
IS NOT TRANSMITTED FROM DAD TO SON
a dad with the mutation transfers the allele to all daughters, who will be carriers
can
if mom is a carrier and dad is unaffected → each son has a 50% chance of being affected (so having one affected son doesn’t change the next son’s 50% chance); each daughter has zero chance of being affected and a 50% chance of being a carrier
complications with inbreeding: it makes it harder to tell whether a trait is Y-linked or X-linked until a female gets the disorder
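The carrier-mother and affected-father cases above can be checked with a small enumeration. In this minimal Python sketch, "XA" and "Xa" are assumed labels for the normal and mutant X alleles, each maternal X is passed on with probability 1/2, and a son’s status depends only on the X he gets from his mother.

```python
from fractions import Fraction

MUTANT = "Xa"  # assumed label for the mutant X allele; "XA" is the normal allele


def x_linked_recessive_cross(mother: tuple[str, str], father: str):
    """Return (daughters, sons) outcome probabilities for an X-linked recessive allele."""
    half = Fraction(1, 2)
    daughters = {"affected": Fraction(0), "carrier": Fraction(0), "unaffected": Fraction(0)}
    sons = {"affected": Fraction(0), "unaffected": Fraction(0)}
    for maternal_x in mother:  # each maternal X is inherited with probability 1/2
        # A daughter gets this maternal X plus the father's only X.
        n_mutant = [maternal_x, father].count(MUTANT)
        daughters[("unaffected", "carrier", "affected")[n_mutant]] += half
        # A son gets this maternal X plus the father's Y, so only the maternal X matters.
        sons["affected" if maternal_x == MUTANT else "unaffected"] += half
    return daughters, sons


# Carrier mother x unaffected father.
daughters, sons = x_linked_recessive_cross(("XA", "Xa"), "XA")
print("daughters:", {k: str(v) for k, v in daughters.items()})
print("sons:", {k: str(v) for k, v in sons.items()})
```

Calling it with a non-carrier mother and an affected father, x_linked_recessive_cross(("XA", "XA"), "Xa"), makes every daughter a carrier and every son unaffected, which is the "not transmitted from dad to son" rule.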
X-linked dominant disorders → one parent is affected; affected individuals can be male or female
affected females are more common than affected males
affected females typically have milder disease than males
if the father is affected, all daughters will have the disorder and none of the sons will (assuming the mother is unaffected)
if the mother is affected (heterozygous), transmission is equal between the sexes: each child has a 50% chance
in this case A causes disease
females can be heterozygous and unaffected because of silencing of the mutant X
she is a genetic mosaic
males are constitutionally hemizygous (1 allele); females are functionally hemizygous: they have 2 alleles but each cell uses only one
individuals with Klinefelter’s syndrome (XXY) have 2+ X chromosomes and silence the extras
consequence of genetic mosaicism: affected females will have partial function, compared with affected males, who have zero or only defective product
~20-25% of X-linked genes escape silencing
skewed X inactivation → preferential silencing of one X over the other
e.g. if one X carries a gene that makes a functional protein, it may be favored to stay active
can occur because inactivation of the mutant X gives cells a survival advantage, because inactivation of the normal X gives an advantage, or by random chance
for an X-linked recessive disease, if the mutant X is the active one, bad luck; for an X-linked dominant disease, if the dominant (mutant) allele is silenced, good luck
male lethality → being hemizygous for the mutation is so severe that the male dies, but X inactivation in females allows them to live
lethality in males → suggests an X-linked dominant disease
pseudoautosomal regions (PARs) → 2 gene-containing regions present on both X and Y (so shared by X and Y, or by the two Xs)
don’t produce X- or Y-linked inheritance patterns
pair during male meiosis and undergo recombination, including an obligate crossover in the major PAR
contain ~29 genes
mutations behave as autosomal
pseudoautosomal inheritance is like autosomal inheritance
gene pairs in pseudoautosomal regions are effectively alleles
copies are present on both X and Y and can move between chromosomes by recombination, so they don’t act X-linked or Y-linked
males can pass it on to either female or male offspring
resembles autosomal inheritance, e.g. dominant inheritance of a mutation in the PAR
one child can be affected but another not, because of crossing over
Y linked inheritance→ male specific region of the Y chromosome makes 31 different proteins
proteins are involved in the normal development of testes, germ cells, and fertility
no known trait gives a stereotypical Y-linked pedigree
Y chromosome infertility typically caused by new mutations
male infertility generally can’t be passed on unless IVF is used
expected pedigree features → only males are affected; exclusive father-to-son transmission
females have 2 alleles for all genes in the pseudoautosomal regions, and both alleles are active; males also have 2 active alleles, one on the X and one on the Y
mtDNA is more prone to mutation than nuclear DNA
disorders primarily affect tissues with high energy requirements think muscle, brain
not packaged with histones; it’s circular and prone to mutation from ROS
inheritance is matrilineal → affected individuals can be either sex but affected males don’t transmit the condition
most people with an mtDNA disorder have a mix of normal and mutated mtDNA
i.e. they are heteroplasmic
homoplasmy → all the mtDNA is mutated (rare)
disease phenotype is only observed if the proportion of mutant mtDNA exceeds a threshold
typically 60-80% mutant mtDNA
the ratio of normal to mutant mtDNA can change over a person’s lifetime due to relaxed replication of mtDNA
if most of the mtDNA becomes mutant, that reflects this sloppy (relaxed) replication
mtDNA disorders can have a highly variable phenotype within families due to variable heteroplasmy
oocytes contain >100,000 mtDNA molecules, but during development they go through a bottleneck stage with few mtDNA molecules, so expansion of those few can create different levels of mutant mtDNA in different oocytes
Oocytes are immature egg cells found in the ovaries. They are the female germ cells that can develop into mature eggs through the process of oogenesis.
skipped generations may be seen in the pedigree
if mom doesn’t have the disorder, it’s probably because the share of normal mtDNA passed to her through the bottleneck was large
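The bottleneck explanation can be sketched as random sampling: each oocyte is founded by a small number of mtDNA molecules drawn from the mother’s pool, which are then expanded. In this minimal Python simulation the mother’s 40% mutant load, the bottleneck of 10 molecules, and the 60% threshold (the lower end of the range above) are illustrative assumptions, and the later expansion is assumed to copy the founding ratio unchanged.

```python
import random


def oocyte_heteroplasmy(mother_fraction: float, bottleneck_size: int,
                        n_oocytes: int, seed: int = 1) -> list[float]:
    """Mutant-mtDNA fraction in each simulated oocyte after a germline bottleneck."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    fractions = []
    for _ in range(n_oocytes):
        # Draw the founding molecules; each is mutant with probability mother_fraction.
        mutant = sum(rng.random() < mother_fraction for _ in range(bottleneck_size))
        # Assume the later expansion keeps the founding ratio unchanged.
        fractions.append(mutant / bottleneck_size)
    return fractions


THRESHOLD = 0.6  # assumed: lower end of the 60-80% range
eggs = oocyte_heteroplasmy(mother_fraction=0.4, bottleneck_size=10, n_oocytes=1000)
share_above = sum(f >= THRESHOLD for f in eggs) / len(eggs)
print(f"oocyte range {min(eggs):.1f}-{max(eggs):.1f}; {share_above:.0%} at or above threshold")
```

Even though the simulated mother is below threshold, a minority of her oocytes end up at or above it while others carry almost no mutant mtDNA, which fits the variable phenotypes and skipped generations described above.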
example of a mitochondrial disorder: Leigh syndrome → severe neurological disorder that manifests in the 1st year of life; patches of damaged tissue develop in the brain, leading to progressive loss of movement
often dead by the age of 3
caused by mutations in more than 75 different genes that function in ATP production
~20% of people with Leigh syndrome carry a mutation in mtDNA
the most common mtDNA mutation affects MT-ATP6, which is part of ATP synthase
mitochondrial replacement: mitochondrial DNA can be taken from healthy donor eggs; this is allowed in the UK and Australia but banned in Canada because of ethical concerns
Genetic Variation and Disease-Causing Abnormalities
Pathogenic DNA variation can cause disease in two broad ways:
Change in sequence of gene product
Total loss of function or reduced normal function
Altered or new function that is harmful (= gain of function)
Change in the amount of gene product through:
Altered copy number
large >50 bp; small <50 bp
the most common are small-scale changes
a larger effect comes from a large change or from many small changes
Change in gene regulation
Premature termination codon
Loss-of-function mutations → final gene product is not produced, is produced at low levels, or is made but doesn’t work properly
Results in recessive disease when both alleles are pathogenic variants
May sometimes result in dominant diseases with one pathogenic allele
Effects include haploinsufficiency and dominant-negative effects
Example: Cystic fibrosis
Caused by loss-of-function mutations in both alleles of CFTR, which encodes a chloride channel
normally it stimulates water movement and produces thin, freely flowing mucus
Disease occurs when both alleles have a loss-of-function mutation bringing function close to zero
Haploinsufficiency: genes that are very sensitive to dosage and 50% is not sufficient for normal function
Reduction from 100% to 50% of gene product results in disease
Results in dominant disease observed in heterozygotes with 1 mutant allele
Example: brachydactyly-mental retardation syndrome → associated with large deletions at 2q37
only one copy of the deleted region remains, so the dosage of every gene in it is down 50%
Caused by haploinsufficiency of HDAC4
causes intellectual and behavioral issues, short fingers and short toes
Dominant-Negative effects: NOT A DELETION
Mutation produces a non-functional mutant protein that inhibits the function of the normal protein
occurs because protein forms a multimer that becomes non-functional when mutant is incorporated
Disease results because there is <50% functional product
Example: osteogenesis imperfecta → brittle bones due to defective or reduced type I collagen
severity varies: mild forms make about 50% of the normal amount of collagen, while severe forms have an abnormal α1(I) chain (COL1A1) so that 3/4 of the collagen molecules are abnormal; the baby dies in utero or 1-2 days after birth
Shows the genetic mechanism involving COL1A1 and COL1A2 mutations
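The difference between haploinsufficiency and a dominant-negative effect is easy to quantify. If a heterozygote makes equal amounts of normal and mutant chains and they assemble at random, the fraction of complexes with no mutant chain is (1/2)^k, where k is the number of chains per complex that come from the mutant gene. This minimal Python sketch assumes random assembly and equal expression; type I collagen is modelled as a trimer with two α1(I) chains from COL1A1.

```python
def fraction_normal_complexes(mutant_share: float, chains_from_gene: int) -> float:
    """Fraction of assembled complexes that contain no mutant chain.

    mutant_share: fraction of this gene's chains that are mutant (0.5 for a heterozygote
    expressing both alleles equally). chains_from_gene: chains per complex encoded by
    the mutant gene. Assumes normal and mutant chains assemble at random.
    """
    return (1 - mutant_share) ** chains_from_gene


# Haploinsufficiency (null allele): no mutant chains are made, only half the normal amount.
print("null allele, normal product:", 0.5)
# Dominant negative in a homodimer: both chains come from the mutant gene.
print("dimer, fully normal:", fraction_normal_complexes(0.5, 2))
# Type I collagen trimer: two alpha1(I) chains from COL1A1, one alpha2(I) from COL1A2.
normal = fraction_normal_complexes(0.5, 2)
print("COL1A1 missense, normal trimers:", normal, "abnormal:", 1 - normal)
```

A null allele leaves 50% of normal product, whereas a dominant-negative allele in a dimer leaves only 25% fully normal complexes, and two COL1A1-derived chains per collagen trimer likewise give 1/4 normal and 3/4 abnormal molecules.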
Gain of function mutations:
Results in gene products with new or harmful functions
protein might not properly respond to regulatory signals
protein that exhibits inappropriate expression
e.g. overexpressed, or expressed at the wrong time, in the wrong tissue, or in response to the wrong signal
enzyme with altered target specificity
protein that forms toxic aggregates
Leads to dominant disease with just one copy of the pathogenic mutation, even though the other allele produces the correct product
Example: Achondroplasia
Caused by mutations in FGFR3 leading to ligand-independent activation of signaling
Unstable expansion of repeats:
Some tandem repeat sequences can expand and result in disease aka dynamic mutations
disease results due to production of toxic protein or RNA that is harmful
is a gain-of-function dominant disease
Anticipation:
Individuals with high repeat numbers are more severely affected
transmission of a mutant allele in a family can increase the repeat number and disease severity with each generation = anticipation
Disease severity increases with each generation due to repeat expansion
can exhibit a premutation stage → repeat array has expanded to a size that is unstable
won’t result in disease but may readily increase in length causing disease in subsequent generations
Example: Huntington disease→ motor abnormalities, personality changes, gradual loss of cognition and death
6-26 CAG repeats is normal
≥40 CAG repeats is Huntington disease
36-39 repeats is incomplete penetrance (i.e. may or may not develop it)
27-35 repeats: unaffected, but at risk of expansion in offspring
sperm is more tolerant of these changes
Symptoms and age of onset correlate with increased number of repeats
higher expansion when transmitted by father
in the pedigree, a square with ":" inside shows symptoms because they have 40 or more repeats
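The CAG repeat ranges above map naturally onto a small classifier. This minimal Python sketch first counts the longest uninterrupted CAG run in a read, then applies the cut-offs from the notes; the read is a made-up sequence, counts below 6 are simply treated as normal (an assumption), and real clinical sizing is usually done from PCR fragment length rather than a plain string search.

```python
import re


def count_cag_repeats(seq: str) -> int:
    """Length, in repeat units, of the longest uninterrupted CAG run in `seq`."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_htt(cag: int) -> str:
    """Apply the CAG cut-offs from the notes (values below 6 also treated as normal)."""
    if cag <= 26:
        return "normal"
    if cag <= 35:
        return "unaffected, but at risk of expansion in offspring"
    if cag <= 39:
        return "incomplete penetrance"
    return "Huntington disease"


read = "ATG" + "CAG" * 42 + "CAACAG"  # hypothetical read with 42 uninterrupted repeats
n = count_cag_repeats(read)
print(n, classify_htt(n))
```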
with dominant diseases, you can still expect some unaffected children
recessive diseases → almost always loss of function
duplication → think gain of function
difference with dominant negative → the mutant protein interacts with a normal protein and causes it to lose function; key: it functions in a dimer (or larger multimer)
DNA forensics
use DNA to find criminals, heritage and parents
original way to do this: DNA fingerprinting → different cut sites give different band patterns; needs a lot of DNA
0.5 to 1 mL of sample is needed, and the method is not well automated
needed digestion of DNA, gel electrophoresis, and Southern blotting of genomic DNA
probes hybridized to hypervariable minisatellites spread across the genome
found to produce individual-specific fingerprints
is super time consuming
PCR became the gold standard with the use of microsatellites (STRs) because little DNA is needed
STRs have a variable number of repeats in each person; the more sites you compare, the lower the chance that multiple people in the population share that DNA profile
these repeats come from slippage and inheritance
King Tut’s family → PCR of STRs used repeat numbers to identify his parents and other relatives and trace lineage
PCR is touchy and sensitive because DNA from researchers can contaminate the results
in this technique you’re mostly calculating a probability (unless there is a mismatch) → can’t say for sure it’s X’s DNA because you’re looking at some markers, not the whole genome
especially when DNA amount is small/ degraded
CODIS (the Combined DNA Index System; a database and standard set of genetic markers) allele frequencies are based on the population
match probability is about 1 in 10 trillion
new tech for DNA samples → can PCR different samples at the same time and get results
can use the Hardy-Weinberg equation and the product rule to get how often a DNA profile occurs in the population: at each locus, use the part of the equation that matches the observed genotype and its allele frequencies
Hardy-Weinberg equation: p² (AA) + 2pq (heterozygote, Aa) + q² (aa) = 1
then multiply the genotype frequencies across all loci
invert the value (raise it to the power of -1) to get 1 in N people in the population with that profile
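The steps above can be sketched in Python. Each locus contributes p² for a homozygote or 2pq for a heterozygote, the per-locus values are multiplied (the product rule, which assumes the loci are independent), and the reciprocal gives the 1-in-N figure. The three loci and their allele frequencies are hypothetical, not real CODIS values.

```python
from math import prod
from typing import Optional


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Hardy-Weinberg genotype frequency at one locus: p**2 if homozygous, 2pq if heterozygous."""
    return p ** 2 if q is None else 2 * p * q


def random_match_probability(profile: list[tuple[float, ...]]) -> float:
    """Product rule: multiply per-locus genotype frequencies (assumes independent loci)."""
    return prod(genotype_frequency(*alleles) for alleles in profile)


# Hypothetical allele frequencies at three STR loci (not real CODIS data).
profile = [
    (0.10, 0.20),  # heterozygote: 2pq = 0.04
    (0.25,),       # homozygote: p^2 = 0.0625
    (0.05, 0.15),  # heterozygote: 2pq = 0.015
]
rmp = random_match_probability(profile)
print(f"profile frequency = {rmp:.3g}, i.e. about 1 in {1 / rmp:,.0f}")
```

With these made-up frequencies the profile occurs in about 1 in 26,667 people; real profiles multiply across many more loci, which is how match probabilities reach the trillions.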
Familial DNA searching→ only works for close relatives
then they try to zone in on the suspect using other factors or the Y chromosome, which is conserved from father to son
based on IBDs or identical by descent → more closely related people will have greater IBD than more distantly related people
recreational DNA databases, which are not in CODIS, look at 700,000+ SNPs using microarrays
genomic DNA is amplified and fragmented, then hybridized to a microarray containing hundreds of thousands of probes attached to beads
probe sequence ends one base before the SNP of interest
single BP extension adds a single labelled base to the probe complementary to the SNP
DNA is then washed away and the signals are read
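The single-base extension readout can be modelled as string matching. In this minimal Python sketch the target strand is written 5'→3', the probe anneals antiparallel to it and ends one base before the SNP, and the base added to the probe is the complement of the SNP base. The sequences are hypothetical and the fluorescent labels are left out.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def single_base_extension(target: str, probe: str) -> str:
    """Base added to the probe's 3' end when it is annealed to `target`.

    target: genomic strand 5'->3'. probe: 5'->3' oligo that pairs antiparallel with
    target and ends one base before the SNP.
    """
    # The probe pairs with the stretch of target equal to its reverse complement.
    site = target.find(reverse_complement(probe))
    if site < 1:
        raise ValueError("probe does not anneal next to a SNP position")
    # Extension continues toward the target's 5' end, so the next template base is the SNP.
    snp_base = target[site - 1]
    return snp_base.translate(COMPLEMENT)


probe = "TAGCC"  # hypothetical probe
print(single_base_extension("ACGTAGGCTA", probe))  # chromosome carrying A at the SNP
print(single_base_extension("ACGTGGGCTA", probe))  # chromosome carrying G at the SNP
```

The two hypothetical alleles give different added bases (T versus C), which is what the labelled-base signal distinguishes on the array.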
most DTC test companies don’t voluntarily cooperate with law enforcement, but third-party applications that provide further analysis, like GEDmatch and FamilyTreeDNA, do, depending on the case and on whether the person opts in or out of the program
Y chromosome markers stay the same because the male-specific part of the Y is passed only from father to son through the germline and does not recombine
Journal article notes
What are DTC tests?
tests that are sold to consumer directly instead of under doctor recommendations
what type of info do they give and what information do they get?
get raw data and then go to databases to analyze them
raw data is the 700,000 SNP/ genetic variants
can give a limited amount of analysis
difference between DTC and 3rd party services like GEDmatch?
3rd party services are used for further analysis of the data while DTC will get the raw data and give you an analysis
IBD segments
large sections of DNA that are retained within a family; as time passes these sections become smaller
the larger the sections are the more related the people are
segments become smaller with each meiosis because of crossing over and recombination
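Two quantities help explain why IBD segments track relatedness. The expected fraction of the genome shared IBD sums (1/2)^m over each path through a common ancestor, where m is the number of meioses along the path; this gives the 1/8 and 1/32 figures for first and second cousins. Segment length shrinks because, assuming crossovers occur at random at about one per 100 cM per meiosis, a segment surviving m meioses is roughly exponential with a mean of 100/m cM. The Python sketch below uses those simplifying assumptions and ignores chromosome ends.

```python
def coefficient_of_relationship(meioses_per_path: list[int]) -> float:
    """Expected fraction of the genome shared IBD: sum of (1/2)**m over paths."""
    return sum(0.5 ** m for m in meioses_per_path)


def mean_segment_length_cm(meioses: int) -> float:
    """Approximate mean IBD segment length after `meioses` meioses (random crossovers)."""
    return 100.0 / meioses


# Each cousin pair has two paths, one through each shared ancestor of the couple.
relationships = {
    "first cousins": [4, 4],
    "second cousins": [6, 6],
    "third cousins": [8, 8],
}
for name, paths in relationships.items():
    share = coefficient_of_relationship(paths)
    length = mean_segment_length_cm(paths[0])
    print(f"{name}: expected share {share:.4f}, mean segment ~{length:.1f} cM")
```

Each step out in relatedness cuts the expected share by a factor of 4 while the typical segment also gets shorter, which is why distant matches are detected from fewer, smaller segments.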
long range familial searches
used by law enforcement to find suspects
public data bases are used to find long distance relatives of the suspect and narrowing it down to the criminal
relative must be in data base and this is what is referred to as a match
DTC databases give more matches than forensic databases
was used to find the golden state killer
DTC and forensic databases: ethical biases
forensic databases are mostly people of color and immigrants because they are more likely to be arrested as suspects in crimes
DTC databases are mostly white and wealthy people because these people want to do the tests and can afford them
if you are white or wealthy, you are more likely to have a match in a DTC database than people of colour
Figure 1a:
red line made using all public DTC databases, and grey using only a portion, so that they could get a confidence interval
x-axis shows relatedness to the target
chance of finding a match increases as relatedness decreases (more distant relatives)
more distant relatedness allows for more variation, i.e. more ancestors
Figure 1b
the larger the database is the easier it is to find a more closely related match
optimal fast plateau with small database size
Figure 2A:
match is the relative in the databases
people in red are the potential suspect with range of cM at the bottom of box
the average number of relatives at each position is added in blue
add them together to get total number of suspects
Figure 2B:
shows how close or far away they are from each other, to narrow down the location and limit the search radius according to the match and the crime scene
Figure 2C and D:
shows the number of matches and their age relative to the match → used to narrow down suspects
the finer the year resolution, the fewer suspects remain
Pipeline of these searches:
entire population → genetic match → geography → sex (from the DNA sample) → age (10-year resolution: 16-17 people; 1-year resolution: 1-2 people)
Main barrier of technique
it’s not finding a match; it’s the time and resources it takes to find the target
also requires public family-tree information
Privacy concerns exist:
researchers wanted to see how hard it would be to re-identify people from research studies, here a couple, from their ancestry data
found the couple within an hour and the target within the day
anyone could do this and locate people used in studies
genetic data isn’t classified as identifiable but should be; DTC companies should encrypt the raw .txt files to ensure that only authorized third parties have access
Figure 3: 
black circle is the target person; dots are distant cousins who matched, found by going back to the most recent common ancestor