what is Bayes' theorem
P(mother | father, child) =
P(M | F, C)
P(child | mother, father) P(mother | father) / P(child | father)
P(C | F, M) P(M | F) / P(C | F)
what is the law of total probability
P(child=gg | father=gg) =
P(child=gg | father=gg, mother=gg)P(mother=gg | father=gg)
+ P(child=gg | father=gg, mother=Gg)P(mother=Gg | father=gg)
+ P(child=gg | father=gg, mother=GG)P(mother=GG | father=gg)
but a GG mother can only pass on G, so P(child=gg | father=gg, mother=GG) = 0 and the last term drops out, so we get
= P(child=gg | father=gg, mother=gg)P(mother=gg)
+ P(child=gg | father=gg, mother=Gg)P(mother=Gg)
the father is removed from the final probabilities because we assume no inbreeding, so the mother's genotype is independent of the father's
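what is Mendel's law
P(child=gg | father=gg, mother=Gg) = (½ × ½) + (½ × ½) = ½
(each of the father's two g copies is passed on with probability ½, and the mother passes on g with probability ½)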
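A minimal sketch of the total-probability sum above, combined with Mendel's law. It assumes the mother's genotype follows Hardy-Weinberg proportions with g-allele frequency q; that assumption, the function name and the example value of q are illustrative and not from the cards.

```python
from fractions import Fraction


def prob_child_gg_given_father_gg(q):
    """P(child = gg | father = gg), summing over the mother's genotype.

    q is the frequency of allele g (hypothetical input). The mother's
    genotype is ASSUMED to follow Hardy-Weinberg proportions.
    """
    p = 1 - q
    # P(mother = genotype), independent of the father (no inbreeding).
    mother = {"gg": q * q, "Gg": 2 * p * q, "GG": p * p}
    # Mendel: probability the mother passes on g. The gg father always passes on g.
    passes_g = {"gg": Fraction(1), "Gg": Fraction(1, 2), "GG": Fraction(0)}
    return sum(passes_g[m] * mother[m] for m in mother)


print(prob_child_gg_given_father_gg(Fraction(1, 4)))  # 1/4
```

Under these assumptions the sum collapses to q² + ½(2pq) = q, which is why the example returns the allele frequency itself.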
if Tay–Sachs is a recessive disease, t is the disease allele and T is the normal allele, then what are the genotype and phenotype for having the disease
a recessive disease means that T is the dominant allele,
meaning you must have genotype tt to show the disease phenotype.
what is Hardy-Weinberg equilibrium
The Hardy-Weinberg principle states that allele and genotype frequencies in a population remain constant across generations if there's no evolution (no mutation, genetic drift etc)
the allele frequencies are written p, q, r and so on
what are homozygotes
AA or TT or tt
what are heterozygotes
Tt or AB or BC ya get me
what is the E step of an EM algorithm
estimating the genotype counts from the phenotypes using the current allele frequencies, for example
calculating the missing data, perhaps NAB(k) = NA × 2p(k)q(k) / (2p(k)q(k) + p(k)²), if A were dominant over B and we know NA.
what is the M step in an EM algorithm
the maximising step, where you update the allele frequencies
so p^(k+1) = (2NAA(k) + NAB(k)) / (2N)
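A minimal sketch of the two steps for the case on these cards: two alleles, A dominant over B, and only the phenotype counts NA and NB observed. The function name, starting value and iteration count are illustrative assumptions.

```python
def em_dominant(n_a, n_b, p=0.5, iters=200):
    """EM estimate of p = freq(A) when A is dominant over B.

    n_a: count with phenotype A (genotype AA or AB, not distinguishable)
    n_b: count with phenotype B (genotype BB)
    p:   starting value (an arbitrary choice here)
    """
    n = n_a + n_b
    for _ in range(iters):
        q = 1 - p
        # E step: split the n_a individuals into expected AA and AB counts.
        denom = p * p + 2 * p * q
        n_aa = n_a * p * p / denom
        n_ab = n_a * 2 * p * q / denom
        # M step: gene counting with the completed genotype counts.
        p = (2 * n_aa + n_ab) / (2 * n)
    return p


print(em_dominant(84, 16))
```

With 84 A and 16 B phenotypes the iterates should settle near p = 0.6, i.e. q = sqrt(16/100) = 0.4.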
how do we get initial estimates of p, q, r in the EM algorithm
Bernstein estimates / method of moments
if we have n individuals where NBB have phenotype BB, NBb have phenotype Bb and Nbb have phenotype bb, then how do we find the maximum likelihood estimator of p
also what assumptions do we have to make
(NBB, NBb, Nbb)^T ~ Multinomial(n; p², 2pq, q²)
write out the likelihood from the distribution sheet, ignoring all the factorial (!) bits since they don't depend on p
take the log
differentiate with respect to p and set = 0
can take the second derivative to check it's less than 0, so it's the max
assumptions: the locus is in H-W equilibrium and q = 1-p
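Carrying out those steps gives the gene-counting answer p^ = (2NBB + NBb) / (2n). The sketch below computes it and checks numerically that the log-likelihood is lower on either side; the function names and the example counts are made up for illustration.

```python
import math


def mle_p(n_BB, n_Bb, n_bb):
    """Closed-form MLE of p under H-W with all three genotypes observed."""
    n = n_BB + n_Bb + n_bb
    return (2 * n_BB + n_Bb) / (2 * n)


def log_lik(p, n_BB, n_Bb, n_bb):
    """Multinomial log-likelihood, dropping the factorial constant."""
    q = 1 - p
    return (n_BB * math.log(p * p)
            + n_Bb * math.log(2 * p * q)
            + n_bb * math.log(q * q))


counts = (30, 50, 20)          # hypothetical NBB, NBb, Nbb
p_hat = mle_p(*counts)
print(p_hat)                   # 0.55
print(log_lik(p_hat, *counts) > log_lik(p_hat - 0.01, *counts))  # True
```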
what makes something a gene-counting estimator
if it estimates the allele or genotype frequency by counting how many times each gene (allele) or genotype appears in the sample
give me the steps to show that just because two subpopulations (a fraction F of the whole in subpop 1) are each in H-W equilibrium doesn't mean that the whole pop is
let the freq of allele A in subpop i be pi
prop of allele A in whole pop = Fp1 + (1-F)p2
if the whole pop were in H-W we would have prop of genotype AA be (Fp1 + (1-F)p2)² = alpha
prop of AA in subpop i is pi², so prop of AA in whole pop is actually Fp1² + (1-F)p2² = beta
consider a binary random variable Y = p1 with prob F and p2 with prob (1-F)
then E(Y²) = beta and E(Y)² = alpha
using Var(Y) = E(Y²) - E(Y)² > 0 (when p1 ≠ p2 and 0 < F < 1) we get beta - alpha > 0
so beta > alpha
under what condition would these subpops give a whole pop in H-W equilibrium
p1 = p2
what does this tell us about what happens when you have two such subpops
there is an excess of homozygotes and a deficit of heterozygotes compared with H-W
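A small numerical check of the alpha versus beta argument. The function name and the example values of F, p1 and p2 are arbitrary choices for illustration.

```python
def aa_proportions(F, p1, p2):
    """Return (alpha, beta) for two subpops that are each in H-W.

    alpha: AA proportion if the pooled population were in H-W
    beta:  actual AA proportion when the subpops are pooled
    """
    pooled_p = F * p1 + (1 - F) * p2
    alpha = pooled_p ** 2
    beta = F * p1 ** 2 + (1 - F) * p2 ** 2
    return alpha, beta


alpha, beta = aa_proportions(F=0.5, p1=0.2, p2=0.8)
print(alpha, beta)   # roughly 0.25 and 0.34, so beta > alpha
```

The gap beta - alpha equals Var(Y) = F(1-F)(p1-p2)², which is zero only when p1 = p2 (or when F is 0 or 1).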
what are the units of the mutation rate and why
1/time so that µδt is dimensionless
what does exp(-2µt) ≈
by what reasoning
= 1 - 2µt + (2µt)²/2! - (2µt)³/3! + ...
≈ 1 - 2µt (dropping the higher-order terms, since µt is small)
this is the Taylor expansion
if we have DNA with length l, period of time t and m = number of positions where the nucleotide has changed since the start
then what is the likelihood of t
P(same)^(l-m) P(different)^m. Why?
if sites are independent then each site is a Bernoulli trial, giving m successes and l-m failures
each site shows a difference with prob d and is the same with prob s = 1-d, usually given in a transition matrix
then L(t) = d^m s^(l-m)
in a Bernoulli trial, what is the maximum likelihood estimate of the probability of success in the case above
d^ = number of successes / number of trials
d^ = m / l
how can you use d^ to get t^
by the invariance of ML estimates: write d as a function of t, replace d with d^ and t with t^, then rearrange for t^
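A sketch of that rearrangement. The cards only say d comes from a transition matrix, so the form d(t) = ½(1 - exp(-2µt)) used here (a symmetric two-state model) is an assumption, as are the function names and the example numbers.

```python
import math


def d_of_t(t, mu):
    """ASSUMED model: P(site differs after time t) in a symmetric two-state model."""
    return 0.5 * (1 - math.exp(-2 * mu * t))


def t_hat(m, l, mu):
    """Invert d(t) at d^ = m / l to get t^ (invariance of ML estimates)."""
    d_hat = m / l
    if d_hat >= 0.5:
        # Under this model d(t) < 1/2 for every finite t, so no finite t^ exists.
        raise ValueError("d^ must be below 1/2 under this model")
    return -math.log(1 - 2 * d_hat) / (2 * mu)


t_est = t_hat(m=10, l=1000, mu=1e-9)   # hypothetical data and rate
print(t_est)
print(d_of_t(t_est, 1e-9))             # should recover about 0.01
```

With a different transition matrix only d_of_t and its inverse change; the m / l step stays the same.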
if a monkey and a human have been evolving independently for j years since their common ancestor then their total branch length =
and how does this link to t
2j
so j = ½ t
what's the test statistic for testing a hypothesis to do with divergence and what do we compare it to
T = 2 log(L1/L0) = 2(log(L1) - log(L0))
χ²(1) again (chi-squared with 1 degree of freedom)
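A sketch of the test using the Bernoulli likelihood d^m (1-d)^(l-m) from the earlier card. The null value d0, the function names and the example counts are hypothetical; the χ²(1) tail probability is computed as erfc(sqrt(T/2)), which holds because a χ²(1) variable is a squared standard normal.

```python
import math


def log_lik(d, m, l):
    """Log-likelihood for m differences out of l sites (needs 0 < d < 1)."""
    return m * math.log(d) + (l - m) * math.log(1 - d)


def likelihood_ratio_test(m, l, d0):
    """Return (T, p_value) for H0: d = d0 against an unrestricted d.

    Assumes 0 < m < l so that the MLE d^ = m / l is strictly inside (0, 1).
    """
    d_hat = m / l
    T = 2 * (log_lik(d_hat, m, l) - log_lik(d0, m, l))
    T = max(T, 0.0)                        # guard against rounding error
    p_value = math.erfc(math.sqrt(T / 2))  # P(chi-squared(1) > T)
    return T, p_value


T, p_value = likelihood_ratio_test(m=10, l=1000, d0=0.02)
print(T, p_value)   # T above 3.84 means reject H0 at the 5% level
```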
what's the distribution of Tk
Tk ~ Exp((k choose 2))
express the length of the tree L in terms of Tk
L = sum(n, k=2) k Tk
(k choose 2) =
k(k-1)/2
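Since Tk is exponential with rate (k choose 2), E[Tk] = 2 / (k(k-1)), and the expected tree length follows by summing k E[Tk] over k = 2, ..., n (there is no T1 term, because the tree stops at one lineage). A sketch with exact fractions; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def expected_Tk(k):
    """E[Tk] for Tk ~ Exp(rate = k choose 2), in coalescent time units."""
    return Fraction(1, comb(k, 2))


def expected_tree_length(n):
    """E[L] = sum over k = 2..n of k * E[Tk]."""
    return sum(k * expected_Tk(k) for k in range(2, n + 1))


print(expected_tree_length(4))                      # 11/3
print(2 * sum(Fraction(1, k) for k in range(1, 4)))  # 11/3 as well
```

The second line shows the same value written as 2 × sum(n-1, k=1) 1/k, which is where the harmonic sum in Watterson's estimator comes from.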
what does the infinite sites model mean
Mutations occur along the branches of the tree as a Poisson process of rate mu. When a mutation occurs, it changes the nucleotide at a position in the sequence that has never changed before.
give me the law of iterated E and Var using S|L
E[S] = E[E[S|L]]
Var[S] = E[Var[S|L]] + Var[E[S|L]]
what is Watterson's estimator
theta^ = Sn / sum(n-1, k=1) 1/k
what is Tajima's estimator (also called the pairwise difference estimator)
(n choose 2)^-1 sum(n, i=1) sum(n, j=i+1) dij
where dij is the number of positions that differ between sample i and sample j
if we know B is dominant over b and we have a random sample of size n, how do we find the maximum likelihood estimate p^
well we know
Nb ~ Binomial(n, q²)
then log it, differentiate, set to 0, solve to find the MLE of q, and then use p = 1-q
(heads up: it may be worth swapping q² for a single symbol to do all the maths, then plugging q² back in at the end)
What are E and M in the EM algorithm
E = estimate the expected genotype counts/ frequencies of the missing data given current estimates of p
M = maximise the likelihood using these completed genotype counts
what exactly is a Bernstein estimate and how do you compute it
it's a method of moments estimate
an example is you basically say
NZ = E[NZ] = N(r^)² when we have X, Y and Z phenotypes and Z is recessive to both X and Y, so the only way to get Z is with genotype zz, which has proportion r²; then we rearrange to get an estimate r^ = sqrt(NZ/N)
You then repeat this, setting NX and NY equal to their expectations, to get the estimates p^ and q^
if you decide to use the EM algorithm to try to find maximum likelihood estimates of p, q and r, write down the likelihood function that you are aiming to maximise (where X is dominant over Y and Z, and Y is dominant over Z)
we set up an MLE based on a multinomial model with k = 3 categories
so L(p,q,r) = Multinomial(Nx, Ny, Nz)
= N!/(Nx! …) × (p² + 2pq + 2pr)^Nx (q² + 2qr)^Ny (r²)^Nz
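A sketch of how EM could climb this likelihood, with method-of-moments (Bernstein-style) starting values. It assumes the dominance order X over Y over Z stated above; the function names, the example counts and the iteration count are illustrative, not from the cards.

```python
import math


def bernstein_init(n_x, n_y, n_z):
    """Moment estimates: P(Z) = r^2 and P(Y) + P(Z) = (q + r)^2."""
    n = n_x + n_y + n_z
    r = math.sqrt(n_z / n)
    q = math.sqrt((n_y + n_z) / n) - r
    return 1 - q - r, q, r


def em_step(p, q, r, n_x, n_y, n_z):
    n = n_x + n_y + n_z
    prob_x = p * p + 2 * p * q + 2 * p * r   # phenotype X: XX, XY, XZ
    prob_y = q * q + 2 * q * r               # phenotype Y: YY, YZ
    # E step: expected genotype counts given the phenotype counts.
    n_xx = n_x * p * p / prob_x
    n_xy = n_x * 2 * p * q / prob_x
    n_xz = n_x * 2 * p * r / prob_x
    n_yy = n_y * q * q / prob_y
    n_yz = n_y * 2 * q * r / prob_y
    n_zz = n_z                               # phenotype Z can only be ZZ
    # M step: gene counting on the completed genotype counts.
    return ((2 * n_xx + n_xy + n_xz) / (2 * n),
            (2 * n_yy + n_xy + n_yz) / (2 * n),
            (2 * n_zz + n_xz + n_yz) / (2 * n))


def em(n_x, n_y, n_z, start=None, iters=1000):
    p, q, r = start if start is not None else bernstein_init(n_x, n_y, n_z)
    for _ in range(iters):
        p, q, r = em_step(p, q, r, n_x, n_y, n_z)
    return p, q, r


print(bernstein_init(51, 33, 16))                 # about (0.3, 0.3, 0.4)
print(em(51, 33, 16, start=(1 / 3, 1 / 3, 1 / 3)))  # should approach the same values
```

In this particular hierarchy there are three phenotype classes and two free parameters, so the moment estimates already match the observed proportions exactly and EM started elsewhere should converge to them.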
when can you not test whether a locus is in H-W equilibrium or not
when we only have phenotype data there is no test for these hypotheses; the test fails, and we need genotype counts
if we know that
P(C1=SS | M=SS, F=Ss) = 0.5
then what is P(C1=SS, C2=SS | M=SS, F=Ss)
or even that all k children are SS
and why is this
= (½)²
or (½)^k
we're using Mendel's law and the fact that the children are independent transmissions
under the Wright–Fisher model without mutation, homozygosity satisfies the recurrence relation ?
two cases
descendants of the same ancestor, with prob 1/2N
different ancestors, with prob 1-1/2N, that happen to be the same (homozygosity of the previous generation) gt
meaning g(t+1) = 1/2N + (1 - 1/2N)gt
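A short sketch iterating the recurrence g(t+1) = 1/2N + (1 - 1/2N) g(t) and comparing it with the closed form 1 - g(t) = (1 - 1/2N)^t (1 - g(0)), which follows by subtracting both sides from 1. The function name and the example values are arbitrary.

```python
def homozygosity_path(g0, N, generations):
    """Iterate the Wright-Fisher homozygosity recurrence without mutation."""
    g = g0
    path = [g]
    for _ in range(generations):
        g = 1 / (2 * N) + (1 - 1 / (2 * N)) * g
        path.append(g)
    return path


N, g0, T = 50, 0.2, 100          # hypothetical population size and start
path = homozygosity_path(g0, N, T)
closed_form = 1 - (1 - 1 / (2 * N)) ** T * (1 - g0)
print(path[-1], closed_form)     # the two should agree
```

Homozygosity only ever increases here, so heterozygosity 1 - g(t) decays geometrically at rate 1 - 1/2N per generation.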
how are allele proportion estimates affected by generations
they aren't
they stay constant in expectation
so the expected allele proportion should be the same as in the original generation.
how can we apply the law of total probability to
P(GC = RR | GD = QR, GM = QR, D = RR)
GC = grandchild
GD = granddad (mom's side)
GM = grandmother (mom's side)
D = dad
where Q is dominant over R
use the law of total probability and sum over the mother's (M) possible genotypes
basically of the form prob mum is blank given the grandparents, multiplied by prob child is blank given the parents
= P(M=RR | GD, GM)P(GC = RR | M=RR, D=RR) +
P(M=QR | GD, GM)P(GC = RR | M=QR, D=RR) +
P(M=QQ | GD, GM)P(GC = RR | M=QQ, D=RR) (obviously this last line is 0)
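The three-term sum can be evaluated by enumerating Mendelian transmissions. A minimal sketch, assuming each parent passes on either of its two alleles with probability ½ and nothing is known about the mother's phenotype; the helper name offspring_dist is made up for illustration.

```python
from fractions import Fraction
from itertools import product


def offspring_dist(parent1, parent2):
    """Genotype distribution of a child, e.g. offspring_dist("QR", "QR")."""
    dist = {}
    for a, b in product(parent1, parent2):   # one allele from each parent
        genotype = "".join(sorted(a + b))    # "RQ" and "QR" are the same genotype
        dist[genotype] = dist.get(genotype, Fraction(0)) + Fraction(1, 4)
    return dist


mother = offspring_dist("QR", "QR")          # P(M = . | GD = QR, GM = QR)
p_gc_rr = sum(
    p_m * offspring_dist(m, "RR").get("RR", Fraction(0))
    for m, p_m in mother.items()
)
print(mother)    # QQ: 1/4, QR: 1/2, RR: 1/4
print(p_gc_rr)   # 1/2
```

The terms are ¼ × 1 for M=RR, ½ × ½ for M=QR and ¼ × 0 for M=QQ.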
by invariance of ml estimates what can we say
that r²^ = (r^)²
if r is the frequency of a recessive allele, how can we find the MLE r^
NR ~ Binomial(n, r²)
MLE of r² = NR/n, using H-W equilibrium
using the invariance of ML estimates and that NR = n - NQ
r^ = sqrt[(n-NQ)/n]
define heterozygosity
the probability that two gene copies sampled from the population are different
if we're computing heterozygosity or homozygosity with mutation, what's a trick to consider when expanding out
that mu is very small and N is large, so we can neglect terms in mu² and mu/N
what is the mutation-drift parameter theta
theta = 4 × N × mu
what is an expression for equilibrium heterozygosity
Hn − Hn−1 = 0
at equilibrium Hn = Hn−1; call this common value H, so in the recurrence for Hn we replace both Hn and Hn−1 with H, set Hn − Hn−1 equal to zero and rearrange for H
Watterson's estimator =
Sn / sum(n-1, k=1) 1/k
where Sn is the number of segregating sites
and n = the number of sampled sequences (e.g. n = 4 for samples A, B, C, D)
what is a segregating site
a column where not all entries are the same
e.g. if position 0.13 has 0110110 then it's a segregating site as it has a mixture of 0s and 1s
what is Tajima's π estimate
= sum(i>j) dij / (n choose 2)
where dij is the number of positions at which samples i and j differ
so basically add up the number of differences between A and B, A and C, B and C, and so on for every pair!
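Both estimators can be computed straight from the 0/1 sequence matrix. A minimal sketch; the four sequences are invented for illustration and the function names are not from any library.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Sn: number of columns in which not all entries are the same."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    n = len(seqs)
    harmonic = sum(1 / k for k in range(1, n))   # sum(n-1, k=1) 1/k
    return segregating_sites(seqs) / harmonic


def tajima_pi(seqs):
    """Average number of differing positions over all (n choose 2) pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


samples = ["0110", "0100", "1100", "0001"]       # hypothetical A, B, C, D
print(segregating_sites(samples))   # 4
print(watterson_theta(samples))     # 4 / (1 + 1/2 + 1/3)
print(tajima_pi(samples))           # 2.0
```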
n choose 2 =
n! / 2!(n-2)!
how can we use Watterson's (theta^) and Tajima's (π) estimates to work out the effective population size
we know that the mutation-drift parameter is theta = 4N mu
so we can do
theta^ / 2mu or π / 2mu = 2N
halve it to get the number of diploids, N
if we know we have a sample size of 6, a generation time of 28 years, and an effective population size of 30417, then what is the expected time to the MRCA in years?
= E[TMRCA] × 30417 × 28
E[TMRCA]
= 2(1-1/n)
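Plugging the numbers in, following the card's own scaling (one coalescent time unit is taken to be 30417 generations of 28 years each). The function name is illustrative.

```python
from fractions import Fraction


def expected_tmrca_years(n, pop_size, generation_years):
    """E[TMRCA] = 2(1 - 1/n) coalescent units, converted to years."""
    coalescent_units = 2 * (1 - Fraction(1, n))
    return coalescent_units * pop_size * generation_years


print(expected_tmrca_years(6, 30417, 28))   # 1419460
```

So with n = 6 the expectation is 5/3 time units, or about 1.42 million years under this scaling.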
mutation occurs as a Poisson process with (?)
rate (total length of branches (t) × theta) / 2
where theta could be, for example, Watterson's estimate
what makes sites incompatible
if any two columns i, j of the genotype matrix between them contain all four of the patterns
00
01
10
11
then the corresponding sites are incompatible
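This check (often called the four-gamete test) is easy to run over every pair of columns. A minimal sketch on a made-up 0/1 matrix; the function name is illustrative.

```python
from itertools import combinations

ALL_FOUR = {("0", "0"), ("0", "1"), ("1", "0"), ("1", "1")}


def incompatible_pairs(seqs):
    """Return the pairs of column indices (i, j) showing 00, 01, 10 and 11."""
    n_sites = len(seqs[0])
    found = []
    for i, j in combinations(range(n_sites), 2):
        patterns = {(s[i], s[j]) for s in seqs}   # row patterns at sites i, j
        if ALL_FOUR <= patterns:
            found.append((i, j))
    return found


rows = ["000", "010", "100", "110"]   # hypothetical genotype matrix, one row per sample
print(incompatible_pairs(rows))       # [(0, 1)]
```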
for the Hudson–Kaplan lower bound, if our h = 0.2
is h ∈ (0.2, 0.7]
no, h is not in it, as ( means that 0.2 is not included in the interval, and hence we would set h = 0.7
what must happen at incompatible sites
there must be at least one recombination event between a pair of incompatible sites!