past papers

58 Terms

New cards

What is Bayes' theorem?

P(mother | father, child) =

P(M | F, C)

P(child | mother, father) P(mother | father) / P(child | father)

P(C | F, M) P(M | F) / P(C | F)

New cards

What is the law of total probability?

P(child=gg | father=gg) =

P(child=gg | father=gg, mother=gg) P(mother=gg | father=gg)

+ P(child=gg | father=gg, mother=Gg) P(mother=Gg | father=gg)

+ P(child=gg | father=gg, mother=GG) P(mother=GG | father=gg)

but the mother=GG term is impossible (a GG mother always passes on G, so the child cannot be gg), so we get

= P(child=gg | father=gg, mother=gg) P(mother=gg)

+ P(child=gg | father=gg, mother=Gg) P(mother=Gg)

the father is removed from the mother's probabilities because we assume no inbreeding, so the mother's genotype is independent of the father's
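
As a quick check of the sum above, here is a minimal sketch in Python. It assumes the mother's genotype follows Hardy-Weinberg proportions with a made-up frequency q = 1/4 for allele g (the card gives no value), and the names `transmit` and `p_child_gg_given_father_gg` are purely illustrative.

```python
from fractions import Fraction

def transmit(genotype, allele):
    """Probability that a parent with this two-letter genotype passes on `allele`."""
    return Fraction(genotype.count(allele), 2)

def p_child_gg_given_father_gg(mother_prior):
    """Law of total probability: sum P(child=gg | F=gg, M=m) * P(M=m) over m."""
    total = Fraction(0)
    for mother, prior in mother_prior.items():
        p_child = transmit("gg", "g") * transmit(mother, "g")
        total += p_child * prior
    return total

# Assumed (hypothetical) allele frequency for g, with H-W genotype proportions.
q = Fraction(1, 4)
p = 1 - q
mother_prior = {"gg": q**2, "Gg": 2 * p * q, "GG": p**2}

result = p_child_gg_given_father_gg(mother_prior)
print(result)  # equals q under these assumptions
```

The GG term contributes nothing because `transmit("GG", "g")` is 0, which matches the term dropped on the card.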

New cards

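What is Mendel's law?

Each parent passes on one of their two gene copies, each with probability ½, so

P(child=gg | father=gg, mother=Gg) = ( ½ × ½ ) + ( ½ × ½ ) = ½

(the father passes on either of his two g copies with probability ½ each, and the mother passes on her g with probability ½)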

New cards

If Tay–Sachs is a recessive disease, t is the disease allele and T is the normal allele, what are the genotype and phenotype of someone with the disease?

A recessive disease means T is the dominant allele,

meaning you must have genotype tt to show the disease phenotype.

New cards

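What is Hardy–Weinberg equilibrium?

The Hardy–Weinberg principle states that allele and genotype frequencies in a population remain constant across generations if there's no evolution (no mutation, genetic drift etc)

The allele frequencies are written p, q, r and so on, and the genotype proportions are then p² for a homozygote and 2pq for a heterozygote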

New cards

What are homozygotes?

Genotypes with two copies of the same allele, e.g. AA or TT or tt

New cards

What are heterozygotes?

Genotypes with two different alleles, e.g. Tt or AB or BC

New cards

What is the E step of the EM algorithm?

Estimating the expected genotype counts from the phenotype counts using the current allele frequencies, for example

calculating the missing count NAB(k) = NA × 2p(k)q(k) / ( 2p(k)q(k) + p(k)² ) if A is dominant over B and we know NA, the number with phenotype A.

New cards

What is the M step in the EM algorithm?

The maximising step, where you update the allele frequencies using the counts from the E step

so p^(k+1) = ( 2NAA + NAB ) / 2N
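
To see the E and M steps working together, here is a minimal sketch for the two-allele case where A is dominant over B. The E step uses NAB = NA × 2pq / (2pq + p²) and the M step uses p = (2NAA + NAB) / 2N. The function name `em_dominant`, the starting value 0.5 and the counts 84 and 16 are all made up for illustration.

```python
def em_dominant(n_a, n_b, p0=0.5, iters=200):
    """EM for the frequency p of allele A when A is dominant over B.

    n_a: number with phenotype A (genotype AA or AB, not distinguishable)
    n_b: number with phenotype B (genotype BB)
    """
    n = n_a + n_b
    p = p0
    for _ in range(iters):
        q = 1.0 - p
        # E step: split the phenotype-A count into expected AB and AA counts
        n_ab = n_a * 2 * p * q / (2 * p * q + p * p)
        n_aa = n_a - n_ab
        # M step: gene counting with the completed counts
        p = (2 * n_aa + n_ab) / (2 * n)
    return p

# Hypothetical data: 84 individuals with phenotype A, 16 with phenotype B.
p_hat = em_dominant(n_a=84, n_b=16)
print(p_hat)
```

With these counts the iteration settles where q = 1 - p satisfies q² = 16/100, which is the same answer the direct binomial MLE gives.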

New cards

How do we get initial estimates of p, q, r in the EM algorithm?

Bernstein estimates / method of moments

New cards

If we have n individuals where NBB have genotype BB, NBb have genotype Bb and Nbb have genotype bb, how do we find the maximum likelihood estimator of p?

also what assumptions do we have to make

(NBB, NBb, Nbb)^T ~ Mu(n, p², 2pq, q²)
write out the likelihood from the distribution sheet, ignoring all the factorial (!) bits since they don't involve p
take the log
differentiate with respect to p and set = 0
can take the second derivative to check it's less than 0, so it's the max
Assumptions: the locus is in H-W equilibrium and q = 1 - p
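
Written out, the steps above look roughly like this (a sketch of the working, using the multinomial model with probabilities p², 2pq, q² and q = 1 - p; "const" collects everything that does not depend on p):

```latex
\begin{align*}
L(p) &\propto (p^2)^{N_{BB}} \, (2pq)^{N_{Bb}} \, (q^2)^{N_{bb}}, \qquad q = 1 - p \\
\ell(p) &= (2N_{BB} + N_{Bb}) \log p + (N_{Bb} + 2N_{bb}) \log(1 - p) + \text{const} \\
\frac{d\ell}{dp} &= \frac{2N_{BB} + N_{Bb}}{p} - \frac{N_{Bb} + 2N_{bb}}{1 - p} = 0 \\
\hat{p} &= \frac{2N_{BB} + N_{Bb}}{2n} \\
\frac{d^2\ell}{dp^2} &= -\frac{2N_{BB} + N_{Bb}}{p^2} - \frac{N_{Bb} + 2N_{bb}}{(1 - p)^2} < 0
\end{align*}
```

The result is a gene-counting estimator: it counts the B copies (two per BB, one per Bb) out of 2n gene copies.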

New cards

what makes something a gene-counting estimator

if it estimates the allele or genotype frequency by counting how many times each gene (allele) or genotype appears in the sample

New cards

Give the steps to show that two subpopulations each being in H-W equilibrium (with fraction F of the whole pop in subpop 1) doesn't mean that the whole pop is

let the freq of allele A in subpop i be pi
prop of allele A in whole pop = Fp1 + (1-F)p2
if the whole pop were in H-W we would have prop of genotype AA = (Fp1 + (1-F)p2)² = alpha
prop of AA in subpop i is pi², so the prop of AA in the whole pop is actually Fp1² + (1-F)p2² = beta
consider a binary random variable Y = p1 with prob F and p2 with prob (1-F)
then E(Y²) = beta and E(Y)² = alpha
using Var(Y) = E(Y²) - E(Y)² > 0 (true when p1 ≠ p2 and 0 < F < 1) we get beta - alpha > 0
so beta > alpha
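
A small numerical check of the argument, as a sketch: alpha = (Fp1 + (1-F)p2)² is the AA proportion the whole population would have under H-W, and beta = Fp1² + (1-F)p2² is the actual AA proportion. The values F = 0.5, p1 = 0.2, p2 = 0.8 are made up for illustration.

```python
def aa_proportions(F, p1, p2):
    """Return (alpha, beta) for two subpopulations that are each in H-W."""
    alpha = (F * p1 + (1 - F) * p2) ** 2      # AA proportion if the whole pop were in H-W
    beta = F * p1 ** 2 + (1 - F) * p2 ** 2    # actual AA proportion in the mixed pop
    return alpha, beta

alpha, beta = aa_proportions(F=0.5, p1=0.2, p2=0.8)
print(alpha, beta, beta - alpha)

# With equal allele frequencies the gap disappears.
alpha_eq, beta_eq = aa_proportions(F=0.5, p1=0.3, p2=0.3)
```

The gap beta - alpha is Var(Y) = F(1-F)(p1 - p2)², so it only vanishes when p1 = p2 (or when one subpopulation is empty).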

New cards

Under what condition would these subpops give a whole pop in H-W equilibrium?

p1 = p2

New cards

What does this tell us about what happens when you have two such subpops?

there is an excess of homozygotes and a deficit of heterozygotes compared with H-W proportions

New cards

what are the units of the mutation rate and why

1/time so that µδt is dimensionless

New cards

What does exp(-2mut) equal?

by what reasoning

= 1 - 2mut + (2mut)² / 2! - (2mut)³ / 3! + ...

≈ 1 - 2mut

this is the Taylor expansion, dropping the higher-order terms because mut is small
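
To get a feel for how good the first-order approximation exp(-2mut) ≈ 1 - 2mut is, here is a tiny sketch. The values mu = 1e-8 and t = 1e5 are invented for illustration only.

```python
import math

def exact_and_linear(mu, t):
    """Return exp(-2*mu*t) and its first-order Taylor approximation 1 - 2*mu*t."""
    x = 2 * mu * t
    return math.exp(-x), 1 - x

# Hypothetical values, chosen so that 2*mu*t = 0.002 is small.
exact, approx = exact_and_linear(mu=1e-8, t=1e5)
error = exact - approx
print(exact, approx, error)
```

The error is roughly (2mut)² / 2, the first term that was dropped, so it shrinks quickly as mut gets smaller.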

New cards

If we have DNA of length l, a period of time t, and m positions where the nucleotide has changed since the start

then what is the likelihood of t

P(same)^(l-m) P(different)^m. Why?

if sites are independent then each site is a Bernoulli trial, giving m successes and l-m failures
each site shows a difference with prob d and is the same with prob s = 1-d (both depend on t and are usually given in a transition matrix)
then L(t) = d^m s^(l-m)

New cards

For Bernoulli trials, what is the maximum likelihood estimate of the probability of success in the case above?

d^ = number of successes / number of trials

d^ = m / l

New cards

How can you use d^ to get t^?

By the invariance of ML estimates: d is a function of t, so replace t with t^ and d with d^ in that formula (d^ = m / l) and then rearrange for t^

New cards

If a monkey and a human have been evolving independently for j years since their common ancestor, then their total branch length =

and how does this link to t

2j (j years along each lineage), and this total is t

so j = ½ t

New cards

What's the test statistic for testing a hypothesis about divergence, and what do we compare it to?

T = 2 log(L1/L0) = 2( log(L1) - log(L0) )

compare it to a χ²(1) distribution, again

New cards

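What's the distribution of Tk (the time during which the tree has k lineages)?

Tk ~ Expo( (k 2) ), i.e. exponential with rate k choose 2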

New cards

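Express the total length of the tree L in terms of Tk

L = sum(n, k=2) k Tk

(the sum starts at k = 2 because the tree stops at the MRCA, and there are k branches during Tk)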
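
As a sketch of how this sum is used, the expected tree length follows from E[Tk] = 1 / (k choose 2) for an exponential with rate k choose 2, summing k × E[Tk] over k = 2, ..., n (the stretches of the tree with at least two lineages). The function name is illustrative.

```python
from fractions import Fraction

def expected_tree_length(n):
    """E[L] = sum over k = 2..n of k * E[Tk], where E[Tk] = 1 / (k choose 2)."""
    total = Fraction(0)
    for k in range(2, n + 1):
        rate = Fraction(k * (k - 1), 2)   # (k choose 2)
        total += k * (1 / rate)           # k branches, each of expected length 1/rate
    return total

print(expected_tree_length(4))  # 11/3
```

Each term simplifies to 2 / (k - 1), so E[L] = 2 × (1 + 1/2 + ... + 1/(n-1)), which is where the sum of 1/k in Watterson's estimator comes from.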

New cards

(k 2) =

k(k-1)/ 2

New cards

what does the infinite sites model mean

Mutations occur along the branches of the tree as a Poisson process of rate mu. When a mutation occurs, it changes the nucleotide at a position in the sequence that has never changed before.

New cards

give me the law of iterated E and Var using S|L

E[S] = E[E[S|L]]

Var[S] = E[Var[S|L]] + Var[E[S|L]]

New cards

What is Watterson's estimator?

theta^ = Sn / sum(n-1, k=1) 1/k

New cards

What is Tajima's estimator (also called the pairwise difference)?

(n 2)^-1 sum(n, i=1)sum(n, j=i+1)dij

where dij is the number of positions different between sample i and sample j

New cards

If we know B is dominant over b and we have a random sample of size n, how do we find the maximum likelihood estimate p^?

well we know

Nb ~ binomial(n, q²), where Nb is the number with phenotype b

then log it, differentiate, set to 0, solve to find the MLE of q and then use p = 1 - q

(heads up: it may be easier to replace q² with a single symbol, do all the maths, and then plug q² back in)
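
Carrying those steps through gives the MLE of q² as Nb / n, so q^ = sqrt(Nb / n) and p^ = 1 - q^. A minimal sketch that also spot-checks the log-likelihood around q^ (the counts 16 out of 100 and the function names are invented for illustration):

```python
import math

def mle_dominant(n_b, n):
    """MLEs (p_hat, q_hat) when only the recessive phenotype count n_b is identifiable."""
    q_hat = math.sqrt(n_b / n)
    return 1 - q_hat, q_hat

def log_lik(q, n_b, n):
    """Binomial log-likelihood in q, dropping the constant binomial coefficient."""
    return n_b * math.log(q * q) + (n - n_b) * math.log(1 - q * q)

# Hypothetical sample: 16 of 100 individuals show the recessive phenotype b.
p_hat, q_hat = mle_dominant(n_b=16, n=100)
print(p_hat, q_hat)
```

Evaluating `log_lik` slightly either side of `q_hat` gives smaller values, which is a quick sanity check that the stationary point is a maximum.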

New cards

What are E and M in the EM algorithm?

E = estimate the expected genotype counts / frequencies of the missing data given the current estimates of p

M = maximise the likelihood using these completed genotype counts

New cards

What exactly is a Bernstein estimate and how do you compute it?

It's a method of moments estimate

An example: you basically set an observed count equal to its expectation

NZ = E[NZ] = N(r^)² when we have X, Y and Z phenotypes and Z is recessive to both X and Y. The only way to get phenotype Z is genotype zz, which has proportion r², so we rearrange to get the estimate r^ = sqrt(NZ / N)
You then repeat this with NX and NY to get the estimates p^ and q^
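
One way to carry out the moment matching, sketched under the assumption that X is dominant over Y and Z and Y is dominant over Z, with allele frequencies p, q, r for x, y, z. Then P(Z) = r² and P(Y) + P(Z) = q² + 2qr + r² = (q + r)². The function name and the counts are hypothetical, and other moment equations could be used instead.

```python
import math

def bernstein_estimates(n_x, n_y, n_z):
    """Method-of-moments estimates (p_hat, q_hat, r_hat) from phenotype counts.

    Assumes dominance order X > Y > Z and H-W proportions.
    """
    n = n_x + n_y + n_z
    r_hat = math.sqrt(n_z / n)               # from NZ = N * r^2
    q_plus_r = math.sqrt((n_y + n_z) / n)    # from NY + NZ = N * (q + r)^2
    q_hat = q_plus_r - r_hat
    p_hat = 1 - q_plus_r                     # the three frequencies sum to 1
    return p_hat, q_hat, r_hat

# Hypothetical phenotype counts.
p_hat, q_hat, r_hat = bernstein_estimates(n_x=75, n_y=21, n_z=4)
print(p_hat, q_hat, r_hat)
```

These values are only starting points; the EM iterations then refine them towards the maximum likelihood estimates.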

New cards

If you decide to use the EM algorithm to find maximum likelihood estimates of p, q and r, write down the likelihood function that you are aiming to maximise (where X is dominant over Y and Z, and Y is dominant over Z)

we set up the likelihood from a multinomial model with k = 3 categories

so L(p,q,r) = multinomial probability of (Nx, Ny, Nz)

= N!/Nx!….. (p² + 2pq + 2pr)^Nx (q² + 2qr)^Ny (r²)^Nz

New cards

When can you not tell whether a locus is in H-W equilibrium or not?

When we only have phenotype data: there is no test of these hypotheses, the test fails. We need genotype counts

New cards

if we know that

P(C1= SS| M=SS, F=Ss) = 0.5

then what is P(C1 = SS, C2 = SS | M=SS, F=Ss)

or even all k children are SS

and why is this

= ( ½ )²

or ( ½ )^k

we're using Mendel's law and the fact that the children are independent transmissions

New cards

Under the Wright–Fisher model without mutation, what recurrence relation does homozygosity satisfy?

two cases for the two gene copies

descendants of the same ancestor, with prob 1/2N (then they are certainly the same)
different ancestors, with prob 1 - 1/2N, that happen to be the same, with prob gt (the homozygosity of the previous generation)

meaning g(t+1) = 1/2N + (1 - 1/2N) gt
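
A short sketch that iterates the recurrence g(t+1) = 1/2N + (1 - 1/2N) g(t) and compares it with the closed form 1 - g(t) = (1 - 1/2N)^t (1 - g(0)), which follows from subtracting both sides from 1. The values N = 50, g(0) = 0.5 and 10 generations are made up.

```python
def homozygosity(g0, N, generations):
    """Iterate g(t+1) = 1/(2N) + (1 - 1/(2N)) * g(t) for the given number of generations."""
    g = g0
    for _ in range(generations):
        g = 1 / (2 * N) + (1 - 1 / (2 * N)) * g
    return g

# Hypothetical values for illustration.
N, g0, t = 50, 0.5, 10
iterated = homozygosity(g0, N, t)
closed_form = 1 - (1 - 1 / (2 * N)) ** t * (1 - g0)
print(iterated, closed_form)
```

Heterozygosity 1 - g shrinks by a factor (1 - 1/2N) each generation, so without mutation homozygosity creeps up towards 1.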

New cards

How are expected allele proportions affected by the number of generations?

they aren't

they stay constant

so the expected allele proportion in any later generation should be the same as in the original generation.

New cards

How can we apply the law of total probability to

P(GC = RR | GD = QR, GM = QR, D = RR)

GC = grandchild
GD = granddad (mum's side)
GM = grandmother (mum's side)
D = dad

where Q is dominant over R

use the law of total probability and sum over the mother's (M) possible genotypes
each term is of the form: prob mum has a genotype given the grandparents, multiplied by prob child is RR given the parents

= P(M=RR | GD, GM) P(GC = RR | M=RR, D=RR) +

P(M=QR | GD, GM) P(GC = RR | M=QR, D=RR) +

P(M=QQ | GD, GM) P(GC = RR | M=QQ, D=RR) (obviously this last term is 0)
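
The three-term sum can be evaluated mechanically. Below is a minimal sketch that builds the mother's genotype distribution from the grandparents (both QR) using Mendelian transmission, then sums over her genotype with the dad fixed at RR. The helper name `offspring_dist` is illustrative.

```python
from fractions import Fraction
from itertools import product

def offspring_dist(parent1, parent2):
    """Mendelian genotype distribution for a child of two parents (two-letter genotypes)."""
    dist = {}
    for a, b in product(parent1, parent2):     # each of the 4 allele pairs has prob 1/4
        genotype = "".join(sorted(a + b))      # store "QR", never "RQ"
        dist[genotype] = dist.get(genotype, Fraction(0)) + Fraction(1, 4)
    return dist

# P(M = m | GD = QR, GM = QR)
mother_dist = offspring_dist("QR", "QR")

# Law of total probability over the mother's genotype, with D = RR.
p_gc_rr = sum(
    p_m * offspring_dist(m, "RR").get("RR", Fraction(0))
    for m, p_m in mother_dist.items()
)
print(p_gc_rr)  # 1/2
```

The terms are ¼ × 1 for M=RR, ½ × ½ for M=QR and ¼ × 0 for M=QQ, which add up to ½.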

New cards

By the invariance of ML estimates, what can we say?

that the MLE of r² is the square of the MLE of r, i.e. (r²)^ = (r^)²

New cards

If the recessive allele R has frequency r, how can we find the MLE r^?

NR ~ Bin(n, r²) using H-W equilibrium, where NR is the number with the recessive phenotype

MLE of r² = NR / n

using the invariance of ML estimates and that NR = n - NQ

r^ = sqrt[ (n - NQ) / n ]

New cards

define heterozygosity

the probability that two gene copies sampled from the population are different

New cards

If we're computing heterozygosity or homozygosity with mutation, what's a trick to consider when expanding out?

that mu is very small and N is large, so we can neglect terms in mu² and mu/N

New cards

What is the mutation-drift parameter theta?

theta = 4 × N × mu

New cards

What is an expression for equilibrium heterozygosity?

Hn − Hn−1 = 0

At equilibrium Hn = Hn−1; call this common value H. In the recurrence for Hn, replace both Hn and Hn−1 with H (so Hn − Hn−1 = 0) and rearrange for H

New cards

Watterson's estimator =

theta^ = Sn / sum(n-1, k=1) 1/k

where Sn is the number of segregating sites

and n = the number of sampled sequences (e.g. n = 4 for samples A, B, C, D)

New cards

What is a segregating site?

a column where not all entries are the same

e.g. if position 0.13 has 0110110 then it's a segregating site, as it has a mixture of 0s and 1s

New cards

What is Tajima's π estimate?

= sum(i>j) dij / (n 2)

where dij is the number of positions at which samples i and j differ

so basically add up the number of differences between A and B, A and C, B and C, and so on for every pair, then divide by the number of pairs
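
Both estimators can be read straight off a 0/1 genotype matrix. The sketch below counts segregating sites for Watterson's estimator, Sn / (1 + 1/2 + ... + 1/(n-1)), and averages pairwise differences for Tajima's π. The four sequences are invented purely for illustration.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Number of columns in which not all entries are the same."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def watterson(seqs):
    """Sn divided by the harmonic sum 1 + 1/2 + ... + 1/(n-1)."""
    n = len(seqs)
    return segregating_sites(seqs) / sum(1 / k for k in range(1, n))

def tajima_pi(seqs):
    """Average number of differing positions over all (n choose 2) pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)

# Hypothetical samples A, B, C, D (rows) at four positions (columns).
samples = ["0110", "0100", "1100", "0101"]
s_n = segregating_sites(samples)
theta_w = watterson(samples)
pi_hat = tajima_pi(samples)
print(s_n, theta_w, pi_hat)
```

Here three of the four columns are segregating, and the six pairwise differences are 1, 2, 2, 1, 1, 2.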

New cards

n choose 2 =

n! / ( 2! (n-2)! ) = n(n-1) / 2

New cards

How can we use Watterson's (theta^) and Tajima's (π) estimates to work out the effective population size?

we know that the mutation-drift parameter is theta = 4Nmu

so we can rearrange to get

theta^ / 2mu or π / 2mu = 2N

halve it to get N, the number of diploids
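
The rearrangement of theta = 4Nmu is a one-liner; a small sketch with made-up values for the estimate and the mutation rate (neither number comes from the cards):

```python
def gene_copies_and_diploids(theta_hat, mu):
    """Return (2N, N) from theta = 4 * N * mu, given an estimate of theta."""
    two_n = theta_hat / (2 * mu)   # theta / (2 mu) = 2N
    return two_n, two_n / 2        # halve to get the number of diploids N

# Hypothetical inputs: theta_hat could be Watterson's estimate or Tajima's pi.
two_n, n_diploid = gene_copies_and_diploids(theta_hat=2.4, mu=2e-5)
print(two_n, n_diploid)
```

The estimate and mu must be on the same scale (for example both per sequence, or both per site) for the division to make sense.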

New cards

If we have a sample size of 6, a generation time of 28 years, and an effective population size of 30417, then what is the expected time to the MRCA in years?

= E[T_MRCA] × 30417 × 28

New cards

E[T_MRCA] (in coalescent time units, for a sample of size n)

= 2(1 - 1/n)
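
Putting the formula E[T_MRCA] = 2(1 - 1/n) together with the numbers from the sample-size-6 card (population size 30417, generation time 28 years) gives a concrete figure. This sketch follows the card's scaling, where the coalescent-unit answer is multiplied by the population size and then by the generation time; the function name is illustrative.

```python
def expected_tmrca_years(n, pop_size, gen_time):
    """Expected time to the MRCA in years, following the scaling used on the card."""
    coalescent_units = 2 * (1 - 1 / n)        # E[T_MRCA] for a sample of size n
    return coalescent_units * pop_size * gen_time

years = expected_tmrca_years(n=6, pop_size=30417, gen_time=28)
print(round(years))  # about 1.42 million years
```

Note that 2(1 - 1/n) approaches 2 as n grows, so adding more samples barely changes the expected age of the MRCA.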

New cards

Mutations occur as a Poisson process, so the number of mutations on the tree is Poisson with mean (?)

(total length of branches × theta) / 2

where theta could be, for example, Watterson's estimate

New cards

what makes incompatible sites

if any two columns i,j of the genotype matrix contain the pattern

then the corresponding sites are incompatible

New cards

For the Hudson–Kaplan lower bound, if our h = 0.2

is h ∈ (0.2, 0.7]?

No, h is not in it: the round bracket ( means that 0.2 is not in the interval, and hence we would set h = 0.7

New cards

What must happen at incompatible sites?

there must be at least one recombination event between a pair of incompatible sites!