What does the posterior distribution look like, in terms of π and x?
p(π | x) ∝ f(x | π) p(π)
Posterior odds, prior odds, and what's the corresponding Bayes factor?
Posterior odds = P(H1 | x)/P(H0 | x)
Prior odds = P(H1)/P(H0)
BF10 = posterior odds / prior odds
If we have
zi, a, b, π and y, then what is the joint posterior distribution of all parameters?
p(a, b, π, z | y) proportional to
p(y | z, a, b) p(z | π) p(a) p(b) p(π)
p(zi = 1 | π, a, b, yi): how can we find this using Bayes' theorem?
= p(zi = 1 | π, a, b) p(yi | π, a, b, zi = 1) /
Σ_{j=0}^{1} p(zi = j | π, a, b) p(yi | π, a, b, zi = j)
Say this equals qi; then how can we sample zi?
Draw zi from a Bernoulli distribution with success probability qi.
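A minimal R sketch of that draw, assuming a two-component mixture with illustrative densities f1, f2 and mixing weight π (all values below are placeholders):

  pi_val <- 0.3                                 # assumed P(zi = 1)
  f1 <- function(y) dnorm(y, mean = 0, sd = 1)  # placeholder component-1 density
  f2 <- function(y) dnorm(y, mean = 4, sd = 1)  # placeholder component-2 density
  yi <- 2.5
  qi <- pi_val * f1(yi) / (pi_val * f1(yi) + (1 - pi_val) * f2(yi))
  zi <- rbinom(1, size = 1, prob = qi)          # one Bernoulli(qi) draw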
yi | µ, σ² ∼ N(µ, σ²),
p(µ) ∝ 1 and σ² ∼ Gamma(α, β).
Then the full conditional posterior distribution is
p(µ | σ², y1, …, yn) ∝
p(µ) ∏_{i=1}^{n} f(yi | µ, σ²),
keeping only the factors that involve µ.
What are the steps for using the Metropolis-Hastings algorithm for, say, σ²?
Let σ²ₜ be the t-th iterate of σ² in the Metropolis-Hastings algorithm. For sampling σ² at step (t + 1), we do the following:
Propose a new sample σ²* from a proposal density q(σ²* | σ²ₜ).
Accept σ²* as the new sample with probability min(1, r); otherwise keep σ²ₜ.
What would r be in this case?
r = [p(σ²* | µ, y) q(σ²ₜ | σ²*)] / [p(σ²ₜ | µ, y) q(σ²* | σ²ₜ)]
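A hedged R sketch of one such update, where log_post() is a placeholder for the (unnormalised) log posterior p(σ² | µ, y), and the log-normal random walk is just one convenient proposal choice:

  mh_step_sigma2 <- function(sig2_t, log_post, s = 0.5) {
    sig2_star <- rlnorm(1, meanlog = log(sig2_t), sdlog = s)   # propose from q(. | sig2_t)
    log_r <- log_post(sig2_star) - log_post(sig2_t) +
             dlnorm(sig2_t, log(sig2_star), s, log = TRUE) -   # q(sig2_t | sig2_star)
             dlnorm(sig2_star, log(sig2_t), s, log = TRUE)     # q(sig2_star | sig2_t)
    if (log(runif(1)) < log_r) sig2_star else sig2_t           # accept with prob min(1, r)
  }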
Why would a N(0, 100) prior be chosen for the regression coefficients in a model?
It's a deliberately vague (weakly informative) choice: centred at 0 with a large variance of 100, it carries little prior information and lets the data dominate. Less a lucky guess, more a sensible default.
In the matrix of the natural-log Bayes factors,
how would you calculate the value in row m1 and column m2?
log(BF12) = log marginal likelihood of m1 − log marginal likelihood of m2 = α
Thus, what would be the value in row m2 and column m1?
log(BF21) = −log(BF12) = −α, easyyyyy
In BFab, how do you choose a and b?
a is the row, b is the column.
How do you write the log marginal likelihood of m2 in probability notation?
log P(y | m2)
How do you choose which model fits the data best?
The one with the highest marginal likelihood.
Say that's model B; then compare it to the second-best model (model C) by looking at
exp(log BFBC) = BFBC
If
exp(log BF32) = BF32 = 780, what does this mean?
The data are about 780 times more likely under Model 3 than under Model 2.
Prefer Model 3!
What should I conclude if effective sample sizes from an MCMC output are slightly below the total number of iterations? Should I worry about convergence?
If effective sample sizes are slightly below the total number of MCMC iterations (e.g. 7,000–9,000 out of 10,000), this typically does not indicate a problem. It may just reflect some autocorrelation in the chain.
However, if the effective size is very low (e.g. a few hundred), that might indicate poor mixing or lack of convergence, and you'd want to run longer chains, thin them, or re-diagnose.
✅ If sizes are moderately high (e.g. 7,500+), you're probably fine — but you could run longer for reassurance, especially for key parameters.
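In R, the coda package can report this (a toy chain stands in for real sampler output here):

  library(coda)
  chain <- mcmc(rnorm(10000))   # stand-in for real MCMC output
  effectiveSize(chain)          # near 10,000 here, since these draws are independent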
What is the general form of the joint posterior distribution?
p(parameters∣data)∝p(data∣parameters)⋅p(parameters)
if we have joint posterior distribution
p(α, β, γ | u, v, w, x, y, z) ∝ exp {− 1/2 [ uα² + vβ² + wγ² − xαβ − yβγ − zαγ]}
Write down the posterior full conditional densities for α, β, γ needed in order to sample from their joint posterior distribution using a Gibbs sampler. Also, how would you find them?
p(α | u, v, w, x, y, z, β, γ),
p(β | u, v, w, x, y, z, α, γ) and
p(γ | u, v, w, x, y, z, α, β)
You find them by keeping only the terms of the joint posterior density that involve the parameter of interest and treating the rest as constants. Completing the square shows each one is normal, e.g. α | · ∼ N((xβ + zγ)/(2u), 1/u), and similarly β | · ∼ N((xα + yγ)/(2v), 1/v), γ | · ∼ N((yβ + zα)/(2w), 1/w).
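A minimal R sketch of the resulting Gibbs sampler, using those completed-square normal full conditionals; the values of u, v, w, x, y, z are illustrative only:

  gibbs_abg <- function(n_iter, u, v, w, x, y, z) {
    out <- matrix(NA, n_iter, 3, dimnames = list(NULL, c("alpha", "beta", "gamma")))
    a <- b <- g <- 0                                           # starting values
    for (t in 1:n_iter) {
      a <- rnorm(1, (x * b + z * g) / (2 * u), sqrt(1 / u))    # alpha | beta, gamma
      b <- rnorm(1, (x * a + y * g) / (2 * v), sqrt(1 / v))    # beta  | alpha, gamma
      g <- rnorm(1, (y * b + z * a) / (2 * w), sqrt(1 / w))    # gamma | alpha, beta
      out[t, ] <- c(a, b, g)
    }
    out
  }
  draws <- gibbs_abg(5000, u = 2, v = 2, w = 2, x = 0.5, y = 0.5, z = 0.5)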
We have two models,
M1 and M2.
What is BF12
in terms of marginal likelihoods?
P(y | M1)/P(y | M2)
What is P(y | M)?
The marginal likelihood:
p(y | M) = ∫ p(y | θ, M) p(θ | M) dθ = ∫ [∏_{i} f(yi | θ)] p(θ) dθ
If we are integrating the beta kernel:
∫₀¹ π^(a−1) (1 − π)^(b−1) dπ =
B(a, b),
where B(a, b) = Γ(a)Γ(b)/Γ(a + b), and Γ(a + 1) = a Γ(a).
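A quick numeric sanity check in R (a and b chosen arbitrarily):

  a <- 3; b <- 5
  integrate(function(p) p^(a - 1) * (1 - p)^(b - 1), 0, 1)$value  # the kernel integral
  beta(a, b)                                                      # B(a, b), same value
  gamma(a) * gamma(b) / gamma(a + b)                              # Γ(a)Γ(b)/Γ(a+b), same again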
When can the marginal likelihood p(y | M3) not be computed analytically?
When the prior is not conjugate to the likelihood, so the integral has no closed form.
How do we know when to use an inverse gamma?
If the kernel looks like a gamma density,
but as a function of 1/x rather than x, then x follows an inverse gamma.
Meaning, if we have
x^(−(n+p)/2 − a − 1) … as our pdf,
then this looks like it follows
Inv-Gamma((n+p)/2 + a, …)
In the R package code
MCMCregress(…),
what are the default priors for
σ²
and
β?
σ² ∼ Inverse-Gamma(ν₀/2, ν₀σ₀²/2)
β ∼ N(b₀, B₀⁻¹), with default mean b₀ = 0
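A sketch of a call with these priors made explicit, assuming the MCMCpack package; the formula, the data frame name mydata, and the hyperparameter values are all illustrative:

  library(MCMCpack)
  fit <- MCMCregress(y ~ x1 + x2, data = mydata,   # mydata is a placeholder
                     b0 = 0, B0 = 0.1,             # beta ~ N(b0, B0^-1)
                     c0 = 0.001, d0 = 0.001)       # sigma^2 ~ Inv-Gamma(c0/2, d0/2)
  summary(fit)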
If the log marginal Bayes factor is 1, which model do we prefer?
The simpler one: a log BF of 1 corresponds to BF = e ≈ 2.7, which is only weak evidence (barely worth a mention), so parsimony wins.
Using the code bvs1$modelsprob,
what does the last column this gives
(prob)
tell us, and how do we choose a model/variables using it?
It gives a list of the posterior probability of each of the models given the observed data.
We choose the model with the highest posterior probability, and choose the variables in that model.
(exp(θxi))^zi =
exp(θxi zi)
How do you write down the complete data likelihood of a model?
We take the product (not the sum) of the f(y | …) over the observations,
but plug in ANYTHING we know.
So if zi ∼ Bernoulli(π), being 1 with probability π and 0 otherwise, multiply in π^zi (1 − π)^(1−zi), and do the same for f1 and f2:
∏_{i} [π f1(yi)]^zi [(1 − π) f2(yi)]^(1−zi)
If log BFij
gives us a positive value, then?
gives us a negative value, then?
A positive log BF means model Mi is better than Mj.
A negative log BF means model Mj is better than Mi.
What does the BVS code do in R?
It does Bayesian variable selection, which checks all possible combinations of predictors and gives the posterior probability of each model.
Say we had 5 predictors: doing this manually would mean comparing 2⁵ = 32 models, which would take ages init.
Inclusion probability in R means???
It is the posterior probability that a variable is included in the model, i.e. the summed posterior probability of all models containing that variable, given the data.
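A hedged R sketch, assuming the BayesVarSel package (which the bvs1$modelsprob syntax above suggests) and a placeholder data frame mydata with five predictors:

  library(BayesVarSel)
  bvs1 <- Bvs(formula = y ~ x1 + x2 + x3 + x4 + x5, data = mydata)  # mydata is illustrative
  bvs1$modelsprob   # posterior probability of each of the 2^5 = 32 models
  bvs1$inclprob     # posterior inclusion probability of each predictor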
BF =
posterior odds/ prior odds
The code pgamma(2, 0.1, 0.5) tells us what?
P(λ ≤ 2) given λ ∼ Gamma(0.1, 0.5), i.e. shape 0.1 and rate 0.5 (R's third positional argument is the rate).
Posterior predictive distribution:
p(ỹ | y) =
∫ p(ỹ | θ) p(θ | y) dθ
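This integral is easy to approximate by Monte Carlo: draw θ from the posterior, then ỹ | θ. A sketch in R, where the posterior draws and the normal likelihood are purely illustrative:

  theta_draws <- rnorm(10000, mean = 5, sd = 0.5)                    # stand-in posterior sample of theta
  y_tilde <- rnorm(length(theta_draws), mean = theta_draws, sd = 1)  # y~ | theta
  hist(y_tilde)                                                      # histogram approximates p(y~ | y)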
By the definition of the gamma function:
∫₀^∞ λ^(a−1) e^(−bλ) dλ =
Γ(a)/b^a
Importance sampling estimator equation:
φ̂ = (1/M) Σ_{i=1}^{M} wi φ(xi)
What is wi?
wi = f(xi)/g(xi), the target density over the proposal density (the proposal is sometimes written h(xi)).
How do you then find the maximum value of the importance sampling weight wi?
Take log(wi),
take the derivative with respect to x,
solve for the x that maximises it,
plug this x back into wi, and that's it baby.
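A minimal end-to-end sketch in R; the target f = N(0, 1), proposal g = t with 3 df, and test function φ(x) = x² are all illustrative choices:

  set.seed(1)
  M <- 10000
  x <- rt(M, df = 3)              # draws from the proposal g
  w <- dnorm(x) / dt(x, df = 3)   # weights wi = f(xi) / g(xi)
  phi_hat <- mean(w * x^2)        # (1/M) * sum of wi * phi(xi); true value is 1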
In MCMCregress code in R,
if we state that V = 1/10
and then that B0 = V,
what exactly is this telling us,
and how can we use it to find the prior on β?
It tells us that the prior precision is 1/10,
and that the prior on β is N(0, B₀⁻¹)
= N(0, 10I), where I is the identity matrix,
because the precision is the inverse of the variance; it tells us how strongly we believe in the prior for the regression coefficients β.
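A tiny R check of that precision-to-variance step (three coefficients assumed purely for illustration):

  B0 <- diag(1/10, 3)   # prior precision matrix, V = 1/10
  solve(B0)             # prior covariance B0^-1 = 10 * I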
If there is no specification of the prior precision, what do we set the prior on β to be?
β ∼ N(0, I)
In MCMCregress,
what prior do we assume for σ²?
σ² ∼ Inverse-Gamma (with the default hyperparameter settings)
What happens when we set B0 = 0?
Zero precision means infinite variance: we know nothing about β, the prior is improper, and we can't calculate the marginal likelihood.
How do we choose which model is best using their log marginal likelihoods?
Choose the model with the highest log marginal likelihood.
How do we compare two models using the log Bayes factor?
Say we have log(BF34) = 3.668;
then BF34 = exp(3.668) ≈ 39.17, which shows that model 3 is a substantial improvement over model 4.
If we have 3 covariates, how many possible models/covariate combinations do we have?
2³ = 8
Okay, so say we have 3 covariates and we have done BFs for 4 possible models. Is this enough to choose the best model? If not, what should we do to find the best one?
No, because we haven't tested every possible combination of covariates (all 2³ = 8 models).
Instead we should carry out a Bayesian variable selection procedure, e.g. BVS.
what does BVS calculate
posterior probability of a model
we choose the model with the highest one.
If µ ∼ N(3, 1),
then what are the prior odds
P(µ ≤ 3)/P(µ > 3)?
= 0.5/0.5 = 1,
because the normal distribution is symmetric around its mean, so µ is equally likely to be above or below 3.
How do you perform a BF using hypotheses?
BF10 = posterior odds / prior odds =
[P(H1 | data)/P(H0 | data)] / [P(H1)/P(H0)]
If it is very large, we favour H1.
If we know µ | Y = 6 ∼ Normal(5, 1/3),
then how do we calculate
P(µ ≤ 3 | Y = 6)?
Standardise, so
z = (3 − 5)/√(1/3) ≈ −3.46,
then find P(Z ≤ z).
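In R this is a one-liner:

  z <- (3 - 5) / sqrt(1/3)   # about -3.46
  pnorm(z)                   # P(mu <= 3 | Y = 6), roughly 0.0003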
potential scale reduction factor
PSRF > 1 means what 2 things
between chain variance is larger than within chain variance
lack of convergence, i.e., chains are still over-dispersed and haven’t mixed well.
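The PSRF can be computed in R with the coda package; here two toy independent chains stand in for real sampler output:

  library(coda)
  chains <- mcmc.list(mcmc(rnorm(5000)), mcmc(rnorm(5000)))
  gelman.diag(chains)   # point estimates near 1 suggest convergence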
At the end of a run of the accept-reject algorithm, we get a set of
independent draws from the target density.
What constraints must hold for the candidate density g(x)
if we aim to use it to sample from a more complex distribution f(x)?
f and g have compatible supports (e.g. if f(x) > 0 then g(x) > 0);
there is a constant M such that f(x)/g(x) ≤ M for all x.
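A minimal accept-reject sketch in R, with illustrative choices: target f = Beta(2, 2), candidate g = Uniform(0, 1), and M = 1.5 since the maximum of f is 1.5:

  accept_reject <- function(n, M = 1.5) {
    out <- numeric(0)
    while (length(out) < n) {
      x <- runif(1)                                               # draw from candidate g
      u <- runif(1)
      if (u <= dbeta(x, 2, 2) / (M * dunif(x))) out <- c(out, x)  # accept w.p. f(x)/(M g(x))
    }
    out
  }
  draws <- accept_reject(1000)   # independent draws from Beta(2, 2)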