Bernoulli
Can only take on values 0 or 1
Discrete
Can take only a finite or countably infinite number of values
Continuous
Can take any value within an interval, so uncountably many values
Probability distribution function
A function that describes how probabilities are assigned to possible values of a random variable
Discrete —> probability mass function
Probability density function
Summarizes the information on the possible outcomes of X and corresponding probabilities
Here X can take infinitely many values
So the probability mass converges to a density
X converges towards a continuous random variable
Cumulative distribution function
Gives the probability that X is less than or equal to a value x
F(x) = P(X <= x)
X and Y independent
P(X = x, Y = y) = P(X = x) * P(Y = y)
Conditional probability
P(X | Y) = P(X, Y) / P(Y)
The joint probability is P(X, Y) = P(X | Y) * P(Y)
Mean
E(X) = mu is the expected value, the mean of a random variable X
Median
The middle value
Less affected by outliers than the mean
Mode
The most frequently occurring value in the dataset
Variance
Sigma²
Var(X) = E[(X - mu)²]
Larger variance = more spread
Standard deviation
Sigma
Square root of the variance
Covariance
Measures how two random variables move together relative to their means
Cov(X,Y) = E[(X - mu_x)(Y - mu_y)]
Cov(X,Y) > 0: when X is above its mean, Y tends to be above its mean too
Cov(X,Y) < 0: when X is above its mean, Y tends to be below its mean
Cov(X,Y) = 0: no linear relationship
Correlation
Standardizes covariance to measure the strength and direction of the linear relationship between two random variables
Corr(X,Y) = rho = Cov(X,Y) / (sigma_x * sigma_y)
-1 <= rho <= 1
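A minimal numpy sketch of these two formulas; the paired data here is made up for illustration:

```python
import numpy as np

# Hypothetical paired observations of X and Y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Cov(X,Y) = E[(X - mu_x)(Y - mu_y)], estimated with sample means
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Corr(X,Y) = Cov(X,Y) / (sigma_x * sigma_y), always in [-1, 1]
rho = cov_xy / (x.std() * y.std())
print(cov_xy, rho)  # rho close to 1: strong positive linear relationship
```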
Normal distribution
A continuous, symmetric, bell-shaped function
X ~ N(mu, sigma²): X is normally distributed with mean mu and variance sigma²
Standardization
Z = (X - mu) / sigma ~ N(0,1)
Standard normal cumulative distribution function
Φ(z)
The probability that the normal random variable X is less than or equal to x is Φ(z), where z = (x - mu) / sigma
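A short illustration with scipy; the values of mu, sigma, and x below are arbitrary:

```python
from scipy.stats import norm

mu, sigma = 10.0, 2.0   # hypothetical N(mu, sigma^2) parameters
x = 13.0

# Standardize: Z = (X - mu) / sigma ~ N(0, 1)
z = (x - mu) / sigma

# P(X <= x) = Phi(z), the standard normal CDF
print(norm.cdf(z))             # via standardization
print(norm.cdf(x, mu, sigma))  # same probability, computed directly
```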
Chi-square
Distribution of the sum of squared standard normal variables
X = SUM Z_i², where the Z_i ~ N(0,1) are independent
With n terms, X has n degrees of freedom
t distribution
A random variable T has a t distribution with n degrees of freedom, denoted T ~ t_n
T = Z / sqrt(X/n), where Z ~ N(0,1) and X ~ chi²(n) are independent
Approaches normal distribution as n —> inf
F distribution
A random variable F has an F distribution with (k1, k2) degrees of freedom, denoted F ~ F(k1, k2)
F = (X1/k1) / (X2/k2), where X1 ~ chi²(k1) and X2 ~ chi²(k2) are independent
Matrix definitions
Scalar: single number
Vector: one-dimensional array of numbers
Matrix: two-dimensional array of numbers
Dimension written as R x C (rows x columns)
Symmetric matrix
Square matrix that is symmetric along the leading diagonal
A = A', equal to its transpose
Diagonal matrix and identity matrix
Diagonal is a square matrix with non-zero elements only on the leading diagonal
Identity is a diagonal matrix with 1 on the leading diagonal and 0 elsewhere
Transpose
Switching the rows and columns
Column 1 —> Row 1
Column 2 —> Row 2
Denoted A'
Full rank matrix
Rank is number of independent rows or columns
Full rank if all rows and columns are independent
Rank is equal to its dimension
Matrix addition and subtraction
Matrices must have the same dimension R x C
(A + B)ij = aij + bij
(A - B)ij = aij - bij
Matrix multiplication and division by scalar s
(s * A)ij = s * aij
(A / s)ij = aij / s
Multiplication of two matrices
If A is m x n and B is n x p, then AB is m x p
So if A is 2 × 2 and B is 2 × 2
Then AB is 2 × 2
The upper-left element of AB is A11 * B11 + A12 * B21
Inverse matrix
A x A^-1 = Identity matrix
For a 2 × 2 matrix A = (a b; c d):
A^-1 = 1/(ad - bc) * (d -b; -c a)
ad - bc is called the determinant; if it is zero, then the matrix is singular and the inverse does not exist
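A quick numpy check of the multiplication and inverse rules above, using hypothetical 2 × 2 matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# (2 x 2)(2 x 2) -> 2 x 2; upper-left entry is A11*B11 + A12*B21
print(A @ B)

# Determinant ad - bc; if zero, A is singular and has no inverse
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
print(det, np.linalg.det(A))

# A^-1 = 1/(ad - bc) * (d -b; -c a)
A_inv = (1.0 / det) * np.array([[A[1, 1], -A[0, 1]],
                                [-A[1, 0], A[0, 0]]])
print(A @ A_inv)  # identity matrix, up to floating-point error
```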
Bivariate linear regression model
yi = alpha + Beta xi + ui
y and x are variables that we observe
alpha and beta are coefficients that we want to find
ui are all the other factors affecting y other than x, which are unobserved to us
OLS
Ordinary Least Squares
Takes the vertical distances, the residuals ^ui, between each point and a candidate fitted line
Squares each distance and sums them: SUM ^ui²
Finds the estimated coefficients ^alpha and ^beta that minimize the sum of squared residuals
^ui = yi - ^yi
^alpha = ybar - ^beta * xbar
^beta = ^Cov(x,y) / ^Var(x)
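A minimal sketch of these estimators on simulated data; the true alpha and beta below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
u = rng.normal(size=n)
alpha, beta = 1.0, 2.0          # hypothetical true coefficients
y = alpha + beta * x + u

# ^beta = ^Cov(x, y) / ^Var(x), ^alpha = ybar - ^beta * xbar
beta_hat = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
alpha_hat = y.mean() - beta_hat * x.mean()

# Residuals ^u_i = y_i - ^y_i; OLS minimizes the sum of their squares
resid = y - (alpha_hat + beta_hat * x)
print(alpha_hat, beta_hat, np.sum(resid**2))
```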
Standardized coefficients
^beta * sigma_x / sigma_y
A one-standard-deviation increase in x changes y by that many standard deviations of y
log-level
log(yi) = a + Bxi + ui
B is the proportionate change in y as x increases by one unit
g = (y' - y) / y
log(1 + g) ≈ g for small g, so log differences approximate growth rates
%dy = (100 * B) dx
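A quick numeric check of the approximation log(1 + g) ≈ g behind this interpretation:

```python
import numpy as np

# log(1 + g) is close to g for small growth rates, diverges for large ones
for g in [0.01, 0.05, 0.10, 0.50]:
    print(g, np.log(1 + g))
# 0.01 -> 0.00995, 0.05 -> 0.04879, 0.10 -> 0.09531, 0.50 -> 0.40546
```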
log-log
log(yi) = a + Blog(xi) + ui
Elasticity, 1% change in x gives B% change in y
% dy = B % dx
Level-log
y = a +Blog(xi) +ui
A 1% change in x changes y by B/100 units
dy = (B/100) %dx
Homoscedasticity
The variance of the error u is constant and finite for any value of the explanatory variable x
Var(u|x) = sigma² < inf
Standard error
Measures the precision of an estimator
In regression, the SE of ^B tells us how much ^B would vary across different random samples
SE(^B) = sqrt( sigma² / SUM (xi - xbar)² )
Goodness of fit
RSS = SUM (yi - ^yi)², the residual sum of squares
ESS = SUM (^yi - ybar)², the explained sum of squares
TSS = SUM (yi - ybar)², the total sum of squares
R² is a standard goodness-of-fit measure
R² = ESS/TSS = 1 - RSS/TSS
The variation in x explains 100*R² % of the variation in y
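A sketch of the decomposition on simulated data, reusing the bivariate OLS formulas from above:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)  # hypothetical linear model

# Fit bivariate OLS
beta_hat = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
alpha_hat = y.mean() - beta_hat * x.mean()
y_hat = alpha_hat + beta_hat * x

# RSS + ESS = TSS, and R^2 = ESS/TSS = 1 - RSS/TSS
rss = np.sum((y - y_hat) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)
tss = np.sum((y - y.mean()) ** 2)
print(rss + ess, tss)            # equal up to floating-point error
print(ess / tss, 1 - rss / tss)  # R^2: share of variation in y explained by x
```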
Assumptions OLS
A1 - The population model is linear in parameters
A2 - We have a random sample from the population
A3 - We have sample variation in the explanatory variable
A4 - The error u has an expected value of zero given any value of the explanatory variable: E(u|x) = 0
A5 - The variance of the error u is constant and finite
A6 - Normality: the population error u is independent of the explanatory variable x and is normally distributed
t-Test
Determines the statistical significance of B
H0: B = B0 (often B0 = 0)
t = (^B - B0) / se(^B)
Compare t to the critical value c at the chosen significance level
If |t| > c, reject H0
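A minimal sketch with scipy; the estimate, standard error, and sample size below are hypothetical:

```python
from scipy import stats

# Hypothetical regression output: estimate and standard error of beta
beta_hat, se_beta = 0.45, 0.18
n, k = 100, 1                     # observations, regressors

# t = (^B - B0) / se(^B), testing H0: B = 0
t_stat = (beta_hat - 0.0) / se_beta

# Two-sided critical value at the 5% level with n-k-1 degrees of freedom
c = stats.t.ppf(1 - 0.025, df=n - k - 1)
print(t_stat, c, abs(t_stat) > c)  # reject H0 if |t| > c

# p-value: smallest significance level at which H0 is rejected
p = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - k - 1))
print(p)
```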
p-value
Is the smallest significance level at which we would reject the null hypothesis
If the test is one-sided, divide the two-sided p-value by two
Confidence intervals
Upper: ^B + c * se(^B)
Lower: ^B - c * se(^B)
If the CI includes 0 —> ^B is not statistically significant
Multivariate OLS assumptions
A1 - Population model is linear in parameters
A2 - We have a random sample from the population
A3 - We have sample variation in the explanatory variables, and there are no exact linear relationships among them, also known as no perfect collinearity assumption
A4 - The error u has an expected value of zero given any values of the explanatory variables: E(u|x) = 0
A5 - Homoscedasticity, the variance of the error u is constant and finite
Cases where the assumption that the error u has an expected value of zero can fail to hold:
Omitting relevant variables
Simultaneity, one or more of the explanatory variables is jointly determined with y
Measurement error, one or more of the explanatory variables is measured with some error
Omitted variable bias
Suppose true model is
y = B0 + B1X1 + B2X2 + u
Omit X2
y = a + sigma*X1 + v
Part of the X2 effect leaks into the slope on X1
sigma = Cov(x1, y) / Var(x1)
Bias formula
E(sigma) - B1 = B2 * Cov(x1, x2) / Var(x1)
If x1 and x2 are positively correlated and B2>0, the bias is upward
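A small simulation of the bias formula; all parameter values below are made up (B1 = 1, B2 = 2, Cov(x1,x2) = 0.5):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
# x1 and x2 positively correlated; both affect y
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

# Short regression omitting x2: slope on x1 picks up part of x2's effect
slope_short = np.mean((x1 - x1.mean()) * (y - y.mean())) / np.var(x1)

# Bias formula: B2 * Cov(x1, x2) / Var(x1) = 2 * 0.5 = 1
bias = 2.0 * np.mean((x1 - x1.mean()) * (x2 - x2.mean())) / np.var(x1)
print(slope_short, 1.0 + bias)  # both approximately 2: upward bias
```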
Breusch-Pagan test
Tests whether the variance of the errors is constant or depends on the values of the regressors
Estimate original regression yi = a + B1X1 + B2X2 + u
Obtain the residuals, square them and estimate
^u² = gamma0 + gamma1X1 + gamma2X2 + ei
Use F test to test the null hypothesis of homoscedasticity
H0 : gamma1 = gamma2 = 0
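A sketch using the het_breuschpagan implementation in statsmodels, on simulated data where the error variance grows with the first regressor:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=(n, 2))
# Heteroscedastic errors: variance depends on the first regressor
u = rng.normal(size=n) * np.exp(0.5 * x[:, 0])
y = 1.0 + x @ np.array([1.0, -0.5]) + u

# Estimate the original regression and keep the residuals
X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# BP test: regress squared residuals on the regressors,
# H0: gamma1 = gamma2 = 0 (homoscedasticity)
lm, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print(f_stat, f_pval)  # small p-value -> reject homoscedasticity
```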
Clustering
Errors are correlated within groups, but independent across groups
OLS coefficients remain unbiased, but the standard errors are biased
Adjusted R²
R²adj = 1 - (RSS / (n-k-1)) / (TSS / (n-1))
F test
Test joint significance of several coefficients
H0: B1 = B2 = 0
H1: At least one B is not equal to 0
F = [(RSS_r - RSS_ur) / q] / [RSS_ur / (n - k - 1)], where q is the number of restrictions
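A minimal sketch with hypothetical RSS values for the restricted and unrestricted models:

```python
from scipy import stats

# Hypothetical sums of squared residuals, with q = 2 restrictions
# (H0: B1 = B2 = 0)
rss_r, rss_ur = 150.0, 120.0
n, k, q = 100, 4, 2

# F = [(RSS_r - RSS_ur) / q] / [RSS_ur / (n - k - 1)]
F = ((rss_r - rss_ur) / q) / (rss_ur / (n - k - 1))

# Compare with the F(q, n-k-1) critical value at the 5% level
c = stats.f.ppf(0.95, q, n - k - 1)
print(F, c, F > c)  # F > c -> reject joint insignificance
```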
Multiplicative dummy
Multiplying a dummy variable with another regressor
Allows the slope of a variable to differ across groups
Interactive dummy
Multiplying two dummy variables
Capture effects that occur only when two conditions are true at the same time
Basic panel data problem
yit = Bxit + ai + uit
ai: unobserved unit effect
If ai is correlated with xit, OLS is biased
First Differences
Subtract previous year to remove ai
Δyit = B Δxit + Δuit
Uses changes over time to estimate effect
Fixed Effects
Remove unit effect by subtracting each unit’s mean
Take the time-series mean of each entity
ybar_i = SUM_t yit / T, xbar_i = SUM_t xit / T
Subtract these means from the values of the variables
yit - ybar_i = B(xit - xbar_i) + (uit - ubar_i)
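A minimal pandas sketch of the within transformation on a simulated panel where the unit effect is correlated with x (all parameters made up, true B = 2):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
# Hypothetical panel: 50 units over 5 periods; the unit effect a_i
# is correlated with x_it, so pooled OLS is biased
n_units, T = 50, 5
unit = np.repeat(np.arange(n_units), T)
a = rng.normal(size=n_units)[unit]
x = 0.7 * a + rng.normal(size=n_units * T)
y = 2.0 * x + a + rng.normal(size=n_units * T)
df = pd.DataFrame({"unit": unit, "x": x, "y": y})

# Within transformation: subtract each unit's time-series mean
x_dm = df["x"] - df.groupby("unit")["x"].transform("mean")
y_dm = df["y"] - df.groupby("unit")["y"].transform("mean")

# OLS on the demeaned data sweeps out a_i and recovers B
beta_pooled = np.cov(df["x"], df["y"])[0, 1] / np.var(df["x"], ddof=1)
beta_fe = (x_dm * y_dm).sum() / (x_dm ** 2).sum()
print(beta_pooled, beta_fe)  # pooled biased upward; FE close to 2
```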
Time Fixed Effects
Instead of controlling for unit-specific effects, we can control for time-specific effects lambda
Useful when yit changes over time on average but not due to unit-specific factors
yit = a + lambda_t + B1 X1it + … + uit
Lambda is the time-varying intercept capturing shocks common to all units
FE or FD
Balanced sample
If T=2, FE and FD are identical
If T>2, bias due to measurement error or mild violations of strict exogeneity may shrink with T under FE
Unbalanced sample
FE typically preserves more data than FD in unbalanced panels
Random Effects
RE allows different intercepts for each entity, constant over time
Under RE, intercepts are random, drawn from a common distribution
Model the intercept for unit i as:
ai = a + epsilon_i
Heterogeneity in the cross-section dimension occurs via epsilon_i, not via dummies
Single-Dimension clustering
Allow correlation within firm over time
Cov(uis, uit) ≠ 0
Allow correlation within year across firms
Cov(uit, ujt) ≠ 0
Two-Way clustering
Cluster by both firm and year
Allows
Cov(uis, uit) ≠ 0
Cov(uit, ujt) ≠ 0
Assumes
Cov(uis, ujt) = 0
Endogeneity bias
Occurs when regressor is correlated with the error term
Omitting relevant variables
Simultaneity, one or more explanatory variables is jointly determined with y
Measurement error, one or more of the explanatory variables is measured with some error
Instrumental variables
Solution when regressors are endogenous
Need an instrument z that satisfies:
Relevance: Cov(z, x) ≠ 0
Exogeneity: Cov(z, u) = 0
Two-stage least squares
First stage: regress x on z and obtain fitted values ^x
Second stage: regress y on ^x
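A minimal two-stage sketch on simulated data; the instrument strength and error correlation below are made up (true B = 2):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
# x is endogenous: it shares the unobserved component e with the error u
z = rng.normal(size=n)                 # instrument: relevant, exogenous
e = rng.normal(size=n)
x = 1.0 * z + e + rng.normal(size=n)
u = e + rng.normal(size=n)
y = 2.0 * x + u                        # true B = 2, but Cov(x, u) != 0

def ols_slope(w, v):
    # bivariate OLS slope of v on w (with intercept)
    return np.mean((w - w.mean()) * (v - v.mean())) / np.var(w)

# Stage 1: regress x on z, form fitted values ^x
pi = ols_slope(z, x)
x_hat = x.mean() + pi * (z - z.mean())

# Stage 2: regress y on ^x (note: these SEs would need correction)
print(ols_slope(x, y))      # plain OLS: biased away from 2
print(ols_slope(x_hat, y))  # 2SLS: close to 2
```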
Randomized controlled trial
A scientific experiment, popular in clinical trials testing the effectiveness of drugs, where each individual in a sample is randomly assigned to one of two groups
Treatment group
Control group
Difference-in-Differences
Mimics RCT using natural experiments
Compare before-after differences between treatment and control groups
yit = B0 + B1(Postt x Treati) + B2 Treati + B3 Postt + uit
B1 is DiD estimator
Key assumption, parallel trends —> without treatment, treated and control would have evolved similarly
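A minimal statsmodels sketch on simulated data with parallel trends built in; the group gap, time shock, and treatment effect (1.5) are arbitrary:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 4000
treat = rng.integers(0, 2, size=n)   # treatment group indicator
post = rng.integers(0, 2, size=n)    # post-period indicator
# Common time shock (0.8), group gap (0.5), treatment effect 1.5
y = 1.0 + 0.5 * treat + 0.8 * post + 1.5 * treat * post + rng.normal(size=n)
df = pd.DataFrame({"y": y, "treat": treat, "post": post})

# post*treat expands to post + treat + post:treat;
# the interaction coefficient is the DiD estimator B1
res = smf.ols("y ~ post * treat", data=df).fit()
print(res.params["post:treat"])  # close to the true effect 1.5
```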
Regression Discontinuity Design
Exploits a cutoff that assigns treatment
Compares only outcomes of agents very close to the threshold
Assumptions:
Assignment to treatment occurs through a known and measured deterministic rule: x < x' vs x > x'
Both potential outcomes E(y(0)|x) and E(y(1)|x) are continuous in x at x’