Random variable
a variable that assigns a numerical value to each outcome in the sample space
Expected value
the mean (average) of a discrete random variable Y: µY = Σ y·P(Y = y), a probability-weighted average of the possible values
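A minimal Python sketch of this definition, using a made-up distribution for Y (the values and probabilities below are hypothetical, not from the notes):

```python
# Hypothetical discrete distribution: possible values of Y and their probabilities.
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

# Expected value: probability-weighted average of the possible values.
mu_y = sum(y * p for y, p in zip(values, probs))

# Variance: expected squared deviation from the mean.
var_y = sum((y - mu_y) ** 2 * p for y, p in zip(values, probs))

print(mu_y, var_y)  # 1.7 0.81
```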
µ(X+Y)
µX + µY
µ(X-Y)
µX - µY
If a & b are constants: µ(a+bY)
a + bµY
σ²(a+bY)
b²σ²Y
If X and Y are independent: σ²(X+Y)
σ²X + σ²Y
σ²(X-Y)
σ²X + σ²Y (variances add even for a difference; see the numerical check below)
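These rules can be checked numerically. Here is an optional Python sketch that simulates two independent normal variables with hypothetical parameters (µX = 5, σX = 2, µY = 3, σY = 1) and compares the simulated means and variances of X+Y and X-Y to the rules above:

```python
import random
from statistics import mean, variance

# Simulate two independent random variables X and Y (hypothetical distributions).
random.seed(0)
x = [random.gauss(5, 2) for _ in range(100_000)]  # µX = 5, σX = 2
y = [random.gauss(3, 1) for _ in range(100_000)]  # µY = 3, σY = 1

sums = [a + b for a, b in zip(x, y)]
diffs = [a - b for a, b in zip(x, y)]

print(mean(sums), "≈ µX + µY = 8")
print(mean(diffs), "≈ µX - µY = 2")
print(variance(sums), "≈ σ²X + σ²Y = 5")
print(variance(diffs), "≈ σ²X + σ²Y = 5 (variances add for a difference too)")
```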
Binomial Random Variable Characteristics
1) There are only 2 possible outcomes for each trial: “success” and “failure”
2) There is a fixed number of trials, n
3) Trials are independent
4) The probability of success, p, is constant across trials
5) The variable of interest is the number of successes (see the sketch below)
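A small Python sketch of these characteristics, using hypothetical values n = 10 and p = 0.3; the pmf, mean, and sd are the standard binomial formulas:

```python
from math import comb, sqrt

# Hypothetical binomial setting: n = 10 independent trials, success probability p = 0.3.
n, p = 10, 0.3

# P(Y = k) = C(n, k) * p^k * (1 - p)^(n - k)
def binom_pmf(k: int) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(binom_pmf(3))           # probability of exactly 3 successes, ≈ 0.2668
print(n * p)                  # mean of a binomial: np = 3.0
print(sqrt(n * p * (1 - p)))  # standard deviation: sqrt(np(1-p)) ≈ 1.449
```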
Density Curves
Smooth histograms for continuous data
Probability= area under the curve
Total area under the curve= 1
Normal Random Variable
Most widely used continuous random variable
Center of distribution is the mean (mean = median = mode)
Symmetric, bell-shaped distribution
Z score
represents the number of standard deviations an observation lies above or below the mean: z = (y - µ)/σ
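A short Python sketch of standardizing with a z-score, assuming a hypothetical normal population with µ = 100 and σ = 15 (Python's statistics.NormalDist supplies the normal CDF):

```python
from statistics import NormalDist

# Hypothetical normal population: µ = 100, σ = 15.
mu, sigma = 100, 15
y = 120

z = (y - mu) / sigma  # z-score: standard deviations above the mean
print(z)              # ≈ 1.333

# P(Y <= 120) via the standard normal CDF
print(NormalDist().cdf(z))           # ≈ 0.9088
print(NormalDist(mu, sigma).cdf(y))  # same probability, without standardizing
```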
Why can't we assume that P(Y=a) or P(Z=a) is a number greater than zero?
For a continuous random variable, the area under the curve at a single point is zero, so probabilities must always be computed over a range of values
Ways to assess Normality
Check the empirical rule: the proportion of observations within 1, 2, and 3 standard deviations of the mean should be roughly 68%, 95%, and 99.7% (see the sketch below)
Q-Q plot: theoretical quantiles on the x-axis, sample quantiles on the y-axis → the closer the points are to a straight line, the more confident we are in the normality assumption
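One way to carry out the empirical-rule check in Python; the data below are simulated as a stand-in for a real sample:

```python
import random
from statistics import mean, stdev

# Hypothetical sample; replace with the observed data.
random.seed(1)
data = [random.gauss(50, 10) for _ in range(1_000)]

m, s = mean(data), stdev(data)

# Empirical-rule check: proportion of observations within k standard deviations of the mean.
for k, target in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    prop = sum(m - k * s <= y <= m + k * s for y in data) / len(data)
    print(f"within {k} sd: {prop:.3f} (normal target ≈ {target})")
```

For the Q-Q plot itself, scipy.stats.probplot can produce the theoretical-vs-sample quantile pairs to plot against a straight reference line.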
Sample mean
a way to estimate the population mean, µY
the sample mean is itself a random variable with its own distribution
Ybar = random variable denoting ALL possible values of the sample mean; ybar = the observed value computed from one sample
Ybar is random, ybar is fixed → the distribution of Ybar over repeated samples is the sampling distribution of the sample mean
The spread of the means is always…
less than the spread of Y: σYbar = σY/√n
Central Limit Theorem
if a sample of size n ≥ 30 is taken, then Ybar is approximately normally distributed regardless of the distribution of Y
as long as the sample size is large enough, the sample means will be approximately normally distributed (simulation sketch below)
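A rough Python illustration of the CLT, drawing repeated samples of size n = 30 from a skewed (exponential) population with mean 2; the population and parameters are hypothetical:

```python
import random
from statistics import mean, stdev

# Hypothetical skewed population: exponential with mean 2 (rate = 0.5).
random.seed(2)
n = 30        # sample size
reps = 5_000  # number of repeated samples

sample_means = [mean(random.expovariate(0.5) for _ in range(n)) for _ in range(reps)]

print(mean(sample_means))   # ≈ population mean µ = 2
print(stdev(sample_means))  # ≈ σ/√n = 2/√30 ≈ 0.365, much less than the population sd of 2
```

The simulated sd of the sample means is close to σ/√n, which is why the spread of the means is smaller than the spread of Y.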
SEYbar vs σYbar
σYbar = σ/√n: the true standard deviation of the sampling distribution, which uses the population sd σ
SEYbar = s/√n: the estimated standard deviation of the sampling distribution, which uses the sample sd s
as n → ∞, ybar converges to µ, s converges to σ, and s/√n converges to 0, so the sample estimates approach the population parameters
IF we have the entire population, our error is 0 since ybar = µ and s = σ
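A quick Python sketch of SEYbar versus σYbar, using a simulated sample from a hypothetical population with σ = 10 (in practice σ is usually unknown):

```python
import random
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample drawn from a population with µ = 50, σ = 10.
random.seed(3)
sample = [random.gauss(50, 10) for _ in range(40)]
n = len(sample)

ybar = mean(sample)
s = stdev(sample)          # sample standard deviation

se_ybar = s / sqrt(n)      # estimated sd of the sampling distribution (standard error)
sigma_ybar = 10 / sqrt(n)  # true sd of the sampling distribution, if σ were known

print(ybar, se_ybar, sigma_ybar)
```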
t-distribution
the distribution of the standardized distance of the sample mean from the population mean, t = (ybar - µ)/(s/√n), when the population sd is unknown and the observations come from a normally distributed population
Properties of t-distribution
same general shape as the normal curve; symmetric but with a wider spread
spread depends on n; smaller n = more spread (heavier tails)
t⍺ = t-value with area ⍺ in the upper tail at df = n - 1
since the t-distribution is always wider than the z (standard normal), t⍺ > z⍺ (comparison sketch below)
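A short sketch comparing t and z upper-tail critical values at ⍺ = 0.05 (assumes SciPy is available for the t-distribution):

```python
from statistics import NormalDist
from scipy import stats  # assumed available; provides the t distribution

alpha = 0.05

z_crit = NormalDist().inv_cdf(1 - alpha)  # z_alpha, upper-tail critical value ≈ 1.645

# t critical values for a few sample sizes: heavier tails mean larger critical values,
# shrinking toward z_alpha as the degrees of freedom grow.
for n in (5, 15, 30, 100):
    t_crit = stats.t.ppf(1 - alpha, df=n - 1)
    print(f"n = {n:3d}: t_alpha = {t_crit:.3f}  >  z_alpha = {z_crit:.3f}")
```

As n grows, t⍺ shrinks toward z⍺, matching the idea that the extra spread of t comes from estimating σ with s.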