Contingency Table
Table that summarizes data for two categorical variables
Stacked bar plot
Graphical display of contingency table information; compares two variables by stacking one on top of the other (*not added)
Side by side bar plot
Compares two variables side by side
When is stacked, side by side, or standardized stacked bar plot the most useful?
Stacked bar plot: assign one variable as explanatory and other as response
Side by side: does not show which variable causes the other, easy to compare cases but requires more horizontal space, also not favorable if groups are different sizes
Standardized bar plot: used if primary variable is relatively imbalanced, shows proportions
Mosaic plot
Resembles standardized stacked bar plot
(Widths = proportions)
Use areas to represent number of cases in each category
Side by side box plot
Traditional tool for comparing across groups
Hollow histograms
Compare numerical data across groups (shown with outline)
Hollow histograms vs side by side box plots
Hollow histograms: useful for seeing distribution and skew
Side by side boxplots: useful for comparing centers and spreads
Scatterplot
Case by case data for two numerical variables
Dot plot
One variable scatterplot
Formula for end of whiskers
Lower: Q1 - 1.5 × IQR; upper: Q3 + 1.5 × IQR
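A minimal sketch of the whisker limits, assuming made-up data and the default quartile method of Python's `statistics.quantiles` (other software may compute Q1 and Q3 slightly differently). The whiskers themselves reach to the most extreme data points inside these limits; anything beyond them is plotted as an outlier.

```python
import statistics

def whisker_limits(data):
    """Return the (lower, upper) limits Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles, default method
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up data for illustration
low, high = whisker_limits(sample)
outliers = [x for x in sample if x < low or x > high]
print(low, high, outliers)
```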
Should it surprise you if the actual outcome of a random variable problem is slightly above or below the probability you calculated?
No— there is natural variability
Random variable
Random process with numerical outcome
Expected value
Sum of X*P(X)
Expected value can also be known as
Mean
How to find proportion of certain bin in continuous distribution (histogram)
Take the bin's count and divide by the total sample size
Probability density function
Smooth curve over bars on histogram
total area under a curve in density distribution
1
Two types of random variables
Discrete and continuous
Probability distribution in the real world shows you
Your chances of winning
Columns for probability distribution
x, p(x), x*p(x), x²*p(x)
When you have money you should always round
Up
Variability
Difference in value of random variable (how much variation is expected)
Why do you need to find variability?
To find standard deviation
Formula for variability (to get ______)
SD² = sum of x²*p(x) - (sum of x*p(x))²
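A short sketch of the table columns x, p(x), x*p(x), x²*p(x) in code. The distribution (number of heads in two fair coin flips) is a hypothetical example, not from the notes. The variance is the sum of x²*p(x) minus the square of the sum of x*p(x).

```python
from fractions import Fraction
from math import sqrt

# Hypothetical distribution: number of heads in two fair coin flips.
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mean = sum(x * p for x, p in dist.items())        # sum of x*p(x)
mean_sq = sum(x**2 * p for x, p in dist.items())  # sum of x²*p(x)
variance = mean_sq - mean**2                      # SD²
sd = sqrt(variance)                               # SD
print(mean, variance, sd)
```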
Fair game
costs as much as payout so expected profit =0
How do you prove a game is fair?
Sum of X*p(x) =0
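As a rough check of the fair game rule, here is a hypothetical game (an assumption for illustration): pay $1 to roll a die, get $6 back on a six (net +$5), otherwise lose the $1 (net -$1).

```python
from fractions import Fraction

# Hypothetical game: X is the net profit of one play.
profit_dist = {5: Fraction(1, 6), -1: Fraction(5, 6)}

expected_profit = sum(x * p for x, p in profit_dist.items())  # sum of X*p(x)
is_fair = expected_profit == 0
print(expected_profit, is_fair)
```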
Normal distribution
Many variables are approximately normally distributed, aka unimodal and bell shaped
Two ways to prove symmetry
Mean= median=mode
Pearson’s index
Pearson’s Index
I=[3(mean-median)]/SD
*must be between -1 and 1
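A sketch of Pearson's Index on two made-up data sets. It assumes the sample standard deviation (`statistics.stdev`); the notes do not say which SD to use.

```python
import statistics

def pearson_index(data):
    """I = 3 * (mean - median) / SD."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # sample SD (an assumption)
    return 3 * (mean - median) / sd

symmetric = [1, 2, 3, 4, 5]      # made-up, symmetric
skewed = [1, 1, 2, 2, 3, 20]     # made-up, skewed right

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    index = pearson_index(data)
    print(name, round(index, 3), "symmetric enough" if -1 <= index <= 1 else "skewed")
```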
What does z score tell you
How far away you are from the mean
Why do you need z score?
Standardize data to compare
Percentile
Percentage of data below specific data point
How do you round percentile?
Always round up
Gold standard for z-score
Mean =0
SD = 1
Z score formula
Z= (given-average)/SD
How to use z-score
Find z score then look at chart for probability and percentile
unusual vs outlier stats with z score
|z|>2 = unusual
|z|>3= outlier
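The z score formula and the |z| cutoffs can be sketched as follows. The exam mean of 70 and SD of 10 are hypothetical, and the label "typical" is made up for this sketch.

```python
def z_score(value, mean, sd):
    """Z = (given - average) / SD."""
    return (value - mean) / sd

def classify(z):
    """Label a z score; 'typical' is a label invented for this sketch."""
    if abs(z) > 3:
        return "outlier"
    if abs(z) > 2:
        return "unusual"
    return "typical"

mean, sd = 70, 10  # hypothetical exam scores
labels = {score: classify(z_score(score, mean, sd)) for score in (75, 95, 105)}
print(labels)
```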
Empirical Rule
68-95-99.7
1 SD-2 SD- 3 SD
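The 68-95-99.7 figures can be checked against the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library in place of a z table.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area under the curve within 1, 2, and 3 SD of the mean.
areas = [std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)]
for k, area in zip((1, 2, 3), areas):
    print(f"within {k} SD: {area:.4f}")
```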
Area under curve
Probability
How do you get right side of a normal distribution
Subtract from 1
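A small sketch of reading the curve left to right: the chart (here `NormalDist().cdf`, standing in for a printed z table) gives the area to the left, and the right side is 1 minus that. The z value of 1.5 is an arbitrary example.

```python
from statistics import NormalDist

z = 1.5  # arbitrary example z score

left_area = NormalDist().cdf(z)  # area to the left of z (the percentile)
right_area = 1 - left_area       # right side: subtract from 1
print(round(left_area, 4), round(right_area, 4))
```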
How to look at normal distribution graph
Left to right (DO NOT START FROM MEAN)
How do you find cutoff?
You are given a probability => look up the closest z score (THE NUMBER IN THE PROBLEM IS NOT A Z SCORE)
Plug into the z score equation (x, the given value, is the unknown)
Solve the algebra for x
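The cutoff steps can be sketched like this, assuming a hypothetical mean of 70, SD of 10, and a target of the 90th percentile. `NormalDist().inv_cdf` plays the role of looking up the closest z score in the chart.

```python
from statistics import NormalDist

mean, sd = 70, 10  # hypothetical parameters
target = 0.90      # a probability (area to the left), not a z score

z = NormalDist().inv_cdf(target)  # step 1: z score for that probability
cutoff = mean + z * sd            # steps 2-3: solve z = (x - mean)/SD for x
print(round(z, 2), round(cutoff, 1))
```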
Upper vs lower limit questions normal distribution
Will be 2 SD away BECAUSE otherwise they are outliers and we do not plot outliers (only use 3 SD if it says to include outliers)
How to write up normal distribution with parameters
N (mean =0, SD =1)
How to compare z scores
Higher |z| = more unusual
probability
proportion of times the outcome would occur if we observed the random process an infinite number of times
Law of Large Numbers
the tendency of proportion of outcomes to stabilize around calculated probability (empirical => classical)
think experiments
how to write probability of xx happening
p(xx)
disjoint
two outcomes cannot occur at the same time
another term for disjoint is
mutually exclusive
how do you calculate disjoint probability
add up probabilities of each thing occurring
what word is associated with disjoint outcome
or
addition rule
P(A1 or A2)= P(A1)+P(A2)-P(A1andA2)
events
sets/collections of outcomes
ex: group a: if you roll 1 or 2, group b: if you roll 2 or 3
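Using the two die events from the example (A = roll 1 or 2, B = roll 2 or 3), the addition rule can be checked by counting outcomes. A fair six-sided die is assumed.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # fair six-sided die (assumed)
A = {1, 2}
B = {2, 3}

def prob(event):
    """Classical probability: # ways the event can occur / total outcomes."""
    return Fraction(len(event), len(sample_space))

# P(A or B) = P(A) + P(B) - P(A and B); the overlap {2} is counted once.
p_union = prob(A) + prob(B) - prob(A & B)
print(p_union, p_union == prob(A | B))
```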
nondisjoint outcomes
when outcomes can overlap
ex: roll 2 and even
face card
jack, queen, king
how many cards in a deck of cards
52
Or is
inclusive (so it is and/or)
Why would P(A and B) be 0 if outcomes are disjoint
because they would never overlap
probability distribution
table of all disjoint outcomes and their probabilities
rules for probability distributions*
outcomes listed must be disjoint
each probability must be between 0 and 1
probabilities must total 1
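A rough checker for the two numeric rules. Whether the listed outcomes are disjoint cannot be read off the numbers, so that rule is left to the reader. The function name is made up for this sketch.

```python
from math import isclose

def is_valid_distribution(probs):
    """Check that each probability is in [0, 1] and that they total 1.

    Disjointness of the outcomes is NOT checked here.
    """
    in_range = all(0 <= p <= 1 for p in probs)
    return in_range and isclose(sum(probs), 1.0)

print(is_valid_distribution([0.2, 0.3, 0.5]))  # totals 1, all in range
print(is_valid_distribution([0.5, 0.6]))       # totals more than 1
print(is_valid_distribution([1.2, -0.2]))      # totals 1 but out of range
```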
sample space
all possible outcomes
complement of x
all outcomes that are not x
P(x or not x) = 1
independent
when outcome of one provides no useful info about another outcome (like flipping a coin and rolling a die)
when do you use multiplication rule vs addition rule
multiplication: independent
addition: disjoint
multiplication rule
P(A and B) = P(A) x P(B) (for independent events)
who is the father of probability
Jerome Cardan
classical probability
# ways E can occur / total possible outcomes
empirical probability
# ways E occurred / total # attempts
limitation of probability
cannot PROVE anything with probability
can disjoint outcome also be independent?
no (when both have nonzero probability): if outcomes are disjoint, knowing one occurred tells you the other did not, so one outcome does provide info about the other
@ least 1 means
1 OR MORE
complement of at least 1 is
0 (none), because "at least 1" covers every other option; so P(at least 1) = 1 - P(none)
Contingency table includes
cases in rows (horizontal) and results in columns (vertical)
marginal probabilities
probability based on single variable
joint probability
probability of outcomes for two or more variables
table proportions
data presented shows a proportional relationship between two variables
conditional probability
compute probability under a condition
two parts of conditional probability
outcome of interest and condition
how to read out P( X| Y)
probability of x given y
conditional probability formula
P(A|B)= P(A and B)/P(B)
general multiplication rule
P(A and B)= P(A|B) x P (B)
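A sketch of marginal, joint, and conditional probability from a contingency table. The counts and the group/outcome labels are hypothetical, made up for illustration.

```python
from fractions import Fraction

# Hypothetical counts: (group, outcome) -> number of cases.
table = {
    ("treatment", "improved"): 30,
    ("treatment", "not improved"): 20,
    ("control", "improved"): 15,
    ("control", "not improved"): 35,
}
total = sum(table.values())

def joint(group, outcome):
    """Joint probability: both variables specified."""
    return Fraction(table[(group, outcome)], total)

def marginal_group(group):
    """Marginal probability: based on the group variable alone."""
    return sum(joint(g, o) for (g, o) in table if g == group)

p_joint = joint("treatment", "improved")  # P(A and B)
p_b = marginal_group("treatment")         # P(B), the condition
p_cond = p_joint / p_b                    # P(A|B) = P(A and B) / P(B)
print(p_joint, p_b, p_cond)
```

Multiplying back, `p_cond * p_b` recovers `p_joint`, which is the general multiplication rule.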
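gambler's fallacy
the mistaken belief that past outcomes of independent trials change the odds of the next one; casinos post the last several outcomes of betting games to trick gamblers into believing the odds are in their favor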
ex: all black last time, you believe it is unlikely you will get black
tree diagram
organizes outcomes and probabilities around the structure of the data
primary v secondary branch
primary: first branch (split)
secondary: other splits
false negative and false positive
false positive: result shows up as positive even when it isn’t; false negative: result shows up as negative even when it isn’t
without replacement
you cannot sample the same case twice
what do you have to do with a “without replacement” situation
remove from possibilities
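For example, the chance of drawing two face cards in a row without replacement from a standard 52-card deck (12 face cards: jack, queen, king in each of 4 suits). After the first draw, one face card and one card overall are removed from the possibilities.

```python
from fractions import Fraction

deck_size = 52
face_cards = 12  # jack, queen, king in each of 4 suits

p_first = Fraction(face_cards, deck_size)
# Without replacement: one face card and one card are gone.
p_second_given_first = Fraction(face_cards - 1, deck_size - 1)

p_both = p_first * p_second_given_first  # general multiplication rule
print(p_both)
```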
when should you use at least rule
when you need to calculate the likelihood of an event happening at least once in a given set of trials
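A minimal sketch of the at least rule, assuming independent trials with the same probability each time: P(at least once) = 1 - P(never). The die example is hypothetical.

```python
def at_least_once(p, n):
    """P(event happens at least once in n independent trials)."""
    p_never = (1 - p) ** n  # complement: it happens 0 times
    return 1 - p_never

# Hypothetical example: at least one six in 4 rolls of a fair die.
p_six = at_least_once(1 / 6, 4)
print(round(p_six, 4))
```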
when sample size is less than 10% of the population, observations are
nearly independent
If P(A and B) = P(A) P(B), A and B are
independent
If P(B) = P(B |A), then A and B are
independent
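Both independence checks can be tried on the coin and die example by listing the sample space. A fair coin and a fair die are assumed; the event names are made up for this sketch.

```python
from fractions import Fraction
from itertools import product

# All 12 equally likely (coin, die) outcomes.
sample_space = list(product("HT", range(1, 7)))

def prob(event):
    """Fraction of outcomes in the sample space where the event is true."""
    return Fraction(sum(1 for o in sample_space if event(o)), len(sample_space))

def heads(o):
    return o[0] == "H"

def six(o):
    return o[1] == 6

p_a = prob(heads)
p_b = prob(six)
p_ab = prob(lambda o: heads(o) and six(o))

independent = p_ab == p_a * p_b   # P(A and B) = P(A) P(B)
p_b_given_a = p_ab / p_a          # equals P(B) when independent
print(independent, p_b_given_a == p_b)
```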
Binomial distribution
Describes the number of successes in a fixed number of trials (each trial is one or the other: success or failure)
Difference between binomial and geometric distribution
Geometric: describes the number of trials you must wait before a success
Binomial: number of successes in a fixed number of trials
Binomial distribution formula
[n!/(k!(n-k)!)] * p^k * (1-p)^(n-k)
What do variables in formula mean?
n = number of trials
k = number of successes
p = probability of success on each trial
Mean, variance, and SD formula of observed successes
Mean: np
Variance: SD² = np(1-p)
SD: square root of np(1-p)
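A sketch of the binomial formula and its mean and SD, using a hypothetical case of n = 10 trials with p = 0.5 (for instance, 10 fair coin flips).

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials) = C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # hypothetical: 10 fair coin flips

prob_3 = binomial_pmf(3, n, p)  # P(exactly 3 successes)
mean = n * p                    # np
sd = sqrt(n * p * (1 - p))      # square root of np(1-p)
print(prob_3, mean, round(sd, 4))
```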
Four conditions to check if binomial
Independent trials
Number of trials n is fixed
Each trial can be classified as success or failure
Probability of success p is the same for each trial
How to solve binomial distribution with normal distribution steps
ONLY TO BE USED WITH LARGE SAMPLES!!!
Treat all steps as a normal distribution problem, except adjust the cutoff by 0.5 (add 0.5 for "at most", subtract 0.5 for "at least")
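A rough comparison of the exact binomial answer with the normal approximation, assuming a hypothetical n = 100, p = 0.5, and the question P(X at most 55). The normal curve uses mean np and SD equal to the square root of np(1-p), with 0.5 added to the cutoff.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5  # hypothetical large sample
k = 55           # question: P(X <= 55)

# Exact binomial probability, summed term by term.
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal approximation with mean np and SD sqrt(np(1-p)).
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
with_correction = approx_dist.cdf(k + 0.5)  # add 0.5 for "at most"
without_correction = approx_dist.cdf(k)

print(round(exact, 4), round(with_correction, 4), round(without_correction, 4))
```

In this example the version with the 0.5 adjustment lands closer to the exact value than the version without it.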