vocab and formulas + examples
Treatment group
the group/subject that received the treatment
ex: drug for sleep; this group receives the real drug
Control group
the group/subject that did not receive the treatment
ex: drug for sleep; receive a placebo (sugar pill) instead of real drug
Randomization
randomly assigning participants to treatment or control groups
Why is randomization important?
Helps reduce bias: it balances confounding factors across the control and treatment groups, so the groups differ only in the treatment
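A minimal sketch of random assignment, assuming a hypothetical list of 10 participant IDs. The names `participants`, `treatment`, and `control` are made up for illustration.

```python
import random

# Hypothetical participant IDs (assumption for illustration)
participants = list(range(1, 11))

random.seed(0)  # fixed seed only so the example is repeatable
shuffled = participants[:]
random.shuffle(shuffled)  # put participants in a random order

half = len(shuffled) // 2
treatment = shuffled[:half]  # would receive the real drug
control = shuffled[half:]    # would receive the placebo

print("treatment:", sorted(treatment))
print("control:", sorted(control))
```

Because chance alone decides the groups, no participant trait can systematically end up in one group.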
confounding factors
a variable that influences both the independent variable (what you’re testing) and the dependent variable (the outcome), making it hard to tell if the observed effect is real
ex: people who carry lighters have higher rates of lung cancer, CF = smoking
basically other factors that explain the results of the experiment
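placebo effect
participants improve because they believe they are being treated, even though they received a placebo (a fake treatment that has no actual effects)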
double blind
neither the participants nor the researchers know who is in the treatment group and who is in the control group;
actual vs placebo no one knows until the end
explanatory variable
the variable that is changed/controlled in a study; often manipulated by researchers
response variable
the outcome variable that is observed/measured
experimental study
researchers control the treatment and randomly assign participants to groups (treatment vs control), so they can tell whether one variable causes a change in another
ex/key words: randomly, measure, experiments, “effect of..”, “applied/given”
observational study
researchers observe and collect data
ex/key words: study, survey, observe, “association”
association
Applies to: observational studies and experiments
Def: two variables are associated (related), but one doesn’t necessarily affect the other
ex: A and B happen together, but A doesn’t necessarily make B happen; glass is broken and water is spilled
causation
Applies to: experiments that are random and controlled; CANNOT be concluded from OBSERVATIONAL studies
Def: one variable directly affects another variable
ex: A actually makes B happen (cause effect); push glass and it’ll break
confounding variable examples
sunscreen use is linked to a higher risk of skin cancer; A and B are associated, but sunscreen does not cause the cancer
CF: genetics of skin cancer
Variable A: Using sunscreen
Variable B: risk of skin cancer
population/population of interest
the entire group the study wants to draw conclusions about
study: average number of hours high school students sleep
ex: all high school students
sample
the part of the population that is actually studied
study: average number of hours high school students sleep (150 students)
ex: the 150 students in this study
representative
a sample is representative when it reflects the characteristics of the population; choosing the sample (participants in study) at random makes this more likely
Identify population, sample, explanatory and response variable, population of interest, type of variables (numerical, ordinal, categorical) in the study:
Average number of hours 150 high school students in the U.S. sleep per night
Population: All high school students in the U.S.
Sample: 150 high school students
Explanatory variable: none; the study only measures one variable (sleep) and does not compare groups.
Response variable: Number of hours of sleep per night
Types of variables:
Number of hours of sleep: Numerical (quantitative, continuous)
Grade level (if collected): Ordinal (9th, 10th, 11th, 12th)
Gender (if collected): Categorical (male, female, other)
simple random sampling
every member of the population has an equal chance of being selected
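A minimal sketch of simple random sampling, assuming a hypothetical population of 1,000 student IDs (made up for illustration).

```python
import random

# Hypothetical population of 1,000 student IDs (assumption)
population = list(range(1000))

random.seed(1)  # fixed seed only so the example is repeatable
# random.sample draws without replacement, so no student is picked twice
sample = random.sample(population, k=150)

print(len(sample))  # 150
```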
convenience sampling
selecting individuals who are easily accessible/convenient
selection bias
when the sample (individuals collected) is not representative of the population (topic of study)
In a data set, row is____, the column is _____?
row = observational unit/case
column = variable
Numerical variable — include what two types? (Quantity)
discrete and continuous
Definition: measure/record numerical data
ex: Number of hours students sleep per night — 7.5, 8 hrs
Age of people → 18, 25, 40 years
Height in cm → 160, 175, 182 cm
Test scores → 85, 92, 78
Categorical variable — consist of what two types? (Quality)
ordinal and nominal
Definition: represent categories/groups NO NUMBERS
ex: Gender → male, female, non-binary
Eye color → blue, brown, green
Type of pet → dog, cat, bird
Yes/No responses → yes, no
ordinal variable def/examples
meaningful order
ex: Education level → High School < Bachelor’s < Master’s < PhD
Satisfaction rating → Unsatisfied < Neutral < Satisfied < Very Satisfied
nominal variable def/ex
no natural order
ex: Eye color → Blue, Brown, Green
Drinks → Pepsi, Sprite
Type of pet → Dog, Cat, Bird
discrete def/ex
numerical value that is countable/separate values
ex: Number of siblings → 0, 1, 2, 3…
Number of cars in a household → 0, 1, 2…
continuous def/ex
def: any value in range (measured)
ex: height, weight
numerical vs categorical + subsections
Numerical — can be measured (quantity)
discrete: countable numbers (whole numbers) ; ex: number of siblings
continuous: any value of range; ex: height, weight, fraction, decimal
Categorical — category of group; no numbers, quality
nominal: no order; ex: eye color, gender, pet
ordinal: has order; ex: ratings (satisfied neutral bad), educational level (freshman soph junior senior)
Dot plot question examples
fewest total number in data set = lowest value on the x axis that has a dot; ex: if values run 1-4, 1 is the lowest
largest total number = highest value on the x axis that has a dot
most frequently observed total number = value with the tallest stack of dots; ex: column 3 has the most votes
least frequently observed = value with the shortest stack of dots
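The four dot plot questions can be checked with a quick count. This is a sketch on made-up data; `values` is hypothetical and stands in for the dots on the chart.

```python
from collections import Counter

# Hypothetical dot plot data: each number is one dot (assumption)
values = [1, 1, 2, 3, 3, 3, 4, 4]
counts = Counter(values)  # value -> number of dots stacked on it

smallest = min(values)                        # lowest value on the chart
largest = max(values)                         # highest value on the chart
most_frequent = max(counts, key=counts.get)   # tallest stack
least_frequent = min(counts, key=counts.get)  # shortest stack

print(smallest, largest, most_frequent, least_frequent)  # 1 4 3 2
```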
intervals (x axis)
intervals = ranges (bins) that the data values are grouped into
500-1000
1000-1500
frequency (y axis)
number/count in the intervals
500-1000 has 2
1000-1500 has 5
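A sketch of how the frequencies could be counted. The `values` list is made up so that it reproduces the counts above, and treating each interval as including its lower end but not its upper end is an assumption.

```python
# Hypothetical raw values chosen to match the counts in the card (assumption)
values = [520, 980, 1010, 1100, 1250, 1300, 1499]
bins = [(500, 1000), (1000, 1500)]

# Count how many values fall in each interval: low <= value < high
frequency = {
    f"{low}-{high}": sum(low <= v < high for v in values)
    for low, high in bins
}

print(frequency)  # {'500-1000': 2, '1000-1500': 5}
```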
symmetric vs right vs left vs bell/no bell
symmetric = left and right sides mirror each other; bell shape when data is clustered in the middle
right skewed = tail on right
left skewed = tail on left
not bell but symmetric = two peaks (bimodal) that mirror each other around the center
parameters
values calculated from population
ex: average number of hours all high school students sleep
statistics
values calculated from samples
ex: average number of hours of sleep for 150 students randomly selected from different schools
Identify population vs sample vs parameter vs stat example
population: all high school students
sample: 150 students randomly selected
parameter: average hours for all students
statistic: average hours for 150 students
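A sketch of the parameter vs statistic distinction on simulated data. The population here is invented (10,000 random sleep values). In a real study the population mean is usually unknown, which is why the sample statistic is used to estimate it.

```python
import random
import statistics

random.seed(2)  # fixed seed only so the example is repeatable

# Hypothetical population: nightly sleep hours for 10,000 students (made-up data)
population = [round(random.uniform(5, 10), 1) for _ in range(10_000)]

# Sample: 150 students randomly selected from the population
sample = random.sample(population, k=150)

parameter = statistics.mean(population)  # population mean
statistic = statistics.mean(sample)      # sample mean (x bar)

print(round(parameter, 2), round(statistic, 2))
```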
mean formula
x bar = (sum of all values) / (total number of values)
ex: 1, 2, 3
x bar = (1 + 2 + 3) / 3 = 2
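The same calculation as a short sketch. The parentheses matter: the sum is taken first, then divided by the count.

```python
values = [1, 2, 3]

# Sum of all values divided by the total number of values
x_bar = sum(values) / len(values)

print(x_bar)  # 2.0
```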
median formula
middle value
odd = middle single value; ex: 1, 2, 3 → median = 2
even = (sum of middle two values) / 2; ex: 1, 2, 3, 4, 5, 6 → median = (3 + 4) / 2 = 3.5
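A minimal sketch of the odd/even rule. The function name `median` is just illustrative; Python's built-in `statistics.median` does the same job.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if even)."""
    ordered = sorted(values)  # the data must be in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average of middle two

print(median([1, 2, 3]))           # 2
print(median([1, 2, 3, 4, 5, 6]))  # 3.5
```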
Reading the mean vs median on a histogram
symmetric diagram: mean = median
right skewed: mean greater than median
left skewed: mean less than median
the mean is pulled toward the tail; the median stays closer to where most of the data is
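A quick check of the right-skew rule on made-up data, where one large value (20) forms the right tail.

```python
import statistics

# Hypothetical right-skewed data: the 20 is the long right tail (assumption)
data = [1, 2, 2, 3, 3, 3, 4, 20]

mean = statistics.mean(data)      # 38 / 8 = 4.75
median = statistics.median(data)  # middle two values are 3 and 3

print(mean, median)
```

The single large value drags the mean up to 4.75 while the median stays at 3.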
range formula
range = max - min value
IQR Interquartile Range Formula
IQR = Q3-Q1
Percentile Ranges: 25, 50, 75
25 = Q1, median of the lower half of the data
50 = median (middle line on boxplot)
75 = Q3, median of the upper half of the data
Finding IQR Examples
50, 51, 56, 61, 70, 71, 80, 84
Q1: (51 + 56) / 2 = 53.5
Q3: (71 + 80) / 2 = 75.5
IQR: Q3 - Q1 = 75.5 - 53.5 = 22
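The example can be worked through with the median-of-each-half method. This is a sketch; leaving the median out of both halves when the count is odd is an assumed convention, and other software may compute quartiles slightly differently.

```python
import statistics

data = sorted([50, 51, 56, 61, 70, 71, 80, 84])
n = len(data)

lower_half = data[: n // 2]        # 50, 51, 56, 61
upper_half = data[(n + 1) // 2 :]  # 70, 71, 80, 84

q1 = statistics.median(lower_half)  # (51 + 56) / 2 = 53.5
q3 = statistics.median(upper_half)  # (71 + 80) / 2 = 75.5
iqr = q3 - q1                       # 22.0
data_range = max(data) - min(data)  # 84 - 50 = 34

print(q1, q3, iqr, data_range)  # 53.5 75.5 22.0 34
```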
Notations for Parameters and Statistic
Parameter
Mean: μ
variance: σ²
SD: σ
proportion: p
Statistics
Mean: x bar
variance: s²
SD: s
proportion: p̂ (p hat)
deviation formula
Sample deviation: x-x bar
Population deviation: x - μ
squared deviation formula
(x - x bar)² OR (x - μ)²
variance formula
measures the average squared deviation
Population variance (σ²) = [sum of (x - μ)²] / N, where N = population size
Sample variance (s²) = [sum of (x - x bar)²] / (n - 1), where n = sample size
SD formula; square root variance
Population SD: square root population variance
sample SD: square root sample variance
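A sketch that follows the deviation → squared deviation → variance → SD steps on a made-up data set. The same list is treated first as a whole population and then as a sample, only to show the two divisors side by side.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values (assumption)
n = len(data)
mean = sum(data) / n             # 5.0

deviations = [x - mean for x in data]         # x minus the mean
squared_devs = [d ** 2 for d in deviations]   # squared deviations, sum = 32

pop_var = sum(squared_devs) / n         # population variance: divide by n
samp_var = sum(squared_devs) / (n - 1)  # sample variance: divide by n - 1

pop_sd = math.sqrt(pop_var)    # population SD
samp_sd = math.sqrt(samp_var)  # sample SD

print(pop_var, pop_sd)  # 4.0 2.0
print(round(samp_var, 3), round(samp_sd, 3))
```

The results match `statistics.pvariance` and `statistics.variance` from Python's standard library.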
proportion formula
cases in category/total number of cases
ex: 20 out of 50 students have blue eyes
p=20/50
Proportion types: observed sample vs population
sample = p hat
population = p
probability formula
number of desired outcomes/total number of possible outcomes
ex: chance of rolling a 4 on a die with faces 1, 2, 3, 4, 5, 6 = 1/6
Probability is always between 0 and 1
0 = impossible (will not occur)
0.5 / 50% = equally likely to occur or not occur
1 = certain (will occur)
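The die example as a short sketch, assuming a fair six-sided die so that every face is equally likely.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]               # all possible rolls of a fair die
desired = [o for o in outcomes if o == 4]   # rolls that count as a success

# number of desired outcomes / total number of possible outcomes
probability = Fraction(len(desired), len(outcomes))

print(probability)  # 1/6
```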
sensitivity vs specificity
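sensitivity = true positive rate: P(test positive | has the condition)
specificity = true negative rate: P(test negative | does not have the condition)
false positive rate = 1 - specificity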
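A sketch using made-up test results, where sensitivity is the true positive rate and specificity is the true negative rate. The four counts are hypothetical.

```python
# Hypothetical test results (assumption for illustration)
true_pos = 90   # have the condition, test positive
false_neg = 10  # have the condition, test negative
true_neg = 80   # no condition, test negative
false_pos = 20  # no condition, test positive

sensitivity = true_pos / (true_pos + false_neg)           # 90 / 100
specificity = true_neg / (true_neg + false_pos)           # 80 / 100
false_positive_rate = false_pos / (false_pos + true_neg)  # 20 / 100 = 1 - specificity

print(sensitivity, specificity, false_positive_rate)  # 0.9 0.8 0.2
```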
base rate def
proportion of the population that has the condition/trait (before any test result is known)
independent conclusion
two variables A and B are independent if A is not affected by B and B is not affected by A
null hypothesis (Ho)
no change, difference, or relationship between variables
if the null is true, any observed difference is due to chance
null value = 0: a difference of zero means the variables are independent
alternative hypothesis (Ha)
there is change, a difference, or relationship between variables
= not due to chance
observed difference
p̂ (group A) - p̂ (group B): the difference between the two groups' sample proportions
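A sketch of an observed difference in proportions, using made-up counts for two groups of 50.

```python
# Hypothetical counts (assumption): successes out of 50 in each group
successes_a, n_a = 20, 50
successes_b, n_b = 10, 50

p_hat_a = successes_a / n_a  # sample proportion, group A = 0.4
p_hat_b = successes_b / n_b  # sample proportion, group B = 0.2

observed_difference = p_hat_a - p_hat_b

print(observed_difference)
```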
p-value
the probability, calculated under the null model, of getting a result at least as extreme as the one observed
helps to decide whether to reject the null hypothesis
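One way to get a p-value from a null model is simulation: shuffle the outcomes between groups many times and see how often chance alone gives a difference as large as the observed one. This is a sketch only. The data are made up, the test is one-sided, and the class may use a different method such as a formula or an applet.

```python
import random

random.seed(3)  # fixed seed only so the example is repeatable

# Hypothetical results: 1 = improved, 0 = did not improve (assumption)
group_a = [1] * 20 + [0] * 30  # treatment: 20 of 50 improved
group_b = [1] * 10 + [0] * 40  # control: 10 of 50 improved
observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

pooled = group_a + group_b
n_a = len(group_a)
reps = 2000
at_least_as_extreme = 0

for _ in range(reps):
    random.shuffle(pooled)  # null model: group labels make no difference
    sim_a, sim_b = pooled[:n_a], pooled[n_a:]
    diff = sum(sim_a) / len(sim_a) - sum(sim_b) / len(sim_b)
    if diff >= observed - 1e-12:  # small tolerance for float rounding
        at_least_as_extreme += 1

p_value = at_least_as_extreme / reps  # share of shuffles as extreme as observed
print(p_value)
```

A small p-value means chance alone rarely produces a difference this large, which counts as evidence against the null hypothesis.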
P-Value Chart
greater than 0.10 = little evidence against Ho
between 0.05 and 0.10 = some evidence against Ho
between 0.01 and 0.05 = strong evidence against Ho
between 0.001 and 0.01 = very strong evidence against Ho
less than 0.001 = extremely strong evidence against Ho
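The chart can be written as a small lookup. This is a sketch; the function name is made up, and which label applies exactly at a boundary such as 0.05 is an assumption because the chart does not say.

```python
def evidence_against_null(p_value):
    """Return the chart's evidence label for a p-value between 0 and 1."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must be between 0 and 1")
    if p_value > 0.10:
        return "little evidence"
    if p_value > 0.05:
        return "some evidence"
    if p_value > 0.01:
        return "strong evidence"
    if p_value > 0.001:
        return "very strong evidence"
    return "extremely strong evidence"

print(evidence_against_null(0.03))  # strong evidence
```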