Population
ALL the individuals we want information about
Be specific: numbers, species, etc.
It’s the WHOLE GROUP, not just the ones being experimented on
Sample
Portion of the population that we are actually examining
Smaller group within population
Usually the group that’s being experimented on
Census
Method for collecting data from every individual in the population
Like a "sample" that includes the whole population
Gathers each individual's info
Sample design
Method used to pick the sample from the population
Voluntary response sample
“Do you want to volunteer in this experiment?”
When individuals choose themselves to take part in/respond to the survey/experiment
BIASED (people with strong opinions are the most likely to respond)
Convenience sample
Choosing people that are easy to reach
A sample statistic should be an unbiased estimate of the population parameter; convenience samples usually aren't
Usually unrepresentative of the population
Bias
Systematically favoring certain outcomes
Results systematically deviate from the true population parameter
Systematic random sample
Use a chance process to pick a starting point, then choose every nth person after it
Can still be biased if the list has a repeating pattern (bias is questionable)
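A minimal Python sketch of a systematic random sample, assuming the population is a list labeled 1-100 (the function name and labels are made up for illustration). The chance process only picks the starting point; after that, every kth person is taken.

```python
import random

def systematic_sample(population, n, seed=None):
    """Hypothetical helper: random start, then every k-th individual."""
    rng = random.Random(seed)
    k = len(population) // n        # sampling interval ("every nth person")
    start = rng.randrange(k)        # chance process picks the starting point
    return [population[start + i * k] for i in range(n)]

people = list(range(1, 101))        # hypothetical population labeled 1-100
sample = systematic_sample(people, 10, seed=1)
print(sample)                       # values are always 10 apart
```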
Probability sample
Each member of the population is given a known chance to be selected
The “chance” isn’t necessarily equal
EX) Choose a sample that consists of 20% cats and 80% dogs
Sample survey
A study that chooses a sample that represents a specific population
Simple random sample
A method of selecting participants for a study where every member of the larger population has an equal chance of being chosen for the smaller sample
A sample of size n is chosen from a population such that every group of n individuals in the population has an equal chance to be selected as the sample
Every individual has an equal chance AND every possible group of n individuals has an equal chance
Choosing every 10th person is NOT an SRS: each person has an equal chance, but not every group of n can be chosen
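A minimal Python sketch of choosing an SRS with a random number generator instead of a hat, assuming a hypothetical population of 200 labeled students. `random.sample` draws without replacement, so every group of 20 is equally likely to be the sample.

```python
import random

population = [f"Student {i}" for i in range(1, 201)]  # hypothetical population of 200
rng = random.Random(2024)                             # seeded so the draw can be repeated
srs = rng.sample(population, 20)                      # SRS of size 20, no repeats
print(srs)
```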
How to choose an SRS
Names out of a hat (small group)
Avoids bias
Random Digits
Random Digits
Used to randomize a list of individuals before selection for a sample
Label everyone in the group with a number (e.g., alphabetically by name), using the same number of digits for every label
The line of random digits will be given to you, so read it in groups of that many digits: the first labels to come up go in group one; continue until it has no more spots, then fill the next groups
State that you will skip repeated numbers and which numbers you will omit
EX) Will choose numbers 01-30, will omit any repeated numbers as well as 00 and 31-99
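A minimal Python sketch of reading a random digit line two digits at a time, keeping labels 01-30 and skipping 00, 31-99, and repeats. The digit line and the function name are illustrative assumptions, not a specific table row.

```python
def pick_from_digit_line(digits, n_needed, low=1, high=30):
    """Hypothetical helper: read 2-digit labels, keep low..high, skip repeats."""
    digits = digits.replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])          # e.g. "05" -> 5
        if low <= label <= high and label not in chosen:
            chosen.append(label)              # 00 and 31-99 fail the range check
            if len(chosen) == n_needed:
                break
    return chosen

line = "19223 95034 05756 28713 96409 12531 42544 82853"  # illustrative digit line
print(pick_from_digit_line(line, 5))  # [19, 22, 5, 13, 25]
```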
Stratified random sample
A sampling method that separates the population into groups of similar individuals (strata), then chooses a separate SRS from each group
Those that are selected in each group are then combined to form one complete sample
EX) Areas of a country can be stratified into rural, suburban, and urban. We take an SRS from each and then combine our results into one larger sample
This always includes someone from each stratum; an SRS could, by chance, draw everyone from one group
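A minimal Python sketch of a stratified random sample, assuming three hypothetical strata of towns (R = rural, S = suburban, U = urban). A separate SRS is taken inside each stratum and the results are combined, so every stratum is guaranteed to appear.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Hypothetical helper: separate SRS from each stratum, then combine."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))  # SRS within one stratum
    return sample

strata = {  # hypothetical strata
    "rural": [f"R{i}" for i in range(1, 41)],
    "suburban": [f"S{i}" for i in range(1, 61)],
    "urban": [f"U{i}" for i in range(1, 101)],
}
sample = stratified_sample(strata, 5, seed=7)
print(sample)
```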
Multistage sample
Samples chosen in stages
Find an SRS of states in the US
Find an SRS of high schools in those states
From these high schools, find an SRS of students from each school
NOT A COMMON SAMPLE MAKER
Cluster sample
Classify the population into groups of individuals that are located near each other (clusters). Choose an SRS of the clusters. All individuals in the clusters are included in the sample
Cluster samples are often chosen for convenience; ideally, each cluster has as much variability as the population itself
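A minimal Python sketch of a cluster sample, assuming 20 hypothetical homerooms of 25 students as the clusters. Unlike stratified sampling, the SRS is of whole clusters, and everyone in each chosen cluster is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Hypothetical helper: SRS of clusters, keep EVERY member of each chosen one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    return {name: clusters[name] for name in chosen}

clusters = {f"Room {r}": [f"Room {r} student {s}" for s in range(1, 26)]
            for r in range(1, 21)}                      # hypothetical homerooms
picked = cluster_sample(clusters, 3, seed=3)
sample = [student for students in picked.values() for student in students]
print(list(picked), len(sample))                        # 3 rooms, 75 students
```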
What’s the difference between CLUSTER + STRATIFIED?
Cluster: each cluster looks like the population but on a smaller scale
Stratified: each stratum contains similar individuals, but there are larger differences between strata
Undercoverage bias
When people are left out of the sampling process
EX) Homeless people, college students, people without phones, etc
Non-response bias
Occurs when people can’t be contacted (away from phone, etc) or refuse to participate
Response bias
A systematic pattern of incorrect responses in a sample survey, often caused by people lying or being influenced by the interviewer
Poorly worded questions bias
EX) Should people slaughter cute baby seals for their fur?
Observational study
Measurements representing a variable of interest are observed and recorded without controlling any factor that might influence their values
No treatment is imposed
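EX) If Mr. Collins gave them a pill so they didn’t stay up late, that’s a treatment, so it would NOT be observational
Individuals are only observed; no treatments are tested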
Experimental study
Deliberately imposes some treatment on a group of subjects, then observes their responses
What’s the difference between OBSERVATIONAL + EXPERIMENTAL studies?
Experiments are the only source of fully convincing data if our goal is to understand cause and effect
Observational evidence can NOT do this
Experimental units
Individuals that are being experimented on
EX) 300 male volunteers testing drugs A and B: the experimental units would be the 300 male volunteers
Treatments
A specific condition applied to the individuals; with more than one factor, a treatment is a combination of one level of each factor
EX) 25g of sugar + drug A, 25g of sugar + drug B, etc
Factors
The explanatory variables in the experiment
EX) 0% sugar, 10% sugar, 25% sugar, the factor is the sugar concentration and there would only be 1 factor in this example
Levels
The specific values of the factors
EX) 0% sugar, 10% sugar, 25% sugar, those would be the levels
Confounding variables
Variables, often out of the researchers' control, whose effects on the response can't be separated from the effects of the explanatory variable
EX) 400 men on weight loss drugs, outside factors like lifestyle, genetics, etc would be the confounding variables
Explanatory variables
Variables that may help explain changes in the response variable; in an experiment, these are the factors
EX) Sugar concentration instead of the actual percentages of sugar concentration
When do you use a randomized block design instead of a completely randomized design?
When differences between subjects (like age or gender) may affect how they respond to the treatments
Why use block design (sentence)?
Blocking accounts for the variability in (what’s being measured) that may arise due to (the variable you are blocking on)
Matched pairs
Subjects are paired with similar subjects, or each subject serves as their own control
Pairs: pair 2 people by a common characteristic, randomize who gets the treatment and who goes to the control group, then test, compare, and replicate
Own control: each person gets both treatments; randomize the treatment order for each person (before and after study), then compare treatments and replicate
Why must treatments be randomized in matched pairs?
So the order of the treatments doesn’t affect the results (no order bias)
Variability
Simulations
Use a chance model (like random digits) to imitate a random process instead of actually asking people
EX) At WHS, 20% take art, 30% take child lab, and 50% take band
Let digits 0-1 represent art, 2-4 child lab, and 5-9 band
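A minimal Python sketch of this simulation, assuming each student takes exactly one of the three classes. Each random digit 0-9 is mapped with the assignment above, and over many trials the proportions should land near 20%, 30%, and 50%.

```python
import random

def class_for_digit(d):
    """Map one random digit to a class: 0-1 art, 2-4 child lab, 5-9 band."""
    if d <= 1:
        return "art"        # 2 of 10 digits -> 20%
    if d <= 4:
        return "child lab"  # 3 of 10 digits -> 30%
    return "band"           # 5 of 10 digits -> 50%

rng = random.Random(42)
trials = 10_000
counts = {"art": 0, "child lab": 0, "band": 0}
for _ in range(trials):
    counts[class_for_digit(rng.randint(0, 9))] += 1   # randint includes both ends
print({name: c / trials for name, c in counts.items()})
```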
Single blind
Participants don’t know which treatment they received (e.g., real drug or placebo), but the people collecting data do
Double blind
Neither the participants nor the people collecting data know who received which treatment (e.g., real drug or placebo)
What are the 4 principles of experiment design?
Compare, control, randomize, replicate (replicate the experiment, but more importantly use large enough groups of experimental units for more accuracy)
What is statistical inference?
Drawing a conclusion about a population from information obtained by a random sample of that population
What is the importance of “blinding” in an experiment?
To reduce bias
When is the best time to use a census?
With a small population where the only data being collected is facts (like their age, gender, etc)
Quantitative data
Numerical data, can be measured
Categorical data
Records the group an individual belongs to
Distribution
Distribution of a variable tells us what values the variable takes, and how often it takes them
Outliers
An individual value that falls outside the overall pattern of data
Potential outlier
A value that looks like an outlier but that you haven’t confirmed with a calculation (like the 1.5(IQR) rule), no matter how obvious it is
Inference
Drawing conclusions that go beyond the data at hand
When do you make the bars on a graph touch and vice versa?
Touch in a histogram (quantitative data grouped into adjacent intervals); don’t touch in a bar graph of categorical data
Mean
The sum of all values divided by the number of values; may not reflect the data well due to potential outliers
It is nonresistant, which means it is sensitive to the influence of extreme values
Median
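The middle value when the data is in order (average the two middle values if there is an even number of values)
It is resistant, meaning it’s not strongly affected by extreme values
Ignores how extreme the outliers are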
Typically used when the data is skewed
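A minimal Python sketch of resistance, using made-up test scores: replacing the largest value with an extreme one drags the mean up but leaves the median where it was.

```python
from statistics import mean, median

scores = [70, 72, 75, 78, 80]        # hypothetical data
with_outlier = scores[:-1] + [180]   # swap the largest value for an extreme one

print(mean(scores), median(scores))              # both 75
print(mean(with_outlier), median(with_outlier))  # mean jumps to 95, median stays 75
```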
Mode
Most frequently observed value
Bimodal
When there are 2 modes
Multimodal
When there are more than 2 modes
What’s the mean and median when data is symmetrical?
Mean and median are equal
What’s the mean and median when the data is skewed left?
Mean is less than the median
What’s the mean and median when the data is skewed right?
Mean is greater than the median
Marginal distributions
Distribution of one variable using the row or column totals of a two-way table (usually in percents)
Conditional distribution
Distribution of one variable among only the individuals in a certain group of the other variable on the table
Association
Two variables have an association if knowing the value of one variable helps predict the value of the other
How do you describe data plots?
SOCS
Shape, outliers, center, spread
Shape: Is the data skewed left, skewed right, or symmetric?
Outliers: Are there any? Make sure to say there are no outliers if there aren’t any
Center: What is the middle number (median)?
Spread: Range (smallest to largest number)
INCLUDE IQR!!!
Questions to ask yourself when looking at data
Who/what are the individuals the information is about?
What are the variables used to describe any characteristic of these individuals?
How do you find the IQR?
IQR = Q3-Q1 = # - # = IQR
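A minimal Python sketch of the IQR, assuming Q1 and Q3 are the medians of the lower and upper halves with the overall median left out when n is odd (a common classroom rule; other software may use a slightly different quartile rule). The data is hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]             # values below the median
    upper = xs[len(xs) - half:]   # values above the median
    return median(lower), median(upper)

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]   # hypothetical data, n = 9
q1, q3 = quartiles(data)
iqr = q3 - q1
print(f"IQR = Q3 - Q1 = {q3} - {q1} = {iqr}")
```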
What is the sentence for what an IQR is?
This means that the range of the middle half (what’s trying to be found) is (IQR #)
What is the sentence for what standard deviation tells us?
The (theme) typically vary from the mean by ABOUT (standard deviation)
What is the IQR related to?
Median (use if data is skewed)
What is standard deviation related to?
Mean (use if data is symmetrical)
Equation to find outliers (1.5(IQR) rule)
Low outliers: below Q1 - 1.5(IQR) = __
High outliers: above Q3 + 1.5(IQR) = __
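A minimal Python sketch of the 1.5(IQR) outlier rule on hypothetical data. Q1 = 4 and Q3 = 12 are assumed to have been found already (medians of the lower and upper halves of this data set); the function names are made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return (low fence, high fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

data = [1, 3, 4, 6, 7, 8, 10, 12, 15, 40]   # hypothetical data
print(outlier_fences(4, 12))                # (-8.0, 24.0)
print(find_outliers(data, 4, 12))           # [40]
```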
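How to find the value at the 25th percentile in a bar graph (histogram)
Add up the heights of all the bars to get the total number of observations
Find 25% of that total
Count across the bars from the left until you reach that many observations, then read the value on the x-axis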
What is a usual problem with a bar graph?
The y-axis doesn’t start at 0
Weird range (cherry picking)
Axis numbers are inconsistent
Looks distorted
How do you know what you’re finding percents for in conditional distribution?
EX) Find the conditional distribution of pizza preferences for each movie type
Whatever comes after each will be what you’re finding percentages for
Percentiles
the nth percentile of a distribution is the value with n percent of the observations LESS THAN the value in question
A way to describe someone’s location within a group (their relative standing)
Z-Scores
Another way to describe someone’s position within a distribution is to tell how many standard deviations above or below the mean their value is
Called standardizing
Only use z-scores with Table A (normal percentiles) if data is normal!!
What is the Z-Score formula?
z = (x - mean)/standard deviation
What does the Z-Score formula measure?
This value (z) measures the number of standard deviations above or below the mean
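A minimal Python sketch of standardizing, using two hypothetical tests with different means and standard deviations. The z-scores put both scores on the same scale: the 86 on the math test is 1.5 SDs above its mean, while the 90 on the English test is only 1 SD above its mean.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x is above (+) or below (-) the mean."""
    return (x - mean) / sd

math_z = z_score(86, 80, 4)      # hypothetical math test: mean 80, sd 4
english_z = z_score(90, 85, 5)   # hypothetical English test: mean 85, sd 5
print(math_z, english_z)         # 1.5 1.0
print(z_score(74, 80, 4))        # -1.5 -> 1.5 SDs below the mean
```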
How are percentiles and z-scores related?
Percentiles and z-scores can both compare two or more values from different distributions
Standardizing makes these comparisons fairer
How do you use percentiles in context?
EX) The boy with 22 pairs of shoes is more unusual: he is at the 85th percentile, which means only 15% of boys have as many or more pairs than him. The girl’s value (25th percentile) is closer to the median
What is the difference between percentiles and percents?
60% means he got a 60 out of 100, 60th percentile means he did better than 60% of others
What happens to measures of center (mean, median, and mode) in linear transformations (adding/subtracting and multiplying/dividing)?
Adding/subtracting a number to every value adds/subtracts it from the measures of center and location (if you add 5, they all add 5)
Multiplying/dividing every value by a number multiplies/divides the measures of center, location, and spread by that number (SHAPE DOESN’T CHANGE)
What happens to measures of variability (IQR, standard deviation) in linear transformations (adding/subtracting and multiplying/dividing)?
Adding and subtracting from the data doesn’t change the measures of variability
Multiplying and dividing multiplies/divides the measures of variability by that number (its absolute value if it is negative)
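A minimal Python sketch of linear transformations on hypothetical data. Population SD (`pstdev`) is used here as an assumption; the sample SD behaves the same way. Adding 5 shifts the center but leaves the spread alone, while multiplying by 3 triples both.

```python
from statistics import mean, median, pstdev

def summary(xs):
    """Center (mean, median) and spread (population SD, range) of a list."""
    return {"mean": mean(xs), "median": median(xs),
            "sd": pstdev(xs), "range": max(xs) - min(xs)}

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data
original = summary(data)
added = summary([x + 5 for x in data])   # add 5 to every value
scaled = summary([3 * x for x in data])  # multiply every value by 3

print(original)
print(added)    # mean and median +5; sd and range unchanged
print(scaled)   # mean, median, sd, and range all x3
```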
How do you find cumulative frequencies?
Add each frequency to the total of all the frequencies before it
EX) 2, 7, 13, 12
CF 1 = 2, CF 2 = 9, CF 3 = 22, CF 4 = 34
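A minimal Python sketch of the example above: `itertools.accumulate` keeps a running total, which is exactly the list of cumulative frequencies.

```python
from itertools import accumulate

freqs = [2, 7, 13, 12]               # frequencies from the example
cum_freqs = list(accumulate(freqs))  # running totals
print(cum_freqs)                     # [2, 9, 22, 34]
```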
How do you find cumulative percentages?
Cumulative frequency divided by the total number of observations (the last cumulative frequency), times 100
EX) CF = 2, Total = 44
Cumulative % = (2/44) x 100 = 4.5%
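A minimal Python sketch of cumulative percentages, assuming the hypothetical frequencies 2, 7, 13, 12 (total 34). Each running total is divided by the overall total and multiplied by 100; the last one is always 100%.

```python
from itertools import accumulate

freqs = [2, 7, 13, 12]                    # hypothetical frequencies
cum_freqs = list(accumulate(freqs))       # [2, 9, 22, 34]
total = cum_freqs[-1]                     # last cumulative frequency = total count
cum_pcts = [round(cf / total * 100, 1) for cf in cum_freqs]
print(cum_pcts)                           # [5.9, 26.5, 64.7, 100.0]
print(round(2 / 44 * 100, 1))             # 4.5, matching the EX
```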
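How do you make a histogram from a cumulative frequency (line) graph?
Estimate the cumulative frequencies (the dots)
Find the differences between consecutive cumulative frequencies
Each difference is the height of the bar for the interval between those two dots
The coordinates are (x-axis value, C%)
Each dot’s C% counts all the data up to that x-value, so it comes from the previous interval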
Empirical rule
Within 1 sd of the mean = about 68%
Within 2 sd of the mean = about 95%
Within 3 sd of the mean = about 99.7%
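A minimal Python sketch that checks the empirical rule with `statistics.NormalDist` (Python 3.8+) standing in for Table A. The area within k SDs of the mean is the area to the left of +k minus the area to the left of -k.

```python
from statistics import NormalDist

std_normal = NormalDist(0, 1)   # standard normal: mean 0, sd 1
within = [std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)]
for k, p in zip((1, 2, 3), within):
    print(f"within {k} sd of the mean: {p:.2%}")
```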
what rules must a density curve follow to be a density curve?
it is always on or above the x-axis (no negative values)
the area under the curve equals 1 (100%)
the area under the curve to the left of a given point is equal to the proportion of values that fall below that given point
mean and median when skewed right
mean is greater than median
mean and median when skewed left
mean is less than median
mean and median when symmetrical
mean equals median
names for normal distributions
symmetric
single-peaked
bell-shaped
what is μ (mu)
the population’s mean
what is σ (sigma)
the population’s standard deviation
steps for normal distribution calculations
define x
define normality (X ~ N(μ, σ))
write the probability if applicable (P(X < #), etc.)
standardize z (plug in #’s and use Table A)
show “1 minus” when finding P(z > #)
write conclusion in context (TTQA)
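A minimal Python sketch of the steps above, assuming a hypothetical X = plant height in cm with X ~ N(30, 4) and asking for P(X > 36). `statistics.NormalDist().cdf` plays the role of Table A (area to the left of z), so the "1 minus" step is still needed.

```python
from statistics import NormalDist

mu, sigma = 30, 4                  # hypothetical: X ~ N(30, 4)
x = 36                             # we want P(X > 36)

z = (x - mu) / sigma               # standardize: z = 1.5
p_less = NormalDist().cdf(z)       # area to the LEFT of z, like Table A
p_greater = 1 - p_less             # "1 minus" because we want X > 36
print(f"z = {z}, P(X > {x}) = {p_greater:.4f}")

# Same answer without standardizing by hand:
X = NormalDist(mu, sigma)
print(f"{1 - X.cdf(x):.4f}")
```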
what does a normal probability plot look like when the data is normal
an approximately straight line
what does it look like when the data is not normal
curved
fun facts about z-scores
normal data? distribution of z-scores is too! z~N(0,1)
non-normal data? distribution of z-scores is NOT normal, even though the mean of the z-scores is still 0 and the standard deviation is still 1
cumulative frequency
add each frequency to the total of the previous numbers
cumulative percentages
cumulative frequency over total frequency (times 100)
tools to turn a cumulative frequency graph into a histogram
plot using the percents; always use the previous percent when graphing
for the histogram, subtract the previous dot’s value from each dot’s value
EX) What proportion of the data is exactly 8?
0: a density curve has no area at a single point, only above or below it
when do you subtract a number from Table A from 1?
when finding the area where z is greater than the number, since Table A gives the area to the left
EX) find z when 85% of observations fall above it
15% fall below it, so find .1500 in Table A and that’s z
If shading the graph to the right of z, subtract the Table A value from 1
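A minimal Python sketch of the "85% above" example, with `statistics.NormalDist().inv_cdf` standing in for looking up an area in the body of Table A. Since the function expects the area to the LEFT, the 85% above is first turned into 15% below.

```python
from statistics import NormalDist

area_left = 1 - 0.85                 # 85% above z means 15% below it
z = NormalDist().inv_cdf(area_left)  # z with 15% of the area to its left
print(round(z, 2))                   # -1.04
```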