Parameter
a numerical summary of a variable for the entire population (typically unknown)
Variable
the characteristic of the units that we want to learn about
Statistic
the numerical summary of a variable for a sample (used to estimate the parameter)
Sampling Frame
the list of units from which a sample is selected
Census
a special case when every unit in the population is measured or surveyed
often difficult or impossible to conduct
Sample
the smaller group
part of the population we actually examine in order to gather information
represented as n in equations
Population
the entire group of items or individuals (units) that we want information about.
Randomization
necessary to ensure a meaningful inference
The Sample Statistic is used to estimate...
the population parameter
Random Sampling
uses a chance mechanism, which avoids bias
we are more likely to have a sample statistic that better reflects the population parameter
Appropriate inference is only assured when a random sample is selected.
A biased sample may produce sample statistics that are...
consistently higher or lower than the population parameter.
Voluntary Response Sample
This is a bad sampling method.
only those who volunteer to participate are included in the sample
volunteers tend to have stronger opinions than the general population
ex. online polls
Convenience Sample
This is a bad sampling method
the most convenient or readily available group is considered as the sample
ex. people walking by the brickyard
Simple Random Sample (SRS)
This is a good sampling method
every different possible sample of the desired size has the same chance of being selected.
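A minimal sketch of drawing a simple random sample, assuming the sampling frame is a Python list. The frame of 100 unit IDs and the helper name `simple_random_sample` are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(units, n)

population = list(range(1, 101))  # hypothetical sampling frame of 100 unit IDs
sample = simple_random_sample(population, n=10, seed=1)
print(sample)
```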
Stratified Random Sample
This is a good sampling method
when the population is first divided into non-overlapping groups called strata
Then, a random sample is selected from each group.
Within a stratum, every person has the same chance of being selected.
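One way to sketch stratified random sampling, assuming the strata are already known and stored in a dictionary. The stratum names, their sizes, and the helper `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Select a separate simple random sample from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

# Hypothetical non-overlapping strata (unit IDs grouped by class year)
strata = {
    "freshman": list(range(0, 50)),
    "sophomore": list(range(50, 80)),
    "junior": list(range(80, 100)),
}
picked = stratified_sample(strata, n_per_stratum=5, seed=1)
print(picked)
```

Every stratum appears in the result, which is the main contrast with a cluster sample.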
Cluster Sample
This is a good sampling method
the population is first divided into non-overlapping groups called clusters
Then a random sample of clusters is selected and all the individuals in the selected cluster are included in the sample
Every cluster has the same chance of being selected; however sometimes not all groups are represented in the sample.
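A rough sketch of cluster sampling under the same assumptions: the clusters are given as a dictionary, whole clusters are chosen at random, and everyone in a chosen cluster is kept. The cluster names and the helper `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters (unit IDs grouped by residence hall)
clusters = {
    "hall_a": [1, 2, 3],
    "hall_b": [4, 5, 6, 7],
    "hall_c": [8, 9],
    "hall_d": [10, 11, 12],
}
selected = cluster_sample(clusters, n_clusters=2, seed=1)
print(selected)
```

Only two of the four halls end up in the sample, which illustrates why not all groups are necessarily represented.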
Systematic Sample
This is a good sampling method
when the population is a list divided into consecutive segments
One individual is randomly selected from the first segment and the same position is selected from each of the remaining segments
select every Kth unit from the random starting point
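A small sketch of systematic sampling, assuming the population is an ordered list and K is the segment length. The helper name `systematic_sample` and the 100-unit list are hypothetical.

```python
import random

def systematic_sample(units, k, seed=None):
    """Pick a random start within the first k units, then take every k-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random position inside the first segment
    return units[start::k]

population = list(range(100))  # hypothetical ordered list of 100 units
sample = systematic_sample(population, k=10, seed=1)
print(sample)
```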
Selection Bias
when sample participants tend to systematically differ from the population of interest.
Undercoverage
Type of survey bias.
the tendency for a sample to differ from the corresponding population because the sampling frame excludes some parts of the population
minority
Nonresponse Bias
Type of survey bias
the tendency for a sample to differ from the corresponding population because a subset of the sample cannot be contacted or does not respond.
Response Bias
Type of survey bias
the tendency for a sample to differ from the corresponding population because participants respond differently from how they truly feel.
Categorical Variable
places a unit into one of several groups or categories such as major, car type, hair color, or letter grades (A, A+, B, B-).
displayed using pie charts or bar graphs.
Quantitative Variable
takes numeric values for which arithmetic operations such as adding and averaging make sense such as height, age, exam score, and points
displayed using histograms, dot plots, or box plots.
Left Skewed
the median is bigger than the mean
most of the data sits toward the right of the graph, with a long tail to the left
Right Skewed
the mean is bigger than the median
most of the data sits toward the left of the graph, with a long tail to the right
Symmetric
the mean and median are equal
bell curve shape
Mean
the average
add all the observed values and divide by the number of observations
sensitive to outliers
moves in the direction of skewness.
Median
the middle value of ordered data
order all the values from smallest to largest and find the middle number.
not sensitive to outliers
not affected by skewness
implies there is 50% of data above it and 50% below it.
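A quick illustration of why the mean is sensitive to outliers while the median is not. The data values are made up for the example.

```python
from statistics import mean, median

scores = [2, 3, 4, 5, 6]         # hypothetical data, roughly symmetric
with_outlier = [2, 3, 4, 5, 60]  # same data with one large outlier

# The mean is pulled toward the outlier (4 becomes 14.8); the median stays at 4.
print(mean(scores), median(scores))
print(mean(with_outlier), median(with_outlier))
```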
Center
the middle of the data, or where the distribution would balance
Measures include mean (average) and median (the middle value)
Measures are affected by adding, subtracting, multiplying, and dividing the original data by the same value.
Spread
the measure of variability
measures include range, interquartile range (IQR) and Standard Deviation
Measures are only affected by multiplying and dividing.
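The effect of shifting and rescaling on center and spread can be checked directly. This sketch uses made-up data and the sample standard deviation from Python's `statistics` module.

```python
from statistics import mean, stdev

data = [2, 4, 6, 8]               # hypothetical data
shifted = [x + 10 for x in data]  # add the same value to every observation
scaled = [x * 3 for x in data]    # multiply every observation by the same value

# Adding 10 moves the mean by 10 but leaves the standard deviation unchanged.
print(mean(data), mean(shifted), stdev(data), stdev(shifted))
# Multiplying by 3 multiplies both the mean and the standard deviation by 3.
print(mean(scaled), stdev(scaled))
```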
Deviations
Deviations from overall pattern
Look for possible outliers, or unusual points that are not consistent with the rest of the data.
Interquartile Range (IQR)
a single number equal to the third quartile minus the first quartile.
Standard Deviation
a measure of how far, on average, the data values are from the mean
cannot be negative, and is rarely zero (zero means there is no variation because all the values are the same number)
a measure of Spread and is affected by multiplying or dividing all the values by the same number
takes into account all of the data.
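A worked sketch of the sample standard deviation, assuming the usual n - 1 divisor (the flashcards do not say which divisor the course uses). The data values are made up.

```python
from math import sqrt
from statistics import mean, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
ybar = mean(data)

# Square each deviation from the mean, average with n - 1, then take the root.
squared_devs = [(y - ybar) ** 2 for y in data]
s = sqrt(sum(squared_devs) / (len(data) - 1))

print(s, stdev(data))       # the hand calculation matches statistics.stdev
print(stdev([3, 3, 3, 3]))  # zero only when every value is the same
```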
Normal Distribution
bell shape
symmetric
characterized by its mean, which is at the center of the distribution, and its standard deviation.
The Empirical Rule
for any bell-shaped curve, about 68% of the data will fall within 1 standard deviation of the mean
about 95% of the data will fall within 2 standard deviations of the mean in either direction
about 99.7% of the observations will fall within 3 standard deviations of the mean in either direction.
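The three percentages can be checked against an exact normal curve. This sketch uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(k, round(area, 4))
```

The exact areas are slightly different from 68%, 95%, and 99.7%, which is why the rule is stated as an approximation.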
Standardize / Z Score
the distance between an observation and the mean, measured in terms of number of standard deviations
used to find percentages or probabilities for a normal distribution with any mean and any standard deviation.
values that are above (greater than) the mean will have positive z-scores
values that are below (less than) the mean will have negative z-scores
Most z-scores will be between -3 and +3
follows a standard normal distribution with a mean of zero and a standard deviation of 1.
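A minimal sketch of standardizing an observation. The exam scores, mean, and standard deviation are invented for the example, and `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam with mean 70 and standard deviation 10
print(z_score(85, mean=70, sd=10))  # above the mean, so positive
print(z_score(60, mean=70, sd=10))  # below the mean, so negative
```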
Finding Probabilities from Table Z
P(Z < number) = the table value for that number
P(Z > number) = 1 - (the table value for that number).
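The same lookups can be sketched in code, with `NormalDist().cdf` standing in for Table Z (an assumption for illustration; a printed table gives the same values rounded to four decimals).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

below = Z.cdf(1.0)      # P(Z < 1.0), the value a Z table lists for 1.00
above = 1 - Z.cdf(1.0)  # P(Z > 1.0), one minus the table value

print(round(below, 4), round(above, 4))
```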
The Standard Normal Distribution
has a mean of ZERO and a standard deviation of ONE.
Sample Proportion
the number of items that fall into a given category divided by the total number of observations in your sample
Categorical Data
It is shown as P Hat
Answers a yes or no question.
Sampling Variability
the variation in sample statistics that results from selecting different random samples.
p
population proportion
There can be only one value of p.
p hat
sample proportions
There can be several values of p hat.
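A small simulation of the difference between p and p hat, assuming a hypothetical population proportion of 0.4 and samples of size 50. The helper `sample_proportion` is illustrative only.

```python
import random

def sample_proportion(sample):
    """Share of observations in the sample that fall in the category (True)."""
    return sum(sample) / len(sample)

rng = random.Random(42)
p = 0.4  # hypothetical population proportion: one fixed value

# Each new random sample gives its own p hat; this is sampling variability.
p_hats = []
for _ in range(5):
    sample = [rng.random() < p for _ in range(50)]
    p_hats.append(sample_proportion(sample))

print(p_hats)
```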
Sampling Distributions are..
predictable
Distributions need to have...
shape, center, and spread
(Z*) Z Multiplier
tells us how many standard deviations away we believe our estimate is from the true parameter
Use Table Z or Table t.
Confidence Level
if we take many samples from the same population, the proportion of samples that will produce a confidence interval that contains the true population parameter.
If an outlier is present in a data set it...
can make the mean and median very different from each other
If n (sample) is less than 30 and the population is skewed then the sampling distribution will be...
skewed but not as much as the population.
If n (sample) is more than 30 and the population is skewed then the sampling distribution will be...
approximately normal
If n (sample) is less than 30 and the population is normal then the sampling distribution will be...
approximately normal
If n (sample) is large and the population is normal then the sampling distribution will be...
approximately normal
Does the sampling distribution always have more or less variability than the population?
less variability
As the sample size INCREASES the variability in the sample mean...
decreases
Central Limit Theorem (CLT)
states that if the variable Y follows ANY distribution with mean μ and standard deviation σ and the sample size is large (more than 30), then Y BAR from a simple random sample follows an approximately NORMAL DISTRIBUTION.
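A rough simulation of these ideas, assuming a right-skewed (exponential) population with mean 1. The population choice, the sample sizes, and the helper `sample_means` are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev

def sample_means(n, reps, rng):
    """Draw `reps` random samples of size n and return each sample's mean."""
    means = []
    for _ in range(reps):
        draws = [rng.expovariate(1.0) for _ in range(n)]  # skewed population
        means.append(sum(draws) / n)
    return means

rng = random.Random(0)
means_small = sample_means(n=5, reps=1000, rng=rng)
means_large = sample_means(n=50, reps=1000, rng=rng)

# Larger samples give sample means that cluster more tightly around the
# population mean, and their histogram looks closer to a bell shape.
print(stdev(means_small), stdev(means_large))
print(mean(means_large))
```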
If the parent population is normal then the sampling distribution will be...
normal no matter the sample size.
Inference
the process of using sample information to make conclusions about the population of interest.
Standard Error
an estimate of the standard deviation of the sampling distribution
only depends on sample quantities
key in calculating confidence intervals
Margin of Error
the distance from the population parameter that will include most of the possible values of a sample statistic
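The Z multiplier, standard error, and margin of error fit together in a confidence interval. This is a sketch for a single proportion, assuming the standard large-sample formula SE = sqrt(p hat × (1 - p hat) / n), which the flashcards do not state. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z multiplier
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error: sample quantities only
    margin = z_star * se                # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 60 of 100 respondents answer "yes"
low, high = proportion_ci(p_hat=0.6, n=100)
print(round(low, 3), round(high, 3))
```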
The “box” in a boxplot indicates…
the start and end of the middle 50% of the data
The “whiskers” in a boxplot indicate…
the range of the data, ending at the minimum and maximum values that are not considered outliers
that is, the most extreme values no more than 1.5 times the Interquartile Range (IQR) beyond the box
A boxplot is constructed of…
the minimum value
the first quartile
the median
the third quartile
the maximum value
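A sketch of the numbers behind a boxplot, assuming the "inclusive" quartile method from Python's `statistics.quantiles` (textbooks and software differ slightly in how they compute quartiles). The data and the helper `boxplot_summary` are hypothetical.

```python
from statistics import quantiles

def boxplot_summary(data):
    """Five-number summary plus IQR, whisker ends, and outliers (1.5 x IQR rule)."""
    data = sorted(data)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "min": data[0], "q1": q1, "median": med, "q3": q3, "max": data[-1],
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),  # most extreme non-outliers
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

Here the value 30 lies above the upper fence, so the upper whisker stops at 8 and 30 is plotted as an outlier.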