Parameter
A parameter is a numerical summary of a variable for the entire population (typically unknown)
Variable
A variable is a characteristic of the units that we want to learn about.
Statistic
A statistic is the numerical summary of a variable for a sample (used to estimate the parameter)
Sampling Frame
A sampling frame is the list of units from which a sample is selected
Census
A census is a special case when every unit in the population is measured or surveyed. It is often difficult or impossible to conduct a census.
Sample
A sample is the smaller group: the part of the population we actually examine in order to gather information. In equations, sample size is represented by n.
Population
A population is the entire group of items or individuals (units) that we want information about.
Randomization
Randomization is necessary to ensure meaningful inference.
The Sample Statistic is used to estimate...
The sample statistic is used to estimate the population parameter.
Random Sampling
Random Sampling uses a chance mechanism, which avoids bias. With Random Sampling we are more likely to get a sample statistic that better reflects the population parameter. Appropriate inference is only assured when a random sample is selected.
A biased sample may produce sample statistics that are...
A biased sample may produce sample statistics that are consistently higher or lower than the population parameter.
Voluntary Response Sample
This is a bad sampling method. In a Voluntary Response Sample, only those who volunteer to participate are included in the sample. People who volunteer tend to have stronger opinions than the general population (e.g., online polls).
Convenience Sample
This is a bad sampling method. In a Convenience Sample, the most convenient or readily available group is used as the sample (e.g., people walking by the brickyard).
Simple Random Sample (SRS)
This is a good sampling method. In a Simple Random Sample, every possible sample of the desired size has the same chance of being selected.
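A minimal sketch of drawing an SRS in Python, assuming the sampling frame is available as a list. Python's `random.sample` selects without replacement so that every subset of size n is equally likely. The function name `simple_random_sample` and the 100-ID frame are hypothetical, for illustration only.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of size n (without replacement) from a sampling frame."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

# Hypothetical sampling frame: 100 student ID numbers
frame = list(range(1, 101))
print(simple_random_sample(frame, 5, seed=1))
```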
Stratified Random Sample
This is a good sampling method. A Stratified Random Sample is when the population is first divided into non-overlapping groups called strata. Then, a random sample is selected from each group. Within a stratum, every person has the same chance of being selected.
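A sketch of a stratified random sample, assuming the strata are already known and stored as a dictionary of non-overlapping lists. It takes a separate SRS of the same size from each stratum; taking sample sizes proportional to stratum size is another common choice not shown here. The names and the class-year strata are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate simple random sample of n_per_stratum units from every stratum.

    strata: dict mapping stratum name -> list of units (the groups must not overlap).
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

# Hypothetical strata: students grouped by class year
strata = {
    "freshman": [f"F{i}" for i in range(40)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior": [f"J{i}" for i in range(20)],
    "senior": [f"R{i}" for i in range(10)],
}
print(stratified_sample(strata, 3, seed=2))
```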
Cluster Sample
This is a good sampling method. In a Cluster Sample, the population is first divided into non-overlapping groups called clusters. Then a random sample of clusters is selected, and all the individuals in the selected clusters are included in the sample. Every cluster has the same chance of being selected; however, sometimes not all groups are represented in the sample.
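A sketch of a cluster sample under the same assumptions: the clusters are stored in a dictionary, a random set of whole clusters is chosen with equal chance, and every unit in a chosen cluster is kept. The dorm-floor clusters and function name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters whole clusters and include every unit in each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # each cluster has the same chance
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: 10 dorm floors with 5 rooms each
clusters = {f"floor{k}": [f"floor{k}-room{r}" for r in range(1, 6)] for k in range(1, 11)}
sample = cluster_sample(clusters, 2, seed=3)
units = [u for members in sample.values() for u in members]
print(sorted(sample), len(units))
```

Note that only the chosen floors appear in the sample, which is why some groups may not be represented at all.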
Systematic Sample
This is a good sampling method. In a Systematic Sample, the population list is divided into consecutive segments of equal size. One individual is randomly selected from the first segment, and the individual in the same position is selected from each of the remaining segments (i.e., select every kth unit from a random starting point).
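A minimal sketch of "every kth unit from a random starting point", assuming the population is an ordered list and k is the segment length. The function name and frame are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start within the first segment of length k, then take every kth unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random position 0..k-1 in the first segment
    return frame[start::k]    # same position in every later segment

# Hypothetical ordered list of 100 units, segments of length 10
frame = list(range(1, 101))
print(systematic_sample(frame, 10, seed=4))
```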
Selection Bias
Selection Bias is when sample participants tend to systematically differ from the population of interest.
Undercoverage
Type of survey bias. Undercoverage is the tendency for a sample to differ from the corresponding population because the sampling frame excludes some parts of the population (e.g., minority groups missing from the frame).
Nonresponse Bias
Type of survey bias. Nonresponse Bias is the tendency for a sample to differ from the corresponding population because a subset of the sample cannot be contacted or does not respond.
Response Bias
Type of survey bias. Response Bias is the tendency for a sample to differ from the corresponding population because participants respond differently from how they truly feel.
Categorical Variable
A Categorical Variable places a unit into one of several groups or categories such as major, car type, hair color, or letter grades (A, A+, B, B-). Categorical Data is displayed using pie charts or bar graphs.
Quantitative Variable
A Quantitative Variable takes numeric values for which arithmetic operations such as adding and averaging make sense (e.g., height, age, exam score, and points). Quantitative data is displayed using histograms, dot plots, or box plots.
Left Skewed
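In left-skewed data, the median is typically bigger than the mean (the long left tail pulls the mean down).
Right Skewed
In right-skewed data, the mean is typically larger than the median (the long right tail pulls the mean up).
Symmetric
In symmetric data, the mean and median are approximately equal.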
Mean
Mean is the average: add all the observed values and divide by the number of observations. The Mean is sensitive to outliers. The Mean is pulled in the direction of the skewness (toward the long tail).
Median
Median is the middle value of ordered data. To find the Median, order all the values from smallest to largest and find the middle number; with an even number of values, average the two middle numbers.
The Median is resistant to outliers and is much less affected by skewness than the Mean. About 50% of the data lie above the median and 50% below it.
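A small illustration of the difference, using Python's standard `statistics` module on hypothetical exam scores. One extreme value on the right drags the mean far upward while the median barely moves, which is also why the mean exceeds the median in right-skewed data.

```python
from statistics import mean, median

scores = [70, 72, 75, 78, 80]      # hypothetical exam scores
with_outlier = scores + [300]      # add one extreme value (e.g., a data-entry error)

print(mean(scores), median(scores))              # mean and median are both 75
print(mean(with_outlier), median(with_outlier))  # mean jumps to 112.5, median only to 76.5
```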
Center
The Center is the middle of the data, or where the distribution would balance. Measures of center include the mean (average) and the median (the middle value). Measures of center change when the same value is added to or subtracted from every observation, and when every observation is multiplied or divided by the same value.
Spread
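The Spread is the measure of variability. Measures of spread include the range, the interquartile range (IQR), and the Standard Deviation. Measures of spread are not affected by adding or subtracting the same value to every observation; they change only when every observation is multiplied or divided by the same value.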
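A quick check of these two rules on a hypothetical data set: adding 10 to every value shifts the mean by 10 but leaves the standard deviation alone, while multiplying by 3 scales both the mean and the standard deviation by 3.

```python
from statistics import mean, stdev

data = [2, 4, 4, 5, 7]            # hypothetical observations
shifted = [x + 10 for x in data]  # add 10 to every observation
scaled = [x * 3 for x in data]    # multiply every observation by 3

for label, values in [("original", data), ("plus 10", shifted), ("times 3", scaled)]:
    print(f"{label:>8}: mean={mean(values):.2f}  sd={stdev(values):.2f}")
```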
Deviations
Deviations are departures from the overall pattern. Look for possible outliers: unusual points that are not consistent with the rest of the data.
Interquartile Range (IQR)
The interquartile range is a single number equal to the third quartile minus the first quartile.
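A sketch of computing the IQR, assuming the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the ordered data (the middle value is left out of both halves when n is odd). Software packages use several slightly different quartile rules, so results can differ a little from a calculator. The function name is hypothetical.

```python
from statistics import median

def iqr(values):
    """IQR = Q3 - Q1, with Q1 and Q3 taken as medians of the lower and upper halves."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    lower = ordered[: len(ordered) // 2]         # values below the middle
    upper = ordered[(len(ordered) + 1) // 2 :]   # values above the middle
    return median(upper) - median(lower)

print(iqr([1, 3, 5, 7, 9, 11, 13, 15]))  # Q1 = 4, Q3 = 12, IQR = 8
```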
Standard Deviation
The Standard Deviation is a measure of how far, on average, the data values are from the mean. The Standard Deviation cannot be negative, and it equals zero only when there is no variation (all the values are the same number). Standard Deviation is a measure of spread and is affected by multiplying or dividing all the values by the same number. Standard Deviation takes into account all of the data.
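A sketch of the calculation, assuming the usual sample formula that divides the sum of squared deviations from the mean by n - 1 (the card above does not state the formula). It is checked against `statistics.stdev`, which uses the same n - 1 divisor. The helper name `sample_sd` is hypothetical.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Square root of the sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

data = [2, 4, 4, 5, 7]            # hypothetical observations
print(sample_sd(data), stdev(data))
print(sample_sd([5, 5, 5, 5]))    # all values identical -> no variation -> 0.0
```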
Normal Distribution
Normal Distributions are symmetric and bell-shaped.
The Normal Distribution is characterized by its mean, which is at the center of the distribution, and its standard deviation.
The Empirical Rule
The Empirical Rule says that for a bell-shaped (normal) curve, about 68% of the data fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations of the mean, and about 99.7% fall within 3 standard deviations of the mean, in either direction.
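The three percentages can be checked against the exact normal curve with `statistics.NormalDist` from Python's standard library: the area within k standard deviations is the cumulative probability at +k minus the cumulative probability at -k.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal distribution

for k in (1, 2, 3):
    within = std_normal.cdf(k) - std_normal.cdf(-k)  # area between -k and +k
    print(f"within {k} SD: {within:.4f}")
```

The printed areas (about 0.6827, 0.9545, and 0.9973) round to the 68%, 95%, and 99.7% of the rule.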
Standardize / Z Score
The Z Score is the distance between an observation and the mean, measured in number of standard deviations: z = (x - μ) / σ. We use the Z Score to find percentages or probabilities for a normal distribution with any mean and any standard deviation.
Values above (greater than) the mean have positive z-scores; values below (less than) the mean have negative z-scores. Most z-scores fall between -3 and +3. If the original variable is normally distributed, its z-scores follow a standard normal distribution with a mean of zero and a standard deviation of 1.
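A minimal sketch of standardizing, using hypothetical exam scores with mean 70 and standard deviation 8. The function name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical exam: mean 70, standard deviation 8
print(z_score(86, 70, 8))  # 2.0: two standard deviations above the mean
print(z_score(62, 70, 8))  # -1.0: one standard deviation below the mean
```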
Finding Probabilities from Table Z
P(Z < z) = the value in the table for z. P(Z > z) = 1 - the value in the table for z. (Table Z gives the area to the left of z.)
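The same two rules in code, assuming a left-tail table: `NormalDist().cdf(z)` returns the area to the left of z, which is what Table Z lists, and the right-tail area uses the complement. The helper names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def prob_less_than(z):
    return Z.cdf(z)       # P(Z < z): the value a left-tail Table Z lists

def prob_greater_than(z):
    return 1 - Z.cdf(z)   # P(Z > z) = 1 - table value

print(round(prob_less_than(1.5), 4), round(prob_greater_than(1.5), 4))  # 0.9332 0.0668
```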
The Standard Normal Distribution
The Standard Normal Distribution has a mean of ZERO and a standard deviation of ONE.
Sample Proportion
The Sample Proportion is the number of items that fall into a given category divided by the total number of observations in the sample. It is used with categorical data that answer a yes-or-no question, and it is written as p hat (p̂).
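A minimal sketch of computing p hat from yes/no answers to a hypothetical survey question. The function name and responses are illustrative.

```python
def sample_proportion(responses):
    """p hat = (number of 'yes' responses) / (sample size n)."""
    return sum(1 for r in responses if r == "yes") / len(responses)

# Hypothetical survey question: "Do you live on campus?"
answers = ["yes", "no", "yes", "yes", "no", "no", "yes", "no", "yes", "yes"]
print(sample_proportion(answers))  # 6 yes out of 10 -> 0.6
```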
Sampling Variability
Sampling Variability is the variation in sample statistics that results from selecting different random samples.
p
The population proportion. For a given population, there is only one value of p.
p hat
The sample proportion. There can be many values of p hat, one for each sample taken.
Sampling Distributions are..
Sampling Distributions are predictable.
Distributions need to have...
Distributions are described by their shape, center, and spread.
(Z*) Z Multiplier
The z multiplier (z*) tells us how many standard errors to go out on either side of the estimate to reach the desired confidence level (e.g., z* = 1.96 for 95% confidence). Use Table Z or Table t.
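Instead of reading Table Z backwards, z* can be found with the inverse normal CDF in Python's `statistics` module: for confidence level C, z* cuts off an area of (1 + C) / 2 to its left. The function name `z_star` is hypothetical.

```python
from statistics import NormalDist

def z_star(confidence):
    """Multiplier such that the middle `confidence` area of the standard normal lies within +/- z*."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_star(level), 3))  # about 1.645, 1.96, 2.576
```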
Confidence Level
The Confidence Level is the proportion of samples, out of many samples taken from the same population, that would produce a confidence interval containing the true population parameter.
If an outlier is present in a data set, it...
If an outlier is present in a data set, it can make the mean and median very different from each other.
If n is less than 30 and the population is skewed, then the sampling distribution will be...
If n is less than 30 and the population is skewed, then the sampling distribution will be skewed, but not as much as the population.
If n (sample size) is more than 30 and the population is skewed, then the sampling distribution will be...
If n is more than 30 and the population is skewed, then the sampling distribution will be approximately normal.
If n (sample size) is less than 30 and the population is normal, then the sampling distribution will be...
If n is less than 30 and the population is normal, then the sampling distribution will be normal.
If n (sample size) is large and the population is normal, then the sampling distribution will be...
If n is large and the population is normal, then the sampling distribution will be normal.
Does the sampling distribution always have more or less variability than the population?
The sampling distribution of the sample mean has less variability than the population whenever n > 1: its standard deviation is σ/√n.
As the sample size INCREASES the variability in the sample mean...
As the sample size INCREASES the variability in the sample mean DECREASES.
Central Limit Theorem (CLT)
The Central Limit Theorem states that if the variable Y follows ANY distribution with mean μ and standard deviation σ, and the sample size n is large (a common rule of thumb is more than 30), then Y BAR from a simple random sample approximately follows a NORMAL DISTRIBUTION with mean μ and standard deviation σ/√n.
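A small simulation sketch of the Central Limit Theorem and of the shrinking variability of Y bar. The right-skewed population (mean 3.5) is hypothetical, and the draws are made with replacement so they are independent; for a large population this is very close to simple random sampling. The average of the simulated sample means stays near 3.5 while their standard deviation shrinks roughly like σ/√n.

```python
import random
from statistics import mean, stdev

def sample_means(population, n, reps, seed=None):
    """Means of `reps` random samples of size n, drawn with replacement from population."""
    rng = random.Random(seed)
    return [mean(rng.choices(population, k=n)) for _ in range(reps)]

# Hypothetical right-skewed population: mostly small values, a few large ones (mean 3.5)
population = [1] * 60 + [2] * 20 + [5] * 10 + [20] * 10

for n in (5, 30, 100):
    means = sample_means(population, n, reps=2000, seed=5)
    print(n, round(mean(means), 2), round(stdev(means), 3))
```

Plotting a histogram of `means` for each n would also show the shape becoming more bell-shaped as n grows.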
If the parent population is normal then the sampling distribution will be...
If the parent population is normal then the sampling distribution will be normal no matter the sample size.
Inference
Inference is the process of using sample information to make conclusions about the population of interest.
Standard Error
Standard Error is an estimate of the standard deviation of the sampling distribution. The Standard Error depends only on sample quantities. Standard Error is key in calculating confidence intervals.
Margin of Error
The Margin of Error is the distance from the population parameter that will include most of the possible values of a sample statistic. It is calculated as the multiplier times the standard error (for example, z* × SE).
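A sketch that ties standard error, margin of error, and confidence interval together for a population proportion. It assumes the standard large-sample interval p hat ± z* × SE with SE = sqrt(p hat (1 - p hat) / n), a formula not spelled out on these cards; the poll numbers and function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for p: p_hat +/- z* * SE."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)                  # uses sample quantities only
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * se                                 # margin of error
    return p_hat, se, margin, (p_hat - margin, p_hat + margin)

# Hypothetical poll: 540 of 900 respondents answer "yes"
p_hat, se, margin, (low, high) = proportion_ci(540, 900)
print(f"p_hat={p_hat:.3f}  SE={se:.4f}  ME={margin:.4f}  CI=({low:.3f}, {high:.3f})")
```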