Frequency distribution (frequency table)
Often helpful in organizing and summarizing data
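A minimal Python sketch of building a frequency table. The data values are hypothetical (not from these notes). `collections.Counter` tallies how often each value occurs:

```python
from collections import Counter

# Hypothetical sample data (not from the original notes)
data = [3, 5, 3, 2, 5, 3, 4, 2, 3]

# Count how many times each value occurs
freq = Counter(data)

# Print the frequency table: each value and its frequency, in ascending order
for value in sorted(freq):
    print(value, freq[value])
```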
Measure of center
A value at or near the center or middle of a data set. Measures of center are often interpreted as “typical” values for a group
Mean, median, mode
What are the most common measures of center?
∑
Denotes a sum. Pronounced “sigma”
𝑥
Denotes an individual data value
n
Denotes the number of values in a sample. Also called the sample size
N
Denotes the number of values in a population
𝑥̅
Denotes the sample mean. Pronounced “x bar”
𝜇
Denotes the population mean, the mean of the entire population. Pronounced “mu” (sounds like “mew”). Its value is generally unknown
mean
The … of a data set is found by adding all values and dividing by the number of values in the set
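A small Python sketch of the mean formula, x̄ = Σx / n, on a hypothetical sample (the data and the function name `mean` are illustrative only):

```python
# Mean = (sum of all values) / (number of values), i.e. Σx / n
def mean(values):
    return sum(values) / len(values)

data = [4, 8, 15, 16, 23, 42]  # hypothetical sample, n = 6
print(mean(data))  # 108 / 6 = 18.0
```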
Median
The … of a data set is the middle value when the data are listed in ascending order. It separates the bottom 50% of the data from the top 50%: roughly half of all values are below it and half are above it.
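A Python sketch of finding the median by sorting. It assumes the usual convention that, for an even number of values, the median is the average of the two middle values (the notes above do not spell out the even case):

```python
def median(values):
    s = sorted(values)          # list the values in ascending order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:              # odd count: the single middle value
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2   # even count: average of the two middle values

print(median([7, 1, 5, 3, 9]))  # sorted: 1 3 5 7 9 -> 5
print(median([7, 1, 5, 3]))     # sorted: 1 3 5 7 -> (3 + 5) / 2 = 4.0
```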
Mode
If it exists, it is the value that occurs with the greatest frequency. A data set may not have one.
unimodal
A dataset with one mode is called ….
multiple
A data set may have … modes
modes
If there are multiple values tied for most frequent, they are all….
bimodal
A dataset with two modes is called….
multimodal
A data set with more than two modes is called …
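A Python sketch that finds every mode and labels a data set as having no mode, or as unimodal, bimodal, or multimodal. It assumes the common textbook convention that a data set in which no value repeats has no mode; Python's `statistics.multimode` would instead return every value in that case, which is why a small hypothetical helper (`modes`) is used here:

```python
from collections import Counter

def modes(values):
    """Return the mode(s) of a non-empty data set, or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # Every value occurs exactly once: treated here as "no mode"
        return []
    # All values tied for the greatest frequency are modes
    return sorted(v for v, c in counts.items() if c == top)

def describe(values):
    k = len(modes(values))
    if k == 0:
        return "no mode"
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"

print(describe([1, 2, 3]))           # no mode
print(describe([1, 2, 2, 3]))        # unimodal
print(describe([1, 1, 2, 2, 3]))     # bimodal
print(describe([1, 1, 2, 2, 3, 3]))  # multimodal
```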
It depends on the data
Which measure of center is appropriate/best?
Outlier
A value that does not fit in with the overall pattern of the dataset may be considered an ….
Mean
Uses every data value/Highly affected by outliers/Not good for skewed data sets (but is best for symmetric data)
Median
Not affected by outliers/Can be used with any data set whose values can be put in order
Mode
Not necessarily in the center/Not affected by outliers/Mainly useful for multimodal or qualitative data
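A quick Python check of how one outlier affects the mean versus the median, using hypothetical data and the standard-library `statistics` module. Adding the single value 100 moves the mean from 12.2 to about 26.8, while the median only moves from 12 to 12.5:

```python
from statistics import mean, median

data = [10, 12, 12, 13, 14]      # hypothetical sample
with_outlier = data + [100]      # same data plus one extreme value

print(mean(data), median(data))                  # mean 12.2, median 12
print(mean(with_outlier), median(with_outlier))  # mean about 26.83, median 12.5
```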
histogram
The graph of a frequency distribution is called a …, which can make patterns easier to interpret
concepts
Bars of equal width drawn adjacent to each other (unless there are gaps in the data)/A horizontal scale representing classes of quantitative data values/A vertical scale (the height of the bars) representing frequency. These are all the … of a histogram.
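A Python sketch of the counting behind a histogram: values are grouped into classes of equal width and each class's frequency becomes a bar (drawn sideways here as a row of `*`). The data, the starting value 0, and the class width 5 are hypothetical choices; values outside the chosen classes are simply ignored in this sketch:

```python
def class_frequencies(values, start, width, num_classes):
    """Count values in equal-width classes [start, start+width), [start+width, ...), ..."""
    counts = [0] * num_classes
    for x in values:
        i = int((x - start) // width)   # index of the class containing x
        if 0 <= i < num_classes:        # skip values outside the chosen classes
            counts[i] += 1
    return counts

data = [2, 3, 7, 8, 8, 11, 14, 15, 19]  # hypothetical data
start, width = 0, 5
counts = class_frequencies(data, start, width, 4)

# Text "histogram": one row per class, bar length = frequency
for i, c in enumerate(counts):
    lo = start + i * width
    print(f"[{lo}, {lo + width}) {'*' * c}")
```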
Dotplot
Shows each value in a dataset as a dot above a number line
categorical
Pie charts, bar charts, column charts, stacked charts, and others exist for … data
pie chart
If the data have a natural order, a … is not the best choice. A bar chart whose horizontal axis puts the categories in order works better
misleading
The vertical axis scale can exaggerate differences/Does the y-axis start at zero?/Is the y-axis skewed or stretched?/Using 3D can make certain categories seem larger or smaller/Misrepresenting areas (by mistake or on purpose) is misleading/Selecting the wrong type of graph to represent your data can be very confusing/Improper scaling (especially in pictographs) can exaggerate differences/Not labeling a graph makes readers fill in the blanks/Improper extraction does not show the whole picture. These are all ways that graphs can be …
Frequency or probability
… distributions can take on different shapes
Skewness
A measure of the asymmetry of a distribution. Values far from the “peak” pull (skew) the distribution in their direction
Normal distribution
In a symmetric, unimodal distribution (such as the normal distribution), the mean = median = mode. These measures occur under the peak
Right-skewed (positively skewed)
In a … distribution, the mode < median < mean. Outliers, if any, appear on the right side
Left-skewed (negatively skewed)
In a … distribution, the mean < median < mode. Outliers, if any, appear on the left side
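A Python check of the typical ordering of mode, median, and mean in skewed data, using a hypothetical right-skewed sample (long tail toward the 9) and its mirror image, which is left-skewed. The ordering holds for these data; in real data it is a common pattern rather than a guarantee:

```python
from statistics import mean, median, mode

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9]    # long tail to the right (the 9)
left_skewed = [10 - x for x in right_skewed]   # mirror image: long tail to the left

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    # Prints: label, mode, median, mean (rounded)
    print(name, mode(data), median(data), round(mean(data), 2))
```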
consistency
In everyday use, variability can mean a lack of …
Variability
The extent to which data points in a statistical distribution or data set diverge (or vary) from the average value
Range, interquartile range (IQR), variance, and standard deviation
What are some common measures of variation (or measures of spread)?
Range
The … of a data set is the difference between the maximum and the minimum
Range
Maximum data value - minimum data value
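A Python sketch of the range on hypothetical data, also showing why it is so sensitive to outliers: it depends only on the two most extreme values. The function is named `data_range` to avoid shadowing Python's built-in `range`:

```python
def data_range(values):
    # Range = maximum data value - minimum data value
    return max(values) - min(values)

data = [12, 15, 11, 14, 13]        # hypothetical sample
print(data_range(data))            # 15 - 11 = 4
print(data_range(data + [40]))     # 40 - 11 = 29: one extreme value changes the range a lot
```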
affected by outliers
Since the range is calculated using only the two most extreme data values, it is highly…
Interquartile Range (IQR)
Uses quartiles to measure spread in a way that is less affected by potential outliers than the range
Quartiles
… are values that separate a data set into fourths (or quarters, hence quartiles)
Q1
The first quartile
Q2
The second quartile
Q3
The third quartile
1/4
About … (25%) of the data lie between any two consecutive quartiles
Minimum
Q1
Median (Q2)
Q3
Maximum
The 3 quartiles together with the minimum and the maximum values constitute the five-number summary
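A Python sketch of the five-number summary. Quartile methods vary between textbooks and software; this sketch assumes one common method: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the overall median left out of both halves when n is odd. The data and function names are hypothetical:

```python
def median(sorted_vals):
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum); needs at least 2 values."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]          # values below the median
    upper = s[(n + 1) // 2 :]    # values above the median (median excluded when n is odd)
    return min(s), median(lower), median(s), median(upper), max(s)

data = [1, 3, 4, 6, 7, 8, 9, 11, 13]   # hypothetical data, n = 9
print(five_number_summary(data))        # minimum, Q1, median, Q3, maximum
```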
Q3 and Q1
The IQR is the difference between the …
slightly
Depending on the method, the median may be excluded from (or included in) the lower and upper halves when finding Q1 and Q3, so the IQR can differ …
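A Python sketch comparing the IQR (Q3 − Q1) when the median is excluded from versus included in the two halves of an odd-sized data set. The data sets and the `include_median` flag are hypothetical. With 9 values the two IQRs are 6.5 and 5; with the 101 values 1 to 101 they are 51 and 50, a much smaller relative difference:

```python
def median(sorted_vals):
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def iqr(values, include_median):
    """IQR = Q3 - Q1, where Q1/Q3 are medians of the lower/upper halves."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = s[: half + 1], s[half:]      # median placed in both halves
    else:
        lower, upper = s[:half], s[(n + 1) // 2 :]  # median left out (odd n)
    return median(upper) - median(lower)

small = [1, 3, 4, 6, 7, 8, 9, 11, 13]
big = list(range(1, 102))   # the values 1, 2, ..., 101

print(iqr(small, False), iqr(small, True))   # 6.5 vs 5
print(iqr(big, False), iqr(big, True))       # 51 vs 50
```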
not affect
If there is a large number of data values, this choice will probably … the IQR much.
IQR may not
If there is a very small number of data values, we can look directly at those values, and a summary such as the … be necessary
Boxplot
A visual representation of the five-number summary that also helps identify outliers. It can be displayed vertically or horizontally
distribution
You can get a sense of the shape of a … from its boxplot
Variance
The … is the square of the standard deviation: variance = (standard deviation)²
standard deviation
The … is the square root of the variance: standard deviation = √variance
same
Because the units of standard deviation are the … as the units of the data, the interpretation is easier to understand. Therefore, we tend to use the standard deviation. However, there are circumstances when it is easier to use the variance since it does not include a square root
standard deviation
The … is defined as a measure of how much data values deviate from the mean
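A Python sketch of the standard deviation as the square root of the variance, where the variance here is the average squared deviation from the mean. It assumes population data (dividing by N); the sample version divides by n − 1 instead. The data and function names are hypothetical. It also illustrates that the standard deviation is zero when all values are equal and grows with an outlier:

```python
import math

def population_variance(values):
    mu = sum(values) / len(values)                       # population mean
    return sum((x - mu) ** 2 for x in values) / len(values)

def population_std(values):
    # standard deviation = square root of the variance
    return math.sqrt(population_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical data, mean 5
print(population_variance(data))        # 32 / 8 = 4.0
print(population_std(data))             # 2.0
print(population_std([5, 5, 5, 5]))     # 0.0: all values identical
print(population_std(data + [40]))      # much larger with one outlier
```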
negative
The value of the standard deviation is never…. It is zero only when all of the data values are exactly the same
larger
For the standard deviation, … values indicate greater variation
increase
The standard deviation can … dramatically with one or more outliers
the same as
The units of the standard deviation (such as minutes, feet, pounds) are …. the units of the original data values
population or a sample
For variance and standard deviation, we use different symbols and we use different formulas, depending on whether the data set is from a ….
Population variance
σ² (σ is the lower-case Greek letter “sigma”; we read this “sigma squared”)
Population standard deviation
σ (σ is the lower-case Greek letter “sigma”)
Sample variance
s² (we read this “s squared”)
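A Python comparison of the population and sample formulas using the standard-library `statistics` module: `pvariance`/`pstdev` give σ² and σ (dividing by N), while `variance`/`stdev` give s² and s (dividing by n − 1). The data are hypothetical:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data; squared deviations from the mean sum to 32

sigma_sq = statistics.pvariance(data)  # σ²: divides by N, value 32 / 8 = 4
sigma = statistics.pstdev(data)        # σ: square root of σ², value 2
s_sq = statistics.variance(data)       # s²: divides by n - 1, value 32 / 7 (about 4.571)
s = statistics.stdev(data)             # s: square root of s² (about 2.138)

print(sigma_sq, sigma, s_sq, s)
```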