What do measures of variability describe in descriptive statistics?
The spread or dispersion of the data.
Name the three most common measures of variability.
The range, the standard deviation (and the variance), and the coefficient of variation.
How is the range calculated?
It is the difference between the highest and lowest value in a set of data (Range = Highest value - Lowest value).
What is a main disadvantage of using the range as a measure of variability?
It is extremely sensitive to outliers and is not a resistant measure of variability since it depends only on the maximum and minimum observations.
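As a rough illustration of the range formula and its sensitivity to outliers, here is a minimal Python sketch. The data values and the helper name `value_range` are hypothetical, chosen only for this example.

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

scores = [4, 5, 6, 7, 8]        # hypothetical data
with_outlier = scores + [40]    # the same data plus one extreme value

print(value_range(scores))        # 4
print(value_range(with_outlier))  # 36
```

A single extreme value changes the range from 4 to 36, even though the other five values are unchanged.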
What does standard deviation measure?
It measures the typical distance of data values from the mean, taking every value in the distribution into account.
What symbols are used to denote the population standard deviation and the sample standard deviation?
The population standard deviation is denoted by σ (sigma), and the sample standard deviation is denoted by s.
How does a data set with a large standard deviation differ from one with a small standard deviation?
A data set with a large standard deviation or variance is usually more spread about the mean, while one with a small standard deviation or variance is usually more clustered about the mean.
What is the variance, and how is it related to the standard deviation?
The variance is the average of the squared deviations of values from the mean, and it is the square of the standard deviation.
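The relationship between variance and standard deviation can be sketched in Python as follows. The data set is hypothetical. Note one assumption that goes beyond the card: the population variance divides the squared deviations by n, while the sample variance conventionally divides by n - 1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data; the mean is 5

mean = statistics.mean(data)
squared_devs = [(x - mean) ** 2 for x in data]

# Population variance: average of the squared deviations (divide by n).
pop_var = sum(squared_devs) / len(data)
# Sample variance: divide by n - 1 instead of n.
samp_var = sum(squared_devs) / (len(data) - 1)

# The standard deviation is the square root of the variance.
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print(pop_var, pop_sd)   # 4.0 2.0
```

The standard library functions `statistics.pvariance`, `statistics.variance`, `statistics.pstdev`, and `statistics.stdev` compute the same four quantities.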
List some important properties of standard deviation.
It measures variability about the mean, should be used only with the mean as the center, is always non-negative (s ≥ 0), increases with larger variability, has the same units as the original observations, and is sensitive to outliers.
How is standard deviation used to identify "unusual" or "extreme" data values?
A data value that is more than 2 standard deviations above or below the mean is considered unusual or extreme.
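The 2-standard-deviation rule can be sketched as a small filter. This is only an illustration: the function name `unusual_values`, the data, and the choice of the sample standard deviation are assumptions, not part of the card.

```python
import statistics

def unusual_values(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (assumed here)
    return [x for x in values if abs(x - mean) > threshold * sd]

data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 30]   # hypothetical data
print(unusual_values(data))   # [30]
```

Here the mean is 13.2 and the sample standard deviation is about 5.98, so only 30 lies more than 2 standard deviations from the mean.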
What is the Coefficient of Variation (C.V.), and why is it a useful measure of dispersion?
The Coefficient of Variation is a relative measure of dispersion, considering the size of the standard deviation relative to the mean. It is useful because it has no unit and measures variability as a percentage, allowing comparison of variability between groups with different means or variables with different units.
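One way to see why the C.V. allows comparison across units is the sketch below. It assumes the usual formula C.V. = (standard deviation / mean) × 100 and uses the sample standard deviation; the height and weight data are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """C.V. as a percentage: standard deviation relative to the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

heights_cm = [160, 165, 170, 175, 180]   # hypothetical data, centimetres
weights_kg = [50, 60, 70, 80, 90]        # hypothetical data, kilograms

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"heights: {cv_heights:.2f}%")   # heights: 4.65%
print(f"weights: {cv_weights:.2f}%")   # weights: 22.59%
```

Although the two variables are measured in different units, the percentages can be compared directly: the weights vary more, relative to their mean, than the heights do.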
What is the main purpose of Descriptive Statistics?
To describe the main characteristics or features of a dataset, such as its mean, median, mode, or standard deviation.
What is the difference between a population parameter and a descriptive statistic (sample statistic)?
A population parameter quantitatively describes a characteristic of data from a population (e.g., population mean μ), while a descriptive statistic quantitatively describes a characteristic of data from a sample (e.g., sample mean x̄).
What do measures of central tendency describe?
They describe the central value, location, or a typical value of a distribution.
How is the arithmetic mean calculated?
The arithmetic mean is calculated by summing all values in a dataset and dividing by the total number of observations (sum of all values / number of observations).
How are outliers related to the arithmetic mean?
The arithmetic mean is sensitive to extreme values (outliers), meaning outliers can significantly affect its value.
How is the median calculated for a simple distribution?
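To find the median, first place all values in order from smallest to largest. Then, locate the value at the median position, which is (n+1)/2, where n is the number of observations. When n is even, the position falls halfway between two values, and the median is the average of those two values.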
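The median procedure can be sketched in Python as follows. The function name and data are hypothetical; for an even number of observations the sketch averages the two middle values, which is the standard convention.

```python
def median(values):
    """Median via the (n + 1) / 2 position in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 is a whole number (index n // 2 from zero).
        return ordered[mid]
    # Even n: the position falls between two values, so average them.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 3, 9, 1, 5]))       # 5
print(median([7, 3, 9, 1, 5, 11]))   # 6.0
```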
How do outliers affect the median?
The median is not significantly affected by extreme values or outliers, making it useful for skewed distributions.
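To compare how the mean and the median react to an outlier, consider this small sketch with hypothetical data.

```python
import statistics

incomes = [30, 32, 35, 38, 40]   # hypothetical data
with_outlier = incomes + [400]   # one extreme value added

print(statistics.mean(incomes), statistics.median(incomes))   # 35 35
print(statistics.mean(with_outlier))     # about 95.8
print(statistics.median(with_outlier))   # 36.5
```

The outlier pulls the mean from 35 to about 95.8, while the median only moves from 35 to 36.5.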
What is the mode of a distribution?
The mode is the value that occurs most frequently in a distribution. A distribution can have no mode, one mode, or several modes.
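The three cases (no mode, one mode, several modes) can be sketched as below. The function name `modes` is hypothetical, and the sketch assumes one common convention: when every value occurs equally often, the distribution is treated as having no mode. It also assumes a non-empty list.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means 'no mode'."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if all distinct values tie, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3]))      # [2]
print(modes([1, 1, 2, 2, 3]))   # [1, 2]
print(modes([1, 2, 3]))         # []
```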
When is the trimmed mean used?
The trimmed mean is used when there are outliers in the data; it removes a percentage of the smallest and largest values before calculating the mean to reduce the influence of these extremes.
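A trimmed mean can be sketched as follows. The function name, the data, and the 10% trimming proportion are hypothetical; the sketch removes the same number of values from each end and assumes the proportion is below 0.5.

```python
def trimmed_mean(values, proportion):
    """Mean after removing `proportion` of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # number of values cut from each end
    trimmed = ordered[k:len(ordered) - k]
    return sum(trimmed) / len(trimmed)

data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 98]   # hypothetical data with one outlier

print(sum(data) / len(data))     # 16.0 (ordinary mean)
print(trimmed_mean(data, 0.10))  # 7.5  (smallest and largest value removed)
```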
What is the purpose of a weighted average?
A weighted average is used when some numbers in a dataset need to be assigned more importance or 'weight' than others. It's calculated as the sum of (value × weight) divided by the sum of weights.
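The weighted-average formula can be sketched in Python like this. The scores and weights are hypothetical.

```python
def weighted_average(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    total = sum(v * w for v, w in zip(values, weights))
    return total / sum(weights)

scores = [80, 90, 70]   # hypothetical values
weights = [2, 3, 5]     # hypothetical weights

print(weighted_average(scores, weights))   # 78.0
```

The unweighted mean of the same scores is 80; the result drops to 78 because the lowest score carries the largest weight.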
Which measure of central tendency is most appropriate for nominal data?
The mode is generally the most appropriate measure of central tendency for nominal data.
Which measure(s) of central tendency are resistant to outliers?
The mode, median, and trimmed mean are resistant to extreme values because they are not affected much by outliers in a data set.
How do the mean, median, and mode relate in a left-skewed distribution?
In a left-skewed distribution, the mean is typically less than the median, which is less than the mode (Mean < Median < Mode).
How do the mean, median, and mode relate in a right-skewed distribution?
In a right-skewed distribution, the mean is typically greater than the median, which is greater than the mode (Mean > Median > Mode).
How do the mean, median, and mode relate in a symmetric distribution?
In a symmetric, single-peaked distribution, the mean, median, and mode are approximately equal (Mean ≈ Median ≈ Mode).
What are the primary objectives when summarizing and presenting quantitative data?
To summarize the distribution of a quantitative variable with frequency tables (simple, relative, grouped) and to make, describe, and compare histograms of quantitative data distributions.
What is a Simple Distribution in the context of quantitative variables?
A list of data values placed in ascending order.
What is a Simple Frequency Distribution?
It shows all the values a variable can take and the number of times (frequency, f) each value appears in the data set.
What is a Grouped Frequency Distribution?
It shows categories of values that a variable can take and the number of times (frequency, f) a value from the data set appears in a given category.
In frequency distributions, what does 'x' always represent?
The variable being measured.
In frequency distributions, what does 'n' always represent?
The sample size.
What are the steps to construct a Simple Frequency Distribution from raw data?
1. List all the unique values of the variable (x). 2. Count the number of individuals who gave each value of x (frequency, f).
How do you verify the sample size in a simple frequency distribution?
By summing all the frequencies (f) in the distribution.
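Building a simple frequency distribution and checking the sample size might look like the sketch below. The raw data are hypothetical.

```python
from collections import Counter

raw = [2, 3, 3, 1, 2, 3, 4, 2, 3, 1]   # hypothetical raw data

# Count how often each unique value of x occurs (frequency, f).
freq = Counter(raw)

print("x  f")
for x in sorted(freq):
    print(x, "", freq[x])

# Summing the frequencies recovers the sample size n.
n = sum(freq.values())
print("n =", n)   # n = 10
```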
What is the primary purpose of organizing data into a grouped frequency distribution?
To make the distribution shorter and more readable by grouping observations (data) into classes.
In a grouped frequency distribution, what do percentages (%) communicate?
The frequency of each class in percentage form relative to the total sample size.
What is Cumulative Frequency (cf) in a grouped frequency distribution?
The sum of the frequencies of a given class and all classes that came before it.
What is Cumulative Percentage (c%) in a grouped frequency distribution?
Similar to cumulative frequency, but instead of adding up frequencies, percentages are added up cumulatively.
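The percentage, cumulative frequency, and cumulative percentage columns can be sketched together. The classes and frequencies below are hypothetical.

```python
from itertools import accumulate

classes = ["0-9", "10-19", "20-29", "30-39"]   # hypothetical class intervals
freqs = [2, 8, 6, 4]                           # hypothetical frequencies (f)

n = sum(freqs)                                 # sample size
percents = [f * 100 / n for f in freqs]        # % of each class
cum_freqs = list(accumulate(freqs))            # cf: running total of f
cum_percents = [cf * 100 / n for cf in cum_freqs]   # c%

print("class   f   %     cf  c%")
for row in zip(classes, freqs, percents, cum_freqs, cum_percents):
    print(*row)
```

The last cumulative frequency equals n (20 here) and the last cumulative percentage equals 100, which is a quick check that the table is complete.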
What do midpoints (m) represent in a grouped frequency table?
The 'middle value' of each class interval.
What is the formula for calculating the midpoint (m) of a class in a grouped frequency distribution?
Midpoint (m) = (Lower limit + Upper limit) / 2
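As a quick check of the midpoint formula, here is a minimal sketch with hypothetical class limits.

```python
def midpoint(lower_limit, upper_limit):
    """Midpoint (m) = (lower limit + upper limit) / 2."""
    return (lower_limit + upper_limit) / 2

print(midpoint(10, 19))   # 14.5
print(midpoint(20, 29))   # 24.5
```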
What is a histogram?
A special type of bar graph used to display the distribution of quantitative data.
What is a key characteristic of the bars in a histogram?
There are no spaces between the bars of a histogram.
What do the heights of the bars in a histogram represent?
The frequencies or relative frequencies of values in each interval.
What aspects should be described when interpreting the overall pattern of a histogram?
Its shape, center, variability/spread, and any outliers.
What is an outlier in the context of a histogram or data distribution?
An individual value that falls outside the overall pattern of the data.
Describe the characteristics of a mound shape/symmetric distribution.
It is single-peaked, with much of the data clustered around one clear center, and observations decrease as one moves away from the center in either direction. Both sides are roughly the same if folded vertically down the middle.
What does skewness refer to in a histogram?
Skewness refers to asymmetry in a histogram: one tail is stretched out longer than the other, and the direction of skewness is the side of the longer tail.
What is a Right (Positively) skewed distribution?
While most of the data are clustered around a low value, a number of cases stretch out into the higher (right) values.
What is a Left (Negatively) skewed distribution?
While most of the data are clustered around a large value, a number of cases stretch down into the lower (left) values.
What characterizes a bimodal distribution?
It has two distinct peaks, meaning two classes with the largest frequencies are separated by at least one class, often indicating the presence of two separate populations within the data.
What are line graphs used to display?
Measurements of the same variable recorded at regular intervals over a period of time, useful for showing how data change over time.
What is another name for line graphs when they display data over time?
Time-series graphs.
In the example of student commuting time, what is the variable of interest?
Average commuting time (in minutes from home to school).
In the example of student commuting time, what is the sample size (n)?
35 students.
What is the primary purpose of organizing data?
To reveal the distribution of the data, that is, to pinpoint where the data values tend to concentrate.
What does the distribution of a variable tell us?
It tells us what values the variable takes and how often it takes these values.
What is a frequency distribution table for a categorical variable?
It lists the categories and gives either the count, relative frequency, or percent of individuals who fall in each category.
What are cross-tabulations?
Tables that display the distribution of data across two categorical variables.
How is relative frequency calculated?
Relative frequency is calculated as the frequency (f) divided by the sum of all frequencies (Σf), which is equal to the sample size (n).
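The relative-frequency calculation can be sketched as follows. The category labels and counts are hypothetical.

```python
freqs = {"A": 5, "B": 3, "C": 2}   # hypothetical category frequencies (f)

n = sum(freqs.values())            # sum of all frequencies = sample size
rel_freqs = {category: f / n for category, f in freqs.items()}

for category, rel in rel_freqs.items():
    print(category, rel)
```

The relative frequencies of all categories add up to 1 (apart from floating-point rounding).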
What are descriptive statistics?
Numbers that describe certain characteristics of a sample, highlighting salient features of a data distribution.
How do you calculate a proportion?
By dividing the portion you are interested in (frequency) by the whole (sample size).
How do you convert a proportion to a percentage?
Multiply the proportion by 100.
From the social media preference example with n=50, what proportion of respondents prefer Instagram if 9 people preferred it?
0.18 (9 divided by 50).
From the social media preference example, what is the ratio of respondents who preferred Snapchat to those who prefer Twitter if 8 preferred Snapchat and 4 preferred Twitter?
2 to 1 (8 divided by 4).
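The proportion, percentage, and ratio from the social media example can be reproduced with a short sketch. The counts (9 of 50 for Instagram, 8 for Snapchat, 4 for Twitter) come from the cards above; the function names are hypothetical.

```python
from fractions import Fraction

def proportion(frequency, n):
    """Part of interest divided by the whole."""
    return frequency / n

def percentage(frequency, n):
    """Proportion expressed as a percentage."""
    return frequency * 100 / n

print(proportion(9, 50))   # 0.18
print(percentage(9, 50))   # 18.0

# Ratio of Snapchat to Twitter, reduced to lowest terms.
ratio = Fraction(8, 4)
print(f"{ratio.numerator} to {ratio.denominator}")   # 2 to 1
```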
What is a pie chart used for?
To show the distribution of a categorical variable as a 'pie' whose slices are sized by the percentage for the categories, emphasizing each category's relation to the whole.
When should pie charts be avoided?
If there are too many categories, if the percentages do not sum to 100%, or to display distribution across two categorical variables.
What is a bar chart used for?
To represent each category of a variable as a bar, where bar heights show category counts or percentages.
What is a cluster bar chart?
A bar chart that displays and compares two or more groups along the same variable.
What is a segment bar chart?
A chart that displays the distribution of a categorical variable as portions (segments) of a rectangle, with the area of each segment proportional to the percentage of individuals in the corresponding category.
When is a bar chart generally preferred over a pie chart?
When comparing the magnitude of differences between categories, when there is a larger number of categories, or when emphasizing the distribution of data.
When is a pie chart generally preferred over a bar chart?
When emphasizing the relationship of parts to a whole, with a smaller number of categories, or for a simple comparison of proportions or percentages, provided all categories' percentages sum to 100%.
What are the key elements of a good graph?
Title, plot, source, legend, and axis titles.