A set of question-and-answer flashcards covering key topics from the video lecture, including frequency distributions, histograms vs. bar charts, the normal distribution, skewness, floor/ceiling effects, stem-and-leaf plots, scatter plots, and common data-reporting pitfalls.
What is the main purpose of a frequency distribution in data analysis?
To show how often each value or category occurs, often via a frequency table or graph (bar chart or histogram).
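As a minimal sketch of the idea, the snippet below builds a frequency table from a short list of scores. The scores and the helper name `frequency_table` are invented for illustration and do not come from the lecture.

```python
from collections import Counter

# Hypothetical quiz scores, invented for illustration.
scores = [3, 5, 4, 5, 2, 5, 4, 3, 5, 4]

def frequency_table(values):
    """Return (value, count) pairs sorted by value."""
    counts = Counter(values)
    return sorted(counts.items())

table = frequency_table(scores)
for value, count in table:
    print(f"{value}: {count}")
```

The counts always sum to the number of observations, which is a quick sanity check on any frequency table.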
In a frequency histogram, what goes on the x-axis and what goes on the y-axis?
X-axis shows the variable values or interval bins; Y-axis shows the frequencies (counts).
Which variable types typically use bar charts, and which use histograms?
Nominal and ordinal variables use bar charts; scale/continuous variables use histograms.
What is the key difference between a bar chart and a histogram?
Bar charts have gaps between bars; histograms have touching bars because they represent continuous intervals.
What is a grouped frequency table and why is it used?
A frequency table that bins values into equal-width intervals to summarize data with many distinct values.
Why is precise bin width and boundary labeling important in grouped frequency tables?
To avoid double counting and ensure accurate interval assignment, especially when decimal cutoffs determine bin boundaries.
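One common way to avoid double counting is to treat each bin as a half-open interval that includes its lower boundary and excludes its upper boundary. The sketch below assumes that convention. The data and the function name `grouped_frequency_table` are hypothetical.

```python
import math

# Hypothetical reaction times in seconds, invented for illustration.
times = [0.5, 1.0, 1.2, 1.9, 2.0, 2.0, 2.7, 3.0, 3.4, 3.9]

def grouped_frequency_table(values, start, width, n_bins):
    """Count values in half-open intervals [lower, upper)."""
    counts = [0] * n_bins
    for v in values:
        # A value exactly on a boundary falls into the bin that starts there.
        index = math.floor((v - start) / width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_bins)
    ]

grouped = grouped_frequency_table(times, start=0.0, width=1.0, n_bins=4)
for lower, upper, count in grouped:
    print(f"{lower:.1f} to <{upper:.1f}: {count}")
```

Because a boundary value such as 2.0 belongs to exactly one bin, each observation is counted once.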
Describe a normal distribution.
Bell-shaped, symmetric, and unimodal; the mean, median, and mode are equal; the tails are symmetric and approach the x-axis asymptotically without ever touching it.
What does unimodal mean in a distribution?
Having a single peak (one mode).
In a normal distribution, how are mean, median, and mode related?
They are equal (mean = median = mode).
What is the standard deviation used to measure in a normal distribution?
The dispersion of scores around the mean, i.e., how spread out the data are; it is the unit in which the 68-95-99.7 rule is expressed.
State the 68-95-99.7 rule for a normal distribution.
Approximately 68% of observations lie within 1 SD of the mean; about 95% within 2 SD; about 99.7% within 3 SD.
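The three percentages can be checked numerically. For a normal distribution, the share of observations within k standard deviations of the mean equals erf(k / sqrt(2)). The function name `share_within` is an illustrative choice, not something from the lecture.

```python
import math

def share_within(k):
    """Share of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The printed values round to roughly 0.68, 0.95, and 0.997, which is where the rule gets its name.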
What happens to the mean and distribution when an extreme high value is added (an outlier)?
The mean increases and the distribution becomes positively skewed (tail to the right), potentially distorting interpretation.
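A small sketch of this effect, using invented incomes in thousands of dollars: adding one extreme high value pulls the mean far upward while the median barely moves. A mean well above the median is a typical sign of positive skew.

```python
from statistics import mean, median

# Hypothetical incomes in thousands of dollars, invented for illustration.
incomes = [30, 32, 35, 36, 37]
with_outlier = incomes + [400]

print(mean(incomes), median(incomes))            # before the outlier
print(mean(with_outlier), median(with_outlier))  # after the outlier
```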
What does skew tell you about a distribution?
Skewness describes asymmetry; negative skew has a tail to the left, positive skew has a tail to the right.
What are floor effects and ceiling effects in measurement?
Floor effect: scores cannot go lower than the scale's minimum, so they pile up at the bottom; ceiling effect: scores cannot go higher than the maximum, so they pile up at the top (e.g., GPA 0–4.0, ACT 1–36).
What is a stem-and-leaf plot?
A display for small data sets where stems are leading digits and leaves are trailing digits, revealing the distribution shape.
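The following sketch builds a stem-and-leaf plot, assuming non-negative whole-number scores where the stem is everything but the last digit and the leaf is the last digit. The scores and the function name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical exam scores, invented for illustration.
scores = [62, 75, 71, 88, 79, 93, 85, 71, 68]
plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

Read sideways, the rows of leaves form a rough histogram, which is how the plot reveals the shape of the distribution.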
What is a frequency polygon?
A line graph that connects the midpoints of histogram bins, offering an alternative view of the distribution.
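As a rough illustration, the points of a frequency polygon can be computed by pairing each bin's midpoint with its count. The bin edges, counts, and the function name `polygon_points` below are assumptions made for the example.

```python
def polygon_points(bin_edges, counts):
    """Pair each bin's midpoint with its frequency."""
    midpoints = [(lo + hi) / 2 for lo, hi in zip(bin_edges, bin_edges[1:])]
    return list(zip(midpoints, counts))

# Hypothetical histogram: four bins of width 10.
edges = [0, 10, 20, 30, 40]
counts = [2, 5, 8, 3]
points = polygon_points(edges, counts)
print(points)
```

Connecting these points in order with straight line segments gives the polygon.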
What is a scatter plot and what does it show?
A plot of two scale (continuous) variables, one on each axis, that reveals their relationship; it can show linear or nonlinear trends.
What is a range frame in plotting and why is it used?
Drawing the axes to span only the range of the data (for example, starting an axis at the lowest observed value rather than at zero) to emphasize variation and reduce whitespace for easier interpretation.
How do you identify a positive vs negative relationship in a scatter plot?
Positive: as x increases, y increases. Negative: as x increases, y decreases.
What is a linear relationship in scatter plots?
Data points roughly fall along a straight line, indicating a constant rate of change between variables.
What is a nonlinear relationship in scatter plots?
Data points follow a curved pattern (e.g., a parabola) rather than a straight line, indicating that the rate of change between the variables is not constant.
What is interpolation in data reporting, and why can it be problematic?
Estimating values between observed data points by assuming they follow the trend that connects those points; it can mislead if the omitted values in between actually deviate from that trend.
What is extrapolation in data reporting, and why can it be problematic?
Predicting beyond the observed data range by extending trends; risky because future behavior may differ from the past.
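A minimal sketch of the risk, under invented assumptions: the true relationship is y = x squared, but only x = 1 to 5 is observed and a straight line is fitted by ordinary least squares. The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Observed range: x = 1..5, generated from the invented rule y = x ** 2.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
slope, intercept = fit_line(xs, ys)

predicted = slope * 10 + intercept  # extrapolating to x = 10
actual = 10 ** 2
print(predicted, actual)
```

Inside the observed range the line is a tolerable summary, but at x = 10 it predicts about half the true value, because the trend it extends does not hold outside the data.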
What is false face validity in data reporting?
Using data that appears related to an assertion but doesn't truly justify the conclusion; can mislead if the match is superficial.
What is a sneaky sample?
A non-representative sample biased toward a group likely to give favorable responses, leading to distorted conclusions.
What is a biased scale in surveys?
A scale that nudges responses toward a desired direction by omitting or weighting options to influence answers.
What is outright lying in data reporting?
Deliberate falsification or fabrication of data; it is unethical and often detectable under scrutiny.
Why is a continuous variable requirement important for scatter plots?
Scatter plots require two scale (continuous) variables to meaningfully assess relationships.
How is a bar chart different from a histogram in terms of data categories?
Bar charts depict discrete categories with gaps between bars; histograms depict continuous intervals with touching bars.