Flashcards covering key concepts from Sections 1.1-4.6 of the lecture notes on Foundations of Statistics, Variables & Data, Organizing & Displaying Data, Descriptive Statistics, and Probability.
Population
The entire group of individuals or objects under consideration in a study.
Sample
A subset of the population from which data is collected.
Descriptive statistics
Methods for organizing, summarizing, and displaying data through charts, tables, and numerical summaries.
Inferential statistics
Methods that involve using sample data to make generalizations or draw conclusions about a population.
Observational study
A study where researchers observe individuals and measure variables of interest without influencing the responses or assigning treatments.
Designed experiment
A study where researchers actively impose some treatment on one or more groups to control variables and observe effects.
Qualitative variable
A variable that categorizes or describes an attribute, often represented by non-numerical labels or names.
Quantitative variable
A variable that takes on numerical values, representing counts or measurements.
Discrete data
Quantitative data whose values are finite or countable, often resulting from counting processes.
Continuous data
Quantitative data whose values can take on any value within a given interval, often resulting from measuring processes.
Frequency distribution
A table that lists all categories or classes of data and the number of occurrences (frequencies) in each category.
Relative frequency distribution
A table that lists all categories or classes of data along with the proportion or percentage of observations in each category.
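As a minimal sketch of the two tables described above, the snippet below tallies a small, made-up list of categories (the blood-type data is purely illustrative) and divides each count by the total to get relative frequencies.

```python
from collections import Counter

# Hypothetical sample: blood types recorded for 10 people.
data = ["A", "O", "B", "O", "A", "O", "AB", "O", "A", "B"]

# Frequency distribution: number of occurrences of each category.
freq = Counter(data)

# Relative frequency distribution: each count divided by the total.
n = len(data)
rel_freq = {category: count / n for category, count in freq.items()}

for category in sorted(freq):
    print(f"{category:>2}  {freq[category]:>2}  {rel_freq[category]:.2f}")
```

The relative frequencies always sum to 1 (or 100%), which is a quick check on any such table.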
Bar chart
A graphical display used for qualitative data, where bars represent frequencies or relative frequencies of categories.
Pie chart
A circular graph used for qualitative data, divided into sectors proportional to the relative frequencies of categories.
Histogram
A graph used for quantitative data, particularly continuous data, where bars represent frequency or relative frequency of data within intervals.
Stem-and-leaf plot
A graphical display of quantitative data that separates each value into a 'stem' (first digit(s)) and a 'leaf' (last digit).
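A stem-and-leaf plot can be built by hand, but a short sketch may make the stem/leaf split concrete. The two-digit data below is invented for illustration, and the stem is assumed to be the tens digit and the leaf the ones digit.

```python
from collections import defaultdict

# Hypothetical two-digit data values.
data = [12, 15, 21, 23, 23, 30, 34, 38, 41]

# Split each value into stem (tens digit) and leaf (ones digit).
stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

# One text row per stem, leaves in ascending order.
lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```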
Symmetric distribution
A distribution shape where the left side of the graph is roughly a mirror image of the right side.
Skewed left distribution
A distribution shape where the tail extends further to the left, indicating that most data values are clustered on the right.
Skewed right distribution
A distribution shape where the tail extends further to the right, indicating that most data values are clustered on the left.
Modality
The number of peaks in a distribution (e.g., unimodal for one peak, bimodal for two).
Misleading graphs
Graphs that distort the visual representation of data, such as by truncating axes or using uneven bar widths, to create a false impression.
Mean
The arithmetic average of a data set, calculated by summing all values and dividing by the number of values.
Median
The middle value of a data set when it is arranged in ascending or descending order. If there's an even number of values, it's the average of the two middle numbers.
Mode
The value that appears most frequently in a data set.
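The three measures of center above can be compared on one small data set. This is only a sketch using Python's standard `statistics` module; the data values are made up, and `multimode` is used because a data set can have more than one mode.

```python
import statistics

# Hypothetical data set with an even number of values.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)        # sum of values / number of values = 30 / 6
median = statistics.median(data)    # average of the two middle values, 3 and 5
modes = statistics.multimode(data)  # list of the most frequent value(s)

print(mean, median, modes)
```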
Range
The difference between the maximum and minimum values in a data set, representing the overall spread.
Interquartile Range (IQR)
The range of the middle 50% of the data, calculated as the difference between the third quartile (Q3) and the first quartile (Q1).
Variance
The average of the squared deviations from the mean, indicating the spread of data points. For a sample, the sum of squared deviations is divided by n - 1 rather than n.
Standard deviation
The square root of the variance, representing the typical distance of data points from the mean.
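To illustrate variance and standard deviation, the sketch below computes both the population versions (dividing by n) and the sample variance (dividing by n - 1) with the standard `statistics` module. The data values are invented for the example.

```python
import statistics

# Hypothetical data set; its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: sum of squared deviations (32) divided by n (8).
pop_var = statistics.pvariance(data)
# Population standard deviation: square root of the population variance.
pop_sd = statistics.pstdev(data)
# Sample variance: the same sum of squared deviations divided by n - 1 (7).
sample_var = statistics.variance(data)

print(pop_var, pop_sd, sample_var)
```

Which divisor applies depends on whether the data are treated as a whole population or as a sample from one.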
Five-number summary
A set of five descriptive statistics for a data set: minimum value, first quartile (Q1), median, third quartile (Q3), and maximum value.
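The range, IQR, and five-number summary can be computed together, as in the sketch below. Note the assumption: quartile conventions differ between textbooks and software, and this example uses the `inclusive` method of `statistics.quantiles`, which may not match the rule used in the lecture notes. The data values are hypothetical.

```python
import statistics

# Hypothetical data set of nine values.
data = sorted([1, 3, 4, 6, 7, 9, 12, 15, 20])

# Quartiles under the "inclusive" interpolation convention (an assumption).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (data[0], q1, median, q3, data[-1])
data_range = data[-1] - data[0]   # maximum minus minimum
iqr = q3 - q1                     # spread of the middle 50%

print(five_number_summary, data_range, iqr)
```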
Boxplot
A graphical display of the five-number summary, showing the distribution's center, spread, and potential outliers.
Z-score
A standardized score that indicates how many standard deviations a data point is from the mean of its distribution.
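The definition corresponds to the formula z = (x - mean) / standard deviation. A minimal sketch follows; the function name `z_score` and the exam-score numbers are illustrative only.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev

# Hypothetical exam: mean 70, standard deviation 10.
z_above = z_score(85, 70, 10)   # score above the mean gives a positive z
z_below = z_score(55, 70, 10)   # score below the mean gives a negative z
print(z_above, z_below)
```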
Probability (rules)
The likelihood of an event occurring, always a value between 0 (impossible) and 1 (certain).
Addition Rule (General)
P(A or B) = P(A) + P(B) - P(A and B), used to find the probability of at least one of two events occurring.
Addition Rule (Mutually Exclusive)
P(A or B) = P(A) + P(B), used when two events cannot occur at the same time.
Complement Rule
P(not A) = 1 - P(A), used to find the probability that an event does not occur.
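The addition and complement rules can be checked against direct counting. The sketch below assumes a single roll of a fair six-sided die (an example chosen here, not taken from the notes) and uses exact fractions so the two sides of each rule can be compared without rounding.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative assumption).
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes) with equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # roll is even
B = {5, 6}      # roll is greater than 4 (overlaps A at 6)
C = {1}         # roll is 1 (mutually exclusive with A)

# General addition rule: subtract the overlap so it is not counted twice.
general = prob(A) + prob(B) - prob(A & B)
# Mutually exclusive case: no overlap, so nothing to subtract.
exclusive = prob(A) + prob(C)
# Complement rule.
not_a = 1 - prob(A)

print(general, exclusive, not_a)
```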
Conditional Probability
The probability of an event occurring given that another event has already occurred, calculated as P(B|A) = P(A and B) / P(A), provided P(A) > 0.
Multiplication Rule
P(A and B) = P(A) * P(B|A), used to find the probability that two events both occur.
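As a worked sketch of the multiplication rule and conditional probability, consider drawing two cards without replacement from a standard 52-card deck (an example assumed here for illustration), with A = "first card is an ace" and B = "second card is an ace".

```python
from fractions import Fraction

# A: first card is an ace. B: second card is an ace (no replacement).
p_a = Fraction(4, 52)          # 4 aces among 52 cards
p_b_given_a = Fraction(3, 51)  # 3 aces left among the remaining 51 cards

# Multiplication rule: P(A and B) = P(A) * P(B|A).
p_a_and_b = p_a * p_b_given_a

# Conditional probability recovered from the joint: P(B|A) = P(A and B) / P(A).
recovered = p_a_and_b / p_a

print(p_a_and_b, recovered)
```

The two formulas are rearrangements of each other, which is why the recovered value matches the conditional probability that was used as input.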
Joint probability
The probability of two or more events occurring together, often found in the cells of a contingency table.
Marginal probability
The probability of a single event occurring, found by summing probabilities across rows or columns in a contingency table.
Independence (of events)
Two events A and B are independent if the occurrence of one does not affect the probability of the other; that is, P(A|B) = P(A), or equivalently P(A and B) = P(A) * P(B).
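Joint probabilities, marginal probabilities, and the independence check can all be read off a contingency table. The sketch below uses an invented 2x2 table of counts (the row and column labels are hypothetical) and tests whether every joint probability equals the product of its marginals.

```python
from fractions import Fraction

# Hypothetical contingency table of counts:
# rows = exercise habit, columns = sleep quality.
counts = {
    ("exercises", "good"): 20, ("exercises", "poor"): 30,
    ("sedentary", "good"): 40, ("sedentary", "poor"): 60,
}
total = sum(counts.values())

# Joint probabilities: each cell count divided by the grand total.
joint = {cell: Fraction(count, total) for cell, count in counts.items()}

def marginal_row(row):
    """Marginal probability of a row: sum of joint probabilities across its columns."""
    return sum(p for (r, _), p in joint.items() if r == row)

def marginal_col(col):
    """Marginal probability of a column: sum of joint probabilities down its rows."""
    return sum(p for (_, c), p in joint.items() if c == col)

# Independence holds if P(A and B) = P(A) * P(B) in every cell.
independent = all(
    joint[(r, c)] == marginal_row(r) * marginal_col(c) for (r, c) in joint
)

print(marginal_row("exercises"), marginal_col("good"), independent)
```

With real sample data the products rarely match exactly, so in practice independence is usually assessed with a statistical test rather than exact equality.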