Individuals
The subjects or objects in a statistical study being analyzed or measured, such as people, animals, or items.
Variables
Any characteristics or properties that can vary among individuals in a study.
Categorical variables
are variables that represent distinct categories or groups, such as gender, race, or yes/no responses, rather than numeric values.
Quantitative variables
are variables that represent numerical values, allowing for mathematical operations and measurements, such as height, weight, or age.
Discrete variables
are a type of quantitative variable that can take on only a countable set of values, typically whole-number counts, such as the number of children in a family or the number of cars in a parking lot.
Continuous variables
are a type of quantitative variable that can take on an infinite number of values within a given range, such as temperature or time.
Relative Count
is a measure that expresses the frequency of a category relative to the total number of observations, often used to compare proportions (e.g., 2 Federalists out of 10 is 20%).
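As a rough illustration, relative counts can be computed by dividing each category's count by the total number of observations. The data below is hypothetical and only mirrors the "2 out of 10" example.

```python
from collections import Counter

# Hypothetical sample: affiliation recorded for 10 individuals.
affiliations = ["Federalist"] * 2 + ["Anti-Federalist"] * 5 + ["Undecided"] * 3

counts = Counter(affiliations)
total = sum(counts.values())

# Relative count = category count / total number of observations.
relative = {category: count / total for category, count in counts.items()}

for category, share in relative.items():
    print(f"{category}: {counts[category]} of {total} = {share:.0%}")
```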
Two way tables
are a tabular method used to display the relationship between two categorical variables, allowing for simultaneous analysis of their frequencies and patterns; can be harder to read depending on audience.
Distribution
refers to the way in which values of a variable are spread or arranged, often represented graphically through histograms or density plots.
Bar Graphs
are graphical representations of categorical data to show the frequencies or relative sizes of different categories, making it easy to compare data across categories.
Pie Graphs
are circular charts divided into slices to illustrate numerical proportions, with each slice representing a category's contribution to the total.
Histogram
is a graphical representation of the distribution of quantitative data, often showing the frequency of data points within specified ranges or bins.
Stem and Leaf Plot
is a method of displaying quantitative data, where each data point is split into a stem (the leading digit(s)) and a leaf (the trailing digit), providing a way to visualize the distribution while maintaining the original data.
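A minimal sketch of how a stem-and-leaf plot can be built, assuming non-negative whole-number data and using the last digit as the leaf. The function name `stem_and_leaf` and the scores are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} using the last digit of each value as its leaf."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 85 -> stem 8, leaf 5
        plot[stem].append(leaf)
    return dict(plot)

scores = [72, 85, 91, 68, 77, 85, 93, 79, 64, 88]  # hypothetical test scores
plot = stem_and_leaf(scores)

# Print every stem in order, including any stem that has no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Because every leaf is kept, the original values can be read back off the plot (stem 6 with leaves 4 and 8 gives 64 and 68).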
Back-to-back stemplots
are comparative stem-and-leaf plots that display two related sets of data simultaneously. One set is displayed to the left of a central axis and the other to the right, facilitating direct comparison between the two distributions.
Split Stem and Leaf
is a variation of the stem-and-leaf plot where the stems are split into smaller groups, enabling a more detailed representation of the data distribution while preserving the original values.
Dot plot
is a simple statistical chart that uses dots to represent the frequency of data points along a number line. Each dot corresponds to one data value, making it easy to see the distribution and frequency of the dataset.
Cumulative relative frequency graph (ogive)
is a graph displaying the running total of relative frequencies for a dataset. It shows what proportion of observations fall at or below a particular value, helping to visualize the distribution of data.
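The heights plotted on an ogive can be sketched as a running total of relative frequencies. The bins and counts below are hypothetical.

```python
from itertools import accumulate

# Hypothetical binned data: upper bound of each bin and the count in that bin.
upper_bounds = [10, 20, 30, 40]
counts = [2, 5, 8, 5]

total = sum(counts)
cumulative_counts = list(accumulate(counts))
cumulative_relative = [c / total for c in cumulative_counts]

# Each pair (upper bound, cumulative relative frequency) is one point on the ogive.
for bound, share in zip(upper_bounds, cumulative_relative):
    print(f"at or below {bound}: {share:.0%}")
```

The last point always reaches 100%, since every observation falls at or below the largest upper bound.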
Line Graph
is a type of chart that displays information as a series of data points called 'markers' connected by straight line segments. It is commonly used to visualize trends over time or continuous data.
Describing Distributions: SOCS
is a mnemonic used to summarize the key features of a distribution: Shape, Outliers, Center, and Spread. This approach helps in effectively communicating the essential characteristics of a dataset.
Unimodal
describes a distribution with a single peak or mode, indicating the most frequent value in the dataset.
Bimodal
a distribution with two distinct modes or peaks, indicating two different groups or clusters within the data.
Uniform
describes a distribution where all values have the same frequency, resulting in a flat, even appearance. This indicates that every outcome in the dataset is equally likely.
Symmetric
a distribution that is identical on both sides of its central point, where the left half mirrors the right half, indicating equal frequencies of values around the mean.
Left skewed
a distribution that has a longer tail on the left side, indicating that the majority of the data points are concentrated on the right.
Right skewed
a distribution that has a longer tail on the right side, indicating that the majority of the data points are concentrated on the left.
Outlier
a data point that significantly deviates from the other observations in a dataset, often due to variability or measurement error. Plot the data to find them.
Center
the central value of a dataset, often measured by the mean, median, or mode, which summarizes the data's location.
Resistant vs Nonresistant Measures
Resistant measures are statistical values that are not significantly influenced by extreme observations (outliers), while nonresistant measures can be heavily affected by them. Ex: the mean is nonresistant; the median and mode are resistant.
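A quick way to see the difference is to add one extreme value to a small dataset and compare how far the mean and the median move. The salary figures are hypothetical.

```python
from statistics import mean, median

salaries = [40, 42, 45, 47, 50]   # hypothetical salaries, in thousands
with_outlier = salaries + [500]   # same data plus one extreme value

mean_shift = mean(with_outlier) - mean(salaries)
median_shift = median(with_outlier) - median(salaries)

print(f"mean moves by {mean_shift:.1f}")      # large shift: nonresistant
print(f"median moves by {median_shift:.1f}")  # small shift: resistant
```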
Measures of spread
Range, standard deviation, and interquartile range.
Variance
a measure of how much a set of numbers differs from the mean, calculated as the average of the squared differences from the mean (for a sample, divide by n-1 instead of n). The standard deviation is its square root.
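The two divisors can be compared side by side. This is a sketch on hypothetical data, checked against Python's `statistics` module, where `variance` uses n-1 and `pvariance` uses n.

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; their mean is 5

n = len(data)
mean = sum(data) / n
squared_diffs = sum((x - mean) ** 2 for x in data)  # sum of squared deviations = 32

sample_variance = squared_diffs / (n - 1)   # sample formula: divide by n-1
population_variance = squared_diffs / n     # population formula: divide by n

print(sample_variance, variance(data))
print(population_variance, pvariance(data))
```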
Range
The difference between the highest and lowest values in a dataset.
Standard deviation
A measure that quantifies the amount of variation or dispersion of a set of values, indicating how much the individual data points differ from the mean.
Interquartile range
The difference between the third and first quartiles (Q3 - Q1), representing the range of the middle 50% of the data.
Five number summary
A descriptive statistic that provides information about a dataset through five key values: the minimum, first quartile, median, third quartile, and maximum.
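A minimal sketch of the five number summary, assuming quartiles are found as the medians of the lower and upper halves of the sorted data (with the overall median left out of both halves when n is odd). Calculators and software may use slightly different quartile conventions, and `five_number_summary` is a name chosen here for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

data = [1, 3, 4, 7, 8, 10, 12, 15, 40]  # hypothetical dataset
print(five_number_summary(data))  # (1, 3.5, 8, 13.5, 40)
```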
Box plot
A graphical representation of a dataset that displays the distribution through the five number summary: minimum, first quartile, median, third quartile, and maximum. It helps visualize the spread and skew of the data.
Outlier test
A statistical method used to identify values that deviate significantly from the rest of the dataset, often defined as values more than 1.5 times the interquartile range above the third quartile or below the first quartile.
Modified box plot
A variation of a box plot that plots outliers as individual points (dots, stars) and extends the whiskers only to the most extreme data points that are not outliers, that is, the last data points within 1.5 times the interquartile range of the quartiles.
IQR Rule
Method for detecting outliers based on the interquartile range (IQR), where any data point that lies more than 1.5 times the IQR above the third quartile or below the first quartile is considered an outlier.
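The rule can be sketched as two fences, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, with anything outside them flagged. The helper `quartiles` below is hypothetical and assumes the median-of-halves convention; other quartile conventions can give slightly different fences.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(upper)

data = [1, 3, 4, 7, 8, 10, 12, 15, 40]  # hypothetical dataset
q1, q3 = quartiles(data)
iqr = q3 - q1

low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(f"IQR = {iqr}, fences = ({low_fence}, {high_fence}), outliers = {outliers}")
```

Here the fences are -11.5 and 28.5, so only 40 is flagged.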
Standard Deviation Rule
A rule for identifying outliers by determining how many standard deviations a data point is from the mean, considering points more than 2 standard deviations from the mean as outliers (some texts use 3 instead).
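A small sketch of this rule, assuming the sample standard deviation and the 2-standard-deviation cutoff stated above. The data is hypothetical.

```python
from statistics import mean, stdev

data = [10, 12, 11, 13, 12, 11, 12, 30]  # hypothetical measurements

center = mean(data)
spread = stdev(data)  # sample standard deviation (n-1 divisor)
cutoff = 2 * spread   # assumed threshold: 2 standard deviations

outliers = [x for x in data if abs(x - center) > cutoff]
print(f"mean = {center:.3f}, sd = {spread:.3f}, outliers = {outliers}")
```

Because the mean and standard deviation are themselves nonresistant, an extreme value inflates both, which can make this rule less sensitive than the IQR rule on small datasets.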
Normal curve
A bell-shaped curve that represents the distribution of values, where most observations cluster around the central peak and probabilities for values far from the mean taper off equally in both directions.
Z-Score
A statistical measurement that describes a value's relationship to the mean of a group of values, expressed in terms of standard deviations from the mean.
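In formula form, a z-score is (value - mean) / standard deviation. A minimal sketch follows, with hypothetical numbers and a function name chosen here for illustration.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation

# Hypothetical exam with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))  # 1.5  -> 1.5 standard deviations above the mean
print(z_score(60, 70, 10))  # -1.0 -> 1 standard deviation below the mean
```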
Symbols for Mean and Standard Deviation
Population: Mean is represented by the symbol \( \mu \) and standard deviation by \( \sigma \).
Sample: Mean is represented by the symbol \( \bar{x} \) and standard deviation by \( s \).