Advanced Stats Unit 2

42 Terms

Individuals

The subjects or objects in a statistical study being analyzed or measured, such as people, animals, or items.

Variables

Any characteristic or property that can vary from one individual to another in a study.

Categorical variables

are variables that represent distinct categories or groups, such as gender, race, or yes/no responses, rather than numeric values.

Quantitative variables

are variables that represent numerical values, allowing for mathematical operations and measurements, such as height, weight, or age.

Discrete variables

are a type of quantitative variable that takes a countable set of distinct values, often whole-number counts, such as the number of children in a family or the number of cars in a parking lot.

Continuous variables

are a type of quantitative variable that can take any value within an interval (infinitely many possible values), such as temperature or time.

Relative Count

is the count of a category divided by the total number of individuals, expressed as a proportion or percentage (also called relative frequency); it is often used to compare proportions (ex. 2 Federalists out of 10 people → 20%).
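A minimal Python sketch of the calculation, assuming the raw data is a list of category labels. The party data below is hypothetical, chosen only to match the 2-out-of-10 example, and `relative_frequencies` is an illustrative helper name.

```python
from collections import Counter


def relative_frequencies(labels):
    """Return each category's count divided by the total number of individuals."""
    counts = Counter(labels)
    total = len(labels)
    return {category: count / total for category, count in counts.items()}


# Hypothetical data: 10 people, 2 of whom are Federalists.
parties = ["Federalist"] * 2 + ["Democratic-Republican"] * 8
for party, share in relative_frequencies(parties).items():
    print(f"{party}: {share:.0%}")
```

Because every individual falls into exactly one category, the relative frequencies always add up to 1 (100%).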

Two way tables

are a tabular method used to display the relationship between two categorical variables, allowing for simultaneous analysis of their frequencies and patterns; they can be harder to read depending on the audience.
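A small sketch of how a two-way table is built from paired categorical data: each individual contributes one (row category, column category) pair, and each cell counts one combination. The survey data and the helper name `two_way_table` are hypothetical.

```python
from collections import Counter


def two_way_table(pairs):
    """Count each (row category, column category) combination."""
    counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    table = {(r, c): counts[(r, c)] for r in rows for c in cols}
    return rows, cols, table


# Hypothetical survey: (grade level, prefers online homework?)
survey = [("10th", "yes"), ("10th", "no"), ("11th", "yes"),
          ("11th", "yes"), ("10th", "yes"), ("11th", "no")]

rows, cols, table = two_way_table(survey)
print("      " + "".join(c.rjust(5) for c in cols) + "Total".rjust(7))
for r in rows:
    row_counts = [table[(r, c)] for c in cols]
    cells = "".join(str(n).rjust(5) for n in row_counts)
    print(r.ljust(6) + cells + str(sum(row_counts)).rjust(7))
```

Dividing each cell by its row total would turn the counts into row percentages, which is often easier to compare than raw counts.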

Distribution

refers to the way in which values of a variable are spread or arranged, often represented graphically through histograms or density plots.

Bar Graphs

are graphical representations of categorical data to show the frequencies or relative sizes of different categories, making it easy to compare data across categories.

Pie Graphs

are circular charts divided into slices to illustrate numerical proportions, with each slice representing a category's contribution to the total.

Histogram

is a graphical representation of the distribution of quantitative data, often showing the frequency of data points within specified ranges or bins.
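A text-only sketch of the counting step behind a histogram, assuming equal-width bins that include their left edge and exclude their right edge (one common convention; software differs). The scores and the helper name `histogram_counts` are hypothetical.

```python
def histogram_counts(values, start, width, num_bins):
    """Count values in equal-width bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * num_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < num_bins:  # values outside the chosen range are ignored
            counts[i] += 1
    return counts


# Hypothetical whole-number test scores, binned 60-69, 70-79, 80-89, 90-99.
scores = [61, 67, 72, 75, 78, 83, 85, 91]
counts = histogram_counts(scores, start=60, width=10, num_bins=4)
for i, n in enumerate(counts):
    low = 60 + 10 * i
    print(f"{low}-{low + 9}: {'#' * n}")
```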

Stem and Leaf Plot

is a method of displaying quantitative data, where each data point is split into a stem (the leading digit(s)) and a leaf (the trailing digit), providing a way to visualize the distribution while maintaining the original data.
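A minimal sketch that builds a stemplot as text, assuming non-negative whole numbers where the leaf is the ones digit and the stem is everything before it. The data and the helper name `stemplot` are hypothetical.

```python
from collections import defaultdict


def stemplot(values):
    """Stem = all but the last digit, leaf = last digit (non-negative integers)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):  # keep empty stems visible
        leaf_text = " ".join(str(leaf) for leaf in leaves[stem])
        lines.append(f"{stem} | {leaf_text}".rstrip())
    return lines


# Hypothetical quiz scores.
scores = [72, 85, 91, 78, 64, 88, 95, 70]
for line in stemplot(scores):
    print(line)
print("Key: 7 | 2 means 72")
```

Every original value can be read back from the plot, which is what separates a stemplot from a histogram.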

Back-to-back stemplots

are comparative stem-and-leaf plots that display two related sets of data simultaneously. One set is displayed to the left of a central axis and the other to the right, facilitating direct comparison between the two distributions.

Split Stem and Leaf

is a variation of the stem-and-leaf plot where each stem is split across two (or more) lines, for example leaves 0-4 on one line and 5-9 on the next. This spreads out data that would otherwise be crowded onto a few stems while preserving the original values.
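A sketch of the two-lines-per-stem version, assuming the common split of leaves 0-4 and 5-9 and non-negative whole numbers. The data and the helper name `split_stemplot` are hypothetical.

```python
from collections import defaultdict


def split_stemplot(values):
    """Each stem gets two lines: leaves 0-4 first, then leaves 5-9."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[(stem, 0 if leaf < 5 else 1)].append(leaf)
    first, last = min(groups), max(groups)
    lines = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            if first <= (stem, half) <= last:  # skip half-lines outside the data
                leaf_text = " ".join(str(leaf) for leaf in groups[(stem, half)])
                lines.append(f"{stem} | {leaf_text}".rstrip())
    return lines


# Hypothetical scores that would crowd onto only two stems in a regular stemplot.
scores = [71, 73, 74, 76, 78, 79, 81, 83, 88]
for line in split_stemplot(scores):
    print(line)
```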

Dot plot

is a simple statistical chart that uses dots to represent the frequency of data points along a number line. Each dot corresponds to one data value, making it easy to see the distribution and frequency of the dataset.

Cumulative relative frequency graph (ogive)

is a graph displaying the cumulative relative frequencies of a dataset. It shows what proportion (or percentage) of observations fall at or below a particular value, which helps locate percentiles and visualize the distribution.
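The points plotted on an ogive can be computed directly: for each chosen value, count the observations at or below it and divide by the total. A minimal sketch with hypothetical scores and a hypothetical helper name:

```python
def cumulative_relative_frequency(values, upper_bounds):
    """For each bound, the proportion of observations at or below it."""
    n = len(values)
    return [(bound, sum(v <= bound for v in values) / n) for bound in upper_bounds]


# Hypothetical quiz scores out of 10.
scores = [4, 6, 7, 7, 8, 8, 9, 10]
for bound, prop in cumulative_relative_frequency(scores, [5, 7, 9, 10]):
    print(f"at or below {bound}: {prop:.1%}")
```

The proportions never decrease and reach 100% at the largest value, so an ogive always climbs from left to right.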

Line Graph

is a type of chart that displays information as a series of data points called 'markers' connected by straight line segments. It is commonly used to visualize trends over time or continuous data.

Describing Distributions: SOCS

is a mnemonic used to summarize the key features of a distribution: Shape, Outliers, Center, and Spread. This approach helps in effectively communicating the essential characteristics of a dataset.

Unimodal

describes a distribution with a single peak or mode, indicating the most frequent value in the dataset.

Bimodal

a distribution with two distinct modes or peaks, indicating two different groups or clusters within the data.

Uniform

describes a distribution where all values (or bins) occur with roughly the same frequency, resulting in a flat, even appearance with no clear peak. This suggests that every outcome is about equally likely.

Symmetric

a distribution whose left half is (approximately) a mirror image of its right half around its center; for a symmetric distribution, the mean and median are about equal.

Left skewed

a distribution that has a longer tail on the left side, indicating that the majority of the data points are concentrated on the right.

Right skewed

a distribution that has a longer tail on the right side, indicating that the majority of the data points are concentrated on the left.

Outlier

a data point that significantly deviates from the other observations in a dataset, often due to variability or measurement error. Plot the data to find them.

Center

the central value of a dataset, often measured by the mean, median, or mode, which summarizes the data's location.

Resistant vs Nonresistant Measures

Resistant measures are statistical values that are not significantly influenced by extreme observations (outliers), while nonresistant measures can be heavily affected by them. Ex: the mean is nonresistant; the median and mode are resistant. Likewise, the IQR is resistant, while the range and standard deviation are not.
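A quick check with Python's standard `statistics` module, using hypothetical data: replace the largest value with an extreme one and compare how the mean and median respond.

```python
from statistics import mean, median

# Hypothetical data, then the same data with its largest value replaced by an outlier.
data = [10, 12, 13, 15, 15]
with_outlier = [10, 12, 13, 15, 150]

print("original:    ", mean(data), median(data))
print("with outlier:", mean(with_outlier), median(with_outlier))
```

The mean jumps from 13 to 40, while the median stays at 13.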

Measures of spread

Range, standard deviation, and interquartile range (IQR).

Variance

a measure of how much a set of numbers differs from the mean, calculated as the average of the squared differences from the mean; for a sample, divide the sum of squared differences by n − 1 instead of n.
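A sketch of the sample variance computed by hand and compared with Python's `statistics.variance` (which divides by n − 1) and `statistics.pvariance` (which divides by n). The data set is hypothetical and `sample_variance` is an illustrative helper name.

```python
from statistics import pvariance, variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


# Hypothetical data: mean 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))  # 32 / 7, about 4.571
print(variance(data))         # standard-library sample variance (n - 1)
print(pvariance(data))        # population variance (n): 32 / 8 = 4
```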

Range

The difference between the highest and lowest values in a dataset.

Standard deviation

A measure that quantifies the amount of variation or dispersion of a set of values, indicating how much the individual data points typically differ from the mean. It is the square root of the variance.
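Since the standard deviation is the square root of the variance, it can be computed from the same squared deviations. This sketch uses hypothetical data and checks the hand calculation against `statistics.stdev` (sample, n − 1) and `statistics.pstdev` (population, n).

```python
import math
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
n = len(data)
xbar = sum(data) / n
squared_deviations = sum((x - xbar) ** 2 for x in data)

s = math.sqrt(squared_deviations / (n - 1))  # sample SD, about 2.14
sigma = math.sqrt(squared_deviations / n)    # population SD, exactly 2.0

print(s, stdev(data))
print(sigma, pstdev(data))
```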

Interquartile range

The difference between the third and first quartiles (IQR = Q3 − Q1), representing the range of the middle 50% of the data.

Five number summary

A descriptive statistic that provides information about a dataset through five key values: the minimum, first quartile, median, third quartile, and maximum.
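A minimal sketch of the five-number summary. It assumes one common textbook convention for quartiles: Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out of both halves when n is odd. Other software uses different quartile rules and may give slightly different values. The data and the helper name are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least 2 values."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                              # lower half of the data
    upper = xs[half + 1:] if n % 2 else xs[half:]  # upper half of the data
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


data = [1, 3, 4, 6, 7, 8, 10, 12, 20]  # hypothetical
mn, q1, med, q3, mx = five_number_summary(data)
print(mn, q1, med, q3, mx)
print("IQR =", q3 - q1)
```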

Box plot

A graphical representation of a dataset that displays the distribution through the five number summary: minimum, first quartile, median, third quartile, and maximum. It helps visualize the spread and skew of the data.

Outlier test

A statistical method used to identify values that deviate significantly from the rest of the dataset, often defined as values more than 1.5 times the interquartile range above the third quartile or below the first quartile.

Modified box plot

A variation of a box plot that plots outliers as individual points (dots or stars) and extends the whiskers only to the most extreme data values that are not outliers, i.e., the last values within 1.5 times the interquartile range beyond the quartiles.

IQR Rule

Method for detecting outliers based on the interquartile range (IQR), where any data point that lies more than 1.5 times the IQR above the third quartile or below the first quartile is considered an outlier.
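A sketch of the 1.5 × IQR fences, assuming the same medians-of-halves quartile convention as many intro courses (other quartile rules can shift the fences slightly). The data and the helper name `iqr_outliers` are hypothetical.

```python
from statistics import median


def iqr_outliers(values):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half])                             # median of lower half
    q3 = median(xs[half + 1:] if n % 2 else xs[half:])  # median of upper half
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return low_fence, high_fence, outliers


data = [1, 3, 4, 6, 7, 8, 10, 12, 40]  # hypothetical
low, high, flagged = iqr_outliers(data)
print(f"fences: {low}, {high}; outliers: {flagged}")
```

Here Q1 = 3.5 and Q3 = 11, so the fences are −7.75 and 22.25 and only 40 is flagged. In a modified box plot, 40 would be drawn as a separate point and the upper whisker would stop at 12, the largest value inside the fences.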

Standard Deviation Rule

A rule for identifying outliers by determining how many standard deviations a data point is from the mean, treating points more than 2 standard deviations away as outliers (some texts use 3 instead).
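A sketch of this rule using the sample mean and sample standard deviation from Python's `statistics` module. The cutoff `k` is a parameter so the 2-SD and 3-SD versions can be compared. The data and the helper name `sd_outliers` are hypothetical.

```python
from statistics import mean, stdev


def sd_outliers(values, k=2):
    """Flag values more than k sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [x for x in values if abs(x - m) > k * s]


data = [10, 12, 11, 13, 12, 11, 30]  # hypothetical
print(sd_outliers(data))       # k = 2, the cutoff on this card
print(sd_outliers(data, k=3))  # stricter 3-SD cutoff
```

The value 30 is flagged with k = 2 but not with k = 3. Because the mean and standard deviation are nonresistant, an outlier inflates both, which can help it hide from this rule; the IQR rule avoids that problem because quartiles are resistant.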

Normal curve

A bell-shaped curve that represents the distribution of values, where most observations cluster around the central peak and probabilities for values far from the mean taper off equally in both directions.

Z-Score

A statistical measurement that describes a value's position relative to the mean of a group of values, expressed as the number of standard deviations the value lies above or below the mean: z = (x − mean) / standard deviation.
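The formula translates directly into code. A minimal sketch with a hypothetical test whose mean is 80 and standard deviation is 5:

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical test: mean 80, standard deviation 5.
print(z_score(90, 80, 5))    # 2.0
print(z_score(72.5, 80, 5))  # -1.5
```

A positive z-score means the value is above the mean, and a negative one means it is below.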

Symbols for Mean and Standard Deviation

Population: mean is represented by the symbol μ and standard deviation by σ.

Sample: mean is represented by the symbol x̄ and standard deviation by s.