stats2

48 Terms

Contingency Table

Displays the frequency distribution of one or more categorical variables

Bar Plot

A graph that represents the counts (or proportions) of a categorical variable with rectangular bars

Making a bar plot in R

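ggplot(Dataset) + aes(x = Variable1) + geom_bar()  # geom_bar() counts rows itself, so it takes only x; add fill = Variable2 for a second categorical variable, or use geom_col() with a y aesthetic for precomputed heights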

Making a 2 way contingency table in R

Dataset %>% count(Variable1, Variable2) %>% mutate(prop = n / sum(n)) %>% pivot_wider(id_cols = Variable2, names_from = Variable1, values_from = prop)

2 way contingency table

Displays frequencies for combinations of two categorical variables, with the outcomes of one variable in the rows and the outcomes of the other in the columns

Overall 2 way contingency table

Each cell count is divided by the overall (grand) total, so all cells together sum to 1

Conditional, row or column, 2 way contingency table

Each cell count is divided by its row total or its column total, so each row (or each column) sums to 1
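
As a rough sketch of the overall and conditional tables, base R's table() and prop.table() can produce all three versions. The survey data frame and its columns (gender, answer) are made up for illustration and are not from the course material.

```r
# Hypothetical data: two categorical variables
survey <- data.frame(
  gender = c("F", "F", "F", "M", "M", "M", "M", "F"),
  answer = c("yes", "no", "yes", "yes", "no", "no", "yes", "yes")
)

# Two-way table of counts: rows = gender, columns = answer
counts <- table(survey$gender, survey$answer)

overall <- prop.table(counts)              # each cell / grand total
by_row  <- prop.table(counts, margin = 1)  # each cell / its row total
by_col  <- prop.table(counts, margin = 2)  # each cell / its column total
```

Printing overall, by_row, or by_col shows the corresponding table; which one to use depends on whether the question is about the whole sample or about one variable conditional on the other.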

Stacked bar plot

Shows the counts for each level of a second variable by stacking bar segments atop each other within each category. Total bar height represents the category's total count

Dodged bar plot

Places the bars for each level of a second variable side by side within each category, so the groups can be compared directly

Standardized stacked bar plot

Like a regular stacked bar plot but with proportions instead of counts, so every bar has the same total height (1)
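
The three bar plot variants differ only in the position argument of geom_bar(). A minimal sketch with ggplot2, using a made-up survey data frame (the column names gender and answer are assumptions for illustration):

```r
library(ggplot2)

# Hypothetical data: two categorical variables
survey <- data.frame(
  gender = c("F", "F", "F", "M", "M", "M", "M", "F"),
  answer = c("yes", "no", "yes", "yes", "no", "no", "yes", "yes")
)

# Shared setup: one bar per gender, coloured by answer
base_plot <- ggplot(survey) + aes(x = gender, fill = answer)

stacked      <- base_plot + geom_bar(position = "stack")  # counts stacked (the default)
dodged       <- base_plot + geom_bar(position = "dodge")  # bars side by side
standardized <- base_plot + geom_bar(position = "fill")   # proportions within each bar
```

Printing any of the three objects (for example, typing stacked at the console) draws the plot.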

Unimodal distribution

A distribution with one clear peak

Bimodal distribution

Has two clear peaks

Multimodal distribution

Having two or more peaks

Uniform distribution

Every outcome is equally likely to occur

Symmetric distribution

Distribution where the left and right sides mirror each other

Left skewed distribution

Has a long tail extending to the left (toward smaller values)

Right skewed distribution

Has a long tail extending to the right (toward larger values)

Skewness

A measure of asymmetry of a distribution

Histogram

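A plot of a numerical variable that groups the values into bins and draws a bar for each bin's count; unlike a bar graph, adjacent bars touch because the bins are contiguous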

Density plot

Like a histogram except it uses a smooth curve to represent data distribution
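
To see the two plot types for one numerical variable, a sketch like the following could be used. The scores data frame is simulated, and the choice of 20 bins is arbitrary.

```r
library(ggplot2)

set.seed(1)  # make the simulated data reproducible
# Hypothetical numerical variable
scores <- data.frame(value = rnorm(200, mean = 70, sd = 10))

# Histogram: counts of observations per bin
histogram_plot <- ggplot(scores) + aes(x = value) + geom_histogram(bins = 20)

# Density plot: smooth curve estimating the same distribution
density_plot <- ggplot(scores) + aes(x = value) + geom_density()
```

Changing bins makes the histogram coarser or finer, which can hide or exaggerate peaks, so it is worth trying a few values.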

Box plots are best for…?

Comparing the distribution of a numerical variable across different groups

Mean

The sum of all observations in a dataset divided by the number of observations

Median

The middle value of the sorted observations (or the average of the two middle values when the number of observations is even)

Mode

The most common value in a dataset

Standard deviation

Roughly the typical distance of the values from the mean; computed as the square root of the variance

Variance

The expected squared deviation of a variable from its mean; for a sample, the sum of squared deviations from the mean divided by n - 1

Quartile 1

The 25th percentile: 25% of the data fall at or below this value

Quartile 3

The 75th percentile: 75% of the data fall at or below this value

IQR (interquartile range)

The width of the middle 50% of the data: IQR = Q3 - Q1
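
The summary statistics on the cards above can all be computed with base R. This is a small sketch on a made-up vector x; note that quantile() uses R's default interpolation method, which can give slightly different quartiles than the "median of each half" method used by hand.

```r
x <- c(2, 4, 4, 5, 7, 9, 11)  # hypothetical observations

x_mean   <- mean(x)    # sum(x) / length(x), here 6
x_median <- median(x)  # middle value of the sorted data, here 5
x_var    <- var(x)     # sample variance (divides by n - 1), here 10
x_sd     <- sd(x)      # sample standard deviation = sqrt(var(x))

q1 <- quantile(x, 0.25, names = FALSE)  # 25th percentile
q3 <- quantile(x, 0.75, names = FALSE)  # 75th percentile
x_iqr <- q3 - q1                        # same value as IQR(x), here 4
```

Base R has no built-in function for the statistical mode; names(which.max(table(x))) is one common workaround and returns the first most frequent value as text.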

What data shows

Raw values (examples: 1, 2, 3, 4, 5)

What a numerical summary shows

Summary statistics (examples: mean = 5.5, median = 4.5)

What a graphical summary shows

Shape, spread, and outliers (examples: histogram, box plot, density plot)

What a verbal summary shows

Conceptual summary (examples: “right skewed with one high outlier, median around 5”)

Mean = median

Roughly symmetric graph

Mean > median

Graph shows a right skew

Mean < median

Graph shows a left skew
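
A quick way to check the mean-versus-median rules is to try them on tiny made-up vectors. The numbers below are invented for illustration; the rules are tendencies, not guarantees for every dataset.

```r
symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 2, 3, 4, 50)   # one very large value pulls the mean up
left_skewed  <- c(-40, 6, 7, 8, 9)  # one very small value pulls the mean down

mean(symmetric) == median(symmetric)       # TRUE: both are 3
mean(right_skewed) > median(right_skewed)  # TRUE: mean 12, median 3
mean(left_skewed) < median(left_skewed)    # TRUE: mean -2, median 7
```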

Large SD or IQR

Graph shows a wide spread

Graph shows outliers

Isolated points beyond the whiskers in box plots, or bars separated from the rest by gaps in histograms

IQR = 0 means…?

The middle 50% of the data are all the same value (Q1 = Q3)

Standard deviation equals…?

The square root of the variance (for a sample, the square root of the sample variance)

1.5 IQR method

Method for identifying outliers

1.5 x IQR outlier fences

A value is flagged as an outlier if it is below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR
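
A minimal sketch of the 1.5 x IQR rule in base R, flagging values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR. The vector x is made up, and the quartiles come from R's default quantile() method, so hand-computed fences may differ slightly.

```r
x <- c(2, 4, 4, 5, 7, 9, 30)  # hypothetical data; 30 looks suspicious

q1  <- quantile(x, 0.25, names = FALSE)  # 4
q3  <- quantile(x, 0.75, names = FALSE)  # 8
iqr <- q3 - q1                           # 4

lower_fence <- q1 - 1.5 * iqr  # -2
upper_fence <- q3 + 1.5 * iqr  # 14

# Keep only the values that fall outside the fences
outliers <- x[x < lower_fence | x > upper_fence]  # 30
```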

Use Mean and SD for…?

For symmetric data

Use Median and IQR for…?

Skewed data

When to use mean and SD

Symmetric data with no outliers; interval data with equal spacing. Examples: heights of students in a class, daily high temperatures in June, normally distributed test scores

When to use Median and IQR

Skewed (asymmetric) data, data with outliers, small samples, and ordinal ratings. Examples: satisfaction ratings from 1-5 stars, salaries of employees (the CEO makes $10m), number of emergency room visits per patient, response times for a computer program (some fast, some very slow)

Use a histogram when:

You want to see the shape of a distribution; you need the frequency of data points in specific intervals; you are analyzing a single numerical variable with many data points; you need to judge from its distribution whether a process is meeting customer requirements

Use a boxplot when:

You are comparing distributions across multiple groups; you want a quick summary of the key statistics (median, quartiles, and range); you have a large amount of data and need to visualize it without showing every data point; you want to identify potential outliers
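
For the group-comparison case, a side-by-side box plot in ggplot2 might look like the sketch below. The scores data frame, its group labels, and the simulated values are all assumptions for illustration.

```r
library(ggplot2)

set.seed(2)  # make the simulated data reproducible
# Hypothetical data: one numerical variable measured in three groups
scores <- data.frame(
  group = rep(c("A", "B", "C"), each = 30),
  value = c(rnorm(30, mean = 60, sd = 5),
            rnorm(30, mean = 70, sd = 10),
            rnorm(30, mean = 65, sd = 3))
)

# One box per group; the line inside each box is the median,
# and points beyond the whiskers are potential outliers
group_boxplots <- ggplot(scores) + aes(x = group, y = value) + geom_boxplot()
```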