stats2

48 Terms

1
New cards

Contingency Table

Displays the frequency distribution of one or more categorical variables

2
New cards

Bar Plot

A graph that represents the count (or proportion) of each level of a categorical variable with rectangular bars

3
New cards

Making a bar plot in R

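ggplot(Dataset) + aes(x = Variable1) + geom_bar()   (geom_bar() counts the rows itself, so it takes only x; to plot a precomputed y value, use aes(x = Variable1, y = Variable2) + geom_col())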

4
New cards

Making a 2 way contingency table in R

Dataset %>% count(Variable1, Variable2) %>% mutate(prop = n / sum(n)) %>% pivot_wider(id_cols = Variable2, names_from = Variable1, values_from = prop)

5
New cards

2 way contingency table

Displays frequencies for combinations of two categorical variables. It classifies outcomes for one variable in rows and the other in columns

6
New cards

Overall 2 way contingency table

Each cell count divided by the overall total, so all cells together sum to 1

7
New cards

Conditional, row or column, 2 way contingency table

Each cell count divided by its row total (row conditional) or its column total (column conditional), so each row or each column sums to 1
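
A minimal sketch of the overall and conditional tables in base R, assuming a small made-up data frame (`survey`, `group`, and `answer` are hypothetical names, not from the course data). `table()` builds the counts and `prop.table()` turns them into proportions.

```r
# Hypothetical toy data: two categorical variables
survey <- data.frame(
  group  = c("A", "A", "A", "B", "B", "B", "B", "B"),
  answer = c("yes", "no", "yes", "yes", "no", "no", "no", "yes")
)

counts  <- table(survey$group, survey$answer)  # rows = group, columns = answer
overall <- prop.table(counts)                  # each cell / overall total
by_row  <- prop.table(counts, margin = 1)      # each cell / its row total
by_col  <- prop.table(counts, margin = 2)      # each cell / its column total

print(counts)
print(round(by_row, 2))
```

The `margin` argument decides what counts as the "whole": leaving it out gives the overall table, 1 gives row conditionals, and 2 gives column conditionals.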

8
New cards

Stacked bar plot

Shows the counts of a second variable within each category by stacking bar segments atop each other. Total bar height represents the category total

9
New cards

Dodged bar plot

Places the bars for each subgroup side by side within a category so they can be compared directly

10
New cards

Standardized stacked bar plot

Like a regular stacked bar plot, but every bar is rescaled to the same height (1, or 100%) so the segments show proportions within each category
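
The three bar plot variants above can be sketched with ggplot2 by changing only the `position` argument of `geom_bar()`. The data frame and its column names are hypothetical toy values for illustration.

```r
library(ggplot2)

# Hypothetical toy data: one row per observation
survey <- data.frame(
  group  = c("A", "A", "A", "B", "B", "B", "B", "B"),
  answer = c("yes", "no", "yes", "yes", "no", "no", "no", "yes")
)

p <- ggplot(survey) + aes(x = group, fill = answer)

p_stacked      <- p + geom_bar(position = "stack")  # counts stacked on top of each other
p_dodged       <- p + geom_bar(position = "dodge")  # subgroup bars side by side
p_standardized <- p + geom_bar(position = "fill")   # proportions, every bar has height 1
```

Printing any of the three objects (for example `p_dodged`) draws the plot.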

11
New cards

Unimodal distribution

A distribution with one clear peak

12
New cards

Bimodal distribution

Has two clear peaks

13
New cards

Multimodal distribution

Having two or more peaks

14
New cards

Uniform distribution

Every outcome is equally likely to occur

15
New cards

Symmetric distribution

Distribution where the left and right sides mirror each other

16
New cards

Left skewed distribution

Having a tail on the left

17
New cards

Right skewed distribution

Having a tail on the right

18
New cards

Skewness

A measure of asymmetry of a distribution

19
New cards

Histogram

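Shows the distribution of a numerical variable by grouping values into intervals (bins). It looks like a bar graph without gaps, because the bins are contiguous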

20
New cards

Density plot

Like a histogram except it uses a smooth curve to represent data distribution

21
New cards

Box plots are best for…?

For comparing data across different groups

22
New cards

Mean

The sum of all observations in a dataset divided by the number of observations

23
New cards

Median

The middle value of the sorted observations (or the average of the two middle values when the number of observations is even)

24
New cards

Mode

The most common value in a dataset

25
New cards

Standard deviation

Roughly the typical distance of the values from the mean; the square root of the variance

26
New cards

Variance

The expected squared deviation of a random variable from its mean
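
As a small worked example, the measures of center and spread above can be computed in base R. The vector `x` is made up for illustration. Note that R's built-in `mode()` reports an object's storage type, not the most common value, so the sketch counts values with `table()` instead.

```r
x <- c(2, 4, 4, 5, 7, 9)  # made-up observations

x_mean   <- mean(x)    # sum(x) / length(x)
x_median <- median(x)  # even count, so the average of the two middle values (4 and 5)
x_mode   <- as.numeric(names(which.max(table(x))))  # most frequent value
x_var    <- var(x)     # sample variance: sum((x - mean)^2) / (n - 1)
x_sd     <- sd(x)      # square root of the sample variance
```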

27
New cards

Quartile 1

The 25th percentile: the value below which the lowest 25% of the data lie

28
New cards

Quartile 3

The 75th percentile: the value below which 75% of the data lie

29
New cards

IQR (interquartile range)

The width of the middle 50% of the data, which lie between Q1 and Q3: IQR = Q3 - Q1

30
New cards

What data shows

Raw values (examples: 1, 2, 3, 4, 5)

31
New cards

What numerical data shows

Summary statistics (examples: mean = 5.5, median = 4.5)

32
New cards

What graphical data shows

Shape, spread, and outliers (examples: histogram, box plot, density plot)

33
New cards

What verbal data shows

Conceptual summary (examples: “right skewed with one high outlier, median around 5”)

34
New cards

Mean = median

Roughly symmetric graph

35
New cards

Mean > median

Graph shows a right skew

36
New cards

Mean < median

Graph shows a left skew

37
New cards

Large SD or IQR

Graph shows a wide spread

38
New cards

Graph shows outliers

Isolated points in boxplots or gaps in histograms

39
New cards

IQR = 0 means…?

Q1 and Q3 are equal, so the middle 50% of the data all have the same value

40
New cards

Standard deviation equals..?

The square root of sample variance

41
New cards

1.5 IQR method

Method for identifying outliers

42
New cards

1.5 x IQR rule: a value is an outlier if it is…

Below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR
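
A minimal sketch of the 1.5 x IQR method in base R, using made-up values. It relies on `quantile()` with its default settings; hand calculations of Q1 and Q3 from a textbook may use a slightly different rule and give slightly different fences.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)  # made-up data with one extreme value

q1  <- unname(quantile(x, 0.25))  # first quartile
q3  <- unname(quantile(x, 0.75))  # third quartile
iqr <- q3 - q1                    # same value as IQR(x)

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers
outliers <- x[x < lower_fence | x > upper_fence]
print(outliers)
```

With these values only 50 falls outside the fences.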

43
New cards

Use Mean and SD for…?

For symmetric data

44
New cards

Use Median and IQR for..?

Skewed data

45
New cards

When to use mean and SD

Symmetric data with no outliers; interval data with equal spacing. Examples: heights of students in a class, daily high temperatures in June, normally distributed test scores

46
New cards

When to use Median and IQR

Skewed (asymmetric) data, data with outliers, small samples, or ordinal ratings (satisfaction from 1-5 stars). Examples: salaries of employees (CEO makes $10m), number of emergency room visits per patient, response times for a computer program (some fast, some very slow)
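
To see why the median and IQR suit skewed data, here is a short sketch based on the salary example. The numbers are invented (in thousands of dollars) and only illustrate how one extreme value pulls the mean and SD while barely moving the median and IQR.

```r
# Hypothetical salaries in thousands of dollars; the last value is the CEO
salaries <- c(42, 45, 48, 50, 52, 55, 10000)

mean(salaries)    # pulled far upward by the single extreme value
median(salaries)  # stays at the middle salary
sd(salaries)      # inflated by the extreme value
IQR(salaries)     # reflects only the middle 50% of the data
```

The mean ends up well above the median here, which matches the "mean > median" signal for a right skew.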

47
New cards

Use a histogram when:

You want to see the shape of a distribution, you need the frequency of data points within specific intervals, you are analyzing a single numerical dataset with many data points, or you need to determine whether a process is meeting customer requirements based on its distribution

48
New cards

Use a boxplot when:

You are comparing distributions across multiple groups, you want a quick summary of the key statistics (median, quartiles, and range; a boxplot does not show the mean by default), you have a large amount of data and need a way to visualize it without showing every data point, or you want to identify potential outliers.
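
For comparison, a histogram, a density plot, and a grouped boxplot can be sketched in ggplot2 as follows. The example assumes the `mtcars` dataset that ships with R; treating `cyl` as the grouping variable is a choice made here for illustration.

```r
library(ggplot2)

# Shape of one numerical variable: binned counts vs. a smooth curve
p_hist    <- ggplot(mtcars) + aes(x = mpg) + geom_histogram(bins = 10)
p_density <- ggplot(mtcars) + aes(x = mpg) + geom_density()

# Comparing the same variable across groups, with outliers drawn as points
p_box <- ggplot(mtcars) + aes(x = factor(cyl), y = mpg) + geom_boxplot()
```

Printing `p_hist`, `p_density`, or `p_box` draws the corresponding plot.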
