Chapter 2: Exploratory Data Analysis

35 Terms

1

what are the 2 ways to visualize categorical data?

bar plot

pie chart

2

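what data can be represented in a bar plot?

any categorical data, including variables where an observation can fall into more than one category (e.g., questions that allow more than one choice).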

3

what data can be represented in a pie chart?

categorical data where each observation falls into exactly one category (only one choice), so the slices add up to 100%.

4

what is on the horizontal axis of bar plots?

categories

5

what is on the vertical axis of bar plots?

counts

6

pie chart

shows each category's share of the whole, as a percentage
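The following is a minimal Python sketch that uses made-up survey answers (the `answers` list is hypothetical) to connect the two plots. The count for each category is the bar height in a bar plot. Dividing each count by the total gives the percentages that a pie chart shows as slices.

```python
from collections import Counter

# Hypothetical survey answers: each respondent picked exactly one category,
# so the categories are mutually exclusive (as a pie chart requires).
answers = ["bus", "car", "bike", "car", "walk", "car", "bus", "bike"]

counts = Counter(answers)  # bar plot: one bar per category, height = count
total = sum(counts.values())

# Pie chart: each category's share of the whole, in percent
percentages = {category: 100 * n / total for category, n in counts.items()}

for category, n in counts.most_common():
    print(f"{category}: count={n}, percent={percentages[category]:.1f}%")
```

The percentages add up to 100% only because every answer falls into exactly one category. If respondents could choose several options, the counts would still work for a bar plot, but the percentages would no longer describe parts of a single whole.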

7

what are the 2 ways to visualize quantitative data?

histogram & box plot

8

how is a histogram organized?

x = values of the variable, grouped into intervals (bins)

y = frequency (how many values fall in each bin)
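The next minimal Python sketch shows the counting that sits behind a histogram. The scores, the bin width of 10, and the starting edge of 50 are all assumed choices. Each value is placed in a bin, and the frequency of each bin becomes the height of its bar.

```python
# Hypothetical quantitative data (e.g., exam scores)
scores = [52, 58, 61, 64, 67, 68, 71, 73, 74, 77, 79, 83, 86, 91, 95]

bin_width = 10
start = 50  # left edge of the first bin (an assumed choice)

# Count how many values fall in each bin [left, left + bin_width)
frequencies = {}
for x in scores:
    left = start + ((x - start) // bin_width) * bin_width
    frequencies[left] = frequencies.get(left, 0) + 1

for left in sorted(frequencies):
    # x-axis: bin of the variable; y-axis: frequency (bar height)
    print(f"[{left}, {left + bin_width}): {'#' * frequencies[left]} ({frequencies[left]})")
```

Changing `bin_width` changes how the histogram looks, so a different bin choice can make the shape appear smoother or bumpier.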

9

mean

average: the sum of the values divided by the number of values

10

median

middle value when the data are sorted (the average of the two middle values if the count is even)

11

mode

most common value
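As a quick check of the three definitions, here is a minimal sketch that uses Python's standard `statistics` module on a small hypothetical dataset.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical values, already sorted

mean = statistics.mean(data)      # sum / count = 30 / 6
median = statistics.median(data)  # even count: average of the two middle values (3 and 5)
mode = statistics.mode(data)      # most common value

print(mean, median, mode)
```

In this dataset the three measures of center disagree (mean 5, median 4, mode 3), which is one reason to look at the whole distribution and not only a single number.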

12

are the mean, median, & mode always enough to describe a distribution?

no, you also need a measure of spread, such as the standard deviation

13

standard deviation (SD)

roughly how far the observations typically are from the mean

small: close to mean

large: far from mean
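The next minimal Python sketch uses two hypothetical datasets to show why the center alone is not enough. Both datasets have the same mean, but their spreads are very different. The code uses `statistics.stdev`, which computes the sample SD (it divides by n - 1, as R's `sd()` does). Whether a given course uses n or n - 1 is an assumption to check.

```python
import statistics

# Two hypothetical datasets with the same mean (50) but different spread
close = [48, 49, 50, 51, 52]
spread_out = [10, 30, 50, 70, 90]

for name, data in [("close", close), ("spread_out", spread_out)]:
    # statistics.stdev is the sample SD (divides by n - 1)
    print(name, statistics.mean(data), round(statistics.stdev(data), 2))
```

Values that sit close to the mean give a small SD. Values that sit far from it give a large SD.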

14

what are the 3 main shapes of a histogram?

symmetric, right-skewed/tailed, left-skewed/tailed

15

symmetric histogram distribution

often bell-shaped; the left and right halves mirror each other

mean ≈ median (exactly equal if perfectly symmetric)

16

right-skewed/tailed distribution

long tail on the right

mean > median (the tail pulls the mean to the right)

17

left-skewed/tailed distribution

long tail on the left

mean < median (the tail pulls the mean to the left)
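The following minimal Python sketch uses hypothetical data to check the mean/median rule for skewed shapes. A few large values form a long right tail. The left-skewed set is built as a mirror image of the right-skewed one.

```python
import statistics

# Hypothetical right-skewed data: mostly small values plus a long right tail
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 15, 25]
# Mirror image (30 - x) gives a long left tail
left_skewed = [30 - x for x in right_skewed]

print("right:", statistics.mean(right_skewed), statistics.median(right_skewed))
print("left: ", statistics.mean(left_skewed), statistics.median(left_skewed))
```

The median stays with the bulk of the data. The mean moves toward the tail, so it lands above the median for right skew and below it for left skew.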

18

in R, when do you use the table() function?

with categorical data: it counts how many observations fall in each category

19

what is the assignment operator (the "equal sign") in R?

<- (e.g., x <- 5)

20

5-number summary

min, Q1, median, Q3, max

21

Q1

first quartile

median of the lower half of the sorted data

22

Q3

third quartile

median of the upper half of the sorted data

23

IQR (interquartile range)

Q3 - Q1
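The quartile cards above can be combined into the minimal Python sketch below. It computes the 5-number summary and the IQR with the "median of each half" rule. One assumption is built in: when the count is odd, the overall median is left out of both halves. Other software uses other quartile rules (R's `quantile()` default, for example), so its results can differ slightly. The helper name `five_number_summary` is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Hypothetical helper: (min, Q1, median, Q3, max) via the median-of-halves rule.

    Assumption: for an odd count, the overall median is excluded from both halves.
    Needs at least 2 values so that each half is non-empty.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]        # lower half
    upper = data[(n + 1) // 2 :]  # upper half
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return data[0], q1, statistics.median(data), q3, data[-1]

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]
mn, q1, med, q3, mx = five_number_summary(data)
iqr = q3 - q1  # spread of the middle 50% of the data
print(mn, q1, med, q3, mx, "IQR =", iqr)
```

When the data are sorted as 3, 5, 7, 8, 12, 13, 14, 18, 21, the lower half is 3, 5, 7, 8 and the upper half is 13, 14, 18, 21. That gives Q1 = 6 and Q3 = 16.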

24

measures of spread (variability)

IQR & SD

25

mean, median, mode

measures of center (typical value)

26

measures of center & spread for a symmetric distribution

mean & SD

27

measures of center & spread for a skewed distribution

median & IQR (they are not pulled by the long tail)

28

outlier

an extreme value, far from the rest of the data

29

formulas for finding outliers (1.5 × IQR rule)

x is an outlier if x > Q3 + 1.5(IQR)

or if x < Q1 - 1.5(IQR)
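The 1.5 × IQR rule can be sketched in Python as shown below. The quartiles use the median-of-halves rule, and when the count is odd the overall median is excluded from both halves (an assumed convention). The helpers `quartiles` and `find_outliers` and the data values are hypothetical.

```python
import statistics

def quartiles(values):
    """Hypothetical helper: Q1 and Q3 as medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    return statistics.median(data[: n // 2]), statistics.median(data[(n + 1) // 2 :])

def find_outliers(values):
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Flag any value that lies beyond either fence
    return [x for x in values if x < low_fence or x > high_fence]

data = [3, 5, 7, 8, 12, 13, 14, 18, 21, 60]
print(find_outliers(data))
```

Here Q1 = 7, Q3 = 18, and IQR = 11, so the fences are -9.5 and 34.5. Only 60 lies outside them.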

30

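how can you visualize a quantitative variable together with a categorical variable?

use parallel box plots: one box plot of the quantitative variable for each category, drawn side by side on a common scale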
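The minimal Python sketch below shows what parallel box plots compare. The data are split by category, and a 5-number summary is computed for each group, which is what each box in the display is drawn from. The commute-time data are made up. The quartile rule is the median-of-halves rule (an assumed convention), and no plotting library is used.

```python
import statistics
from collections import defaultdict

# Hypothetical (category, value) pairs: commute time in minutes by transport mode
observations = [
    ("bus", 35), ("bus", 42), ("bus", 28), ("bus", 50), ("bus", 39),
    ("car", 20), ("car", 25), ("car", 18), ("car", 30), ("car", 22),
]

# Split the quantitative variable by category: one box plot per group
groups = defaultdict(list)
for category, value in observations:
    groups[category].append(value)

summaries = {}
for category, values in sorted(groups.items()):
    data = sorted(values)
    n = len(data)
    q1 = statistics.median(data[: n // 2])
    q3 = statistics.median(data[(n + 1) // 2 :])
    # min, Q1, median, Q3, max: the numbers each box plot draws
    summaries[category] = (data[0], q1, statistics.median(data), q3, data[-1])
    print(category, summaries[category])
```

When the summaries are placed next to each other, you can compare center (the medians) and spread (the box widths) across categories.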

31

when determining the shape of a distribution, should you look at each individual category?

no. only look at all the data together.

32

when do you use a z-score?

when the distribution is symmetric

33

z-score calculation

z = (data value - mean) / SD

34

what does a z-score tell you?

how many standard deviations a data point is above (positive z) or below (negative z) the mean of its dataset
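The z-score formula and its interpretation appear in the minimal Python sketch below, which uses hypothetical, roughly symmetric scores. The parentheses matter here: the mean is subtracted first, and the result is divided by the SD. The code uses the sample SD (`statistics.stdev`, n - 1), which is an assumed convention. The helper name `z_score` is hypothetical.

```python
import statistics

# Hypothetical, roughly symmetric exam scores
scores = [60, 65, 70, 70, 75, 80, 70]
mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample SD (divides by n - 1)

def z_score(x, mean, sd):
    # Subtract the mean first, then divide by the SD
    return (x - mean) / sd

for x in (60, 70, 80):
    print(x, round(z_score(x, mean, sd), 2))
```

A score equal to the mean has z = 0. Scores above the mean have positive z, and scores below it have negative z. The size of z says how many SDs away from the mean the score lies.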

35

compared to a skewed distribution, what do the SD & IQR of a bell-shaped curve look like?

smaller, since most points are concentrated close to the center