Stats IPS and R4DS Reading Notes

1

cases

the objects described by a set of data

2

variable

a characteristic of a case

3

different cases can have different _____ of the variables

values

4

label

a special variable used in some data sets to uniquely identify different cases

5

categorical variable

a type of variable that places a case into one of several groups or categories (e.g., location)

6

quantitative variable

a type of variable that takes numerical values for which arithmetic operations such as adding and averaging make sense (e.g., amount)

7

converting categorical variables into quantitative variables

letter grades (categorical) can be converted to numerical values to compute a GPA. the result is quantitative because the numbers can be averaged. be careful with such conversions, because the scale is not always evenly spaced (e.g., the difference between an A and a B is not necessarily the same as the difference between a D and an F).
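A minimal sketch of the conversion, assuming a common 4-point scale (A = 4 down to F = 0); the grades and the scale are made up for illustration and are not from the reading.

```r
# Hypothetical letter grades for one student (categorical values)
grades <- c("A", "B", "B", "D", "F")

# Assumed 4-point scale; real schools may weight grades differently
points <- c(A = 4, B = 3, C = 2, D = 1, F = 0)

# Look up each grade's numeric value, then average to get the GPA
grade_points <- unname(points[grades])
gpa <- mean(grade_points)
print(gpa)
```

The lookup treats every one-letter step as exactly one point, which is the "evenly spaced" assumption the card warns about.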

8

you can adjust one variable to create another, like when calculating a ____

rate

9

vector (in R)

the basic data structure in R: an ordered collection of elements that all have the same data type.
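A short sketch of what "same data type" means in practice; the variable names and values are made up for illustration.

```r
# c() combines values into a vector; here every element is a number
heights <- c(150.5, 162, 171.2)
print(typeof(heights))  # "double"

# Mixing types does not fail: R coerces everything to one common type
mixed <- c(1, "a", TRUE)
print(typeof(mixed))    # "character"

# Order is preserved, so elements can be picked out by position
second_height <- heights[2]
```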

10

what to put in comments in R after the #

the why of your actions, because this is hard or impossible to recover just from reading the code. the how and the what can always be worked out from the code itself, even if tediously. comments also make it easier to keep track of what you have done so far: explain your plan and mode of attack, and record important insights as you encounter them.
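A small illustration of the difference between a "what" comment and a "why" comment; the data and the reasoning in the comments are hypothetical.

```r
# Hypothetical measurements; the last value looks like a data-entry error
x <- c(2, 4, 4, 5, 40)

# A "what" comment only repeats the code:
#   take the median of x
# A "why" comment records the reasoning behind it:
#   40 is probably a typo for 4.0, so summarise the center with the
#   median, which one extreme value barely moves, rather than the mean
center <- median(x)
print(center)
```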

11

<-

read as "gets". it is the assignment operator: it assigns a value to a variable or an object (for example, x <- 3 * 4 stores 12 in x)

12

resistant measure

because the mean cannot resist the influence of extreme observations (i.e., outliers), it is not a resistant measure of center. a resistant measure limits the influence of outliers: its value does not respond strongly to changes in a few observations, no matter how large those changes are. the median is a resistant measure of center. sometimes called a robust measure.
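A quick check of the idea with made-up numbers: replacing one observation with an extreme value moves the mean a long way but leaves the median where it was.

```r
# Hypothetical data, with and without one extreme observation
incomes      <- c(30, 32, 35, 38, 40)
with_outlier <- c(30, 32, 35, 38, 400)

# The mean is pulled toward the outlier...
print(mean(incomes))       # 35
print(mean(with_outlier))  # 107

# ...while the median does not move at all
print(median(incomes))       # 35
print(median(with_outlier))  # 35
```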

13

(n + 1)/2

the position of the median in the ordered list of n observations. e.g., if there are 24 observations, (24 + 1)/2 = 12.5, so the median lies halfway between the 12th and 13th ordered observations (it is the mean of those two values).
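The location rule can be checked against R's built-in median(). The data below are made up: 24 even numbers, so the position is 12.5 as in the card's example.

```r
# Hypothetical data: 24 observations (2, 4, ..., 48)
x <- seq(2, 48, by = 2)

sorted <- sort(x)
n <- length(sorted)

# Position of the median in the ordered list
position <- (n + 1) / 2  # 12.5 for n = 24

# A fractional position means: average the two neighbouring observations.
# When n is odd, floor() and ceiling() agree and pick out a single value.
by_hand <- mean(sorted[c(floor(position), ceiling(position))])

print(by_hand)    # 25
print(median(x))  # 25
```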

14

the five number summary

gives a quick summary of both center and spread: the smallest observation (minimum), the first quartile (Q1), the median (M), the third quartile (Q3), and the largest observation (maximum). displayed graphically by the boxplot!
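In base R, fivenum() returns these five values in order. The data are made up. One caveat: software packages use slightly different rules for quartiles, so fivenum(), quantile(), and a hand calculation can disagree a little on Q1 and Q3 for some data sets; for this example fivenum() matches the "median of each half" method.

```r
# Hypothetical data: 8 observations
x <- c(12, 15, 17, 20, 22, 25, 30, 41)

# Minimum, Q1, median, Q3, maximum
five <- fivenum(x)
print(five)  # 12.0 16.0 21.0 27.5 41.0
```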

15

1.5 x IQR rule for outliers

call an observation a suspected outlier if it falls more than 1.5 x IQR below the first quartile or above the third quartile
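A sketch of the rule using quantile() and IQR() on made-up data. Note that quantile() uses R's default quartile definition, which may give slightly different fences than a hand calculation.

```r
# Hypothetical data with one unusually large value
x <- c(12, 15, 17, 20, 22, 25, 30, 80)

q1 <- unname(quantile(x, 0.25))
q3 <- unname(quantile(x, 0.75))
iqr <- IQR(x)

# Fences: 1.5 x IQR below Q1 and above Q3
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Observations outside the fences are suspected outliers
suspected <- x[x < lower_fence | x > upper_fence]
print(suspected)  # 80
```

The rule only flags candidates; whether a flagged value is an error or a genuine extreme still needs a judgment call.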

16

modified boxplot

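a boxplot drawn using the 1.5 x IQR rule: the lines (whiskers) extend from the box only to the smallest and largest observations that are not suspected outliers, and each suspected outlier is plotted as an individual point.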

17

side by side boxplots

use two or more boxplots in the same graph to compare groups measured on the same variable.
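A minimal sketch using base R's boxplot() with a formula. The choice of the built-in mtcars data set (miles per gallon by number of cylinders) is mine, not from the reading. By default boxplot() draws whiskers only out to the most extreme points within 1.5 x IQR of the box and plots anything beyond as individual points, so these are modified boxplots.

```r
# mtcars ships with R; mpg is quantitative, cyl defines the groups
groups <- boxplot(
  mpg ~ cyl,
  data = mtcars,
  xlab = "Number of cylinders",
  ylab = "Miles per gallon"
)

# boxplot() also returns the numbers it drew: one column of
# five summary values per group
print(groups$stats)
```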

18

min() and max()

return the smallest and largest values, respectively

19

quantile(x, 0.25) or quantile(x, 0.75)

finds the value of x that is greater than 25% (or 75%) of the values, i.e., the first (or third) quartile

20

IQR(x)

the interquartile range; equivalent to quantile(x, 0.75) - quantile(x, 0.25)

21

first(x), last(x), or nth(x, n)

dplyr functions that extract the value at a specific position: the first, the last, or the nth

22

filter()

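keeps only the rows (observations) that satisfy a condition. unlike summarising with first() or nth(), it gives you all variables, with each observation in a separate row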
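A sketch contrasting the position functions with filter(), assuming the dplyr package is installed and R is version 4.1 or later (for the |> pipe). The toy table and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical departure times (HHMM) on two days
flights_toy <- tibble(
  day      = c(1, 1, 1, 2, 2),
  dep_time = c(517, 533, 2356, 42, 126)
)

# first(), nth(), and last() pick values by position within each group,
# so summarise() returns one row per day
by_day <- flights_toy |>
  group_by(day) |>
  summarise(
    first_dep  = first(dep_time),
    second_dep = nth(dep_time, 2),
    last_dep   = last(dep_time)
  )
print(by_day)

# filter() keeps whole rows: every variable, one row per matching observation
late <- flights_toy |>
  filter(dep_time > 1000)
print(late)
```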
