DATA1001 Codes

1

sqrt(x)

square root of x

2

abs(x)

absolute value of x

3

length(x)

the number of elements in x (how many values it holds)

4

min(x)

the smallest value in x

5

sum(x)

the sum of the numbers
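
A quick check of cards 1 to 5 on a small vector. The values are made up for illustration; abs() is applied before sqrt() because sqrt() of a negative number returns NaN with a warning.

```r
# Illustrative vector (made-up values)
x <- c(-4, 9, 16, 1)

sqrt(abs(x))   # 2 3 4 1
abs(x)         # 4 9 16 1
length(x)      # 4 elements
min(x)         # -4
sum(x)         # 22
```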

6

str(dataset_name)

structure of the data: shows the number of rows and columns, the variable names, and R's classification (type) of each variable

7

head(dataset_name, n)

shows the first n rows of the data set (shows 6 if n is not specified)

8

tail(dataset_name, n)

shows the last n rows of the data set (shows 6 if n is not specified)
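
A minimal sketch of cards 6 to 8, using the built-in mtcars data purely as an illustration (it ships with base R, so no library() call is needed; any data frame works the same way):

```r
str(mtcars)        # 32 obs. of 11 variables, each classified as num
head(mtcars)       # first 6 rows (default n = 6)
head(mtcars, 3)    # first 3 rows
tail(mtcars, 2)    # last 2 rows
```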

9

ggplot()

starts a plot (graphical summary) with the ggplot2 package; layers such as geom_histogram() are added with +

10

library(tidyverse)

loads the tidyverse packages (including ggplot2 and dplyr) for the session

11

geom_histogram()

adds a histogram layer to a ggplot()

12

geom_boxplot()

adds a boxplot layer to a ggplot()

13

ggplot(iris, aes(x = Petal.Width, fill = Species)) + geom_histogram() + labs(title = "Sliced histogram of Petal Width by Species")

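example: histogram of petal width in the iris data, with the bars filled (sliced) by species and a title added with labs()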

14

mean(x)

mean

15

median(x)

median

16

sd(x)

sample standard deviation

17

library(rafalib); popsd(dataset_name$variable)

population standard deviation
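
Cards 14 to 17 on a small illustrative vector. To avoid needing rafalib, the population SD is computed here from its definition (divide by n instead of n - 1); rafalib's popsd() is expected to give the same value.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # illustrative data
n <- length(x)

mean(x)     # 5
median(x)   # 4.5 (average of the two middle values, 4 and 5)

sample_sd <- sd(x)                           # divides by n - 1: sqrt(32 / 7), about 2.14
pop_sd    <- sqrt(sum((x - mean(x))^2) / n)  # divides by n: sqrt(32 / 8) = 2
# With rafalib installed, popsd(x) should match pop_sd.

# The two differ by a factor of sqrt((n - 1) / n)
all.equal(pop_sd, sample_sd * sqrt((n - 1) / n))   # TRUE
```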

18

fivenum(x)

minimum, Q1, median, Q3, maximum

19

IQR(x)

interquartile range: Q3 - Q1

20

LT = Q1 - 1.5 * iqr_value

lower threshold for outliers: values below LT are flagged as outliers

21

UT = Q3 + 1.5 * iqr_value

upper threshold for outliers: values above UT are flagged as outliers
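
Cards 18 to 21 combined into an outlier check, on a made-up vector with one obvious outlier. fivenum() reports Tukey's hinges while IQR() is based on quantile(), so for some sample sizes the two can give slightly different quartiles; for these nine values they agree.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)   # illustrative data

five <- fivenum(x)           # min, Q1, median, Q3, max: 1 3 5 7 100
Q1 <- five[2]
Q3 <- five[4]
iqr_value <- IQR(x)          # 4

LT <- Q1 - 1.5 * iqr_value   # lower threshold: -3
UT <- Q3 + 1.5 * iqr_value   # upper threshold: 13

outliers <- x[x < LT | x > UT]
outliers                     # 100
```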

22

as.factor(), e.g. mtcars$vs = as.factor(mtcars$vs), then class(mtcars$vs) to check

converts a numeric variable into a factor (a categorical variable); class(mtcars$vs) then returns "factor"
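
A short sketch of the conversion. It works on a copy named cars (a hypothetical name) so the built-in mtcars is left unchanged.

```r
cars <- mtcars                  # copy, so mtcars itself is unchanged
class(cars$vs)                  # "numeric": vs is stored as 0/1
cars$vs <- as.factor(cars$vs)
class(cars$vs)                  # "factor"
levels(cars$vs)                 # "0" "1"
table(cars$vs)                  # number of cars in each category
```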

23

char_list = c("1", "2", "3", "4")

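a character vector: the quotes make each element text, so "1" is not the number 1 (c() builds a vector, not an R list)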
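
A sketch of why the quotes matter: the elements must be converted with as.numeric() before arithmetic works (sum(char_list) on its own stops with an "invalid 'type' (character)" error).

```r
char_list <- c("1", "2", "3", "4")
class(char_list)           # "character"
is.list(char_list)         # FALSE: c() gives a vector, list() gives a list

nums <- as.numeric(char_list)
class(nums)                # "numeric"
sum(nums)                  # 10
```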

24

?dataset_name

help page for the data set

25

class(dataset_name$variable)

shows how R has classified a variable (e.g. "numeric", "factor", "character")

26

pnorm(q=…, mean=…, sd=…, lower.tail=…)

gives the probability P(X ≤ q), i.e. the area under the Normal curve to the left of q (the Normal cumulative distribution function, cdf); with lower.tail=FALSE it gives the area to the right, P(X > q). Defaults are mean=0, sd=1, lower.tail=TRUE
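
Some pnorm() calls, with N(60, 10^2) chosen purely as an illustrative distribution. The first argument is named q, so the value can be passed as q = 70 or simply positionally.

```r
# Standard Normal (defaults mean = 0, sd = 1)
pnorm(q = 0)       # 0.5: half the area lies below the mean
pnorm(q = 1.96)    # about 0.975

# X ~ N(mean = 60, sd = 10): illustrative numbers
pnorm(q = 70, mean = 60, sd = 10)                      # P(X <= 70), about 0.841
pnorm(q = 70, mean = 60, sd = 10, lower.tail = FALSE)  # P(X > 70), about 0.159
```

Since 70 is one standard deviation above 60, the third call equals pnorm(1), and the last two calls add up to 1.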

27

round(number, n)

rounds a number to n decimal places

28

qnorm(p=…, mean=…, sd=…, lower.tail=…)

gives the value (quantile) below which a proportion p of a Normal distribution falls; it is the inverse of pnorm(). Defaults are mean=0, sd=1, lower.tail=TRUE
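
A sketch pairing qnorm() with round(); the N(60, 10^2) distribution and the value 1.3 are illustrative.

```r
qnorm(p = 0.975)                         # about 1.959964
round(qnorm(p = 0.975), 2)               # 1.96
qnorm(p = 0.025, lower.tail = FALSE)     # also about 1.96: 2.5% lies above it
qnorm(p = 0.5, mean = 60, sd = 10)       # 60: the median of N(60, 10^2)

# qnorm() undoes pnorm()
qnorm(pnorm(1.3))                        # 1.3
```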

29

filter()

keeps only the rows of a data frame that meet a condition (a dplyr function)

30

mutate()

add new columns or modify existing ones
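
A sketch of filter() and mutate() together, assuming dplyr is installed (it is part of the tidyverse). The mtcars data, the 25 mpg cut-off and the kpl column name are illustrative choices; 0.425 is an approximate miles-per-gallon to kilometres-per-litre factor.

```r
library(dplyr)   # also loaded by library(tidyverse)

efficient <- mtcars %>%
  filter(mpg > 25) %>%        # keep only cars with mpg above 25
  mutate(kpl = mpg * 0.425)   # new column: approximate km per litre
efficient
```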

31

cor(x,y)

finds the (Pearson) correlation coefficient between x and y

32

lm(y ~ x, data=df)

creates a linear regression model where y is the dependent variable, x is the independent variable and df is the data.frame
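
A sketch linking cor() and lm() on mtcars (weight as x, mpg as y, an illustrative choice). For a single predictor, the fitted slope equals r * sd(y) / sd(x), which the last line checks.

```r
x <- mtcars$wt    # weight (1000 lbs)
y <- mtcars$mpg   # miles per gallon

r <- cor(x, y)                        # negative: heavier cars tend to have lower mpg
model <- lm(mpg ~ wt, data = mtcars)  # mpg is y (dependent), wt is x (independent)
coef(model)                           # intercept and slope

all.equal(unname(coef(model)["wt"]), r * sd(y) / sd(x))   # TRUE
```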

33

geom_smooth(method="lm", se = T/F, color=…)

adds a fitted regression line to a scatter plot: method="lm" asks for a straight (linear regression) line, and se controls whether the standard-error (confidence) band is drawn around it (the course says to always set se = FALSE)

34

geom_point()

adds a layer of points (a scatter plot); with x = .fitted and y = .resid it gives a residual plot

35

geom_hline(yintercept=0)

adds a horizontal line at y = 0, the reference line on a residual plot

36

ggplot(model, aes(x=.fitted, y=.resid)) + geom_point() + geom_hline(yintercept=0, linetype= "dotted", color="red")

residual plot: each point's residual against its fitted value, with a dotted red reference line at 0
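
The card above passes the model straight to ggplot(), relying on ggplot2 turning an lm object into a data frame (its fortify() method). A sketch that builds that data frame explicitly instead, assuming ggplot2 is installed; the mpg ~ wt model and the res_df name are illustrative.

```r
library(ggplot2)

model <- lm(mpg ~ wt, data = mtcars)

# Fitted values and residuals side by side
res_df <- data.frame(fitted = fitted(model), resid = resid(model))

p <- ggplot(res_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dotted", color = "red")
p   # draws the residual plot
```

With an intercept in the model the residuals sum to (numerically) zero, so the points should scatter around the red line; a curved or funnel-shaped pattern suggests a straight line is not a good fit.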

37

sample(x, size, replace = T/F, prob=)

simulates random draws: x is the set of values to sample from, size is the number of draws, replace = TRUE/FALSE draws with or without replacement, and prob (optional) gives the probability of drawing each value in x

38

replicate(n, expr)

repeats an expression n times (e.g. a call to sample()) and collects the results

39

set.seed(x)

makes results reproducible where x is any number (question will tell you what to use)
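
A sketch of cards 37 to 39 together. The coin, the die probabilities and the seed value 1 are illustrative; in an assessment, use the seed the question gives.

```r
coin <- c("H", "T")

# Same seed, same draws
set.seed(1)
first <- sample(coin, size = 10, replace = TRUE)
set.seed(1)
second <- sample(coin, size = 10, replace = TRUE)
identical(first, second)   # TRUE

# prob = makes some outcomes more likely (a die biased towards 1)
die <- sample(1:6, size = 5, replace = TRUE, prob = c(0.5, rep(0.1, 5)))

# replicate(): heads in 10 tosses, repeated 1000 times
set.seed(1)
heads <- replicate(1000, sum(sample(coin, 10, replace = TRUE) == "H"))
length(heads)              # 1000 simulated counts
```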

40

dbinom(x,n,p)

calculates P(X = x) for a Binomial distribution, where x is the number of successes we want, n is the number of trials and p is the probability of success (R's argument names are dbinom(x, size, prob))

41

pbinom(x,n,p)

adds up dbinom() values to give P(X ≤ x), the probability of at most x successes in n trials with success probability p (R's argument names are pbinom(q, size, prob))
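
A worked check with X ~ Binomial(n = 10, p = 0.5), chosen for illustration because the exact answers are easy fractions: P(X = 5) = choose(10, 5) / 2^10 = 252/1024 and P(X ≤ 2) = (1 + 10 + 45)/1024 = 56/1024.

```r
dbinom(5, size = 10, prob = 0.5)          # P(X = 5), about 0.246
pbinom(2, size = 10, prob = 0.5)          # P(X <= 2), about 0.0547

# pbinom() is the running total of dbinom()
sum(dbinom(0:2, size = 10, prob = 0.5))   # same as pbinom(2, 10, 0.5)
1 - pbinom(2, size = 10, prob = 0.5)      # P(X > 2)
```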

42

box = c(1, 0)

defines the box (the tickets to draw from): here one 1 and one 0

43
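  • set.seed(1)

  • box = c(1, 0)

  • sum(sample(box, 10000, replace = TRUE))

simulates 10,000 draws with replacement from the box (e.g. 10,000 coin tosses with 1 = head) and counts how many 1s come up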

44

cumsum()

calculates the cumulative (running) sum of a vector

45

rep(a,b)

to create large boxes where a is the number to be repeated and b is how many times it is repeated
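
A sketch of rep() and cumsum() working together: rep() builds a larger box, and cumsum() turns simulated draws into a running proportion of 1s. The 3-and-7 box, the seed and the 1000 draws are illustrative choices.

```r
# A box with three 1s and seven 0s
box <- c(rep(1, 3), rep(0, 7))
length(box)              # 10 tickets
sum(box)                 # 3 of them are 1s

cumsum(c(1, 0, 1, 1))    # running totals: 1 1 2 3

# Running proportion of 1s over 1000 draws with replacement
set.seed(1)
draws <- sample(box, 1000, replace = TRUE)
running_prop <- cumsum(draws) / seq_along(draws)
tail(running_prop, 1)    # should be close to 3/10
```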
