DATA1001 Codes

sqrt(x)

the square root of x

abs(x)

absolute value of x

length(x)

the number of elements (values) in x

min(x)

the smallest value in x

sum(x)

the sum of the numbers
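
A minimal sketch of the basic functions above, run on a made-up vector x (the values are arbitrary and only for illustration). abs() is applied before sqrt() because sqrt() of a negative number returns NaN with a warning.

```r
# A small made-up numeric vector (illustrative values only)
x = c(-4, 9, 16, 1)

sqrt(abs(x))  # square root of each absolute value: 2 3 4 1
abs(x)        # absolute values: 4 9 16 1
length(x)     # number of elements: 4
min(x)        # smallest value: -4
sum(x)        # total of all values: 22
```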

str(dataset_name)

structure of the data: shows the number of rows (observations) and columns (variables), the variable names, and R's classification of each variable

head(dataset_name, n)

shows the first n rows of the data set (shows 6 if n is not specified)

tail(dataset_name, n)

shows the last n rows of the data set (shows 6 if n is not specified)
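
A quick sketch of str(), head() and tail() using iris, a data set that ships with R (so nothing needs loading). The names first_rows and last_three are just illustrative.

```r
# iris is built into R: 150 flowers measured on 5 variables
str(iris)                   # 150 obs. of 5 variables, with each variable's class

first_rows = head(iris)     # first 6 rows (n defaults to 6)
last_three = tail(iris, 3)  # last 3 rows

dim(iris)                   # 150 5: rows, then columns
```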

ggplot()

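starts a ggplot graphical summary: the data and aes() mappings go inside ggplot(), and layers such as geom_histogram() are added with +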

library(tidyverse)

loads the tidyverse packages, including ggplot2 (for ggplot()) and dplyr (for filter() and mutate())

geom_histogram()

adds a histogram layer to a ggplot

geom_boxplot()

adds a boxplot layer to a ggplot

ggplot(iris, aes(x = Petal.Width, fill = Species)) + geom_histogram() + labs(title = "Sliced histogram of Petal Width by Species")

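example: a histogram of Petal.Width from the built-in iris data, with the bars coloured (sliced) by Species and a title added with labs()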

mean(x)

the mean (average) of x

median(x)

the median (middle value) of x

sd(x)

sample standard deviation

library(rafalib); popsd(dataset_name$variable)

population standard deviation
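
A small sketch contrasting the sample and population standard deviations on made-up data (chosen so the population SD is exactly 2). To avoid depending on rafalib, the population SD is computed by hand here; it is assumed, not verified, to match what rafalib::popsd() returns.

```r
# Made-up data chosen so the population SD comes out to exactly 2
x = c(2, 4, 4, 4, 5, 5, 7, 9)

mean(x)    # 5
median(x)  # 4.5 (average of the middle two values, 4 and 5)
sd(x)      # sample SD: divides by n - 1, about 2.138

# Population SD: divides by n instead of n - 1
pop_sd = sqrt(mean((x - mean(x))^2))
pop_sd     # 2
```

The two differ only in the divisor, so for large data sets they are almost equal.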

fivenum(x)

minimum, Q1, median, Q3, maximum

IQR(x)

the interquartile range of x (Q3 - Q1)
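
A short sketch of fivenum() and IQR() on the numbers 1 to 9 (illustrative data). Note that fivenum() uses Tukey's hinges while IQR() is based on quantile(), so on some data sets their Q1 and Q3 can differ slightly; for 1 to 9 they agree.

```r
# Illustrative data: the numbers 1 to 9
x = c(1, 2, 3, 4, 5, 6, 7, 8, 9)

fivenum(x)   # 1 3 5 7 9: minimum, Q1, median, Q3, maximum
IQR(x)       # 4, i.e. Q3 - Q1 = 7 - 3
```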

LT = Q1 - 1.5 * iqr_value

lower threshold for outliers: values below LT are outliers (iqr_value is the IQR)

UT = Q3 + 1.5 * iqr_value

upper threshold for outliers: values above UT are outliers (iqr_value is the IQR)
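
A minimal sketch of the 1.5 x IQR outlier rule on made-up data with one suspiciously large value. It assumes Q1 and Q3 are taken from quantile() (R's default method); fivenum() hinges can give slightly different thresholds.

```r
# Made-up data with one suspiciously large value
x = c(2, 4, 4, 5, 6, 7, 8, 30)

Q1 = quantile(x, 0.25)   # 4
Q3 = quantile(x, 0.75)   # 7.25
iqr_value = IQR(x)       # Q3 - Q1 = 3.25

LT = Q1 - 1.5 * iqr_value   # -0.875
UT = Q3 + 1.5 * iqr_value   # 12.125

# Values below LT or above UT are flagged as outliers
x[x < LT | x > UT]       # 30
```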

as.factor(), e.g. mtcars$vs = as.factor(mtcars$vs); class(mtcars$vs)

converts a numeric variable into a factor (a categorical variable); class(mtcars$vs) then confirms it returns "factor"
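
A sketch of the conversion on a copy of the built-in mtcars data, so the original stays unchanged. The name my_cars is hypothetical.

```r
my_cars = mtcars                  # copy of the built-in mtcars data
class(my_cars$vs)                 # "numeric": vs is stored as 0/1 numbers

my_cars$vs = as.factor(my_cars$vs)
class(my_cars$vs)                 # "factor"
levels(my_cars$vs)                # "0" "1": the categories
```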

char_list = c("1", "2", "3", "4")

creates a character vector holding the text values "1" to "4" (despite the name, c() makes a vector, not a list)
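
A quick check of what c() actually builds here, and how the text values can be turned into numbers with as.numeric().

```r
char_list = c("1", "2", "3", "4")

class(char_list)        # "character": the values are text, not numbers
is.list(char_list)      # FALSE: c() builds a vector, not a list
as.numeric(char_list)   # 1 2 3 4: converted to numbers
```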

?dataset_name

help page for the data set

class(dataset_name$variable)

shows how R has classified a variable, e.g. "numeric", "integer", "character" or "factor"

pnorm(q=…, mean=…, sd=…, lower.tail=…)

gives the probability P(X ≤ q), i.e. the area under the Normal curve to the left of q (the Normal cumulative distribution function, cdf); with lower.tail=FALSE it gives the area to the right, P(X > q). Defaults are mean=0, sd=1, lower.tail=TRUE
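
A sketch of typical pnorm() calls. The Normal distribution with mean 65 and SD 5 is a hypothetical example, not data from the course.

```r
# Standard Normal: area to the left of 1.96
pnorm(1.96)                                                  # about 0.975

# Hypothetical scores: Normal with mean 65 and SD 5
round(pnorm(70, mean = 65, sd = 5), 4)                       # P(X <= 70) = 0.8413
round(pnorm(70, mean = 65, sd = 5, lower.tail = FALSE), 4)   # P(X > 70)  = 0.1587

# Area between two values: subtract the two left-tail areas
round(pnorm(70, 65, 5) - pnorm(60, 65, 5), 4)                # P(60 < X < 70) = 0.6827
```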

round(number, n)

round an answer to n decimal places

qnorm(p=…, mean=…, sd=…, lower.tail=…)

gives the value (quantile) below which a proportion p of a Normal distribution falls, where p is a probability between 0 and 1; it is the inverse of pnorm(). Defaults are mean=0, sd=1, lower.tail=TRUE
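
A sketch of qnorm() as the inverse of pnorm(), again using a hypothetical Normal distribution with mean 65 and SD 5.

```r
qnorm(0.975)    # about 1.96: 97.5% of a standard Normal lies below it
qnorm(0.025)    # about -1.96

# Hypothetical scores, Normal with mean 65 and SD 5: the 90th percentile
round(qnorm(0.9, mean = 65, sd = 5), 2)   # 71.41

# qnorm undoes pnorm
pnorm(qnorm(0.3))                         # 0.3
```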

filter()

keeps only the rows of a data frame that meet a condition, e.g. filter(mtcars, cyl == 4)

mutate()

add new columns or modify existing ones
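
A minimal sketch combining filter() and mutate() on the built-in mtcars data. It assumes dplyr is installed (it is part of the tidyverse); the new column name power_to_weight is hypothetical.

```r
library(dplyr)   # also loaded by library(tidyverse)

# Keep only the 4-cylinder cars
four_cyl = filter(mtcars, cyl == 4)

# Add a new column computed from existing ones
four_cyl = mutate(four_cyl, power_to_weight = hp / wt)

nrow(four_cyl)      # 11 cars have 4 cylinders
head(four_cyl, 3)
```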

cor(x,y)

finds the (Pearson) correlation coefficient r between x and y

lm(y ~ x, data=df)

creates a linear regression model where y is the dependent (response) variable, x is the independent (explanatory) variable and df is the data frame
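
A sketch linking cor() and lm() on the built-in mtcars data, predicting fuel economy (mpg) from weight (wt, in 1000 lbs). The last line checks the standard least squares fact that the slope equals r x SD(y) / SD(x).

```r
# Correlation between weight and fuel economy
r = cor(mtcars$wt, mtcars$mpg)
round(r, 3)                          # -0.868: strong negative association

# Fit the regression line mpg = intercept + slope * wt
model = lm(mpg ~ wt, data = mtcars)
coef(model)                          # intercept about 37.29, slope about -5.34

# The least squares slope equals r * SD(y) / SD(x)
slope = r * sd(mtcars$mpg) / sd(mtcars$wt)
```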

geom_smooth(method="lm", se = T/F, color=…)

adds a regression line to a scatter plot: method="lm" requests a linear regression line in particular, and se controls whether a shaded standard error band is drawn around the line (always set se=FALSE)

geom_point()

adds points to a ggplot, making a scatter plot; with x=.fitted and y=.resid it makes a residual plot

geom_hline(yintercept=0)

adds a horizontal line at y = 0 (the reference line on a residual plot)

ggplot(model, aes(x=.fitted, y=.resid)) + geom_point() + geom_hline(yintercept=0, linetype= "dotted", color="red")

residual plot for a fitted lm() model: residuals (.resid) against fitted values (.fitted), with a dotted red line at 0

sample(x, size, replace = T/F, prob=)

models random events: x is the data (or box) to sample from, size is the number of draws, replace sets sampling with (TRUE) or without (FALSE) replacement, and prob (optional) is a vector giving the probability of drawing each element of x

replicate(n, expr)

repeats the code expr n times and collects the results, where n is the number of repeats, e.g. replicate(1000, sum(sample(c(1,0), 10, replace=TRUE)))

set.seed(x)

makes random results reproducible, where x is any whole number used as the seed (the question will tell you which to use)
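
A sketch of a simple simulation that combines sample(), replicate() and set.seed(): counting heads in 10 tosses of a fair coin, coded as 1 = head and 0 = tail. The seed values and the biased-coin probabilities are arbitrary choices for illustration.

```r
set.seed(1)   # arbitrary seed; fixes the random draws so results repeat

# Repeat "count heads in 10 fair tosses" 1000 times
sims = replicate(1000, sum(sample(c(1, 0), size = 10, replace = TRUE)))
mean(sims)    # close to 5, the expected number of heads

# A biased coin: prob gives the chance of drawing each element of x
biased = sample(c(1, 0), size = 10, replace = TRUE, prob = c(0.7, 0.3))

# Re-running with the same seed reproduces the same simulations
set.seed(1); sims_a = replicate(5, sum(sample(c(1, 0), 10, replace = TRUE)))
set.seed(1); sims_b = replicate(5, sum(sample(c(1, 0), 10, replace = TRUE)))
identical(sims_a, sims_b)   # TRUE
```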

dbinom(x,n,p)

calculates the binomial probability P(X = x), where x is the number of successes we want, n is the number of trials and p is the probability of success on each trial

pbinom(x,n,p)

calculates the cumulative probability P(X ≤ x) by adding up dbinom() for 0, 1, ..., x, where x is the number of successes, n is the number of trials and p is the probability of success on each trial
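
A worked sketch with X = number of heads in 10 tosses of a fair coin (n = 10, p = 0.5), showing that pbinom() is the sum of the matching dbinom() values.

```r
dbinom(3, 10, 0.5)          # P(X = 3)  = 120/1024 = 0.1171875
pbinom(3, 10, 0.5)          # P(X <= 3) = 176/1024 = 0.171875
sum(dbinom(0:3, 10, 0.5))   # same as pbinom(3, 10, 0.5)
1 - pbinom(3, 10, 0.5)      # P(X > 3)  = 0.828125
```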

Box=c(1,0)

defines a box model to draw from with sample(): here, a box with one ticket marked 1 and one ticket marked 0
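
A sketch of drawing from the box and comparing with the usual box-model formulas for the sum of n draws with replacement (expected value = n x mean of box, standard error = sqrt(n) x population SD of box). That these match the course's formulas is an assumption; n = 100 and the seed are arbitrary choices.

```r
Box = c(1, 0)    # one ticket marked 1, one marked 0
n = 100          # number of draws (illustrative)

set.seed(2)      # arbitrary seed
draws = sample(Box, size = n, replace = TRUE)
sum(draws)       # observed sum of the 100 draws

# Box-model formulas for the sum of n draws with replacement
pop_sd_box = sqrt(mean((Box - mean(Box))^2))   # population SD of the box: 0.5
EV = n * mean(Box)                             # expected value of the sum: 50
SE = sqrt(n) * pop_sd_box                      # standard error of the sum: 5
```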
