example()
runs the example code from the help page of a function, showing how to use it
?
opens the documentation for a specific function or dataset that is part of a package
<-
assignment operator
— assigning value to an object / variable
ls()
lists the objects you have created / that are available in your environment
rm()
removes objects that you don’t need
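A minimal sketch of the three cards above working together. The object names `x`, `y` and `objs` are made up for illustration.

```r
# Assign values to two objects with the assignment operator
x <- 5
y <- "hello"

# ls() returns the names of the objects in the current environment
objs <- ls()
print(objs)

# rm() deletes an object that is no longer needed
rm(y)
still_there <- "y" %in% ls()   # FALSE once y has been removed
```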
seq()
creates a sequence of numbers
length.out= argument
sets a desired length of the sequence
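A short example of seq(), using made-up start and end values. It shows that a step size (by=) and a desired length (length.out=) are two ways to describe the same sequence.

```r
# Count from 1 to 10 in steps of 1
one_to_ten <- seq(1, 10)

# Fix the step size with by=
by_quarter <- seq(0, 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00

# Fix the number of values with length.out= and let R work out the step
five_values <- seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
```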
library()
loads packages
every time a new R session is opened
when to use the library() function?
int
stands for integers
EX]: 1,2,3
dbl
stands for doubles or real numbers
EX]: -1, 1.5, 4/5
continuous
measured data
can have infinite values within a possible range
EX]: i am 3.1” tall and i weigh 34.16 grams
discrete
observations can only exist at limited values, often counts
EX]: i have 8 legs and 4 spots
date
stands for dates
EX]: (01/21/2025)
dttm
stands for date-times, a date + a time
EX]: (01/21/2025 11:00am)
fctr
stands for factors
— what R uses to represent categorical variables with a fixed set of possible values
EX]: freshman, sophomore, junior, senior
lgl
stands for logical
— vectors that contain only TRUE or FALSE values
chr
stands for characters
— vectors of strings (text)
EX]: “this is a string”
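The type abbreviations in the cards above (int, dbl, lgl, chr, fctr, date, dttm) are what a tibble prints under each column name. As a rough illustration in base R, the same types can be checked with typeof() and class(). All values here are made up.

```r
int_type <- typeof(1L)          # "integer"   (int)
dbl_type <- typeof(1.5)         # "double"    (dbl)
lgl_type <- typeof(TRUE)        # "logical"   (lgl)
chr_type <- typeof("a string")  # "character" (chr)

# A factor (fctr) stores categories with a fixed set of possible values
year <- factor(c("freshman", "senior", "freshman"),
               levels = c("freshman", "sophomore", "junior", "senior"))
year_class <- class(year)       # "factor"

# A date and a date-time (dttm)
d  <- as.Date("2025-01-21")
dt <- as.POSIXct("2025-01-21 11:00:00", tz = "UTC")
d_class  <- class(d)            # "Date"
dt_class <- class(dt)           # "POSIXct" "POSIXt"
```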
nominal
unordered descriptions
EX]: “i’m a turtle” and “i’m a butterfly”
ordinal
ordered descriptions
EX]: “i am unhappy” and “i am awesome”
binary
only 2 mutually exclusive outcomes
vector
the simplest data structure in R, which consists of an ordered set of values of the same type (e.g. numeric, character, date, etc.)
scalar
a vector of length 1
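A small sketch of vectors and scalars, with illustrative names. It also shows what "of the same type" means in practice: mixing types makes R convert everything to one common type.

```r
# A numeric vector with three values
heights <- c(3.1, 2.8, 3.4)
n_heights <- length(heights)   # 3

# A scalar is just a vector of length 1
weight <- 34.16
n_weight <- length(weight)     # 1

# Mixing numbers and text coerces every value to character
mixed <- c(1, "two", 3)
mixed_type <- typeof(mixed)    # "character"
```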
data frame / tibble / dataset
data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet.
glimpse()
get a sense of all the columns and their content
str()
get a more detailed sense of the columns and all their contents
colnames()
get the column names of your dataset (returned as a character vector)
functions to get to know your data
glimpse()
str()
colnames()
?
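A quick sketch of these functions on `mtcars`, a dataset that ships with R and is used here only as a stand-in for your own data. glimpse() comes from dplyr, so this assumes that package is installed.

```r
library(dplyr)

# One row of output per column: name, type, first few values
glimpse(mtcars)

# Base R alternative with a more detailed structure report
str(mtcars)

# Just the column names, as a character vector
cols <- colnames(mtcars)
print(cols)

# Open the help page for the dataset (commented out for non-interactive use)
# ?mtcars
```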
the 5 + 1 data manipulations
arrange()
filter()
select()
mutate()
summarize()
— group_by()
arrange()
reorder / sort observations / rows
filter()
keep observations / rows based on conditions
select()
pick variables / columns
mutate()
create new variables / columns or update existing ones
summarize()
produce descriptive statistics
group_by()
change the unit of analysis by creating groups based on one or more variables / columns
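The 5 + 1 verbs above, sketched one at a time on the built-in `mtcars` dataset. This assumes dplyr is installed; the result names and the new column `hp_per_cyl` are invented for the example.

```r
library(dplyr)

# arrange(): sort rows, here from highest to lowest mpg
sorted <- arrange(mtcars, desc(mpg))

# filter(): keep only rows that meet a condition
efficient <- filter(mtcars, mpg > 25)

# select(): pick columns
two_cols <- select(mtcars, mpg, cyl)

# mutate(): add a new column computed from existing ones
with_ratio <- mutate(mtcars, hp_per_cyl = hp / cyl)

# summarize(): collapse the data to descriptive statistics
overall <- summarize(mtcars, mean_mpg = mean(mpg))

# group_by() + summarize(): one summary row per group
by_cyl <- summarize(group_by(mtcars, cyl), mean_mpg = mean(mpg))
```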
ends_with()
select function
matches names that end with whatever is in the ()
starts_with()
select function
matches names that start with whatever is in the ()
contains()
select function
matches names that contain whatever is in the ()
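A sketch of the three select helpers on the built-in `iris` dataset, chosen only because its column names (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species) share prefixes and suffixes. This assumes dplyr is installed.

```r
library(dplyr)

# Columns whose names start with "Sepal"
sepal_cols <- select(iris, starts_with("Sepal"))

# Columns whose names end with "Width"
width_cols <- select(iris, ends_with("Width"))

# Columns whose names contain "Length" anywhere
length_cols <- select(iris, contains("Length"))
```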
transmute()
computes new columns like mutate(), but keeps only the new columns and drops the existing ones
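To see the difference between the two, a small comparison on `mtcars`. The column name `kpl` and the conversion factor are illustrative, and dplyr is assumed to be installed.

```r
library(dplyr)

# mutate() keeps every existing column and adds the new one
added <- mutate(mtcars, kpl = mpg * 0.425)

# transmute() returns only the column that was computed
only_new <- transmute(mtcars, kpl = mpg * 0.425)
```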
na.rm=T argument
critical if the column you are averaging contains missing values (NA); without it, the result is NA
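A tiny example with made-up scores, one of which is missing.

```r
scores <- c(80, 90, NA, 70)

# A single NA makes the whole average NA
with_na <- mean(scores)

# na.rm = TRUE drops missing values before averaging: (80 + 90 + 70) / 3
without_na <- mean(scores, na.rm = TRUE)   # 80
```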
|>
the pipe: chains multiple operations by passing the result on its left into the function on its right. think of it like reading “then”
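A sketch of the same steps written nested and then piped, on `mtcars`. This assumes dplyr is installed and R is version 4.1 or newer, which is when the native pipe was added. n() is included to show a row count per group.

```r
library(dplyr)

# Nested: has to be read from the inside out
nested <- summarize(
  group_by(filter(mtcars, mpg > 20), cyl),
  n = n(),
  mean_mpg = mean(mpg)
)

# Piped: take mtcars, THEN filter, THEN group, THEN summarize
piped <- mtcars |>
  filter(mpg > 20) |>
  group_by(cyl) |>
  summarize(n = n(), mean_mpg = mean(mpg))
```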
measures of location
mean() and median()
n()
counts the number of rows in the current group (used inside summarize())
mean()
the sum divided by the length
median()
a value where 50% of the data is above it and 50% is below it
measures of spread
sd()
sd()
the root mean squared deviation from the mean (R divides by n - 1, not n)
measures of rank
min() and max()
min()
identifies the smallest value
max()
identifies the largest value
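The measures of location, spread and rank, checked by hand on a small made-up vector. The manual formulas are written out only to show what each function computes.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Location
avg <- mean(x)                       # 5
avg_by_hand <- sum(x) / length(x)    # the sum divided by the length
mid <- median(x)                     # 4.5, halfway between the two middle values

# Spread: sd() divides the squared deviations by n - 1
spread <- sd(x)
spread_by_hand <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))

# Rank
smallest <- min(x)                   # 2
largest  <- max(x)                   # 9
```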
data= argument
specifies the dataset to use in the graph, so its columns can be referred to by name
mapping=aes()
maps what variables we want to visualize on our axes
geom
determines the visual structure / shape of the chart
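The three pieces above (data=, mapping=aes(), and a geom) fit together as in this minimal sketch. It assumes ggplot2 is installed and uses `mtcars` with its `wt` and `mpg` columns purely as an example.

```r
library(ggplot2)

# data=     which dataset to draw from
# mapping=  which columns go on which axes
# geom      what shape the data takes on the chart (points here)
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

# Printing the object draws the chart
print(p)
```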
aesthetics
color / fill
size
alpha (transparency)
shape
choosing the right chart depends on…
the data type of the columns: is the data numerical or categorical?
the objective of the chart: what is trying to be conveyed with your chart?
distribution chart
shows how values in a dataset are spread out or clustered.
highlights the range of data, concentration of data points, and whether data tends to be skewed towards specific values.
EX]: histograms, boxplots, violin, and density plots
correlation chart
used to examine the relationship between two (or more) numerical variables.
! correlation does not imply causation
EX]: scatter plot, smoothing lines, 2d charts, heatmaps, and correlograms.
ranking chart
displays how different categories (categorical variables) compare in terms of a certain measure
EX]: bar charts, lollipop charts, dot plots
evolution chart / time-series chart
shows how a variable changes over time
highlights trends, patterns, seasonality, or fluctuations over a period
EX]: line chart
geom_col()
used to show a categorical variable against a numerical variable; takes two variables (x and y), and the bar heights come straight from the data
geom_bar()
takes only one variable (x or y); the bar heights are the count of rows in each category
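A side-by-side sketch of the two bar geoms on `mtcars`, assuming ggplot2 is installed. The summary table `avg` is built with base R's aggregate() just to have one value per category for geom_col().

```r
library(ggplot2)

# geom_bar(): one variable; ggplot counts the rows in each category
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# geom_col(): two variables; bar height is a value already in the data
avg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
p_col <- ggplot(avg, aes(x = factor(cyl), y = mpg)) +
  geom_col()
```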
static / local
if you put an aesthetic in the geom function, outside aes(), it is…
sets the aesthetic to one fixed value for the whole geom; no legend is added
dynamic / global
if you put an aesthetic in the mapping=aes(), it is…
maps an aesthetic to a variable in the data and automatically includes a legend
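The two cases, sketched on `mtcars` with ggplot2 assumed to be installed. The color "blue" and the choice of `cyl` as the mapped variable are arbitrary.

```r
library(ggplot2)

# Static / local: color is set in the geom, so every point is blue, no legend
p_static <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue")

# Dynamic / global: color is mapped to a variable inside aes(),
# so each cyl value gets its own color and a legend appears
p_dynamic <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()
```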
facet_wrap()
splits the chart into panels by a single variable
facet_grid()
splits the chart into a grid of panels by the combination of two variables
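A short sketch of both facet functions, assuming ggplot2 is installed. `cyl` and `am` are columns of `mtcars` picked only for illustration.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# One panel per value of cyl
p_wrap <- base + facet_wrap(~ cyl)

# Rows by am, columns by cyl: one panel per combination
p_grid <- base + facet_grid(am ~ cyl)
```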
linetype= argument
adds different line styles (solid, dashed, etc) to differentiate
se=F argument
removes the gray area (the confidence interval around the smooth) in the geom_smooth() function; se=T, the default, displays it
method="lm" argument
fits a straight line (a linear model) in the geom_smooth() function
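The last three arguments combined in one sketch on `mtcars`, assuming ggplot2 is installed. Mapping linetype to `cyl` is an arbitrary choice, and `formula = y ~ x` is spelled out only to avoid a message when the plot is drawn.

```r
library(ggplot2)

# Straight fitted line with the gray confidence band removed
p_lm <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x)

# One fitted line per cyl group, told apart by line style
p_lines <- ggplot(mtcars, aes(x = wt, y = mpg, linetype = factor(cyl))) +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x)
```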