Wide format
“Spreadsheet style”
Each row is a country, columns represent HIV values over a series of years
Better for viewing
Easier to create variables that compare response values across time periods
Long format
Each row is a different country-year combination
Each row represents only one distinct observation
Better for data analysis
Easier to add new variables or combine info in multiple tables
Tidy data
Array of rows and columns
Rows (items/cases) each represent a specific, unique, and similar sort of thing
Columns (variables) each hold the same sort of value for every row
pivot_wider
Long → wide
values_from
name of the variable in the narrow format to be divided up into multiple variables in the resulting wide format.
names_from
name of the variable in the narrow format that identifies, for each case, which column in the wide format will receive the value
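A minimal sketch of pivot_wider with names_from and values_from. The table hiv_long and its numbers are made up for illustration; only the function and argument names come from tidyr.

```r
library(tidyr)
library(tibble)

# Hypothetical long-format data: one row per country-year combination
hiv_long <- tribble(
  ~country, ~year, ~hiv,
  "A",      2000,  1.2,
  "A",      2001,  1.4,
  "B",      2000,  0.3,
  "B",      2001,  0.4
)

# names_from: the variable whose values become the new column names
# values_from: the variable whose values fill those new columns
hiv_wide <- pivot_wider(hiv_long, names_from = year, values_from = hiv)

names(hiv_wide)  # "country" "2000" "2001"
```

The result has one row per country and one column per year.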
pivot_longer
wide → long
names_to
name of the new variable in the narrow form that will hold the names of the wide-form columns being gathered; those column names become its categorical levels.
values_to
the variable that is to hold the values in the variables being gathered – it should reflect what those values actually represent
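A minimal sketch of pivot_longer with names_to and values_to. The table hiv_wide and its numbers are made up for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical wide-format data: one row per country, one column per year
hiv_wide <- tibble(
  country = c("A", "B"),
  `2000`  = c(1.2, 0.3),
  `2001`  = c(1.4, 0.4)
)

hiv_long <- pivot_longer(
  hiv_wide,
  cols      = -country,  # gather every column except country
  names_to  = "year",    # new column holding the old column names
  values_to = "hiv"      # new column holding the gathered values
)
```

The old column names arrive as character strings, so year usually needs converting to a number afterwards.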
list-columns
A column where each cell contains an entire mini dataset instead of a single value.
Each row is a group (i.e. subjects)
A variable of type list
nest function
Turns a long dataset into one row per group, with the remaining data packed into a tibble in a list-column.
‘unnest’ expands a list-column back into regular rows and columns
Example:
Before
subject time score
A 1 10
A 2 12
B 1 8
After
subject data
A <tibble of A’s rows>
B <tibble of B’s rows>
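The before/after example above can be sketched with tidyr as follows. The variable names scores, nested and restored are chosen here for illustration.

```r
library(tidyr)
library(tibble)

# The "Before" table from the example
scores <- tribble(
  ~subject, ~time, ~score,
  "A",      1,     10,
  "A",      2,     12,
  "B",      1,     8
)

# One row per subject; time and score are packed into a list-column named data
nested <- nest(scores, data = c(time, score))

# unnest() expands the list-column back into ordinary rows and columns
restored <- unnest(nested, cols = data)
```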
pull function
Extracts a single column from a data frame as a vector (a list when the column is a list-column)
map function
Apply a function to each element of a list
Use map with pull to get a specific measured variable
Use the result of that (a list) with map to perform calculations on this list
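A sketch of the pull-then-map pattern on a nested table, assuming the small subject/time/score data used in the nest example. The helper names tables, score_list and mean_scores are illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

scores <- tribble(
  ~subject, ~time, ~score,
  "A",      1,     10,
  "A",      2,     12,
  "B",      1,     8
)
nested <- nest(scores, data = c(time, score))

# pull() extracts the list-column as a plain list of tibbles
tables <- pull(nested, data)

# First map: pull one measured variable out of each tibble
score_list <- map(tables, function(d) pull(d, score))

# Second map: perform a calculation on each element of that list
mean_scores <- map_dbl(score_list, mean)
```

Here mean_scores holds one mean per subject, in the same order as the rows of nested.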
Native R data file format
.rds (a single R object)
Write: saveRDS
Read: readRDS
.rda or .RData (one or more named objects): written with save, read with load
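A small base R sketch of both native formats. saveRDS/readRDS store a single object (conventionally in a .rds file), while save/load store named objects in a .rda or .RData file. The data frame and the temporary paths are made up for illustration.

```r
scores <- data.frame(subject = c("A", "A", "B"), score = c(10, 12, 8))

# Single object: saveRDS() writes it, readRDS() returns it under any name you like
rds_path <- tempfile(fileext = ".rds")
saveRDS(scores, rds_path)
scores_back <- readRDS(rds_path)

# Named objects: save() writes them, load() restores them under their original names
rda_path <- tempfile(fileext = ".RData")
save(scores, file = rda_path)
env <- new.env()
load(rda_path, envir = env)  # creates `scores` inside env
```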
packages for reading files
readxl: Excel
googlesheets4: google sheets
dbplyr/DBI: relational databases on remote servers
readr: .csv files
rvest: HTML tables
read.csv (base R)
readr::read_csv
reading HTML files
rvest converts HTML to an R structure, then converts the HTML tables to R data tables
read_html: parses the webpage into an R object; html_elements("table") with html_table then produces a list containing the tables from the webpage
purrr::pluck: extracts any table from the list, which can be stored as a tibble
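A sketch of the table-scraping steps. To stay self-contained it parses an inline HTML string with rvest's minimal_html instead of calling read_html on a real URL; the table contents are made up.

```r
library(rvest)
library(purrr)

# Stand-in for read_html("<some url>"): parse a small inline page
page <- minimal_html('
  <table>
    <tr><th>subject</th><th>score</th></tr>
    <tr><td>A</td><td>10</td></tr>
    <tr><td>B</td><td>8</td></tr>
  </table>
')

# Find every <table> element, then convert each one to a tibble
table_nodes <- html_elements(page, "table")
tables <- html_table(table_nodes)  # a list of tibbles, one per table

# pluck() extracts one table from the list by position
first_table <- pluck(tables, 1)
```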
data cleaning: strings and numbers
parse_number: takes a character string and translates into numeric value
parse_character: parses a column as character strings; to convert a number value/column into character, use as.character
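A short sketch of cleaning numbers stored as text. The input strings are made up; as.character is the base R function for going the other way, from number to string.

```r
library(readr)

# parse_number() drops the non-numeric characters around the first number it finds
amounts <- parse_number(c("$1,200", "45%", "3.5 kg"))
amounts  # 1200.0 45.0 3.5

# Going the other direction: numeric to character
labels <- as.character(c(1, 2.5))
labels  # "1" "2.5"
```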
Dates
Usually need to convert from character strings to date type
lubridate package
‘Date’ and ‘dttm’ (date-time) values
‘interval’ function creates a time span between two dates/times, used to compute differences in date/time
‘hour’, ‘month’ functions extract parts of variables that are stored as dates or times
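A sketch of the lubridate steps, using made-up dates. ymd, ymd_hms and time_length are lubridate functions not named in the cards; they are used here as one common way to parse strings and measure an interval.

```r
library(lubridate)

# Convert character strings to Date and date-time (dttm) values
d  <- ymd("2024-03-15")
dt <- ymd_hms("2024-03-15 14:30:00")

# Extract parts of a date or date-time
m <- month(d)  # 3
h <- hour(dt)  # 14

# interval() builds the span between two dates; time_length() measures it
span <- interval(ymd("2024-01-01"), d)
days_elapsed <- time_length(span, "day")
```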
Factors/strings
Factor: objects containing levels of a categorical variable
Allows custom ordering (fct_relevel)
readr::read_csv reads character strings by default, not factors
‘forcats’ package has tools for wrangling factor data
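A minimal sketch of custom level ordering with fct_relevel. The size vector is made up for illustration.

```r
library(forcats)

# factor() orders levels alphabetically by default
size <- factor(c("medium", "small", "large", "small"))
levels(size)  # "large" "medium" "small"

# fct_relevel() moves the named levels to the front, in the order given
size_ordered <- fct_relevel(size, "small", "medium", "large")
levels(size_ordered)  # "small" "medium" "large"
```

The new order carries over to anything that respects factor levels, such as plot axes and summary tables.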
Vectorized operations
Work like an implicit for loop, without having to write the loop
Take vector as input, perform an operation on every element, return vector as output
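A small base R sketch comparing a vectorized call with the equivalent explicit loop. The input vector is made up.

```r
x <- c(1, 4, 9)

# Vectorized: sqrt() is applied to every element in a single call
roots <- sqrt(x)

# Equivalent explicit for loop
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

identical(roots, roots_loop)  # TRUE
```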
map functions
Iteratively apply an R function to each element of a vector
map: collection of outputs stored as a list
map_dbl: numeric vector
map_lgl, map_int, map_chr: logical, int, character
map_dfr: collects results into data frame
Base R: lapply, tapply
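A sketch of how the map variants differ only in the type of output they return. The list x is made up; map_dfr needs dplyr installed for the row-binding step.

```r
library(purrr)

x <- list(a = 1:3, b = 4:6)

as_list <- map(x, mean)      # list of outputs
as_dbl  <- map_dbl(x, mean)  # numeric vector
as_int  <- map_int(x, length)                                   # integer vector
as_lgl  <- map_lgl(x, function(v) all(v > 2))                   # logical vector
as_chr  <- map_chr(x, function(v) paste(v, collapse = "-"))     # character vector

# map_dfr: each call returns a data frame; the results are stacked by row
as_df <- map_dfr(
  x,
  function(v) data.frame(mean = mean(v), n = length(v)),
  .id = "name"
)

# Base R counterpart of map()
base_list <- lapply(x, mean)
```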
across function
Apply a function across columns
Used with summarize, mutate
‘where’ with ‘is.numeric’, i.e. where(is.numeric), selects only the numeric columns
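A minimal sketch of across inside summarize and mutate, with where(is.numeric) choosing the columns. The data frame df is made up for illustration.

```r
library(dplyr)

df <- tibble(
  group = c("a", "a", "b"),
  x     = c(1, 2, 3),
  y     = c(10, 20, 30)
)

# summarize: take the mean of every numeric column within each group
means <- df %>%
  group_by(group) %>%
  summarize(across(where(is.numeric), mean))

# mutate: transform every numeric column in place
doubled <- df %>%
  mutate(across(where(is.numeric), function(v) v * 2))
```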
iteration over subgroups
‘group_modify’ applies functions to subgroups of a data frame
Define groups with ‘group_by’
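A sketch of group_modify, assuming a made-up table df. The function receives each subgroup's rows and its one-row key, and must return a data frame; here it keeps the row with the largest value.

```r
library(dplyr)

df <- tibble(
  group = c("a", "a", "b", "b", "b"),
  value = c(3, 1, 5, 2, 9)
)

top_per_group <- df %>%
  group_by(group) %>%  # define the subgroups
  group_modify(function(rows, key) slice_max(rows, value, n = 1)) %>%
  ungroup()
```

The grouping column is added back to the result automatically, so the function does not need to return it.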
Animation plots
gganimate
transition_time: continuous; specify the name of the variable that changes from frame to frame (e.g. year)
transition_states: discrete; if the plot changes over the levels of a discrete variable, specify the name of that discrete variable
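A sketch of both transition types on a made-up data frame. It only builds the animation objects; rendering with animate() is left commented out because it needs a renderer package such as gifski to be installed.

```r
library(ggplot2)
library(gganimate)

# Hypothetical data: a value per country per year
df <- data.frame(
  year    = rep(2000:2002, each = 2),
  country = rep(c("A", "B"), times = 3),
  value   = c(1, 2, 2, 3, 3, 5)
)

# Continuous: frames follow the numeric variable year
anim_time <- ggplot(df, aes(country, value)) +
  geom_point() +
  transition_time(year) +
  labs(title = "Year: {frame_time}")

# Discrete: frames step through the levels of country
anim_states <- ggplot(df, aes(year, value)) +
  geom_point() +
  transition_states(country)

# animate(anim_time)  # renders the animation
```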