Midterm 2 Codes

The tidyverse tools below are designed for data science workflows in which commands are chained in a linear sequence and plots are built up in layers.

Core Packages & Philosophy

readr, dplyr, ggplot2, tidyr, lubridate: Core tidyverse packages for importing, transforming, and visualizing tidy data (lubridate handles dates and times).
|> (Native Pipe) or %>% (Magrittr Pipe): Used to chain commands together by inserting the left-hand object as the first argument of the right-hand function.
tibble: A modern data frame that features enhanced printing and stricter behavior.
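
A minimal sketch of how the pipe rewrites a call, using only base R so no packages are needed. The vector x and the names nested, piped, and first_rows are made up for illustration; mtcars is a data set that ships with R, and the native pipe assumes R 4.1 or later.

```r
# Native pipe (R >= 4.1): x |> f() is evaluated as f(x).
x <- c(1, 4, 9)

nested <- sum(sqrt(x))          # nested call, read inside-out
piped  <- x |> sqrt() |> sum()  # piped call, read left to right

identical(nested, piped)        # both forms compute the same value

# The left-hand object becomes the FIRST argument,
# so this is the same as head(mtcars, 3).
first_rows <- mtcars |> head(3)
print(first_rows)
```

The %>% pipe behaves the same way for simple chains like these, but it needs the magrittr package (or a package that re-exports it, such as dplyr) to be loaded first.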

Data Manipulation (dplyr)

Data Visualization (ggplot2 & Extensions)

ggplot(): Initializes a plot using the Grammar of Graphics.
aes(): Maps data variables to visual properties like color, size, or shape.
geom_point(), geom_line(), geom_histogram(), geom_density(): Layers for scatterplots, lines, histograms, and density plots.
geom_errorbar(): Adds visual representations of uncertainty (e.g., ±2 SE).
facet_wrap(): Splits a plot into side-by-side panels, one per level of a categorical variable, for easier comparison.
scale_x_continuous(): Customizes axes (e.g., applying a log transformation).
labs(), ggtitle(): Adds labels and titles to plots.
ggcorr(), ggpairs(): ggplot2-based functions from the GGally package; ggcorr() draws a correlation-matrix heatmap and ggpairs() draws a matrix of pairwise plots.
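
The layers listed above can be combined roughly as follows. This is only a sketch: it assumes ggplot2 is installed, uses the built-in mtcars data as a stand-in for course data, and the object names p, grp, means, and p2 are invented for the example.

```r
library(ggplot2)

# Scatterplot built layer by layer: data + aesthetics, points, panels, axis, labels.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  facet_wrap(~ am) +                       # one panel per transmission type
  scale_x_continuous(trans = "log10") +    # newer ggplot2 versions name this argument `transform`
  labs(x = "Weight (1000 lbs, log scale)", y = "Miles per gallon",
       color = "Cylinders") +
  ggtitle("Fuel economy vs. weight")
print(p)

# Group means with +/- 2 SE error bars (summary computed in base R).
grp <- split(mtcars$mpg, mtcars$cyl)
means <- data.frame(
  cyl  = factor(names(grp)),
  mean = sapply(grp, mean),
  se   = sapply(grp, function(v) sd(v) / sqrt(length(v)))
)
p2 <- ggplot(means, aes(x = cyl, y = mean)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean - 2 * se, ymax = mean + 2 * se), width = 0.2) +
  labs(x = "Cylinders", y = "Mean mpg (+/- 2 SE)")
print(p2)
```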

The remaining sections cover "Base R" functions (pre-installed) and specialized packages used for traditional statistical analysis.

Data Management & Math (Base R)

data.frame(), matrix(): Create standard data structures.
nrow(), head(), sample(): Used to count rows, view the top of a dataset, or take random samples (often for bootstrapping).
round(), apply(), sort(): Used for rounding numbers, applying a function over the rows or columns of a matrix, or ordering vectors.
na.omit(), complete.cases(): Remove rows containing missing (NA) values, or flag which rows are complete.
mean(), sd(), sqrt(), table(), quantile(): Core mathematical functions for statistics and scaling.
log(): Computes the natural log, often used to straighten a curved relationship.
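
A short sketch of several of these functions working together, using a small made-up data frame (df and its values are invented for illustration, as are the other object names). The bootstrap at the end is one common use of sample(), not necessarily the exact recipe used in the course.

```r
set.seed(1)  # make the random resampling reproducible

df <- data.frame(x = c(1, 2, NA, 4, 5),
                 y = c(2.1, 3.9, 6.2, NA, 10.1))

n_rows   <- nrow(df)             # number of rows, including those with NAs
complete <- complete.cases(df)   # TRUE for rows with no missing values
clean    <- na.omit(df)          # drops every row that contains an NA

round(mean(clean$y), 2)          # mean of y, rounded to 2 decimals
sd(clean$y) / sqrt(nrow(clean))  # standard error of the mean

m <- matrix(1:6, nrow = 2)
col_sums <- apply(m, 2, sum)     # MARGIN = 2 means "apply over columns"

# Bootstrap: resample y with replacement many times and keep each mean.
boot_means <- replicate(1000, mean(sample(clean$y, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))  # rough 95% percentile interval

log_y <- log(clean$y)            # natural log of each value
```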

Graphics & Visualization (Base R)

plot(): The standard base function for creating scatterplots.
jitter(): Adds random noise to coordinates to reveal overlapping data density.
abline(), lines(): Adds regression lines or trend lines to an existing plot.
legend(), mtext(): Adds descriptive legends or text to plot margins.
range(): Finds the minimum and maximum values for axis scaling.
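
One way these base graphics calls fit together is sketched below, with the built-in mtcars data standing in for course data. The lowess() smoother is an extra not listed above, used here only to give lines() a trend to draw; the names x, y, x_limits, and fit are illustrative.

```r
x <- mtcars$cyl   # takes only the values 4, 6, 8, so points overlap heavily
y <- mtcars$mpg

# Pad the axis a little beyond the observed minimum and maximum.
x_limits <- range(x) + c(-0.5, 0.5)

# jitter() spreads out the overlapping x values so individual points are visible.
plot(jitter(x), y, xlim = x_limits,
     xlab = "Cylinders (jittered)", ylab = "Miles per gallon")

fit <- lm(y ~ x)
abline(fit, col = "red")        # least-squares line from the fitted model
lines(lowess(x, y), lty = 2)    # smoothed trend line (lowess is an added assumption)

legend("topright",
       legend = c("Least-squares line", "Lowess trend"),
       col = c("red", "black"), lty = c(1, 2))
mtext("Points jittered horizontally", side = 3, line = 0.5)
```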

Regression & Correlation (Base R & Specialized)

cor(), cor.test(): Calculates correlation coefficients ($r$) and performs significance tests.
lm(): Fits linear regression models using the syntax Response ~ Predictor.
summary(), confint(): Provide the model report and confidence intervals for the coefficients (intercept and slopes).
predict(): Calculates predicted values and, via the interval argument, confidence or prediction intervals (bands).
coef(), I(): Extracts model coefficients or protects mathematical terms (like $Year^2$) in a formula.
rstudent(), hatvalues(), cooks.distance(): Diagnostic functions for identifying outliers, leverage points, and influential data.
logit(): Specialized transformation for percentages/proportions (from the car package).
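corrplot(), corrplot.mixed(), cor.mtest(): Tools from the corrplot package for visualizing correlation matrices; cor.mtest() supplies the matching p-values and confidence intervals.
chart.Correlation(): Combined visualization of scatterplots, histograms, and $r$ values (from the PerformanceAnalytics package).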
ols_plot_resid_stud(), ols_plot_resid_lev(), ols_plot_cooksd_chart(): Diagnostic charts from the olsrr package.
myResPlots(): A custom function used to check regression assumptions (linearity, variance, normality).
options(scipen = 999): A global setting that makes R print numbers (such as p-values) in fixed rather than scientific notation.
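
The base R part of this workflow might look like the sketch below. It uses only functions that ship with R and the built-in mtcars data; the model mpg ~ wt, the new weights in new_cars, and all object names are illustrative choices, not the course's actual example.

```r
# Correlation and its significance test.
r <- cor(mtcars$wt, mtcars$mpg)
cor.test(mtcars$wt, mtcars$mpg)

# Simple linear regression: Response ~ Predictor.
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)                  # coefficients, standard errors, R-squared
coef(fit)                     # intercept and slope only
confint(fit, level = 0.95)    # confidence intervals for the coefficients

# Intervals at new predictor values.
new_cars  <- data.frame(wt = c(2.5, 3.5))
conf_band <- predict(fit, newdata = new_cars, interval = "confidence")
pred_band <- predict(fit, newdata = new_cars, interval = "prediction")

# I() protects ^ so it means "squared" rather than formula syntax.
fit_quad <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# Diagnostics: outliers, leverage, and influence for every observation.
diagnostics <- data.frame(
  rstudent = rstudent(fit),
  leverage = hatvalues(fit),
  cooks    = cooks.distance(fit)
)
head(diagnostics[order(-diagnostics$cooks), ], 3)  # most influential rows first

# Temporarily prefer fixed notation when printing, then restore the old setting.
old <- options(scipen = 999)
print(summary(fit)$coefficients)
options(old)
```

In simple regression the squared correlation equals the model's R-squared, and a prediction interval is always wider than the confidence interval at the same predictor value, which gives two quick sanity checks on the output.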