Midterm 2 Codes

Group 1: Tidyverse Functions and Packages

These tools are designed for data science workflows in which commands are chained in a linear sequence with the pipe, and plots are built up in layers with +.

Core Packages & Philosophy

  • readr, dplyr, ggplot2, tidyr, lubridate: Core tidyverse packages for importing (readr), transforming and reshaping (dplyr, tidyr), working with dates and times (lubridate), and visualizing (ggplot2) tidy data.

  • |> (Native Pipe) or %>% (Magrittr Pipe): Used to chain commands together by inserting the left-hand object as the first argument of the right-hand function.

  • tibble: A modern data frame with enhanced printing (only the first rows and each column's type are shown) and stricter behavior (e.g., no partial matching of column names with $).
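A minimal base-R sketch of what the pipe does, assuming R 4.1 or later (required for |>). The vector x is made-up illustration data. The nested call and the piped chain compute the same value, but the piped version reads left to right.

```r
# Requires R >= 4.1 for the native pipe |>
x <- c(1, 4, 9, 16)   # made-up example values

# Nested call: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped call: each result becomes the FIRST argument of the next function,
# so round(1) here means round(<previous result>, 1)
piped <- x |> sqrt() |> mean() |> round(1)

nested
piped
identical(nested, piped)
```

With magrittr loaded, x %>% sqrt() %>% mean() %>% round(1) behaves the same way in this example.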

Data Manipulation (dplyr)

  • filter(): Selects rows based on specific conditions.

  • select(): Keeps only the specified columns (variables).

  • contains(): A helper for select() that picks columns whose names contain a given string (e.g., select(contains("ar"))).

  • mutate(): Creates new columns that are functions of existing ones.

  • arrange(): Sorts rows; often paired with desc() for descending order.

  • group_by(): Groups rows by one or more variables so that later steps, such as summarize(), are computed separately within each group.

  • summarize(): Collapses many rows into summary values: a single row for ungrouped data, or one row per group after group_by().
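A short sketch chaining the dplyr verbs above on the built-in mtcars data, assuming dplyr is installed. The column name wt_lbs and the hp > 100 cutoff are arbitrary choices made for illustration.

```r
library(dplyr)

# mtcars ships with R; keep the car names as a real column
cars <- as_tibble(mtcars, rownames = "model")

# select() + contains(): keep columns whose names contain "ar"
gear_cols <- cars |> select(contains("ar"))
names(gear_cols)

summary_by_cyl <- cars |>
  filter(hp > 100) |>                  # keep rows meeting a condition
  mutate(wt_lbs = wt * 1000) |>        # new column from an existing one (wt is in 1000 lbs)
  group_by(cyl) |>                     # later steps run within each cyl group
  summarize(n = n(),                   # one row per group
            mean_mpg = mean(mpg),
            mean_wt_lbs = mean(wt_lbs)) |>
  arrange(desc(mean_mpg))              # highest mean mpg first

summary_by_cyl
```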

Data Visualization (ggplot2 & Extensions)

  • ggplot(): Initializes a plot using the Grammar of Graphics.

  • aes(): Maps data variables to visual properties such as x and y position, color, size, or shape.

  • geom_point(), geom_line(), geom_histogram(), geom_density(): Layers for scatterplots, line plots, histograms, and density plots.

  • geom_errorbar(): Draws error bars between ymin and ymax, commonly used to show uncertainty (e.g., mean ± 2 SE).

  • facet_wrap(): Splits a plot into side-by-side panels, one per level of a categorical variable, for easier comparison across categories.

  • scale_x_continuous(): Customizes a continuous x axis (e.g., breaks, limits, or a log transformation).

  • labs(), ggtitle(): Adds labels and titles to plots.

  • ggcorr(), ggpairs(): ggplot2-based functions from the GGally package; ggcorr() draws a correlation-matrix heatmap, and ggpairs() draws a matrix of pairwise plots with correlation coefficients.
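A sketch of the plotting layers above, assuming ggplot2 and dplyr are installed and using the built-in mtcars data. The variable choices, the axis breaks, and the ± 2 SE error bars are illustrative only.

```r
library(ggplot2)
library(dplyr)

# Scatterplot: aes() maps wt to x, mpg to y, number of gears to color;
# facet_wrap() draws one panel per cylinder count
p1 <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(gear))) +
  geom_point() +
  facet_wrap(~ cyl) +
  scale_x_continuous(breaks = 2:5) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       color = "Gears", title = "MPG vs. weight, by cylinders")

# Error bars: mean mpg +/- 2 SE within each cylinder group
mpg_summary <- mtcars |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg),
            se = sd(mpg) / sqrt(n()))

p2 <- ggplot(mpg_summary, aes(x = factor(cyl), y = mean_mpg)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean_mpg - 2 * se, ymax = mean_mpg + 2 * se),
                width = 0.2) +
  labs(x = "Cylinders", y = "Mean MPG (+/- 2 SE)") +
  ggtitle("Mean MPG by cylinder count")

print(p1)
print(p2)
```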


Group 2: Original R Code and Specialized Packages

This group includes "Base R" functions (pre-installed) and specialized packages used for traditional statistical analysis.

Data Management & Math (Base R)

  • data.frame(), matrix(): Create standard data structures.

  • nrow(), head(), sample(): Used to count rows, view the first rows of a dataset, or draw random samples (with replace = TRUE when bootstrapping).

  • round(), apply(), sort(): Used for rounding numbers, applying a function across the rows or columns of a matrix, or ordering vectors.

  • na.omit(), complete.cases(): Handle missing (NA) values; complete.cases() flags rows with no NAs (TRUE/FALSE), and na.omit() removes rows that contain any NA.

  • mean(), sd(), sqrt(), table(), quantile(): Core summary functions for means, standard deviations, square roots (e.g., SE = sd / sqrt(n)), frequency counts, and percentiles.

  • log(): Computes the natural (base e) logarithm; often used to transform a skewed variable so that its relationship with another variable becomes more linear.
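A base-R sketch of these functions working together. The vector x is made-up data with one missing value. The bootstrap loop (1,000 resamples with a percentile interval) is one common way to use sample(); the number of resamples and the seed are arbitrary choices.

```r
x <- c(4.1, 5.3, NA, 6.0, 5.5, 4.8)   # made-up values, one missing

complete.cases(x)        # FALSE marks the NA position
x_clean <- na.omit(x)    # same values with the NA removed
n <- length(x_clean)

# Standard error of the mean: SE = sd / sqrt(n)
se <- sd(x_clean) / sqrt(n)
round(c(mean = mean(x_clean), sd = sd(x_clean), se = se), 3)

# Frequency counts of a categorical variable
table(mtcars$cyl)

# apply(): run a function over rows (MARGIN = 1) or columns (MARGIN = 2)
m <- matrix(1:6, nrow = 2)
apply(m, 2, mean)        # column means

# Bootstrap: resample the data with replacement, recompute the mean
set.seed(2)
boot_means <- numeric(1000)
for (i in 1:1000) {
  boot_means[i] <- mean(sample(x_clean, replace = TRUE))
}
quantile(boot_means, c(0.025, 0.975))   # percentile 95% bootstrap CI

log(exp(2))              # the natural log undoes exp()
```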

Graphics & Visualization (Base R)

  • plot(): The standard base function for creating scatterplots.

  • jitter(): Adds a small amount of random noise to values so that overlapping points become visible.

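  • abline(), lines(): Add lines to an existing plot; abline() draws straight lines (e.g., a fitted regression line), and lines() connects points (e.g., trend curves or prediction bands).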

  • legend(), mtext(): legend() adds a legend to a plot, and mtext() writes text in a plot margin.

  • range(): Finds the minimum and maximum values for axis scaling.
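A base-R sketch combining these graphics functions on mtcars. Because cyl takes only the values 4, 6, and 8, points overlap, which is where jitter() helps. The lowess() trend curve, the colors, and the margin text are illustrative choices, not taken from these notes.

```r
fit <- lm(mpg ~ cyl, data = mtcars)

set.seed(3)
x_jit <- jitter(mtcars$cyl, amount = 0.2)   # spread out points sharing a cyl value

plot(x_jit, mtcars$mpg,
     xlim = range(mtcars$cyl) + c(-0.5, 0.5),  # range() gives c(min, max)
     xlab = "Cylinders (jittered)", ylab = "Miles per gallon",
     main = "MPG vs. cylinders")
abline(fit, col = "red")                        # straight fitted regression line
lines(lowess(mtcars$cyl, mtcars$mpg), lty = 2)  # smooth trend curve
legend("topright", legend = c("Least-squares line", "Lowess trend"),
       col = c("red", "black"), lty = c(1, 2))
mtext("Data: mtcars", side = 3, line = 0.3, cex = 0.8)
```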

Regression & Correlation (Base R & Specialized)

  • cor(), cor.test(): Calculates correlation coefficients ($r$) and performs significance tests.

  • lm(): Fits linear regression models using the syntax Response ~ Predictor.

  • summary(), confint(): Provide the model report (coefficients, standard errors, t-tests, R-squared) and confidence intervals for the coefficients (intercept and slopes).

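  • predict(): Calculates predicted values, plus confidence intervals (interval = "confidence") or prediction intervals (interval = "prediction"); computed over a range of predictor values, these trace out confidence or prediction bands.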

  • coef(), I(): coef() extracts model coefficients; I() protects arithmetic inside a formula so it is computed literally (e.g., I(Year^2) adds a squared term).

  • rstudent(), hatvalues(), cooks.distance(): Diagnostic functions giving studentized residuals (outliers), leverage values (high-leverage points), and Cook's distance (influential points).

  • logit(): The logit transformation, log(p / (1 - p)), for proportions or percentages (from the car package).

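  • corrplot(), corrplot.mixed(), cor.mtest(): From the corrplot package; corrplot() and corrplot.mixed() visualize correlation matrices, and cor.mtest() computes p-values for each pairwise correlation.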

  • chart.Correlation(): Combined visualization of scatterplots, histograms, and $r$ values (from the PerformanceAnalytics package).

  • ols_plot_resid_stud(), ols_plot_resid_lev(), ols_plot_cooksd_chart(): Diagnostic charts from the olsrr package.

  • myResPlots(): A custom function used to check regression assumptions (linearity, variance, normality).

  • options(scipen = 999): A global setting that penalizes scientific notation, so numbers such as small p-values print in fixed notation.
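A sketch of a typical regression workflow using the base R functions above on mtcars (mpg ~ wt). The new-car weight (wt = 3) and the diagnostic cutoffs (|studentized residual| > 3, leverage > 2p/n, Cook's distance > 4/n) are common rules of thumb assumed here for illustration; a course may use different thresholds. myResPlots(), the olsrr charts, and car::logit() are left out because they need extra packages or course-provided code.

```r
options(scipen = 999)   # fixed rather than scientific notation

# Correlation, plus a test of H0: population correlation = 0
r  <- cor(mtcars$wt, mtcars$mpg)
ct <- cor.test(mtcars$wt, mtcars$mpg)
ct

# Simple linear regression: Response ~ Predictor
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)            # coefficients, SEs, t-tests, R-squared
coef(fit)               # intercept and slope
confint(fit)            # 95% CIs for intercept and slope

# Mean response (confidence) vs. a single new car (prediction)
new_car <- data.frame(wt = 3)
ci_band <- predict(fit, new_car, interval = "confidence")
pi_band <- predict(fit, new_car, interval = "prediction")
ci_band
pi_band

# Quadratic term: I() makes ^ mean "square" inside the formula
fit2 <- lm(mpg ~ wt + I(wt^2), data = mtcars)
coef(fit2)

# Diagnostics (cutoffs below are assumed rules of thumb)
stud <- rstudent(fit)
lev  <- hatvalues(fit)
cd   <- cooks.distance(fit)
p <- length(coef(fit)); n <- nrow(mtcars)
which(abs(stud) > 3)    # possible outliers
which(lev > 2 * p / n)  # high leverage
which(cd > 4 / n)       # possibly influential
```

The prediction interval is wider than the confidence interval because it adds the scatter of an individual observation to the uncertainty in the fitted mean.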