Midterm 2 Codes
Group 1: Tidyverse Functions and Packages
These tools are designed for data science workflows in which commands are chained in a linear sequence (with a pipe) or built up layer by layer (with ggplot2's +).
Core Packages & Philosophy
readr, dplyr, ggplot2, tidyr, lubridate: The primary collection of packages for importing, transforming, and visualizing tidy data.
|> (Native Pipe) or %>% (Magrittr Pipe): Used to chain commands together by inserting the left-hand object as the first argument of the right-hand function.
tibble: A modern data frame that features enhanced printing and stricter behavior.
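A minimal sketch of the pipe rule described above, using only base R so it runs without extra packages (the native |> needs R 4.1 or later; %>% behaves the same way in these simple cases once magrittr or dplyr is loaded). The built-in mtcars dataset is used purely for illustration.

```r
# The left-hand value becomes the first argument of the next function
total <- c(4, 9, 16) |> sqrt() |> sum()
total                      # 9, same as sum(sqrt(c(4, 9, 16)))

# With a data frame: subset(mtcars, cyl == 4) and then nrow() of the result
n_four_cyl <- mtcars |> subset(cyl == 4) |> nrow()
n_four_cyl                 # 11
```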
Data Manipulation (dplyr)
filter(): Selects rows that meet specific conditions.
select(): Extracts specific columns (variables).
contains(): A helper for select() that picks columns whose names contain a given string.
mutate(): Creates new columns that are functions of existing ones.
arrange(): Sorts rows; often paired with desc() for descending order.
group_by(): Groups rows by one or more variables so that later steps (e.g., summarize()) are computed separately for each group.
summarize(): Collapses many rows into a single summary row (one row per group when the data are grouped).
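A short sketch chaining the dplyr verbs listed above, assuming dplyr is installed. It uses the built-in mtcars data as a stand-in for course data; the new column name wt_lbs is hypothetical.

```r
library(dplyr)

# Columns whose names contain "ar" (in mtcars: "gear" and "carb")
ar_cols <- mtcars |> select(contains("ar"))
names(ar_cols)

mpg_by_cyl <- mtcars |>
  filter(cyl %in% c(4, 6)) |>             # keep 4- and 6-cylinder cars
  mutate(wt_lbs = wt * 1000) |>           # wt is recorded in 1000s of lbs
  group_by(cyl) |>                        # one group per cylinder count
  summarize(mean_mpg = mean(mpg),         # one summary row per group
            mean_wt_lbs = mean(wt_lbs),
            n = n()) |>
  arrange(desc(mean_mpg))                 # highest mean mpg first

mpg_by_cyl
```

Because the data are grouped before summarize(), the result has one row per cylinder count rather than a single overall mean.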
Data Visualization (ggplot2 & Extensions)
ggplot(): Initializes a plot using the Grammar of Graphics; layers are added with +.
aes(): Maps data variables to visual properties such as x/y position, color, size, or shape.
geom_point(), geom_line(), geom_histogram(), geom_density(): Layers for scatterplots, line plots, histograms, and density plots.
geom_errorbar(): Adds error bars (from ymin to ymax) to show uncertainty (e.g., ±2 SE).
facet_wrap(): Splits a plot into side-by-side panels, one per level of a categorical variable, for easier comparison.
scale_x_continuous(): Customizes a continuous x-axis (e.g., breaks, limits, or a log transformation).
labs(), ggtitle(): Add axis labels and titles to plots.
ggcorr(), ggpairs(): ggplot2-based functions from the GGally package; ggcorr() draws a correlation-matrix heatmap and ggpairs() draws a scatterplot matrix with correlations.
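A sketch of how these layers stack, assuming ggplot2 is installed and using the built-in mtcars data for illustration. Here scale_x_continuous() only sets the axis breaks; a log axis could be requested instead, for example with scale_x_log10().

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +                        # scatterplot layer
  facet_wrap(~ am) +                    # one panel per transmission type
  scale_x_continuous(breaks = 2:5) +    # custom x-axis tick marks
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders") +
  ggtitle("Fuel efficiency vs. weight, by transmission (am)")

print(p)

# Data ggplot2 computed for the first layer (one row per plotted point)
pts <- layer_data(p)
```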
Group 2: Original R Code and Specialized Packages
This group includes "Base R" functions (pre-installed) and specialized packages used for traditional statistical analysis.
Data Management & Math (Base R)
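data.frame(), matrix(): Create standard data structures.
nrow(), head(), sample(): Count rows, view the first rows of a dataset, or draw random samples (e.g., resampling with replacement for bootstrapping).
round(), apply(), sort(): Round numbers, apply a function across the rows or columns of a matrix, or order a vector.
na.omit(), complete.cases(): complete.cases() identifies rows with no missing (NA) values; na.omit() removes rows (or elements) that contain NA values.
mean(), sd(), sqrt(), table(), quantile(): Core functions for descriptive statistics and scaling (means, standard deviations, square roots, frequency counts, and percentiles; e.g., sd(x) / sqrt(n) gives a standard error).
log(): Applies a natural log transformation, often used to make a curved relationship more linear.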
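A minimal base-R sketch of the functions above on a small made-up data frame (the data, the seed, and the 1000 resamples are assumptions chosen for illustration). The last part shows one simple way to bootstrap a mean with sample() and quantile().

```r
# Hypothetical data with two missing scores
df <- data.frame(id = 1:6, score = c(2, NA, 4, 6, NA, 8))

nrow(df)                              # 6 rows, including incomplete ones
complete <- df[complete.cases(df), ]  # keep only rows with no NA
nrow(complete)                        # 4
scores <- na.omit(df$score)           # the same four values, via na.omit()

mean(scores)                          # 5
round(sd(scores), 2)                  # 2.58
sort(scores, decreasing = TRUE)       # 8 6 4 2

# apply(): sum across the rows (MARGIN = 1) of a 2 x 3 matrix
m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)                      # 9 12

# Bootstrap sketch: resample the scores with replacement 1000 times
set.seed(42)
boot_means <- numeric(1000)
for (i in seq_along(boot_means)) {
  boot_means[i] <- mean(sample(scores, replace = TRUE))
}
quantile(boot_means, c(0.025, 0.975)) # percentile 95% interval
```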
Graphics & Visualization (Base R)
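plot(): The standard base function for creating scatterplots.
jitter(): Adds a small amount of random noise to values so that overlapping points become visible.
abline(), lines(): Add lines to an existing plot; abline() draws a straight line (e.g., a fitted regression line from lm()), while lines() connects a sequence of points (e.g., a trend line or confidence band).
legend(), mtext(): Add a legend to the plot, or text in the plot margins.
range(): Returns the minimum and maximum of a vector, useful for setting axis limits.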
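A sketch combining the base graphics functions above on the built-in mtcars data (an illustrative choice). The dashed trend line uses lowess() from base R's stats package, which is an addition not listed in these notes.

```r
fit <- lm(mpg ~ wt, data = mtcars)
xr  <- range(mtcars$wt)                 # min and max weight for the x-axis

set.seed(1)                             # jitter() is random
plot(jitter(mtcars$wt), mtcars$mpg, xlim = xr, pch = 19,
     col = ifelse(mtcars$am == 1, "blue", "red"),
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit, lwd = 2)                           # fitted regression line
lines(lowess(mtcars$wt, mtcars$mpg), lty = 2)  # smoothed trend line
legend("topright", legend = c("Manual", "Automatic"),
       col = c("blue", "red"), pch = 19)
mtext("Source: built-in mtcars data", side = 3, line = 0.5)
```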
Regression & Correlation (Base R & Specialized)
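cor(), cor.test(): Calculate correlation coefficients ($r$) and test whether a correlation differs significantly from zero.
lm(): Fits linear regression models using the formula syntax Response ~ Predictor.
summary(), confint(): Provide the model report (coefficients, standard errors, p-values, $R^2$) and confidence intervals for the coefficients, including slopes.
predict(): Calculates predicted values; with interval = "confidence" or interval = "prediction", it also returns the bounds for confidence bands or prediction bands.
coef(), I(): coef() extracts model coefficients; I() protects arithmetic terms such as $Year^2$ in a formula (written I(Year^2)) so they are computed as written.
rstudent(), hatvalues(), cooks.distance(): Diagnostic functions returning studentized residuals, leverage values, and Cook's distances, used to identify outliers, high-leverage points, and influential points.
logit(): Logit transformation for proportions or percentages (from the car package).
corrplot(), corrplot.mixed(), cor.mtest(): Tools from the corrplot package for visualizing correlation matrices; cor.mtest() computes the p-value of each pairwise correlation test.
chart.Correlation(): Combined display of scatterplots, histograms, and $r$ values (from the PerformanceAnalytics package).
ols_plot_resid_stud(), ols_plot_resid_lev(), ols_plot_cooksd_chart(): Diagnostic charts from the olsrr package.
myResPlots(): A custom function used to check regression assumptions (linearity, constant variance, normality of residuals).
options(scipen = 999): Discourages scientific notation in printed output, so small numbers such as p-values are shown in fixed decimal notation where possible.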
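A base-R sketch of a typical regression workflow with the functions above, using the built-in mtcars data for illustration. The diagnostic cutoffs (|studentized residual| > 2, leverage > 2p/n, Cook's distance > 4/n) are common rules of thumb assumed here, not thresholds taken from these notes.

```r
fit <- lm(mpg ~ wt, data = mtcars)

r  <- cor(mtcars$wt, mtcars$mpg)        # about -0.87
ct <- cor.test(mtcars$wt, mtcars$mpg)   # H0: true correlation is 0
ct$p.value

summary(fit)      # coefficient table, R^2, F-test
confint(fit)      # 95% CIs for the intercept and slope
coef(fit)

# Confidence vs. prediction intervals at two new weights
new_cars <- data.frame(wt = c(2.5, 3.5))
conf_int <- predict(fit, newdata = new_cars, interval = "confidence")
pred_int <- predict(fit, newdata = new_cars, interval = "prediction")

# Quadratic term: I() makes ^ mean "square" inside the formula
fit_quad <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# Diagnostics flagged with assumed rule-of-thumb cutoffs
n      <- nrow(mtcars)
n_coef <- length(coef(fit))
which(abs(rstudent(fit)) > 2)           # possible outliers
which(hatvalues(fit) > 2 * n_coef / n)  # high leverage
which(cooks.distance(fit) > 4 / n)      # potentially influential
```

Prediction intervals come out wider than confidence intervals because they add the scatter of individual observations to the uncertainty in the fitted mean.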