Flashcards on Relational Data and Data Analysis
Relational Data
Multiple tables of data that are analyzed together. The focus is on the relations between the tables, which are established through common variables (keys).
Keys
Variables used to connect records in one table to records in another. A key is a variable, or set of variables, that uniquely identifies an observation.
Primary Key
Uniquely identifies an observation in its own table.
Foreign Key
A variable in one table that uniquely identifies an observation in another table (it matches that table's primary key), making it possible to combine data between tables.
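The two kinds of key can be checked with a few lines of dplyr. This is only a minimal sketch: the `customers` and `orders` tables and their column names are hypothetical and are not part of the flashcards.

```r
library(dplyr)

# Hypothetical tables: `customers` has the primary key `id`;
# `orders` refers to it through the foreign key `customer_id`.
customers <- tibble(id = c(1, 2, 3), name = c("Ann", "Bo", "Cy"))
orders <- tibble(order_id = c(10, 11, 12), customer_id = c(1, 1, 4))

# A primary key should never repeat: this has zero rows when `id` is unique.
duplicate_ids <- customers %>%
  count(id) %>%
  filter(n > 1)

# Foreign key values that have no matching primary key in `customers`.
orphans <- orders %>%
  filter(!customer_id %in% customers$id)
```

Note that a foreign key value may repeat in its own table (customer 1 has two orders); only the primary key has to be unique.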
Joins
Add variables from one table to another by matching observations on their keys. The type of join decides which observations are copied into the new table.
Inner Join
Matches pairs of observations when their keys are equal, outputting a new data frame with the key, x values, and y values. Unmatched rows are excluded.
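A small sketch of an inner join with dplyr, assuming two made-up tables `x` and `y` that share a column named `key`:

```r
library(dplyr)

# Hypothetical example tables; key 3 exists only in x, key 4 only in y.
x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y3"))

# Only keys present in both tables (1 and 2) survive.
inner <- inner_join(x, y, by = "key")
```

The result has the columns `key`, `val_x`, and `val_y`; the rows for keys 3 and 4 are dropped because they have no partner.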
Outer Join
Keeps observations that appear in at least one of the tables being joined.
Left Join
Retains all observations in x, and those in y that have a key in x.
Right Join
Retains all observations in y, and those in x that have a key in y.
Full Join
Retains all observations in both x and y.
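The three outer joins can be compared side by side. The sketch below assumes two made-up tables `x` and `y` joined on a shared `key` column:

```r
library(dplyr)

# Hypothetical example tables; key 3 exists only in x, key 4 only in y.
x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y3"))

left  <- left_join(x, y, by = "key")   # every key from x: 1, 2, 3
right <- right_join(x, y, by = "key")  # every key from y: 1, 2, 4
full  <- full_join(x, y, by = "key")   # every key from either table
```

Comparing `nrow()` of each result is a quick way to see which observations a join kept.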
Filtering Joins
Filter rows from x based on the presence or absence of matches in y.
semi_join()
Keeps all observations in x that have a match in y, returning just the columns from x.
anti_join()
Drops all observations in x that have a match in y, returning just the columns from x.
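A short sketch of both filtering joins, again assuming two made-up tables `x` and `y` with a shared `key` column:

```r
library(dplyr)

# Hypothetical example tables; key 3 exists only in x, key 4 only in y.
x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y3"))

# Rows of x that have a match in y (keys 1 and 2).
semi <- semi_join(x, y, by = "key")

# Rows of x that have no match in y (key 3).
anti <- anti_join(x, y, by = "key")
```

Unlike the mutating joins, neither result gains the `val_y` column: filtering joins only decide which rows of x to keep.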
Stacking Rows - bind_rows()
Combines data frames by stacking their rows. Columns are matched by name, so the tables should contain the same variables (column names), but not necessarily in the same order; a column missing from one table is filled with NA.
Stacking Columns - bind_cols()
Combines two or more data frames side by side, by columns. Rows are matched by position, so every data frame must have the same number of rows.
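Both binding functions in one sketch. The tables `a`, `b`, and `extra` are hypothetical and exist only for illustration:

```r
library(dplyr)

# Same variables in both tables, but in a different column order.
a <- tibble(id = 1:2, score = c(90, 85))
b <- tibble(score = c(70, 60), id = 3:4)

# Columns are matched by name, so the order does not matter.
stacked <- bind_rows(a, b)

# bind_cols() matches rows by position, so `extra` needs 4 rows too.
extra <- tibble(grade = c("A", "B", "C", "D"))
wide <- bind_cols(stacked, extra)
```

Because `bind_cols()` pairs rows purely by position, a join on a key is usually the safer choice whenever the row order of the two tables is not guaranteed to agree.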
library(readxl)
R package used for reading data from Excel files (.xls and .xlsx).
anchored()
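Selects a range of cells in Excel from an anchor cell (the upper-left corner) and the dimensions of the rectangle (number of rows and columns).
cell_limits()
Selects a range of cells in Excel from its upper-left and lower-right corner cells.
cell_cols()
Selects columns in Excel.
cell_rows()
Selects rows in Excel.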
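The four range helpers are passed to the `range` argument of `read_excel()`. The sketch below uses the example workbook that ships with readxl (`datasets.xlsx`); the specific rows and columns chosen are arbitrary.

```r
library(readxl)

# Example workbook bundled with readxl; the first sheet is read by default.
path <- readxl_example("datasets.xlsx")

# Rows 1 to 4 (the header row plus three rows of data).
by_rows <- read_excel(path, range = cell_rows(1:4))

# Columns B through D, all rows.
by_cols <- read_excel(path, range = cell_cols("B:D"))

# Rectangle from row 1, column 1 to row 5, column 3.
by_limits <- read_excel(path, range = cell_limits(c(1, 1), c(5, 3)))

# The same rectangle, given as an anchor cell plus 5 rows by 3 columns.
by_anchor <- read_excel(path, range = anchored("A1", dim = c(5, 3)))
```

In each case the first row of the selected range is treated as the header unless `col_names = FALSE` is supplied.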
Left join NAs
NA values in columns added by a left join indicate that no matching key was found in the "right" table for those specific observations from the "left" table.
Full Join NAs
NA values in a full join can appear on either side, indicating that a record from one table had no match in the other.
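The NA values can be used to find the unmatched records. A minimal sketch, assuming two made-up tables `x` and `y` whose non-key columns contain no NA before the join:

```r
library(dplyr)

# Hypothetical example tables; key 3 exists only in x, key 4 only in y.
x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y3"))

# Left join: NA can only appear in the columns that came from y.
left <- left_join(x, y, by = "key")
unmatched_left <- left %>% filter(is.na(val_y))

# Full join: NA can appear in the columns of either table.
full <- full_join(x, y, by = "key")
only_in_x <- full %>% filter(is.na(val_y))
only_in_y <- full %>% filter(is.na(val_x))
```

This check is only reliable under the stated assumption; if a column already held NA values before the join, `anti_join()` is a more direct way to list unmatched observations.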