Introduction to Data Analysis - Relational Data

Description and Tags

Flashcards on Relational Data and Data Analysis

22 Terms

1

Relational Data

Data spread across multiple tables, where the relations between the tables matter as much as the tables themselves. The tables are connected through shared variables called keys.

2

Keys

Variables (or sets of variables) used to connect records in one table to records in another. A key uniquely identifies an observation.

3

Primary Key

A variable (or set of variables) that uniquely identifies an observation in its own table.

4

Foreign Key

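A variable (or set of variables) in one table that matches the primary key of another table. It uniquely identifies an observation in that other table, which is what allows data from the two tables to be combined.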
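
The two kinds of key can be checked in code. Below is a minimal sketch, assuming the dplyr package is installed; the `customers` and `orders` tables and their columns are made up for illustration.

```r
library(dplyr)

# Hypothetical tables, invented for this example
customers <- tibble(customer_id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
orders    <- tibble(order_id = c(101, 102, 103), customer_id = c(1, 1, 4))

# Primary key check: no customer_id should appear more than once in customers
dupes <- customers %>% count(customer_id) %>% filter(n > 1)

# Foreign key check: orders whose customer_id has no match in customers
orphans <- orders %>% anti_join(customers, by = "customer_id")
```

If `dupes` has zero rows, `customer_id` behaves as a primary key of `customers`. Any rows in `orphans` are foreign key values that point at no customer.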

5

Joins

Mutating joins add variables from one table to another by matching observations on their keys. The type of join decides which observations are kept in the new table.

6

Inner Join

Matches pairs of observations when their keys are equal, outputting a new data frame with the key, x values, and y values. Unmatched rows are excluded.
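
As a small illustration of the inner join, here is a sketch assuming dplyr is installed. The tables `x` and `y` and their values are invented for the example.

```r
library(dplyr)

# Two toy tables that share the column "key"
x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y4"))

# Only keys present in both tables (1 and 2) survive
inner <- inner_join(x, y, by = "key")
```

Key 3 (only in `x`) and key 4 (only in `y`) are dropped, and the result has the columns `key`, `val_x`, and `val_y`.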

7

Outer Join

Keeps observations that appear in at least one of the tables being joined. Left, right, and full joins are the three outer joins.

8

Left Join

Retains all observations in x and adds the columns of y wherever the keys match. Rows of x with no match in y get NA in the columns that come from y.

9

Right Join

Retains all observations in y and adds the columns of x wherever the keys match. Rows of y with no match in x get NA in the columns that come from x.

10

Full Join

Retains all observations from both x and y, whether or not they have a match in the other table.
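
The three outer joins can be compared side by side. This is a sketch assuming dplyr is installed, with toy tables `x` and `y` invented for illustration.

```r
library(dplyr)

x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y4"))

left  <- left_join(x, y, by = "key")   # every row of x: keys 1, 2, 3
right <- right_join(x, y, by = "key")  # every row of y: keys 1, 2, 4
full  <- full_join(x, y, by = "key")   # every row of either: keys 1, 2, 3, 4
```

The only difference between the three results is which unmatched rows are kept.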

11

Filtering Joins

Filter rows from x based on the presence or absence of matches in y.

12

semi_join()

Keeps all observations in x that have a match in y, returning just the columns from x.

13

anti_join()

Drops all observations in x that have a match in y, keeping only the rows of x with no match. Like semi_join(), it returns just the columns from x.
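
The two filtering joins can be sketched as follows, assuming dplyr is installed. The tables `x` and `y` are invented for the example.

```r
library(dplyr)

x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y4"))

# Rows of x that have a match in y (keys 1 and 2); no columns from y are added
matched <- semi_join(x, y, by = "key")

# Rows of x that have no match in y (key 3)
unmatched <- anti_join(x, y, by = "key")
```

Together, `matched` and `unmatched` split the rows of `x` into two groups with no overlap.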

14

Stacking Rows - bind_rows()

Combines data frames by stacking their rows. Columns are matched by name, so the tables should contain the same variables but not necessarily in the same order. A column missing from one table is filled with NA.

15

Stacking Columns - bind_cols()

Combines two or more data frames side by side, adding columns. Rows are matched by position, so every data frame must have the same number of rows, in the same order.
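
Here is a short sketch of both binding functions, assuming dplyr is installed. All table and column names are hypothetical.

```r
library(dplyr)

a <- tibble(id = c(1, 2), score = c(90, 85))
b <- tibble(score = 70, id = 3)       # same columns as a, different order
extra <- tibble(id = 4, grade = "B")  # has a column that a lacks

# Columns are matched by name, so the order in b does not matter
stacked <- bind_rows(a, b)

# Columns missing from one table are filled with NA
with_missing <- bind_rows(a, extra)

# Rows are matched by position, so both inputs need the same number of rows
side_by_side <- bind_cols(a, tibble(passed = c(TRUE, TRUE)))
```

Because `bind_cols()` pairs rows purely by position, a join on a key is usually the safer choice when the row order is not guaranteed.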

16

library(readxl)

Loads readxl, an R package for reading data from Excel files (.xls and .xlsx).

17

anchored()

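Selects a range of cells in Excel from an anchor (upper-left) cell and the dimensions of the range, e.g. anchored("C4", dim = c(3, 2)) for 3 rows and 2 columns starting at C4.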

18

cell_limits()

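Selects a rectangular range of cells in Excel from its upper-left and lower-right corners, each given as c(row, column). An NA leaves that bound open.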

19

cell_cols()

Selects a range of columns in Excel, by letter or number, e.g. cell_cols("B:D").

20

cell_rows()

Selects a range of rows in Excel by number, e.g. cell_rows(1:4).
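
The four cell-selection helpers are passed to the `range` argument of `read_excel()`. Below is a sketch assuming readxl is installed. It reads `datasets.xlsx`, the example workbook that ships with the package, so no file of your own is needed; the object names are arbitrary.

```r
library(readxl)

# Path to an example workbook bundled with readxl
path <- readxl_example("datasets.xlsx")

# Rows 1 to 4 of the first sheet: a header row plus three data rows
first_rows <- read_excel(path, range = cell_rows(1:4))

# Columns B through D, all rows
some_cols <- read_excel(path, range = cell_cols("B:D"))

# Rectangle from row 1, column 1 to row 4, column 2 (A1:B4)
block <- read_excel(path, range = cell_limits(c(1, 1), c(4, 2)))

# The same rectangle, described as 4 rows by 2 columns anchored at A1
anchored_block <- read_excel(path, range = anchored("A1", dim = c(4, 2)))
```

By default `read_excel()` treats the first row of the selected range as column names, so a four-row range yields three rows of data.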

21

Left join NAs

NA values in columns added by a left join indicate that no matching key was found in the "right" table for those specific observations from the "left" table.

22

Full Join NAs

NA values in a full join can appear on either side, indicating that a record from one table had no match in the other.
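
These NA values can be used to find the unmatched records. The sketch below assumes dplyr is installed and that the original columns contain no NA values of their own; the tables `x` and `y` are invented for illustration.

```r
library(dplyr)

x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y4"))

# Left join: NA in val_y marks rows of x with no match in y
left <- left_join(x, y, by = "key")
no_match_in_y <- left %>% filter(is.na(val_y))

# Full join: NA can appear on either side
full <- full_join(x, y, by = "key")
only_in_x <- full %>% filter(is.na(val_y))
only_in_y <- full %>% filter(is.na(val_x))
```

If a column already holds NA values before the join, this check cannot tell them apart from unmatched rows. In that case `anti_join()` is the more reliable test.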