BUSN5000 Midterm 1

Description and Tags

Business

54 Terms

1

Mutate

Tidyverse function for creating new variables.

2

Filter

Tidyverse function for keeping only the rows that match a condition

3

Select

Tidyverse function for keeping/dropping variables

4

Group_By

Tidyverse function for grouping rows by the values of one or more variables, so that later operations run within each group

5

Summarise

Tidyverse function for collapsing data into summary statistics, such as a mean or a count, with one row per group
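To make the five verbs above concrete, here is a minimal sketch of what each one does, written in plain Python rather than the tidyverse itself. The data and variable names are made up for illustration; the comments name the verb each step imitates.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: one dict per record.
rows = [
    {"firm": "A", "year": 2020, "revenue": 100.0, "cost": 60.0},
    {"firm": "A", "year": 2021, "revenue": 120.0, "cost": 70.0},
    {"firm": "B", "year": 2020, "revenue": 80.0, "cost": 50.0},
    {"firm": "B", "year": 2021, "revenue": 90.0, "cost": 65.0},
]

# mutate: add a new variable computed from existing ones.
mutated = [{**r, "profit": r["revenue"] - r["cost"]} for r in rows]

# filter: keep only the rows that satisfy a condition.
filtered = [r for r in mutated if r["year"] == 2021]

# select: keep only some of the variables.
selected = [{"firm": r["firm"], "profit": r["profit"]} for r in mutated]

# group_by + summarise: collapse to one summary value per group.
groups = defaultdict(list)
for r in selected:
    groups[r["firm"]].append(r["profit"])
summary = {firm: mean(profits) for firm, profits in groups.items()}

print(summary)
```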

6

Population Average

The average across the entire population; the target we are trying to figure out

7

Sample average

The average of the sample we actually have, used to estimate the population average.

8

records

A data set is made up of _____ that contain information on a specific entity.

9

fields

Each record is made of _____ that contain measurements of known types.

10

Panel Data

Data collected over time on multiple entities, such as individuals, firms, or countries.

11

Acquire, Transform, Analyze, Communicate

The 4 stages of analysis

12

Cross Section

Many units observed at a particular time

13

Time Series

A single unit observed over multiple time periods

14

Data Set

Multiple data tables structured for a particular analysis

15

Database

A collection of tables where each table has some known and meaningful relationship to the other tables.

16

Volume

A word to describe the literal size and scale of data

17

Velocity

A word to describe the speed of generation, collection, and storage of data

18

Variety

A word to describe the complexity of sources and forms of data

19

Veracity

A word to describe the degree of consistency and completeness of data.

20

Content

What a variable measures

21

Validity

Whether a variable measures what it's supposed to measure

22

Reliability

Whether repeated measurements return the same value

23

Comparability

Whether a variable is measured the same way across units

24

Coverage

Whether all units intended for inclusion are included

25

Selection

Whether selected units are representative of those not covered

26

Data Schema

A representation of the data structure that comprises all the attributes of the data and their data types

27

Vector

A sequence of data elements of the same type

28

Matrix

A two-dimensional array of data elements of the same type

29

Data Frame

A tabular data structure

30

List

An ordered collection of objects

31

Factor

A vector that can contain only predefined values, and is used to store categorical data

32

Array

A multidimensional collection of same-type data elements.

33

Frequentism

The approach to probability that defines the probability of an event as its long-run relative frequency: the number of times it happens divided by the number of random trials, as the number of trials grows large

34

Law of Large Numbers

The idea that as the number of independent trials grows, the sample average gets closer and closer to the true expected value (the population average).
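A small simulation can illustrate the Law of Large Numbers. This is only a sketch under assumed settings: a fair coin, a fixed random seed, and a hypothetical helper named `share_of_heads`.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def share_of_heads(n_flips):
    """Flip a fair coin n_flips times and return the share that land heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The share should drift toward the true probability, 0.5, as n grows.
shares = {n: share_of_heads(n) for n in (10, 100, 10_000, 100_000)}
for n, share in shares.items():
    print(f"{n:>7} flips: share of heads = {share:.4f}")
```

With few flips the share can sit far from 0.5; with many flips it should settle close to it.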

35

Estimand

The thing we want to estimate

36

Estimator

The formula or procedure we apply to data to make an estimate

37

Estimate

Our best guess for the estimand from a particular sample; it can differ from the estimand because of bias and sampling error.
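The three terms estimand, estimator, and estimate are easy to mix up, so here is a minimal sketch that gives each one a line of code. The population is simulated with made-up numbers, and the function name `estimator` is chosen only for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 incomes (made-up numbers).
population = [random.gauss(50_000, 12_000) for _ in range(10_000)]

# Estimand: the population average (normally unknown in practice).
estimand = mean(population)

def estimator(sample):
    """Estimator: the rule applied to a sample, here the sample average."""
    return mean(sample)

# Estimate: the number the estimator returns for one particular sample.
sample = random.sample(population, 200)
estimate = estimator(sample)

print(f"estimand = {estimand:.0f}, estimate = {estimate:.0f}")
```

The gap between the two printed numbers is sampling error; a different random sample would give a different estimate of the same estimand.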

38

Measurement error

When a variable’s empirical measurement does not accurately capture the thing we are interested in

39

CEF

Conditional expectation function, E(Y|X): gives the expected value (average) of one variable given the value of another variable. The workhorse of data science.
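With a discrete X, a sample version of the CEF is just the average of Y within each value of X. The sketch below assumes made-up (education, wage) pairs and a hypothetical helper named `cef`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (years of education, hourly wage) pairs.
data = [(12, 15.0), (12, 17.0), (16, 24.0), (16, 28.0), (16, 26.0), (18, 35.0)]

def cef(pairs):
    """Return {x: average of y among the observations with that x}."""
    by_x = defaultdict(list)
    for x, y in pairs:
        by_x[x].append(y)
    return {x: mean(ys) for x, ys in by_x.items()}

print(cef(data))
```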

40

Law of Iterated Expectations

The law that states the unconditional expectation is equal to the weighted average of conditional expectations, weighted by how likely each value of X is. E(Y) = E[E(Y|X)]
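The identity E(Y) = E[E(Y|X)] can be checked by hand on a small sample: average Y within each value of X, then average those group means using each group's share of observations as the weight. The numbers below are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (x, y) pairs.
data = [(12, 15.0), (12, 17.0), (16, 24.0), (16, 28.0), (16, 26.0), (18, 35.0)]

# Collect the y values observed at each x.
by_x = defaultdict(list)
for x, y in data:
    by_x[x].append(y)

n = len(data)

# Weighted average of the conditional means; weight = share of rows with that x.
iterated = sum((len(ys) / n) * mean(ys) for ys in by_x.values())

# Plain average of y, ignoring x.
unconditional = mean([y for _, y in data])

print(iterated, unconditional)
```

The two printed values agree up to floating-point rounding.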

41

Covariance

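Measures how two variables move together; its sign indicates the direction of the linear relationship. Its size depends on the units of measurement, so correlation (covariance rescaled to lie between -1 and 1) is used to gauge strength.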
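A short sketch shows why covariance alone is a poor gauge of strength: changing the units of one variable changes the covariance but not the correlation. The data and function names here are made up for illustration, and the sample (n - 1) formula is an assumed convention.

```python
from statistics import mean

def covariance(xs, ys):
    """Sample covariance, using the n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5

# Hypothetical study time and test scores.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 70.0]
minutes = [h * 60 for h in hours]  # same information, different units

print(covariance(hours, score), covariance(minutes, score))
print(correlation(hours, score), correlation(minutes, score))
```

The covariance with `minutes` is 60 times the covariance with `hours`, while the two correlations match.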

42

Human Capital Theory

Models education as an investment, much like any other capital asset; predicts that the age-earnings profile will be concave.

43

Consistency

A property of an estimator: its bias and sampling error approach zero as sample size increases

44

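Central Limit Theorem

The theorem that under random sampling, given a large enough sample, the sampling distribution of the sample average approaches a normal distribution, whatever the shape of the underlying variable's distribution (as long as it has finite variance).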
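The Central Limit Theorem is about the distribution of sample averages, not of the raw data. The simulation below is a sketch under assumed settings: draws from a skewed exponential distribution with mean 1, samples of size 100, and 2,000 repeated samples. If the averages are roughly normal, about 95% of them should fall within 1.96 standard errors of the true mean.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Average of n draws from an exponential distribution with mean 1."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

n, reps = 100, 2000
means = [sample_mean(n) for _ in range(reps)]

# For this distribution the mean is 1 and the standard deviation is 1,
# so the standard error of the sample average is 1 / sqrt(n).
se = 1 / n ** 0.5
share_within = sum(abs(m - 1.0) <= 1.96 * se for m in means) / reps

print(f"share of sample means within 1.96 SE: {share_within:.3f}")
```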

45

Confidence Interval

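A range around an estimate that shows how close the estimate is likely to be to its target in the population; it is built so that, across repeated samples, a stated share of such ranges (for example 95%) contain the target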
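One way to read a 95% confidence interval is through repeated sampling: build the interval many times and count how often it contains the true value. The sketch below assumes normally distributed data with a known true mean of 10, uses the common "estimate plus or minus 1.96 standard errors" approximation, and introduces a hypothetical helper named `ci95`.

```python
import random
from statistics import mean, stdev

random.seed(3)  # fixed seed so the run is repeatable

def ci95(sample):
    """Approximate 95% interval: sample mean +/- 1.96 standard errors."""
    m = mean(sample)
    se = stdev(sample) / len(sample) ** 0.5
    return m - 1.96 * se, m + 1.96 * se

true_mean = 10.0
reps = 1000
covered = 0
for _ in range(reps):
    sample = [random.gauss(true_mean, 2.0) for _ in range(50)]
    low, high = ci95(sample)
    covered += low <= true_mean <= high  # True counts as 1

coverage = covered / reps
print(f"share of intervals containing the true mean: {coverage:.3f}")
```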

46

Item Nonresponse

When a respondent takes part but specific items are missing, for example because they refused to answer a particular question

47

Unit Nonresponse

When a sampled unit provides no data at all, for example a person who could not be reached or declined to take part

48

Missing Completely at Random

Selection into a dataset (whether a value is missing) is completely independent of X and Y.

49

Missing at Random

Selection into a dataset depends on X, but not other unobserved factors.

50

Exogenous

Anything that went wrong with sampling is external: it is driven by factors outside the relationship being studied.

51

Endogenous

Anything that went wrong with sampling is internal: it is driven by factors inside the relationship being studied.

52

Imputation

The process of filling in the missing values based on data you observe
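Mean imputation is one simple version of this idea: each missing value is replaced with the average of the observed values. The sketch below is only one of many possible approaches; the data are made up, `None` is assumed to mark a missing value, and `impute_mean` is a hypothetical helper.

```python
from statistics import mean

def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical ages with two missing entries.
ages = [23.0, None, 31.0, 27.0, None]
print(impute_mean(ages))
```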

53

Simpson’s Paradox

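The idea that a relationship seen in the combined data can weaken, disappear, or even reverse within subgroups, because there is a lurking third variable that affects the correlation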
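A small numeric example can make the paradox visible. The counts below are made up: treatment A has the higher success rate within each severity group, yet treatment B looks better once the groups are pooled, because A was mostly given to the severe cases.

```python
# Hypothetical (successes, patients) by treatment and case severity.
counts = {
    "A": {"mild": (9, 10), "severe": (60, 100)},
    "B": {"mild": (80, 100), "severe": (5, 10)},
}

def rate(successes, total):
    return successes / total

# Success rate within each severity group.
within = {
    t: {g: rate(*cell) for g, cell in groups.items()}
    for t, groups in counts.items()
}

# Success rate after pooling the two groups together.
pooled = {
    t: rate(sum(s for s, _ in groups.values()), sum(n for _, n in groups.values()))
    for t, groups in counts.items()
}

print(within)
print(pooled)
```

Here severity is the lurking third variable: it is related both to which treatment a patient gets and to the chance of success.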

54

Bayes’ Rule

A mathematical formula used to update the probability of a hypothesis based on new evidence or information. It calculates the probability of an event occurring given prior knowledge and new data.
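In symbols, Bayes' Rule is P(H|E) = P(E|H) P(H) / P(E), where P(E) = P(E|H) P(H) + P(E|not H) P(not H). The sketch below applies it to a made-up screening example; the function name `posterior` and all the input probabilities are assumptions for illustration.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) from the prior P(H) and the two likelihoods."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1 - prior)
    return numerator / evidence

# Hypothetical test: 1% of people have the condition, the test catches 95%
# of true cases, and it gives a false positive for 10% of healthy people.
p = posterior(prior=0.01, p_evidence_given_h=0.95, p_evidence_given_not_h=0.10)
print(f"P(condition | positive test) = {p:.3f}")
```

Even with a fairly accurate test, the updated probability stays below 10% in this example, because the condition is rare to begin with.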
