BUSN5000 Midterm 1

Tags: Business

54 Terms

1

Mutate

Tidyverse (dplyr) function for creating new variables (columns) or modifying existing ones.

2

Filter

Tidyverse function for keeping only the rows (observations) that match a certain condition

3

Select

Tidyverse function for keeping/dropping variables

4

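Group_by

Tidyverse function for grouping data by one or more categorical variables, so that later operations (such as summarise) are carried out separately for each group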

5

Summarise

Tidyverse function for computing summary statistics (such as a mean or a count), collapsing the data to one row per group
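The five tidyverse verbs above are R functions. To show roughly what each one does to a table, here is a plain-Python analogue that runs on hypothetical records. It only mimics the idea of each verb and is not the dplyr API. The data and column names are invented for illustration.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical example data: one dict per record (row).
rows = [
    {"name": "Ann", "dept": "Sales", "salary": 50_000, "years": 2},
    {"name": "Bob", "dept": "Sales", "salary": 70_000, "years": 8},
    {"name": "Cy", "dept": "IT", "salary": 90_000, "years": 5},
    {"name": "Di", "dept": "IT", "salary": 60_000, "years": 1},
]

# Like filter(): keep only rows matching a condition.
experienced = [r for r in rows if r["years"] >= 2]

# Like mutate(): add a new variable computed from existing ones.
with_monthly = [{**r, "monthly": r["salary"] / 12} for r in experienced]

# Like select(): keep only some variables.
slim = [{k: r[k] for k in ("dept", "monthly")} for r in with_monthly]

# Like group_by() + summarise(): one summary row per group.
groups = defaultdict(list)
for r in slim:
    groups[r["dept"]].append(r["monthly"])
summary = {dept: {"n": len(v), "avg_monthly": fmean(v)} for dept, v in groups.items()}
print(summary)
```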

6

Population Average

The average of a variable across the entire population; the unknown quantity we are trying to estimate

7

Sample average

The average computed from the sample we actually observe; used to estimate the population average.
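The sketch below contrasts the two averages. It builds a hypothetical population, treats its mean as the estimand, and computes the sample average from one random sample as the estimate. The population values, its size, and the sample size are all assumptions made for illustration.

```python
import random
from statistics import fmean

random.seed(1)

# Hypothetical population of 10,000 incomes (in $1,000s); its mean is the estimand.
population = [random.gauss(50, 15) for _ in range(10_000)]
population_average = fmean(population)

# In practice we only observe a random sample; its mean is our estimate.
sample = random.sample(population, 200)
sample_average = fmean(sample)

print(f"population average: {population_average:.2f}")
print(f"sample average:     {sample_average:.2f}")
```

The two numbers differ by sampling error. Drawing a different sample gives a slightly different sample average.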

8

records

A data set is made up of _____ that contain information on a specific entity.

9

fields

Each record is made of _____ that contain measurements of known types.

10

Panel Data

Data collected over time on multiple entities, such as individuals, firms, or countries.

11

Acquire, Transform, Analyze, Communicate

The 4 stages of analysis

12

Cross Section

Many units observed at a single point in time

13

Time Series

A single unit observed over multiple time periods

14

Data Set

Multiple data tables structured for a particular analysis

15

Database

A collection of tables where each table has some known and meaningful relationship to the other tables.

16

Volume

A word to describe the literal size and scale of data

17

Velocity

A word to describe the speed of generation, collection, and storage of data

18

Variety

A word to describe the complexity of sources and forms of data

19

Veracity

A word to describe the degree of consistency and completeness of data.

20

Content

What a variable measures

21

Validity

Whether a variable measures what it is supposed to measure

22

Reliability

Whether repeated measurements return the same value

23

Comparability

Whether a variable is measured the same way across units

24

Coverage

Whether all units intended for inclusion are included

25

Selection

Whether the units that are covered are representative of those that are not covered

26

Data Schema

A representation of the data structure that comprises all the attributes of the data and their data types

27

Vector

A sequence of data elements of the same type

28

Matrix

A two-dimensional array of data elements of the same type

29

Data Frame

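A tabular data structure in which each column is a vector of the same length, and different columns can hold different types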

30

List

An ordered collection of objects, which can be of different types

31

Factor

A vector that can contain only predefined values, and is used to store categorical data

32

Array

A multidimensional collection of same-type data elements.

33

Frequentism

The approach to probability that defines the probability of an event as its long-run relative frequency: the number of times it happens divided by the number of random trials, as the number of trials grows large

34

Law of Large Numbers

The idea that as the number of trials (the sample size) grows, the sample average gets closer and closer to the population average (the expected value).
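A quick simulation shows both frequentism and the law of large numbers at work. The share of heads in simulated fair-coin flips drifts toward the true probability of 0.5 as the number of flips grows. This is a sketch; the fair coin and the flip counts are assumptions chosen for illustration.

```python
import random

random.seed(42)

def share_of_heads(n_flips: int) -> float:
    """Relative frequency of heads in n_flips simulated fair-coin flips (P(heads) = 0.5)."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The relative frequency settles near 0.5 as the number of trials grows.
for n in (10, 100, 1_000, 100_000):
    print(f"{n:>7} flips: share of heads = {share_of_heads(n):.4f}")
```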

35

Estimand

The thing we want to estimate

36

Estimator

The formula or procedure we use to compute an estimate from data

37

Estimate

Our best guess for the estimand: the value an estimator produces from a particular sample, which may contain bias and sampling error.

38

Measurement error

When a variable’s empirical measurement does not accurately capture the thing we are interested in

39

CEF (Conditional Expectation Function)

The workhorse of data science: the expected value (average) of one variable given the value of another variable, E(Y|X).
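When X takes only a few discrete values, the empirical CEF is simply the average of Y within each group of X. The sketch below computes it for hypothetical (years of education, hourly wage) pairs. The data and the helper name `cef` are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (years of education, hourly wage) observations.
data = [(12, 18), (12, 22), (16, 30), (16, 34), (16, 26), (18, 40)]

def cef(pairs):
    """Return E(Y | X = x) for each observed x: the mean of Y within each X group."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in sorted(groups.items())}

print(cef(data))  # {12: 20, 16: 30, 18: 40}
```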

40

Law of Iterated Expectations

The law that states the unconditional expectation is equal to the weighted average of conditional expectations: E(Y) = E[E(Y|X)].
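The law can be checked directly on a small hypothetical dataset. Take each conditional mean E(Y|X = x), weight it by the share of observations with X = x, and sum the results. This recovers the overall mean E(Y) exactly. Fractions are used so that the two sides match with no rounding. The data are invented for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical (x, y) observations.
data = [(0, 2), (0, 4), (1, 5), (1, 7), (1, 9)]
n = len(data)

# Left side: the unconditional mean E(Y).
e_y = Fraction(sum(y for _, y in data), n)

# Right side: conditional means E(Y | X = x), weighted by P(X = x).
ys_by_x = defaultdict(list)
for x, y in data:
    ys_by_x[x].append(y)
e_of_cef = sum(
    Fraction(len(ys), n) * Fraction(sum(ys), len(ys)) for ys in ys_by_x.values()
)

print(e_y, e_of_cef)  # 27/5 27/5
```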

41

Covariance

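Indicates the direction in which two variables move together: positive if they tend to rise together, negative if one tends to rise as the other falls. Its size depends on the units of measurement, so correlation is used to judge the strength of the relationship.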

42

Human Capital Theory

Models education as an investment, much like an investment in any other capital asset; predicts that the age-earnings profile will be concave.

43

Consistency

An estimator is consistent if its bias and sampling error approach zero as the sample size increases

44

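Central Limit Theorem

The theorem that, under random sampling and given enough data, the distribution of the sample average approaches a normal distribution, whatever the shape of the underlying variable's distribution (as long as its variance is finite).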
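The simulation below illustrates the theorem. It draws many samples of size 50 from a skewed exponential distribution with mean 1 and standard deviation 1. The sample averages then centre on 1, with spread close to 1/√n. About 95% of them land within 1.96 standard errors, as a normal distribution predicts. The distribution, the sample size, and the number of repetitions are illustrative assumptions.

```python
import random
from statistics import fmean, stdev

random.seed(0)

def sample_mean(n: int) -> float:
    """Average of n draws from a skewed exponential distribution (mean 1, sd 1)."""
    return fmean(random.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(5_000)]
se = 1 / n ** 0.5  # theoretical standard deviation of the sample average

# Under the CLT, about 95% of sample averages fall within 1.96 * se of the true mean 1.
share_within = sum(abs(m - 1) < 1.96 * se for m in means) / len(means)
print(fmean(means), stdev(means), se, share_within)
```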

45

Confidence Interval

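A range of values, computed from the sample, that expresses how precisely an estimate pins down its target in the population. A 95% confidence interval is constructed so that, across repeated samples, about 95% of such intervals would contain the true population value.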
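The sketch below builds a normal-approximation 95% interval for a mean, using the sample average ± 1.96 standard errors. It then checks the repeated-sampling interpretation by counting how often such intervals cover the true mean. The population (mean 100, sd 20), the sample size of 400, and the helper name `mean_ci` are hypothetical. For small samples a t critical value would usually replace the normal one.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(7)

def mean_ci(data, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(data)
    se = stdev(data) / n ** 0.5                 # standard error of the sample mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 when level = 0.95
    m = fmean(data)
    return m - z * se, m + z * se

# One hypothetical sample of 400 draws from a population with mean 100, sd 20.
sample = [random.gauss(100, 20) for _ in range(400)]
low, high = mean_ci(sample)
print(f"95% CI: ({low:.2f}, {high:.2f})")

# Repeated sampling: roughly 95% of the intervals should contain the true mean.
trials = 1_000
hits = 0
for _ in range(trials):
    lo, hi = mean_ci([random.gauss(100, 20) for _ in range(400)])
    hits += lo <= 100 <= hi
print(f"coverage: {hits / trials:.3f}")
```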

46

Item Nonresponse

When a respondent participates but some of their answers are missing, for example because they refused to answer particular questions

47

Unit Nonresponse

When data is missing entirely for some units because it was never collected from them (for example, people who did not respond at all)

48

Missing Completely at Random

Missingness (selection into the dataset) is completely independent of X and Y.

49

Missing at Random

Selection into a dataset depends only on the observed X, not on Y or other unobserved factors.
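The simulation below contrasts the two missingness mechanisms. Under MCAR, the average of the observed wages is close to the true average. Under MAR, the naive average is biased, because missingness depends on X. Averaging within each X group and weighting by the population share of X removes the bias. The population, the response probabilities, and the assumption that X is known for everyone are all hypothetical.

```python
import random
from statistics import fmean

random.seed(3)

# Hypothetical population: x = 1 for college graduates, y = hourly wage.
people = []
for _ in range(100_000):
    x = int(random.random() < 0.4)
    y = 20 + 15 * x + random.gauss(0, 5)
    people.append((x, y))
true_mean = fmean(y for _, y in people)
share_x1 = fmean(x for x, _ in people)  # assumes x is observed for everyone

# MCAR: each wage is observed with probability 0.5, unrelated to x and y.
mcar = [(x, y) for x, y in people if random.random() < 0.5]
# MAR: graduates report more often; missingness depends only on the observed x.
mar = [(x, y) for x, y in people if random.random() < (0.9 if x else 0.3)]

def reweighted_mean(observed, share_x1):
    """Weight within-x averages by the population share of each x (valid under MAR)."""
    m1 = fmean(y for x, y in observed if x == 1)
    m0 = fmean(y for x, y in observed if x == 0)
    return share_x1 * m1 + (1 - share_x1) * m0

mcar_mean = fmean(y for _, y in mcar)   # close to true_mean
mar_mean = fmean(y for _, y in mar)     # biased upward: graduates are over-represented
fixed = reweighted_mean(mar, share_x1)  # close to true_mean again
print(round(true_mean, 2), round(mcar_mean, 2), round(mar_mean, 2), round(fixed, 2))
```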

50

Exogenous

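Anything that went wrong with sampling (selection or missingness) is external: it is unrelated to the outcome being studied, so it does not bias the results.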

51

Endogenous

Anything that went wrong with sampling is internal: it is related to the outcome being studied, so it can bias the results.

52

Imputation

The process of filling in the missing values based on data you observe
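One simple form of imputation is group-mean imputation. Each missing value is filled with the average of the observed values in the same group. The sketch below applies it to hypothetical wage records. The records, the field names, and the helper name `impute_group_mean` are illustrative assumptions, and this is only one of many imputation methods.

```python
from statistics import fmean

# Hypothetical records; None marks a missing wage.
records = [
    {"degree": "college", "wage": 32.0},
    {"degree": "college", "wage": None},
    {"degree": "college", "wage": 36.0},
    {"degree": "high school", "wage": 20.0},
    {"degree": "high school", "wage": 22.0},
    {"degree": "high school", "wage": None},
]

def impute_group_mean(rows, group_key, value_key):
    """Fill missing values with the mean of observed values in the same group."""
    observed = {}
    for r in rows:
        if r[value_key] is not None:
            observed.setdefault(r[group_key], []).append(r[value_key])
    group_means = {g: fmean(v) for g, v in observed.items()}
    return [
        {**r, value_key: group_means[r[group_key]]} if r[value_key] is None else r
        for r in rows
    ]

filled = impute_group_mean(records, "degree", "wage")
print([r["wage"] for r in filled])  # [32.0, 34.0, 36.0, 20.0, 22.0, 21.0]
```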

53

Simpson’s Paradox

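A relationship that holds within every subgroup of the data weakens, disappears, or reverses when the subgroups are combined, because a lurking third variable affects the correlation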
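The hypothetical counts below produce the reversal. Treatment A has the higher success rate among both mild and severe cases. Treatment B looks better overall, because A was used mostly on severe cases. Case severity is the lurking variable. All counts are invented for illustration.

```python
# Hypothetical (successes, attempts) for two treatments in two subgroups.
results = {
    "A": {"mild": (9, 10), "severe": (30, 100)},
    "B": {"mild": (80, 100), "severe": (2, 10)},
}

def subgroup_rates(treatment):
    """Success rate within each subgroup."""
    return {g: s / n for g, (s, n) in results[treatment].items()}

def overall_rate(treatment):
    """Success rate after pooling the subgroups."""
    pairs = results[treatment].values()
    return sum(s for s, _ in pairs) / sum(n for _, n in pairs)

for t in results:
    print(t, subgroup_rates(t), "overall:", round(overall_rate(t), 3))
# A {'mild': 0.9, 'severe': 0.3} overall: 0.355
# B {'mild': 0.8, 'severe': 0.2} overall: 0.745
```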

54

Bayes’ Rule

A mathematical formula used to update the probability of a hypothesis based on new evidence or information. It calculates the probability of an event occurring given prior knowledge and new data.
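In symbols, Bayes' rule is P(H|E) = P(E|H)·P(H) / P(E). Here P(E) comes from the law of total probability: P(E) = P(E|H)·P(H) + P(E|not H)·P(not H). The sketch below applies it to a hypothetical screening test with 1% prevalence, 90% sensitivity, and a 5% false-positive rate. These numbers are assumptions chosen for illustration.

```python
def bayes(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Posterior P(H | E) via Bayes' rule; P(E) comes from the law of total probability."""
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence

# Hypothetical screening test: 1% prevalence, 90% sensitivity, 5% false-positive rate.
posterior = bayes(prior=0.01, p_evidence_given_h=0.90, p_evidence_given_not_h=0.05)
print(round(posterior, 3))  # 0.154
```

Even after a positive result, the updated probability is only about 15%, because the prior (the prevalence) is low.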