Econ Data Analytics Midterm Spring 2025

Last updated 10:22 PM on 5/11/25

81 Terms

1
New cards

What does each row of a rectangular dataset represent?

An Observation

2
New cards

What does each column of a rectangular dataset represent?

A Variable (for observations)

3
New cards

What is Personally Identifying info? (PII)

Any information from a data set that could be used to individually identify a person

4
New cards

Give an example of PII:

address, first and last name, SSN, birthday etc

5
New cards

What is the difference between POPULATION and SAMPLE?

Population refers to the entire group of individuals or items being studied, while a sample is a subset of that population selected for analysis

6
New cards

Some studies collect and utilize qualitative data. Name one type of qualitative data.

An interview transcript

7
New cards

Name two steps you may need to take to prepare your data for Analysis?

  1. Transpose records (e.g. from horizontal to vertical or vice versa) to get the right units.

  2. Collapse records so that smaller, more specific pieces of information are combined or summed and the data are easier to draw observations from.

8
New cards

What are imputations? Give methods of imputation.

  • methods of filling in the gaps in data

  • Methods:

    • use a value from related information → e.g. someone in the same household

    • substitute the mean → overall or subgroup

    • Regressions, multiple regressions
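
The mean-based methods above can be sketched in a few lines of Python. This is only an illustration: the records, the field names (`household`, `income`) and both helper functions are hypothetical, and missing values are assumed to be stored as `None`.

```python
from statistics import mean

# Hypothetical records: income is missing (None) for two people.
records = [
    {"household": "A", "income": 40000},
    {"household": "A", "income": None},
    {"household": "B", "income": 70000},
    {"household": "B", "income": 90000},
    {"household": "B", "income": None},
]

def impute_overall_mean(rows, field):
    """Replace missing values with the mean of all observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def impute_group_mean(rows, field, group):
    """Replace missing values with the mean of the row's own subgroup."""
    by_group = {}
    for r in rows:
        if r[field] is not None:
            by_group.setdefault(r[group], []).append(r[field])
    fills = {g: mean(vals) for g, vals in by_group.items()}
    return [{**r, field: fills[r[group]] if r[field] is None else r[field]}
            for r in rows]

overall = impute_overall_mean(records, "income")
grouped = impute_group_mean(records, "income", "household")
print(overall[1]["income"], grouped[1]["income"], grouped[4]["income"])
```

The subgroup version fills each gap from the person's own household, which is usually closer to the "related information" idea than the overall mean.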

9
New cards

Name the three measures of central tendency

  • mean

  • median

  • mode

10
New cards

Name the three measures of variability (spread)

  • range

  • standard deviation

  • variance

11
New cards

Which measure(s) of variability is expressed in the same unit as

the variable? Why is that important?

  • range because it is just the difference

  • standard deviation —> makes it easier to interpret and use
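
A small sketch of the unit point, using Python's `statistics` module on made-up hourly wages (the data are hypothetical). The range and standard deviation come out in dollars, while the variance is in dollars squared.

```python
from statistics import stdev, variance

wages = [12, 15, 15, 18, 30]  # hypothetical hourly wages in dollars

value_range = max(wages) - min(wages)  # dollars
sample_var = variance(wages)           # dollars squared (n - 1 denominator)
sample_sd = stdev(wages)               # dollars (square root of the variance)

print(value_range, sample_var, round(sample_sd, 2))
```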

12
New cards

Formula for Standard Deviation: (STDEV.S or STDEV.P)

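  • sample (STDEV.S): $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$

  • population (STDEV.P): $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$
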
13
New cards

Formula for Variance: (VAR.S or VAR.P)

  • equal to std Dev squared

14
New cards

Formula for Range: (=max-min)

  • can also use “h-L+1” to make it inclusive

15
New cards

What does the correlation coefficient measure? What can’t you

say based on correlation coefficient?

  • how strongly two variables are linearly related, from -1 (perfect negative relationship) to +1 (perfect positive relationship)

  • no relation is 0

  • canNOT say that one causes the other
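
As a rough illustration, Pearson's r can be computed by hand as below. The function name `pearson_r` and the study-hours data are hypothetical; the point is that the sign gives the direction of the relationship and the magnitude gives its strength.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

hours = [1, 2, 3, 4, 5]
score_up = [52, 55, 61, 64, 70]    # rises with hours
score_down = [70, 64, 61, 55, 52]  # falls with hours

r_up = pearson_r(hours, score_up)
r_down = pearson_r(hours, score_down)
print(round(r_up, 3), round(r_down, 3))
```

Both series are equally strongly related to `hours`; only the direction differs, so the two coefficients have the same magnitude and opposite signs.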

16
New cards

Is 0.5 or -0.9 a stronger correlation coefficient?

-0.9 is the stronger correlation. Strength is the magnitude (absolute value) of the coefficient, so |-0.9| = 0.9 beats 0.5. The negative sign only means the variables move in opposite directions, not that they are unrelated.

17
New cards

Formula for Correlation Coefficient:

$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$

18
New cards

What is the difference between validity and reliability?

  • Validity: does it measure what it’s supposed to? measure of accuracy

  • Reliability: does it work every time? measure of consistency

19
New cards

What are the types of numeric variables?

  • categorical

  • ordinal

  • continuous

  • discrete

  • binary

20
New cards

What is a categorical (nominal) variable?

two or more categories, but the numbers themselves have no value.
example: hair color: 1 = brunette, 2 = blonde, 3 = red, 4 = grey

21
New cards

What is an ordinal variable?

two or more categories with a meaningful order (levels).

example: level of edu: 1 = elementary, 2 = secondary

22
New cards

What is a continuous variable?

can take any value between two points, including fractions (like a point on the line of a graph)

23
New cards

what is a discrete variable?

countable whole-number values, e.g. number of children in household, cars in garage, trees in yard etc

24
New cards

What is a binary variable?

  • value of 1 or 0

  • example: female (0=no, 1= yes)

  • example: did you attend? (0=no, 1=yes)

25
New cards

What is time series data?

  • collected over time —> think Dad’s time-lapse of pond puddle

  • regular equal intervals

  • usually collected for same interval

  • example: ocean tides, quarterly revenue

26
New cards

What is cross-sectional data?

  • collected on different individuals

  • collected at one time or same period of time

  • Example: opinion polls, census

27
New cards

What is pooled data?

  • mixture of time series and cross-sectional

  • same piece of info for multiple people

  • example: annual GDP for multiple countries

28
New cards

What is panel data/longitudinal data?

  • info for the same cross-sectional sample is collected repeatedly

  • some variables are collected once because they are constant

    • Gender

    • DOB

    • Race

  • others are collected over time because they change

    • Edu level

    • Earnings

    • Marital Status

29
New cards

What is extant data?

  • already available from organizations

  • was not collected FOR analysis but could be useful

    • HW and projects and art from schools

30
New cards

What is client data?

  • data that firms collect about themselves

  • sales, revenue, etc

  • usually proprietary so only in-house

31
New cards

What are Public Use Data Files (PUF)

  • end of some studies —> data made public

  • stripped of all Personally Identifying Info (PII)

  • other data-masking techniques

32
New cards

What is Personally Identifying Data (PII)?

  • anything that could attach data to a person

    • DOB

    • Social Security Number

    • First and Last Names

    • Address

33
New cards

What are data-masking techniques?

  • dropping sensitive variables entirely

  • collapsing categorical variables with small cell sizes

34
New cards

What is a Restricted-Use Data File? (RUF)

  • most PII is stripped but other data not masked

  • higher risk, may need

    • Data Use Agreement (DUA)

    • Memorandum of Understanding (MOU)

    • may need to work on computer in locked room etc

35
New cards

What may you find in Data Codebooks and Documentation?

  • list of variables

  • lots of time-saving info

    • sample definitions

    • description of data collection —> annotated survey

36
New cards

What are four methods of analysis?

  • Experimental

  • Quasi-experimental

  • Correlational

  • Descriptive

37
New cards

What are examples of quantitative data?

  • mean, median, mode

  • distributions, frequencies

38
New cards

Ways to collect qualitative Data?

  • interviews

  • observations

  • focus groups

  • survey write-in responses

39
New cards

How can you combine data from multiple files if they have observations under the same variables?

append/stack together the files one after the other
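
A minimal sketch of appending, assuming each file has already been read into a list of row dictionaries (the `spring` and `fall` files and their variables are made up). The check before stacking confirms both files really use the same variables.

```python
# Two hypothetical files with the same variables (id, gpa).
spring = [{"id": 1, "gpa": 3.2}, {"id": 2, "gpa": 3.8}]
fall = [{"id": 3, "gpa": 2.9}]

def columns(rows):
    """Return the set of variable names used across all rows."""
    return {name for row in rows for name in row}

# Only stack the files if they share the same set of variables.
if columns(spring) != columns(fall):
    raise ValueError("files do not have the same variables")

stacked = spring + fall  # rows of the second file follow the first
print(len(stacked))
```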

40
New cards

What are the three ways to merge files when the variables are split up between them?

  • one-to-one

  • one-to-many

  • many-to-many
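
The first two merge types can be sketched with plain dictionaries. Everything here is hypothetical: a household file and a survey file with one row per household ID (one-to-one), and a person file with several rows per household (one-to-many).

```python
# Hypothetical files keyed by a household ID.
households = {101: {"region": "North"}, 102: {"region": "South"}}  # one row per household
surveys = {101: {"income": 58000}, 102: {"income": 43000}}         # one row per household
people = [                                                         # many rows per household
    {"hh_id": 101, "name": "Ana"},
    {"hh_id": 101, "name": "Ben"},
    {"hh_id": 102, "name": "Cy"},
]

# One-to-one: each key appears exactly once in both files.
one_to_one = {
    hh: {**households[hh], **surveys[hh]}
    for hh in households if hh in surveys
}

# One-to-many: each household row is copied onto every matching person row.
one_to_many = [
    {**person, **households[person["hh_id"]]}
    for person in people if person["hh_id"] in households
]

print(one_to_one[101])
print(one_to_many)
```

In the one-to-many result the household's `region` is repeated on every person in that household, which is why the merged file has as many rows as the person file.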

41
New cards

What is a one-to-one file merger?

  • Take two files, each holding part of the needed variables, with one record per unit in both

  • match records on a shared ID and combine them into one new file with all variables

42
New cards

What is a one-to-many file merger?

  • each record in one file matches several records in the other

  • example: one household record merged onto every person record in that household

43
New cards

What is a many-to-many file merger?

  • very tricky!

  • to be avoided

44
New cards

What to do with extra data?

  1. Leave extra variables/observations and filter with “if/when” statements

  2. Create a new file and delete extras — keep raw file just in case

45
New cards

How to spot poor data quality?

  • will need a Data Dictionary

  • does variable take on expected values?

    • look for outliers

  • How much data is missing?

    • could use different notations: “missing” “.” “9999”
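
A quick sketch of both checks on a single variable. The set of missing-value codes and the valid age range (0 to 120) are assumptions for illustration; in practice they would come from the data dictionary.

```python
# Assumed sentinel codes for "missing"; confirm against the data dictionary.
MISSING_CODES = {"", ".", "missing", "9999"}

raw_ages = ["34", ".", "51", "9999", "missing", "27", "212"]  # hypothetical column

def is_missing(value):
    """True if the raw text is one of the assumed missing-value codes."""
    return value.strip().lower() in MISSING_CODES

# How much data is missing?
missing_share = sum(is_missing(v) for v in raw_ages) / len(raw_ages)

# Does the variable take on expected values?
ages = [int(v) for v in raw_ages if not is_missing(v)]
out_of_range = [a for a in ages if not 0 <= a <= 120]  # assumed valid range

print(f"{missing_share:.0%} missing, suspicious values: {out_of_range}")
```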

46
New cards

How to fix poor data?

  • Conditional formatting on Excel

    • create a rule to find out of range

    • “top” and “bottom” rules to see outliers

  • filters —> view only certain values

  • visualization methods

    • histograms and box plots

47
New cards

What is a business rules document and what would be found on it?

  • a file that lists all the analytical decisions you made

    • explain to others what you did

    • show WHY you did it

    • lets someone else replicate your process

  • any dropped or constructed variables

  • any other imputations

48
New cards

What are the three quartiles based on the median?

  • Lower (QL) or First (Q1) —> 25% of data below

  • Median or Second (Q2) —> 50% of data below

  • Upper (QU) or Third (Q3) —> 75% of data below
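
For a concrete check, Python's `statistics.quantiles` returns the three cut points directly. The scores are made up, and note that software packages use slightly different interpolation rules, so quartiles from Excel or another tool may differ a little on small data sets.

```python
from statistics import median, quantiles

scores = [1, 2, 3, 4, 5, 6, 7]  # hypothetical data

# n=4 splits the data into four equal groups and returns the 3 cut points.
q1, q2, q3 = quantiles(scores, n=4)

print(q1, q2, q3)  # 2.0 4.0 6.0
```

The second quartile is the same value as `median(scores)`.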

49
New cards

Why would you use median instead of mean?

  • insensitive to extreme values

  • if data has outliers, median better reflects central tendency

  • depends on distribution

    • normal distribution - mean

    • skewed data - median
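
A small made-up example shows the effect: adding one extreme salary drags the mean far from the typical value while the median barely moves.

```python
from statistics import mean, median

salaries = [42, 45, 47, 50, 52]  # hypothetical, in $1,000s
with_outlier = salaries + [900]  # one extreme value added

print(mean(salaries), median(salaries))
print(round(mean(with_outlier), 1), median(with_outlier))
```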

50
New cards

Where is the “mode” a useful measure of central tendency?

  • measure of non-numeric variables

    • most common hair color

    • party affiliation

    • college majors

51
New cards

Why use “N-1” for variation measures?

  • observation values typically closer to sample than population mean

  • N-1 does more when N is small —> less correction needed for large sample

  • variance and std dev are calculated from sample mean

    • N would underestimate, N-1 doesn’t
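
The two denominators can be compared directly with Python's `statistics` module, where `pvariance` divides by N (like VAR.P) and `variance` divides by N-1 (like VAR.S). The four-observation sample is hypothetical.

```python
from statistics import pvariance, variance

sample = [4, 8, 6, 2]  # hypothetical sample of four observations

divide_by_n = pvariance(sample)         # sum of squared deviations / N
divide_by_n_minus_1 = variance(sample)  # sum of squared deviations / (N - 1)

print(divide_by_n, round(divide_by_n_minus_1, 3))
```

With N = 4 the corrected estimate is a third larger; with N = 400 the same correction would change it by about a quarter of a percent.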

52
New cards

How to deal with outliers in data?

  • adjust up TOP CODE or down BOTTOM CODE

  • set outliers to missing
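
Both options can be sketched as follows. The cutoffs (10 and 80 hours) and the function names are hypothetical; the actual limits are an analytical decision that would be recorded in the business rules document.

```python
def top_bottom_code(values, low, high):
    """Clamp each value into [low, high] (bottom code / top code)."""
    return [min(max(v, low), high) for v in values]

def set_outliers_missing(values, low, high):
    """Replace values outside [low, high] with None (treated as missing)."""
    return [v if low <= v <= high else None for v in values]

hours_worked = [40, 38, 2, 45, 120]  # hypothetical weekly hours

coded = top_bottom_code(hours_worked, low=10, high=80)
blanked = set_outliers_missing(hours_worked, low=10, high=80)

print(coded)    # [40, 38, 10, 45, 80]
print(blanked)  # [40, 38, None, 45, None]
```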

53
New cards

What are the four ways a distribution can vary?

  • average value (shift left or right)

  • variability (change shape of curve)

  • skewness

  • kurtosis

54
New cards

What is skewness? What are the two directions?

  • measure of lack of symmetry

  • positive skewness —> mean is greater than median

  • negative skewness —> median is greater than mean
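
One way to check the mean-versus-median rule numerically is Pearson's median skewness coefficient, 3 × (mean − median) / standard deviation. Using that particular formula here is an assumption, and the two data sets are made up to be mirror images of each other.

```python
from statistics import mean, median, stdev

def pearson_median_skewness(values):
    """3 * (mean - median) / sample standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

right_tailed = [1, 2, 2, 3, 3, 3, 20]        # long tail to the right
left_tailed = [-20, -3, -3, -3, -2, -2, -1]  # mirror image, tail to the left

sk_right = pearson_median_skewness(right_tailed)
sk_left = pearson_median_skewness(left_tailed)

print(round(sk_right, 2), round(sk_left, 2))
```

The right-tailed data give a positive value (mean above median) and the mirrored data give the same value with a negative sign.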

55
New cards

What is the formula for skewness?

  • $Sk = \frac{3(\bar{x} - M)}{s}$

  • $\bar{x}$ (xbar) is mean

  • s is std dev

  • M is median

56
New cards

What is Kurtosis?

a measure of how flat or peaked the distribution is

57
New cards

What are the three forms of Kurtosis?

  1. Mesokurtic - bell-shaped (red)

  2. Platykurtic - flattish with thin tails (green)

  3. Leptokurtic - peaked with fat tails (purple)

58
New cards

What are two visual ways to represent interval grouping of data?

histograms —> each bar is one interval

  • covers the whole set of data

Cumulative Frequency Distribution

  • shows intervals and their frequency + total frequency
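
A text-only sketch of both ideas: count the observations in each interval (the histogram bars), then keep a running total (the cumulative frequency). The ages, the interval width of 10 and the starting point of 20 are all hypothetical choices.

```python
from itertools import accumulate

ages = [21, 23, 25, 31, 34, 38, 42, 47, 55, 58]  # hypothetical data
start, width, n_bins = 20, 10, 4                 # intervals 20-29, 30-39, 40-49, 50-59

# Frequency: how many observations fall in each interval.
counts = [0] * n_bins
for age in ages:
    counts[(age - start) // width] += 1

# Cumulative frequency: running total across the intervals.
cumulative = list(accumulate(counts))

for i, (count, total) in enumerate(zip(counts, cumulative)):
    low = start + i * width
    print(f"{low}-{low + width - 1}: {'#' * count} ({count}), cumulative {total}")
```

The last cumulative value always equals the number of observations, which is a handy check that every value landed in some interval.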

59
New cards

What are dashboards and what are they used for?

  • visual presentations

  • used to track

    • historic and real-time data

    • Key Performance Indicators (KPI)

60
New cards

What is the correlation coefficient (r value) and how does it work?

  • measure of how two variables relate to each other

  • ranges from -1 to 1, with the magnitude being the strength

  • 0 means no correlation

61
New cards

Rate the strength of several intervals of correlation coefficient

  • 0.8 to 1.0 —> very strong

  • 0.6 to 0.8 —> strong

  • 0.4 to 0.6 —> moderate

  • 0.2 to 0.4 —> weakish

  • 0.0 to 0.2 —> weak

62
New cards

What is the formula for correlation coefficient?

$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$

63
New cards

What is a correlation matrix used for?

Comparing several variables all to each other

64
New cards

What is measurement?

assignment of values to outcomes following a set of rules

65
New cards

What are the four scales of measurement?

  • Nominal —> least precise

  • Ordinal

  • Interval

  • Ratio → includes absolute zero

66
New cards

What is the nominal level of measurement?

  • named categories - least precise

  • outcome only fits in one category

  • we know categories are different

  • DON'T know how they relate

    • blonde/brunette/red/grey

67
New cards

What is the Ordinal level of measurement?

  • “ord” means order

  • categories are ordered

  • we know they're different

  • we know how they rank

  • we DON'T know how different the rankings are

    • e.g. ranking job applicants

68
New cards

What is the interval level of measurement?

  • intervals are ordered along a scale of equal positions

  • we know they're different, how they rank, and the size of the difference between categories

    • tests - 10 questions right is twice 5 right

69
New cards

What is the ratio level of measurement?

  • most precise, includes absolute zero

  • only works in some disciplines:

    • physics —>no light, no molecular movement

    • BAD for knowledge tests —> zero on spelling test does NOT mean no spelling ability

70
New cards

What is the difference between observed and true score?

  • observed —> score they were given “i got 55!”

  • true —> what they actually know

    • can never really be tested perfectly

71
New cards

What is the error score and where can error come from?

  • difference between observed and true score

    • Observed = True + Error, so Error = Observed - True

  • goal is to minimize error score

  • outside factors that cause error

    • room too hot, too loud, i was sick, etc

    • measurement problems

72
New cards

What are the four forms of reliability?

  • test-retest

  • Parallel forms

  • Internal consistency —> within one test

  • Interrater

73
New cards

What is test retest reliability?

  • is it good over time?

  • same test, same ppl, two diff times

    • good test gives similar/same answer

    • calculate correlation between two sets of scores

74
New cards

What is parallel forms Reliability?

  • make sure two diff forms of a test are the same

    • “version A” (Blu) and “Version B” (Gre)

  • ensure that same ideas are tested

  • calculate correlation between two sets

75
New cards

What is internal consistency Reliability?

  • used to check consistency within a test

  • how well do diff measures for same concept yield the same result?

    • would a certain concept do better with multiple-choice or true-false?

  • Calculate Cronbach’s Alpha
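
A minimal sketch of the usual Cronbach's alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the quiz scores are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per test item, same respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical 3-item quiz answered by 5 respondents.
items = [
    [2, 4, 3, 5, 1],
    [3, 4, 3, 5, 2],
    [2, 5, 4, 4, 1],
]

alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Here the three items rise and fall together across respondents, so alpha comes out high; items that moved independently would push it toward zero.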

76
New cards

What is Interrater Reliability?

  • see if diff judges score the same way

    • judges at Olympics expected to give same score

  • whenever humans are used there is error

    • # of agreements / # of possible agreements
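
The agreement ratio is simple to compute once both raters' scores are lined up item by item. The function name and the two judges' ratings below are made up.

```python
def percent_agreement(rater_a, rater_b):
    """Number of agreements divided by number of possible agreements."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same items")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return agreements / len(rater_a)

judge_1 = ["pass", "fail", "pass", "pass", "fail"]
judge_2 = ["pass", "fail", "fail", "pass", "fail"]

agreement = percent_agreement(judge_1, judge_2)
print(agreement)  # 0.8
```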

77
New cards

What are the main goals for reliability coefficients?

  • need to be positive/direct

  • should be as large as possible

    • -0.7 is really bad, 0.3 still isn’t great

78
New cards

What are the three types of validity?

  • content

  • criterion

  • construct

79
New cards

What is content validity?

  • does the sampled content really represent the population

  • use on achievement tests

    • ask experts to make judgement that the items represent the universe of possible items on the same topic

80
New cards

What is criterion validity?

  • are scores systematically linked to other variables to show that the testee understands the material

  • Concurrent validity → is the new measure similar to tried-and-true ones?

    • correlate new scores with proven ones

  • Predictive validity → ability of test to predict future outcomes

81
New cards

What is Construct validity?

  • the test measures a psychological construct

    • correlate test scores with theorized outcomes that reflect the construct you're testing

    • example of measuring aggression from correlation with fights and suspensions
