Data Science Midterm 1

87 Terms

1
What are the strengths of polls?

Quick, inexpensive, can represent large populations if well designed, and can track opinion over time

2
What are the weaknesses of polls?

Sampling bias, nonresponse bias, poor question wording, bad timing

3
What makes a high quality poll?

Random sampling, transparent methods, weighting for demographics, reports margin of error

4
What makes a low-quality poll?

Nonrandom or opt-in samples, small sample size, unclear methods, biased questions.

5
What is post-stratification weighting?

Adjusting sample data after collection so the sample's demographic mix matches the population's (e.g., by age, race, gender)

6
Why use survey weights?

To correct for over- or underrepresented groups so results better reflect the population
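
A minimal sketch of how such weights could be computed, assuming each respondent's weight is their group's population share divided by its sample share. The function names, the sample, and the population shares are all hypothetical, made up for illustration.

```python
from collections import Counter

def poststratify_weights(sample_groups, population_shares):
    """One weight per respondent: population share / sample share of their group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]

def weighted_mean(values, weights):
    """Average of values where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical sample: young respondents are 20% of the sample but 40% of the population.
groups = ["young"] * 2 + ["old"] * 8
supports = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # 1 = supports the candidate
shares = {"young": 0.4, "old": 0.6}         # assumed population shares

weights = poststratify_weights(groups, shares)
raw = sum(supports) / len(supports)          # unweighted support
adjusted = weighted_mean(supports, weights)  # support after weighting
print(raw, adjusted)
```

Because the underrepresented young respondents are more supportive in this made-up sample, the weighted estimate is higher than the raw one.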

7
What’s the difference between correlation and causation?

Correlation = two variables are related; causation = one variable directly affects another.

8
What conditions are needed for a causal claim?

Random assignment, a control group, and no confounding variables

9
What is a confounder?

A variable related to both the cause and effect that can distort results

10
What is ATE (Average Treatment Effect)?

The average difference in outcomes between the treatment and control groups

11
How is ATE calculated?

ATE = mean(outcome in treatment) - mean(outcome in control)
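
The formula can be sketched in a few lines of Python. The function name and the outcome lists are hypothetical; outcomes are coded 1 = voted, 0 = did not vote.

```python
from statistics import mean

def average_treatment_effect(treatment_outcomes, control_outcomes):
    """Difference in mean outcomes between the treatment and control groups."""
    return mean(treatment_outcomes) - mean(control_outcomes)

# Hypothetical outcomes (1 = voted, 0 = did not vote)
treated = [1, 0, 1, 1, 0]
control = [0, 0, 1, 0, 0]

ate = average_treatment_effect(treated, control)
print(f"ATE = {ate:.2f}")
```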

12
What does statistical significance mean?

The observed effect is unlikely to be due to chance

13
How do you calculate cost per vote?

total campaign cost / estimated number of additional votes gained

14
What does cost per vote tell us?

The efficiency of a political intervention in influencing votes

15
Why are visualizations important in data science?

They help reveal patterns, trends, and outliers in data

16
What are signs of a misleading visualization?

Truncated axes, distorted scales, lack of labels, or missing context

17
How can visualizations be improved?

Use full scales, clear labels, consistent colors, and contextual explanation

18
What is computational thinking?

Using coding and algorithms to analyze, organize, and manipulate data

19
What’s data cleaning?

Fixing missing or incorrect data, removing duplicates, formatting variables

20
What is a cross-tab (cross-tabulation)?

A table showing the relationship between two categorical variables (e.g., party × gender)
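
A minimal sketch of a party × gender cross-tab using only the standard library. The `crosstab` helper and the six respondents are hypothetical; in practice a data-frame library would usually do this.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(zip(rows, cols))

# Hypothetical respondents
party = ["Dem", "Rep", "Dem", "Ind", "Rep", "Dem"]
gender = ["F", "M", "M", "F", "F", "F"]

table = crosstab(party, gender)

# Print one row per party, with a count for every gender (0 if the pair never occurs)
for p in sorted(set(party)):
    print(p, {g: table[(p, g)] for g in sorted(set(gender))})
```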

21
What is inferential thinking?

Using data from a sample to make conclusions about a larger population

22
What are the two main types of inference?

Estimation (predicting unknown values) and hypothesis testing (testing claims)

23
What is sampling variability?

The natural difference in results when you take different random samples from the same population
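
Sampling variability can be seen by simulation. The sketch below assumes a made-up population of 10,000 voters with 52% support and draws repeated random samples of 500; all names and numbers are hypothetical.

```python
import random
from statistics import mean, stdev

def sample_proportions(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples and return the share of 1s in each."""
    rng = random.Random(seed)
    return [mean(rng.sample(population, sample_size)) for _ in range(n_samples)]

# Hypothetical population: 52% support the candidate (1 = supports)
population = [1] * 5200 + [0] * 4800

estimates = sample_proportions(population, sample_size=500, n_samples=200)
spread = stdev(estimates)  # how much the estimates vary from sample to sample

print(f"lowest estimate:  {min(estimates):.3f}")
print(f"highest estimate: {max(estimates):.3f}")
print(f"standard deviation across samples: {spread:.3f}")
```

Every sample comes from the same population, yet the estimates differ from one another; that spread is the sampling variability.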

24
What is computational thinking in data science?

Using programming to process and visualize data

25
What is inferential thinking in data science?

Using probability and statistics to reason about uncertainty and draw conclusions

26
How do computational and inferential thinking work together?

Code organizes and analyzes data; inference interprets what those results mean for the real world.

27
What are means by group?

Calculating the average of a variable for each subgroup (e.g., mean income by gender)
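
A short sketch of the mean-income-by-gender example, using only the standard library. The `means_by_group` helper and the four incomes are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def means_by_group(groups, values):
    """Average the values separately within each group label."""
    buckets = defaultdict(list)
    for g, v in zip(groups, values):
        buckets[g].append(v)
    return {g: mean(vs) for g, vs in buckets.items()}

# Hypothetical data
gender = ["F", "M", "F", "M"]
income = [50000, 40000, 70000, 60000]

result = means_by_group(gender, income)
print(result)
```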

28
What are the main goals of a political campaign?

To persuade undecided voters and turn out supporters until reaching 50% + 1 of the votes

29
What are the two main ways campaigns win votes?

Persuasion (changing minds) and mobilization (getting supporters to vote).

30
What is targeting in a campaign?

Identifying which voters to reach out to, using data to prioritize persuasion or turnout

31
What do turnout targeting and supporter targeting mean?

Turnout targeting predicts who will vote; supporter targeting predicts who supports your candidate

32
Why is the mathematics of victory different across states?

Different partisan balances and voter behaviors require different strategies (e.g., Alabama vs. Pennsylvania)

33
What are the six voter types campaigns analyze?

Supporters who will vote, supporters who might not, undecideds who will vote, undecideds who might not, opponents who will vote, opponents who might not 

34
What law created modern digital voter files?

The Help America Vote Act (2002), which required statewide computerized registration lists

35
What does a voter file include?

Name, address, vote history (whether you voted, not whom you voted for), and sometimes demographics

36
What are “commercial voter files”?

Enhanced voter lists combining public voter data with consumer info (like donations, subscriptions, property value)

37
Why do campaigns use voter files?

They are the foundation for modeling turnout, support, and targeting.

38
What political factors affect voter data laws?

States pass laws shaped by partisan goals: some limit race/party data, others expand it for campaign advantage.

39
What’s a major ethical concern about campaign data use?

Microtargeting can narrow political outreach and reduce exposure to diverse viewpoints

40
What is Random Digit Dialing (RDD) and why is it declining?

Phone surveys that call randomly generated numbers. They once produced representative samples, but response rates have collapsed and the resulting samples are now biased.

41
What makes today’s polling less reliable?

Non-response bias: certain political groups (like conservatives) are less likely to participate.

42
Why are voter files better for sampling than phone lists?

They include verified vote histories, reducing reliance on unreliable self-reported voting.

43
What are “turnout models”?

Statistical models predicting how likely each voter is to vote, based on history and demographics.

44
What was the major reason for polling error in 2020?

Between-party and within-party nonresponse — too few Republicans and unrepresentative Democrats responded.

45
What is a “tipping point state”?

The state that gives a candidate the decisive electoral votes to reach 270.

46
Why do campaigns focus on tipping point states?

Shifting a few votes there can change the national outcome, unlike in safe states.

47
What is “response substitution” in polling?

When respondents answer a question about change by expressing their current opinion instead (e.g., Roy Moore example)

48
How do survey experiments improve over direct questions?

By randomly assigning some respondents to receive information, isolating the message’s causal effect.

49
Why can’t we rely on observational data to evaluate campaign strategies?

Confounding bias — people who choose to engage are already different from those who don’t.

50
What is selection bias in campaigns?

More politically interested voters are likelier to be contacted, making “contact increases turnout” claims misleading.

51
What does random assignment achieve in experiments?

It makes treatment and control groups comparable and removes selection bias.

52
What is the “potential outcomes framework”?

Thinking about what would happen with and without treatment for each voter to define causal effects.

53
What does the “fundamental problem of causal inference” mean?

You can never observe both outcomes for the same individual — only one.

54
What are examples of confounding in campaign studies?

Comparing contacted vs. uncontacted voters without accounting for political interest or prior turnout.

55
What does a high p-value (e.g., 0.8) mean?

The result could easily occur by random chance; not statistically significant.

56
What’s the difference between statistical and practical significance?

Statistical = unlikely by chance; Practical = meaningful for real-world strategy.

57
Why visualize data in campaign analysis?

To identify outliers, misreporting, and trends that affect messaging and targeting.

58
What is the “Lie Factor” in graphs?

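The ratio of the size of an effect shown in the graphic to the size of the effect in the data; values far from 1 mean the graph exaggerates or understates the effect through misleading scales or visuals.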

59
What is the “Data-Ink Ratio”?

The proportion of ink used for data versus decoration — high ratio = more honest graph.

60
What is Simpson’s Paradox, and why does it matter?

Trends can reverse when groups are combined — can mislead campaign data interpretation.
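
A small illustration of the reversal, using invented turnout counts. In this made-up data, contacted voters turn out at a higher rate within each group, yet at a lower rate once the groups are pooled, because the campaign mostly contacted infrequent voters. The group names and counts are hypothetical.

```python
# (voted, total) counts per group and arm; all numbers are hypothetical
data = {
    "frequent voters":   {"contacted": (90, 100),   "not contacted": (800, 1000)},
    "infrequent voters": {"contacted": (300, 1000), "not contacted": (20, 100)},
}

def rate(voted, total):
    """Turnout rate as a proportion."""
    return voted / total

def pooled_rate(data, arm):
    """Turnout rate for one arm after combining all groups."""
    voted = sum(group[arm][0] for group in data.values())
    total = sum(group[arm][1] for group in data.values())
    return voted / total

# Contacted minus not-contacted turnout, within each group
within = {
    name: rate(*g["contacted"]) - rate(*g["not contacted"])
    for name, g in data.items()
}
# The same difference after combining the groups
pooled = pooled_rate(data, "contacted") - pooled_rate(data, "not contacted")

print(within)  # positive in both groups
print(pooled)  # negative once groups are combined
```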

61
Why are outliers important in campaign data?

They can either reveal new voter patterns or distort averages if not examined carefully.

62
What is the formula for deciding whether someone votes?

pB + D > C

63
In the voting formula, what do the letters stand for?

• p = probability vote is pivotal

• B = benefit of one candidate winning

• D = psychological/civic duty benefit

• C = cost of voting
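
The decision rule pB + D > C can be written as a one-line function. The function name and the input values below are hypothetical, chosen only to show that with a tiny p the pB term is negligible, so D and C drive the result.

```python
def will_vote(p, B, D, C):
    """Return True when pB + D > C, i.e. the expected benefits of voting exceed the cost."""
    return p * B + D > C

# Hypothetical values: a one-in-ten-million chance of being pivotal
print(will_vote(p=1e-7, B=1000, D=2, C=1))  # duty benefit outweighs cost
print(will_vote(p=1e-7, B=1000, D=0, C=1))  # without D, pB alone does not cover C
```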

64
Which parts of the formula can campaigns influence?

D (increase civic motivation) and C (reduce voting costs)

65
What is an experiment in political science?

A study where subjects are randomly assigned to treatment or control to measure a causal effect.

66
What’s the difference between percent and percentage point?

• Percentage point = simple subtraction of percentages

• Percent difference = relative change ((new − old)/old × 100)
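
A quick worked example of the two measures, applied to the Virginia turnout rates listed in this set (treatment 11.3%, control 10.7%). The function names are illustrative.

```python
def percentage_point_diff(new, old):
    """Simple subtraction of two percentages."""
    return new - old

def percent_change(new, old):
    """Relative change: (new - old) / old * 100."""
    return (new - old) / old * 100

pp = percentage_point_diff(11.3, 10.7)  # about 0.6 percentage points
pct = percent_change(11.3, 10.7)        # about 5.6 percent

print(f"{pp:.1f} percentage points, {pct:.1f} percent")
```

The same gap sounds much larger as a percent than as percentage points, so it matters which one is being reported.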

67
What does “statistically significant” mean?

The result is unlikely to be due to chance (conventionally, p-value < 0.05).

68
What is a p-value?

The probability of seeing a result at least as extreme as the observed one if the null hypothesis were true.
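
One way to approximate a p-value is a two-sided permutation test: shuffle the treatment labels many times and count how often the shuffled difference in means is at least as large as the observed one. This is an illustrative technique, not necessarily the test used in the course; the function name and the outcome data are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(treated, control, n_permutations=2000, seed=0):
    """Share of random relabelings whose difference in means is at least
    as extreme (in absolute value) as the observed difference."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel units as if treatment had no effect
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / n_permutations

# Hypothetical turnout outcomes (1 = voted, 0 = did not vote)
treated = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
control = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

p_value = permutation_p_value(treated, control)
print(f"approximate p-value: {p_value:.3f}")
```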

69
What is the null hypothesis in an experiment?

That the treatment has no effect.

70
What was tested in the Virginia experiment?

Whether sending text messages increased voter turnout.

71
What were the turnout rates in the Virginia experiment?

Treatment 11.3%, Control 10.7%

72
In the Virginia experiment, how many text messages created one new voter?

About 167 text messages (1 / 0.006, from the 0.6-percentage-point turnout difference).

73
What was the cost per vote in the Virginia experiment if each text cost $0.07?

About $12 per new voter (167 texts × $0.07 ≈ $11.69).
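
The Virginia arithmetic can be reproduced in a few lines, assuming the turnout rates from this set (treatment 11.3%, control 10.7%) and $0.07 per text. The function names are illustrative.

```python
import math

def contacts_per_vote(treatment_rate, control_rate):
    """Contacts needed for one additional vote: 1 / ATE (rates as proportions)."""
    return 1 / (treatment_rate - control_rate)

def cost_per_vote(cost_per_contact, treatment_rate, control_rate):
    """Cost of one additional vote: cost per contact times contacts per vote."""
    return cost_per_contact * contacts_per_vote(treatment_rate, control_rate)

texts = contacts_per_vote(0.113, 0.107)   # about 166.7 texts
cost = cost_per_vote(0.07, 0.113, 0.107)  # about $11.67 before rounding texts up

print(f"texts per new voter: {math.ceil(texts)}")
print(f"cost per new voter: ${cost:.2f}")
```

Rounding the number of texts up to 167 before multiplying gives $11.69; either way the answer rounds to about $12.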

74
Was the Virginia text-message campaign effective?

Yes: cheap and slightly effective, though the effect was small.

75
What was the purpose of the Georgia experiment?

To test if increasing civic duty (“D”) boosted turnout.

76
What were the three groups in the Georgia experiment?

Control, Plain Reminder, and Gratitude Letter.

77
Why do we randomize?

To make groups comparable and remove bias.

78
What is the goal of calculating ATE?

To find the average impact of the treatment on the outcome.

79
What does a small p-value mean?

The observed effect is unlikely due to random chance

80
Why is explaining your reasoning important on the midterm?

You get partial credit if the logic is clear even if math isn’t perfect.

81
What is the #1 rule for voter mobilization?

The more personal the contact, the more effective.

82
Which voter mobilization methods work best?

Door-to-door and volunteer phone calls.

83
What’s the effectiveness per contact (one additional vote per how many contacts)?

• Door-to-door: 1 per 15

• Volunteer phone: 1 per 35

• Commercial phone: 1 per 125

• Mail (nonpartisan): 1 per 273

• Mail (social pressure): 1 per 77

84
Do emails or social media ads increase turnout?

No clear effect

85
What matters most in personal communication?

Quality — authentic, unscripted, personal contact.

86
Do advocacy or issue-based appeals mobilize people?

Not much; making voting easy and invoking social norms are more effective.

87
Is there strong evidence of synergy between tactics?

No — effects are additive or diminishing, not multiplying.
