Lecture 7 - Issues in Data Collection

26 Terms

1

Survey

method of gathering information by asking relevant questions of a sample of people, with the aim of understanding the population as a whole

  • Can be used for scientific research, market research, political data, etc.

2

Opinion poll

type of survey asking people for their opinion, usually about a policy or about a candidate for public office

3

Fundamental goal of surveys and polls

  1. Sample: a set of people who will answer the questions

    1. Ideally large and representative of the population we want to generalize to

  2. Questionnaire: a set of questions to capture what we want to learn

    1. Typically multiple-choice or a Likert/rating scale

  3. Mode of interview: how we ask the questions

    1. In-person, mail, phone, online

4

Outcomes of simulated polls of 100 ppl

  • Let’s assume that 55% is the true level of support in a population

  • Assume you repeatedly sample 100 people

  • Histogram: with samples of 100 ppl, the outcome varies a lot around the mean (55%). In most samples, we get numbers between 50% and 60%, but occasionally outside this range

5

Outcomes of simulated polls of 1000 people

  • Let’s assume that 55% is the true level of support in a population

  • Assume you repeatedly sample 1000 people

  • Histogram: with samples of 1000 ppl, the outcome varies less around the mean (55%). In most samples, we get numbers between 53% and 57%
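
The two simulated-poll cards (100 vs 1000 people) can be reproduced with a short sketch. It assumes each respondent independently supports with probability 0.55; the function name, the number of repeated polls, and the seed are illustrative choices, not from the lecture.

```python
import random
import statistics

def simulate_polls(true_support, sample_size, n_polls, seed=0):
    """Return the share of supporters found in each of n_polls simulated polls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(n_polls):
        # Each respondent supports with probability true_support.
        supporters = sum(rng.random() < true_support for _ in range(sample_size))
        results.append(supporters / sample_size)
    return results

for n in (100, 1000):
    shares = simulate_polls(true_support=0.55, sample_size=n, n_polls=1000)
    mean = statistics.mean(shares)
    spread = statistics.pstdev(shares)  # how much poll results vary across polls
    print(f"n={n}: mean share={mean:.3f}, spread (SD)={spread:.3f}")
```

The mean should sit near 0.55 for both sample sizes, while the spread for n=1000 should be roughly a third of the spread for n=100.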

<ul><li><p><span style="background-color: transparent;"><span>Let’s assume that the 55% is the true level of support in a population</span></span></p></li><li><p><span style="background-color: transparent;"><span>Assume you repeatedly sample 1000 people</span></span></p></li><li><p><span style="background-color: transparent;"><span>Histogram: with samples of 1000 ppl, the outcome varies less around the mean (55%). In most samples, we get numbers between 53-57</span></span></p></li></ul><p></p>
6

95% confidence interval

[flashcard image]
7

Margin of error: half the width of the 95% confidence interval

[flashcard image]
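
A common textbook way to compute both quantities for a poll proportion is the normal approximation, p ± 1.96 * sqrt(p(1 - p)/n). The formula choice and the function names below are assumptions and may differ from what the lecture used.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the approximate 95% confidence interval for a proportion.

    Assumes a simple random sample and the normal approximation (z = 1.96).
    """
    return z * math.sqrt(p * (1 - p) / n)

def confidence_interval(p, n, z=1.96):
    """Approximate 95% confidence interval as a (low, high) pair."""
    moe = margin_of_error(p, n, z)
    return p - moe, p + moe

for n in (100, 1000):
    low, high = confidence_interval(0.55, n)
    print(f"n={n}: margin of error={margin_of_error(0.55, n):.3f}, CI=({low:.3f}, {high:.3f})")
```

With 55% support, this gives a margin of error of about 10 points for 100 people and about 3 points for 1000 people.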
8

Law of large numbers

states that as the sample size of a random experiment increases, the average of the outcomes converges to the expected value (the true probability or population mean)

  • Larger sample = lower sampling (statistical) error, i.e., a lower margin of error

  • Systematic errors, such as those due to biased samples, cannot be fixed by larger samples
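
A minimal sketch of the law of large numbers, assuming unbiased random sampling from a population with 55% support (the helper name and seed are illustrative):

```python
import random

def sample_share(true_support, n, rng):
    """Share of supporters in one random sample of size n."""
    return sum(rng.random() < true_support for _ in range(n)) / n

rng = random.Random(1)  # fixed seed so the run is repeatable
for n in (10, 100, 1_000, 10_000, 100_000):
    share = sample_share(0.55, n, rng)
    print(f"n={n:>6}: sample share={share:.3f}, error={abs(share - 0.55):.3f}")
```

The error tends to shrink as n grows, although any single run can bounce around for small n.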

9

Statistical error

due to limited sample size; larger samples reduce this error

10

Systematic error

due to biases in selecting your sample and other design problems; larger samples cannot reduce this error
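
To see why sample size does not help here, consider a sketch in which opponents are twice as likely to respond as supporters. The response rates (0.3 and 0.6) and all names are made up for illustration.

```python
import random

def biased_poll(n_contacted, true_support=0.55,
                p_respond_supporter=0.3, p_respond_opponent=0.6, seed=2):
    """Poll result when willingness to respond depends on one's opinion."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n_contacted):
        supports = rng.random() < true_support
        p_respond = p_respond_supporter if supports else p_respond_opponent
        if rng.random() < p_respond:  # only some contacted people answer
            answers.append(supports)
    return sum(answers) / len(answers)

for n in (1_000, 100_000):
    print(f"contacted={n}: observed support={biased_poll(n):.3f} (true support is 0.550)")
```

Under these assumed response rates the expected result is 0.165 / 0.435, about 0.38, no matter how many people are contacted. A larger sample only makes the wrong answer more precise.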

11

Probability sample

gives everybody in the population a known, nonzero chance of being included; in the simplest case, everybody has an equal chance (e.g., calling random phone numbers)

  • This is different from a non-probability sample, such as a convenience sample. When an NYU researcher who wants to study the US population recruits an NYU sample, that’s a convenience sample. It’s not representative of the population but is easy to collect

12

Random selection

randomly select and contact respondents to avoid selection bias

  • But not everyone who is contacted participates

  • More educated, higher income ppl are more likely to participate

13

Weighting

mathematically boosting (or providing a higher weight to) the voices of people who belong to groups that are less likely to participate in polls
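
One simple form of weighting is to weight each group by its population share divided by its sample share. The sketch below uses made-up numbers for two education groups; the group names, counts, and function names are hypothetical.

```python
# Hypothetical poll: people with a degree are overrepresented among respondents.
groups = [
    {"name": "degree", "population_share": 0.4, "respondents": 700, "supporters": 420},
    {"name": "no degree", "population_share": 0.6, "respondents": 300, "supporters": 120},
]

def unweighted_share(groups):
    """Raw share of supporters among all respondents."""
    return sum(g["supporters"] for g in groups) / sum(g["respondents"] for g in groups)

def weighted_share(groups):
    """Average the group-level shares using population shares as weights."""
    return sum(g["population_share"] * g["supporters"] / g["respondents"] for g in groups)

total = sum(g["respondents"] for g in groups)
for g in groups:
    # Weight per respondent: population share divided by sample share.
    weight = g["population_share"] / (g["respondents"] / total)
    print(f"{g['name']}: weight per respondent = {weight:.2f}")

print(f"unweighted support = {unweighted_share(groups):.2f}")
print(f"weighted support   = {weighted_share(groups):.2f}")
```

Here the underrepresented group gets a weight of 2.0 and the overrepresented group about 0.57, which moves the estimate from 0.54 to 0.48.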

14

Oversampling

the researcher intentionally overrepresents one or more groups for which they expect lower representation

  • Ex: I'm collecting a sample of 1000 Americans. South Asians make up only ~2% of the US population (about 20 ppl in my sample). Because 20 is too few, I may sample 100 South Asians (constituting 10% of my sample). My final analysis will down-weight the oversampled group to its actual proportion in the population
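
The down-weighting step in this example can be sketched as follows. The 2% and 10% figures come from the card; the support counts are hypothetical and only there to make the arithmetic concrete.

```python
population_share = {"south_asian": 0.02, "other": 0.98}
sample_counts = {"south_asian": 100, "other": 900}  # oversampled: 10% instead of 2%
supporters = {"south_asian": 70, "other": 450}      # hypothetical poll answers

n = sum(sample_counts.values())

# Weight per respondent = population share / sample share.
weights = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}

weighted_supporters = sum(weights[g] * supporters[g] for g in sample_counts)
weighted_total = sum(weights[g] * sample_counts[g] for g in sample_counts)

weighted_share = weighted_supporters / weighted_total
unweighted_share = sum(supporters.values()) / n

print(f"weights: {weights}")
print(f"unweighted support = {unweighted_share:.3f}")
print(f"weighted support   = {weighted_share:.3f}")
```

Each oversampled respondent counts as 0.2 of a person, so the 100 South Asian respondents carry the weight of 20 people in the overall estimate, while still giving 100 responses for any analysis within that group.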

15

Reasons for systematic bias

  • Some people are easier to reach (e.g., WEIRD samples)

  • Some people may be more willing to participate (e.g., people with stronger opinions)

  • The source (e.g., Fox News vs CNN) or topic (e.g., vaccines) may affect who participates

  • People lie in surveys, even for harmless reasons (e.g., to make themselves look good)

  • Biased attrition in longitudinal surveys (e.g., anti-vaxxers may drop out in a vaccine study)

  • Misleading question wording

16

Leading questions

  • Would you favor or oppose taking military action in Iraq to end Saddam Hussein’s rule? → 68% said they favored military action

  • Would you favor or oppose taking military action in Iraq to end Saddam Hussein’s rule even if it meant that U.S. forces might suffer thousands of casualties? → only 43% said they favored military action

17

Double-barreled questions

Double-barreled questions ask respondents to evaluate more than one concept. Better to ask two separate questions

  • Ex: How much confidence do you have in President Obama to handle domestic and foreign policy?

    • What if participants have confidence for domestic but not foreign policy? 

18

Avoiding reactance

  • Support for expanding “assistance to the poor” vs support for expanding “welfare”

  • Support for affordable health insurance vs support for Obamacare

  • “Making it legal for doctors to give terminally ill patients the means to end their lives,” vs “making it legal for doctors to assist terminally ill patients in committing suicide”

  • Consider whether certain words may produce bias, be viewed as biased or offensive, or trigger an emotional reaction

19

Negative Wording

Issues: 1) larger cognitive load for participants, 2) participants may misread the question

  • Simple negative wording: “I disagree that it is important to fund the arts”; “I do not believe that it is important to fund the arts”

  • Double negative wording: “I agree that it is not unimportant to fund the arts” 

  • Complex negative wording: “I disagree that it is not important to fund the arts”

20

How question choices matter

  • Choosing from a set of options versus allowing open text entry can return very different results

  • Forced-choice questions tend to yield more accurate responses than select-all-that-apply questions, especially for sensitive questions

21

Acquiescence bias

respondents have a greater tendency to agree with statements on surveys, especially respondents with lower education levels

  • even more pronounced when there’s an interviewer present

22

Order effects

questions early in a questionnaire can have unintended effects on how respondents answer subsequent questions

23

How can polls be misused/manipulated?

  • By using misleading question wording, choice wording, etc.

  • By collecting biased samples

  • By creating fake surveys

24

Potemkin numbers

Meaningless statistics designed to look real and authoritative

  • Results from poorly designed surveys are not reliable, but they can serve the agendas of motivated individuals/groups

25

Prediction/forecasting

  • Telling in advance whether an event will happen or which event will happen

    • Election results but also world events like the collapse of the Soviet Union

    • Sports forecasting and betting

  • Uses past results and information (e.g., election predictions use state and national poll data, simulations, data on inflation, past voter turnout)

  • In electoral prediction, the goal is to predict who will win an election or the probability of each candidate winning
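
As a toy illustration of turning a poll into a win probability, the sketch below simulates many plausible values of the true support and counts how often the candidate is above 50%. It assumes a two-candidate race decided by overall vote share and accounts only for sampling error (normal approximation); the function name and inputs are hypothetical, and real forecasts combine many polls and other data.

```python
import math
import random

def win_probability(poll_share, sample_size, n_sims=10_000, seed=3):
    """Share of simulations in which the candidate's support exceeds 50%.

    Only sampling error is modeled; turnout, biased samples, and late
    opinion shifts are ignored.
    """
    rng = random.Random(seed)
    # Standard error of a proportion from a simple random sample.
    se = math.sqrt(poll_share * (1 - poll_share) / sample_size)
    wins = sum(rng.gauss(poll_share, se) > 0.5 for _ in range(n_sims))
    return wins / n_sims

for share in (0.51, 0.52, 0.55):
    print(f"poll share={share:.2f}, n=1000: win probability={win_probability(share, 1000):.2f}")
```

A 52% poll result with 1000 respondents gives a win probability of roughly 0.9 under these assumptions, which overstates certainty because systematic errors are left out.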

26

Challenges

  • Unanticipated sampling biases (in addition to usual/anticipated sampling biases):

    • Supporters of one candidate are more willing to participate

  • Requires predicting not just people’s opinions but also who will vote 

  • Requires predicting how much each voter can actually influence the outcome, given that not everyone has equal influence (gerrymandering, swing states, etc.)

  • Requires accounting for changes in people’s opinions in the days leading up to the election, especially for undecided voters