UMD SURV400 Midterm Flashcards

233 Terms

1

What is a survey?

A systematic method for gathering information from entities to construct estimates of their attributes on average.

2

What are examples of major surveys?

- Gallup public opinion polls
- Harvard Study of Adult Development
- General Social Survey

3

Why are surveys rapidly changing today?

Technology has changed how surveys are conducted and how people interact, causing the field to evolve faster than education and research can keep up.

4

What is survey methodology?

A field and profession that studies survey design, collection, processing, and analysis; it is multidisciplinary (statistics, psychology, sociology, etc.).

5

Survey

A systematic method for gathering information from entities (often people) to construct estimates of the attributes those entities have on average

6

Survey Methodology

A scientific field and profession that seeks to identify and study principles pertaining to survey design, collection, processing, and analysis

7

Data Science

Extracting meaning from and interpreting data using tools and methods from statistics and machine learning

8

Descriptive Research Question

Seeks to summarize characteristics of a set of data without interpretation, reporting only facts or attributes (e.g., "What is the average number of doctor visits reported by respondents?")

9

Exploratory Research Question

Analyzes data for patterns, trends, or relationships between variables. These are unplanned questions used for hypothesis generation (e.g., "Is there an age difference in doctor visits?")

10

Predictive Research Question

Determines whether one or more phenomena can be used to forecast some future outcome. The focus is on what predicts the outcome rather than "why" (e.g., "Can we guess whether a single household will be less likely to respond?")

11

Causal Research Question

Asks whether changing one factor will change another factor. Establishing cause and effect requires randomized controlled trials or experiments (e.g., "Will this drug intervention reduce illicit drug use?")

12

Inferential Research Question

Uses a sample to make conclusions about a larger population

13

Mechanistic Research Question

Asks about the exact mechanism or process by which something occurs

14

Good Research Question Criteria

Must be: (1) of interest to your audience, (2) not already answered, (3) stemming from a plausible framework, (4) falsifiable, and (5) specific

15

Data Generation Process

The method by which data are collected. Survey data often vary widely in size and quality, depend on modality, are often cross-sectional, and usually come from a sample rather than a census

16

Data Curation/Storage

Includes editing, de-identification, data entry, coding, error checking, dataset construction, codebook construction, building weights, and imputing missing values

17

Data Analysis

Using statistical models (t-tests, ANOVAs, regression) to make inferences about populations from samples, or machine learning models (KNN, decision trees, logistic regression) to make predictions

18

Data Output/Access

Communicating results through papers, dashboards, videos, blogs (science communication is almost as important as the science itself)

19

Total Survey Error (TSE)

A framework for thinking about the various sources of error that may affect survey statistics. Here, errors reflect uncertainty in an inference, not necessarily mistakes

20

Construct

Elements of information (variables) sought by researcher: usually abstract, described by words, often latent and not directly observable (e.g., happiness, quality of life, belief in God)

21

Measurement/Operationalization

Linking theoretical constructs to observable variables through the step-by-step protocols implemented to gather data: the construct is the "what" and measurement is the "how"

22

Response

The respondent value(s) from your measurement scheme (e.g., answers to questions, blood pressure readings)

23

Edited Response

Transforming data for specific use, including coding (text to numbers), acceptable answer sets, consistency rules, and reverse scoring negatively worded items
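Reverse scoring and text-to-number coding can be sketched in R. This is a minimal illustration, assuming a 1-to-5 agreement scale; the data frame `responses`, the items `happy` and `sad`, and the lookup vector `codes` are hypothetical names, not part of the course material.

```r
# Hypothetical 5-point responses (1 = strongly disagree, 5 = strongly agree)
responses <- data.frame(
  happy = c(5, 4, 2, 1),  # positively worded item
  sad   = c(1, 2, 4, 5)   # negatively worded item
)

# Reverse score: new value = (scale max + scale min) - old value
scale_min <- 1
scale_max <- 5
responses$sad_rev <- (scale_max + scale_min) - responses$sad
responses$sad_rev  # 5 4 2 1

# Coding text to numbers with a named lookup vector
answers <- c("No", "Yes", "Yes")
codes   <- c(No = 0, Yes = 1)
coded   <- unname(codes[answers])
coded  # 0 1 1
```

After reverse scoring, high values mean the same thing on both items, so they can be averaged into one score.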

24

Target Population

The set of units to be studied: often abstractly defined with several ways to operationalize (e.g., adults in the US, users of a social media platform)

25

Sampling Frame

A set of units identified in such a way that they can be sampled and located: ideally, every unit in the target population appears once and only once

26

Sample

A subset of the population from which measurements are drawn: the goal is to make inferences about the population from the sample

27

Respondents

Sample units that were successfully measured: because of nonresponse, the number of respondents is often smaller than the sample size

28

Post-survey Adjustments

Changes to survey data to make estimates better reflect the full target population, including selection weights, imputation, nonresponse weights, and poststratification
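As one illustration, poststratification can be sketched in R. This assumes two age groups with known population shares; all counts, shares, and means below are made up, and real adjustments usually combine several weighting steps.

```r
# Hypothetical respondent counts and known population shares by age group
resp_n    <- c(young = 20, old = 80)
pop_share <- c(young = 0.5, old = 0.5)

# Poststratification weight = population share / sample share
samp_share <- resp_n / sum(resp_n)
ps_weight  <- pop_share / samp_share   # young 2.5, old 0.625

# Hypothetical group means of some survey variable
group_mean <- c(young = 10, old = 20)

# The unweighted estimate overrepresents the older group
unweighted <- sum(samp_share * group_mean)

# Weighted estimate: each respondent counts ps_weight times
weighted <- weighted.mean(group_mean, w = resp_n * ps_weight)
```

Here the unweighted mean is 18 and the weighted mean is 15, which matches the value implied by the population shares.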

29

True Value

An idealized concept of a quantity to be measured: abstract and never truly known, but serves as a standard for comparison

30

Interviewer Variance

Error arising from different interviewers collecting different data despite having the same training, procedures, and workloads

31

Interviewer Bias

When personal factors of the interviewer systematically impact data collection

32

Sampling Variance

Variation in the values of a survey statistic because different subsets of the population fall into the sample over replications of the same survey design. It is quantified with standard errors and confidence intervals
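The idea of replications can be simulated in R. This is a rough sketch, assuming a made-up population of 10,000 normally distributed values and simple random samples of size 100; the seed and all object names are arbitrary.

```r
set.seed(400)  # arbitrary seed so the sketch is reproducible

# Hypothetical population of 10,000 values
population <- rnorm(10000, mean = 50, sd = 10)

# Draw 1,000 simple random samples of size 100 and record each sample mean
sample_means <- replicate(1000, mean(sample(population, size = 100)))

# Spread of the estimate across replications (empirical standard error)
empirical_se <- sd(sample_means)

# Standard error estimated from a single sample: s / sqrt(n)
one_sample   <- sample(population, size = 100)
estimated_se <- sd(one_sample) / sqrt(length(one_sample))

# Approximate 95% confidence interval from that one sample
ci <- mean(one_sample) + c(-1.96, 1.96) * estimated_se
```

In practice only one sample is drawn, so the single-sample standard error stands in for the spread that the replications show directly.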

33

Sampling Bias

Consistent over- or underestimation of a population quantity because of how the sample was selected (e.g., relying on intro psych students when interested in all emerging adults). In expectation, sampling bias is zero for probability samples

34

Accuracy (Quality Dimension)

Total survey error is minimized

35

Credibility (Quality Dimension)

Data considered trustworthy by the survey community

36

Comparability (Quality Dimension)

Demographic, spatial, and temporal comparisons are valid

37

Relevance (Quality Dimension)

Data satisfy users' needs

38

Timeliness (Quality Dimension)

Data deliveries adhere to schedule

39

Completeness (Quality Dimension)

Data rich enough to satisfy analysis objectives without undue burden on respondents

40

R

An open-source statistical programming language

41

RStudio

An integrated development environment (IDE) for R that enhances R's usability by letting you keep track of objects, view plots, run scripts, and more

42

Object

A container in R that stores information: you assign information using <- or = (e.g., course <- 400)

43

Numeric Data Type

Data consisting of all real numbers including whole numbers and decimals (e.g., num <- 6.5)

44

Integer Data Type

Data consisting only of whole numbers, denoted by an L (e.g., int <- 4L): a more efficient way of storing whole numbers

45

Character Data Type (String)

Text-based data enclosed in quotes (e.g., char <- "hello"). Without quotes, R will try to interpret the text as an object name

46

Logical Data Type (Boolean)

Data that takes on TRUE or FALSE values, which must be written in capitals. Logical values result from comparisons such as 3 > 2

47

Factor Data Type

A data structure for categorical variables that can be ordered or unordered. Technically it is a structure, not a basic data type

48

class() Function

Command to check what data type is contained in an object
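A short R sketch of class() applied to each of the data types above. The object names are illustrative only.

```r
num  <- 6.5                               # numeric
int  <- 4L                                # integer (note the L suffix)
char <- "hello"                           # character
lgl  <- 3 > 2                             # logical, the result of a comparison
fct  <- factor(c("low", "high", "low"))   # factor

class(num)   # "numeric"
class(int)   # "integer"
class(char)  # "character"
class(lgl)   # "logical"
class(fct)   # "factor"
```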

49

Vector

A one-dimensional object that holds a single data type, created using c(), which stands for combine (often read as concatenate)

50

Data Coercion Hierarchy

When data types are mixed in a vector, R coerces all elements to a single type, following the order character > numeric > integer > logical
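The coercion order can be checked directly in R. A small sketch, with `mixed` as an illustrative name:

```r
class(c(TRUE, 2L))    # "integer": TRUE becomes 1L
class(c(2L, 3.5))     # "numeric"
class(c(3.5, "a"))    # "character"

mixed <- c(TRUE, 2L, 3.5, "a")
mixed                 # "TRUE" "2" "3.5" "a"
class(mixed)          # "character"
```

The type highest in the order wins, so a single character element turns the whole vector into text.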

51

length() Function

Returns the number of elements in a vector

52

Data Frame

R's primary means of data storage, similar to a spreadsheet where you can mix data types between columns but not within columns

53

nrow() and ncol() Functions

Return the number of rows and columns in a data frame, respectively. They only work on multi-dimensional objects; on a plain vector they return NULL
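A minimal R sketch of a data frame and its dimensions. The data frame `df` and its columns are hypothetical.

```r
# Hypothetical data frame: types differ across columns, not within them
df <- data.frame(
  id      = 1:3,
  name    = c("Ana", "Ben", "Cy"),
  ratings = c(8.5, 6.0, 9.1)
)

nrow(df)        # 3
ncol(df)        # 3
length(df$id)   # 3, the number of elements in one column

v <- c(10, 20, 30)
nrow(v)         # NULL, because a plain vector has no dimensions
length(v)       # 3
```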

54

Indexing with []

Accessing specific elements of an object. R uses 1-based indexing (first element is position 1)

55

$ Operator

Used to index a named column in a data frame (e.g., df$column_name). It is generally the preferred method

56

subset() Function

A more intuitive way to subset data frames based on conditions (e.g., subset(df, ratings > 7))
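The three ways of pulling out data ([], $, and subset()) can be compared side by side in R. A sketch with a hypothetical data frame `df`, assuming R 4.0 or later so that text columns stay as character:

```r
df <- data.frame(
  name    = c("Ana", "Ben", "Cy"),
  ratings = c(8.5, 6.0, 9.1)
)

df$ratings               # the whole ratings column as a vector
df$ratings[1]            # 8.5, the first element (R counts from 1)
df[2, "name"]            # "Ben": row 2, column "name"

high_brackets <- df[df$ratings > 7, ]     # rows with ratings above 7, using []
high_subset   <- subset(df, ratings > 7)  # the same rows, using subset()
```

Both approaches return the rows for Ana and Cy. subset() lets you refer to `ratings` without the `df$` prefix.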

57

for Loop

A programming technique that runs a block of code a pre-specified number of times. Structure: for(i in 1:10){ code }
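A small R sketch of a loop that fills a vector; `squares` is an illustrative name.

```r
squares <- numeric(10)   # pre-allocate a vector to hold the results

for (i in 1:10) {
  squares[i] <- i^2      # runs once for each value of i from 1 to 10
}

squares  # 1 4 9 16 25 36 49 64 81 100
```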

58

install.packages() Function

Installs a package from CRAN. This needs to be done only once and requires quotes around the package name

59

library() Function

Loads an installed package for use in the current R session and must be done every time you start a new session

60

read.csv() Function

Reads a CSV file into R as a data frame

61

Working Directory

The default location where R searches for files: check with getwd(), change with setwd()
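Reading a file and checking the working directory can be sketched together in R. The file name `example_survey.csv` and its contents are hypothetical; the sketch writes the file to a temporary folder first so that it runs on any machine.

```r
getwd()  # prints the current working directory

# Write a small hypothetical CSV to a temporary folder, then read it back
csv_path <- file.path(tempdir(), "example_survey.csv")
write.csv(data.frame(id = 1:3, age = c(25, 40, 31)), csv_path, row.names = FALSE)

survey <- read.csv(csv_path)  # a full path works regardless of the working directory
class(survey)                 # "data.frame"
```

If only a file name is given, as in read.csv("example_survey.csv"), R looks for it in the working directory.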

62

Commenting with #

Anything after # on a line will not be interpreted by R. Comments are used for transparency and reproducibility

63

summary() Function

Provides a five-number summary (min, 25th percentile, median, 75th percentile, max) and mean for numeric variables

64

mean(), sd(), var(), cor(), median()

Functions for calculating descriptive statistics on vectors (cor() takes two vectors)
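A quick R sketch of these functions on made-up data; `visits` and `age` are hypothetical vectors.

```r
visits <- c(2, 4, 4, 6, 9)      # hypothetical number of doctor visits
age    <- c(20, 35, 40, 52, 70)

mean(visits)      # 5
median(visits)    # 4
var(visits)       # 7
sd(visits)        # the square root of the variance
cor(visits, age)  # correlation between two vectors of equal length
summary(visits)   # min, quartiles, median, mean, max
```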

65

package::function() Syntax

Calling a function while explicitly stating which package it comes from. This is preferred for clarity and transparency
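A brief R sketch of the syntax, using the stats and utils packages that ship with R; the vector `x` is illustrative.

```r
x <- c(1, 2, 3, 4)

stats::sd(x)        # explicitly calls sd() from the stats package
utils::head(x, 2)   # 1 2, head() from the utils package
```

The same pattern works for installed add-on packages, without calling library() first.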

66

Mode of Data Collection

The method by which survey data are collected (e.g., face-to-face, telephone, mail, web)

67

CAPI (Computer-Assisted Personal Interviewing)

Computer displays questions on screen, interviewer reads them to respondent and enters answers

68

CATI (Computer-Assisted Telephone Interviewing)

Telephone counterpart to CAPI. Interviewer calls respondent, reads questions from computer, enters responses

69

CASI (Computer-Assisted Self-Interviewing)

Respondent completes survey on computer themselves. This can include text, audio, or video stimuli

70

ACASI (Audio Computer-Assisted Self-Interviewing)

Respondent sees questions on computer, hears recorded audio of questions, and enters their own answers. This increases privacy for sensitive topics

71

IVR (Interactive Voice Response)

Telephone counterpart of ACASI: respondent calls in, hears recorded questions, answers by keypad

72

TDE (Touchtone Data Entry)

Respondents call toll-free number, hear recorded questions, enter data using telephone keypad

73

SAQ (Self-Administered Questionnaire)

Paper questionnaire completed by respondent without interviewer present

74

CSAQ (Computerized Self-Administered Questionnaire)

Electronic version of SAQ completed on computer

75

Coverage Error (Mode)

Error due to the fact that not every unit in the population is represented on the sampling frame. Telephone surveys exclude those without phones, and web surveys exclude those without internet

76

Nonresponse Error (Mode)

Error from sampled units that do not respond. It varies across modes because people may be more likely to complete certain types of surveys than others

77

Measurement Error (Mode)

Deviations of answers from true values. Sources include respondent, interviewer, instrument/questionnaire, and mode of data collection

78

Interviewer Effects (Positive)

Interviewers achieve higher response rates, motivate respondents, probe inadequate responses, provide feedback, clarify questions

79

Interviewer Effects (Negative)

Interviewers can lead to biased responses on sensitive questions and are more expensive than self-administered modes

80

Social Desirability Bias (Mode)

Tendency to underreport sensitive or socially undesirable behaviors when an interviewer is present, as in face-to-face surveys (self-administered modes reduce this)

81

Fixed vs. Variable Costs

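Fixed costs are known in advance (e.g., postage for a set number of invitations), while variable costs depend on how data collection unfolds (e.g., hourly wages for an unknown number of interviews). The balance between them affects mode choice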

82

Mixed-Mode Surveys

Using a combination of modes to compensate for the weaknesses of individual modes. This is increasingly common

83

Random Digit Dialing (RDD)

Sampling method for telephone surveys that randomly generates phone numbers. No equivalent exists for web surveys

84

Probability-Based Online Panels

Panels where members are recruited via probability sampling methods (e.g., LISS, AmeriSpeak, KnowledgePanel)

85

Non-Probability Online Panels

Panels where members self-select or are recruited through non-random means. These are common but potentially biased

86

Elements

The unit of observation in a study (e.g., customers, households, businesses, schools, tweets)

87

Target Population Characteristics

Must be: (1) finite (can theoretically be counted), (2) observable/accessible, and (3) specific to a time frame (implies boundaries of space and time)

88

Unambiguous Population Definition

You should be able to clearly place elements in or out of the target population: "young adults in college" is better than "all young adults"

89

Sampling Frame

A list of elements in the target population that can be sampled and located. Coverage varies depending on frame quality

90

Coverage

The percent of the target population included in the frame (generally theoretical since we rarely know the exact population size)

91

Perfect Coverage

Ideal but unrealistic situation where the sampling frame exactly matches the target population

92

Undercoverage

When some members of the target population are missing from the sampling frame. This is the primary coverage concern since we can't reach them

93

Overcoverage

When the frame contains units not in the target population. It includes ineligibles, duplicates, and blanks, which can be identified and removed

94

Undercoverage Bias

Bias introduced when there is a difference between covered and uncovered units on the statistic(s) of interest
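The size of this bias for a mean can be worked out in R. The sketch uses the standard expression, bias = (proportion uncovered) x (covered mean - uncovered mean), and every number in it is made up for illustration.

```r
N_covered      <- 8000   # hypothetical units on the frame
N_uncovered    <- 2000   # hypothetical units missing from the frame
mean_covered   <- 50     # hypothetical mean among covered units
mean_uncovered <- 40     # hypothetical mean among uncovered units

N <- N_covered + N_uncovered
coverage_rate <- N_covered / N   # 0.8

# True population mean combines both groups
pop_mean <- (N_covered * mean_covered + N_uncovered * mean_uncovered) / N   # 48

# Bias of the covered mean = proportion uncovered * (covered mean - uncovered mean)
bias <- (N_uncovered / N) * (mean_covered - mean_uncovered)   # 2

mean_covered - pop_mean   # also 2, the same bias computed directly
```

The bias shrinks toward zero when either coverage is nearly complete or the covered and uncovered units are similar on the statistic.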

95

Duplication (Overcoverage)

Multiple frame entries link to the same element (e.g., a person with two phone numbers listed)

96

Clustering (Overcoverage)

Multiple elements can be reached via the same frame entry (e.g., a landline that reaches an entire household)

97

Ineligibles (Overcoverage)

Frame entries that are not part of the target population (e.g., businesses on a frame meant for individuals)

98

Blanks (Overcoverage)

Frame entries that don't connect to any unit (e.g., disconnected phone numbers)

99

Solutions for Overcoverage

Delete elements once identified. For clustering, take the whole cluster or select one element and weight up; for duplication, delete duplicates or weight down
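The two options for duplication can be sketched in R. The frame below is hypothetical: person B is listed under two phone numbers.

```r
# Hypothetical frame with one duplicated person
frame <- data.frame(
  person = c("A", "B", "B", "C"),
  phone  = c("555-0101", "555-0102", "555-0103", "555-0104")
)

# Option 1: delete duplicate entries so each person appears once
deduped <- frame[!duplicated(frame$person), ]

# Option 2: keep all entries and weight down by the number of listings
listings <- table(frame$person)
frame$weight <- 1 / as.numeric(listings[frame$person])
frame$weight  # 1.0 0.5 0.5 1.0
```

Weighting down offsets person B's doubled chance of selection without needing to clean the frame in advance.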

100

Solutions for Undercoverage

Use multiple frames, combine modes, apply weighting (post-stratification), or change target population definition to match frame