Study Notes on Sampling, Observation, Surveys, Ethics, and Data Interpretation

Population, sampling, and generalizability

Population vs. sample definitions:
- Population: the entire group of interest (e.g., all gabbling college students).
- Sample: a subset drawn from the population for study.
Gold standard in social science research: random sample
- Definition: every member of the population has an equal chance of being selected.
- Intuition: lottery-like selection where everyone is in the hat and randomly drawn.
- Implication: data from a random sample are typically more generalizable to the population.
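
The lottery intuition above can be sketched in a few lines. This is a minimal illustration, not a survey procedure: the roster of 1,000 student IDs, the sample size of 50, and the function name `simple_random_sample` are all assumptions made up for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely to be picked."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical roster standing in for the full population of interest.
roster = list(range(1000))
sample = simple_random_sample(roster, 50, seed=1)
```

The key requirement is a complete list of the population (a sampling frame) to draw from; the problems described next arise when that list is incomplete or people cannot be reached.
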
Real-world challenge: truly random samples are often not achieved
- People screen calls, block unknown callers, etc.
- Telephone survey limitations: many people use only cell phones; landline samples underrepresent younger people and those without a landline.
- If sampling via random digit dialing (house phones) or phone books, you may oversample older populations and miss younger, cell-phone-only respondents.
Example population and sampling risk
- Population: study habits of gabbling college students.
- If sample collection is biased (e.g., surveying only at the student union at noon, or only online students), the sample may not represent the entire student body.
- Consequence: biased results and limited generalizability.
Haphazard vs. systematic sampling
- Haphazard (convenience) sample: survey people who happen to be nearby with no strategy.
- Limitation: often not representative of the population; useful for learning survey administration, not for making broader inferences.
- Systematic and stratified sampling: approaches students should know for more robust inference (referenced later).
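
As a rough sketch of the two approaches named above, using their usual textbook definitions (systematic: every k-th member after a random start; stratified: a random draw within each subgroup). The function names, the strata labels, and the 5% sampling fraction are hypothetical choices for illustration.

```python
import random

def systematic_sample(population, n, seed=None):
    """Take every k-th member after a random start, with k = len(population) // n."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    return {name: rng.sample(members, round(len(members) * fraction))
            for name, members in strata.items()}

# Hypothetical student body: 600 on-campus and 400 online students.
students = list(range(1000))
strata = {"on_campus": students[:600], "online": students[600:]}

every_kth = systematic_sample(students, 50, seed=3)
by_stratum = stratified_sample(strata, 0.05, seed=3)
```

Stratifying guarantees that each subgroup appears in the sample in proportion to its size, which a convenience sample at the student union cannot do.
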

Classic sampling failure: Literary Digest, 1936

Survey scale: about 2,000,000 surveys conducted to predict the presidential election.
Prediction: Alfred Landon would win. Actual result (election map): Landon carried only Maine and Vermont; FDR won every other state.
Why the failure?
- Sampling frame bias: respondents were drawn from magazine subscribers, telephone users, and vehicle registrations.
- Context: during the Great Depression, car ownership and telephone access were not uniformly distributed; rural and poorer populations were underrepresented.
- Nonresponse bias: a large portion of those contacted did not respond, and respondents differed in meaningful ways from nonrespondents.
Takeaway:
- Even huge samples can be biased if the sampling frame and response patterns miss important subgroups.
- Nonresponse bias remains a major challenge in modern surveys as well.
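
The takeaway that size does not cure bias can be checked with a small simulation. All numbers here are invented for illustration and are not the 1936 figures: 30% of a hypothetical electorate is in the sampling frame (phone or car owners), support for the incumbent is 40% inside the frame and 70% outside it.

```python
import random

rng = random.Random(0)

def make_voter():
    """Return (in_frame, supports_incumbent) for one hypothetical voter."""
    in_frame = rng.random() < 0.30
    supports = rng.random() < (0.40 if in_frame else 0.70)
    return in_frame, supports

population = [make_voter() for _ in range(200_000)]

def share_supporting(voters):
    """Fraction of the given voters who support the incumbent."""
    return sum(supports for _, supports in voters) / len(voters)

true_share = share_supporting(population)  # near 0.61 by construction

# Huge sample, but drawn only from the biased frame.
frame = [v for v in population if v[0]]
big_biased = share_supporting(rng.sample(frame, 50_000))

# Small sample, but drawn at random from the whole population.
small_random = share_supporting(rng.sample(population, 1_000))
```

Under these assumptions the 50,000-person frame sample predicts the incumbent loses, while the 1,000-person random sample lands close to the true share.
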

Observational methods and sampling ethics in fieldwork

Observational research approaches
- Unobtrusive observation: participants do not know they are being observed; minimizes reactivity but raises ethical/privacy concerns.
- Obtrusive observation: researchers are involved and interacting; higher potential for changing behavior or influencing the scene.
Participant observation and ethnography
- Participating observer: researchers engage with the group to gain deeper insight into culture and practices.
- Complete participant: researcher is fully embedded and identified as part of the group from the start.
- Reactivity: people act differently when they know they are being watched; ethnography often requires long immersion to restore natural behavior.
Covert vs. overt research
- Covert: participants are unaware they are being studied; raises substantial ethical concerns around privacy and informed consent.
- Overt: participants are informed and consent to be studied; more ethically straightforward.
Privacy and public vs private spaces
- Public settings (parks, malls) generally have fewer privacy expectations; observation is more permissible but still ethically guided.
- Private settings (homes, private offices) require explicit consent to observe or interview.
Informed consent, consent withdrawal, and coercion
- Participants should consent to participate and know what the study entails.
- Consent can be withdrawn at any time; participants are not obliged to continue.
- For surveys, forced answering (no skip options) is unethical; participants should be able to skip questions.
Special consent considerations
- Protected populations (children, incarcerated individuals, pregnant women, etc.) require extra safeguards because of a history of mistreatment in research.
- Incarcerated populations often have constrained autonomy; research must be tightly justified and tightly regulated.

Surveys and measurement issues

Surveys basics
- A survey is a structured set of questions designed to elicit information.
- Operationalization: how a concept is defined and measured in the survey (wording, response options).
Social desirability bias
- Respondents may tailor answers to be viewed favorably, especially in face-to-face interviews about sensitive topics (e.g., sexual behavior).
- Anonymity or computer-based surveys can reduce social desirability bias.
Mode effects and question design
- Mode (in-person, phone, online, paper) influences how people respond.
- Sensitive questions may require anonymous or non-face-to-face formats to improve honesty.
Data quality and validation checks
- Include attention checks and implausible-response questions to detect non-serious participation.
- If respondents report impossible values (e.g., impossible drug use), discard or flag their data; otherwise, reliability is compromised.
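
A minimal sketch of how such checks might be applied to collected responses. The field names (`attention_check`, `days_used_past_30`) and the instructed answer `"agree"` are hypothetical; a real survey would substitute its own items and limits.

```python
def flag_response(resp, max_days=30):
    """Return the reasons a response looks non-serious (an empty list means no flags)."""
    reasons = []
    # Attention check: the item instructed respondents to select "agree".
    if resp.get("attention_check") != "agree":
        reasons.append("failed attention check")
    # Plausibility check: days of use in the past 30 days must lie between 0 and 30.
    days = resp.get("days_used_past_30")
    if days is not None and not (0 <= days <= max_days):
        reasons.append("impossible days_used_past_30")
    return reasons

careful = {"attention_check": "agree", "days_used_past_30": 12}
careless = {"attention_check": "disagree", "days_used_past_30": 45}

flags_careful = flag_response(careful)
flags_careless = flag_response(careless)
```

Returning reasons rather than silently dropping rows keeps the decision to discard or merely flag a respondent visible and reviewable.
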
Incentives and data quality
- Paid surveys can incentivize hurried or dishonest responses; data quality may suffer.
Deception and reporting in survey design
- Researchers may, in some cases, alter the framing or labels of surveys to improve participation (e.g., renaming a study to increase willingness to participate).
- Ethical considerations require transparency and minimizing harm.

Biases, misinformation, and the ethics of information presentation

The ethics of information presentation
- Statistics and graphs can shape perception without lying about numbers
- Best and Shirley argue for critical consumption of data: beware of how data are framed to advance a narrative.
Examples of misleading representation (conceptual, not exhaustive)
- Unemployment graphs: the same data plotted on different scales tell different visual stories.
- A narrow (truncated) y-axis range makes changes look more dramatic; a very wide range can flatten real changes.
- Planned Parenthood graph (Congressional hearing): juxtaposing abortions with cancer screenings without proper scaling or context can mislead about trends.
- Infographics and political messaging: a very gradual real increase in graduation rates can be drawn to look dramatic; an honest depiction would show the gradual growth.
- Tax-cut visuals (Fox News example): framing of policy consequences can provoke a specific political stance without presenting a complete picture.
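
The axis-scale effect described above can be put in numbers. A sketch, assuming a bar chart and hypothetical values (a rate rising from 5.0 to 5.5): the function compares how tall the two bars are drawn when the y-axis starts at a given baseline.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the y-axis starts at baseline."""
    return (b - baseline) / (a - baseline)

# Hypothetical values: a rate rising from 5.0 to 5.5.
honest = apparent_ratio(5.0, 5.5)                   # axis starts at 0: second bar about 1.1x as tall
truncated = apparent_ratio(5.0, 5.5, baseline=4.5)  # axis starts at 4.5: second bar 2x as tall
```

The underlying change is 10% in both cases; only the truncated axis makes it look like a doubling.
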
Correlation vs. causation
- Correlation does not imply causation; two things can move together due to a third variable or coincidence.
- Examples of spurious correlations, often cited for humor:
  - worldwide noncommercial space launches and sociology degrees awarded
  - number of people electrocuted by power lines and the marriage rate in Alabama
- Important reminder: apparent correlations should prompt questions about potential confounders and underlying mechanisms.
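
A small simulation can show how a third variable produces a correlation between two things that do not influence each other. Everything here is synthetic: `z` is a made-up hidden variable, and `x` and `y` are each built as `z` plus independent noise.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(42)
z = [rng.gauss(0, 1) for _ in range(5000)]   # hidden third variable (confounder)
x = [zi + rng.gauss(0, 1) for zi in z]       # x depends on z, not on y
y = [zi + rng.gauss(0, 1) for zi in z]       # y depends on z, not on x

r_raw = pearson_r(x, y)                      # clearly positive (0.5 in theory)

# Remove the confounder's contribution; only independent noise remains.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_adjusted = pearson_r(x_rest, y_rest)       # close to zero
```

Subtracting `z` directly works only because the example was built that way; with real data the confounder's effect has to be estimated, and it may not have been measured at all.
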
“Mutant statistics” and the risk of misinterpretation
- Misleading statistics can be created intentionally or unintentionally by misreading or misreporting data.
- Classic example: the claim "the number of gun deaths has doubled every year since 1950" implies an impossibly large total, because annual doubling compounds exponentially; the intended claim was that the figure had doubled once over the whole period since 1950.
- Always verify the exact phrasing and time frame of a statistic before accepting it.
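
The arithmetic behind the doubling example is easy to check. A sketch under stated assumptions: the starting count of 1 death in 1950 and the end year of 1990 are hypothetical, chosen only to show the scale of annual doubling.

```python
def doubled_every_year(start_count, start_year, end_year):
    """Count after doubling once per year from start_year to end_year."""
    return start_count * 2 ** (end_year - start_year)

def doubled_once(start_count):
    """Count after doubling a single time over the whole period."""
    return start_count * 2

# Hypothetical: a single death in 1950, projected forward 40 years.
every_year = doubled_every_year(1, 1950, 1990)  # 2**40 = 1,099,511,627,776
once = doubled_once(1)                          # 2
```

Even from a starting count of 1, forty years of annual doubling gives over a trillion deaths in a single year, which is why the "every year" wording cannot be what was meant.
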

Ethical and practical dilemmas in reporting and data collection

Deceptive study naming and sample framing
- The National Survey of Fertility Barriers was renamed to a more generic term (e.g., a family survey) to encourage participation, masking the study’s focus on fertility barriers.
- Deceptive labeling can affect who participates and how they respond.
Dilemmas with reporting sensitive information
- When researchers encounter sensitive contexts (e.g., illegal activities, exploitation), they must balance participant welfare with the scientific value of disclosure.
- Examples include interviews with formerly incarcerated individuals, surrogacy arrangements, or illicit drug use.
Mandated reporting and protection considerations
- Some researchers become mandated reporters; these legal obligations influence how information is handled.
- In sensitive cases (e.g., potential child welfare concerns), researchers must consider whether to report, given confidentiality and safety implications.
Balancing emancipatory goals with potential harm
- In some cases, sharing a story or technique can help social change but may also expose participants to risk or stigma.
- Researchers may choose to publish with pseudonyms or anonymized accounts, or to tell the story in a way that preserves participants’ safety while highlighting systemic issues.
Real-world ethical decision-making example
- A researcher studying pathways to parenting for same-sex couples faced a choice about whether to disclose information on adoption loopholes and cross-state arrangements: disclosure could promote social change but could also put participants' safety or privacy at risk.
- After consultation with participants, the researcher chose to tell the full story to support social change, while recognizing ongoing legal debates and potential risks.
Reporting changes in policy and law
- Laws and policies shift over time; researchers must stay aware of current contexts (e.g., surrogacy law, joint adoption rights) to interpret data accurately and responsibly.

Special topics: measurement, interpretation, and responsibility

Emphasis on critical thinking when consuming statistics
- Always question: What exactly is being measured? What population? What time frame?
- Are there potential biases in sampling, response, or framing?
- Are there confounding variables that could explain the observed relationships?
Practical takeaways for exam and research practice
- Distinguish clearly between survey research, observational studies, and experiments
- Prefer random sampling where possible to improve generalizability
- Be transparent about limitations: sampling bias, nonresponse, measurement error, and ethical constraints
- Use appropriate statistical reasoning: avoid assuming causation from correlation; recognize the role of confounders
- When presenting data, ensure scales and labels accurately reflect the magnitude of changes (avoid deliberate or accidental misrepresentation)
Final ethical reminder
- Always honor participants’ autonomy: informed consent, voluntary participation, and the right to withdraw
- Protect vulnerable populations; ensure privacy and minimize potential harm
- Be honest about methods, limitations, and potential conflicts of interest

Example formula: $n = 2{,}000{,}000$ (Literary Digest sample size)

Example of a misleading time-frame statement: "The number of gun deaths doubled since 1950" vs. "The number of gun deaths has doubled every year since 1950."