Lecture 5 Notes: Primary vs Secondary Research, Sampling, and Secondary Data Sources

  • Opening inspiration: "Supposing is good, but finding out is better." – emphasizes humility in research: many people assume they know the information, but the goal is to actually find out through data collection and analysis.
  • Core focus of today: secondary vs primary research, why you should do secondary research first, and how to evaluate data quality.
  • Real-world framing: example job description in marketing research analysis reveals implied expectations (tight timelines, multiple quick-turn deliverables) and the importance of understanding sampling when interpreting survey results.

Primary vs Secondary Research

  • Primary research: data you collect yourself for a specific question or project.
  • Secondary research: data collected by someone else that is repurposed for your use. Can be internal or external.
  • Purpose of secondary data: to build context, identify gaps, test feasibility, and guide the design of primary data collection.
  • Key idea: always consider method and limitations of data sources before acting on them.

Sampling concepts: size, randomness, and representativeness

  • National survey example: Simmons uses a 25,000-person sample; for many questions, a 2,000-respondent sample could suffice, depending on the questions and desired precision.
  • Why larger samples? Bigger samples reduce sampling error for many small-effect questions and help with subgroup analyses, but the benefit depends on how random the sample is.
  • Randomness and representativeness:
    • A truly random sample of adequate size should approximately mirror the population. If the population is 50% female, the sample should be ~50% female; if 20% are Greek students, the sample should reflect ~20% Greek students.
    • Sampling from a limited universe (e.g., only students in a class, or people in a residence hall) can be random within that small universe but not representative of the broader population (e.g., all U.S. adults).
    • Pseudo-random samples from a small or biased frame (e.g., sampling only people at a quad or a lobby) can yield misleading conclusions about the broader group.
  • Randomness vs size quadrant analogy:
    • Imagine a 2x2 grid with axes: randomness (complete randomness vs no randomness) and size (large vs small).
    • Top-right quadrant: large and completely random—best chance for statistically significant findings (though not a guarantee).
    • Top-left: large but not random—may be large but biased.
    • Bottom-right: small but random—limited power and precision.
    • Bottom-left: small and not random—lowest reliability.
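  • Statistical significance:
    • Affected by both sample size and randomness. Larger samples can detect smaller effects, and randomness determines whether a significant result can be generalized to the population. Large, random samples have the best chance to yield results that are both statistically significant and generalizable.
    • Significance does not guarantee truth about the entire population, but it improves confidence in inferences.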
  • Practical illustration: hypothetical national drinking habits survey
    • If you surveyed the entire adult U.S. population (~250,000,000 people) to quantify drinking habits, the cost would be enormous.
    • Example figure: Qualtrics sampling of a national population could cost around $5 per completion; with 250,000,000 responses, that is roughly $1.25 billion, which is impractical and illustrates why researchers use smaller, carefully designed samples.
  • Key questions to assess data quality:
    • What is the sample size?
    • What geography is covered (national, state, city)?
    • How was the survey conducted (online, phone, in person)?
    • Was the sampling method truly random? If not, what biases might exist?
    • Who funded or conducted the study (vested interests) and could that influence results?
    • What exactly was asked (survey questions can shape answers)?
  • Practical takeaway: small, non-random samples can look like random samples but are not generalizable. Conversely, large random samples are most informative for generalizing to a population, but only if the methodology is sound.
  • Example case: a chart can be misaligned with what the data actually represents. In the pie example, Apple pie tops one chart while Apple Crumb pie appears in another, and one dataset excludes Alaska because its sample was too small. This highlights the importance of understanding what the data truly represents and where its limitations lie.
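
The size-versus-randomness grid above can be made concrete with a small simulation. This is a minimal sketch under invented assumptions: the population of 200,000, the 20% "convenient frame" (for example, people on the quad), and the trait rates of 70% inside the frame and 30% outside are all hypothetical numbers chosen for illustration, not figures from the lecture.

```python
import random

def build_population():
    """Hypothetical population of 200,000 people as (in_frame, has_trait) pairs.

    20% are reachable through a convenient frame (say, people on the quad),
    and the trait is far more common inside that frame (70%) than outside (30%).
    """
    in_frame = [(True, True)] * 28_000 + [(True, False)] * 12_000
    out_of_frame = [(False, True)] * 48_000 + [(False, False)] * 112_000
    return in_frame + out_of_frame

def share_with_trait(people):
    """Fraction of the given people who have the trait."""
    return sum(has_trait for _, has_trait in people) / len(people)

def run_grid(seed=0):
    """Draw one sample per quadrant and return the true rate plus each estimate."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    population = build_population()
    frame_only = [person for person in population if person[0]]
    estimates = {}
    for size_label, n in (("large", 2000), ("small", 30)):
        # Random: every member of the population has the same chance of selection.
        estimates[(size_label, "random")] = share_with_trait(rng.sample(population, n))
        # Not random: only people inside the convenient frame can be selected.
        estimates[(size_label, "not random")] = share_with_trait(rng.sample(frame_only, n))
    return share_with_trait(population), estimates

true_rate, grid = run_grid()
print(f"true rate: {true_rate:.3f}")
for (size_label, kind), estimate in grid.items():
    print(f"{size_label:5s} / {kind:10s}: {estimate:.3f}")
```

Under these assumptions the true rate is 0.38. The large random sample should land close to it, while the large non-random sample should land near 0.70: adding respondents from a biased frame makes the wrong answer more precise, not more correct.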

Understanding Secondary Data and its Methodology

  • Definition of secondary data: data collected by someone else for a purpose other than your current project, but reused for your analysis.

  • Sources of secondary data can be internal (within your organization) or external (third-party providers, public sources).

  • Why methodology matters:

    • Reputable sources disclose how data was collected, adjusted, and weighted to be representative and random where possible.
    • Look for information on sample size, sampling frame, response rates, weighting methods, margins of error, and confidence intervals.
  • The role of secondary data beyond numbers:

    • Not all data are numerical; qualitative observations, trends, and reports can also be considered secondary data.
    • Example: TikTok trend observations during 2020 (hashtags like #renegade, #onlineclasses, #learningthedog, #celebratedoctors) represent secondary data in a broad sense (observations published or compiled by others).
    • When using non-numeric secondary data, scrutinize sources for reliability, context, and potential biases.
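
To illustrate what "weighted to be representative" can mean in practice, here is a minimal sketch of one common approach, post-stratification weighting on a single variable. The group sizes, the outcome rates, and the 50/50 population split are hypothetical, and real providers may use different or more elaborate weighting schemes.

```python
def poststratify(sample_counts, sample_rates, population_shares):
    """Compare an unweighted estimate with a post-stratified one.

    sample_counts: respondents per group in the sample (hypothetical).
    sample_rates: share of each group giving the answer of interest (hypothetical).
    population_shares: each group's known share of the population.
    """
    total = sum(sample_counts.values())
    # Weight = population share / sample share; over-represented groups get < 1.
    weights = {
        group: population_shares[group] / (sample_counts[group] / total)
        for group in sample_counts
    }
    unweighted = sum(sample_counts[g] * sample_rates[g] for g in sample_counts) / total
    weighted = sum(population_shares[g] * sample_rates[g] for g in sample_counts)
    return unweighted, weighted, weights

# Hypothetical survey: women are 70% of respondents but 50% of the population.
unweighted, weighted, weights = poststratify(
    sample_counts={"women": 700, "men": 300},
    sample_rates={"women": 0.40, "men": 0.60},
    population_shares={"women": 0.50, "men": 0.50},
)
print(f"unweighted estimate: {unweighted:.2f}")
print(f"weighted estimate:   {weighted:.2f}")
print({group: round(w, 3) for group, w in weights.items()})
```

In this made-up case the unweighted estimate is 0.46 and the weighted estimate is 0.50. Weighting corrects for known imbalances such as gender, but it cannot repair biases on characteristics the provider did not measure, which is why the disclosed methodology still needs to be read closely.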

Library and Institutional Secondary Data Resources

  • University library secondary data resources are largely online and centralized for convenience.
  • APR Research Portal (aprresearchportal.com) provides consolidated access to about 37 links/resources related to advertising and PR, data sources, and market intelligence.
  • Notable groups and links within the APR portal:
    • Advertising and PR information resources:
      • Winmo: agency information, client lists, offices, HR contacts, departments; useful for job hunting.
      • Adweek: public relations strategy and tactics.
      • Ad Spender: ad spend data for brands and categories.
      • Note: This section is most relevant to advertising/PR professionals; other fields may skip details.
    • News information resources:
      • Internet TV News, ProQuest Newspaper, Nexis Uni, America’s News, Ethnic News Watch: search newspapers and news sources.
    • Market data and analytics: Mintel, MR_Simmons, Simply Analytics, eMarketer, Tapestry, US Census, Statista.
    • Public opinion and polling: IColl, Roper polls (used for attitudes on religion, politics, and other topics).
    • Nonprofits and philanthropy: Foundation Center stats/guides for nonprofit data and organization search.
  • Practical tips:
    • You can access many of these resources electronically; you don’t always need to visit the physical library.
    • If a source seems limited or biased, cross-check with other sources to validate findings.
    • The Internet remains a valuable tool for initial exploration, followed by deeper dives through the library portal as needed.
  • Real-world use: combine multiple secondary sources to triangulate a topic (e.g., pie preferences, consumer trends, or attitudes toward brands) before designing a primary study.

Pie Case Study: What Secondary Data Can Reveal—and What It Cannot

  • Demonstration of pie popularity across sources:
    • A dataset (e.g., Ethnic News Watch) shows Apple as a frequent top choice in reports about pies.
    • A separate source (EatThisNotThat) ranks pies by weight-loss potential, not by popularity; it may list Peanut Butter Pie as the worst for weight loss, which can be misconstrued as being the most popular.
  • Key insights:
    • Different datasets answer different questions (popularity vs weight-loss impact).
    • Inconsistent reporting or labeling (e.g., Apple Pie not appearing in one chart but appearing in another) signals data quality issues or misalignment of taxonomy across sources.
    • Local context or sample limitations (e.g., Alaska sample too small) can undermine confidence in results for certain geographies.
  • Lessons for researchers:
    • Always verify what question the data is answering and whether the sampling frame and geography match your research needs.
    • Be cautious of sensational headlines or chart designs that oversimplify, mislead, or omit crucial caveats.
    • Acknowledge limitations of secondary data and avoid overgeneralizing beyond the data’s scope.

Practical Takeaways and Key Messages

  • Four main takeaways from today:
    • Understand the data type: primary vs secondary, and the role of methodology in evaluating data quality.
    • Assess the sampling design: sample size, randomness, geography, and framing; these determine the data’s representativeness and reliability.
    • Evaluate sources critically: question the data’s origin, purpose, and potential biases; examine margins of error and confidence intervals when available.
    • Recognize the limitations of secondary data: it may not answer why something happens, and sometimes primary data is needed to uncover underlying causes.
  • Reading and assignments: engage with readings and Blackboard materials; focus on survey concepts and evaluating data sources.
  • Practical exercise: a quick field test – ask three people about a question (e.g., which destination wins) to illustrate whether a tiny sample can reflect broader public opinion.

Quick Practice: Three-Person Survey Exercise

  • Task: Ask three people you encounter which destination they prefer (e.g., the beach or the mountains) and why.
  • Critical question: Do you think results from just three people reflect the view of the overall public?
    • Use this to discuss limitations of small samples and the importance of broader sampling for generalizable conclusions.
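
A quick calculation shows how unreliable a three-person sample is even in the best case. This sketch assumes, purely for illustration, that 60% of the public prefers one destination and that respondents are drawn independently at random, which a convenience sample of three passers-by would not satisfy. The function name is made up for this example.

```python
from math import comb

def prob_sample_majority_matches(p, n):
    """Probability that more than half of n random respondents choose
    the option preferred by a share p of the population (binomial model)."""
    needed = n // 2 + 1  # smallest count that forms a strict majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )

# Hypothetical: 60% of the public prefers the beach.
for n in (3, 25, 101):
    chance = prob_sample_majority_matches(0.60, n)
    print(f"n = {n:3d}: sample majority matches the public with probability {chance:.3f}")
```

With three respondents the sample majority agrees with the public only about 65% of the time under these assumptions (0.648), so roughly one such mini-survey in three points the wrong way. The probability rises as the sample grows, and that is before accounting for the non-random way the three people were chosen.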

Formative Connections: Why This Matters in Research Practice

  • Ethical and philosophical considerations:
    • Respect for data integrity: avoid misrepresenting data or cherry-picking results to fit a narrative.
    • Skepticism in the face of vested interests: sources with a stake in outcomes require scrutiny of methodology and potential bias.
    • Humility in research: always be prepared to discover information that contradicts your initial assumptions.
  • Foundational principles linked to today’s topics:
    • Representativeness and generalizability: the alignment between sample and population.
    • Sampling theory: the relationship between sample size, randomness, bias, and the reliability of inferences.
    • The role of secondary data: a starting point for understanding a topic, guiding design, and identifying gaps for targeted primary data collection.

Recap: Key Concepts and Formulas to Remember

  • Primary vs Secondary data definitions and purposes.
  • Representativeness and randomness: what makes a sample reflect the population?
  • Statistical significance vs practical significance.
  • Margin of error for a proportion (large population):
    • $MOE = z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$
    • Where: $n$ = sample size, $p$ = estimated proportion, $z_{\alpha/2}$ = z-score for the desired confidence level.
    • If $p$ is unknown, use the conservative estimate $p = 0.5$, which maximizes $p(1-p)$ and therefore gives the largest possible $MOE$.
  • Real-world constraints: cost, feasibility, and trade-offs in designing surveys (e.g., $5 per completion, large-scale nationwide sampling costs).
  • Critical evaluation questions for data sources:
    • What is the sampling frame and geography?
    • How was randomness achieved and was there any nonresponse bias?
    • What is the sample size and margin of error? Is it reported?
    • Who funded the study, and could there be vested interests?
    • What is the exact question wording and potential measurement bias?
  • Practical nuance: secondary data can be misinterpreted if the question it answers differs from your research question; always align data source with your research objective.
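
The margin-of-error formula and the cost trade-off in this recap can be worked through together. This is a minimal sketch assuming a 95% confidence level (z = 1.96), the conservative p = 0.5, simple random sampling from a large population, and the lecture's rough figure of $5 per completion. The function names are illustrative.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level (assumed here)

def margin_of_error(n, p=0.5, z=Z_95):
    """MOE for a proportion: z * sqrt(p * (1 - p) / n), large-population case."""
    return z * math.sqrt(p * (1 - p) / n)

def survey_cost(n, cost_per_completion=5.0):
    """Total fielding cost at a flat price per completed response."""
    return n * cost_per_completion

# Sample sizes mentioned in the lecture: 2,000 and 25,000 respondents.
for n in (2_000, 25_000):
    moe = margin_of_error(n)
    print(f"n = {n:>6,}: MOE = +/-{moe * 100:.2f} points, cost = ${survey_cost(n):,.0f}")

# Surveying every U.S. adult (~250,000,000) at the same price per completion.
print(f"full census cost: ${survey_cost(250_000_000):,.0f}")
```

Under these assumptions, 2,000 respondents give a margin of error of about 2.2 percentage points for $10,000, while 25,000 respondents give about 0.6 points for $125,000. Because the margin shrinks with the square root of n, each additional point of precision costs far more than the last, and a full census would run to about $1.25 billion.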

Next Steps and Preparedness

  • Review the APR Research Portal and explore at least three sources relevant to a potential project.
  • Practice evaluating a secondary data source by identifying its population, sampling method, and stated limitations.
  • Prepare a short reflection on how you would design a primary study to answer a why-question that secondary data cannot, using insights from today’s lecture.