psych 209: exam 2

  • measurement reliability = consistency

    • test-retest: if tested again, the results/patterns will be similar

    • internal: will generate similar responses across all of the items, even with different wording

      • average inter-item correlation (AIC): mean of all possible correlations

        • e.g., an AIC between 0.15 and 0.50 means the items are reasonably correlated

      • cronbach’s alpha: combines AIC and the # of items in the scale

        • i.e., the closer to 1.0, the more reliable the scale
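
A minimal sketch of the two internal reliability numbers above. It assumes the standardized form of Cronbach's alpha, which is the version built directly from the AIC and the number of items; the response data and function names are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def average_inter_item_correlation(items):
    """Mean of the correlations between every possible pair of items."""
    return mean(pearson_r(a, b) for a, b in combinations(items, 2))


def standardized_alpha(n_items, aic):
    """Standardized Cronbach's alpha from the item count and the AIC."""
    return (n_items * aic) / (1 + (n_items - 1) * aic)


# Hypothetical data: each inner list is one item, answered by 6 people.
items = [
    [4, 5, 3, 2, 4, 5],
    [4, 4, 3, 2, 5, 5],
    [5, 5, 2, 1, 4, 4],
]

aic = average_inter_item_correlation(items)
alpha = standardized_alpha(len(items), aic)
print(f"AIC = {aic:.2f}, alpha = {alpha:.2f}")
```

With the AIC held constant, adding items raises alpha, which is why a long scale can reach a high alpha even when its items are only modestly correlated.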

    • interrater: different raters/observers give consistent scores for the same behavior or response

      • whether results are uniform when multiple administrators use the measure

      • r: 0.70 or greater for strong, positive correlation

      • kappa: when rating categorical variables; closer to 1.0 is a stronger correlation
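
A minimal sketch of kappa for two raters, assuming Cohen's kappa (observed agreement corrected for the agreement expected by chance). The ratings and the function name are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same cases into categories.

    Note: undefined (division by zero) if chance agreement equals 1,
    e.g., when both raters use a single category for every case.
    """
    n = len(rater_a)
    # Proportion of cases where the two raters gave the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two observers rating the same four behaviors.
rater_a = ["aggressive", "aggressive", "calm", "calm"]
rater_b = ["aggressive", "calm", "calm", "calm"]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 3 of 4 cases (0.75), but half of that agreement is expected by chance, so kappa is lower than the raw agreement rate.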

  • measurement validity = accuracy

    • construct: how well variables have been operationalized

      1. face: subjective; does this seem plausible?

        1. e.g., layperson may be confused or question construct validity compared to an expert

      2. content: subjective; does it cover all the (relevant) content?

        1. e.g., a quiz or survey covering only some of the relevant material vs. covering all of it comprehensively

      3. criterion: empirical; does it correlate with behavior (the way it should theoretically)?

        1. e.g., measuring class with relevant behaviors (arriving on time, using a planner, etc.), vs. irrelevant behaviors

        2. known-groups paradigm: testing two groups who are known to differ on the measured variable to ensure they score differently on that variable

      4. convergent: empirical; does it correlate with similar metrics (do they converge)?

        1. e.g., measuring movie quality with imdb, rotten tomatoes, etc., scores (commonly used measurements)

      5. discriminant/divergent: empirical; does it correlate less strongly (or not at all) with dissimilar metrics?

        1. e.g., ensuring that measurement used for “itch” does not converge with measurement used for “pain”

          1. prevents confusion and irrelevant overlap
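
A minimal sketch of checking convergent and discriminant validity with correlations, using the itch vs. pain example. All scores are made up for illustration, and the variable names are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical scores for the same six participants.
new_itch_scale = [2, 4, 5, 7, 8, 3]          # the measure being validated
established_itch_scale = [1, 4, 6, 7, 9, 3]  # similar construct
pain_scale = [5, 3, 6, 4, 5, 4]              # dissimilar construct

convergent_r = pearson_r(new_itch_scale, established_itch_scale)
discriminant_r = pearson_r(new_itch_scale, pain_scale)
print(f"convergent r = {convergent_r:.2f}")
print(f"discriminant r = {discriminant_r:.2f}")
```

The pattern to look for is a strong correlation with the similar measure and a clearly weaker one with the dissimilar measure.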

Study Validity (Concerns about the Research Design)

  1. Internal Validity – The degree to which a study establishes a cause-and-effect relationship between variables, without confounding factors.

  2. External Validity – The extent to which the study’s findings generalize to other settings, populations, or times.

    • Ecological Validity – How well the findings apply to real-world settings.

  3. Statistical Conclusion Validity – The extent to which conclusions drawn from statistical analysis are accurate and reliable.

  • sampling validity

    • statistical validity: findings are precise, reasonable, and replicable

      • requires big enough sample

      • Doesn’t care which wolves are sampled, just how many are sampled

      • larger sample size → smaller CI

        • larger sample size → more precise estimate of variable of interest
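
A minimal sketch of why a larger sample gives a smaller CI, assuming a 95% confidence interval for a mean with the normal approximation (half-width = z × SD / √n). The SD of 10 and the function name are hypothetical.

```python
import math


def ci_half_width(sd, n, z=1.96):
    """Half-width (margin of error) of a CI for a mean: z * sd / sqrt(n)."""
    return z * sd / math.sqrt(n)


# Same variability (SD = 10), increasing sample sizes.
for n in (25, 100, 400):
    print(f"n = {n:>3}: mean ± {ci_half_width(10, n):.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the width of the interval.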

    • external validity: findings can be generalized to other contexts

      • requires right kind of sample

      • Cares which wolves are sampled and how (the number doesn’t matter)

      • generalizability: extent we can apply claims of sample to entire population of interest

  • response sets

    • non-differentiation: picking the same thing every time

      • solution: mix up the wording

    • acquiescence (yea-saying): agree with everything

      • solution: including attention check questions

    • fence-sitting: picking the neutral option every time

      • solution: eliminate neutral option on likert scales

  • observational validity solutions

    • observer bias → clear codebooks: precise operationalization of variables

    • observer bias/effects → masked (blind) design: observers are unaware of study’s purpose and conditions

    • reactivity → unobtrusive observations: making observers less noticeable

  • probability sampling: lets randomness determine who/what is sampled

    • simple random: list of every single member of population & choose sample randomly

    • systematic: choose sample based on a system (e.g., every 5th person)

    • stratified random: dividing population into meaningful subgroups (strata) and sample randomly from subgroups

      • oversampling: intentionally overrepresenting certain subgroups

    • cluster: randomly select naturally occurring clusters → include everyone within those clusters

      • multistage: includes 2 samples/stages

        1. random sample of clusters

        2. random sample of people within clusters

  • nonprobability sampling: researcher/participants choose who/what is sampled

    • convenience: sampling only easily accessible/available participants; most common sampling technique

    • purposive: only certain kinds of people included in sample

    • snowball: participants asked to recommend acquaintances for study

    • quota: identifies subsets of population and sets target number for each category in sample; samples nonrandomly until quotas are filled
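
A minimal sketch of the probability sampling techniques above, using Python's `random` module. It assumes the common textbook distinction that cluster sampling uses everyone in the selected clusters, while multistage sampling draws a second random sample within them. The population, strata, clusters, and function names are all made up for illustration.

```python
import random


def simple_random_sample(population, n, rng):
    """Pick n members at random from a full list of the population."""
    return rng.sample(population, n)


def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every step-th member."""
    start = rng.randrange(step)
    return population[start::step]


def stratified_random_sample(strata, n_per_stratum, rng):
    """Randomly sample the same number of members from each subgroup."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick clusters, then include everyone in those clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: random sample of clusters. Stage 2: random sample within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen
            for person in rng.sample(clusters[name], n_per_cluster)]


rng = random.Random(209)  # fixed seed so the sketch is repeatable
population = [f"person_{i}" for i in range(100)]
strata = {"first_year": population[:60], "senior": population[60:]}
clusters = {f"school_{c}": population[c * 20:(c + 1) * 20] for c in range(5)}

print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 25, rng))
print(stratified_random_sample(strata, 2, rng))
print(len(cluster_sample(clusters, 2, rng)))
print(multistage_sample(clusters, 2, 3, rng))
```

Sampling a larger share of a small stratum than its share of the population would turn the stratified version into oversampling.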
