Personality & Individual Differences Lec2

Psychological Measurement

How to measure personality?
How to ensure:
- Meaningful characteristics are measured?
- The measured characteristic is the one intended to be measured?
Aim: To make meaningful comparisons among people and calculate statistics (e.g., investigate relationships between variables).
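Limits of personality scores:
- No meaningful “zero” level.
- No absolute amounts of a variable.
- No meaningful ratios (e.g., a score of 40 does not indicate twice as much of a trait as a score of 20).
Well-designed personality measurements: Equal differences between scores correspond approximately (≈) to equal differences in trait levels.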

1.1 Some Simple Statistical Ideas

1.1.1 Levels of Measurement (aka Scales of Measurement)

Nominal: Data can only be categorized.
Ordinal: Data can be ranked.
Interval: Data can be ranked and evenly spaced.
Ratio: Data can be ranked, evenly spaced, and have a natural zero.
Note: Data from personality scales (e.g., someone’s neuroticism score) are somewhere between ordinal and interval level but are usually treated as if they were interval level in statistical analyses.

1.1.2 Standard Scores

To make meaningful comparisons between scores, “raw scores” are converted to standard scores (e.g., by subtracting the mean [M] from the score and then dividing the result by the standard deviation [SD]).
Example: Standardized IQ test scores with a M of 100 and SD of 15.
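The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the raw scores are invented, the function names `z_scores` and `to_iq_metric` are hypothetical, and using the population SD (`pstdev`) rather than the sample SD is an assumption.

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Convert raw scores to standard (z) scores: (score - M) / SD."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: population SD; a sample SD is also common
    return [(x - m) / sd for x in raw_scores]

def to_iq_metric(z, m=100, sd=15):
    """Re-express a z score on an IQ-style metric with M = 100 and SD = 15."""
    return m + sd * z

raw = [12, 15, 18, 21, 24]           # invented raw test scores
z = z_scores(raw)                     # M = 0, SD = 1 by construction
iq = [to_iq_metric(v) for v in z]     # the person at the raw mean (18) gets 100
```

A person scoring exactly at the sample mean ends up at 100 on the IQ-style metric, and a person 1 SD above the mean ends up at 115.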

1.1.3 Correlation Coefficients, r

Correlation coefficients tell us how strongly two variables are related and in which direction (positive or negative).
Benchmarks for interpreting correlations in personality & individual differences research:
- Large (or high or strong): .40 or larger (in absolute terms).
- Moderate: Between .20 and .40 (in absolute terms).
- Small (or weak): Below .20 (in absolute terms).
- No correlation: r ≈ .00.
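As a rough illustration of these benchmarks, the sketch below computes Pearson's r by hand and attaches a verbal label. The data are invented, the names `pearson_r` and `benchmark_label` are hypothetical, and folding "no correlation" into the "small" category is a simplifying assumption, since the notes give no cutoff for r ≈ .00.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def benchmark_label(r):
    """Label the size of r (sign ignored) using the benchmarks above."""
    size = abs(r)
    if size >= 0.40:
        return "large"
    if size >= 0.20:
        return "moderate"
    return "small"  # assumption: r close to .00 is also reported as "small"

x = [1, 2, 3, 4, 5]   # invented scores on trait A
y = [2, 1, 4, 3, 5]   # invented scores on trait B
r = pearson_r(x, y)   # 0.8 for these data
label = benchmark_label(r)  # "large"
```

Because the label depends only on the absolute value, r = -.25 and r = .25 are both "moderate"; the sign carries the direction.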

1.1.4 Sample Representativeness & Sample Size

Sample Representativeness

Samples should be reasonably representative of the population that the researcher wants to learn about.
Potential problems if samples are:
- Psychology undergraduate students.
- “WEIRD”: drawn from Western, Educated, Industrialized, Rich, Democratic societies. Such samples tend to show restricted variance (correlations are standardized covariances, so restricted variance → restricted correlations).
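The effect of restricted variance on correlations can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated with a population correlation of about .50, and "restriction" is modeled crudely by keeping only people above the mean on x; none of this comes from a real study.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Simulated population: y = 0.5 * x + noise, giving a correlation near .50
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + sqrt(0.75) * rng.gauss(0, 1) for xi in x]
r_full = pearson_r(x, y)

# Restricted sample: keep only cases above the mean on x (less variance in x)
kept = [(xi, yi) for xi, yi in zip(x, y) if xi > 0]
r_restricted = pearson_r([p[0] for p in kept], [p[1] for p in kept])

print(f"full sample r = {r_full:.2f}, restricted sample r = {r_restricted:.2f}")
```

The restricted sample yields a noticeably smaller correlation even though the underlying relationship between x and y is unchanged.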

Sample Size

Correlations from samples of ≥ 250 people are usually close to the population correlation (Schönbrodt & Perugini, 2013).
The larger the sample, the greater the statistical power to obtain statistically significant results.

1.2 Assessing the Quality of Measurement: Reliability and Validity

1.2.1 Reliability

The extent to which a measure produces consistent results.
Does the obtained score represent the “true level” of the construct being measured?

1.2.1.1 Internal-Consistency Reliability

The extent to which the items of a measure are correlated with one another.
Cronbach’s alpha (α): values ≥ .70 are usually considered acceptable.
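For reference, Cronbach’s alpha is commonly computed as α = k/(k − 1) × (1 − Σ item variances / variance of the total score), where k is the number of items. The sketch below applies that standard formula to invented responses; the function name `cronbach_alpha` and the data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))                   # one tuple of scores per item
    item_vars = [variance(col) for col in items]    # sample variance of each item
    total_var = variance([sum(row) for row in responses])  # variance of sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Invented data: 5 respondents answering 3 items on a 1-5 response scale
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(responses)  # about .96, above the .70 benchmark
```

Alpha rises when items correlate more strongly with one another and when more items are added, which is one reason multi-item scales tend to be more reliable than single items.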

1.2.1.2 Interrater (Interobserver) Reliability

The extent of consistency between the scores of different raters/observers.

1.2.1.3 Test–Retest Reliability

The extent of consistency between scores across different measurement occasions (e.g., now and 1 year later).

1.2.2 Validity

The extent to which a test measures what it claims to measure.

1.2.2.1 Content Validity

The extent to which a measure assesses all relevant features of the construct and does not assess irrelevant features.

1.2.2.2 Construct Validity: Convergent & Discriminant

The measure assesses the same construct that it is intended to assess.
- Convergent validity: Correspondence with measures assessing similar (positive relations) or opposite (negative relations) characteristics.
- Discriminant validity: Lack of correspondence (weak or no relations) with measures assessing characteristics unrelated to the one the measure is intended to assess.

1.2.2.3 Criterion Validity

Relations with relevant outcome variables; also called predictive validity.

1.3 Methods of Measurement

1.3.1 Self-Reports

Structured questionnaires.
- Every person/participant is asked the same set of questions or items.
- There is a fixed set of response alternatives for every item (note the difference between scale and response scale).
Most widely used method of measuring personality.
Most personality inventories (or questionnaires or scales) assess several personality traits.
Each trait is assessed with several items, allowing for good reliability and content validity.
Many researchers recommend including items that suggest the opposite of the trait (so-called reverse-scored items [R]) → balancing out the tendency to agree or disagree with statements (acquiescence).
Pros:
- Efficient, low cost.
- Mostly accurate if people know their behaviors, thoughts, and feelings.
Cons:
- Can be easily “faked” or distorted (e.g., when applying for a job) → socially desirable responding (very difficult to control!).
Extremely valuable as people usually know themselves very well… and sometimes are the only ones who know (Baldwin, 2000).
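Scoring of reverse-scored items [R] can be sketched as follows. On a response scale running from a minimum to a maximum, a reversed response is (minimum + maximum − response). The 1-5 response scale, the item positions, and the function names are assumptions made for illustration.

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a response on the response scale (e.g., 5 -> 1, 4 -> 2 on a 1-5 scale)."""
    return scale_min + scale_max - response

def scale_score(responses, reverse_keyed, scale_min=1, scale_max=5):
    """Mean scale score after reversing the items whose positions are in reverse_keyed."""
    scored = [
        reverse_score(r, scale_min, scale_max) if i in reverse_keyed else r
        for i, r in enumerate(responses)
    ]
    return sum(scored) / len(scored)

reverse_keyed = {1, 3}  # hypothetical: the 2nd and 4th items are reverse-scored

# A consistent high scorer agrees with regular items and disagrees with [R] items
high_scorer = scale_score([5, 1, 4, 2], reverse_keyed)   # 4.5

# An acquiescent responder agrees with everything, regardless of content
yea_sayer = scale_score([5, 5, 5, 5], reverse_keyed)     # 3.0, the scale midpoint
```

With a balanced set of regular and reverse-scored items, a person who simply agrees with every statement ends up near the midpoint instead of receiving an inflated trait score.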

1.3.2 Observer Reports

Analogous to self-reports, but someone else provides the information about the “target” person.
The observer can be a spouse, a parent, a friend, a colleague, a classmate, etc., but should know the “target” fairly well.
Pros: Might be more objective (i.e., less biased) → “Others (sometimes) know us better than we know ourselves” (Vazire & Carlson, 2011).
Cons: Some aspects of personality might never really be observed; observations are done in a limited range of contexts.

1.3.3 Direct Observations

Directly observing a person’s behavior.
Frequency and intensity of behavior that indicate a certain trait.
In the person’s natural habitat or in an artificial setting (e.g., lab).
Can be (very) informative.
Cons: Time-consuming, expensive, require a lot of effort… and need to be aggregated (over multiple indicators, times, situations) if they are meant to capture personality traits!!!

1.3.4 Biodata (Life Outcome Data)

Life outcome data: records of a person’s life relevant to an individual’s personality.
- e.g., phone bills, speeding tickets, grade point average, sales records, diplomas, income… and death.
Objective behavioral indicators.
Cons: Not clear what information is relevant or accurate as an indicator for the personality trait of interest.

2.1 The Idea of a Personality Trait

Conceptual Definition (Ashton, 2018, p. 29): “A personality trait refers to differences among individuals in a typical tendency to behave, think, or feel in some conceptually related ways, across a variety of relevant situations, and across some fairly long period of time.”
Differences Among Individuals: A personality description is a comparison with other people.
Typical Tendency to Behave, Think, or Feel: Likelihood of showing some behaviors or having some thoughts or feelings.
In Some Conceptually Related Ways: Traits are expressed by various behaviors, thoughts, and feelings that appear to have some common psychological element.
Across a Variety of Relevant Situations: Not in just one specific situation, but consistency across a variety of situations and settings that are relevant.
Over Some Fairly Long Period of Time: Relatively stable pattern that can be observed over the long run.

2.3 Do Personality Traits Exist?

Hartshorne & May (1928):
- Investigated 11,000 children for the consistency in their “moral character” (altruism, self-control, honesty).
- Observed their behavior in a variety of situations, e.g., donation to charity, cheating on a test.
- Result: Children displayed little consistency between any two behaviors (rs ≈ .20).
Mischel (1968):
- Individual differences in behavior depend on the specific situation.
- Also Mischel and Peake (1982): Conscientiousness depends very strongly on the situation.
Claim: “Personality traits are of limited value for predicting behavior.”
Counterargument: This claim overlooks the cross-situational consistency that emerges when observations are aggregated across many situations.
- Correlations between two sets of several behaviors are much higher (rs > .50) (Jackson & Paunonen, 1985).
- Personality is reflected in overall, typical behavior as observed across many different situations.
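Why aggregation raises correlations can be shown with a short calculation. Under the simplifying assumption that every pair of single behaviors correlates equally at r, the correlation between two sum scores of k behaviors each is k·r / (1 + (k − 1)·r), which has the same form as the Spearman-Brown formula. This is an idealized sketch, not the analysis reported by Jackson and Paunonen (1985); the function name is hypothetical.

```python
def aggregate_correlation(r_single, k):
    """Correlation between two sums of k behaviors each, assuming every pair of
    single behaviors correlates r_single (Spearman-Brown form)."""
    return k * r_single / (1 + (k - 1) * r_single)

# Starting from the single-behavior correlation of about .20
for k in (1, 2, 5, 10):
    print(k, round(aggregate_correlation(0.20, k), 2))
```

With single behaviors correlating .20, two sets of five behaviors each would already correlate above .50 under this assumption, which is in line with the pattern described above.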

2.4 Structured Personality Inventories

Some Widely Used Personality Inventories:
- The California Psychological Inventory (CPI):
  - Over 400 items; various psychological characteristics, “everyday variables.”
  - Based on the Minnesota Multiphasic Personality Inventory (MMPI), which was intended to measure mental illnesses.
- The Eysenck Personality Questionnaire (EPQ):
  - Three basic dimensions of personality.
  - Biological basis of personality.
- The Temperament and Character Inventory (TCI):
  - Developed by Cloninger and colleagues.
  - Basic biological dimensions of temperament and additional character dimensions.
- The Myers-Briggs Type Indicator:
  - Very popular in business and assessment center settings.
  - Cons:
    - Very crude measure: assigns people to 1 of 16 personality types instead of providing personality scores.
    - Not a scientifically sound instrument in theory and methods.
    - Very limited reliability and validity, if any.
Big Five Framework: 5 major dimensions:
- Neuroticism
- Extraversion
- Openness to Experience
- Agreeableness
- Conscientiousness
Inventories measuring the Big Five:
- The Big Five Inventory (BFI): 44 items
- The NEO Five-Factor Inventory (NEO-FFI) and the NEO Personality Inventory Revised (NEO-PI-R): 60 and 240 items, respectively
The HEXACO Personality Inventory Revised (HEXACO-PI-R):
- Three versions: 200, 100, or 60 items
- 6 dimensions:
  - Honesty-Humility
  - Emotionality
  - eXtraversion
  - Agreeableness (vs Anger)
  - Conscientiousness
  - Openness to Experience

2.5 Strategies of Personality Inventory Construction

2.5.1 The Empirical Strategy

Collect a large pool of items that show empirical relationships with the trait the researcher is interested in (e.g., femininity−masculinity, “I like to eat red meat” [R]).

2.5.2 The Factor Analytic Strategy

Collect a large pool of items, subject them to factor analyses, and find “groups” of items that measure different traits (cf. the lexical approach that gave us the “Big 5”).

2.5.3 The Rational Strategy

Write items specifically for the purpose of assessing each trait—based on how the researcher, theory, and research conceptualize the trait (e.g., the Multidimensional Perfectionism Scale: self-oriented, socially prescribed & other-oriented perfectionism; Hewitt & Flett, 1991).

2.6 Self- & Observer Reports on Personality Inventory Scales

Combined Use of Self- & Observer Reports
- Obtain self-reports from a sample of “target” persons as well as observer reports about the same “target” persons from others.
- High agreement between self- & observer reports provides support for the construct validity of the scale:
  - NEO-PI-R: correlations of about .60 (with spouses as observers) and .40 (with friends or neighbors as observers)
  - HEXACO-PI-R: correlations from .40 to .60 in a sample of over 600 college students (Lee & Ashton, 2013)
  - These correlations indicate convergent validity of the scales.
- Test: Compare observer reports from multiple, unacquainted observers from different contexts (Funder et al., 1995).
Kolar et al. (1996)
- “People know themselves better than anyone else knows them” versus “Others know us better than we know ourselves”
- Both self- and observer reports showed validity for predicting behavior
- Single observer reports were slightly better than self-reports
- Accuracy increased when averaging across observers
Vazire (2010); Vazire and Carlson (2011)
- Gaps in our self-knowledge
- Blind spots due to lack or overload of information
- Biases in self-perception
- “Others sometimes know us better than we know ourselves”
- Accuracy depends on which types of traits are considered
- Self- and observer-reports capture different aspects of personality.
- Self-Other Knowledge Asymmetry model (SOKA model)

SOKA Model (Vazire, 2010)

Observability
- “Internal” traits: low observability; primarily thoughts and feelings (e.g., anxious, self-esteem).
- “External” traits: high observability; primarily overt behavior (e.g., charming, talkative).
Evaluativeness
- Highly evaluative traits: more biases in self-reports (e.g., intelligent, rude).

Self- & observer reports show fairly high levels of agreement
People provide fairly accurate descriptions of their own and others’ personalities
Self- & observer reports can predict behavior with moderate levels of validity
LIMITATION of self- & observer reports: BIASES
- Socially desirable responses and socially undesirable responses in both self- & observer reports
- BUT: the more sources of information, the less bias

Psychological Measurement: Aims to make meaningful comparisons among people and calculate statistics.
- Measures personality by ensuring meaningful characteristics are measured and that the measured characteristic is the one intended to be measured.
Levels of Measurement:
- Nominal: Data categorized.
- Ordinal: Data ranked.
- Interval: Data ranked and evenly spaced.
- Ratio: Data ranked, evenly spaced, with a natural zero.
- Personality scales are usually treated as interval level in statistical analyses.
Standard Scores: Convert raw scores to standard scores for meaningful comparisons.
Correlation Coefficients (r): Measure the strength and direction of the relationship between two variables.
- Large: .40 or larger.
- Moderate: Between .20 and .40.
- Small: Between -.20 and .20.
- No correlation: r ≈ .00.
Sample Representativeness & Size:
- Samples should represent the population.
- Larger samples provide greater statistical power.
Reliability: Consistency of a measure.
- Internal-Consistency: Items of a measure correlate with each other (Cronbach’s alpha ≥ .70).
- Interrater: Consistency between different raters/observers.
- Test-Retest: Consistency between scores across different measurement occasions.
Validity: The extent a test measures what it claims to measure.
- Content: Measures all relevant features of a construct.
- Construct: Measures the intended construct.
  - Convergent: Corresponds with similar measures.
  - Discriminant: Doesn't correspond with unrelated measures.
- Criterion: Relations with relevant outcome variables.
Methods of Measurement:
- Self-Reports: Questionnaires where individuals answer questions about themselves.
  - Pros: Efficient, low cost.
  - Cons: Can be faked or distorted.
- Observer Reports: Others provide information about the target person.
  - Pros: More objective.
  - Cons: Limited observation range.
- Direct Observations: Observing a person’s behavior directly.
  - Cons: Time-consuming, expensive.
- Biodata: Using life outcome data as indicators (e.g., records, tickets).
  - Cons: Relevance may not be clear.
Personality Trait: Differences among individuals in typical behavior, thoughts, or feelings across situations and time.
Do Personality Traits Exist?:
- Hartshorne & May (1928): Investigated consistency in children's moral character. Found little consistency between behaviors (rs ≈ .20).
- Mischel (1968): Behavior depends on the specific situation.
- Jackson & Paunonen (1985): Correlations between sets of behaviors are higher when aggregating observations (rs > .50).
Structured Personality Inventories:
- CPI: Measures various psychological characteristics.
- EPQ: Three basic dimensions of personality.
- TCI: Biological and character dimensions.
- Myers-Briggs: Assigns people to 1 of 16 types (limited reliability and validity).
- Big Five Framework: Neuroticism, Extraversion, Openness, Agreeableness, Conscientiousness.
- HEXACO: Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness, Openness.
Strategies of Personality Inventory Construction:
- Empirical: Collect items with empirical relationships to the trait.
- Factor Analytic: Use factor analyses to find groups of items measuring different traits.
- Rational: Write items based on theory and research.
Self- & Observer Reports:
- High agreement supports construct validity.
- Kolar et al. (1996): Both self- and observer reports predict behavior.
- Vazire (2010): Self-knowledge gaps; accuracy depends on traits.
SOKA Model (Vazire, 2010):
- Observability: Internal vs. external traits.
- Evaluativeness: Biases in self-reports.
Limitations: Biases in self- and observer reports, but more sources of information reduce bias.

Exam questions: These may include scenarios that test understanding of internal versus external traits, ask how bias in self-reports affects the accuracy of personality assessments, and ask why using multiple sources of information helps to mitigate these biases.

What makes for a good scale?
A good scale is both reliable and valid. It consistently produces similar results (reliability) and accurately measures what it's intended to measure (validity).

Reliability
- Internal-Consistency Reliability: The extent to which the items of a measure are correlated with one another.
- Interrater (Interobserver) Reliability: The extent of consistency between the scores of different raters/observers.
- Test–Retest Reliability: The extent of consistency between scores across different measurement occasions.

Validity
- Content Validity: The extent to which a measure assesses all relevant features of the construct and does not assess irrelevant features.
- Construct Validity: The measure assesses the same construct that it is intended to assess.
- Criterion Validity: Relations with relevant outcome variables; also called predictive validity.

Reverse-Coding
- Reverse-coding involves including items that are worded in the opposite direction of the construct being measured and reversing their scores before the scale score is computed. This is done to balance out the tendency to agree or disagree with statements (acquiescence).

What is a self-report scale?
A self-report scale is a method of measurement where individuals answer questions about themselves to assess their personality, behaviors, thoughts, or feelings.

What is meant by “observer report”?
An observer report is a method of measurement where someone else provides information about the target person. This observer should know the target person fairly well.

Under what conditions would each measurement approach be more valid?
- Self-Reports: More valid when assessing internal states, feelings, and attitudes that are not easily observable by others.
- Observer Reports: More valid when assessing external behaviors and traits that are easily observable. They can also provide a more objective perspective by reducing biases in self-perception.
- Direct Observations: Most valid when measuring specific behaviors in natural or controlled settings, especially when the behavior can be quantified.
-**Biod