What is PCA?
Principal component analysis (PCA) is a dimensionality reduction method that
transforms a set of possibly correlated variables into a smaller set of uncorrelated variables called principal components,
ordered so that the first captures the most variance, the second the next most, etc.
Variable 1 (V1)
Key: the values observed for each participant place them along a dimension described by V1.
The variance in the observed V1 scores reveals ONE DIMENSION onto which individual data project,
e.g. 'math proficiency'
(x-axis)
V2 (two-dimensional space)
PROJECT EACH OBSERVATION onto Cartesian space.
Look at each variable in isolation; interpret the variability in the values observed as mapping to a dimension along which these individual observations vary,
e.g. reading proficiency (y-axis)
V3 (third dimension)
Project each observation onto three-dimensional space with
{x = V1, y = V2, z = V3} coordinates.
What can you say about this third dimension captured by V3?
Ans: V3 has very low variance.
(V1 and V2, just two dimensions, capture most of the variability.)
How would V3 be dropped?
Using principal component analysis.
Ex: say V1 and V2 are heavily correlated.
Individually each has a lot of variability, but plotting them together shows that they vary together (share variance).
Given their covariance, V1 alone gives some information about V2 (and vice versa).
So the goal of PCA is to find the principal components that capture the MOST variability in the data.
PC rotation
Rotate the coordinate system so that the NEW AXES (principal components) line up with the DIRECTIONS WHERE THE DATA VARY THE MOST.
PC rotation (STEP BY STEP INTUITION)
CENTER the cloud so the origin is at the MEAN
FIND THE LONGEST STRETCH of the cloud —> that angle is PC1
DRAW PC1 as a new axis through the mean
DROP A PERPENDICULAR —> PC2 captures the leftover variance
RE-DESCRIBE POINTS using (PC1, PC2) instead of (V1, V2)
RANK IMPORTANCE: variance along each PC (its eigenvalue); plot them —> scree plot
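The steps above can be sketched with an eigendecomposition of the covariance matrix. The notes mention R's `prcomp()`; this NumPy version, and its two-variable toy data, are illustrative assumptions, not the course's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated variables (think "math" and "reading" scores).
v1 = rng.normal(size=200)
v2 = 0.8 * v1 + 0.3 * rng.normal(size=200)
X = np.column_stack([v1, v2])

# Step 1: center the cloud so the origin is at the mean.
Xc = X - X.mean(axis=0)

# Steps 2-4: eigendecompose the covariance matrix. The eigenvectors are
# the PC directions (new axes); the eigenvalues are the variance along each.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # re-sort so PC1 comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: re-describe points using (PC1, PC2) instead of (V1, V2).
scores = Xc @ eigvecs

# Step 6: rank importance; these proportions are the scree plot heights.
explained = eigvals / eigvals.sum()
print(explained)
```

Because V1 and V2 are heavily correlated, almost all variance lands on PC1, and the PC scores are uncorrelated by construction.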
EIGENVALUES
PCA rotates the axes so that PC1 follows the direction of MAXIMUM VARIANCE
An eigenvalue is the amount of variance captured along each PC
EIGENVECTOR
In linear algebra: a direction that does not change when we apply a transformation (it only gets stretched or shrunk)
Eigenvalue
The amount of stretching in that direction
Each eigenvector = a principal component (a direction of variance)
Each eigenvalue = the variance explained by that component
Eigenvalues tell us how important each component is
Scree plot
displays eigenvalues across components in descending order
HELPS DECIDE: how many components to keep
dimensionality reduction method
How many non-redundant dimensions do I need to describe the variability in my data?
What are the principal components?
When do we use PCA?
SUMMARIZE DATA (many variables while losing minimal information)
VISUALIZE high-dimensional data (2D or 3D)
REMOVE REDUNDANCY (multicollinearity) in correlated predictors
prcomp()
SCREE PLOT: variance explained by each PC
PC1: most variance
PC2: second most variance
PC3–PC6: minimal variance (can be dropped)
Elbow rule: look where curve flattens out
How does PCA differ from factor analysis?
PCA is a data reduction technique.
It creates new axes (PCs) that capture TOTAL VARIANCE.
PCs ARE mathematical constructs, not always meaningful.
GOAL: SUMMARIZE DATA WITH FEWER DIMENSIONS
How does factor analysis differ from PCA?
FA is a latent variable model.
It explains SHARED VARIANCE among observed variables.
FACTORS = HIDDEN CONSTRUCTS (e.g., math ability).
GOAL: uncover the underlying causes of correlations
(PCA, by contrast, merely SUMMARIZES VARIANCE).
Visual analogy
Several tests correlate strongly.
FA explains this correlation with ONE LATENT FACTOR.
Each test has its own uniqueness (ERRORS).
WHY IS FACTOR ANALYSIS IMPORTANT?
many psych constructs ARE NOT DIRECTLY MEASURABLE (intelligence, anxiety, motivation)
What we DO measure are INDICATORS (test items, survey questions): "I feel nervous before exams", "I have trouble sleeping"
FA HELPS US:
identify which items "go together" because they reflect the same HIDDEN trait
REDUCE dozens of questions into a few meaningful scales
PROVIDE EVIDENCE FOR VALID MEASUREMENT of latent constructs
LATENT CONSTRUCTS
Scale development: the big picture
define the construct and its content domain
generate items
expert review
cognitive interviewing
PILOT STUDY
ITEM ANALYSIS
DIMENSIONALITY
RELIABILITY
VALIDITY EVIDENCE
SCORING
FINALIZE & DOCUMENT
Test development: big picture
define the construct and domain
write test specifications
generate test items
expert review
cognitive interviewing
PILOT TESTING
ITEM ANALYSIS
DIFFICULTY & DISCRIMINATION
RELIABILITY ANALYSIS
VALIDITY EVIDENCE
SCORING & NORMS
FINALIZE & DOCUMENT
Step 6: Pilot Testing
the test is administered to a small group of participants to evaluate its
feasibility and effectiveness.
▶ Helps identify practical issues: completion time, confusing items,
technical problems.
▶ Provides first feedback on item functioning, instructions, and
timing.
▶ Bridges the transition from item writing to formal test validation.
▶ Item analysis:
▶ Reliability: Internal consistency
▶ Validity evidence: Content, criterion, construct validity.
▶ Scoring, missing data, and reporting: Define rules, inspect
patterns, document limitations.
step 7: Item analysis
Items are written to measure a specific aspect of the construct
we want to measure.
▶ They are written to measure different aspects of the content
domain (scales) or by mapping the test specifications (tests).
▶ They are written following general guidelines and best
practices.
▶ But, these initial attempts are only theoretical.
▶ The responses observed across items in our pilot study provide
us with empirical evidence of how the items are functioning.
▶ We analyse these responses to flag items with ambiguity, low
variance, or poor discrimination for revision/removal.
▶ Generally speaking, item analysis takes place at two levels:
1. Basic item quality checks
2. Item functioning (Dimensionality checks and
difficulty/discrimination analysis - Step 8)
SCALE ITEMS - quality check!
Scale items are built with the intention of measuring a specific
aspect of the construct we want to measure.
▶ Each item is intended to provide relevant, non-overlapping
(non-redundant) information.
▶ We want our scale to be efficient.
▶ There’s no point in having items and/or response options that
are not providing relevant information.
Initial quality check: We look at the distribution of responses
observed across items.
We look for:
▶ 1) Response options that are not used.
▶ 2) Items with low to no variability in the responses observed.
PROBLEMATIC, SINCE THEY PROVIDE LIMITED INFORMATION
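These two checks can be sketched as follows. The pilot data, the 0.1 variance cutoff, and the flagging rule are illustrative assumptions, not prescribed values:

```python
import numpy as np

# Hypothetical pilot data: 100 participants x 3 Likert items (options 1-5).
rng = np.random.default_rng(1)
responses = np.column_stack([
    rng.integers(1, 6, size=100),   # uses the full response range
    rng.integers(3, 6, size=100),   # options 1 and 2 are never used
    np.full(100, 4),                # no variability at all
])
options = [1, 2, 3, 4, 5]

for j in range(responses.shape[1]):
    col = responses[:, j]
    unused = [o for o in options if (col == o).sum() == 0]   # check 1
    flagged = bool(unused) or col.var() < 0.1                # check 2
    print(f"item {j + 1}: unused options {unused}, "
          f"variance {col.var():.2f}, flag={flagged}")
```

Items 2 and 3 get flagged (unused options and near-zero variance, respectively), while item 1 passes both checks.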
Test items - Quality checks!
Each item is intended to measure a sub-skill or knowledge
that is relevant to the construct we want to measure.
▶ We want our test to be able to discriminate between
different levels of knowledge.
Three main quality checks:
1. Identify items that are always answered correctly.
2. Identify items that are answered correctly less than
expected by chance.
3. Additionally, check the response distribution per item: Focus
on the distractors.
▶ Look for distractors that are never used (too obvious).
▶ Look for distractors that are too attractive (used more than
expected by chance)
PROBLEMATIC - DISTRACTOR MAY BE TOO ATTRACTIVE (too specific or too ambiguous)
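A minimal distractor check over the response distribution might look like this. The item, the responses, and the "above chance" rule for flagging attractive distractors are hypothetical:

```python
# Hypothetical multiple-choice item: 4 options, "A" keyed correct,
# so the chance level per option is 1/4.
responses = list("AAAAAAAAAACCCCCCCCCCCCCB")  # 24 test-takers; "D" never chosen
options, key = ["A", "B", "C", "D"], "A"
n = len(responses)
chance = 1 / len(options)

for opt in options:
    share = responses.count(opt) / n
    if opt == key:
        print(f"{opt} (key): chosen {share:.0%}")
    elif share == 0:
        print(f"{opt}: never chosen -> too obvious, revise")
    elif share > chance:
        print(f"{opt}: chosen {share:.0%} (> chance {chance:.0%}) -> too attractive")
    else:
        print(f"{opt}: chosen {share:.0%}, fine")
```

Here distractor C attracts more than half of the responses (well above chance) and D is never used; both would be flagged for revision.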
Dimensionality check (Scales)
Do all items reflect one underlying trait, or several related traits?
Dimensionality checks provide evidence for:
▶ Score meaning: A total score only makes sense if items pull
on the same latent construct.
▶ Reliability: Internal consistency assumes (rough)
unidimensionality; alpha/omega change with structure.
▶ Prerequisite for validity: Dimensionality constrains
hypotheses for relations to other variables
(convergent/discriminant).
What is a “Dimension”?
A dimension is a hidden commonality that makes some items vary
together.
Example: “Academic Stress” may split into:
▶ Time pressure (deadlines, juggling courses)
▶ Evaluation anxiety (tests, grading)
▶ Role overload (work–school conflict)
Key idea: Items within each dimension should be correlated with each
other
Exploratory dimensionality check
Start by inspecting the item correlation matrix.
▶ Exploratory Factor Analysis (EFA): Model shared variance
to explore factor structure
Confirmatory dimensionality check
Confirmatory Factor Analysis (CFA): Test a hypothesized
structure.
Polychoric Correlations for Ordinal Data
Issue with Pearson on Likert:
Treats response options as equal-interval (but they have an ordinal
scale).
▶ With few/skewed categories, Pearson often attenuates the true
association.
Idea behind Polychoric: Assume each ordinal item comes from a
continuous latent variable cut by thresholds. We want to estimate the
latent correlation ρ between these ordinal items.
Strongly Disagree = 1 | Disagree = 2 | Neutral = 3 | Agree = 4 | Strongly Agree = 5
Numbers are labels, they don’t carry numerical meaning other than order.
Polychoric Correlations for Ordinal Data
KEY
observed responses are chosen based on thresholds
cutting a latent continuous space. We estimate ρ that matches the
observed contingency table.
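The attenuation problem can be simulated: draw two latent continuous variables with a known correlation, cut each into five ordered categories with thresholds, and compare Pearson on the categories with the latent correlation. This is only a sketch demonstrating the attenuation (the thresholds and ρ = .6 are arbitrary choices); it does not estimate the polychoric ρ itself:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.6                                  # true latent correlation

# Latent continuous responses behind two Likert items.
latent = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=5000)

# Thresholds cut the latent space into 5 ordered categories (0..4);
# they are deliberately skewed, like an easy-to-endorse item.
thresholds = [-0.5, 0.0, 0.4, 0.8]
observed = np.digitize(latent, thresholds)

r_latent = np.corrcoef(latent[:, 0], latent[:, 1])[0, 1]
r_pearson = np.corrcoef(observed[:, 0], observed[:, 1])[0, 1]
print(f"latent r = {r_latent:.2f}, Pearson on categories = {r_pearson:.2f}")
```

Pearson on the ordinal categories comes out noticeably smaller than the latent correlation; polychoric estimation targets the latent ρ instead.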
Exploratory Factor Analysis (EFA)
Let the data suggest clusters of items.
▶ Uses only shared variance; allows factors to correlate (oblique
rotation).
▶ Good for a first validated structure; flags problematic items
Confirmatory Factor Analysis (CFA)
Test a specific structure.
▶ You declare which items load on which factors.
▶ Evaluates fit (e.g., CFI/TLI, RMSEA, SRMR) and supports
invariance tests
Flow in practice:
Draft scale —> EFA(explore) —> revise items —> CFA (confirm) on new sample
EFA —> CFA PIPELINE (EFA PORTION)
use polychoric correlations (Likert). Parallel analysis for factor
retention.
▶ Oblique rotation (Promax/Oblimin) for correlated facets.
▶ Flag: low communalities, cross-loadings > .30, Heywood cases.
EFA —> CFA PIPELINE (CFA PORTION)
Competing models: 1F vs correlated factors vs bifactor.
▶ Report: CFI/TLI (> .95 good), RMSEA (< .06), SRMR (< .08),
but judge substantively.
▶ Invariance: configural → metric → scalar (groups/time)
Item Difficulty
Definition (dichotomous scoring):
p = (1/N) Σᵢ Xᵢ, i = 1…N, where Xᵢ ∈ {0, 1} (1 = correct)
Interpretation: p is the proportion correct (easiness); some texts define
difficulty as 1 − p.
General guidelines:
▶ For general tests, aim broadly p ≈ .30–.80 (avoid floor/ceiling).
▶ With k options, chance level = 1/k. p ≪ chance ⇒ suspect
miskey/ambiguity.
▶ Choose a mix of p values to target your score range and purpose
(diagnostic vs selection).
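Computing p and applying the guidelines above could be sketched as follows. The response matrix, k = 4 options, and the flag messages are made-up examples:

```python
import numpy as np

# Hypothetical 0/1 scored responses: 8 examinees x 4 items.
X = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
])
k = 4                      # answer options per item
chance = 1 / k             # guessing level

p = X.mean(axis=0)         # proportion correct per item ("easiness")
for j, pj in enumerate(p, start=1):
    if pj < chance:
        print(f"item {j}: p={pj:.2f} < chance {chance:.2f} -> check key/ambiguity")
    elif not 0.30 <= pj <= 0.80:
        print(f"item {j}: p={pj:.2f} outside .30-.80 -> floor/ceiling risk")
    else:
        print(f"item {j}: p={pj:.2f} ok")
```

Items 1 and 4 hit the ceiling (p = 1.0), item 3 falls below chance (suspect miskey), and item 2 sits comfortably in the target range.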
Item Discrimination
Does the item distinguish higher from lower total scorers?
Point–biserial correlation (rpbis)
rpbis = cor(X, T(−i))
Correlate the 0/1 item score X with the total test score without that item T(−i).
Interpretation: About > .30 good, > .20 acceptable (context); near 0
weak; negative = red flag (miskey, ambiguity).
Difficulty–Discrimination interaction:
▶ If p ≈ 0 (too hard) or p ≈ 1 (too easy), groups look the same ⇒
rpbis shrinks.
▶ Many well-behaved items have .30 ≲ p ≲ .80 and rpbis ≳ .20.
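A corrected point-biserial per item, with the interpretation bands above, might be sketched as follows. The response matrix is fabricated so that item 4 behaves like a miskeyed item:

```python
import numpy as np

# Hypothetical 0/1 scored responses: 8 examinees x 4 items.
# Item 4 is written so that LOW total scorers tend to get it right.
X = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
])

total = X.sum(axis=1)
rs = []
for j in range(X.shape[1]):
    rest = total - X[:, j]                 # total score WITHOUT item j
    r = np.corrcoef(X[:, j], rest)[0, 1]   # corrected point-biserial
    rs.append(r)
    flag = "red flag" if r < 0 else ("weak" if r < 0.20 else "ok")
    print(f"item {j + 1}: r_pbis = {r:.2f} ({flag})")
```

Item 1 discriminates well (r > .30), while item 4's negative correlation is the red flag described above.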
Internal Consistency:
Metric: Cronbach’s α, Split-half (Spearman–Brown),
item-total correlation
Test–Retest Reliability:
correlation across administrations
Parallel-Forms Reliability:
correlation between forms
Inter-Rater Reliability:
Metric: Cohen’s κ (categorical), ICC (continuous)
(reliability is about consistency)
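As one example from this toolbox, Cronbach's α can be computed directly from the item variances and the total-score variance; the Likert data below are invented for illustration:

```python
import numpy as np

# Hypothetical Likert responses: 6 respondents x 4 items of one scale.
X = np.array([
    [4, 5, 4, 4],
    [3, 3, 3, 4],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
], dtype=float)

k = X.shape[1]
item_vars = X.var(axis=0, ddof=1)          # variance of each item
total_var = X.sum(axis=1).var(ddof=1)      # variance of the total score
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```

The items here co-vary strongly, so α is high; with uncorrelated items the item variances would nearly exhaust the total variance and α would drop toward zero.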
Item–Test Reliability
Core Idea: Assess whether each item is consistent with the overall test
score (provides evidence for internal consistency).
Procedure:
1. Compute each item’s score (e.g., 0/1 for dichotomous, Likert
response for polytomous).
2. Compute the total test score (sum across items).
3. Correlate each item with the total score.
4. (Best practice) use a corrected item–total correlation, excluding the
item itself from the total.
Item–test reliability is part of the internal consistency toolbox.
Item–Test Reliability interpretation:
low or negative correlation: item does not align with the construct,
may add noise.
▶ Moderate correlation (e.g., > .30): item contributes useful,
consistent information.
▶ Very high correlation (> .80): item may be redundant with other items.
Item–Test Reliability example
item with ri,total = 0.05 (poor discrimination, likely problematic
item)
▶ Item with ri,total = 0.45 (acceptable contribution to the construct)
▶ Item with ri,total = 0.82 (very strong overlap, possibly redundant
with other items)
Item–test reliability helps decide whether to revise, keep, or remove
items in a scale.
Document Validity Evidence
TWO MAIN CONCERNS:
ensuring our measurement instrument is CONSISTENT and is
measuring the TARGET CONSTRUCT
1. Results from expert reviews to ensure that the instrument covers the intended domain
2. Results from cognitive interviews verifying that our items evoke the latent processes we intended
3. Whether the EFA/CFA support the dimensionality we intended
4. Contrast our measurement instrument against related measurements (convergent validity!)
Scoring — General Principles
Scoring: Turn item responses into interpretable scores by clearly
specifying scoring rules.
▶ Score types: total vs. subscale(s); 0/1 vs. polytomous; raw vs.
transformed.
▶ Keying & polarity: identify all reverse-coded items and ensure they
are reversed consistently.
▶ Missing data policy: state an exact rule for how to handle missing
data.
▶ Estimation precision: mean vs. sum; decimal places; ties.
Deliverables to users: scoring key, examples, missing-data rule,
interpretation guide, limitations
Scale Scores — Common Practices
Subscales: mean (or sum) of items within each validated factor.
Total score: only if factors are sufficiently unidimensional or theoretically justified.
BEFORE SCORING: reverse-code keyed items; verify item–total correlations have the intended signs.
Typical missing data rule (example):
Allow up to 1 missing item or up to 20% missing; otherwise set
subscale to NA and report missingness.
▶ If allowed, prorate: use the mean of the answered items (for sums,
multiply that mean by the # of items); flag as prorated.
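The example rule above (allow up to 1 missing item, otherwise set the subscale to NA; prorate by the mean of the answered items) might be implemented like this; the function name, 5-item subscale, and data are assumptions:

```python
import numpy as np

MAX_MISSING = 1            # example rule: allow up to 1 missing item

def subscale_sum(items):
    """Prorated subscale sum: mean of answered items times # of items."""
    items = np.asarray(items, dtype=float)
    n_missing = int(np.isnan(items).sum())
    if n_missing > MAX_MISSING:
        return None, False                       # set to NA, report missingness
    if n_missing == 0:
        return float(items.sum()), False
    prorated = float(np.nanmean(items) * len(items))
    return prorated, True                        # flag as prorated

print(subscale_sum([4, 5, 3, 4, 4]))             # complete
print(subscale_sum([4, 5, np.nan, 4, 4]))        # 1 missing -> prorated
print(subscale_sum([4, np.nan, np.nan, 4, 4]))   # too many missing -> NA
```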
Transformations (when helpful):
▶ Standardize for comparability: z = (x − x̄) / s; optionally report
T = 50 + 10z alongside raw means.
Test Scores — Common Practices
Raw scoring: total correct (optionally partial credit for polytomous
items).
Scaled scores (why): place different forms on a common scale; improve
interpretability.
▶ Simple transform: linear map raw → scale (e.g., 200–800).
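A simple linear raw → scale transform could look like this; the 0–100 raw range and the 200–800 reporting scale are illustrative assumptions:

```python
# Linear map from raw scores onto a reporting scale.
RAW_MIN, RAW_MAX = 0, 100
SCALE_MIN, SCALE_MAX = 200, 800

def scaled(raw):
    """Linearly transform a raw score onto the reporting scale."""
    frac = (raw - RAW_MIN) / (RAW_MAX - RAW_MIN)
    return round(SCALE_MIN + frac * (SCALE_MAX - SCALE_MIN))

print(scaled(0), scaled(50), scaled(100))  # 200 500 800
```

In practice, operational programs use equating rather than a fixed linear map so that scores from different forms stay comparable.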
Test Scores — Common Practices
Cut scores (criterion-referenced intro):
1. Define performance levels (e.g.,
Basic/Proficient/Advanced) with clear descriptors.
2. Set provisional cuts with a defensible method:
▶ Angoff: panel estimates P(correct) for a minimally competent
examinee; sum across items.
▶ Bookmark: order items by difficulty; panel places the
“bookmark” at minimally competent point.
▶ (Other methods: Contrasting Groups, Body-of-Work.)
3. Check impact & error: pass rates, subgroup patterns, show
decision bands
Technical Report — Required Sections (document)
1. Overview: Purpose, population, uses, administration conditions,
accessibility.
2. Construct & Domain: working definition, content map/blueprint,
in/out of scope.
3. Development Process: item specifications, expert review, cognitive
interviews (protocol & changes).
4. Pilot Study: samples, procedures, exclusions, descriptive stats.
5. Dimensionality: EFA, CFA [Scales]
6. Item Functioning:
▶ [Scales] item–total, revision/removal rules.
▶ [Tests] difficulty (p), discrimination (rpbis / D).
7. Reliability: internal consistency revisions.
8. Validity Evidence: content, response processes, internal
structure, relations to other variables, consequences.
9. Scoring: keys, reverse coding, subscales/total, missing-data
policy, transformations, reporting bands. [Scales]
10. Limitations & Use Guidance: appropriate/contraindicated
uses, known biases.
11. Maintenance Plan: review cadence, revalidation criteria,
contact/ownership.