What is PCA?
Principal component analysis (PCA) is a dimensionality reduction method that
transforms a set of possibly correlated variables into a smaller set of uncorrelated variables called principal components,
ordered so that the first captures the most variance, the second the next most, etc.
Variable 1 (V1)
Key: the values observed for each participant place them along a dimension described by V1.
The variance in the observed V1 scores reveals ONE DIMENSION onto which individual data project,
e.g. 'math proficiency'
(x-axis)
V2 (two-dimensional space)
PROJECT EACH OBSERVATION onto Cartesian space.
Look at each variable in isolation; interpret the variability in the values observed as mapping to a dimension along which these individual observations vary,
e.g. reading proficiency (y-axis)
V3 (third dimension)
Project each observation onto three-dimensional space with
{x = V1, y = V2, z = V3} coordinates.
What can you say about this third dimension captured by V3?
Ans: V3 has very low variance.
(V1 and V2, just two dimensions, capture most of the variability.)
How would V3 be dropped?
Using principal component analysis.
Ex: say V1 and V2 are heavily correlated.
Individually each has a lot of variability, but plotting them together shows that they vary together (share variance).
Given their covariance, V1 alone gives some information about V2 (and vice versa).
So the goal of PCA is to find the principal components that capture the MOST variability in the data.
PC rotation
Rotate the coordinate system so that the NEW AXES (principal components) line up with the DIRECTIONS WHERE THE DATA VARY THE MOST.
PC rotation (STEP BY STEP INTUITION)
CENTER the cloud so the origin is at the MEAN
FIND THE LONGEST STRETCH of the cloud —> that angle is PC1
DRAW PC1 as a new axis through the mean
DROP A PERPENDICULAR —> PC2 captures the leftover variance
RE-DESCRIBE POINTS using (PC1, PC2) instead of (V1, V2)
RANK IMPORTANCE: variance along each PC (its eigenvalue); plot them —> scree plot
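The steps above can be sketched with an eigendecomposition of the covariance matrix. The notes mention R's `prcomp()`; this NumPy version, and its two-variable toy data, are illustrative assumptions, not the course's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated variables (think "math" and "reading" scores).
v1 = rng.normal(size=200)
v2 = 0.8 * v1 + 0.3 * rng.normal(size=200)
X = np.column_stack([v1, v2])

# Step 1: center the cloud so the origin is at the mean.
Xc = X - X.mean(axis=0)

# Steps 2-4: eigendecompose the covariance matrix. The eigenvectors are
# the PC directions (new axes); the eigenvalues are the variance along each.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # re-sort so PC1 comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: re-describe points using (PC1, PC2) instead of (V1, V2).
scores = Xc @ eigvecs

# Step 6: rank importance; these proportions are the scree plot heights.
explained = eigvals / eigvals.sum()
print(explained)
```

Because V1 and V2 are heavily correlated, almost all variance lands on PC1, and the PC scores are uncorrelated by construction.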
EIGENVALUES
PCA rotates the axes so that PC1 follows the direction of MAXIMUM VARIANCE
An eigenvalue is the amount of variance captured along each PC
EIGENVECTOR
In linear algebra: a direction that does not change when we apply a transformation (it only gets stretched or shrunk)
Eigenvalue
The amount of stretching in that direction
Each eigenvector = a principal component (a direction of variance)
Each eigenvalue = the variance explained by that component
Eigenvalues tell us how important each component is
Scree plot
displays eigenvalues across components in descending order
HELPS DECIDE: how many components to keep
dimensionality reduction method
How many non-redundant dimensions do I need to describe the variability in my data?
What are the principal components?
When do we use PCA?
SUMMARIZE DATA (many variables while losing minimal information)
VISUALIZE high-dimensional data (2D or 3D)
REMOVE REDUNDANCY (multicollinearity) in correlated predictors
prcomp()
SCREE PLOT: variance explained by each PC
PC1: most variance
PC2: second most variance
PC3–PC6: minimal variance (can be dropped)
Elbow rule: look where curve flattens out
How does PCA differ from factor analysis?
PCA is a data reduction technique.
It creates new axes (PCs) that capture TOTAL VARIANCE.
PCs ARE mathematical constructs, not always meaningful.
GOAL: SUMMARIZE DATA WITH FEWER DIMENSIONS
How does factor analysis differ from PCA?
FA is a latent variable model.
It explains SHARED VARIANCE among observed variables.
FACTORS = HIDDEN CONSTRUCTS (e.g., math ability).
GOAL: uncover the underlying causes of correlations
(PCA, by contrast, merely SUMMARIZES VARIANCE).
Visual analogy
Several tests correlate strongly.
FA explains this correlation with ONE LATENT FACTOR.
Each test has its own uniqueness (ERRORS).
WHY IS FACTOR ANALYSIS IMPORTANT?
many psych constructs ARE NOT DIRECTLY MEASURABLE (intelligence, anxiety, motivation)
What we DO measure are INDICATORS (test items, survey questions): "I feel nervous before exams", "I have trouble sleeping"
FA HELPS US:
identify which items "go together" because they reflect the same HIDDEN trait
REDUCE dozens of questions into a few meaningful scales
PROVIDE EVIDENCE FOR VALID MEASUREMENT of latent constructs
LATENT CONSTRUCTS
Scale development: the big picture
define the construct and its content domain
generate items
expert review
cognitive interviewing
PILOT STUDY
ITEM ANALYSIS
DIMENSIONALITY
RELIABILITY
VALIDITY EVIDENCE
SCORING
FINALIZE & DOCUMENT
Test development: big picture
define the construct and domain
write test specifications
generate test items
expert review
cognitive interviewing
PILOT TESTING
ITEM ANALYSIS
DIFFICULTY & DISCRIMINATION
RELIABILITY ANALYSIS
VALIDITY EVIDENCE
SCORING & NORMS
FINALIZE & DOCUMENT
Step 6: Pilot Testing
the test is administered to a small group of participants to evaluate its
feasibility and effectiveness.
▶ Helps identify practical issues: completion time, confusing items,
technical problems.
▶ Provides first feedback on item functioning, instructions, and
timing.
▶ Bridges the transition from item writing to formal test validation.
▶ Item analysis:
▶ Reliability: Internal consistency
▶ Validity evidence: Content, criterion, construct validity.
▶ Scoring, missing data, and reporting: Define rules, inspect
patterns, document limitations.
step 7: Item analysis
Items are written to measure a specific aspect of the construct
we want to measure.
▶ They are written to measure different aspects of the content
domain (scales) or by mapping the test specifications (tests).
▶ They are written following general guidelines and best
practices.
▶ But, these initial attempts are only theoretical.
▶ The responses observed across items in our pilot study provide
us with empirical evidence of how the items are functioning.
▶ We analyse these responses to flag items with ambiguity, low
variance, or poor discrimination for revision/removal.
▶ Generally speaking, item analysis takes place at two levels:
1. Basic item quality checks
2. Item functioning (Dimensionality checks and
difficulty/discrimination analysis - Step 8)
SCALE ITEMS - quality check!
Scale items are built with the intention of measuring a specific
aspect of the construct we want to measure.
▶ Each item is intended to provide relevant, non-overlapping
(non-redundant) information.
▶ We want our scale to be efficient.
▶ There’s no point in having items and/or response options that
are not providing relevant information.
Initial quality check: We look at the distribution of responses
observed across items.
We look for:
▶ 1) Response options that are not used.
▶ 2) Items with low to no variability in the responses observed.
PROBLEMATIC, SINCE THEY PROVIDE LIMITED INFORMATION
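These two checks can be sketched as follows. The pilot data, the 0.1 variance cutoff, and the flagging rule are illustrative assumptions, not prescribed values:

```python
import numpy as np

# Hypothetical pilot data: 100 participants x 3 Likert items (options 1-5).
rng = np.random.default_rng(1)
responses = np.column_stack([
    rng.integers(1, 6, size=100),   # uses the full response range
    rng.integers(3, 6, size=100),   # options 1 and 2 are never used
    np.full(100, 4),                # no variability at all
])
options = [1, 2, 3, 4, 5]

for j in range(responses.shape[1]):
    col = responses[:, j]
    unused = [o for o in options if (col == o).sum() == 0]   # check 1
    flagged = bool(unused) or col.var() < 0.1                # check 2
    print(f"item {j + 1}: unused options {unused}, "
          f"variance {col.var():.2f}, flag={flagged}")
```

Items 2 and 3 get flagged (unused options and near-zero variance, respectively), while item 1 passes both checks.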
Test items - Quality checks!
Each item is intended to measure a sub-skill or knowledge
that is relevant to the construct we want to measure.
▶ We want our test to be able to discriminate between
different levels of knowledge.
Three main quality checks:
1. Identify items that are always answered correctly.
2. Identify items that are answered correctly less than
expected by chance.
3. Additionally, check the response distribution per item: Focus
on the distractors.
▶ Look for distractors that are never used (too obvious).
▶ Look for distractors that are too attractive (used more than
expected by chance)
PROBLEMATIC - DISTRACTOR MAY BE TOO ATTRACTIVE (too specific or too ambiguous)
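A minimal distractor check over the response distribution might look like this. The item, the responses, and the "above chance" rule for flagging attractive distractors are hypothetical:

```python
# Hypothetical multiple-choice item: 4 options, "A" keyed correct,
# so the chance level per option is 1/4.
responses = list("AAAAAAAAAACCCCCCCCCCCCCB")  # 24 test-takers; "D" never chosen
options, key = ["A", "B", "C", "D"], "A"
n = len(responses)
chance = 1 / len(options)

for opt in options:
    share = responses.count(opt) / n
    if opt == key:
        print(f"{opt} (key): chosen {share:.0%}")
    elif share == 0:
        print(f"{opt}: never chosen -> too obvious, revise")
    elif share > chance:
        print(f"{opt}: chosen {share:.0%} (> chance {chance:.0%}) -> too attractive")
    else:
        print(f"{opt}: chosen {share:.0%}, fine")
```

Here distractor C attracts more than half of the responses (well above chance) and D is never used; both would be flagged for revision.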
Dimensionality check (Scales)
Do all items reflect one underlying trait, or several related traits?
Dimensionality checks provide evidence for:
▶ Score meaning: A total score only makes sense if items pull
on the same latent construct.
▶ Reliability: Internal consistency assumes (rough)
unidimensionality; alpha/omega change with structure.
▶ Prerequisite for validity: Dimensionality constrains
hypotheses for relations to other variables
(convergent/discriminant).
What is a “Dimension”?
A dimension is a hidden commonality that makes some items vary
together.
Example: “Academic Stress” may split into:
▶ Time pressure (deadlines, juggling courses)
▶ Evaluation anxiety (tests, grading)
▶ Role overload (work–school conflict)
Key idea: Items within each dimension should be correlated with each
other
Exploratory dimensionality check
Start by inspecting the item correlation matrix.
▶ Exploratory Factor Analysis (EFA): Model shared variance
to explore factor structure
Confirmatory dimensionality check
Confirmatory Factor Analysis (CFA): Test a hypothesized
structure.
Polychoric Correlations for Ordinal Data
Issue with Pearson on Likert:
Treats response options as equal-interval (but they have an ordinal
scale).
▶ With few/skewed categories, Pearson often attenuates the true
association.
Idea behind Polychoric: Assume each ordinal item comes from a
continuous latent variable cut by thresholds. We want to estimate the
latent correlation ρ between these ordinal items.
Strongly Disagree = 1 | Disagree = 2 | Neutral = 3 | Agree = 4 | Strongly Agree = 5
Numbers are labels, they don’t carry numerical meaning other than order.
Polychoric Correlations for Ordinal Data
KEY
observed responses are chosen based on thresholds
cutting a latent continuous space. We estimate ρ that matches the
observed contingency table.
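The attenuation problem can be simulated: draw two latent continuous variables with a known correlation, cut each into five ordered categories with thresholds, and compare Pearson on the categories with the latent correlation. This is only a sketch demonstrating the attenuation (the thresholds and ρ = .6 are arbitrary choices); it does not estimate the polychoric ρ itself:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.6                                  # true latent correlation

# Latent continuous responses behind two Likert items.
latent = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=5000)

# Thresholds cut the latent space into 5 ordered categories (0..4);
# they are deliberately skewed, like an easy-to-endorse item.
thresholds = [-0.5, 0.0, 0.4, 0.8]
observed = np.digitize(latent, thresholds)

r_latent = np.corrcoef(latent[:, 0], latent[:, 1])[0, 1]
r_pearson = np.corrcoef(observed[:, 0], observed[:, 1])[0, 1]
print(f"latent r = {r_latent:.2f}, Pearson on categories = {r_pearson:.2f}")
```

Pearson on the ordinal categories comes out noticeably smaller than the latent correlation; polychoric estimation targets the latent ρ instead.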
Exploratory Factor Analysis (EFA)
Let the data suggest clusters of items.
▶ Uses only shared variance; allows factors to correlate (oblique
rotation).
▶ Good for a first validated structure; flags problematic items
Confirmatory Factor Analysis (CFA)
Test a specific structure.
▶ You declare which items load on which factors.
▶ Evaluates fit (e.g., CFI/TLI, RMSEA, SRMR) and supports
invariance tests
Flow in practice:
Draft scale —> EFA(explore) —> revise items —> CFA (confirm) on new sample
EFA —> CFA PIPELINE (EFA PORTION)
use polychoric correlations (Likert). Parallel analysis for factor
retention.
▶ Oblique rotation (Promax/Oblimin) for correlated facets.
▶ Flag: low communalities, cross-loadings > .30, Heywood cases.
EFA —> CFA PIPELINE (CFA PORTION)
Competing models: 1F vs correlated factors vs bifactor.
▶ Report: CFI/TLI (> .95 good), RMSEA (< .06), SRMR (< .08),
but judge substantively.
▶ Invariance: configural → metric → scalar (groups/time)
Item Difficulty
Definition (dichotomous scoring):
p = (1/N) Σᵢ Xᵢ, i = 1…N, where Xᵢ ∈ {0, 1} (1 = correct)
Interpretation: p is the proportion correct (easiness); some texts define
difficulty as 1 − p.
General guidelines:
▶ For general tests, aim broadly p ≈ .30–.80 (avoid floor/ceiling).
▶ With k options, chance level = 1/k. p ≪ chance ⇒ suspect
miskey/ambiguity.
▶ Choose a mix of p values to target your score range and purpose
(diagnostic vs selection).
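Computing p and applying the guidelines above could be sketched as follows. The response matrix, k = 4 options, and the flag messages are made-up examples:

```python
import numpy as np

# Hypothetical 0/1 scored responses: 8 examinees x 4 items.
X = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
])
k = 4                      # answer options per item
chance = 1 / k             # guessing level

p = X.mean(axis=0)         # proportion correct per item ("easiness")
for j, pj in enumerate(p, start=1):
    if pj < chance:
        print(f"item {j}: p={pj:.2f} < chance {chance:.2f} -> check key/ambiguity")
    elif not 0.30 <= pj <= 0.80:
        print(f"item {j}: p={pj:.2f} outside .30-.80 -> floor/ceiling risk")
    else:
        print(f"item {j}: p={pj:.2f} ok")
```

Items 1 and 4 hit the ceiling (p = 1.0), item 3 falls below chance (suspect miskey), and item 2 sits comfortably in the target range.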
Item Discrimination
Does the item distinguish higher from lower total scorers?
Point–biserial correlation (rpbis)
rpbis = cor(X, T(−i))
Correlate the 0/1 item score X with the total test score without that item T(−i).
Interpretation: About > .30 good, > .20 acceptable (context); near 0
weak; negative = red flag (miskey, ambiguity).
Difficulty–Discrimination interaction:
▶ If p ≈ 0 (too hard) or p ≈ 1 (too easy), groups look the same ⇒
rpbis shrinks.
▶ Many well-behaved items have .30 ≲ p ≲ .80 and rpbis ≳ .20.
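A corrected point-biserial per item, with the interpretation bands above, might be sketched as follows. The response matrix is fabricated so that item 4 behaves like a miskeyed item:

```python
import numpy as np

# Hypothetical 0/1 scored responses: 8 examinees x 4 items.
# Item 4 is written so that LOW total scorers tend to get it right.
X = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
])

total = X.sum(axis=1)
rs = []
for j in range(X.shape[1]):
    rest = total - X[:, j]                 # total score WITHOUT item j
    r = np.corrcoef(X[:, j], rest)[0, 1]   # corrected point-biserial
    rs.append(r)
    flag = "red flag" if r < 0 else ("weak" if r < 0.20 else "ok")
    print(f"item {j + 1}: r_pbis = {r:.2f} ({flag})")
```

Item 1 discriminates well (r > .30), while item 4's negative correlation is the red flag described above.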
Internal Consistency:
Metric: Cronbach’s α, Split-half (Spearman–Brown),
item-total correlation
Test–Retest Reliability:
correlation across administrations
Parallel-Forms Reliability:
correlation between forms
Inter-Rater Reliability:
Metric: Cohen’s κ (categorical), ICC (continuous)
(reliability is about consistency)
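As one example from this toolbox, Cronbach's α can be computed directly from the item variances and the total-score variance; the Likert data below are invented for illustration:

```python
import numpy as np

# Hypothetical Likert responses: 6 respondents x 4 items of one scale.
X = np.array([
    [4, 5, 4, 4],
    [3, 3, 3, 4],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
], dtype=float)

k = X.shape[1]
item_vars = X.var(axis=0, ddof=1)          # variance of each item
total_var = X.sum(axis=1).var(ddof=1)      # variance of the total score
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```

The items here co-vary strongly, so α is high; with uncorrelated items the item variances would nearly exhaust the total variance and α would drop toward zero.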
Item–Test Reliability
Core Idea: Assess whether each item is consistent with the overall test
score (provides evidence for internal consistency).
Procedure:
1. Compute each item’s score (e.g., 0/1 for dichotomous, Likert
response for polytomous).
2. Compute the total test score (sum across items).
3. Correlate each item with the total score.
4. (Best practice) use a corrected item–total correlation, excluding the
item itself from the total.
Item–test reliability is part of the internal consistency toolbox.
Item–Test Reliability interpretation:
low or negative correlation: item does not align with the construct,
may add noise.
▶ Moderate correlation (e.g., > .30): item contributes useful,
consistent information.
▶ Very high correlation (> .80): item may be redundant with other items.
Item–Test Reliability example
item with ri,total = 0.05 (poor discrimination, likely problematic
item)
▶ Item with ri,total = 0.45 (acceptable contribution to the construct)
▶ Item with ri,total = 0.82 (very strong overlap, possibly redundant
with other items)
Item–test reliability helps decide whether to revise, keep, or remove
items in a scale.
Document Validity Evidence
TWO MAIN CONCERNS:
ensuring our measurement instrument is CONSISTENT and is
measuring the TARGET CONSTRUCT
1. Results from expert reviews to ensure that the instrument covers the intended domain
2. Results from cognitive interviews verifying that our items evoke the latent processes we intended
3. Whether the EFA/CFA support the dimensionality we intended
4. Contrast our measurement instrument against related measurements (convergent validity!)
Scoring — General Principles
Scoring: Turn item responses into interpretable scores by clearly
specifying scoring rules.
▶ Score types: total vs. subscale(s); 0/1 vs. polytomous; raw vs.
transformed.
▶ Keying & polarity: identify all reverse-coded items and ensure they
are reversed consistently.
▶ Missing data policy: state an exact rule for how to handle missing
data.
▶ Estimation precision: mean vs. sum; decimal places; ties.
Deliverables to users: scoring key, examples, missing-data rule,
interpretation guide, limitations
Scale Scores — Common Practices
Subscales: mean (or sum) of items within each validated factor.
Total score: only if factors are sufficiently unidimensional or theoretically justified.
BEFORE SCORING: reverse-code keyed items; verify item–total correlations have the intended signs.
Typical missing data rule (example):
Allow up to 1 missing item or up to 20% missing; otherwise set
subscale to NA and report missingness.
▶ If allowed, prorate: use the mean of the answered items (for sums,
multiply that mean by the # of items); flag as prorated.
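The example rule above (allow up to 1 missing item, otherwise set the subscale to NA; prorate by the mean of the answered items) might be implemented like this; the function name, 5-item subscale, and data are assumptions:

```python
import numpy as np

MAX_MISSING = 1            # example rule: allow up to 1 missing item

def subscale_sum(items):
    """Prorated subscale sum: mean of answered items times # of items."""
    items = np.asarray(items, dtype=float)
    n_missing = int(np.isnan(items).sum())
    if n_missing > MAX_MISSING:
        return None, False                       # set to NA, report missingness
    if n_missing == 0:
        return float(items.sum()), False
    prorated = float(np.nanmean(items) * len(items))
    return prorated, True                        # flag as prorated

print(subscale_sum([4, 5, 3, 4, 4]))             # complete
print(subscale_sum([4, 5, np.nan, 4, 4]))        # 1 missing -> prorated
print(subscale_sum([4, np.nan, np.nan, 4, 4]))   # too many missing -> NA
```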
Transformations (when helpful):
▶ Standardize for comparability: z = (x − x̄) / s; optionally report
T = 50 + 10z alongside raw means.
Test Scores — Common Practices
Raw scoring: total correct (optionally partial credit for polytomous
items).
Scaled scores (why): place different forms on a common scale; improve
interpretability.
▶ Simple transform: linear map raw → scale (e.g., 200–800).
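A simple linear raw → scale transform could look like this; the 0–100 raw range and the 200–800 reporting scale are illustrative assumptions:

```python
# Linear map from raw scores onto a reporting scale.
RAW_MIN, RAW_MAX = 0, 100
SCALE_MIN, SCALE_MAX = 200, 800

def scaled(raw):
    """Linearly transform a raw score onto the reporting scale."""
    frac = (raw - RAW_MIN) / (RAW_MAX - RAW_MIN)
    return round(SCALE_MIN + frac * (SCALE_MAX - SCALE_MIN))

print(scaled(0), scaled(50), scaled(100))  # 200 500 800
```

In practice, operational programs use equating rather than a fixed linear map so that scores from different forms stay comparable.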
Test Scores — Common Practices
Cut scores (criterion-referenced intro):
1. Define performance levels (e.g.,
Basic/Proficient/Advanced) with clear descriptors.
2. Set provisional cuts with a defensible method:
▶ Angoff: panel estimates P(correct) for a minimally competent
examinee; sum across items.
▶ Bookmark: order items by difficulty; panel places the
“bookmark” at minimally competent point.
▶ (Other methods: Contrasting Groups, Body-of-Work.)
3. Check impact & error: pass rates, subgroup patterns, show
decision bands
Technical Report — Required Sections (document)
1. Overview: Purpose, population, uses, administration conditions,
accessibility.
2. Construct & Domain: working definition, content map/blueprint,
in/out of scope.
3. Development Process: item specifications, expert review, cognitive
interviews (protocol & changes).
4. Pilot Study: samples, procedures, exclusions, descriptive stats.
5. Dimensionality: EFA, CFA [Scales]
6. Item Functioning:
▶ [Scales] item–total, revision/removal rules.
▶ [Tests] difficulty (p), discrimination (rpbis / D).
7. Reliability: internal consistency revisions.
8. Validity Evidence: content, response processes, internal
structure, relations to other variables, consequences.
9. Scoring: keys, reverse coding, subscales/total, missing-data
policy, transformations, reporting bands. [Scales]
10. Limitations & Use Guidance: appropriate/contraindicated
uses, known biases.
11. Maintenance Plan: review cadence, revalidation criteria,
contact/ownership.