Midterm 2
X
Observed score
T
True score
E
Error
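The three quantities above combine as X = T + E. A minimal simulation (hypothetical variances, not from the course) shows how reliability falls out as the ratio of true-score variance to observed-score variance:

```python
import random

# Classical test theory sketch: observed score X = true score T + random error E.
random.seed(0)
T = [random.gauss(50, 10) for _ in range(5000)]  # true scores (var ~ 100)
E = [random.gauss(0, 5) for _ in range(5000)]    # random error, mean 0 (var ~ 25)
X = [t + e for t, e in zip(T, E)]                # observed scores

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability = true-score variance / observed-score variance
reliability = var(T) / var(X)
print(round(reliability, 2))
```

With these settings the estimate lands near var(T) / (var(T) + var(E)) = 100/125 = .80.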
Systematic errors
Errors inherent in the testing environment (or scale)
Random errors
Mood of a participant, guessing by participants, idiosyncrasies of participants
High alpha
Indicates consistency only for a unidimensional scale (compute alpha per subscale, not for the whole multidimensional scale)
Attenuation paradox
Increasing a test's reliability can, under certain conditions, lead to a decrease in its validity
Cronbach's alpha: diagnostic and educational scales
.95, want it to be really high
Cronbach's alpha: research scales
.8
Spearman-Brown Formula
Predicts reliability after lengthening a test; the gains taper off around 19 items (diminishing returns)
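The prophecy formula itself is rho_new = k*rho / (1 + (k - 1)*rho). A small sketch (standard formula; the .50 starting reliability is just an example):

```python
# Spearman-Brown prophecy formula: predicted reliability after
# changing test length by a factor k.
def spearman_brown(rho, k):
    """rho: current reliability; k: length multiplier (2 = double the items)."""
    return k * rho / (1 + (k - 1) * rho)

print(spearman_brown(0.5, 2))   # doubling a .50-reliable test -> .67
print(spearman_brown(0.5, 19))  # 19x the items -> .95; gains flatten out
```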
Percent or raw agreement
Calculates the percentage of exact agreement for any type of scale or ratings
Kappa
For categorical ratings, takes into account the agreement between raters based on chance
Kappa 0.61 - 0.80
Substantial agreement
Kappa 0.81 - 1.00
Almost perfect agreement
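Cohen's kappa can be computed by hand from two raters' categorical ratings. A sketch with hypothetical ratings (standard formula: kappa = (p_o - p_e) / (1 - p_e)):

```python
from collections import Counter

# Cohen's kappa: chance-corrected agreement for two raters, categorical ratings.
def cohens_kappa(r1, r2):
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n               # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / n**2  # chance agreement
    return (po - pe) / (1 - pe)

r1 = ["yes", "yes", "no", "yes", "no", "no"]
r2 = ["yes", "no", "no", "yes", "no", "yes"]
print(round(cohens_kappa(r1, r2), 2))  # 0.33: raw agreement is .67, chance is .50
```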
Pearson's correlation
For interval or ratio data with two raters; assesses consistency across raters (not exact agreement)
PC Very High/Strong
.7-.9
PC Moderate
.3-.7
PC Weak
.1-.3
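The reliability-versus-agreement distinction shows up clearly in code: two raters who never give the same rating can still correlate perfectly. A sketch with hypothetical ratings (standard Pearson formula):

```python
import math

# Pearson's r for two raters' interval-level ratings.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Rater 2 is consistently 1 point higher: zero exact agreement,
# yet perfect consistency (r close to 1.0).
rater1 = [2, 4, 6, 8]
rater2 = [3, 5, 7, 9]
print(pearson_r(rater1, rater2))
```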
ICC
For interval/ratio data and two or more raters; assesses consistency across raters (not exact agreement)
ICC 0.75 to 0.90
Good reliability
ICC Greater than 0.90
Excellent reliability
Reliability
Patterns or correlations
Agreement
Exact same ratings
Validity
The degree to which evidence and theory support the interpretations of test scores for proposed uses of tests
Classic view
Tripartite view (the 3 C's): criterion, content, and construct validity
Unified understanding of validity
Evidence based on test-criterion relations, Evidence based on response processes, Evidence based on relations to other variables, Evidence based on consequences of testing
Nominalist fallacy
Assumption that a test measures a construct simply because it is labeled as such
Factor analysis
Number and nature of latent factors (dimensions) underlying a set of items or variables
EFA
Exploratory factor analysis, generally used when limited research or theory is available
CFA
Generally used when research and theory are available and model specifications can be made
Advantages of CFA
The researcher controls the model specification: which items load onto which factors, how the factors are correlated, and whether measurement-error variances covary
Factor loadings
Indicate the relationship between variables/items and their underlying latent factors
Eigenvalues
Indicate the amount of variance in the observed variables/items accounted for by each factor; values above 1 suggest a factor worth retaining
K1 Criterion
Components with an eigenvalue greater than one should be retained
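For a two-item correlation matrix [[1, r], [r, 1]], the eigenvalues are simply 1 + r and 1 - r, which makes the K1 rule easy to illustrate (the r value is hypothetical):

```python
# K1 (Kaiser) criterion sketch: retain components with eigenvalue > 1.
def k1_retained(eigenvalues):
    """Return the eigenvalues the K1 criterion would keep."""
    return [ev for ev in eigenvalues if ev > 1]

r = 0.6                  # hypothetical inter-item correlation
eigs = [1 + r, 1 - r]    # eigenvalues of [[1, r], [r, 1]] -> [1.6, 0.4]
print(k1_retained(eigs)) # only the first component is retained
```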
Communalities
The proportion of each item's variance explained by the common factors (its shared variance with the other items)
Scree plot
Retain the factors above the "elbow" of the plot (typically those with eigenvalues above 1)
One parameter
Difficulty or b-parameter, discrimination is fixed across all items
Two parameter
Difficulty or b-parameter, discrimination or a-parameter
Rasch model
Difficulty or b-parameter; discrimination is fixed at 1 across all items
Three parameter
b parameter, a parameter, c parameter
Step difficulties
Indicate the points on the ability scale at which responses in adjacent categories are equally likely
Graded response model
Models the probability of responding in category k or higher (cumulative comparisons across ordered categories)
Partial credit model
Models polytomous items via step difficulties; discrimination is fixed across items
Generalized partial credit model
Like partial credit model but assumes that discrimination (a parameter) varies across items
A multidimensional IRT model
Used when items measure multiple distinct, often correlated, latent traits or dimensions
Threshold (b-parameter)
Indicates the point at which the probability of scoring in category k or higher equals 50%.
A parameter
Discrimination (the steepness of the IRF slope).
B parameter
Difficulty (the point of inflection of the IRF curve).
C parameter
Guessing (the lower asymptote: the point on the Y-axis where the curve levels off to the left).
Orthogonal rotations
Factors assumed to be uncorrelated; examples include Varimax & Quartimax.
Oblique rotations
Factors assumed to be correlated; examples include Oblimin & Promax.
SRMR
Acceptable <.1, good fit <.08.
RMSEA
Acceptable <.08, good <.06.
CFI
Acceptable <.9, good <.95.
Good model fit
Means that the factor model reproduces the observed correlations well.
Unidentified model
Does not have enough information to estimate its parameters uniquely; it lacks the necessary degrees of freedom (Df <0).
Just-identified model
Has exactly enough information to estimate all its parameters (Df =0).
Overidentified model
Has more information than needed to estimate its parameters, allowing for testing the model's fit to the data (Df >0).
Model identification
Determined by comparing the known information, v(v+1)/2 (the unique variances and covariances), with the number of parameters to be estimated.
V
Number of items.
Df
Known minus needed: v(v+1)/2 minus the number of parameters to be estimated (think of the information as a "budget" to spread across all parameters).
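The identification bookkeeping above can be sketched as follows (the one-factor example models are hypothetical):

```python
# CFA identification arithmetic: known information vs. degrees of freedom.
def known_info(v):
    """Unique variances and covariances for v items: v(v+1)/2."""
    return v * (v + 1) // 2

def degrees_of_freedom(v, n_params):
    """df = known information minus free parameters to estimate."""
    return known_info(v) - n_params

# One-factor model, 3 items: 3 loadings + 3 error variances = 6 parameters.
print(degrees_of_freedom(3, 6))  # 0 -> just-identified
# One-factor model, 4 items: 4 loadings + 4 error variances = 8 parameters.
print(degrees_of_freedom(4, 8))  # 2 -> overidentified
```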
Test retest reliability
Across time measurement (coefficient of stability only works for stable concepts).
Internal consistency
Across items measurement (alpha, omega).
Coefficient of equivalence
Across scales measurement.
Interrater agreement
Across raters measurement.
Cronbach's Alpha
Measure of internal consistency, should only be used with parallel and (essentially) tau-equivalent items.
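The alpha formula, alpha = k/(k-1) * (1 - sum of item variances / total-score variance), can be computed directly from an item-score matrix (hypothetical data; population variances used throughout):

```python
# Cronbach's alpha from a respondents-by-items score matrix.
def pvar(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(scores):
    k = len(scores[0])  # number of items
    item_vars = [pvar([row[i] for row in scores]) for i in range(k)]
    total_var = pvar([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

data = [  # 4 respondents x 3 items (hypothetical)
    [3, 4, 3],
    [5, 5, 4],
    [1, 2, 2],
    [4, 3, 4],
]
print(round(cronbach_alpha(data), 2))  # 0.91
```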
Parallel items
Identical in every psychometric property.
Tau-equivalent items
Have equal true-score means and equal factor loadings, but their error variances can differ.
Essentially Tau Equivalent
Same as tau equivalent but allows different intercepts.
Congeneric items
Measure the same construct but with different loadings and intercepts.
Cronbach's A .70
For scales in the initial stages of development.
Cronbach's A .80
For basic research scales.
Cronbach's A .95
For individual diagnostic scales.
Noncognitive scales
In the .80s.
Cognitive scales
In the .90s.
1PL model
Assumes all items have the same discrimination power.
2PL model
Assumes items vary in their ability to discriminate between individuals of different ability levels.
3PL model
Adds a guessing (c) parameter; appropriate for multiple-choice items where guessing might occur.
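The three parameters come together in the 3PL item response function, P(theta) = c + (1 - c) / (1 + exp(-a(theta - b))). A sketch (standard IRT formula; the parameter values are hypothetical):

```python
import math

# 3PL item response function: probability of a correct response at ability theta.
def p_3pl(theta, a, b, c):
    """a: discrimination, b: difficulty, c: guessing (lower asymptote)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# At theta = b the curve passes through (1 + c) / 2, not .50:
# with c = .2, P = .6, because guessing raises the lower asymptote.
print(round(p_3pl(theta=1.0, a=1.5, b=1.0, c=0.2), 2))  # 0.6
```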