Psychometric Theory
The science concerned with evaluating the attributes of psychological tests.
Psychological Test
A systematic procedure for comparing the behavior of two or more people OR the behavior of the same person at two different points in time.
It is a SAMPLE OF BEHAVIOR collected under STANDARDIZED CONDITIONS designed to measure a trait or CONSTRUCT of interest. It is scored or evaluated according to SYSTEMATIC PROCEDURES and usually renders QUANTITATIVE DATA; it is evaluated against NORMS.
Construct
An abstraction that we cannot directly see, feel, or touch - such as intelligence - but we can infer its presence from the individual's behavior (e.g., test performance shows slow processing speed)
Standardization
Collecting a sample of behavior under controlled conditions, set out in explicit directions for administration that must be followed for each examinee without variation. This ensures that extraneous sources of error are minimized.
Norms
Acquired when tests or scales are administered to a large, representative sample of individuals - allows us to ascertain what is typical or atypical performance
Norm-Referenced Tests
Evaluated against a set of norms collected from a particular population.
Criterion-Referenced Tests
Evaluated against a predetermined set of criteria (e.g., classroom tests, which measure knowledge against a level that everyone is expected to achieve)
Measurement
A process by which numbers are assigned to observations.
Statistics
Procedures used to analyze the data generated by measurement (e.g., descriptive and inferential)
Levels of Measurement
Nominal
Ordinal
Interval
Ratio
Nominal Data
Classification into two or more distinct groups (e.g., yes/no, male/female) - Categories
Groups are presumed to be equal - one is not better than the other
Ordinal Data
Rank (e.g., grade in school, class rank)
Ranks need not be equally spaced - students might be ranked 1st, 2nd, or 3rd but we don't know how much the person in 1st won by and vice versa
Interval Data
Numerical scale with each point of the scale separated by an equal interval (e.g., GPA, scores on an IQ Test, scores on an MMPI-2 scale)
A little more precise - able to see how close or far apart two scores are
Ratio Data
An "absolute zero" exists (e.g., length, temperature on the Kelvin scale, time, number of items completed)
Rare in psychological measurement
Raw Score
Total number of points earned
If you have a test with 20 items and each correct answer earns 1 point, the range of possible raw scores is 0-20.
Frequency Distribution
Table that gives the number of people who obtained each possible score
Histogram
A bar graph depicting a frequency distribution
Frequency Polygon
Graph of a frequency distribution that shows the number of instances of obtained scores - the lines connect the top of the bars on the histogram
Measures of Central Tendency
Mean
Median
Mode
Mean
Average of a set of scores
Median
Score that falls exactly in the middle
Mode
The MOST common score
Measures of Variability
Range
Variance
Standard Deviation
Range
The difference between the highest and lowest scores in a distribution
Subject to outliers
Standard Deviation
Average amount by which a score deviates from the mean (square root of the variance)
Square root of the average of all squared deviations from the mean
Procedure:
1. Subtract the mean from each individual score
2. Square each of these differences
3. Add them together
4. Divide by N-1
5. Take the square root of the result
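The five steps above can be sketched in Python. This is a minimal illustration, not a prescribed implementation; the function name `sample_sd` and the scores are made up for the example.

```python
import math

def sample_sd(scores):
    """Standard deviation following the five-step procedure (divides by N - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]   # 1. subtract the mean from each score
    squared = [d ** 2 for d in deviations]    # 2. square each difference
    total = sum(squared)                      # 3. add them together
    variance = total / (n - 1)                # 4. divide by N - 1
    return math.sqrt(variance)                # 5. take the square root

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores
print(round(sample_sd(scores), 3))
```

The value computed in step 4 is the variance, so squaring the returned standard deviation recovers it.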
Variance
Standard deviation squared
Positive Skew
Longer tail on the RIGHT
Most scores pile up at the LOW end (the left) and fewer at the high end - so more people do badly
Inadequate floor - it is too difficult for the test-takers so cannot differentiate among people at the low end of the ability scale (people crying on the floor doing bad)
Need to replace harder items with easier items
Mode < Median < Mean (not enough people are doing good)
Negative Skew
Longer tail is on the LEFT
Most scores pile up at the HIGH end (right) and fewer scores at the low end
Inadequate ceiling - it is too easy for the test-takers so cannot differentiate among people at the high end of the ability scale (people "raising" the roof happy they're doing good)
Need to replace easier items with harder items
Mean < Median < Mode (too many people doing good aka the most)
Kurtosis
How the distribution looks around the mode
Platykurtic
Flat curve - the mode isn't really standing out
Leptokurtic
Peaked curve - the mode is really standing out
Modality
Distributions can have more than one mode -> Multimodal
Bimodal - distribution has two modes
Normal distribution
A bell-shaped curve
Mean = Median = Mode
Symmetrical around the mean
Asymptotic tails - approach but never reach zero
68% of scores fall within 1SD below or above the mean
95% of scores fall within 2SD below or above the mean
99.7% of scores fall within 3SD below or above the mean
In the center (mean) - z score = 0 and t score = 50
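The percentages for 1, 2, and 3 SD can be checked numerically. The sketch below assumes a perfectly normal distribution and uses the standard library's error function; `proportion_within` is an illustrative name, not something from the original notes.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints the percentage of scores within k SDs, rounded to 2 decimals
    print(k, round(100 * proportion_within(k), 2))
```

The rounded figures usually quoted (68%, 95%, 99.7%) are approximations of these values.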
Age-equivalent scores - Developmental Norms
Mental Age
Based on the average raw score of individuals of different age groups in the standardization sample
If an examinee gets a raw score of 25, and the mean raw score for 10 year olds is 25, then the examinee's mental age is 10
Grade-Equivalent Scores - Developmental Norms
Based on the average raw score of individuals in the standardization sample who are in a particular school grade
If an examinee gets a raw score of 25, and the mean raw score for those in grade 5 is 25, then the examinee's grade-equivalent score is 5.
**If a 4th grade child gets a grade-equivalent score of 7.4 in Arithmetic, it does not mean that they have mastered 7th grade arithmetic. It means the child is well above average compared to other 4th graders: their raw score is comparable to the average raw score obtained by students in the 4th month of 7th grade.
Within-Group Norms
Compare the examinee directly against their peers
Percentile Rank
Z-Scores
T-Scores
Standard Scores
Percentiles
Percentage of the sample that obtained scores that were equal to or lower than the score obtained by the examinee
Can be obtained from the frequency distribution by calculating the cumulative frequency
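Are on an ORDINAL (not interval) scale
E.g., the 50th percentile corresponds to a standard score of 100 and the 53rd percentile to a standard score of 101, so equal percentile differences are not equal score differences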
Cumulative Frequency
The number of people who obtained a raw score equal to or lower than a given raw score
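As a rough sketch of how percentile ranks follow from a frequency distribution and its cumulative frequencies, the snippet below uses a made-up sample; `percentile_ranks` is a hypothetical helper name.

```python
from collections import Counter

def percentile_ranks(raw_scores):
    """Map each obtained raw score to its percentile rank
    (percentage of the sample scoring at or below that score)."""
    freq = Counter(raw_scores)        # frequency distribution
    n = len(raw_scores)
    ranks = {}
    cumulative = 0
    for score in sorted(freq):
        cumulative += freq[score]     # cumulative frequency up to this score
        ranks[score] = 100 * cumulative / n
    return ranks

sample = [12, 15, 15, 17, 18, 18, 18, 19, 20, 20]  # hypothetical raw scores
for score, rank in percentile_ranks(sample).items():
    print(score, rank)
```

Here 7 of the 10 examinees scored 18 or lower, so a raw score of 18 falls at the 70th percentile.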
Z-Score
How far away from the mean the raw score lies measured in units of standard deviation
Z = 0 corresponds to a score exactly at the mean
z = 1.00 corresponds to a score 1SD above the mean (can be negative)
z = (X - M)/SD
T-Score
Transforms the z-score so that the mean corresponds to T=50 and the standard deviation corresponds to 10 T-score units
T = 10z + 50
T = 50 corresponds to a score exactly at the mean
T = 60 corresponds to a score 1 SD above the mean
T = 40 corresponds to a score 1 SD below the mean
To determine how many SDs a T-score falls from the mean, calculate the z-score:
z = (T - 50)/10
Standard Scores
Converts the z-score to a scale with a mean of 100 and SD of 15
SS = 15z + 100 (always rounded)
Converting SS to z-scores:
z = (SS - 100)/15
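The conversions among raw scores, z-scores, T-scores, and standard scores can be sketched as small Python functions. The function names and the example raw score, mean, and SD are assumptions made for illustration.

```python
def z_score(x, mean, sd):
    """Distance of raw score x from the mean, in SD units."""
    return (x - mean) / sd

def t_score(z):
    """T-score scale: mean 50, SD 10."""
    return 10 * z + 50

def standard_score(z):
    """Standard score scale: mean 100, SD 15, rounded to a whole number."""
    return round(15 * z + 100)

def z_from_t(t):
    return (t - 50) / 10

def z_from_ss(ss):
    return (ss - 100) / 15

z = z_score(x=24, mean=20, sd=4)  # hypothetical raw score, mean, and SD
print(z, t_score(z), standard_score(z))
print(z_from_t(40), z_from_ss(85))
```

A raw score 1 SD above the mean comes out as z = 1, T = 60, and SS = 115, while T = 40 and SS = 85 both convert back to z = -1.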
Correlation
A statistic that describes the relationship between two variables (x and y)
If no relationship exists the correlation = 0
Positive Correlation
A correlation where as one variable increases, the other also increases, or as one decreases so does the other.
Both variables move in the same direction.
Higher scores on X are associated with higher scores on Y and vice versa
Negative Correlation
A correlation where as one variable increases, the other decreases.
Higher scores on X are associated with lower scores on Y and vice versa
Pearson Correlation
Statistic that allows us to express the relationship between X and Y (r) - applies to linear relationships
Both variables are interval or ratio** (e.g., WAIS IQ score and score on standardized reading test)
May take on values ONLY between -1.00 and +1.00
If there is no correlation, r = 0
+ correlation, r will be between 0 and +1.00
- correlation, r will be between -1.00 and 0
SIGN tells us the direction of the relationship
MAGNITUDE (ABSOLUTE VALUE) tells us the strength = the closer the magnitude is to 1, the stronger the relationship (-.70 is stronger than +.30)
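A minimal sketch of computing Pearson's r from paired scores, using the usual sum-of-cross-products definition. The function name `pearson_r` and the data are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]   # hypothetical scores on X
y = [2, 1, 4, 3, 5]   # hypothetical scores on Y
r = pearson_r(x, y)
print(round(r, 2))        # sign gives the direction
print(round(abs(r), 2))   # absolute value gives the strength
```

Reversing the order of `y` flips the sign of r but leaves its magnitude unchanged, which is why -.70 counts as a stronger relationship than +.30.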
P-Value
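The probability of obtaining a relationship at least as strong as the one observed in the sample if there were actually no relationship between the two variables in the population.
p = .04 means that, if no relationship existed in the population, a result this strong would occur by chance only 4% of the time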
p < (or equal to) .05 is considered statistically significant
Homoscedasticity
All data points fall within an elliptical shape; the range of values on Y is the same for each value of X
Heteroscedasticity
Shape of data points deviates from an ellipse and is fan-shaped; the range of values on Y is not the same for each value of X
Restricted Range
Reduces the magnitude of the calculated r - means you sampled a very small piece of the distribution, so the range of scores obtained on one or both of the variables is much narrower than in the population
Common reason why population correlation coefficients can be underestimated by sample r's
Spearman rho
Correlation between ranks - both variables are Ordinal
E.g., Class rank in junior year and class rank in senior year
Phi
The correlation coefficient when both of the variables are measured as true dichotomies - both variables are nominal with 2 categories
E.g., Scores on two binary items
Tetrachoric
Correlation between two artificially dichotomous variables (each variable is really continuous but has been split into two categories at a cutoff you chose, e.g., you decided what counts as pass or fail)
Contingency Coefficient
Correlation coefficient for nominal data when each variable has two or more categories
Point Biserial
The correlation coefficient used when one variable is a true dichotomy (nominal) and the other is continuous (interval or ratio)
E.g., Score on a binary item (0,1) and the total score on the test
Biserial
The correlation coefficient used when one variable is an artificial dichotomy and the other is continuous (interval or ratio)
E.g., scores on an anxiety test and classification of high/low based on anxiety scores
Eta (Curvilinear)
The correlation used when both variables are interval or ratio but the relationship between them is not linear
For example: the relationship between Extraversion and Sales Performance is curvilinear and the scatterplot resembles an inverted U. This means that **moderately extraverted people have better sales performance than highly extraverted people.
Coefficient of Determination
Obtained by squaring the correlation coefficient (r^2)
Interpreted as the percentage of variance in one variable that is predicted by (explained by or shared with) the other variable
Example:
If r = .7 between IQ and reading test scores
The coefficient of determination (r^2) = .49 = 49%
So, 49% of the variance in reading test scores is predictable (or explained by) IQ scores and 51% of the variance in reading test scores is due to other factors
CORRELATION DOES NOT IMPLY CAUSALITY
A strong correlation between X and Y could mean:
1. X causes Y
2. Y causes X
3. A third, unmeasured variable influences both X and Y
Factor Analysis
A statistical procedure that identifies clusters of related items (called factors) on a test; used to identify underlying dimensions or constructs that can account for the pattern of correlations among variables
Reduces the number of variables we have to work with
Correlation Matrix
A table showing the relationships among discrete measures
Entries in diagonal are +1.00
Section above the +1.00 is a mirror image of the section below
Steps in a Factor Analysis
1. Deciding on the number of factors = number of variables
2. Extracting the factors
3. Examining the factor loadings
4. Performing a rotation
5. Examining the rotated factor loadings
6. Interpreting and naming the factors
Eigenvalue
Amount of variance associated with each factor
Scree Plot
A graphical representation of Eigenvalues (vertical axis) vs. the number of factors (horizontal axis)
Locate the place where there is a large drop - number of factors to extract is at the top of the drop
Factor Loading
Correlation of each of the original variables with each factor (that were extracted)
Orthogonal Factors
Factors that are not correlated with each other
Oblique Factors
Factors that are correlated with each other
Second-Order Factors
Factor analysis done on Oblique Factors (correlated factors) to further minimize number of factors since they are similar
Thurstone's Criteria
1. Eliminate negative factor loadings
2. Each variable has a high loading on only one factor
Multidimensional Test
Items group into two or more separate factors
Test is measuring more than one construct
Unidimensional Test
All items load on a single factor
Test is measuring a single construct
Common Variance
Percentage of total variance in a test (1.00) that is shared with the other tests in the battery - what the test has in common with other tests included in the factor analysis
CV = Sum of the squared factor loadings for that test
Loading of a test on Factor I is .9 and loading on Factor II is -.1
CV = (.9)^2 + (-.1)^2 = .81 + .01 = .82
Means 82% of the variance of this test is shared with the other tests that were included in the factor analysis
Error Variance
Percentage of total variance in test that is due to random measurement error
Subtract the reliability of the test from 1.00
So if reliability is .9, the EV = .10
Specific Variance
Percentage of total variance in a test that is unique to that test and not shared with the other tests in a battery
First calculate CV and EV
Add the CV and EV together
Subtract the total from above from 1.00 to get SV
Total Variance
Common Variance + Specific Variance + Error Variance
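The three variance components can be sketched together. The example combines the loadings (.9 and -.1) and the reliability (.9) from the cards above, which is an assumption made for illustration; summing squared loadings also assumes the factors are orthogonal. `variance_components` is a hypothetical helper name.

```python
def variance_components(loadings, reliability):
    """Split a test's total variance (1.00) into common, error, and specific parts.

    Assumes orthogonal factors, so common variance is the sum of squared loadings.
    """
    common = sum(loading ** 2 for loading in loadings)  # CV
    error = 1.0 - reliability                           # EV
    specific = 1.0 - (common + error)                   # SV
    return common, error, specific

cv, ev, sv = variance_components([0.9, -0.1], reliability=0.9)
print(round(cv, 2), round(ev, 2), round(sv, 2))
print(round(cv + sv + ev, 2))  # total variance
```

With these numbers, 82% of the variance is common, 10% is error, and the remaining 8% is specific to the test.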
Reliability
Is the test measuring anything other than error?
Reliability ensures we are measuring something meaningful - error is always present but is there more there?
The proportion of the variance in observed scores that is due to true differences among the test-takers on the trait being measured (Classical Reliability Theory)
Error
Occurs when measurement of a construct is confounded by factors that are not relevant to the construct we want to measure
Systematic Error
A "mistake" that can be corrected or eliminated
-Mistakenly keyed correct answers
-Discoverable and correctable
-Lack of familiarity with scoring criteria
Random Error
-Is impossible to eliminate and measure
-Inherent in any measurement attempt
-Is not correlated with the obtained scores (aka "random")
Obtained Score
Score we obtain whenever we administer a test - it is obtained or measured
-The person's measured standing on the trait we are interested in measuring
Might not be the same as the true score due to error
True Score
What the person's score would be if there were no measurement error - might be the same as the Observed Score - can never be measured precisely
The person's actual standing on the trait or the actual amount of the trait they possess
A theoretical construct that we cannot truly measure due to error
For very large samples, the Mean of the Observed Scores...
is equal to the mean of the True Scores.
For very large samples, the Variance of the Observed Scores...
is greater than the Variance of the True Scores.
Random measurement error is...
uncorrelated with True Scores.
Classical Reliability Theory
An individual's observed score can be partitioned into a true part & an error part. It's never going to be 100% true.
If XO is the observed score and XT is the true score, and e is the amount of error:
XO = XT + e
*Error cannot really be measured and error scores are random and uncorrelated with observed and true scores
Due to Error, Reliability (rxx) =
Variance (XT)/Variance (XO)
Error Variance
Proportion of variance in observed scores that is due to error
= Variance (e)/Variance (XO)
Reliability plus error variance will always equal 1.00
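A small numeric sketch of the two ratios. The variances are made-up values, and the observed variance is taken as true variance plus error variance, which follows from error being uncorrelated with true scores.

```python
def reliability(var_true, var_observed):
    """rxx = Variance(XT) / Variance(XO)."""
    return var_true / var_observed

def error_proportion(var_error, var_observed):
    """Proportion of observed-score variance that is due to error."""
    return var_error / var_observed

var_true, var_error = 80.0, 20.0       # hypothetical variances
var_observed = var_true + var_error    # error is uncorrelated with true scores

rxx = reliability(var_true, var_observed)
err = error_proportion(var_error, var_observed)
print(rxx, err, rxx + err)
```

Because the observed variance is just the two parts added together, the reliability and the error proportion are complementary and sum to 1.00.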
Sources of Error
Time Sampling
Item Sampling
Internal Consistency
Inter-rater Reliability
Time Sampling Error
Error associated with "when" the test is given
Test/Retest Method
*To account for time sampling error
The same exact test is re-administered to the same exact group of examinees at a later date
A correlation coefficient is calculated on the two sets of scores - Test-Retest Reliability Coefficient (rtt)(AKA the test's stability)
zR (predicted retest score expressed as z-score)
zO (original score)
rtt (test-retest reliability coefficient)
zR = rtt X zO
The reliability of a test using the test/retest method is BLANK if the interval is too short...
Overestimated
-Similar conditions affecting both test and retest
-Memory for original responses
The reliability of a test using the test/retest method is BLANK if the interval is too long...
Underestimated
-Real changes in the trait being measured
Practice Effect
Second administration is not equivalent to first administration due to previous exposure to items
If we assume that practice leads to better scores on retest, then practice will decrease the correlation between test and retest and lead to an under-estimate of the test/retest correlation and an over-estimate of time sampling error.
**On a test where there is a substantial practice effect (such as Block Design), the test-retest method will OVERESTIMATE the amount of time sampling error.
Regression Towards the Mean
The tendency for extreme or unusual scores to fall back (regress) toward their average - seen in test/retest method because the TRT coefficient will always be less than +1.00 (due to time sampling error), so retest scores are predicted to be closer to the mean than were the original scores
*Scores above the mean are predicted to decrease on retest
*Scores below the mean are predicted to increase on retest
*The lower the TRT reliability, the greater the regression on retest
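Regression toward the mean falls straight out of the prediction formula zR = rtt X zO from the test/retest card. The sketch below uses made-up z-scores and reliabilities; `predicted_retest_z` is an illustrative name.

```python
def predicted_retest_z(z_original, r_tt):
    """Predicted retest z-score: zR = rtt * zO."""
    return r_tt * z_original

for r_tt in (0.9, 0.5):                 # hypothetical test-retest reliabilities
    for z_original in (2.0, -1.5):      # hypothetical original z-scores
        z_retest = predicted_retest_z(z_original, r_tt)
        print(r_tt, z_original, round(z_retest, 2))
```

With any rtt below 1.00 the predicted retest score is closer to zero (the mean) than the original, and the shrinkage is larger for the lower reliability.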
Item Sampling Error
Error due to which particular items happen to be selected for the test - we cannot be certain the items are a random sample of all possible items
Alternate Form
Accounts for Item Sampling Error
Two parallel (alternate) forms of the same test are constructed
-Each one is the same length
-Equivalent but not identical items on each
The forms are given to the same sample of examinees on the same day - order is counterbalanced
Correlation between the scores is known as Alternate Form Reliability
Split-Half Method
Solution to problems identified in Alternate Form Method
Checking reliability by correlating each examinee's score on one half of the test with their score on the other half (e.g., odd-numbered items vs. even-numbered items)
Reliability is related to test length, so all other things being equal, longer tests tend to have higher reliability than shorter tests
Because each half is only half as long as the full test, this method will underestimate the Alternate Form Reliability and overestimate the amount of item sampling error
Spearman-Brown Formula
Solution to Split-Half Method
Enables us to predict what the Alternate Form Reliability would be from the Split Half Reliability
rtt (alternate form reliability) = 2rhh / (1 + rhh), where rhh is the split half reliability
Allows us to estimate what the reliability of the test would be if we added items, deleted items, and how many items would have to be added in order to achieve the desired reliability
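A sketch of the Spearman-Brown calculations. The card gives only the doubled-length case; the general form with a length factor k, and its rearrangement to solve for k, are standard versions of the formula added here for illustration. Function names and values are assumptions.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability when the test is made k times as long.

    With k = 2 this reduces to 2r / (1 + r), the split-half correction.
    """
    return k * r / (1 + (k - 1) * r)

def length_factor_needed(r, target):
    """How many times longer the test must be to reach the target reliability."""
    return target * (1 - r) / (r * (1 - target))

r_hh = 0.6  # hypothetical split-half reliability
print(round(spearman_brown(r_hh), 2))             # full-length estimate
print(round(spearman_brown(r_hh, k=0.5), 2))      # test cut to half its length
print(round(length_factor_needed(r_hh, 0.9), 2))  # lengthening needed for .90
```

A factor k below 1 models deleting items, and a factor above 1 models adding items of comparable quality.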
Internal Consistency
Whether the items are all measuring the trait of interest
A group of items is homogeneous or internally consistent when all the items are measuring the same construct equally well
Helps to ensure total scores on scales have the same meaning
Inter-Item Correlation
Assesses internal consistency
The correlation between each pair of items on the scale is calculated and the mean of these correlations is a measure of the scale's internal consistency - examine the extent to which scores on one item are related to scores on all other items in a scale
If there are n items on the test, then there are n(n-1)/2 unique correlations - these are phi coefficients (nominal x nominal)
Item-Total Correlation
Assesses internal consistency
Correlation between the item score (0,1) and the total score on the test
If there are n items, there will be n item-total correlations - this is the point biserial (nominal x interval/ratio)
The mean of these n item-total correlations is a measure of the scale's internal consistency
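A rough sketch of item-total correlations for dichotomous (0,1) items. The point biserial is numerically the same as Pearson's r applied to a 0/1 variable, so the helper simply reuses that formula. The response matrix and function names are made up, and each item is left in its own total (no correction for overlap), which is an assumption of this sketch.

```python
import math

def pearson_r(x, y):
    """Pearson correlation; equals the point biserial when x is scored 0/1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_total_correlations(responses):
    """responses: one list of 0/1 item scores per examinee."""
    totals = [sum(person) for person in responses]
    n_items = len(responses[0])
    return [
        pearson_r([person[i] for person in responses], totals)
        for i in range(n_items)
    ]

# Hypothetical data: 5 examinees x 3 items
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [0, 1, 1]]
correlations = item_total_correlations(responses)
print([round(r, 2) for r in correlations])
print(round(sum(correlations) / len(correlations), 2))  # mean item-total correlation
```

An item that everyone passes or everyone fails has no variance, so its correlation is undefined and this sketch would raise a division error for it.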
Kuder-Richardson Formula
Used to calculate internal consistency when items are dichotomous or nominal (yes/no, true/false)