Psychometrics/Psychological Testing

Psychometric Theory

The science concerned with evaluating the attributes of psychological tests.

New cards

Psychological Test

A systematic procedure for comparing the behavior of two or more people OR the behavior of the same person at two different points in time.

It is a SAMPLE OF BEHAVIOR collected under STANDARDIZED CONDITIONS designed to measure a trait or CONSTRUCT of interest. It is scored or evaluated according to SYSTEMATIC PROCEDURES and usually renders QUANTITATIVE DATA; it is evaluated against NORMS.

New cards

Construct

An abstraction that we cannot directly see, feel, or touch - such as intelligence - but we can infer its presence from the individual's behavior (e.g., test performance shows slow processing speed)

New cards

Standardization

Collecting a sample of behavior under controlled conditions, set out in directions for administration that must be followed for each examinee without variation. This minimizes extraneous sources of error.

New cards

Norms

Acquired when tests or scales are administered to a large, representative sample of individuals - allows us to ascertain what is typical or atypical performance

New cards

Norm-Referenced Tests

Evaluated against a set of norms collected from a particular population.

New cards

Criterion-Referenced Tests

Evaluated against a predetermined set of criteria (e.g., Classroom tests - trying to measure knowledge and there is a certain expectation everyone must achieve)

New cards

Measurement

A process by which numbers are assigned to observations.

New cards

Statistics

Procedures used to analyze the data generated by measurement (e.g., descriptive and inferential)

New cards

Levels of Measurement

Nominal

Ordinal

Interval

Ratio

New cards

Nominal Data

Classification into two or more distinct groups (e.g., yes/no, male/female) - Categories

Groups are presumed to be equal - one is not better than the other

New cards

Ordinal Data

Rank (e.g., grade in school, class rank)

Ranks need not be equally spaced - students might be ranked 1st, 2nd, or 3rd, but we don't know how far ahead of 2nd the person in 1st finished

New cards

Interval Data

Numerical scale with each point of the scale separated by an equal interval (e.g., GPA, scores on an IQ Test, scores on an MMPI-2 scale)

A little more precise - able to see how close or far apart two scores are

New cards

Ratio Data

An "absolute zero" exists (e.g., length, temperature on the Kelvin scale, time, number of items completed)

Rare in psychological measurement

New cards

Raw Score

Total number of points earned

If you have a test with 20 items and each correct answer earns 1 point, the range of possible raw scores is 0-20.

New cards

Frequency Distribution

Table that gives the number of people who obtained each possible score

New cards

Histogram

A bar graph depicting a frequency distribution

New cards

Frequency Polygon

Graph of a frequency distribution that shows the number of instances of obtained scores - the lines connect the top of the bars on the histogram

New cards

Measures of Central Tendency

Mean

Median

Mode

New cards

Mean

Average of a set of scores

New cards

Median

Score that falls exactly in the middle

New cards

Mode

The MOST common score

New cards

Measures of Variability

Range

Variance

Standard Deviation

New cards

Range

The difference between the highest and lowest scores in a distribution

Subject to outliers

New cards

Standard Deviation

Average amount by which a score deviates from the mean (square root of the variance)

Square root of the average of all squared deviations from the mean

Procedure:

1. Subtract the mean from each individual score

2. Square each of these differences

3. Add them together

4. Divide by N-1 (this gives the variance)

5. Take the square root of the result
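The five steps above can be sketched in Python. This is a minimal illustration, not a prescribed implementation; the function name and the sample scores are made up for the example.

```python
import math

def sample_standard_deviation(scores):
    """Standard deviation following the five steps, dividing by N - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]   # step 1: subtract the mean
    squared = [d ** 2 for d in deviations]    # step 2: square each difference
    total = sum(squared)                      # step 3: add them together
    variance = total / (n - 1)                # step 4: divide by N - 1
    return math.sqrt(variance)                # step 5: take the square root

# Hypothetical raw scores, for illustration only
scores = [2, 4, 4, 4, 5, 5, 7, 9]
sd = sample_standard_deviation(scores)
variance = sd ** 2  # the variance is the standard deviation squared
```

For these scores the mean is 5 and the squared deviations sum to 32, so the variance is 32/7 and the standard deviation is its square root.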

New cards

Variance

Standard deviation squared

New cards

Positive Skew

Longer tail on the RIGHT

Most scores pile up at the LOW end (the left) and fewer at the high end - so more people do badly

Inadequate floor - it is too difficult for the test-takers so cannot differentiate among people at the low end of the ability scale (people crying on the floor doing bad)

Need to replace harder items with easier items

Mode < Median < Mean (not enough people are doing good)
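The ordering Mode < Median < Mean can be checked on a small positively skewed data set. The scores below are invented for illustration, and the sketch relies only on Python's standard `statistics` module.

```python
import statistics

# Hypothetical scores with a long tail on the right (positive skew)
skewed_scores = [1, 2, 2, 3, 4, 6, 10]

mode_value = statistics.mode(skewed_scores)      # most common score
median_value = statistics.median(skewed_scores)  # middle score
mean_value = statistics.mean(skewed_scores)      # average score

# The single high score (10) pulls the mean above the median
ordering_holds = mode_value < median_value < mean_value
```

Reversing the tail (a few very low scores among mostly high ones) would flip the ordering to Mean < Median < Mode.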

New cards

Negative Skew

Longer tail is on the LEFT

Most scores pile up at the HIGH end (right) and fewer scores at the low end

Inadequate ceiling - it is too easy for the test-takers so cannot differentiate among people at the high end of the ability scale (people "raising" the roof happy they're doing good)

Need to replace easier items with harder items

Mean < Median < Mode (too many people doing good aka the most)

New cards

Kurtosis

How the distribution looks around the mode

New cards

Platykurtic

Flat curve - the mode isn't really standing out

New cards

Leptokurtic

Peaked curve - the mode is really standing out

New cards

Modality

Distributions can have more than one mode -> Multimodal

Bimodal - distribution has two modes

New cards

Normal distribution

A bell-shaped curve

Mean = Median = Mode

Symmetrical around the mean

Asymptotic tails - approach but never reach zero

68% of scores fall within 1SD below or above the mean

95% of scores fall within 2SD below or above the mean

About 99.7% of scores fall within 3SD below or above the mean

In the center (mean) - z-score = 0 and T-score = 50

New cards

Age-equivalent scores - Developmental Norms

Mental Age

Based on the average raw score of individuals of different age groups in the standardization sample

If an examinee gets a raw score of 25, and the mean raw score for 10 year olds is 25, then the examinee's mental age is 10

New cards

Grade-Equivalent Scores - Developmental Norms

Based on the average raw score of individuals in the standardization sample who are in a particular school grade

If an examinee gets a raw score of 25, and the mean raw score for those in grade 5 is 25, then the examinee's grade-equivalent score is 5.

**If a 4th grade child gets a grade-equivalent score of 7.4 in Arithmetic, it does not mean that they have mastered 7th grade arithmetic. It means the child is well above average compared to other 4th graders: the child's raw score is comparable to the average raw score obtained by students in the 4th month of 7th grade.

New cards

Within-Group Norms

Compare the examinee directly against their peers

Percentile Rank

Z-Scores

T-Scores

Standard Scores

New cards

Percentiles

Percentage of the sample that obtained scores that were equal to or lower than the score obtained by the examinee

Can be obtained from the frequency distribution by calculating the cumulative frequency

Are on an ORDINAL scale (not interval)

E.g., on a scale with a mean of 100 and SD of 15, a score of 100 is at the 50th percentile and a score of 101 is at about the 53rd percentile

New cards

Cumulative Frequency

The number of people who obtained a raw score equal to or lower than a given raw score
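As a rough sketch of how percentile ranks follow from cumulative frequencies, the snippet below builds a frequency distribution and converts each cumulative count to a percentage. The function name and raw scores are hypothetical.

```python
from collections import Counter

def percentile_ranks(raw_scores):
    """Map each obtained score to the percentage of the sample
    scoring equal to or lower than it."""
    frequency = Counter(raw_scores)   # frequency distribution
    n = len(raw_scores)
    ranks = {}
    cumulative = 0
    for score in sorted(frequency):
        cumulative += frequency[score]        # cumulative frequency
        ranks[score] = 100 * cumulative / n   # percentile rank
    return ranks

# Hypothetical raw scores for ten examinees
raw_scores = [12, 15, 15, 17, 17, 17, 18, 19, 19, 20]
ranks = percentile_ranks(raw_scores)
```

Here six of the ten examinees scored 17 or lower, so a raw score of 17 sits at the 60th percentile.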

New cards

Z-Score

How far away from the mean the raw score lies measured in units of standard deviation

Z = 0 corresponds to a score exactly at the mean

z = 1.00 corresponds to a score 1SD above the mean; negative z values fall below the mean

z = (X - M) / SD

New cards

T-Score

Transforms the z-score so that the mean corresponds to T=50 and the standard deviation corresponds to 10 T-score units

T = 10z + 50

T = 50 corresponds to a score exactly at the mean

T = 60 corresponds to a score 1 SD above the mean

T = 40 corresponds to a score 1 SD below the mean

To determine how many SDs a T-score falls from the mean, calculate the z-score:

z = (T - 50) / 10

New cards

Standard Scores

Converts the z-score to a scale with a mean of 100 and SD of 15

SS = 15z + 100 (always rounded)

Converting SS to z-scores:

z = (SS - 100) / 15
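The conversions among raw scores, z-scores, T-scores, and Standard Scores can be sketched as a handful of small functions. The function names and the example test (mean 24, SD 4) are assumptions made for illustration.

```python
def z_score(x, mean, sd):
    """Distance of a raw score from the mean, in SD units."""
    return (x - mean) / sd

def t_score(z):
    """T-score scale: mean 50, SD 10."""
    return 10 * z + 50

def standard_score(z):
    """Standard Score scale: mean 100, SD 15, rounded to a whole number."""
    return round(15 * z + 100)

def z_from_t(t):
    return (t - 50) / 10

def z_from_standard_score(ss):
    return (ss - 100) / 15

# Hypothetical test with mean 24 and SD 4; the examinee's raw score is 28
z = z_score(x=28, mean=24, sd=4)
t = t_score(z)
ss = standard_score(z)
```

A raw score 1 SD above the mean gives z = 1.0, T = 60, and SS = 115, and converting either scaled score back returns z = 1.0.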

New cards

Correlation

A statistic that describes the relationship between two variables (x and y)

If no relationship exists the correlation = 0

New cards

Positive Correlation

A correlation where as one variable increases, the other also increases, or as one decreases so does the other.

Both variables move in the same direction.

Higher scores on X are associated with higher scores on Y and vice versa

New cards

Negative Correlation

A correlation where as one variable increases, the other decreases.

Higher scores on X are associated with lower scores on Y and vice versa

New cards

Pearson Correlation

Statistic that allows us to express the relationship between X and Y (r) - applies to linear relationships

Both variables are interval or ratio** (e.g., WAIS IQ score and score on standardized reading test)

May take on values ONLY between -1.00 and +1.00

If there is no correlation, r = 0

+ correlation, r will be between 0 and +1.00

- correlation, r will be between -1.00 and 0

SIGN tells us the direction of the relationship

MAGNITUDE (ABSOLUTE VALUE) tells us the strength = the closer the magnitude is to 1, the stronger the relationship (-.70 is stronger than +.30)
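A minimal sketch of how r can be computed from paired scores, using the standard sum-of-cross-products form of the Pearson formula. The function name and the IQ and reading scores are invented for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical interval-level scores for five examinees
iq = [85, 95, 100, 110, 120]
reading = [40, 48, 50, 61, 66]
r = pearson_r(iq, reading)

# Reversing the direction of one variable flips the sign, not the magnitude
r_reversed = pearson_r(iq, [-y for y in reading])
```

The sign of `r` gives the direction of the relationship and its absolute value gives the strength, so `r` and `r_reversed` describe equally strong relationships in opposite directions.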

New cards

P-Value

The probability of obtaining a relationship at least as strong as the one observed in the sample if there were actually no relationship between the two variables in the population.

P = .04 means there is a 4% chance of seeing a result this extreme when no relationship exists

p < (or equal to) .05 is considered statistically significant

New cards

Homoscedasticity

All data points fall within an elliptical shape; the range of values on Y is the same for each value of X

New cards

Heteroscedasticity

Shape of data points deviates from ellipse and is fan-shaped; the range of values on Y is not the same for each value of X

New cards

Restricted Range

Reduces the magnitude of the calculated r - means you sampled a very small piece of the distribution, so the range of scores obtained on one or both of the variables is much narrower than in the population

Common reason why population correlation coefficients can be underestimated by sample r's

New cards

Spearman rho

Correlation between ranks - both variables are Ordinal

E.g., Class rank in junior year and class rank in senior year

New cards

Phi

The correlation coefficient when both of the variables are measured as true dichotomies - both variables are nominal with 2 categories

E.g., Scores on two binary items

New cards

Tetrachoric

Correlation between two artificially dichotomous variables (there is no natural dividing line between the categories; e.g., you want to see if someone passes or fails and you decided where the cutoff falls)

New cards

Contingency Coefficient

Correlation coefficient for nominal data with two or more categories

New cards

Point Biserial

The correlation coefficient used when one variable is a true dichotomy (nominal) and the other is continuous (interval or ratio)

E.g., Score on a binary item (0,1) and the total score on the test

New cards

Biserial

The correlation coefficient used when one variable is an artificial dichotomy and the other is continuous (interval or ratio)

E.g., scores on an anxiety test and classification of high/low based on anxiety scores

New cards

Eta (Curvilinear)

The correlation used when both variables are interval or ratio but the relationship is not linear

For example: the relationship between Extraversion and Sales Performance is curvilinear and the scatterplot resembles an inverted U. This means that **moderately extraverted people have better sales performance than highly extraverted people.

New cards

Coefficient of Determination

Obtained by squaring the correlation coefficient (r^2)

Interpreted as the percentage of variance in one variable that is predicted by (explained by or shared with) the other variable

Example:

If r = .7 between IQ and reading test scores

The coefficient of determination (r^2) = .49 = 49%

So, 49% of the variance in reading test scores is predictable from (or explained by) IQ scores and 51% of the variance in reading test scores is due to other factors

New cards

CORRELATION DOES NOT IMPLY CAUSALITY

A strong correlation between X and Y could mean:

1. X causes Y

2. Y causes X

3. A third, unmeasured variable influences both X and Y

New cards

Factor Analysis

A statistical procedure that identifies clusters of related items (called factors) on a test; used to identify underlying dimensions or constructs that can account for the pattern of correlations among variables

Reduces the number of variables we have to work with

New cards

Correlation Matrix

A table showing the relationships among discrete measures

Entries in diagonal are +1.00

Section above the +1.00 is a mirror image of the section below

New cards

Steps in a Factor Analysis

1. Deciding on the number of factors = number of variables

2. Extracting the factors

3. Examining the factor loadings

4. Performing a rotation

5. Examining the rotated factor loadings

6. Interpreting and naming the factors

New cards

Eigenvalue

Amount of variance associated with each factor

New cards

Scree Plot

A graphical representation of Eigenvalues (vertical axis) vs. the number of factors (horizontal axis)

Locate the place where there is a large drop - number of factors to extract is at the top of the drop

New cards

Factor Loading

Correlation of each of the original variables with each factor (that were extracted)

New cards

Orthogonal Factors

Factors that are not correlated with each other

New cards

Oblique Factors

Factors that are correlated with each other

New cards

Second-Order Factors

Factor analysis done on Oblique Factors (correlated factors) to further minimize number of factors since they are similar

New cards

Thurstone's Criteria

1. Eliminate negative factor loadings

2. Each variable has a high loading on only one factor

New cards

Multidimensional Test

Items group into two or more separate factors

Test is measuring more than one construct

New cards

Unidimensional Test

All items load on a single factor

Test is measuring a single construct

New cards

Common Variance

Percentage of total variance in a test (1.00) that is shared with the other tests in the battery - what the test has in common with other tests included in the factor analysis

CV = Sum of the squared factor loadings for that test

Loading of a test on Factor I is .9 and loading on Factor II is -.1

CV = (.9)^2 + (-.1)^2 = .81 + .01 = .82

Means 82% of the variance of this test is shared with the other tests that were included in the factor analysis

New cards

Error Variance

Percentage of total variance in test that is due to random measurement error

Subtract the reliability of the test from 1.00

So if reliability is .9, the EV = .10

New cards

Specific Variance

Percentage of total variance in a test that is unique to that test and not shared with the other tests in a battery

First calculate CV and EV

Add the CV and EV together

Subtract the total from above from 1.00 to get SV

New cards

Total Variance

Common Variance + Specific Variance + Error Variance
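The three variance components can be computed together. The sketch below reuses the factor loadings from the Common Variance card (.9 and -.1) and the reliability from the Error Variance card (.9); pairing them as one test is an assumption made for illustration, as is the function name.

```python
def partition_variance(loadings, reliability):
    """Split a test's total variance (1.00) into common, specific,
    and error components."""
    common = sum(loading ** 2 for loading in loadings)  # CV: sum of squared loadings
    error = 1.0 - reliability                           # EV: 1.00 minus reliability
    specific = 1.0 - (common + error)                   # SV: whatever remains
    return common, specific, error

# Assumed pairing of the two card examples
common, specific, error = partition_variance([0.9, -0.1], reliability=0.9)
total = common + specific + error
```

With these inputs the common variance is .82, the error variance is .10, and the specific variance is .08, which together account for the total of 1.00.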

New cards

Reliability

Is the test measuring anything other than error?

Reliability ensures we are measuring something meaningful - error is always present, but is there something more there?

The proportion of the variance in observed scores that is due to true differences among the test-takers on the trait being measured (Classical Reliability Theory)

New cards

Error

Occurs when measurement of a construct is confounded by factors that are not relevant to the construct we want to measure

New cards

Systematic Error

A "mistake" that can be corrected or eliminated

-Mistakenly keyed correct answers

-Discoverable and correctable

-Lack of familiarity with scoring criteria

New cards

Random Error

-Is impossible to eliminate or to measure directly

-Inherent in any measurement attempt

-Is not correlated with true scores (aka "random")

New cards

Obtained Score

Score we obtain whenever we administer a test - it is obtained or measured

-The person's measured standing on the trait we are interested in measuring

Might not be the same as the true score due to error

New cards

True Score

What the person's score would be if there were no measurement error - might be the same as the Observed Score - can never be measured precisely

The person's actual standing on the trait or the actual amount of the trait they possess

A theoretical construct that we cannot truly measure due to error

New cards

For very large samples, the Mean of the Observed Scores...

is equal to the mean of the True Scores.

New cards

For very large samples, the Variance of the Observed Scores...

is greater than the Variance of the True Scores.

New cards

Random measurement error is...

uncorrelated with True Scores.

New cards

Classical Reliability Theory

An individual's observed score can be partitioned into a true part & an error part. It's never going to be 100% true.

If XO is the observed score and XT is the true score, and e is the amount of error:

XO = XT + e

*Error cannot really be measured, and error scores are random and uncorrelated with true scores

Due to Error, Reliability (rxx) =

Variance (XT)/Variance (XO)

New cards

Error Variance

Proportion of variance in observed scores that is due to error

= Variance (e)/Variance (XO)

Reliability plus error variance will always equal 1.00
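Why reliability and error variance sum to 1.00 can be shown in a short derivation. It assumes, as in Classical Reliability Theory, that random error is uncorrelated with true scores, so the variances simply add.

```latex
\begin{align*}
X_O &= X_T + e \\
\mathrm{Var}(X_O) &= \mathrm{Var}(X_T) + \mathrm{Var}(e)
  \quad \text{(no covariance term, since } X_T \text{ and } e \text{ are uncorrelated)} \\
1 &= \frac{\mathrm{Var}(X_T)}{\mathrm{Var}(X_O)} + \frac{\mathrm{Var}(e)}{\mathrm{Var}(X_O)}
  = r_{xx} + \text{error variance}
\end{align*}
```

The last line comes from dividing both sides of the second line by Var(XO).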

New cards

Sources of Error

Time Sampling

Item Sampling

Internal Consistency

Inter-rater Reliability

New cards

Time Sampling Error

Error associated with "when" the test is given

New cards

Test/Retest Method

*To account for time sampling error

The same exact test is re-administered to the same exact group of examinees at a later date

A correlation coefficient is calculated on the two sets of scores - Test-Retest Reliability Coefficient (rtt)(AKA the test's stability)

zR (predicted retest score expressed as z-score)

zO (original score)

rtt (test-retest reliability coefficient)

zR = rtt × zO
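The prediction formula is a single multiplication, sketched below. The function name and the reliability value of .80 are hypothetical.

```python
def predicted_retest_z(r_tt, z_original):
    """Predicted retest score (as a z-score) from the original z-score
    and the test-retest reliability coefficient."""
    return r_tt * z_original

# Hypothetical test with test-retest reliability of .80
high_scorer = predicted_retest_z(0.80, 2.0)    # original score 2 SD above the mean
low_scorer = predicted_retest_z(0.80, -1.5)    # original score 1.5 SD below the mean
```

Because the reliability coefficient is below 1.00, both predicted retest scores land closer to the mean (z = 0) than the original scores did: about 1.6 instead of 2.0, and about -1.2 instead of -1.5.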

New cards

The reliability of a test using the test/retest method is BLANK if the interval is too short...

Overestimated

-Similar conditions affecting both test and retest

-Memory for original responses

New cards

The reliability of a test using the test/retest method is BLANK if the interval is too long...

Underestimated

-Real changes in the trait being measured

New cards

Practice Effect

Second administration is not equivalent to first administration due to previous exposure to items

If we assume that practice leads to better scores on retest, then practice will decrease the correlation between test and retest and lead to an under-estimate of the test/retest correlation and an over-estimate of time sampling error.

**On a test where there is a substantial practice effect (such as Block Design), the test-retest method will OVERESTIMATE the amount of time sampling error.

New cards

Regression Towards the Mean

The tendency for extreme or unusual scores to fall back (regress) toward their average - seen in test/retest method because the TRT coefficient will always be less than +1.00 (due to time sampling error), so retest scores are predicted to be closer to the mean than were the original scores

*Scores above the mean are predicted to decrease on retest

*Scores below the mean are predicted to increase on retest

*The lower the TRT reliability, the greater the regression on retest

New cards

Item Sampling Error

Error due to which particular items are selected for the test, since we cannot be certain that the items are a random sample of the content being measured

New cards

Alternate Form

Accounts for Item Sampling Error

Two parallel (alternate) forms of the same test are constructed

-Each one is the same length

-Equivalent but not identical items on each

The forms are given to the same sample of examinees on the same day - order is counterbalanced

Correlation between the scores is known as Alternate Form Reliability

New cards

Split-Half Method

Solution to problems identified in Alternate Form Method

Checking reliability by splitting a single administration of the test into two halves (e.g., odd-numbered items vs. even-numbered items) and correlating the two half-test scores for the same group of examinees

Reliability is related to test length, so all other things being equal, longer tests have higher reliability than shorter tests

Because each half is only half as long as the full test, this method will underestimate the Alternate Form Reliability and overestimate the amount of item sampling error

New cards

Spearman-Brown Formula

Solution to Split-Half Method

Enables us to predict what the Alternate Form Reliability would be from the Split Half Reliability

rtt = 2rhh / (1 + rhh), where rtt is the alternate form reliability and rhh is the split half reliability

Allows us to estimate what the reliability of the test would be if we added items, deleted items, and how many items would have to be added in order to achieve the desired reliability
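A sketch of both uses of the formula, written with the general Spearman-Brown form in which the test is lengthened by a factor k (k = 2 reproduces the split-half correction). The function names and the example reliabilities are assumptions for illustration.

```python
def spearman_brown(reliability, length_factor=2.0):
    """Predicted reliability after changing test length by length_factor.
    length_factor=2 gives 2r / (1 + r), the split-half correction."""
    k = length_factor
    return (k * reliability) / (1 + (k - 1) * reliability)

def length_factor_needed(reliability, target):
    """How many times longer the test must be to reach the target reliability."""
    return (target * (1 - reliability)) / (reliability * (1 - target))

# Hypothetical split-half reliability of .70
full_length = spearman_brown(0.70)          # corrected to full test length
k = length_factor_needed(0.70, 0.90)        # lengthening needed to reach .90
check = spearman_brown(0.70, k)             # plugging k back in recovers the target
```

A split-half reliability of .70 corrects to roughly .82 for the full-length test. A length factor below 1 in `spearman_brown` estimates the effect of deleting items.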

New cards

Internal Consistency

Whether the items are all measuring the trait of interest

A group of items is homogeneous or internally consistent when all the items are measuring the same construct equally well

Helps to ensure total scores on scales have the same meaning

New cards

Inter-Item Correlation

Assesses internal consistency

The correlation between each pair of items on the scale is calculated and the mean of these correlations is a measure of the scale's internal consistency - examine the extent to which scores on one item are related to scores on all other items in a scale

If there are n items on the test, then there are n(n-1)/2 unique correlations - these are phi coefficients (nominal x nominal)
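The mean inter-item correlation can be sketched as follows. The response matrix is invented, and the helper computes a Pearson correlation, which for two items scored 0/1 is numerically the same as the phi coefficient. The sketch assumes every item has some variance (no item is answered identically by everyone).

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation; on 0/1 items this equals phi."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: rows are examinees, columns are items scored 0/1
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]

items = list(zip(*responses))                     # one tuple of scores per item
pairs = list(combinations(range(len(items)), 2))  # n(n-1)/2 unique item pairs
correlations = [pearson_r(items[i], items[j]) for i, j in pairs]
mean_inter_item = sum(correlations) / len(correlations)
```

With 4 items there are 4(3)/2 = 6 unique pairs, and `mean_inter_item` is the average of those six correlations.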

New cards

Item-Total Correlation

Assesses internal consistency

Correlation between the item score (0,1) and the total score on the test

If there are n items, there will be n item-total correlations - this is the point biserial (nominal x interval/ratio)

The mean of these n item-total correlations is a measure of the scale's internal consistency

New cards

Kuder-Richardson Formula

Used to calculate internal consistency when items are dichotomous or nominal (yes/no, true/false)