Chapter 8: Testing and Individual Differences

When we say that a test is standardized, we mean that its items have been piloted on a population similar to the people who are meant to take the test and that norms (typical levels of performance) have been established.
Such a group of people is known as the standardization sample.
Psychometricians (specialists who design and evaluate tests) at ETS (Educational Testing Service) use the performance of the standardization sample on experimental test sections to choose items for future tests.

Reliability refers to the repeatability or consistency of the test as a means of measurement.
- The reliability of a test can be measured in several different ways.
Split-half reliability involves randomly dividing a test into two different sections and then correlating people’s performances on the two halves.
When people take two different versions (forms) of the same test, the correlation between their performance on the two forms is known as equivalent-form reliability.
Test-retest reliability refers to the correlation between people's scores on one administration of a test and the same people's scores on a subsequent administration of that test.
A test is valid when it measures what it is supposed to measure.
Validity is often referred to as the accuracy of a test.
Face validity refers to whether a test appears, on the surface, to measure what it claims to measure; it is a superficial judgment of accuracy.
Face validity is a type of content validity.
Content validity refers to how well a measure reflects the entire range of material it is supposed to be testing.
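Concurrent validity measures how much of a characteristic a person has now (for example, is this person a good chef now?).
Predictive validity is a measure of future performance (for example, does this person have the qualities that would enable him or her to become a good chef?).
Construct validity, the extent to which a test actually measures the theoretical construct (such as intelligence or anxiety) that it is designed to measure, is thought to be the most meaningful kind of validity.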

Aptitude tests measure ability or potential, while achievement tests measure what one has learned or accomplished.
Speed tests generally consist of a large number of questions asked in a short amount of time.
- The goal of a speed test is to see how quickly a person can solve problems.
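Power tests consist of items of increasing difficulty; the goal is to see how difficult a problem a person can solve rather than how quickly he or she works.
Some tests are group tests, while others are individual tests.
Group tests are administered to a large number of people at a time, while individual tests are administered to one person at a time.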

Although intelligence is a commonly used term, it is an extremely difficult concept to define.
- Typically, intelligence is defined as the ability to gather and use information in productive ways.
Fluid intelligence refers to our ability to solve abstract problems and pick up new information and skills, while crystallized intelligence involves using knowledge accumulated over time.

Charles Spearman argued that intelligence could be expressed by a single factor.
He used factor analysis, a statistical technique that examines the correlations among many different items to identify a smaller number of underlying factors that account for them.
From this analysis he concluded that beneath the many specific abilities that people regard as types of intelligence lies a single factor, which he named g (for general).
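Spearman's own computations are not reproduced here. As a rough illustration only, the sketch below substitutes principal components analysis, a simpler relative of factor analysis, and applies it to a hypothetical correlation matrix for three subtests. When every subtest correlates positively with every other, the first component has a positive loading on all of them, which is the kind of pattern Spearman interpreted as g.

```python
import math

# Hypothetical correlations among three subtests (e.g. vocabulary, arithmetic, puzzles).
R = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]


def first_component(matrix, iterations=200):
    """Power iteration: return (largest eigenvalue, its unit-length eigenvector)."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for the converged unit vector.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigenvalue, v


eigenvalue, vector = first_component(R)
# Loading = eigenvector entry * sqrt(eigenvalue): each subtest's correlation with the component.
loadings = [math.sqrt(eigenvalue) * x for x in vector]
print(round(eigenvalue, 2), [round(x, 2) for x in loadings])
```

The subtest labels and the power-iteration helper are illustrative assumptions; real factor analyses use many more tests and also estimate the variance unique to each test.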

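Howard Gardner, unlike Spearman, subscribes to the idea of multiple intelligences.
Unlike many other researchers who hold this view, however, Gardner has named kinds of intelligence that encompass a wide range of human behavior: linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, intrapersonal, and naturalist intelligence.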

Recently there has been a lot of discussion of EQ, also known as emotional intelligence: the ability to perceive, understand, and manage one's own emotions and those of others.
One of the main proponents of EQ is Daniel Goleman.
EQ roughly corresponds to Gardner’s notions of interpersonal and intrapersonal intelligence.

Robert Sternberg is another contemporary researcher who has offered a somewhat nontraditional definition of intelligence.
Sternberg's triarchic theory holds that three types of intelligence exist: analytical (componential), creative (experiential), and practical (contextual) intelligence.

Alfred Binet was a Frenchman who wanted to design a test that would identify which children needed special attention in schools.
Binet came up with the concept of mental age, an idea that presupposes that intelligence increases as one gets older.
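Lewis Terman, a Stanford professor, used this system to create the measure we know as IQ and the test known as the Stanford-Binet IQ test.
IQ stands for intelligence quotient; on the original Stanford-Binet, it was computed as mental age divided by chronological age, multiplied by 100.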
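The ratio IQ can be written as IQ = (mental age ÷ chronological age) × 100. A minimal Python sketch; the function name is hypothetical:

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age / chronological_age


# A hypothetical 8-year-old who performs like a typical 10-year-old:
print(ratio_iq(10, 8))  # 125.0
```

Because mental age stops rising in adulthood while chronological age keeps increasing, a ratio IQ would fall steadily for adults; this is one reason later tests moved to a different scoring method.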
David Wechsler used a different way to measure intelligence.
The Wechsler Adult Intelligence Scale (WAIS) is used in testing adults, the Wechsler Intelligence Scale for Children (WISC) is given to children between the ages of 6 and 16, and the Wechsler Preschool and Primary Scale of Intelligence (WPPSI) can be administered to children as young as 4.
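The Wechsler tests yield IQ scores based on what is known as deviation IQ: a person's score is compared with the scores of others in the same age group and converted to a scale on which the mean is 100 and the standard deviation is 15.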
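A minimal sketch of the deviation-IQ conversion, assuming the age group's mean and standard deviation of raw scores are already known from the standardization sample (the numbers below are hypothetical). The percentile step additionally assumes that IQ scores are normally distributed.

```python
from statistics import NormalDist


def deviation_iq(raw_score, age_group_mean, age_group_sd):
    """Convert a raw score to a deviation IQ (mean 100, standard deviation 15)."""
    z = (raw_score - age_group_mean) / age_group_sd  # distance from the age-group mean in SDs
    return 100 + 15 * z


iq = deviation_iq(60, 50, 10)
print(iq)  # 115.0
# Share of the age group expected to score below this IQ, if IQs are normally distributed.
print(round(NormalDist(mu=100, sigma=15).cdf(iq), 2))  # 0.84
```

Because each person is compared only with others of the same age, a score of 100 always means "average for one's age," for adults as well as children.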

Much discussion has centered on whether widely used IQ tests and the SAT are biased against certain groups.
Researchers seem to agree that although different races and genders may score differently on these tests, the tests have the same predictive validity for all groups.
SAT scores are equally good predictors of college grades for different genders and for different racial groups, so in this sense (predictive validity) the test is not biased.

An important term that researchers use in discussing the effects of nature and nurture is heritability.
Heritability is a measure of how much of the variation in a trait within a particular population is explained by genetic factors.
Average performance on intelligence tests rose steadily throughout the twentieth century, a finding known as the Flynn effect.
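Heritability is usually written as the ratio of genetic variance to total (phenotypic) variance, h² = V_G / V_P. The sketch below assumes, for simplicity, that total variance is just genetic plus environmental variance (no gene-environment interaction or correlation); the function name and variance values are hypothetical.

```python
def heritability(genetic_variance, environmental_variance):
    """Share of a trait's variance in a population that is attributable to genetic variance.

    Simplifying assumption: total variance = genetic + environmental variance.
    """
    total_variance = genetic_variance + environmental_variance
    return genetic_variance / total_variance


print(heritability(60.0, 40.0))  # 0.6
# Same genetic variance, but a more uniform environment: heritability rises.
print(round(heritability(60.0, 10.0), 2))  # 0.86
```

The second call shows why heritability describes a population rather than individuals: a heritability of 0.6 does not mean that 60 percent of any one person's trait comes from genes, and the same genes can yield a different heritability when the environment varies more or less.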
Research on identical twins separated at birth has found strong correlations between the twins' intelligence scores, suggesting a substantial genetic contribution.
Most psychologists attribute racial differences in IQ scores to differences in environment.