Chapters 1–3 Notes: Foundations, Importance, and Ethical Considerations in Psychological Testing
Chapter organization and scope
- Chapter 1: What are psychological tests?
- Defines a psychological test and introduces tests you might not recognize as tests (e.g., driving tests, self-scored magazine quizzes).
- Covers brief history, three defining characteristics of tests, assumptions underlying testing, and common classifications.
- Ends with resources (print and online) for information about tests.
- Chapter 2: Why is psychological testing important?
- Discusses how test results inform comparative and absolute decisions in education, clinical, and organizational settings.
- Addresses controversies: intelligence tests, aptitude tests, integrity tests, and high-stakes testing in education.
- Chapter 3: What are the ethical responsibilities of test publishers, test users, and test takers?
- Introduces ethics and professional standards and summarizes responsibilities for publishers, users, and takers, including testing special populations.
Learning objectives for Chapter 1
- Explain why you should care about psychological testing.
- Define what a psychological test is, including similarities and differences across tests.
- Trace the history from Binet and intelligence testing to modern tests.
- Describe the three defining characteristics of all psychological tests and the degree to which they can be demonstrated.
- Describe essential assumptions when using psychological tests.
- Describe major ways to classify psychological tests.
- Distinguish four commonly confused terms: psychological assessment, psychological test, psychological measurement, and survey.
- Locate print/online resources for information about psychological tests.
What is a psychological test? (central definition and scope)
- Examples include: intelligence tests (WAIS, Stanford-Binet), personality tests (MMPI, MBTI), achievement and aptitude tests, interest inventories, standardized college entrance exams, driving tests, structured interviews, assessment centers, and self-scored magazine tests.
- Even informal self-tests in magazines qualify as psychological tests under broad definitions.
- Core idea: a test is a procedure/instrument/device that measures samples of behavior to infer unobservable constructs or predict outcomes.
Three defining characteristics (what good tests share)
1) Representative sampling of behavior - Tests sample behaviors that are indicative of the construct of interest (e.g., physical ability via a range of tasks rather than a single activity).
- Example: to measure physical ability, sample a mix of individual and team sports to cover various strength, endurance, and precision skills.
2) Standardized conditions for administration - All test takers should experience the test under the same conditions to ensure comparable results.
- Environment, examiner behavior, examinee health/fatigue, and test wording can affect scores; standardization aims to minimize these effects across test takers.
3) Rules for scoring - Clear, consistent scoring rules ensure that responses are scored the same way by different examiners.
- Examples: fixed points for correct answers, or specific scoring rubrics for essay responses.
- Note: Not all tests demonstrate these characteristics to the same degree; some may be more representative, more standardized, or have stricter scoring rules than others.
Assumptions of psychological tests (critical underlying ideas)
1) Test validity: tests measure what they purport to measure or predict what they are intended to predict. - If a test is designed to measure mechanical ability, it should measure that construct; conclusions drawn from scores should be supported by validity evidence.
2) Test-retest reliability: test scores are stable over time for relatively stable traits. - If the trait is stable, scores should be similar when the test is administered again after some time (e.g., two weeks later).
3) Consistent interpretation of items: test-takers interpret items similarly (e.g., true/false statements).
4) Self-report accuracy: individuals can accurately report thoughts/feelings; memories are reliable; respondents can assess and report accurately.
5) Honest reporting: individuals are assumed to report their thoughts and feelings honestly; because some respond in socially desirable ways, validity checks may be needed to detect distorted responding.
6) Score error: observed score = true score + error, where the error can be positive or negative; error may come from the test itself, the examiner, the examinee, or the environment. - Implication: tests are designed to minimize error, and users should consider the presence of error when interpreting scores.
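The score-error and test-retest assumptions above can be illustrated together with a small simulation. This is only a sketch under stated assumptions: the trait distribution (mean 100, SD 15), the error sizes, the sample of 500 test takers, and the helper names `pearson_r` and `simulate_administration` are all hypothetical and do not come from the text.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_administration(true_scores, error_sd, rng):
    """Observed score = true score + random error (error may be + or -)."""
    return [t + rng.gauss(0, error_sd) for t in true_scores]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical stable trait for 500 test takers (assumed mean 100, SD 15).
true_scores = [rng.gauss(100, 15) for _ in range(500)]

# Two administrations of the same test (e.g., two weeks apart),
# once with small measurement error and once with large error.
low_t1 = simulate_administration(true_scores, 5, rng)
low_t2 = simulate_administration(true_scores, 5, rng)
high_t1 = simulate_administration(true_scores, 15, rng)
high_t2 = simulate_administration(true_scores, 15, rng)

r_low = pearson_r(low_t1, low_t2)
r_high = pearson_r(high_t1, high_t2)
print(f"test-retest r with small error: {r_low:.2f}")
print(f"test-retest r with large error: {r_high:.2f}")
```

With the trait held constant, the only thing that changes between the two runs is the amount of error, so the drop in the test-retest correlation shows why users are asked to keep error in mind when interpreting scores.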
Classification of tests: common methods and distinctions
- Based on the behavior required (what the test-taker does):
- Maximal performance tests: require performing a well-defined task to the best of ability (e.g., WAIS-IV tasks, driving test tasks, GATB, classroom tests).
- Behavior observation tests: assess behavior in context without a single defined task (e.g., performance appraisals, naturalistic observations, some employment simulations).
- Self-report tests: assess beliefs, feelings, attitudes, or mental states via responses to questions (e.g., Hogan Personality Inventory, MBTI).
- Some tests blend these features (e.g., structured interviews with both technical questions and belief/opinion questions; observer-recorded behaviors).
- Standardization vs nonstandardization
- Standardized tests: designed to measure a specific construct, administered to a large standardization sample, with norms and explicit administration/scoring procedures.
- Nonstandardized tests: little or no standardization, often teacher-created or informal assessments for a single administration.
- Objective vs projective
- Objective tests: fixed responses (true/false, multiple-choice, Likert scales) with clear scoring criteria (e.g., GRE, GATB, WAIS, NEO-PI-3).
- Projective tests: unstructured/ambiguous stimuli; scoring involves subjective interpretation (e.g., Rorschach, TAT).
- Dimensional approach: tests are often categorized by what they measure (dimensions such as achievement, aptitude, intelligence, personality, interests, etc.).
- Practical example: the NEO Personality Inventory contains 240 items across five personality dimensions; the test uses an objective self-report format with a 5-point response scale.
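Standardized tests are interpreted against norms from a standardization sample. A minimal sketch of that idea follows, assuming a tiny made-up sample of raw scores; the data and the helper names `z_score` and `percentile_rank` are hypothetical, and real norm tables are built from far larger samples.

```python
from statistics import fmean, pstdev

# Hypothetical raw scores from a (very small) standardization sample.
norm_sample = [12, 15, 18, 20, 20, 22, 23, 25, 27, 30]


def z_score(raw, sample):
    """Distance of a raw score from the sample mean, in sample SD units."""
    return (raw - fmean(sample)) / pstdev(sample)


def percentile_rank(raw, sample):
    """Percentage of the norm sample scoring at or below the raw score."""
    at_or_below = sum(1 for s in sample if s <= raw)
    return 100 * at_or_below / len(sample)


raw = 25
print(f"z = {z_score(raw, norm_sample):.2f}")
print(f"percentile rank = {percentile_rank(raw, norm_sample):.0f}")
```

The same raw score of 25 would mean something different against a different norm group, which is why a nonstandardized test with no norms gives little basis for comparison.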
Distinguishing key terms: psychological assessment, psychological test, psychological measurement, surveys
- Psychological assessment vs psychological test
- Assessment is a broader process using multiple information-gathering methods (interviews, observations, tests).
- A psychological test is one tool within the assessment process.
- Example: a clinical assessment might include interviews, family/caregiver reports, observations, and the MMPI.
- Psychological test vs measurement
- Measurement is the broader process of assigning numerical values to attributes according to rules.
- A psychological test is a measurement instrument when its results are expressed as a score.
- Psychological tests vs surveys
- Tests focus on individual differences and typically yield a single overall score.
- Surveys focus on group-level information, often reporting at the item/question level (percent responses).
- Some surveys use scales that approximate tests, but the primary use often differs (individual vs group outcomes).
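The test-versus-survey distinction can be sketched with one set of responses summarized two ways. The respondents, items, and the helper name `item_percentages` below are hypothetical and serve only to illustrate the contrast.

```python
from collections import Counter

# Hypothetical data: 4 respondents x 3 items on a 1-5 agreement scale.
responses = {
    "p1": [4, 5, 3],
    "p2": [2, 1, 2],
    "p3": [5, 4, 4],
    "p4": [4, 2, 3],
}

# Test-style use: one overall score per individual.
total_scores = {person: sum(items) for person, items in responses.items()}


def item_percentages(responses, item_index):
    """Survey-style use: percent of the group choosing each option on one item."""
    counts = Counter(items[item_index] for items in responses.values())
    n = len(responses)
    return {option: 100 * count / n for option, count in sorted(counts.items())}


print(total_scores)                     # {'p1': 12, 'p2': 5, 'p3': 13, 'p4': 9}
print(item_percentages(responses, 0))   # {2: 25.0, 4: 50.0, 5: 25.0}
```

The first summary supports statements about individuals; the second supports statements about the group, item by item.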
Locating information about tests (resources and process)
- Print resource books (highly recommended starting points):
- Tests in Print (TIP): descriptive listings of commercially published tests, including basic attributes (title, author, publisher, population, administration time, cost, form availability, etc.). It serves as an index to MMY and provides cross-references to reviews.
- Mental Measurements Yearbook (MMY): descriptive information and reviews of newly developed or revised tests; includes reliability/validity information, norms, and cross-references.
- Tests: descriptive entries for tests used by psychologists, educators and HR professionals; includes population, purpose, major features, admin time, cost, and availability.
- Test Critiques: reviews of frequently used tests with reliability/validity and test construction information.
- Personality Tests and Reviews: bibliography and descriptive information for personality tests.
- How to use TIP and MMY (Buros Center guidance): descriptive entries, reliability/validity info, cross-references to reviews, and how-to resources for evaluating tests.
- On the Web resources (examples):
- American Psychological Association (APA): guidance on finding information about published tests and unpublished measures; contact publishers; use of TIP and MMY; Test Reviews Online.
- Buros Center for Testing: resources, code of fair testing practices, access to TIP and MMY, and guidance for test evaluation.
- Test Collection at ETS: largest database of tests (20,000+ tests, published and unpublished); includes descriptions, authors, publication date, population, and uses; helps order tests.
- O*NET Resource Center: occupational information and testing/assessment guides (e.g., Testing and Assessment: A Guide to Good Practices for Workforce Investment Professionals; Tests and Other Assessments: Helping You Make Better Career Decisions; Employer’s Guide to Good Practices).
- PsycTESTS: database of thousands of measures and instruments that were developed by researchers but are not commercially available; downloadable with psychometric data.
- PsycINFO and HaPI: bibliographic databases indexing tests and measures and unpublished instruments.
- Unpublished tests resources:
- Directory of Unpublished Experimental Mental Measures (Goldman & Mitchell)
- Measures for Psychological Assessment: A Guide to 3,000 Original Sources (Chun, Cobb & French)
- Tests in Microfiche (ETS): archive of out-of-date or unpublished tests
In the News Box 1.1: Combating Terrorism (illustrative ethical considerations)
- Highlights that intelligence and personality tests are two of many tests; current debates surround how tests could be used in forensics, law enforcement, and terrorism prevention.
- Example: a 2017 BBC News report on a study comparing terrorists and non-terrorists using moral cognition tasks; findings suggested terrorists may focus on outcomes rather than intentions, which could inform forensic psychology profiles.
- Emphasizes the need for careful interpretation, further validation, and consideration of variability across different terrorist groups and contexts.
History of psychological testing (high-level timeline highlights from the text)
- Ancient China (circa 2200 BCE): Xia Dynasty royal examinations; later dynasties (Tang, Ming) expanded examination use for civil service; some evidence suggests limited use of results, with aristocracy retaining many positions.
- 1791 France and 1833 Britain: early standardized examination systems influencing civil service selection (e.g., British examinations for the Indian civil service).
- 1860s-1883 US: Pendleton Civil Service Act (1883) established merit-based hiring via competitive exams.
- Charles Darwin (Origin of Species, 1859) and Sir Francis Galton: ideas about individual differences; Galton introduced “mental tests” and measured motor/sensory functioning.
- Late 19th century: Wilhelm Wundt establishes the first psychological laboratory; James McKeen Cattell and others expand research on individual differences; emphasis on formal testing to address social problems (remedial placement, battle readiness, hiring).
- Early 20th century: Théodore Simon and Alfred Binet develop intelligence testing for children (Binet-Simon Scale, 1905) to differentiate normal vs. intellectually subnormal performance; standardization sample of 50 children used as a frame of reference; introduced concept of mental age (MA) and chronological age (CA).
- 1916: Lewis Terman adapts Binet-Simon as Stanford-Binet; introduces Intelligence Quotient (IQ) index and broad age applicability.
- 1930s: Wechsler-Bellevue Intelligence Scale (WBIS, 1939) for adults; a second form followed in 1946, and the scale was revised as the WAIS in 1955, with later editions (WAIS-R in 1981, WAIS-III in 1997, WAIS-IV in 2008) reflecting evolving theory and clinical utility.
- 1930s-1940s: Rorschach and TAT projective tests emerge to explore unconscious processes; Woodworth’s PDS and Woodworth Psychoneurotic Inventory play early roles in personality assessment.
- 1940s-1950s: development of vocational tests (GATB in 1947) to predict job success; ethical standards for psychology (APA, 1953) established to protect rights of test takers.
- 21st century: rapid growth of testing in education and workplaces; No Child Left Behind Act (NCLB, 2001) pushed for nationwide standardized testing; subsequent policy changes with Every Student Succeeds Act (2015).
- Industry scale: massive growth in test publishing, standardized testing market; significant use of tests in recruitment, education, and certification.
Tests by subject and dimension (overview of common classifications)
- Subject categories (per Mental Measurements Yearbook): 18 major categories including Achievement, Behavior assessment, Developmental, Education, Intelligence and general aptitude, Personality, Neuropsychological, Vocations, etc.
- Dimensions often labeled as: achievement, aptitude, intelligence, personality, interests, etc.
- Example highlights:
- Achievement tests: measure prior learning in a specific domain (e.g., a German language test, psychology exam). Used to compare knowledge over time, assign grades, identify needs, and measure progress.
- Aptitude tests: assess potential for learning or performance in new tasks; used for predicting success in specific domains and guiding career decisions.
- Intelligence tests: broad cognitive ability measures; used to screen for placement in programs for gifted or intellectually challenged students; often used in educational/clinical settings.
- Interest inventories: guide career decisions and educational planning; not designed to predict success but to frame possibilities.
- Personality tests: measure characteristic patterns of thoughts and behavior; can be objective (e.g., MBTI, NEO-PI-3) or projective (e.g., TAT).
- NEO Personality Inventory (example): objective self-report measuring Five-Factor Model dimensions: Neuroticism, Extraversion, Openness, Agreeableness, Conscientiousness; sample items illustrate response options (SD, D, N, A, SA).
- Specifics about tests: Wechsler scales emphasize verbal, performance, and processing indices; TAT/Rorschach emphasize qualitative story/interpretation approaches.
Psychological assessment, tests, measurements, and surveys (terminology clarification)
- Psychological assessment: broader evaluative process using multiple methods (interviews, observations, tests) to understand and predict behavior. Tests are one component.
- Psychological test: a specific measurement instrument used to quantify or infer attributes; can be considered a type of measurement when results are scores.
- Measurement: broader process of assigning numerical values to attributes using rules; involves development of scoring systems and norms.
- Surveys: collect information about groups; typically report at the item/question level and focus on group outcomes rather than individual scores.
- Important distinctions to remember:
- Tests focus on individual differences and typically yield a single overall score.
- Surveys focus on group differences and are often reported as frequencies/percentages per response.
- Some surveys can function like tests if they produce scaled or composite scores.
Locating information about tests (practical approach)
- Start with TIP and MMY for descriptive entries and reviews.
- Use the Classified Subject Index and Index of Names to locate tests, authors, reviewers, and cross-references.
- On the Web: APA, Buros Center for Testing, ETS Test Collection, O*NET, PsycTESTS, PsycINFO, HaPI for unpublished measures.
Practical examples and resources mentioned in the text
- WAIS-IV, GRE General Test, SAT, GMAT, MBTI, GATB, Minnesota Multiphasic Personality Inventory-2 (MMPI-2).
- Unpublished and noncommercial tests available via PsycTESTS, HaPI, and the Directory of Unpublished Experimental Mental Measures.
- Common Core State Standards (2009-2014) as another example of accountability through testing in education.
In the News and For Your Information boxes (illustrative boxes that frame ethical and societal issues)
- In the News Box 1.1: Highlights ongoing debates about the use and interpretation of tests in high-stakes contexts (e.g., terrorism profiling, forensics).
- For Your Information Boxes (Examples):
- 1.1: Historical overview from ancient China to the 20th century, including development of Binet-Simon, Stanford-Binet, WBIS/WAIS, Rorschach, TAT, GATB, and ethics (APA, 1953).
- 1.2: Sample items from the NEO Personality Inventory (demonstrates Likert-type self-report scales across five dimensions).
- 1.3: Overview of resource books (TIP, MMY, Tests, Test Critiques, Personality Tests and Reviews, etc.) with purposes and contents described.
- 1.4: Locating unpublished psychological tests (print/nonprint resources).
Appendix/summary features to guide study and evaluation
- Chapter Summary emphasizes: psychological testing goes beyond intelligence and personality tests; tests are measurement tools requiring behavior; three defining characteristics; necessary assumptions; multiple classifications; resources for information; and the practical reach of tests in society.
- Key concepts list (bolded terms in text) includes: achievement tests, aptitude tests, behavior, inference, intelligence tests, norming, standardization, standardized tests, self-report tests, reliability, validity, etc.
- Critical thinking prompts encourage comparing two tests measuring the same construct, discussing test history, evaluating test quality (WAIS-IV), and exploring implications of poor testing.
Notable formulas and numerical references mentioned
- Important numerical example in the history of intelligence testing: Binet-Simon Scale originally used 30 items to measure mental ability.
- Standardization sample concept explained via MA/CA and mental age concepts (e.g., mental age compared to chronological age to interpret scores).
- IQ concept introduced with the historical development of the IQ index (Stanford-Binet) – commonly associated with ratio-based IQ in early scales, later transformed into standardized scoring systems (not explicitly shown in the excerpt).
- Conceptual formula (illustrative, standard in the field) for IQ: $IQ = \frac{MA}{CA} \times 100$, where MA = mental age and CA = chronological age. This reflects the ratio-based IQ index introduced with the Stanford-Binet, which builds on the mental age concept from the Binet-Simon Scale.
- Examples of test items and formats (from NEO sample items, WAIS-IV tasks, TAT prompts, etc.) illustrate the variety of content and scoring rules used in different test types.
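The ratio IQ described above (mental age divided by chronological age, multiplied by 100) can be worked through in a few lines. This is an illustrative sketch only: the function name `ratio_iq` and the example ages are hypothetical, and modern tests use standardized scoring rather than this ratio.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return mental_age / chronological_age * 100


# An 8-year-old performing like a typical 10-, 8-, or 6-year-old.
print(ratio_iq(10, 8))  # 125.0
print(ratio_iq(8, 8))   # 100.0
print(ratio_iq(6, 8))   # 75.0
```

A score of 100 therefore means mental age equals chronological age; values above or below 100 mean performance ahead of or behind age peers.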
Practical implications and takeaways
- The value of high-quality tests for making informed decisions in education, employment, and clinical settings.
- The risk of low-quality tests leading to poor decisions (e.g., hiring, placement, medical decisions).
- The importance of understanding test characteristics (representativeness, standardization, scoring) and assumptions to critique tests appropriately.
- The role of ethical standards and professional guidelines (as elaborated in Chapter 3) in defining the responsibilities of publishers, users, and test takers.
Quick reference checklist for evaluating a psychological test (from Table 1.1 and associated discussion)
- General descriptive information: title, author, publisher, administration time, cost, proprietary status.
- Purpose and nature: what it measures/predicts, behavior required, target population, test nature (maximal vs observation vs self-report), format.
- Practical evaluation: existence and quality of the test manual, instructions clarity, administrator qualifications, face validity.
- Technical evaluation: norm groups, types of norms, subgroup norms, evidence of reliability, validity, standard error of measurement, confidence intervals.
- Test reviews: strengths/weaknesses from peer reviews and published studies using the test.
- Summary: overall strengths and weaknesses and fit for intended use.
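The technical-evaluation items on the checklist mention the standard error of measurement and confidence intervals. The sketch below uses the formula commonly given in classical test theory, SEM = SD × sqrt(1 − reliability), which is an assumption here because the notes do not state it; the scale SD of 15, the reliability of .91, the observed score of 110, and the function names are likewise hypothetical.

```python
from statistics import NormalDist


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), the usual classical test theory form."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * (1 - reliability) ** 0.5


def confidence_interval(observed, sem, level=0.95):
    """Symmetric interval around an observed score: observed +/- z * SEM."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return observed - z * sem, observed + z * sem


sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = confidence_interval(observed=110, sem=sem)
print(f"SEM = {sem:.1f}")
print(f"95% CI = {low:.1f} to {high:.1f}")
```

A less reliable test yields a larger SEM and a wider interval, which is one practical reason the checklist asks for reliability evidence before a test is chosen.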
Summary takeaways for study and exam preparation
- Psychological tests are structured instruments designed to sample behavior under standardized conditions and score according to fixed rules to infer psychological attributes or predict outcomes.
- Tests differ in what they measure, how they are administered, how they are scored, and their psychometric quality; there is no single universal definition, but common core features exist.
- Understanding the differences among assessment, testing, measurement, and surveys is essential for correctly interpreting results and making informed decisions.
- Historical context (from Binet to WAIS to modern tests) anchors the understanding of how tests were developed, standardized, and used in society.
- A wide range of resources (TIP, MMY, online databases, and unpublished measures) exist to locate information about tests and aid in selecting appropriate measures for practice or research.
Note on format for exam-ready notes
- Use the three defining characteristics, assumptions, and classification methods as core framework for any essay or short-answer questions.
- Be ready to discuss ethical implications of testing, especially in high-stakes or sensitive contexts, drawing on In the News Box 1.1 and ethical standards alluded to for Chapter 3.
- For questions comparing two tests, focus on the test type (maximal vs self-report vs observation), standardization, scoring rules, and psychometric quality (reliability/validity evidence).
Terms to memorize (from Key Concepts section)
- achievement tests, aptitude tests, behavior, behavior observation tests, emotional intelligence, inference, intelligence tests, interest inventories, measurement, measurement instrument, nonstandardized tests, norms, objective tests, personality tests, projective tests, psychological assessments, psychological construct, psychological test, psychometrics, self-report tests, standardization sample, standardized tests, surveys, tests of maximal performance, vocational tests
Short glossary-style recall prompts (to study)
- What is the difference between a psychological test and a survey?
- What are the three defining characteristics of a psychological test?
- What are the main differences among maximal performance, behavior observation, and self-report tests?
- How do norms and standardization enhance the interpretation of test scores?
- What is the role of reliability and validity in judging a test’s quality?
End-of-chapter connections
- Chapter 1 lays the foundation for understanding tests, their history, and how to critique them; Chapter 2 expands on the importance and application of testing in society and addresses common controversies; Chapter 3 discusses ethical responsibilities of publishers, users, and takers, particularly in sensitive populations and professional practice.