**VALIDITY EXAM**
**Instructions:** Choose the best answer for each question.
1. In everyday language, when is something considered valid?
a) If it is popular or widely accepted.
b) If it is **sound, meaningful, or well grounded on principles or evidence**.
c) If it is new or innovative.
d) If it is easy to understand.
2. In the context of psychological assessment, validity is related to the:
a) Length of the test.
b) Difficulty of the test items.
c) **Meaningfulness of a test score**.
d) Format of the test.
3. Validity in psychological assessment specifically refers to:
a) How easy the test is to administer.
b) How quickly the test can be scored.
c) A judgment based on evidence about the **appropriateness of inferences drawn from test scores**.
d) The consistency of test scores over time.
4. The process of gathering and evaluating evidence about validity is called:
a) Reliability assessment.
b) **Validation**.
c) Standardization.
d) Norming.
5. Who is primarily responsible for supplying validity evidence in the test manual?
a) The test user.
b) The test administrator.
c) **The test developer**.
d) The educational institution.
6. When is it absolutely necessary for a test user to conduct local validation studies?
a) When the test is administered to a large group.
b) When the test is scored electronically.
c) When the test user plans to **alter the format, instructions, language, or content of the test**.
d) When the test has high face validity.
7. Which form of validity is considered the simplest and most superficial?
a) Content validity.
b) Criterion-related validity.
c) Construct validity.
d) **Face validity**.
8. Face validity primarily relates to:
a) What the test truly measures.
b) The statistical properties of the test.
c) **What a test appears to measure to the person being tested**.
d) The relationship between test scores and a criterion.
9. If a test appears to measure what it purports to measure "on the face of it," it is said to have:
a) High content validity.
b) High criterion-related validity.
c) High construct validity.
d) **High face validity**.
10. Which type of validity describes a judgment of how adequately a test samples behavior representative of the universe of behavior the test was designed to sample?
a) Face validity.
b) **Content validity**.
c) Criterion-related validity.
d) Construct validity.
11. What is another term for the content of a test when discussing content validity?
a) Criterion.
b) Construct.
c) **Test items**.
d) Face.
12. A detailed definition and characteristics of the construct being studied are typically found in a:
a) Test manual's reliability section.
b) **Test blueprint**.
c) Validation study report.
d) Normative data table.
13. The content validity of a test can be affected by:
a) The length of the test administration time.
b) The number of test-takers.
c) **Culture**.
d) The reliability coefficient.
14. Which type of validity is a judgment of how adequately a test score can be used to infer an individual’s probable standing on some measure of interest?
a) Content validity.
b) Face validity.
c) **Criterion-related validity**.
d) Construct validity.
15. The standard against which a test or a test score is evaluated is called the:
a) Predictor.
b) **Criterion**.
c) Construct.
d) Variable.
16. If test scores and criterion measures are obtained at about the same time, the validity evidence obtained is referred to as:
a) Predictive validity.
b) **Concurrent validity**.
c) Content validity.
d) Face validity.
17. Concurrent validity helps answer the question:
a) Does the test predict future outcomes?
b) **Does the test match current performance?**
c) Does the test adequately sample the content domain?
d) Does the test appear to measure what it intends to measure?
18. Predictive validity measures the relationship between test scores and a criterion measure obtained at a:
a) Previous time.
b) **Future time**.
c) Concurrent time.
d) Random time.
19. Predictive validity helps answer the question:
a) Does the test match current performance?
b) Does the test adequately sample the content domain?
c) Does the test appear to measure what it intends to measure?
d) **Does the test predict future outcomes?**
20. The extent to which a particular trait exists in the population is known as the:
a) Hit rate.
b) Miss rate.
c) **Base rate**.
d) Validity coefficient.
21. The proportion of people a test accurately identifies as possessing a particular trait is the:
a) Miss rate.
b) Base rate.
c) **Hit rate**.
d) False positive rate.
22. The proportion of people the test fails to identify as having or not having a particular characteristic is the:
a) Hit rate.
b) Base rate.
c) **Miss rate**.
d) True positive rate.
23. In the example provided, what is the base rate of depression in the population?
a) 95,000 out of 1,000,000.
b) 5,000 out of 1,000,000.
c) **100,000 out of 1,000,000**.
d) 95%.
24. In the example, what is the hit rate of the depression scale?
a) 5%.
b) 10%.
c) **95%**.
d) 100%.
25. In the example, what is the miss rate of the depression scale?
a) 95%.
b) 10%.
c) 0%.
d) **5%**.
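*Worked note for items 23–25* — a sketch of the arithmetic, assuming (as the figures and distractors in these items suggest) a population of 1,000,000 in which 100,000 people have depression and the scale correctly identifies 95,000 of them:
$$
\text{base rate} = \frac{100{,}000}{1{,}000{,}000} = 10\%, \qquad
\text{hit rate} = \frac{95{,}000}{100{,}000} = 95\%, \qquad
\text{miss rate} = 100\% - 95\% = 5\%
$$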
26. A false positive occurs when the test predicts the presence of a characteristic, but the test-taker:
a) Actually possesses the characteristic.
b) Has not been tested.
c) **Does not possess the characteristic**.
d) Has an unknown status.
27. Diagnosing someone with depression when they are actually normal is an example of a:
a) False negative.
b) True positive.
c) **False positive**.
d) True negative.
28. A false negative occurs when the test predicts the absence of a characteristic, but the test-taker:
a) Does not possess the characteristic.
b) Has not been tested.
c) Has an unknown status.
d) **Actually possesses the characteristic**.
29. Failing to diagnose someone who has depression is an example of a:
a) False positive.
b) True positive.
c) True negative.
d) **False negative**.
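*Illustration for items 26–29* — a minimal sketch of how the four classification outcomes are assigned; the function and example calls are hypothetical, not from the source:

```python
# Classify a test decision against the test-taker's actual status into one of
# the four outcomes referenced in items 26-29.
def outcome(test_positive: bool, truly_positive: bool) -> str:
    if test_positive and truly_positive:
        return "true positive (hit)"
    if test_positive and not truly_positive:
        return "false positive"   # e.g., diagnosing depression in a healthy person
    if not test_positive and truly_positive:
        return "false negative"   # e.g., failing to diagnose actual depression
    return "true negative (correct rejection)"

print(outcome(True, False))   # false positive
print(outcome(False, True))   # false negative
```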
30. The correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure is the:
a) Reliability coefficient.
b) Standard error of measurement.
c) **Validity coefficient**.
d) Coefficient of determination.
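*Illustration for item 30* — the validity coefficient is ordinarily a Pearson correlation between test scores and criterion scores; a minimal sketch with made-up data:

```python
import numpy as np

# Hypothetical data: five test-takers' test scores and criterion measures.
test_scores = np.array([10, 12, 15, 18, 20])
criterion = np.array([22, 25, 27, 33, 36])

# Pearson r between test and criterion serves as the validity coefficient.
validity_coefficient = np.corrcoef(test_scores, criterion)[0, 1]
print(round(validity_coefficient, 2))  # near 1.0 for this toy data
```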
31. The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use is called:
a) Concurrent validity.
b) Predictive validity.
c) **Incremental validity**.
d) Content validity.
32. If a personality test helps predict job success even better than an IQ test already in use, the personality test demonstrates:
a) Concurrent validity.
b) Predictive validity.
c) **Incremental validity**.
d) Content validity.
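*Illustration for items 31–32* — incremental validity can be quantified as the gain in explained criterion variance (ΔR²) when the new predictor is added to those already in use; a sketch with simulated data (all names and numbers are hypothetical):

```python
import numpy as np

def r_squared(X, y):
    """Proportion of criterion variance explained by an OLS fit of y on X."""
    X = np.column_stack([np.ones(len(y)), X])  # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1 - residuals.var() / y.var()

rng = np.random.default_rng(0)
iq = rng.normal(100, 15, 200)
personality = rng.normal(50, 10, 200)
job_success = 0.03 * iq + 0.05 * personality + rng.normal(0, 1, 200)

r2_iq = r_squared(iq[:, None], job_success)                           # IQ alone
r2_both = r_squared(np.column_stack([iq, personality]), job_success)  # IQ + personality
print(f"incremental validity (delta R^2): {r2_both - r2_iq:.3f}")
```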
33. Which type of validity is a judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct?
a) Criterion-related validity.
b) Face validity.
c) Content validity.
d) **Construct validity**.
34. A(n) ________ is an informed, hypothesized, scientific idea developed to describe or explain behavior.
a) Criterion.
b) **Construct**.
c) Variable.
d) Attribute.
35. How uniformly a test measures a single concept is referred to as evidence of:
a) Changes with age.
b) Pretest-posttest changes.
c) **Homogeneity**.
d) Distinct groups.
36. If scores on a test vary in a predictable way as a function of membership in some group, this provides evidence from:
a) Convergent evidence.
b) Discriminant evidence.
c) Pretest-posttest changes.
d) **Distinct groups**.
37. When scores on a test undergoing construct validation correlate highly with scores on established tests measuring the same construct, this is known as:
a) Discriminant evidence.
b) Evidence of homogeneity.
c) **Convergent evidence**.
d) Evidence of changes with age.
38. A validity coefficient showing little relationship between test scores and variables with which they should not theoretically be correlated provides:
a) Convergent evidence.
b) Evidence of homogeneity.
c) Evidence from distinct groups.
d) **Discriminant evidence**.
39. If a newly developed Spirituality scale does not correlate with an established religiosity scale, this is an example of:
a) Convergent evidence.
b) Evidence of homogeneity.
c) Evidence from distinct groups.
d) **Discriminant evidence**.
40. Factor analysis is a class of mathematical procedures designed to identify:
a) Reliability coefficients.
b) Standard errors of measurement.
c) **Factors or specific variables**.
d) Validity coefficients for criterion measures.
41. Estimating or extracting factors and deciding how many factors to retain are steps in:
a) Confirmatory factor analysis.
b) **Exploratory factor analysis**.
c) Discriminant analysis.
d) Regression analysis.
42. Researchers testing the degree to which a hypothetical model fits the actual data are using:
a) Exploratory factor analysis.
b) **Confirmatory factor analysis**.
c) Item response theory.
d) Classical test theory.
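*Illustration for items 40–42* — a minimal exploratory-factor-analysis sketch using scikit-learn's `FactorAnalysis` on simulated responses (confirmatory model-fit testing is usually done in dedicated SEM software and is not shown here; all data below are made up):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulate 300 test-takers answering 6 items driven by 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
true_loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                          [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
items = latent @ true_loadings.T + rng.normal(scale=0.3, size=(300, 6))

# Extract two factors; inspecting estimated loadings is one EFA step
# (deciding how many factors to retain would normally come first).
fa = FactorAnalysis(n_components=2).fit(items)
print(np.round(fa.components_.T, 2))  # rows = items, columns = factors
```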
43. A factor inherent in a test that systematically prevents accurate, impartial measurement is known as:
a) Test reliability.
b) **Test bias**.
c) Test standardization.
d) Test norming.
44. A judgment resulting from the intentional or unintentional misuse of a rating scale is a:
a) Standard error.
b) **Rating error**.
c) Validity coefficient.
d) Reliability coefficient.
45. The tendency of a rater to be lenient in scoring is known as:
a) Severity error.
b) Central tendency error.
c) Halo effect.
d) **Leniency error**.
46. The tendency of a rater to be overly harsh or critical in rating is called:
a) Leniency error.
b) Central tendency error.
c) Halo effect.
d) **Severity error**.
47. A rater exhibiting a reluctance to give ratings at either the positive or negative extreme is demonstrating:
a) Leniency error.
b) Severity error.
c) **Central tendency error**.
d) Halo effect.
48. The tendency to give a ratee a higher rating than they objectively deserve due to the rater’s overall impression is the:
a) Leniency error.
b) Severity error.
c) Central tendency error.
d) **Halo effect**.
49. The extent to which a test is used in an impartial, just, and equitable way is referred to as:
a) Test reliability.
b) Test validity.
c) **Test fairness**.
d) Test utility.
________________________________
**PSYCHOLOGICAL TESTING AND ASSESSMENT EXAM**
**Instructions:** Choose the best answer for each question. Correct answers are marked with an arrow (-->).
1. According to the text, a distinguishable, relatively enduring way in which one individual varies from another is a:
a) state
b) construct
c) trait
d) overt behavior
--> c) trait
2. A psychological characteristic that distinguishes one person from another but is relatively less enduring is a:
a) state
b) trait
c) construct
d) error variance
--> a) state
3. An informed, scientific concept developed to describe or explain behavior is a:
a) trait
b) state
c) construct
d) cumulative score
--> c) construct
4. Overt behavior, in the context of assessment, refers to:
a) internal thoughts and feelings
b) an observable action or the product of an observable action
c) enduring personality characteristics
d) temporary emotional states
--> b) an observable action or the product of an observable action
5. The assumption that psychological traits and states can be quantified and measured suggests that:
a) all psychological constructs are directly observable
b) measurement is always perfectly accurate
c) once defined, constructs can be assessed using appropriate methods
d) subjective experiences cannot be measured
--> c) once defined, constructs can be assessed using appropriate methods
6. Cumulative scoring, as described in the text, involves:
a) comparing an individual's score to a norm group
b) evaluating performance against a set standard
c) measuring a trait through a series of test items
d) assessing changes in a state over time
--> c) measuring a trait through a series of test items
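*Illustration for item 6* — cumulative scoring at its simplest: responses keyed to the trait are summed, and a higher total is taken to mean more of the trait; a sketch with hypothetical items:

```python
# Hypothetical 5-item scale keyed 0/1 (1 = trait-consistent response).
responses = [1, 0, 1, 1, 1]
total_score = sum(responses)  # cumulative score across the series of items
print(total_score)  # 4
```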
7. The assumption that test-related behavior predicts non-test-related behavior implies that:
a) test scores are the only indicator of future performance
b) individuals behave identically in test and real-world situations
c) responses on a test can offer insights into behavior outside the test context
d) all tests are perfect predictors of future behavior
--> c) responses on a test can offer insights into behavior outside the test context
8. According to the text, a competent test user should:
a) believe that the tests they use are flawless
b) only rely on test scores for decision-making
c) understand and appreciate the limitations of the tests they use
d) ignore data from other sources
--> c) understand and appreciate the limitations of the tests they use
9. Which of the following is given as an example of a limitation of self-report personality tests?
a) They are too objective
b) They always accurately reflect the test-taker's personality
c) They primarily explore the subjective perception of the test-taker
d) They do not allow for exploration of internal feelings
--> c) They primarily explore the subjective perception of the test-taker
10. In assessment, error variance refers to:
a) the consistency of test scores over time
b) the degree to which a test measures what it intends to measure
c) components of a test score attributable to sources other than the measured trait
d) systematic biases in test administration
--> c) components of a test score attributable to sources other than the measured trait
11. The standardization of test procedures aims to:
a) increase the cost of test administration
b) make tests more subjective
c) minimize unfairness and bias in assessment
d) eliminate all sources of error variance
--> c) minimize unfairness and bias in assessment
12. The text suggests that a world without tests would likely be a "nightmare" because:
a) everyone would get the same opportunities regardless of merit
b) tests are the only way to understand individual differences
c) there would be no way to ensure competent professionals in critical fields
d) people enjoy taking tests
--> c) there would be no way to ensure competent professionals in critical fields
13. A "good test," according to the source, includes:
a) ambiguous instructions to encourage critical thinking
b) high cost to ensure quality
c) clear instructions for administration, scoring, and interpretation
d) subjective scoring to allow for individual examiner judgment
--> c) clear instructions for administration, scoring, and interpretation
14. **Psychometric soundness** encompasses which two key characteristics of a good test?
a) Economy and clarity
b) Objectivity and subjectivity
c) Reliability and validity
d) Standardization and norming
--> c) Reliability and validity
15. **Reliability** of a test refers to its:
a) accuracy in measuring a specific construct
b) ability to predict future behavior
c) consistency in producing similar results under similar conditions
d) degree of standardization across administrations
--> c) consistency in producing similar results under similar conditions
16. **Validity** of a test refers to its:
a) consistency over multiple administrations
b) accuracy in measuring what it purports to measure
c) economy in terms of time and cost
d) clarity of instructions for test-takers
--> b) accuracy in measuring what it purports to measure
17. Standardized scores that help interpret an individual's test performance by comparing them to a larger group are called:
a) raw scores
b) validity coefficients
c) reliability estimates
d) norms
--> d) norms
18. A **normative sample** is:
a) the group of individuals currently taking a test
b) a panel of experts who review test content
c) a group of people whose performance on a test is analyzed for reference
d) the test developers themselves
--> c) a group of people whose performance on a test is analyzed for reference
19. The process of deriving norms for a test is called:
a) validation
b) reliability analysis
c) norming
d) standardization
--> c) norming
20. **User norms** are often provided by test developers and are based on:
a) formal sampling methods to ensure national representation
b) extensive longitudinal studies
c) descriptive statistics of a group of test-takers in a given time period
d) expert opinions on expected performance
--> c) descriptive statistics of a group of test-takers in a given time period
21. **Standardization** of a test involves:
a) only determining the reliability of the test
b) only establishing the validity of the test
c) administering the test to a representative sample to establish norms
d) translating the test into multiple languages
--> c) administering the test to a representative sample to establish norms
22. **Sampling** in test development refers to:
a) the process of scoring individual test responses
b) the statistical analysis of test item difficulty
c) selecting a portion of the population deemed representative of the whole
d) the method of administering the test (e.g., online vs. paper-and-pencil)
--> c) selecting a portion of the population deemed representative of the whole
23. Selecting subgroups within a defined population to be proportionately represented in a sample is called:
a) incidental sampling
b) purposive sampling
c) stratified sampling
d) simple random sampling
--> c) stratified sampling
24. In **stratified-random sampling**:
a) subgroups are selected based on the researcher's judgment
b) the most easily accessible members of the population are chosen
c) each member of a subgroup has the same chance of being included in the sample
d) subgroups are represented based on convenience
--> c) each member of a subgroup has the same chance of being included in the sample
25. Selecting a sample that the researcher believes to be representative of the population is:
a) stratified random sampling
b) incidental sampling
c) purposive sampling
d) simple random sampling
--> c) purposive sampling
26. Selecting a sample based on convenience is referred to as:
a) stratified sampling
b) purposive sampling
c) simple random sampling
d) incidental sampling
--> d) incidental sampling
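*Illustration for items 23–26* — a minimal stratified-random-sampling sketch (the population and strata below are hypothetical); each stratum is represented proportionately and every member of a stratum has an equal chance of selection:

```python
import random

# Hypothetical population grouped into strata (e.g., by community type).
population = {
    "urban": list(range(0, 600)),       # 60% of the population
    "suburban": list(range(600, 900)),  # 30%
    "rural": list(range(900, 1000)),    # 10%
}

def stratified_random_sample(strata, n):
    """Draw n cases with each stratum proportionately represented and each
    member of a stratum equally likely to be chosen."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        k = round(n * len(members) / total)
        sample.extend(random.sample(members, k))
    return sample

print(len(stratified_random_sample(population, 100)))  # 100 cases, 60/30/10 split
```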
27. Establishing a standard set of instructions and conditions for test administration ensures that:
a) all test-takers will perform equally well
b) the test becomes more valid
c) the scores of the normative sample are more comparable with future test-takers' scores
d) the test is easier to score
--> c) the scores of the normative sample are more comparable with future test-takers' scores
28. After collecting and analyzing test data from a normative sample, the test developer will:
a) discard any scores that deviate significantly from the mean
b) administer the test to an even larger sample
c) summarize the data using descriptive statistics
d) revise the test items based on subjective judgment
--> c) summarize the data using descriptive statistics
29. **Percentiles** indicate:
a) the average score of a particular age group
b) the grade level at which a certain score is typical
c) the percentage of people whose score falls below a particular raw score
d) how many items an individual answered correctly
--> c) the percentage of people whose score falls below a particular raw score
30. A percentile rank answers the question:
a) "How many people scored above this individual?"
b) "What was the average score on the test?"
c) "What percent of the scores fall below a particular score?"
d) "How consistent are the test scores?"
--> c) "What percent of the scores fall below a particular score?" \
31. **Age norms** indicate:
a) the typical score for students in a specific grade
b) how an individual's performance compares to a national sample
c) the average performance of test-takers at different age levels
d) the percentage of test-takers of a certain age who passed the test
--> c) the average performance of test-takers at different age levels
32. **Grade norms** are designed to indicate:
a) the age range for which a test is appropriate
b) how an individual student compares to their local peers
c) the average test performance of test-takers in a given school grade
d) the national average score for students of all ages
--> c) the average test performance of test-takers in a given school grade
33. **National norms** are derived from a normative sample that is:
a) collected within a specific geographic region
b) composed of individuals with a particular characteristic
c) nationally representative of the population at the time of the norming study
d) easily accessible to the test developers
--> c) nationally representative of the population at the time of the norming study
34. **Anchor norms** are used to:
a) establish local performance standards
b) segment a normative sample by specific criteria
c) compare an individual's test results to a national sample
d) track changes in performance over time
--> c) compare an individual's test results to a national sample
35. A normative sample segmented by criteria such as age or gender creates:
a) national norms
b) anchor norms
c) subgroup norms
d) user norms
--> c) subgroup norms
36. **Local norms** provide normative information:
a) based on a nationally representative sample
b) for specific age or grade levels across the country
c) with respect to the local population's performance on a test
d) that is easily generalizable to other regions
--> c) with respect to the local population's performance on a test
37. A **fixed reference group scoring system** uses:
a) criteria based on expert judgment
b) comparisons to current test-takers
c) the distribution of scores from one specific group of test-takers as the basis for future scoring
d) multiple cut-off scores to categorize performance
--> c) the distribution of scores from one specific group of test-takers as the basis for future scoring
38. In **norm-referenced evaluation**, the focus is on:
a) whether an individual has mastered a specific skill
b) whether a pre-defined standard has been met
c) how an individual performed relative to other people who took the test
d) the specific content knowledge an individual possesses
--> c) how an individual performed relative to other people who took the test
39. In **criterion-referenced testing**, the focus is on:
a) an individual's ranking within a group
b) comparing an individual's score to the average score
c) what the test-taker can or cannot do relative to a set standard
d) how consistently an individual performs on repeated testing
--> c) what the test-taker can or cannot do relative to a set standard
40. Criterion-referenced tests are frequently used to gauge:
a) personality traits
b) general intelligence
c) achievement or mastery
d) aptitude for future learning
--> c) achievement or mastery
41. A potential drawback of criterion-referenced testing is that:
a) it is difficult to determine if someone is truly proficient
b) it does not allow for comparison between individuals
c) the criteria are set by people who may have biases
d) it cannot be used to assess complex skills
--> c) the criteria are set by people who may have biases
42. **Culture** is a major factor in:
a) only the interpretation of test scores
b) only the administration of tests
c) only the scoring of tests
d) administration, scoring, and interpretation of tests
--> d) administration, scoring, and interpretation of tests
43. When selecting a test for use with a specific group, a responsible test user should:
a) assume that all tests are universally applicable
b) only consider the test's reliability and validity coefficients
c) research the test's available norms to determine their appropriateness for the target population
d) prioritize tests that are the least expensive to administer
--> c) research the test's available norms to determine their appropriateness for the target population
44. A "do" in culturally informed assessment is to:
a) assume a "one-size-fits-all" view of assessment
b) take for granted that a test impacts all groups similarly
c) be aware of the cultural assumptions on which a test is based
d) assume all cultural communities will deem particular tests appropriate
--> c) be aware of the cultural assumptions on which a test is based
45. A "don't" in culturally informed assessment is to:
a) strive to incorporate assessment methods that complement diverse worldviews
b) be knowledgeable about alternative measurement procedures
c) take for granted that a test is based on assumptions that impact all groups in much the same way
d) score and interpret data in its cultural context
--> c) take for granted that a test is based on assumptions that impact all groups in much the same way
46. When considering cultural equivalence issues in assessment, it is important to consider the equivalence of:
a) only the statistical properties of the test
b) only the test format and administration procedures
c) the language used and the constructs measured
d) only the test's reliability over different cultural groups
--> c) the language used and the constructs measured
47. When scoring, interpreting, and analyzing assessment data, it is crucial to consider:
a) only the individual test scores
b) ignoring any potential cultural influences
c) the cultural context and cultural hypotheses as possible explanations for findings
d) solely relying on national norms
--> c) the cultural context and cultural hypotheses as possible explanations for findings
48. True or False: A test with high reliability in one cultural group will automatically have high reliability in all cultural groups.
a) True, reliability is a universal property of a test.
b) False, reliability can be affected by cultural factors and test content relevance.
c) True, as long as the test is properly translated.
d) False, reliability is solely determined by the number of test items.
--> b) False, reliability can be affected by cultural factors and test content relevance.
49. True or False: Using national norms is always appropriate when assessing individuals from diverse cultural backgrounds.
a) True, national norms provide the best overall comparison group.
b) False, national norms may not accurately reflect the performance of specific cultural subgroups.
c) True, as long as the test has been standardized on a large national sample.
d) False, local norms are always better.
--> b) False, national norms may not accurately reflect the performance of specific cultural subgroups.
50. True or False: Test bias can be completely eliminated through careful test development and standardization procedures.
a) True, rigorous psychometric methods can ensure a completely unbiased test.
b) False, bias is inherent in all forms of assessment and cannot be fully eliminated.
c) False, while standardization minimizes bias, some cultural or subgroup differences may still exist.
d) True, if test developers follow ethical guidelines strictly.
--> c) False, while standardization minimizes bias, some cultural or subgroup differences may still exist.