A test that measures what the test taker has learned is classified as a test of:
-aptitude
-achievement
-affect
-performance
achievement
Which of the following is most likely to increase reliability?
-a homogeneous population
-test items that are at a maximum level of difficulty
-test items that are very easy
-increasing the number of items from 20 to 50
increasing the number of items from 20 to 50
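The effect of lengthening a test can be estimated with the Spearman-Brown prophecy formula. The sketch below is illustrative only: the starting reliability of .70 for the 20-item test is a hypothetical value, not something given in the question.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after multiplying test length by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical reliability of the original 20-item test (assumption, not from the card).
r_20_items = 0.70

# Going from 20 to 50 items lengthens the test by a factor of 2.5.
r_50_items = spearman_brown(r_20_items, 50 / 20)

print(f"20 items: {r_20_items:.2f}  ->  50 items: {r_50_items:.2f}")
```

Under these assumptions the predicted reliability rises, which is why adding comparable items is the best answer.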
Which of the following is a drawback in using percentiles?
-they are not easy to interpret
-they are rarely used
-they are ordinally ranked
-there is no distortion of the underlying measured scale
they are ordinally ranked
According to your text, which of the following item formats is most desirable for group-administered tests of intellect and achievement?
-essay
-multiple choice
-oral
-true/false
multiple choice
A college senior took the GRE and scored in the 84th percentile. Assuming a normal distribution, the student's GRE score is:
-60
-115
-600
-750
600
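A quick check of the arithmetic, assuming the older GRE scale with a mean of 500 and a standard deviation of 100 (the card implies this scale but does not state it): the 84th percentile sits about one standard deviation above the mean.

```python
from statistics import NormalDist

# Assumed scale parameters (not stated on the card).
GRE_MEAN, GRE_SD = 500, 100

# z-score that cuts off the lowest 84% of a standard normal distribution.
z = NormalDist().inv_cdf(0.84)

# Convert the z-score to the assumed GRE scale.
gre_score = GRE_MEAN + GRE_SD * z

print(f"z = {z:.2f}, GRE score = {gre_score:.0f}")
```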
A college entrance exam that is reasonably accurate in predicting subsequent grade point average of examinees possesses:
-face validity
-concurrent validity
-content validity
-criterion-related validity
criterion-related validity
After scoring high on a job performance test, Tom was hired with great expectations of success. However, Tom was terminated after two months due to low performance. According to decision theory, this outcome is called a:
-false negative
-false positive
-negative outcome
-positive error
false positive
Which of the following item formats is most likely to result in response sets?
-true-false
-multiple choice
-forced choice
-matching
true-false
Mary read only four chapter summaries in her introductory psychology book, attended only five class lectures, but scored 90% on the exam. What most likely has taken place?
-time-of-measurement error
-sampling-of-questions error
-scoring error
-mary is a great guesser
sampling-of-questions error
The Sonic Boom Aerospace Company routinely uses aptitude tests to screen for the best qualified potential employees. John Smith scored too low on his test to be hired, but he is the boss's nephew, and the supervisor is instructed to hire him anyway. Six months later, John is fired for incompetency. An evaluation of the validity of the test would conclude that the prediction was a:
-false positive
-false negative
-hit
-miss
hit
Which of the following is not used to measure internal consistency?
-coefficient alpha
-spearman-brown
-the kuder-richardson method
-test-retest reliability
test-retest reliability
According to item response theory, the standard error of measurement:
-is larger in the middle of the distribution.
-is larger in the extreme low end of the distribution, but not the extreme high end of the distribution.
-is larger in the extreme high end of the distribution, but not the extreme low end of the distribution.
-is larger in the extreme low and high ends of the distribution.
is larger in the extreme low and high ends of the distribution.
The margin of prediction error caused by the imperfect criterion-related validity of the test is indicated by the:
-standard error of measurement
-standard error of estimate
-standard deviation error of measurement
-standard deviation error of estimate
standard error of estimate
Most psychological characteristics are at best measured on:
-ordinal and ratio scales
-nominal and ordinal scales
-ordinal and interval scales
-nominal and ratio scales
ordinal and interval scales
If one wants to see the relation between test scores and expected outcome on a relevant task, one would refer to a(n):
-predictor table
-criterion chart
-expectancy table
-norming table
expectancy table
Which of the following is not an assumption of the classical test theory?
-mean error of measurement is 0
-true scores and errors are correlated
-errors on different tests are not correlated
-measurement errors are random
true scores and errors are correlated
The Army Alpha was:
-an individually administered nonverbal intelligence test
-a test of recent immigrants to determine if they had adequate intelligence to serve in the U.S. military during World War I
-an individually-administered intelligence test used to identify officer training candidates
-a group administered test for Army recruits during World War I
a group administered test for Army recruits during World War I
Which of the following is true as it relates to Classical Test Theory?
-measurement errors are unsystematic and unpredictable
-true scores and errors are correlated
-errors on different tests are correlated
-the mean of the distribution of measurement errors is equal to one
measurement errors are unsystematic and unpredictable
Bill's z-score on a classroom test is -1.5. About what percentage of the students in the class scored higher than Bill (assume a normal distribution)?
-60
-70
-80
-90
90
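The proportion scoring above a given z-score is the area to its right under the normal curve. A minimal sketch using the standard library:

```python
from statistics import NormalDist

bill_z = -1.5

# Area to the right of z = -1.5 under the standard normal curve, as a percentage.
percent_above = (1 - NormalDist().cdf(bill_z)) * 100

print(f"About {percent_above:.1f}% of students scored higher than Bill")
```

The exact figure is a little over 93%, so 90 is the closest of the listed options.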
What is the lowest value that could be calculated for a difficulty index?
-(-1.0)
-(-.30)
-(.00)
-(+.30)
.00
In order to determine whether a test designed to measure personality is valid, one would probably want to use which of the following types of validity?
-content
-criterion-related
-construct
-incremental
construct
When the reliability of a test is close to zero, this means that:
-the test can still be used in applied settings
-the test should not be used in applied settings
-measurement error is practically non-existent
-the test may still be valid
the test should not be used in applied settings
A data set contains the following: 5, 12, 60, 60, 63, 65, 67, 75, 78, 79, 86, 87, 89, 90, 91, 92, 94, 98. When graphed, the distribution can be described as:
-negatively skewed
-positively skewed
-normally distributed
-leptokurtic
negatively skewed
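One way to confirm the direction of skew is to compare the mean with the median: a few very low scores pull the mean below the median, which indicates negative skew. A short check on the data from the card:

```python
from statistics import mean, median

scores = [5, 12, 60, 60, 63, 65, 67, 75, 78, 79, 86, 87, 89, 90, 91, 92, 94, 98]

score_mean = mean(scores)      # pulled down by the two very low scores
score_median = median(scores)  # midpoint of the 9th and 10th ordered values

# Mean below median suggests a tail toward the low end (negative skew).
skew_direction = "negative" if score_mean < score_median else "positive or none"
print(f"mean = {score_mean:.1f}, median = {score_median}, skew: {skew_direction}")
```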
A group-administered test is given to a large group of individuals. The mean on the exam is 50 and the standard deviation is 6. The test-retest reliability is .89 and the standard error of measurement (SEM) is 2. One can be 95% confident that students with a score of 63 on the test have a true score between:
-51 and 75
-57 and 69
-59 and 67
-61 and 65
59 and 67
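The interval follows from the standard error of measurement, SEM = SD × √(1 − reliability), and a band of roughly two SEMs on either side of the observed score. The sketch below uses 1.96 as the 95% multiplier; the variable names are illustrative.

```python
import math

sd = 6
reliability = 0.89
observed_score = 63

# Standard error of measurement from the SD and the reliability coefficient.
sem = sd * math.sqrt(1 - reliability)

# 95% confidence band: observed score plus or minus 1.96 SEM.
margin = 1.96 * sem
lower, upper = observed_score - margin, observed_score + margin

print(f"SEM = {sem:.2f}, 95% interval = {lower:.1f} to {upper:.1f}")
```

Rounded to whole points, this reproduces the 59 to 67 interval.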
About what percentage of test takers can be expected to achieve a Verbal SAT score above 400?
-64
-75
-80
-84
84
The range of values for a discrimination index (D) is _________________, although we never want an item to have an index__________________.
-(-1.0 - +1.0; less than 0.0)
-(-1.0 - +1.0; greater than 0.0)
-(0.00 - 1.00; less than .05)
-(0.00 - 1.00; greater than .05)
-1.0 - +1.0; less than 0.0
Comptech is developing a test to predict the performance of newly hired employees. The scores on a 40-item paper-and-pencil test taken before they are hired will be correlated with scores on a computer-based test of job performance taken after 6 months of employment. Some of the questions on the paper-and-pencil test are very similar to those on the computer-based test, differing only in terms of slight changes in wording. Of the factors known to affect criterion related validity, which of the following is described in this example?
-insufficient training
-faulty base-rate averages
-criterion contamination
-predictor bias
criterion contamination
Jennifer is taking a test to determine her political orientation. The test requires her to circle a number between one (strongly disagree) and five (strongly agree). This scaling method is known as:
-likert scales
-method of equal-appearing intervals
-expert rankings
-guttman scales
Likert scales
Which of the following are true of licensing and certification tests?
-used for self-knowledge and are norm-referenced
-used for classification and are criterion-referenced
-used for program evaluation and are criterion-referenced
-used for diagnosis and are norm-referenced
used for classification and are criterion-referenced
Ninety-five percent of the scores in a normal distribution fall within approximately two standard deviations of the mean. Ninety-five percent of the scores in a non-normal distribution fall within approximately:
-one standard deviation from the mean
-two standard deviations from the mean
-three standard deviations from the mean
-cannot be determined from information given
cannot be determined from information given
Which of the following statistics is relatively unaffected by highly skewed data?
-mean
-standard deviation
-variance
-median
median
Which of the following guidelines is correct when creating a good test question?
-popular misconceptions or statements are not good distractors because test takers will easily identify and dismiss familiar items as incorrect
-place as little of the item in the stem as possible
-all answer options should be about the same length
-use "All of the Above" as an answer choice in as many items as is feasible
all answer options should be about the same length
Discriminant validity can be assessed in order to establish:
-content validity
-construct validity
-criterion-related validity
-convergent validity
construct validity
Tim scored 130 on an intelligence test. Tim is in the upper _______ percent of the distribution of IQ scores (assume normal distribution)?
-1%
-2.5%
-5%
-7.5%
2.5%
What is the relationship between reliability and validity?
-they are both affected by systematic and unsystematic errors
-both can be described with a single statistic
-unreliable tests will not be found to be valid
-validity helps to determine the reliability of a test
unreliable tests will not be found to be valid
An economics professor computes the correlation between 2 variables. Variable 1 is the Medal won by an athlete in the Summer or Winter Olympics (gold, silver, bronze), and Variable 2 is amount of money earned (in U.S. dollars) by the athlete in commercial endorsements in the two years following the Olympics in which the athlete competed. Which correlation statistic should the professor use?
-pearson
-spearman
-biserial
-point biserial
spearman
When making a decision about an individual a reliability coefficient of ____ or higher is needed.
-.75
-.90
-1.00
-there is no exact value
.90
An item analysis of an item on a classroom achievement test revealed a difficulty index (p) of .2 and a discrimination index (D) of -.60. What should be done with this test item?
-it's a good item, keep it
-it's too easy, discard it
-it's a terrible item that should be eliminated
-it borders on being ok, revise it
it's a terrible item that should be eliminated
If a raw score is at the 2nd percentile (assume normal distribution), which of the following are corresponding standard(ized) scores (approximately) in this order: z, T, IQ?
-(1.00, 50, 115)
-(-1.00, 40, 85)
-(-2.00, 30, 70)
-(2.00, 70, 130)
-2.00, 30, 70
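Standard scores are linear transformations of z. The sketch below assumes the conventional scales (T: mean 50, SD 10; deviation IQ: mean 100, SD 15) and shows that z = -2 corresponds to roughly the 2nd percentile.

```python
from statistics import NormalDist


def from_z(z: float) -> dict:
    """Convert a z-score to T, deviation IQ, and percentile rank (normal curve)."""
    return {
        "T": 50 + 10 * z,                       # T scale: mean 50, SD 10
        "IQ": 100 + 15 * z,                     # IQ scale: mean 100, SD 15
        "percentile": NormalDist().cdf(z) * 100,
    }


result = from_z(-2.0)
print(result)
```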
More variability within the group being tested is one of the factors that affects criterion related validity. In comparison to homogeneous groups, heterogeneous groups:
-produce lower validity estimates
-produce higher validity estimates
-produce equivalent validity estimates
-have no effect on validity
produce higher validity estimates
On exams, ceiling effects occur when:
-half of the test takers score near the bottom of the scale
-the complexity of the test questions is too difficult for the test takers who then guess on most questions
-a significant number of test takers score perfect or near perfect on the exam
-an inexperienced examiner interprets the test scores as elevated and reports false results
a significant number of test takers score perfect or near perfect on the exam
Sampling a restricted range of subjects will cause criterion-related validity to be:
-lower
-higher
-1.00
-0.00
lower
A sample of college students is selected to participate in a research study. Before selecting the sample, the researchers determined what percentage of the student population are in their first year, second year, third year, fourth year or fifth year and beyond. Next, the researchers randomly select from the student population, making sure that the percentage of each group in the sample matches the percentage found in the student population. The researchers are using:
-simple random sampling
-stratified random sampling
-random cluster sampling
-random group sampling
stratified random sampling
When evaluating items for a classroom test, internal consistency can be assessed by comparing items to:
-the total test score
-an external criterion
-the item validity index
-the item difficulty index
the total test score
A test that produces a significant floor effect:
-has too many high scores
-lacks face validity
-will also have a significant ceiling effect
-may need to have some easier questions added to the test
may need to have some easier questions added to the test
Split-half reliability is:
-the correlation of two forms of the same test given in a single administration
-the correlation of scores on a test between two independent scorers in a single administration
-the correlation of scores on a given test given between two administrations
-the correlation of two halves of a test given in a single administration
the correlation of two halves of a test given in a single administration
Which of the following is a disadvantage of essay tests?
-essay tests take more time to prepare
-essay tests are harder for examinees to bluff
-essay tests cover more material than multiple choice tests
-scoring of an essay test is relatively subjective
scoring of an essay test is relatively subjective
Which of the following is a norm-referenced test?
-IQ test
-psychologist's licensing exam
-driver's license test
-a typing test
IQ test
Which sampling technique allows every person in the population to have an equally likely chance of being chosen for the sample?
-stratified random sampling
-standardization sampling
-cluster sampling
-simple random sampling
simple random sampling
The "brass instruments era" in testing refers to the:
-time period when tests of musical ability were first developed
-first time in history standardized tests were used
-development of precise measure of intelligence during the 20th century
-use of instruments to measure sensory threshold and reaction time objectively
use of instruments to measure sensory threshold and reaction time objectively
If the predictive validity coefficient for a job selection test is .65, which of the following statements is reasonable?
-there is insufficient validity to use the test to select employees
-the standard error of measurement will be too large to use this test
-the majority of the variance in job performance is accounted for or can be explained by knowing a person's score on the test
-the test will likely be a useful predictor
the test will likely be a useful predictor
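Squaring the validity coefficient gives the proportion of criterion variance accounted for, which shows why the "majority of the variance" option is wrong even though the test is useful:

```python
validity_coefficient = 0.65

# Coefficient of determination: share of job-performance variance explained.
variance_explained = validity_coefficient ** 2

print(f"r = {validity_coefficient}, r^2 = {variance_explained:.4f}")
print("Majority of variance explained?", variance_explained > 0.5)
```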
What is the item discrimination index?
-it is a statistical index of how efficiently an item discriminates between persons scoring high and scoring low on the exam
-it is a useful tool for identifying items that should be revised or thrown out due to the total number of test takers who answered the item correctly
-it is the product of a test item's internal consistency as indexed by the correlation with the total score and its variability as indexed by the standard deviation
-it consists of the product of a test item's standard deviation and the point-biserial correlation coefficient with the criterion
it is a statistical index of how efficiently an item discriminates between persons scoring high and scoring low on the exam
An item analysis of an item (with 4 multiple choice options—a, b, c and d) on a classroom achievement test taken by 90 students reveals a Lower Bound of .33. The U value is computed and is found to be 24. Also, 11 students in the upper group answered the question correctly, while 5 in the lower group answered the question correctly. Of the 90 test takers, 28 answered the question correctly. Based on this information, how would you evaluate the item?
-it is moderately difficult and a good discriminator, keep it as is
-it is moderately difficult and a marginal discriminator, minor revision is needed
-it is a difficult item and a good discriminator, keep it as is
-it is a difficult item and a marginal discriminator, minor revision is needed
It is a difficult item and a marginal discriminator, minor revision is needed.
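The two indexes behind this answer can be computed directly. This sketch assumes that U is the number of students in each of the upper and lower scoring groups (the card gives only "U = 24") and that the lower bound of .33 is the lower end of the acceptable difficulty range.

```python
total_test_takers = 90
total_correct = 28
group_size_u = 24          # assumed: size of each of the upper and lower groups
upper_correct = 11
lower_correct = 5
difficulty_lower_bound = 0.33

# Difficulty index p: proportion of all test takers answering correctly.
p = total_correct / total_test_takers

# Discrimination index D: difference in correct answers between groups, per group member.
d = (upper_correct - lower_correct) / group_size_u

print(f"p = {p:.2f} (below lower bound: {p < difficulty_lower_bound})")
print(f"D = {d:.2f}")
```

A p just under the lower bound marks the item as difficult, and a positive but modest D is consistent with calling it a marginal discriminator.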
When theory predicts that consistent group differences will be found for a test (e.g. older children score higher on a test of self-sufficiency than younger children), we are looking to support:
-construct validity
-homogeneous group reliability
-research demonstrating validity shrinkage
-criterion-related concurrent validity
construct validity
Professor Cooper gives a test to her students; while scoring the exams, she accidentally gives 10 extra points to each student. The result of this error in scoring:
-produces systematic error and reliability is affected
-produces systematic error and reliability is not affected
-produces unsystematic error and reliability is affected
-produces unsystematic error and reliability is not affected
produces systematic error and reliability is not affected
Research results suggest that the relation between test anxiety and school achievement:
-is positively correlated
-is negatively correlated
-has no systematic relationship
-is unimportant because anxiety is assumed to be part of any test score and is therefore already accounted for
is negatively correlated
To construct a test with high item validity, one should choose items having:
-high inter-item correlations and high validity indexes
-high inter-item correlations and low validity indexes
-low to moderate inter-item correlations and high validity indexes
-low to moderate inter-item correlations and low validity indexes
low to moderate inter-item correlations and high validity indexes.
Jenny, a graduate student at CSUF, designed a scale to measure math competency in fifth graders. To establish reliability, she used the test-retest method. Which error affects this method?
-time of measurement
-item sampling
-subject sampling
-errors in test split
time of measurement
What type of test is most likely to use a standardized T score?
-personality tests
-scholastic admissions tests
-intelligence tests
-aptitude tests
personality tests
We can more accurately estimate a person's true score on a test when the:
-standard error of estimate is minimized
-standard error of the mean is minimized
-standard error of measurement is minimized
-standard deviation of test scores is minimized
standard error of measurement is minimized