Dynamic characteristic of the trait
A researcher develops a new test for "Creative Problem Solving." To establish its reliability, she administers the test to a group of college students on the first day of the semester and again on the last day. She finds a low correlation between the scores. Which of the following is the most likely threat to test-retest reliability in this scenario?
Parallel-Forms Reliability
A psychometrician is creating two versions of an achievement test to be used for pre-test and post-test evaluation. He ensures that both versions have the same number of items, the same format, and cover the same content. He also confirms that the mean and variance of scores for both forms are statistically equal. What type of reliability is he trying to establish?
Cronbach’s Alpha (α)
Dr. Santos develops a new 10-item scale to measure anxiety. The items are all rated on a 5-point Likert scale. To assess its internal consistency, she should use which of the following statistical tools?
The test may be measuring only a single, narrow factor
A test developer finds that her new test has a very high internal consistency (α = .95) but it is designed to measure a multifaceted construct like "Job Readiness," which includes skills, personality, and interests. What is a potential issue with this high alpha?
57-73
A clinician administers a depression inventory to a client. The client scores 65, and the test manual reports a Standard Error of Measurement (SEM) of 4. The clinician wants to be 95% confident about the range containing the client's true score. What is this range?
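The range in this card can be checked with a short calculation. This is a minimal sketch, assuming the usual normal-theory band of observed score ± z × SEM; the function name `true_score_interval` is made up for illustration.

```python
from statistics import NormalDist

def true_score_interval(observed, sem, confidence=0.95):
    """Return the (low, high) band around an observed score."""
    # Two-sided critical z value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sem
    return observed - margin, observed + margin

low, high = true_score_interval(65, 4)
print(round(low, 2), round(high, 2))  # 57.16 72.84
```

Rounded to whole points, this gives the 57-73 range on the card.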
Predictive Validity
A company uses a pre-employment test to select programmers. To validate the test, they correlate the test scores of newly hired programmers with their 6-month performance reviews. This is an example of what kind of validity?
Content Validity
A professor creates a final exam for a history course. He carefully maps out all the topics covered during the semester and ensures the exam questions are proportionally distributed according to the time spent on each topic. What type of validity is he prioritizing?
Convergent validity and Discriminant validity
A researcher is developing a new scale for "Emotional Intelligence." She finds that scores on her scale are highly correlated with scores on a well-established "Empathy" scale, but have a very low correlation with an "IQ" test. This pattern provides evidence for:
Very beneficial
A test has a validity coefficient of .38 for predicting success in a sales job. According to the provided interpretation guidelines, how useful is this test?
National anchor norms
A school district wants to compare the performance of its students on two different standardized reading tests (Test A and Test B). To do this, they use an equivalency table that links scores on Test A to corresponding scores on Test B. This table was likely created using:
The student's score was higher than or equal to 85% of the students in the normative sample.
A student scores in the 85th percentile on a national achievement test. What does this mean?
A negatively skewed distribution
A test developer creates a test with items that are all very easy. What is the likely outcome of the distribution of scores?
-1.67
A psychologist is assessing a patient for a suspected cognitive disorder. The patient's score on a memory test is 75. The test has a mean of 100 and a standard deviation of 15. What is the patient's Z-score?
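The Z-score in this card follows from z = (raw score - mean) / standard deviation. A small sketch of that arithmetic (the helper name `z_score` is only illustrative):

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

z = z_score(75, 100, 15)
print(round(z, 2))  # -1.67
```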
Good
A test has a reliability coefficient of 0.84. What is the interpretation of this value?
Specificity
A university wants to use an entrance exam to predict which students will graduate with honors. They find that the test correctly identifies 80% of the students who do graduate with honors but also incorrectly identifies 30% of students who do not graduate with honors as being likely to do so. The 30% figure represents an issue with the test's:
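To see why the 30% figure points to specificity, the percentages in this card can be turned into a 2x2 table. The sketch below assumes a made-up sample of 100 honors graduates and 100 non-honors graduates; only the 80% and 30% rates come from the card.

```python
# Hypothetical counts: 100 students who graduated with honors,
# 100 who did not. Only the 80% and 30% rates come from the card.
true_positives = 80    # honors graduates the test flagged
false_negatives = 20   # honors graduates the test missed
false_positives = 30   # non-honors graduates the test flagged
true_negatives = 70    # non-honors graduates the test cleared

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)
false_positive_rate = 1 - specificity

print(sensitivity, specificity, round(false_positive_rate, 2))  # 0.8 0.7 0.3
```

The 80% hit rate is the test's sensitivity; the 30% false-positive rate is what remains after a specificity of 70%.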
To undergo further training to ensure they are applying a coding system in the same way.
Two psychologists observe a child's behavior on the playground and rate the frequency of aggressive acts. They use a well-defined coding system, but their ratings only have a correlation of .55. To improve interrater reliability, their best course of action would be:
Classical Test Theory
A test developer wants to create a very short screening tool for depression. She knows that by having fewer items, she is likely to decrease the test's reliability. This is a core concept of which theory?
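Within Classical Test Theory, the Spearman-Brown prophecy formula estimates how reliability changes with test length. The sketch below is illustrative only: the starting reliability of .90 and the decision to halve the test are assumptions, not values from the card.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

original = 0.90                               # hypothetical full-length reliability
shortened = spearman_brown(original, 0.5)     # test cut to half its length
print(round(shortened, 2))  # 0.82
```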
Reliable, but not valid.
A test is found to be highly reliable (r = .92) but it does not correlate with any real-world outcome it is supposed to predict. This test can be described as:
Cut score
A school uses a reading test to place students into remedial, regular, or advanced classes. The score used to separate students into these groups is called a:
Random error
A researcher is concerned that scores on her new personality test are being influenced by the test-takers' mood on the day of the test. This is an example of what type of error?
Test bias
A new test of mechanical aptitude shows a high correlation with job performance for male mechanics but a low correlation for female mechanics. This is an example of:
Two standard deviations above the mean.
A psychologist uses a test that yields a T-score. A client receives a T-score of 70. This score is:
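T-scores have a mean of 50 and a standard deviation of 10, so the answer can be confirmed by converting back to a z-score. A brief sketch (the function name is illustrative):

```python
def t_to_z(t_score, t_mean=50, t_sd=10):
    """Convert a T-score to a z-score using the T-scale's mean and SD."""
    return (t_score - t_mean) / t_sd

z_from_t = t_to_z(70)
print(z_from_t)  # 2.0
```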
Discrimination
A test developer is using Item Response Theory (IRT) to analyze a test item. She is interested in how well the item differentiates between people with high and low levels of the trait. She is looking at the item's:
Stratified sampling
A psychologist wants to create a norm group for a new test for Filipino college students. She ensures that her sample includes students from Luzon, Visayas, and Mindanao, and from public and private universities, in the same proportions as they exist in the national population. This is an example of what sampling method?
It is used to make decisions with significant consequences for the test-taker.
A test is considered a "high-stakes test" when:
Experimental Design
A researcher wants to study the effect of a new therapy on anxiety levels. She randomly assigns participants to either a treatment group or a control group. What research design is she using?
Correlational Design
A school psychologist wants to see if there is a relationship between the number of hours students spend on social media and their GPA. She collects data from 200 students but does not manipulate any variables. This is an example of a:
Ordinal
A market researcher asks shoppers to rate their satisfaction with a new product as "Very Unsatisfied," "Unsatisfied," "Neutral," "Satisfied," or "Very Satisfied." What scale of measurement is being used?
Median
A clinical psychologist is reviewing a patient's daily mood ratings, which are recorded as a score from 1 to 100. Which measure of central tendency would be most appropriate to summarize the patient's typical mood if the data is heavily skewed due to a few extremely bad days?
One standard deviation above the mean
A set of test scores has a mean of 80 and a standard deviation of 6. A student scores 86. This score is:
There is a strong negative relationship between TV watching and fitness.
A researcher finds a correlation of r = -0.75 between time spent watching TV and physical fitness scores. How should this be interpreted?
One-Way ANOVA
A researcher wants to compare the average test scores of three different classrooms that used three different teaching methods. What is the most appropriate statistical test to use?
A Type I error
In hypothesis testing, a researcher concludes that a new drug is effective, but in reality, it is not. This is an example of:
Chi-Squared Test of Independence
A researcher wants to determine if there is a relationship between gender (Male/Female) and passing or failing a certification exam. What statistical test should be used?
Platykurtic
A distribution of scores that is relatively flat with thin tails and a low frequency of outliers is described as:
Point Biserial
A researcher is studying the relationship between a person's response to a single true/false personality test item and their annual income in pesos. What is the appropriate correlation coefficient to calculate?
It increases the risk of a Type II error.
If a researcher sets the significance level (alpha) at 0.01 instead of 0.05, what is the effect on hypothesis testing?
Levene’s Test
A test developer wants to check if the variances of test scores are equal across three different age groups before running an ANOVA. What test should she use?
Regression
A company wants to predict an applicant's future job performance score (on a scale of 1-100) based on their score from a pre-employment test. The statistical technique they should use is:
Strong positive correlation
A scatter plot shows data points forming a tight cluster from the bottom left to the top right. This indicates a:
Cross-Sectional Design
A researcher is studying the development of language skills in children. She tests a group of 3-year-olds, a group of 4-year-olds, and a group of 5-year-olds all at the same time. What research design is this?
Squaring the correlation coefficient (r).
The "coefficient of determination" is calculated by:
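As a quick worked case, the r = -0.75 from the TV-watching card in this set can be squared to get the proportion of shared variance. A minimal sketch:

```python
r = -0.75                  # correlation from the TV watching and fitness card
r_squared = r ** 2         # coefficient of determination
print(r_squared)           # 0.5625
```

About 56% of the variance in one variable is accounted for by the other; the sign of r drops out when it is squared.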
Mann-Whitney U Test
A researcher wants to compare the ranking of preferred leadership styles between a group of managers and a group of non-managerial employees. The data is ordinal. What non-parametric test is appropriate?
68%
A test has a normal distribution with a mean of 50 and a standard deviation of 10. Approximately what percentage of scores will fall between 40 and 60?
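The 68% figure can be checked against the normal curve directly. This sketch uses Python's standard-library `statistics.NormalDist` with the mean and standard deviation given in the card:

```python
from statistics import NormalDist

scores = NormalDist(mu=50, sigma=10)
# Proportion of the distribution lying between 40 and 60 (mean ± 1 SD).
proportion = scores.cdf(60) - scores.cdf(40)
print(round(proportion, 4))  # 0.6827
```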
Constant Sum
A researcher develops a survey where participants must allocate a total of 100 points among five different job characteristics (e.g., salary, work-life balance, career growth) based on their importance. This is an example of what type of scaling?
Test Conceptualization
A test developer is in the initial stage of creating a new test for "digital literacy." She is brainstorming what the test should measure, its objective, the target population, and the ideal format. This stage is known as:
Foils or distractors
A psychometrician is writing items for a multiple-choice test. For one item, the correct answer is 'B'. He writes options 'A', 'C', and 'D' to be plausible but incorrect alternatives. These incorrect options are called:
The item is flawed because the p-value is equal to the chance of guessing correctly.
During item analysis of a 4-option multiple-choice test, an item has a difficulty index (p) of 0.25. What does this suggest?
Negative item-discrimination index (d)
A test developer analyzes an item and finds that students who scored high on the overall test tended to get the item wrong, while students who scored low on the overall test tended to get it right. This item would have a:
Correction for guessing
To avoid the influence of guessing on a true/false test, a scoring rule is implemented where the final score is the number of correct answers minus the number of incorrect answers. This is an example of:
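The rule in this card is a special case of the common correction-for-guessing formula, corrected score = R - W / (k - 1), where k is the number of options per item. A sketch with made-up counts of 40 right and 10 wrong answers:

```python
def corrected_score(right, wrong, options):
    """Correction for guessing: R - W / (k - 1), with k options per item."""
    return right - wrong / (options - 1)

# Hypothetical examinee: 40 right, 10 wrong.
true_false = corrected_score(40, 10, options=2)        # reduces to R - W
multiple_choice = corrected_score(40, 10, options=4)
print(true_false, round(multiple_choice, 2))  # 30.0 36.67
```

With two options the divisor is 1, which is why the true/false rule is simply right minus wrong.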
Item pool
A test developer creates a large reservoir of questions that can be used to generate future versions of an exam. This collection of test questions is known as an:
Computerized adaptive testing (CAT)
A university uses a computerized entrance exam where the difficulty of the next question presented to a student depends on their answer to the previous question. This is an example of:
Likert Scale
A psychologist is creating an attitude scale where respondents must agree or disagree with statements like "I believe exercise is essential for a healthy lifestyle." The statements range from very positive to very negative. This is characteristic of a:
Floor effect
A test developer wants to ensure her test is not too difficult for the intended population. She is concerned about a potential:
Test tryout
After developing a new test, a psychometrician administers it to a sample of test-takers that is representative of the target population. This phase is called the:
Double-barreled item
A test item reads: "Do you agree that the university should not decrease funding for the library and student sports?" This item is problematic because it is a:
Differential Item Functioning (DIF)
During test revision, a developer finds that an item is answered correctly more often by male test-takers than by female test-takers, even when both groups have the same overall ability level on the construct being measured. This item is exhibiting:
Validity shrinkage
A test developer validates a test on a group of college students and finds a high validity coefficient. He then uses the test on a new, different group of students and finds the validity coefficient is slightly lower. This phenomenon is known as:
Q-Sort Technique
A psychologist asks a client to sort a deck of 100 cards with personality statements on them into nine piles, from "most like me" to "least like me," with a specified number of cards required for each pile. This scaling technique is called:
Good item
An item on a test has a discrimination index (d) of 0.35. According to the provided guidelines, this is considered a:
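The discrimination index d is usually computed as the proportion of the upper-scoring group answering the item correctly minus the proportion of the lower-scoring group doing so. The group sizes and counts below are hypothetical, chosen so that d comes out at the 0.35 in the card:

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """d = (U - L) / n, for upper and lower groups of equal size n."""
    return (upper_correct - lower_correct) / group_size

# Hypothetical item: 20 examinees per group, 15 correct in the upper
# group and 8 correct in the lower group.
d = discrimination_index(15, 8, 20)
print(d)  # 0.35
```

A negative d would mean the lower group outperformed the upper group on the item.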
One group known to have mastered the skill and another group known not to have mastered it.
A test developer is creating a test for a criterion-referenced purpose (e.g., passing a licensing exam). During the pilot test, she should administer the test to:
Scaling
The process of setting rules for assigning numbers in measurement is called:
Constructed-response format
A test question that requires the test-taker to write a few sentences in response to a prompt is an example of a(n):
Item characteristic curve
A test developer plots a graph for a single item, showing the probability of a correct response on the y-axis and the test-taker's overall ability level on the x-axis. This graph is called an:
Test Revision
The final stage of the test development process, where a test's content and format are modified to improve its effectiveness, is:
Legal Context
A lawyer requests a psychological evaluation for her client to determine if he is mentally capable of understanding the legal proceedings against him. This assessment is taking place in which context?
Retrospective Assessment
A psychologist is asked to evaluate a historical figure's state of mind at the time of a major event, using diaries, letters, and historical accounts. This is an example of:
Criterion variance and information variance.
A primary advantage of using a structured clinical interview over an unstructured one is that it reduces:
Behavioral observation using the SORC model
A school psychologist observes a child in the classroom to understand the triggers and consequences of his disruptive behavior. She notes what happens immediately before the behavior and immediately after. This approach is best described as:
It can be subjective to evaluate and lacks standardization.
A major limitation of using portfolio assessment for evaluating job applicants is:
Interpretative report
A company uses a computer program to score and generate a detailed report on a candidate's personality test, including narrative statements about their likely work style. This is an example of a(n):
The assumption of error in the assessment process
The assumption that a person's score on a test is composed of their true ability plus some random influence is a core tenet of:
Confirmatory function
A psychologist is using a test battery to assess a client with a complex referral question. She uses a test of cognitive ability, a personality inventory, and a clinical interview. She finds that the results from all three tools point towards a diagnosis of major depressive disorder. In this case, the different tools are serving a(n):
It increases efficiency in scoring and data management.
A primary advantage of computer-assisted psychological assessment (CAPA) over traditional paper-and-pencil testing is:
The actuarial or mechanical approach
An approach to assessment that relies on statistical rules and probabilities, often using computer algorithms to generate findings, is known as:
WAIS-IV
A psychologist needs to conduct a comprehensive assessment of cognitive abilities for a 45-year-old man suspected of having early-onset dementia. Which of the following tests would be most appropriate?
Raven's Progressive Matrices (RPM)
A school psychologist wants to assess a 7-year-old child who is nonverbal and has recently immigrated to the Philippines. She wants a measure of general intelligence that minimizes the influence of language and culture. Which test is a good choice?
Mechanics
An HR manager is selecting candidates for a mechanical engineering position. She wants to assess their ability to understand basic mechanical principles of machinery and tools. Which subtest from the Flanagan Industrial Tests would be most relevant?
NEO Personality Inventory-3 (NEO-PI-3)
A clinician is assessing a client and wants to get a broad overview of their personality based on the "Big Five" model. Which of the following tests is based on this model?
F (Infrequency) Scale
A psychologist is assessing a client who may be exaggerating her psychological problems to get disability benefits. On the MMPI-2, which validity scale would be most helpful in detecting this "faking bad" response style?
The influence of social desirability bias.
The Edwards Personal Preference Schedule (EPPS) uses a forced-choice format where test-takers must choose between two statements of equal social desirability. This is done to minimize:
Panukat ng Ugali at Pagkatao (PUP)
A Filipino psychologist wants to use a personality inventory that was developed locally and measures Filipino-oriented traits and behaviors. Which of the following would be an appropriate choice?
Social (S)
A career counselor is helping a high school student explore potential careers based on his interests. The student enjoys helping people, teaching, and working in groups. According to Holland's RIASEC model, this student would likely score highest on which theme?
Routing test
A psychologist is administering the Stanford-Binet (SB-5). She starts with the Vocabulary subtest and, based on the examinee's performance, determines which level of items to proceed with. The Vocabulary subtest is serving as a(n):
Block Design
The subtest on the WAIS-IV that requires an individual to copy two-dimensional designs using three-dimensional blocks, measuring nonverbal reasoning and visuo-spatial processing, is called:
Borderline Features (BOR)
A psychologist wants to assess a client for borderline personality features, such as emotional lability, impulsivity, and relationship problems. Which scale on the Personality Assessment Inventory (PAI) would be most relevant?
Kinetic Family Drawing Test (KFD)
A child is asked to draw a picture of his family where everyone is doing something. This is the standard instruction for the:
Thematic Apperception Test (TAT)
A therapist uses a projective test where a client is shown a series of ambiguous pictures of people and is asked to tell a story about each picture. The therapist is likely using the:
High Anxiety vs. Low Anxiety
The Myers-Briggs Type Indicator (MBTI) assesses personality on four dichotomies. Which of the following is NOT one of those dichotomies?
Achievement test
A test that measures what a person has learned in a specific academic subject, like math or history, is classified as a(n):
Aptitude test
The Differential Aptitude Tests (DAT) are designed to measure a student's potential for learning in various areas to aid in educational and vocational guidance. This makes it primarily a(n):
MCMI-III
A psychologist is assessing a client for psychopathology and wants a test specifically designed for clinical populations that is grounded in an evolutionary theory of personality. Which test would be most appropriate?
Standard Progressive Matrices (SPM)
A test user is looking for a brief, nonverbal measure of abstract reasoning for an adult of average intelligence. Which version of the Raven's Progressive Matrices would be most suitable?
Have a substantial understanding of testing and a highly specialized educational background, such as that of a Registered Psychologist.
A test is described as requiring a "Level C" qualification for its users. This means the test user must:
Sentence Completion Test
A client is asked to complete the phrase "My greatest fear is ______." This is an item from what type of test?
Building rapport
Before administering a standardized test, the examiner engages in some brief, friendly conversation with the test-taker to make them feel more comfortable. This is part of:
Accommodation
An examiner is administering a test to a child with a visual impairment. She provides a large-print version of the test booklet. This is an example of a(n):
Flynn Effect
The observation that average scores on IQ tests have been steadily increasing over the decades is known as the:
Back translation
A test developer translates a personality inventory from English to Filipino. To ensure the translation is accurate, she has another bilingual expert translate the Filipino version back into English, and then compares it to the original. This process is called:
Drift
During a behavioral observation, a supervisor notices that two of his raters, who were initially very accurate, have slowly started to define "disruptive behavior" in their own unique ways, leading to inconsistent ratings. This issue is known as: