Psych Assessment Drills from St. Anne

Last updated 2:59 PM on 6/2/26

130 Terms

Dynamic characteristic of the trait

A researcher develops a new test for "Creative Problem Solving." To establish its reliability, she administers the test to a group of college students on the first day of the semester and again on the last day. She finds a low correlation between the scores. Which of the following is the most likely threat to test-retest reliability in this scenario?

Parallel-Forms Reliability

A psychometrician is creating two versions of an achievement test to be used for pre-test and post-test evaluation. He ensures that both versions have the same number of items, the same format, and cover the same content. He also confirms that the mean and variance of scores for both forms are statistically equal. What type of reliability is he trying to establish?

Cronbach’s Alpha (α)

Dr. Santos develops a new 10-item scale to measure anxiety. The items are all rated on a 5-point Likert scale. To assess its internal consistency, she should use which of the following statistical tools?
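
A minimal sketch of how Cronbach's alpha could be computed for a scale like the one in this card, using the standard formula α = (k / (k − 1)) · (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the ratings are made up for illustration; they are not from the card.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """responses: list of rows, one row per respondent, one column per item."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose into per-item columns
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical ratings: five respondents, three items, 5-point Likert scale
ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Because the formula uses a ratio of variances, population and sample variance give the same alpha as long as one is used consistently.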

The test may be measuring only a single, narrow factor

A test developer finds that her new test has a very high internal consistency (α = .95) but it is designed to measure a multifaceted construct like "Job Readiness," which includes skills, personality, and interests. What is a potential issue with this high alpha?

57-73

A clinician administers a depression inventory to a client. The client scores 65, and the test manual reports a Standard Error of Measurement (SEM) of 4. The clinician wants to be 95% confident about the range containing the client's true score. What is this range?
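
The arithmetic behind this card can be sketched as follows, assuming the usual 95% band of observed score ± 1.96 × SEM. The function name `true_score_interval` is hypothetical.

```python
def true_score_interval(observed, sem, z=1.96):
    """Return (lower, upper) bounds of the band observed +/- z * SEM."""
    margin = z * sem
    return observed - margin, observed + margin

low, high = true_score_interval(observed=65, sem=4)
print(f"{low:.2f} to {high:.2f}")   # 57.16 to 72.84, which rounds to 57-73
```

Using the rule of thumb of ±2 SEM instead of ±1.96 SEM gives exactly 57 to 73.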

Predictive Validity

A company uses a pre-employment test to select programmers. To validate the test, they correlate the test scores of newly hired programmers with their 6-month performance reviews. This is an example of what kind of validity?

Content Validity

A professor creates a final exam for a history course. He carefully maps out all the topics covered during the semester and ensures the exam questions are proportionally distributed according to the time spent on each topic. What type of validity is he prioritizing?

Convergent validity and Discriminant validity

A researcher is developing a new scale for "Emotional Intelligence." She finds that scores on her scale are highly correlated with scores on a well-established "Empathy" scale, but have a very low correlation with an "IQ" test. This pattern provides evidence for:

Very beneficial

A test has a validity coefficient of .38 for predicting success in a sales job. According to the provided interpretation guidelines, how useful is this test?

National anchor norms

A school district wants to compare the performance of its students on two different standardized reading tests (Test A and Test B). To do this, they use an equivalency table that links scores on Test A to corresponding scores on Test B. This table was likely created using:

The student's score was higher than or equal to 85% of the students in the normative sample.

A student scores in the 85th percentile on a national achievement test. What does this mean?

A negatively skewed distribution

A test developer creates a test with items that are all very easy. What is the likely outcome of the distribution of scores?

-1.67

A psychologist is assessing a patient for a suspected cognitive disorder. The patient's score on a memory test is 75. The test has a mean of 100 and a standard deviation of 15. What is the patient's Z-score?
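
The Z-score in this card follows from z = (raw score − mean) / standard deviation. A short sketch, with `z_score` as an illustrative function name:

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

z = z_score(raw=75, mean=100, sd=15)
print(round(z, 2))   # -1.67
```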

Good

A test has a reliability coefficient of 0.84. What is the interpretation of this value?

Specificity

A university wants to use an entrance exam to predict which students will graduate with honors. They find that the test correctly identifies 80% of the students who do graduate with honors but also incorrectly identifies 30% of students who do not graduate with honors as being likely to do so. The 30% figure represents an issue with the test's:
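
One way to see why the 30% figure points to specificity is to lay the percentages out as counts. The sketch below assumes 100 students in each outcome group, which is an illustrative choice and not stated in the card.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual positives that the test flags correctly."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Share of actual negatives that the test clears correctly."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts: 100 honors graduates and 100 non-honors graduates
sens = sensitivity(true_pos=80, false_neg=20)   # 80% of honors graduates identified
spec = specificity(true_neg=70, false_pos=30)   # 30% false positives leave 70% specificity
print(sens, spec)
```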

To undergo further training to ensure they are applying the coding system in the same way.

Two psychologists observe a child's behavior on the playground and rate the frequency of aggressive acts. They use a well-defined coding system, but their ratings only have a correlation of .55. To improve interrater reliability, their best course of action would be:

Classical Test Theory

A test developer wants to create a very short screening tool for depression. She knows that by having fewer items, she is likely to decrease the test's reliability. This is a core concept of which theory?
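
The card does not name a formula, but the link between test length and reliability is usually quantified in Classical Test Theory with the Spearman-Brown prophecy formula. A minimal sketch under that assumption, with made-up numbers:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    n = length_factor
    return (n * reliability) / (1 + (n - 1) * reliability)

# Hypothetical: a 40-item scale with reliability .90 is cut to 10 items (factor 0.25)
shortened = spearman_brown(reliability=0.90, length_factor=0.25)
print(round(shortened, 2))
```

A length factor below 1 always yields a predicted reliability lower than the starting value, which is the trade-off the developer in the card is weighing.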

Reliable, but not valid.

A test is found to be highly reliable (r = .92) but it does not correlate with any real-world outcome it is supposed to predict. This test can be described as:

Cut score

A school uses a reading test to place students into remedial, regular, or advanced classes. The score used to separate students into these groups is called a:

Random error

A researcher is concerned that scores on her new personality test are being influenced by the test-takers' mood on the day of the test. This is an example of what type of error?

Test bias

A new test of mechanical aptitude shows a high correlation with job performance for male mechanics but a low correlation for female mechanics. This is an example of:

Two standard deviations above the mean.

A psychologist uses a test that yields a T-score. A client receives a T-score of 70. This score is:
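
T-scores are standardized to a mean of 50 and a standard deviation of 10, so a T-score converts to standard deviation units as sketched here. The function name is illustrative.

```python
def t_score_to_z(t, mean=50, sd=10):
    """Convert a T-score to standard deviation units from the mean."""
    return (t - mean) / sd

z_from_t = t_score_to_z(70)
print(z_from_t)   # 2.0
```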

Discrimination

A test developer is using Item Response Theory (IRT) to analyze a test item. She is interested in how well the item differentiates between people with high and low levels of the trait. She is looking at the item's:

Stratified sampling

A psychologist wants to create a norm group for a new test for Filipino college students. She ensures that her sample includes students from Luzon, Visayas, and Mindanao, and from public and private universities, in the same proportions as they exist in the national population. This is an example of what sampling method?

It is used to make decisions with significant consequences for the test-taker.

A test is considered a "high-stakes test" when:

Experimental Design

A researcher wants to study the effect of a new therapy on anxiety levels. She randomly assigns participants to either a treatment group or a control group. What research design is she using?

Correlational Design

A school psychologist wants to see if there is a relationship between the number of hours students spend on social media and their GPA. She collects data from 200 students but does not manipulate any variables. This is an example of a:

Ordinal

A market researcher asks shoppers to rate their satisfaction with a new product as "Very Unsatisfied," "Unsatisfied," "Neutral," "Satisfied," or "Very Satisfied." What scale of measurement is being used?

Median

A clinical psychologist is reviewing a patient's daily mood ratings, which are recorded as a score from 1 to 100. Which measure of central tendency would be most appropriate to summarize the patient's typical mood if the data is heavily skewed due to a few extremely bad days?

One standard deviation above the mean

A set of test scores has a mean of 80 and a standard deviation of 6. A student scores 86. This score is:

There is a strong negative relationship between TV watching and fitness.

A researcher finds a correlation of r = -0.75 between time spent watching TV and physical fitness scores. How should this be interpreted?

One-Way ANOVA

A researcher wants to compare the average test scores of three different classrooms that used three different teaching methods. What is the most appropriate statistical test to use?

A Type I error

In hypothesis testing, a researcher concludes that a new drug is effective, but in reality, it is not. This is an example of:

Chi-Squared Test of Independence

A researcher wants to determine if there is a relationship between gender (Male/Female) and passing or failing a certification exam. What statistical test should be used?

Platykurtic

A distribution of scores that is relatively flat with thin tails and a low frequency of outliers is described as:

Point Biserial

A researcher is studying the relationship between a person's response (true or false) to a single personality test item and their annual income in pesos. What is the appropriate correlation coefficient to calculate?

It increases the risk of a Type II error.

If a researcher sets the significance level (alpha) at .01 instead of .05, what is the effect on hypothesis testing?

Levene’s Test

A test developer wants to check if the variances of test scores are equal across three different age groups before running an ANOVA. What test should she use?

Regression

A company wants to predict an applicant's future job performance score (on a scale of 1-100) based on their score from a pre-employment test. The statistical technique they should use is:

Strong positive correlation

A scatter plot shows data points forming a tight cluster from the bottom left to the top right. This indicates a:

Cross-Sectional Design

A researcher is studying the development of language skills in children. She tests a group of 3-year-olds, a group of 4-year-olds, and a group of 5-year-olds all at the same time. What research design is this?

Squaring the correlation coefficient (r).

The "coefficient of determination" is calculated by:
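
As a quick worked example, the sketch below squares the correlation of r = -0.75 that appears in the TV-watching card elsewhere in this set. Reading the result as shared variance is the standard interpretation, not something the cards state.

```python
def coefficient_of_determination(r):
    """Proportion of variance shared by two variables, given their correlation r."""
    return r ** 2

r_squared = coefficient_of_determination(-0.75)
print(r_squared)   # 0.5625, i.e. about 56% shared variance
```

The sign of r is lost when squaring, so r² describes the strength of the relationship but not its direction.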

Mann-Whitney U Test

A researcher wants to compare the ranking of preferred leadership styles between a group of managers and a group of non-managerial employees. The data is ordinal. What non-parametric test is appropriate?

68%

A test has a normal distribution with a mean of 50 and a standard deviation of 10. Approximately what percentage of scores will fall between 40 and 60?
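
The 68% figure can be checked numerically with the normal cumulative distribution function. The sketch below builds the CDF from `math.erf`; the helper name `normal_cdf` is illustrative.

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """Probability that a normally distributed score falls at or below x."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

# Scores between 40 and 60 lie within one standard deviation of the mean of 50
share = normal_cdf(60, 50, 10) - normal_cdf(40, 50, 10)
print(f"{share:.1%}")   # 68.3%
```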

Constant Sum

A researcher develops a survey where participants must allocate a total of 100 points among five different job characteristics (e.g., salary, work-life balance, career growth) based on their importance. This is an example of what type of scaling?

Test Conceptualization

A test developer is in the initial stage of creating a new test for "digital literacy." She is brainstorming what the test should measure, its objective, the target population, and the ideal format. This stage is known as:

Foils or distractors

A psychometrician is writing items for a multiple-choice test. For one item, the correct answer is 'B'. He writes options 'A', 'C', and 'D' to be plausible but incorrect alternatives. These incorrect options are called:

The item is flawed because the p-value is equal to the chance of guessing correctly.

During item analysis of a 4-option multiple-choice test, an item has a difficulty index (p) of 0.25. What does this suggest?

Negative item-discrimination index (d)

A test developer analyzes an item and finds that students who scored high on the overall test tended to get the item wrong, while students who scored low on the overall test tended to get it right. This item would have a:
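
A minimal sketch of the item-discrimination index, assuming the common upper-group versus lower-group form d = (U − L) / n with equal-sized groups. The counts are invented to mirror the pattern in this card.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """d = (U - L) / n, where U and L are counts of correct answers per group."""
    return (upper_correct - lower_correct) / group_size

# Hypothetical item: 5 of 20 high scorers and 14 of 20 low scorers answered correctly
d = discrimination_index(upper_correct=5, lower_correct=14, group_size=20)
print(d)   # -0.45
```

A negative d like this one means the item works against the rest of the test and is a candidate for revision or removal.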

Correction for guessing

To avoid the influence of guessing on a true/false test, a scoring rule is implemented where the final score is the number of correct answers minus the number of incorrect answers. This is an example of:
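
The rights-minus-wrongs rule in this card is the two-option case of the general correction-for-guessing formula, corrected score = R − W / (k − 1), where k is the number of options per item. A short sketch with made-up scores:

```python
def corrected_score(right, wrong, options):
    """Rights minus wrongs / (options - 1); omitted items are not counted."""
    return right - wrong / (options - 1)

true_false = corrected_score(right=30, wrong=10, options=2)
four_option = corrected_score(right=30, wrong=12, options=4)
print(true_false)    # 20.0
print(four_option)   # 26.0
```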

Item pool

A test developer creates a large reservoir of questions that can be used to generate future versions of an exam. This collection of test questions is known as an:

Computerized Adaptive Testing (CAT)

A university uses a computerized entrance exam where the difficulty of the next question presented to a student depends on their answer to the previous question. This is an example of:

Likert Scale

A psychologist is creating an attitude scale where respondents must agree or disagree with statements like "I believe exercise is essential for a healthy lifestyle." The statements range from very positive to very negative. This is characteristic of a:

Floor effect

A test developer wants to ensure her test is not too difficult for the intended population. She is concerned about a potential:

Test tryout

After developing a new test, a psychometrician administers it to a sample of test-takers that is representative of the target population. This phase is called the:

Double-barreled item

A test item reads: "Do you agree that the university should not decrease funding for the library and student sports?" This item is problematic because it is a:

Differential Item Functioning (DIF)

During test revision, a developer finds that an item is answered correctly more often by male test-takers than by female test-takers, even when both groups have the same overall ability level on the construct being measured. This item is exhibiting:

Validity shrinkage

A test developer validates a test on a group of college students and finds a high validity coefficient. He then uses the test on a new, different group of students and finds the validity coefficient is slightly lower. This phenomenon is known as:

Q-Sort Technique

A psychologist asks a client to sort a deck of 100 cards with personality statements on them into nine piles, from "most like me" to "least like me," with a specified number of cards required for each pile. This scaling technique is called:

Good item

An item on a test has a discrimination index (d) of 0.35. According to the provided guidelines, this is considered a:

One group known to have mastered the skill and another group known not to have mastered it.

A test developer is creating a test for a criterion-referenced purpose (e.g., passing a licensing exam). During the pilot test, she should administer the test to:

Scaling

The process of setting rules for assigning numbers in measurement is called:

Constructed-response format

A test question that requires the test-taker to write a few sentences in response to a prompt is an example of a(n):

Item characteristic curve

A test developer plots a graph for a single item, showing the probability of a correct response on the y-axis and the test-taker's overall ability level on the x-axis. This graph is called an:

Test Revision

The final stage of the test development process, where a test's content and format are modified to improve its effectiveness, is:

Legal Context

A lawyer requests a psychological evaluation for her client to determine if he is mentally capable of understanding the legal proceedings against him. This assessment is taking place in which context?

Retrospective Assessment

A psychologist is asked to evaluate a historical figure's state of mind at the time of a major event, using diaries, letters, and historical accounts. This is an example of:

Criterion variance and information variance.

A primary advantage of using a structured clinical interview over an unstructured one is that it reduces:

Behavioral observation using the SORC model

A school psychologist observes a child in the classroom to understand the triggers and consequences of his disruptive behavior. She notes what happens immediately before the behavior and immediately after. This approach is best described as:

It can be subjective to evaluate and lacks standardization.

A major limitation of using portfolio assessment for evaluating job applicants is:

Interpretative report

A company uses a computer program to score and generate a detailed report on a candidate's personality test, including narrative statements about their likely work style. This is an example of a(n):

The assumption of error in the assessment process

The idea that a person's score on a test is composed of their true ability plus some random influence reflects which basic assumption of psychological testing and assessment?

Confirmatory function

A psychologist is using a test battery to assess a client with a complex referral question. She uses a test of cognitive ability, a personality inventory, and a clinical interview. She finds that the results from all three tools point towards a diagnosis of major depressive disorder. In this case, the different tools are serving a(n):

It increases efficiency in scoring and data management.

A primary advantage of computer-assisted psychological assessment (CAPA) over traditional paper-and-pencil testing is:

The actuarial or mechanical approach

An approach to assessment that relies on statistical rules and probabilities, often using computer algorithms to generate findings, is known as:

WAIS-IV

A psychologist needs to conduct a comprehensive assessment of cognitive abilities for a 45-year-old man suspected of having early-onset dementia. Which of the following tests would be most appropriate?

Raven's Progressive Matrices (RPM)

A school psychologist wants to assess a 7-year-old child who is nonverbal and has recently immigrated to the Philippines. She wants a measure of general intelligence that minimizes the influence of language and culture. Which test is a good choice?

Mechanics

An HR manager is selecting candidates for a mechanical engineering position. She wants to assess their ability to understand basic mechanical principles of machinery and tools. Which subtest from the Flanagan Industrial Tests would be most relevant?

NEO Personality Inventory-3 (NEO-PI-3)

A clinician is assessing a client and wants to get a broad overview of their personality based on the "Big Five" model. Which of the following tests is based on this model?

F (Infrequency) Scale

A psychologist is assessing a client who may be exaggerating her psychological problems to get disability benefits. On the MMPI-2, which validity scale would be most helpful in detecting this "faking bad" response style?

The influence of social desirability bias.

The Edwards Personal Preference Schedule (EPPS) uses a forced-choice format where test-takers must choose between two statements of equal social desirability. This is done to minimize:

Panukat ng Ugali at Pagkatao (PUP)

A Filipino psychologist wants to use a personality inventory that was developed locally and measures Filipino-oriented traits and behaviors. Which of the following would be an appropriate choice?

Social (S)

A career counselor is helping a high school student explore potential careers based on his interests. The student enjoys helping people, teaching, and working in groups. According to Holland's RIASEC model, this student would likely score highest on which theme?

Routing test

A psychologist is administering the Stanford-Binet (SB-5). She starts with the Vocabulary subtest and, based on the examinee's performance, determines which level of items to proceed with. The Vocabulary subtest is serving as a(n):

Block Design

The subtest on the WAIS-IV that requires an individual to copy two-dimensional designs using three-dimensional blocks, measuring nonverbal reasoning and visuo-spatial processing, is called:

Borderline Features (BOR)

A psychologist wants to assess a client for borderline personality features, such as emotional lability, impulsivity, and relationship problems. Which scale on the Personality Assessment Inventory (PAI) would be most relevant?

Kinetic Family Drawing Test (KFD)

A child is asked to draw a picture of his family where everyone is doing something. This is the standard instruction for the:

Thematic Apperception Test (TAT)

A therapist uses a projective test where a client is shown a series of ambiguous pictures of people and is asked to tell a story about each picture. The therapist is likely using the:

High Anxiety vs. Low Anxiety

The Myers-Briggs Type Indicator (MBTI) assesses personality on four dichotomies. Which of the following is NOT one of those dichotomies?

Achievement test

A test that measures what a person has learned in a specific academic subject, like math or history, is classified as a(n):

Aptitude test

The Differential Aptitude Tests (DAT) are designed to measure a student's potential for learning in various areas to aid in educational and vocational guidance. This makes it primarily a(n):

MCMI-III

A psychologist is assessing a client for psychopathology and wants a test specifically designed for clinical populations that is grounded in an evolutionary theory of personality. Which test would be most appropriate?

Standard Progressive Matrices (SPM)

A test user is looking for a brief, nonverbal measure of abstract reasoning for an adult of average intelligence. Which version of the Raven's Progressive Matrices would be most suitable?

Have a substantial understanding of testing and a highly specialized educational background, such as that of a Registered Psychologist.

A test is described as requiring a "Level C" qualification for its users. This means the test user must:

Sentence Completion Test

A client is asked to complete the phrase "My greatest fear is ______." This is an item from what type of test?

Building rapport

Before administering a standardized test, the examiner engages in some brief, friendly conversation with the test-taker to make them feel more comfortable. This is part of:

Accommodation

An examiner is administering a test to a child with a visual impairment. She provides a large-print version of the test booklet. This is an example of a(n):

Flynn Effect

The observation that average scores on IQ tests have been steadily increasing over the decades is known as the:

Back translation

A test developer translates a personality inventory from English to Filipino. To ensure the translation is accurate, she has another bilingual expert translate the Filipino version back into English, and then compares it to the original. This process is called:

Drift

During a behavioral observation, a supervisor notices that two of his raters, who were initially very accurate, have slowly started to define "disruptive behavior" in their own unique ways, leading to inconsistent ratings. This issue is known as: