Steps in Phase 3
Describe the reference or norm group and sampling plan for standardization
Describe your choice of scaling methods and the rationale for this choice
Outline the reliability studies to be performed and their rationale
Outline the validity studies to be performed and their rationale
Include any special studies that may be needed for development of this test or to support proposed interpretations of performance
List the components of the test - e.g., manual, record forms, test booklets, stimulus materials, etc.
Sampling Plan
Defining the target population for comparison (age range, special needs, etc.)
the first group to which everyone is compared
Looking at other norm groups
other groups you want to compare against
A true random sample is ideal, but this is often not possible
the sample has to be representative of the target population
best method is a population-proportionate stratified random sampling plan
Determine the appropriate size of the overall sample
Standardization Sample
a sample of test takers who represent the population for which the test is intended
provides the norms and forms the reference group to which all examinees are compared
Population
all members of the target audience
Sample
a representative subset of the population to whom the survey is administered
Types of Sampling (Selecting the Appropriate Respondents)
Probability Sampling
Simple Random Sampling
Systematic Random Sampling
Stratified Random Sampling
Cluster Sampling
Nonprobability Sampling
Convenience Sampling
Probability Sampling
Uses random selection so that the sample is representative of the population
Simple Random Sampling
every member of a population has an equal chance of being chosen for the sample
Systematic Random Sampling
Choosing every nth person (e.g., every third person)
Stratified Random Sampling
Population is divided into subgroups (strata), and members are randomly sampled from each subgroup
Cluster Sampling
used when it is not possible to list all of the individuals who belong to a particular population; often used with surveys that have large target populations
Nonprobability Sampling
Is a type of sampling in which not everyone has an equal chance of being selected from the population
Convenience Sampling
The survey researcher uses any available group of participants to represent the population
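The population-proportionate stratified plan mentioned above can be sketched in Python. This is a minimal illustration with made-up names and a hypothetical two-stratum population; note that rounding the per-stratum allocation can occasionally shift the total by one.

```python
import random

def proportionate_stratified_sample(population, stratum_of, n):
    """Population-proportionate stratified random sampling: each stratum
    gets sample slots in proportion to its share of the population."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))  # proportional allocation
        sample.extend(random.sample(members, min(k, len(members))))
    return sample

# Hypothetical population: 300 adults and 100 adolescents (a 3:1 split)
population = [{"group": "adult"}] * 300 + [{"group": "adolescent"}] * 100
s = proportionate_stratified_sample(population, lambda m: m["group"], 40)
# The sample mirrors the split: 30 adults and 10 adolescents
```

Because each stratum is sampled randomly but sized proportionally, the sample stays representative on the stratifying variable even when a simple random draw of the whole population is impractical.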
Sample Size
number of people needed to represent the target population accurately
Depends on factors in the test plan; in general, the larger the sample, the better
Homogeneity of the Population
how similar the people in your population are to one another
Sampling Error
a statistic that reflects how much error can be attributed to the lack of representation of the target population by the sample of respondents chosen
Distributing the Survey
how will the instrument/test be given to the respondent
mail, phone, weblink, in person
Specifying Administration and Scoring Methods
Determine such things as how the test will be administered, which will influence the format and content of the test items
orally, written, computer, groups, individual
Choose the scoring method: scored by hand by the test administrator, scored by software, or sent to the test publisher for scoring
Types of Raw Scoring Methods
Cumulative/Summative Model
Ipsative Model
Categorical Model
Cumulative/Summative Model
most common
assumes that the more a test taker responds in a particular fashion the more they have of the attribute being measured
using this model, the test taker receives 1 point for each correct answer, and the total number of correct answers becomes the raw score on the test
correct responses or responses on the Likert scale are summed
data can be interpreted with reference to norms
Semantic Differential: Adjective pairs at each end of the continuum (e.g., rich/poor)
Visual analog: the researcher assigns scores along the continuum (e.g., rating pain levels, with each number a different level of pain)
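The cumulative model above can be sketched in Python; the answer key, responses, and Likert ratings here are all hypothetical.

```python
# Cumulative/summative scoring: the raw score is the count of keyed (correct)
# responses, or the sum of Likert ratings. All values are hypothetical.

answer_key = ["b", "d", "a", "c", "a"]
responses  = ["b", "d", "c", "c", "a"]   # one test taker's answers
raw_score = sum(1 for given, keyed in zip(responses, answer_key) if given == keyed)
# raw_score is 4: one point per correct answer

likert_ratings = [4, 5, 3, 4]            # 1 = strongly disagree .. 5 = strongly agree
likert_raw = sum(likert_ratings)         # 16, interpreted with reference to norms
```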
Ipsative Model
test takers are given 2 or more options to choose from - mostly uses forced choice items
most used in personality testing - test taker indicates which items are most like them and least like them
measures an individual's personal growth, strengths, or preferences relative to themselves over time, rather than comparing them to others (normative) or external standards (criterion-referenced)
all items are chosen to be equally desirable
Categorical Model
Is used to put the test taker in a particular group or class
test takers' scores are not compared to those of other test takers; instead, scores on the various scales are compared within the test taker (which scores are highest and lowest)
Typically yields nominal data because it places test takers in categories
counts the number of true and false answers, or agree and disagree responses
Piloting and Revising Tests
can't assume the test will perform as expected
test developers conduct studies to determine how well a new test performs
A pilot test is a scientific investigation gathering evidence that the test scores are reliable and valid for their specified purpose
involves administering the test to a sample from the target audience
analyze the data and revise the test to fix any problems uncovered
many aspects to consider
Setting up the Pilot Test
Test situation should match actual circumstances in which test will be used
e.g., if the test is designed to diagnose emotional disabilities in adolescents, the participants for the pilot study should be adolescents.
The sample should be large enough to provide the power to conduct statistical tests to compare the responses of each group
the test setting of the pilot test should mirror the planned test setting.
e.g., if school psychologists will use the test, the pilot test should be conducted in a school setting using school psychologists as administrators.
developers must follow the American Psychological Association's code of ethics
strict rules for confidentiality; publish only combined results
test takers must understand that they are in a research study and that their scores are used for research purposes
Conducting the Pilot Test
a scientific evaluation of the test's performance
depth and range depend on the size and complexity of the target audience and the construct being measured
e.g., tests designed for use in a single company or college program require less extensive studies than tests designed for large audiences, such as students applying for graduate school
adhere strictly to test procedures outlined in test administration instructions
generally require large sample
may also ask participants about the testing experience
pilot studies often require gathering extra data, such as a criterion measure and the length of time needed to complete the test
important to recognize problems with the test administration, make all necessary revisions before continuing, and conduct a new pilot test that yields appropriate results
Analyzing the Results
Can gather both quantitative and qualitative information on item characteristics, internal consistency, test-retest reliability, inter-rater reliability, convergent and discriminant validity, and sometimes predictive validity
Qualitative data may be used to help make decisions
Conducting Quantitative Item Analysis
Item analysis: how developers evaluate the performance of each test item
Item difficulty: the number of test takers who respond correctly divided by the total number of test takers; this proportion is the p value (percentage/probability value)
understand how difficult an item is
a p value of .5 is optimal (higher means the item is too easy; lower means it is too hard)
p values from 0 to .2 indicate an item is too difficult; .9 to 1, too easy
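The p-value calculation can be sketched directly; the response vector below is hypothetical.

```python
def item_difficulty(item_responses):
    """p value: number answering correctly divided by total number of
    test takers (responses coded 1 = correct, 0 = incorrect)."""
    return sum(item_responses) / len(item_responses)

# Hypothetical item: 6 of 10 test takers answered correctly
p = item_difficulty([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])   # p = 0.6, near the optimal .5
too_difficult = p <= 0.2
too_easy = p >= 0.9
```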
Discrimination Index
Compares the performance of those who obtained very high test scores (Upper group) with the performance of those who obtained very low test scores (lower group) on each item
D = U - L, where U = (# in the upper group who responded correctly / total # in the upper group) × 100 and L = (# in the lower group who responded correctly / total # in the lower group) × 100
A discrimination index of 30 or above is desirable
Negative numbers: those who scored low on the test overall responded to the item correctly and those who scored high on the test responded incorrectly.
The upper group and lower group are formed by ranking the final test scores from lowest to highest and then taking the upper third and the lower third to use in the analysis.
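The discrimination index follows directly from the counts in the upper and lower thirds; the numbers below are hypothetical.

```python
def discrimination_index(upper_correct, upper_total, lower_correct, lower_total):
    """D = U - L, where U and L are the percentages of the upper- and
    lower-scoring groups who answered the item correctly."""
    U = upper_correct / upper_total * 100
    L = lower_correct / lower_total * 100
    return U - L

# Hypothetical item: 9 of 10 in the upper third and 3 of 10 in the lower third correct
D = discrimination_index(9, 10, 3, 10)   # 90 - 30 = 60, above the desirable 30
# A negative D means low scorers answered correctly more often than high scorers
```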
Item-Total Correlation
a measure of the strength and direction of the relation between the way test takers responded to one item and the way they responded to all of the items as a whole
Items that have little or no correlation with the total item score may measure a different construct from that being measured by the other items.
Interitem Correlation Matrix
displays the correlation of each item with every other item
Usually each item has been coded as a dichotomous variable (correct (1) or incorrect (0))
Therefore, the interitem correlation matrix will be made up of phi coefficients
provides important information for increasing the test’s internal consistency.
drop items that don't correlate with other items measuring the same construct
Phi Coefficients
The results of correlating two dichotomous (having only 2 values) variables
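Since each item is coded 0/1, the interitem matrix entries are phi coefficients. A minimal sketch of phi from the 2x2 contingency counts (the item vectors are hypothetical):

```python
def phi_coefficient(x, y):
    """Correlation between two dichotomous (0/1) variables, computed from
    the 2x2 table: a = both 1, b = only x is 1, c = only y is 1, d = both 0."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    denom = ((a + b) * (c + d) * (a + c) * (b + d)) ** 0.5
    return (a * d - b * c) / denom if denom else 0.0

# Hypothetical scores on two items across six test takers
item1 = [1, 1, 0, 1, 0, 1]
item2 = [1, 0, 0, 1, 0, 1]
phi = phi_coefficient(item1, item2)      # positive: the two items tend to agree
```

Computing phi for every pair of items fills in the interitem correlation matrix described above.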
Item Response Theory
provides estimates of test-taker ability that are independent of the difficulty of the items presented, as well as estimates of item difficulty and discrimination that are independent of the ability of the test takers
relates the performance of each item to a statistical estimate of the test taker’s ability on the construct being measured
Item Characteristic Curves
Part of Item response theory
the line that results when we graph the probability of answering an item correctly with the level of ability on the construct being measured.
provides a picture of the item’s difficulty and how well it discriminates high performers from low performers
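One common IRT model, the two-parameter logistic (2PL), gives the curve a closed form; the parameter values below are illustrative, not from any real item.

```python
import math

def icc_2pl(theta, a, b):
    """Item characteristic curve under the two-parameter logistic (2PL)
    model: P(correct) at ability theta, with discrimination a and
    difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta equal to the item's difficulty, P(correct) = .5
p_mid = icc_2pl(theta=0.0, a=1.2, b=0.0)
# Higher ability gives a higher probability of a correct response
p_high = icc_2pl(theta=2.0, a=1.2, b=0.0)
```

Plotting `icc_2pl` over a range of theta values reproduces the curve described above: b shifts it left or right (difficulty), and a controls its steepness (discrimination).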
Item Bias
occurs when an item is easier for one group than for another group