What are the 7 steps in Phase 3?
Sampling plan
Selecting appropriate respondents
Specifying administration and scoring methods
Piloting and revising tests
Analyzing the results
Revising the test
Validation and cross-validation
Step 1: Sampling Plan
Define target population (age, special needs, etc.) and comparison group.
Standardization sample: represents the population the test is intended for; determines norms.
Ideally use random sampling; the sample must be representative.
Best method: population-proportionate stratified random sampling (age, gender, SES, culture, education); a sketch follows this step.
Determine appropriate sample size.
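A minimal sketch of population-proportionate stratified sampling in Python, assuming the population is available as a pandas DataFrame; the function name, column names, and sample size are hypothetical, not from the notes:

```python
import pandas as pd

def proportionate_stratified_sample(population: pd.DataFrame,
                                    strata_cols: list[str],
                                    n: int,
                                    seed: int = 0) -> pd.DataFrame:
    """Sample so each stratum appears in the same proportion
    as in the population (population-proportionate allocation)."""
    frac = n / len(population)
    return (population
            .groupby(strata_cols, group_keys=False)
            .apply(lambda g: g.sample(frac=frac, random_state=seed)))

# Hypothetical usage with the stratifying variables named above:
# sample = proportionate_stratified_sample(df, ["age_group", "gender", "ses"], n=500)
```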
Step 2: Selecting Appropriate Respondents
Population: all members of the target audience.
Sample: the subset of the population that receives the survey.
Probability sampling: Uses random selection to help ensure the sample is representative.
Types: simple random, stratified random, cluster, systematic random.
Non-probability sampling: Sampling in which not all members have an equal chance of being selected.
Types of Probability Sampling
Simple random sampling
Stratified random sampling
Cluster sampling
Systematic random sampling
Simple Random Sampling
Every member of the population has an equal chance of being selected for the sample.
Stratified Random Sampling
Population is divided into subgroups (strata; e.g., age, gender, SES), then members are randomly sampled from each subgroup.
Cluster Sampling
Used when it’s not possible to list all individuals in a population; often used for surveys with large target populations.
The population is divided into clusters, then clusters are randomly selected.
Systematic Random Sampling
Select every nth person (e.g., every 5th).
Nonprobability Sampling
Convenience Sampling
Select any available participants; not all have equal chance.
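A brief sketch of the remaining probability methods (stratified sampling is sketched under Step 1), using a toy population of ID numbers; the IDs, interval, and cluster size are invented for illustration:

```python
import random

random.seed(0)
population = list(range(1000))      # stand-in IDs for the target population

# Simple random sampling: every member has an equal chance of selection.
simple = random.sample(population, k=50)

# Systematic random sampling: random start, then every nth person.
interval = 20
start = random.randrange(interval)
systematic = population[start::interval]

# Cluster sampling: divide the population into clusters, then randomly
# select whole clusters and keep everyone in them.
clusters = [population[i:i + 100] for i in range(0, len(population), 100)]
picked = random.sample(clusters, k=2)
cluster_sample = [person for cluster in picked for person in cluster]
```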
Sample size
Refers to the number of people needed to represent the target population accurately.
Homogeneity of the population:
How similar the members of your population are to one another.
The more dissimilar the members are, the more variation in the sample (and the larger the sample needed).
Sampling error
A statistic that estimates how much a sample result differs from the true population value because the sample does not perfectly represent the target population.
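The notes do not give a sample-size formula, but the standard one for estimating a proportion makes the idea concrete; the z value, p, and margin of error below are the usual conventions, not values from the source:

```python
import math

def sample_size_for_proportion(margin_of_error: float,
                               confidence_z: float = 1.96,
                               p: float = 0.5) -> int:
    """Standard sample-size formula n = z^2 * p(1-p) / e^2.
    p = 0.5 is the most conservative (most heterogeneous) assumption,
    which is why homogeneity matters for sample size."""
    n = (confidence_z ** 2) * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n)

print(sample_size_for_proportion(0.05))  # 385 at 95% confidence, +/-5%
```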
Step 3: Specifying Administration & Scoring
Decide how the test is administered (oral, written, computer, group/individual).
Decide scoring method: hand, software, or publisher.
Raw scoring methods: Cumulative/Summative
Assumes that the more a test taker responds in a particular way, the more they have of the attribute being measured
(e.g., more correct answers, or higher Likert scale ratings).
Raw scoring methods: Ipsative Model
Test takers are given 2 or more options to choose from.
Uses forced-choice items.
Scores are not compared to other test takers, but compared within the test taker (which scores are high or low).
Shows the test taker where they stand relative to themselves.
Example: place an “X” next to the word in each pair that best describes your personality.
Raw scoring methods: Categorical
Puts the test taker in a particular group or class.
Yields nominal data because it places test takers in categories (e.g., T/F, Y/N, Agree/Disagree).
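A small sketch contrasting the cumulative and ipsative models; the responses and trait labels are invented for illustration:

```python
# Cumulative/summative: sum the responses; a higher total means more
# of the attribute (e.g., Likert ratings or number correct).
likert_responses = [4, 5, 3, 4, 2]
summative_score = sum(likert_responses)   # 18

# Ipsative (forced choice): count which option each pair favors, then
# compare the counts within the same test taker.
forced_choices = ["outgoing", "organized", "outgoing", "outgoing", "organized"]
ipsative_profile = {trait: forced_choices.count(trait)
                    for trait in set(forced_choices)}
# {'outgoing': 3, 'organized': 2} -> relative standing within one person
```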
Step 4: Piloting and Revising Tests
Can’t assume the test will perform as expected.
Pilot test investigates the test’s reliability and validity
Administer test to sample from target audience
Analyze the data and revise the test to fix any problems uncovered; there are many aspects to consider.
Setting up the Pilot Test
The test situation should match the actual circumstances in which the test will be used (a setting with the same characteristics as the intended one).
Must follow the APA Code of Ethics.
Conducting the Pilot Test
Evaluates test performance
Depth and breadth depend on the size and complexity of the target audience and the construct being measured.
Step 5: Analyzing the Results
Can gather both quantitative and qualitative information.
Use quantitative information for such things as item characteristics, internal consistency, test-retest and inter-rater reliability, convergent and discriminant validity, and in some instances predictive validity.
Qualitative data may be used to help make decisions
Conducting Quantitative Item Analysis
How developers evaluate the performance of each test item.
Item difficulty (p-value)
The percentage of test takers who respond correctly; optimal ≈ 0.50.
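Computing p is just the proportion of correct responses; the data below are invented:

```python
# Dichotomously scored responses to one item (1 = correct).
item_responses = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]

p_value = sum(item_responses) / len(item_responses)
print(p_value)  # 0.6 -- reasonably close to the optimal ~0.5
```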
Discrimination index (D)
Compares how well an item separates high scorers from low scorers
Formula:
D = U (% in upper group correct) − L (% in lower group correct)
Desirable value: 0.30 and above (the upper group outperforms the lower group by at least 30 percentage points).
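A sketch of the D index, assuming the common convention of comparing the top and bottom 27% of test takers ranked by total score (the 27% cut is a convention, not from the notes):

```python
def discrimination_index(item: list[int], totals: list[int],
                         fraction: float = 0.27) -> float:
    """D = proportion correct in the upper group minus proportion correct
    in the lower group, where the groups are the top/bottom `fraction`
    of test takers ranked by total test score."""
    k = max(1, round(len(totals) * fraction))
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    lower, upper = order[:k], order[-k:]
    u = sum(item[i] for i in upper) / k
    low = sum(item[i] for i in lower) / k
    return u - low   # D >= 0.30 is generally considered desirable
```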
Item-total correlation
A measure of the strength and direction of the relationship between the way test takers respond to one item and the way they respond to all items as a whole.
A value of 0.30 and above is desirable.
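A sketch using Python's statistics.correlation (Python 3.10+); the "corrected" variant, which removes the item from the total before correlating, is a common refinement not mentioned in the notes:

```python
from statistics import correlation  # Python 3.10+

def item_total_correlation(item: list[int], totals: list[int],
                           corrected: bool = True) -> float:
    """Pearson correlation between one item and the total score.
    corrected=True removes the item from the total first so the
    item is not correlated with itself."""
    other = [t - i for t, i in zip(totals, item)] if corrected else list(totals)
    return correlation(item, other)
```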
Interitem correlation matrix
Displays the correlation of each item with every other item.
Used to check internal consistency; phi coefficients are used for dichotomous items.
Phi coefficient: the correlation coefficient that results from correlating two dichotomous variables.
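Because phi is just Pearson's r computed on 0/1 data, the interitem matrix for dichotomous items can be built as an ordinary correlation matrix; the response matrix below is invented:

```python
import numpy as np

# Rows = test takers, columns = dichotomously scored items (0/1).
responses = np.array([[1, 0, 1, 1],
                      [1, 1, 1, 0],
                      [0, 0, 1, 0],
                      [1, 1, 0, 1],
                      [0, 1, 1, 0]])

# Pearson r on 0/1 variables is the phi coefficient, so the interitem
# correlation matrix is the column-wise correlation matrix.
interitem = np.corrcoef(responses, rowvar=False)
print(interitem.round(2))
```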
Item-response theory (IRT)
Estimates a test taker’s ability regardless of how hard or easy the items are.
Estimates item difficulty and discrimination regardless of the ability of the people taking the test.
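A sketch of the simplest IRT model (the one-parameter/Rasch model), showing how the probability of a correct response depends jointly on ability and item difficulty; the values are invented:

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """One-parameter (Rasch) IRT model: probability of a correct
    response given ability theta and item difficulty b."""
    return 1 / (1 + math.exp(-(theta - b)))

# The same test taker on a hard item vs. an easy item:
print(rasch_probability(theta=1.0, b=2.0))   # ~0.27
print(rasch_probability(theta=1.0, b=-1.0))  # ~0.88
```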
Item bias
Occurs when an item is easier for one group than another.
Test items should be equally difficult for all groups.
To evaluate and eliminate bias, examine item characteristic curves (ICCs) for each group.
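One way to approximate an ICC-based bias check: compare, per group, the proportion answering the item correctly at each total-score level; groups of equal ability with diverging proportions suggest bias. The helper below is a hypothetical sketch, not a method from the notes:

```python
from collections import defaultdict

def empirical_icc_by_group(item, totals, groups):
    """Proportion answering the item correctly at each total-score
    level, computed separately per group. If the curves diverge for
    groups of equal ability, the item may be biased."""
    tally = defaultdict(lambda: [0, 0])        # (correct, count) cells
    for correct, total, group in zip(item, totals, groups):
        cell = tally[(group, total)]
        cell[0] += correct
        cell[1] += 1
    return {key: c / n for key, (c, n) in tally.items()}
```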
Step 6: Revising the Test
Finalize items based on content validity, difficulty, discrimination, inter-item correlation, and bias.
When items are revised or new items are added, the test must be re-piloted to ensure the changes produce the desired results.
Step 7: Validation and Cross-Validation
Validation: gathering evidence that the test reliably and accurately measures the intended construct.
Content validity: checked first, during test development, to ensure the test measures the right constructs (evidence of construct validity).
Criterion prediction: determined later through data collection.
Cross-validation: After final revisions show reliable and valid scores, the test is administered to a different sample to confirm results.