PSY333: Measurement & Testing (some level of finals)
#separator:tab #html:true A type of assessment that yields scores based on responses from test forms. test "First used the term ""mental tests""" Cattell Associated with the first modern-day intelligence test (measures higher mental processes) Binet First psychological laboratory that used experimental research Wundt First use of the term intelligence quotient (IQ); revised Binet Terman Associated with the Stanford Achievement Test Thorndike What was the era that first widely used group testing? WWI Group administration of intelligence test for the military; reading literacy Army Alpha Used as an intelligence test, but is the language-free version Army Beta Research on vocational assessments Thorndike Person involved in occupation selection for large groups of high school students Miner First -- much more general career counseling for the future aptitude tests Strong First modern personality inventory (WWI); measured susceptibility to mental health problems Woodworth's Personal Data Sheet measure whether or not you're ready for something aptitude "Defines purpose of test; demographics are considered; what context the test is on<br><img src=""step 1 identification.png"" width=""470"">" Step 1: Determine the goals of your client Asks the questions: What behaviors, content, skills is it intended to measure? What is the <u>theory </u>that the trait is based on? What about subsets/domains it is based on? Operationalization of test forms. Step 2: Choose instrument types to reach client goals Item formats are determined; test is written and item reviewers make sure it measures what it is intended to measure Step 3: Access information about possible instruments Before this is done, a pilot test is done to make sure the items are valid, reliable, and fair, among other items. <i>Then</i>, this happens. 
Step 4: Examine validity, reliability, cross-cultural fairness, and practicality of the possible instruments Validation process pilot test Determines test length, testing time, scoring approaches, and test procedures, administers test materials. Step 5: Choose an Instrument Wisely Tests which can be administered, scored, and interpreted by laypeople Level A Tests that require a psychology degree or coursework in testing Level B Tests that require an advanced psychology degree, a license and/or advanced training for that particular test Level C Knowledge or skill not related to the purpose of the test is required to answer an item correctly. cognitive sources of construct-irrelevant variance Language or images cause strong emotions that may interfere with the ability to respond to an item correctly (e.g., political opinions, beliefs) affective sources of construct-irrelevant variance aspects of tests that interfere with the test takers' ability to attend to, see, hear, or sense the items or stimuli (consider disabled people!) 
physical sources of construct-irrelevant variance statistical relationship between two variables correlation coefficient "Used to visually examine data, especially to discover patterns (such as curvilinear relationships)<br><img src=""scatter plot.png"">" scatter plot "an increase in one variable is related to an increase in the other variable<br><img src=""pos relationship.png"">" positive relationship "an increase in one variable is related to a decrease in the other variable<br><img src=""neg relationship.png"">" negative relationship "two variables that are not related to each other<br><img src=""no relationship.png"">" no relationship "<div><font color=""#000000"">±0.70 - 1.00</font></div>" strong correlation "<div> <font color=""#000000"">±0.30 ~ 0.69</font></div>" moderate strength "<div> <font color=""#000000"">±.00 ~ 0.29</font></div>" no strength "<div> <font color=""#000000""><span style=""background-color: rgb(255, 255, 0);"">Whether scores from a test are a </span><span style=""background-color: rgb(255, 255, 0);"">consistent measure </span><span style=""background-color: rgb(255, 255, 0);"">of individuals’ true scores</span> </font></div>" reliability To measure reliability, we use correlation coefficient caused by test administrators or the testing environment method error Error associated with test takers, subjects themselves trait error Relationship between scores on the same test administered twice with a time interval between the administrations test-retest reliability e.g., subjects may get better at second testing, subjects knowing how they answered in a similar test form practice effects Coefficients of two equivalent tests are compared (time interval) alternate-forms reliability obtaining a reliability coefficient by assessing how items are correlated as a group internal consistency internal consistency; correlation between scores from even-numbered items and scores from odd-numbered items split-half reliability whether a test measures what it is 
supposed to measure (<i>accuracy</i>) validity Does the ______ ______ cover a representative sample of behaviors to be measured in its entirety? Content experts content validity Does a test predict the target trait it is intended to measure? criterion validity Focuses on the prediction of current performance or psychological behavior concurrent validity Focuses on the prediction of future performance or psychological behavior predictive validity Does an assessment measure a theoretical construct that it is designed to measure (e.g., intelligence)? construct validity Are two assessments measuring the <u>same (or similar)</u> construct related? convergent validity Are two assessments measuring different constructs <u>not related</u>? discriminant validity Found construct you want to measure from the test scores factor analysis whether an individual's score is not affected by potential bias inherent in a test, test procedure and interpretation fairness Fairness did not get much attention until the 1960s (civil rights movement) Equal testing condition + proctors fairness in testing process Idea that all items should behave equally across all examinees fairness as lack of measurement bias accessibility in testing; showing their status on target without being advantaged or disadvantaged by their individual characteristics or opportunity to learn fairness in access to the construct as measured Statistical approach to examine test fairness by identifying items that perform differentially across subgroups of test takers while controlling for test takers' ability differential item functioning examining response processes through probing questions cognitive interview tests that measure what one has learned<br>e.g., high school exit exams achievement testing measure what one is capable of learning<br>e.g., intelligence tests aptitude testing used to assess habits, temperament, likes and dislikes, character, and similar behaviors personality assessment tests that assess 
problem areas of learning; often used to assess learning disabilities diagnostic tests tests that measure a broad range of cognitive ability<br>e.g., SATs cognitive ability tests tests that measure a broad range of cognitive functioning in general intelligence, intellectual disabilities, giftedness, changes in overall cognitive functioning intellectual and cognitive functioning tests that measure one aspect of ability; likelihood of success in a vocation special aptitude tests tests that measure many aspects of ability; likelihood of success in multiple vocations multiple aptitude tests tests that measure likes and dislikes as well as one's personality orientation toward the world of work; career counseling interest inventories a tool whereby an individual identifies whether he or she has, or does not have, specific attributes or characteristics classification methods tests that measure one's readiness for moving ahead in school. used to assess readiness to enter first grade readiness tests How do you calculate IQ (use / as a division sign)? mental age/chronological age x 100 What formula is used for split-half reliability due to the test being cut in half? 
Spearman-Brown formula "<span style=""color: rgb(0, 0, 0);"">visual for a categorical, discrete variable</span>" "bar graph<br><img src=""Screenshot 2023-02-08 114332.png"">" visual for continuous variables "histogram<br><img src=""histogram.png"">" used to see the distributional shape of data "frequency polygon<br><img src=""freq_poly.jpg"">" "(Type of curve)<br><img src=""paste-2fe6b0bb553a7741fc5cf57bc207a669dd093661.jpg"">" positively skewed "(Type of curve)<br><img src=""negative skewed.png"">" negatively skewed "Left to right, how are measures of central tendency distributed in positively skewed distributions?<br><img src=""positively skewed.png"">" Mode < Median < Mean "Left to right, how are measures of central tendency distributed in negatively skewed distributions?<br><img src=""negative skewed.png"">" Mode > Median > Mean<br> avg of squared distance from the mean variance the difference between an individual score and the mean deviation score scores that are compared to a set of test scores called the norm group norm referenced scores are compared to a predetermined standard; i.e. 
mastering a certain level of knowledge, used for diagnoses criterion-referenced scores proportion of people falling at and below a score in a standard normal distribution percentile µ = 50, σ = 10; used for personality tests T-scores µ = 100, σ = 15; used for tests of intelligence deviation IQ µ = 5, σ = 2, round to nearest whole number; used for achievement testing Stanines µ = 5.5, σ = 2, round to nearest whole number; used for personality inventories and questionnaires Sten scores µ = 50, σ = 21.06; used for educational tests NCE scores µ = 500, σ = 100 SAT scores µ = 21, σ = 5 ACT scores µ and σ are arbitrarily set by publisher Publisher type scores σ of test scores x √(1 - reliability of a test) SEM Tells us how much error there is in the test and ultimately how much any individual's score might fluctuate due to this error standard error of measurement problems with the _____ of questions comprehension failure in the information retrieving to answer (related to background characteristics) information retrieval low motivation/intention of faking or impression enhancement decision process mismatch in the choice of response option; difference in interpretation of option meanings response process "<img src=""IMG_1557.jpg"">" interquartile range formula Deviation score X (raw score) - M (mean score) Deviation score squared Variance σ x √(1 - reliability of a test) standard error of measurement Converting into a standard score (z-score x σ) + µ "involved in ""faking"", telling the truth" decision process "mapping the response; what is the time of ""always"" or ""almost never""?" response process
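Worked sketch of the formula cards above (ratio IQ, Spearman-Brown, SEM, standard-score conversion). The function names are hypothetical and only for illustration. The deck names the Spearman-Brown formula without writing it out; the version below is the standard split-half correction, 2r / (1 + r), supplied here as an assumption about what the card refers to.

```python
import math


def ratio_iq(mental_age, chronological_age):
    """Ratio IQ from the deck: mental age / chronological age x 100."""
    return mental_age / chronological_age * 100


def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation between two halves.

    Standard split-half correction (assumed; the deck only names the formula).
    """
    return 2 * r_half / (1 + r_half)


def standard_error_of_measurement(sd, reliability):
    """SEM from the deck: sigma x sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def to_standard_score(z, mean, sd):
    """Convert a z-score to another scale: (z-score x sigma) + mu."""
    return z * sd + mean


if __name__ == "__main__":
    print(ratio_iq(10, 8))                          # 125.0
    print(spearman_brown(0.6))                      # about 0.75
    print(standard_error_of_measurement(15, 0.91))  # about 4.5 on a deviation IQ scale
    print(to_standard_score(1.0, 50, 10))           # 60.0, a T-score one SD above the mean
```

The same conversion function covers the other scales in the deck by swapping µ and σ, e.g. `to_standard_score(z, 100, 15)` for deviation IQ.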