test
a measurement device or technique used to quantify behavior or aid in the understanding and predictions of behavior
item
a specific stimulus to which a person responds overtly (questions on a test)
achievement testing
testing previous learning
aptitude testing
testing potential for learning or acquiring a specific skill
intelligence testing
testing a person's general potential to solve problems
If a test is reliable its results are what?
accurate, dependable, consistent, or repeatable
what are test batteries?
two or more tests used in conjunction
standardization. Why is it important to obtain a standardization sample?
The uniform procedures used in the administration and scoring of a test. This is important because without it the meaning of scores would be almost impossible to evaluate
Define representative sample. Know when and why representative samples are collected.
sample comprised of individuals similar to those for whom the test is to be used
When: when the test is used for the general population, a representative sample must reflect all segments of the population
Why: it can be used as a standardization sample that is representative of an entire population, which raises validity
hypothetical construct
Processes that are not directly measurable, but which are inferred to have real existence and to give rise to measurable phenomena
operational definition
defining a way to measure a hypothetical construct
measurable phenomenon
All phenomena the construct generates (gives rise to) (ex. panic attacks, and an operational def. of minutes spent worrying, # of anxious thoughts)
psychological testing
the process of measuring psychology-related variables by obtaining information
psychological assessment
comprised of tests, interviews, case studies, behavioral observations, apparatus, etc.; psychological tests are one component of it
What is psychometry? What are the two major properties of psychometry?
Psychometry: the branch of psychology dealing with the properties of psychological tests
-Reliability: dependability, consistency, or repeatability of the test results (measuring tool)
-Validity: Does a test measure what it purports to measure? Accuracy
norm referenced tests
compares a test taker's performance to that of others
Criterion referenced tests
measures performance against an established criterion
What types of questions are answered by psychologists through assessment?
Diagnosis and Treatment Planning, Monitor Treatment Progress, Help clients make more effective life choices/changes, Program evaluation, Helping third parties make informed decisions
In what settings do psychologists assess and what is their primary responsibility in each?
Inpatient, Schools, Forensic (legal) settings, Employment settings, such as corporations and law firms, Career counseling, Pre-marital counseling
What are the three properties of scales that make scales different from one another?
magnitude, equal intervals, absolute zero
magnitude
(property of scale) property of moreness
equal intervals
(property of scales) Difference between 2 points on a scale has the same meaning as the difference between 2 other points that differ by the same number of units
absolute zero
(property of scales) nothing of the measured property exists
Know the four scales of measurement
(NOIR) nominal, ordinal, interval, ratio
nominal scale
Variables can be named - put into categories. What properties does it have?
-Values symbolize category membership, and can be: Classified, Counted, Proportioned
-data cannot be meaningfully: Ranked, Added/Subtracted, divided to form averages/ratios
-ex. labels
ordinal scale
Assignment of ranks according to the degree to which the measured attribute is present/absent. What properties does it have?
-data can be: Classified, Counted, Proportioned, Rank-ordered
-data cannot be: Added/Subtracted or Divided to form averages/ratios
-ex. how do you feel on a scale of 1-10
interval scale
Adjacent values on the scale represent equal intervals in magnitude of the attribute being measured. What properties does it have?
-data can thus be: Classified, Counted, Proportioned, Rank-ordered, added (creating a total-scale score), Subtracted, divided to form averages (calculating a scale mean)
-data does not have a "0" point and cannot be: Divided to form ratios
-ex. temperature in Celsius or Fahrenheit
ratio scale
Measured on a scale with a true "0" point. Allows all mathematical operations. It can be meaningfully: Classified, Counted, Proportioned, Rank-ordered, added (creating a total-scale score), Subtracted, divided to form averages (calculating a scale mean), divided to form ratios
-ex. weight scale
percentiles
Percentage of test-takers whose scores fall below a given raw score
percentile rank
the percentage of scores below a specific score in a distribution of scores. Equation is (B/N) x 100, where B is the # of cases below the individual's score and N is the total # of scores
-Ex. Runner finishes 62nd out of 63, so only 1 runner is below. (1/63) x 100 = .016 x 100 = 1.6, i.e. a percentile rank of about 1.6
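The percentile rank formula above can be sketched in a few lines of Python. The function name and the runner data are made up for illustration; the scores are coded so that a higher number means a better finish, which is an assumption of this sketch.

```python
def percentile_rank(scores, score):
    """Percentile rank = (B / N) * 100.

    B is the number of scores below `score`; N is the total number of scores.
    """
    below = sum(1 for s in scores if s < score)
    return below / len(scores) * 100


# Hypothetical data: 63 runners, coded 1 (last place) to 63 (first place).
# The runner who finished 62nd has a code of 2 and beat exactly one runner.
runners = list(range(1, 64))
print(round(percentile_rank(runners, 2), 1))  # 1.6
```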
Central tendency
indices of the central value or location of a frequency distribution with respect to the X Axis.
Three types of central tendency
mean, median, mode
mean
the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores. measure of central tendency
median
Middle score in the distribution (50% of scores above, 50% below). Rank the scores (including repeated scores) from lowest to highest. If there is an odd number of scores, select the middle score. If there is an even number of scores, take the average of the two middle scores. measure of central tendency
mode
measure of central tendency, the most frequently occurring score in the distribution
advantages and disadvantages of mean
Advantages: the best choice when we need a measure of central tendency to reflect the total scores, stable from sample to sample, most resistant to chance sample variation
Disadvantages: reactive to exact position of each score, it gives undue weightage to extreme values
advantages and disadvantages of median
Advantages: less sensitive to extreme scores; for skewed distributions the median is the best measure
Disadvantages: it responds to how many scores are above or below but not how far away the scores are
advantages and disadvantages of mode
Advantages: easy to obtain, only measure that can be used for nominal scale
Disadvantages: not stable from sample to sample, there may be more than one mode for a particular set of scores
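A small sketch of the three measures, using Python's standard `statistics` module. The scores are invented; the second data set adds one extreme value to show why the mean reacts to outliers while the median barely moves.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 5, 7, 10]       # hypothetical test scores
mean_score = mean(scores)           # (2+3+3+5+7+10) / 6
median_score = median(scores)       # average of the two middle scores, 3 and 5
modes = multimode(scores)           # list, because there can be more than one mode

# Add one extreme score and recompute.
with_outlier = scores + [100]
mean_outlier = mean(with_outlier)      # pulled far upward by the extreme value
median_outlier = median(with_outlier)  # only shifts to the next middle score

print(mean_score, median_score, modes)
print(round(mean_outlier, 2), median_outlier)
```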
variance
standard deviation squared
standard deviation
the positive square root of the variance
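To see how the two definitions fit together, here is a minimal sketch using the population formulas (dividing by N). That choice is an assumption; the sample versions divide by N - 1 instead. The scores are made up.

```python
from math import isclose
from statistics import pstdev, pvariance

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean = 5

variance = pvariance(scores)  # mean of the squared deviations from the mean
sd = pstdev(scores)           # positive square root of the variance

print(variance, sd)
print(isclose(sd ** 2, variance))  # the variance is the SD squared
```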
The Normal distribution
is the most common continuous probability distribution. The area under the curve between any two real number limits gives the probability that a value will fall between them, and the curve approaches 0 on either side of the mean. Area underneath the normal curve is always equal to 1
skewness
A measure of the asymmetry in the shape of a data distribution
positive skew
tail points to the right (towards + end)
negative skew
tail points to the left (towards - end)
Kurtosis
Index of the "peakedness" vs. "flatness" of a distribution
Platykurtic
Negative kurtosis = flatter distribution (Plate)
Leptokurtic
Positive kurtosis = more peaked distribution (Leaping)
Mesokurtic
kurtosis at zero (as excess kurtosis). normal distribution (Medium)
What is a z score? How is it calculated?
difference between a score and the mean, divided by the SD
How are T scores different from Z scores?
T scores (unlike z scores): mean = 50 and standard deviation = 10. In practice they are all positive. Values > 70 are often considered "clinically significant" (2 SDs above the mean)
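A short sketch of both conversions. The function names and the example values (a scale with mean 100 and SD 15) are assumptions for illustration; the T score rescales z to a mean of 50 and an SD of 10.

```python
def z_score(score, mean, sd):
    """Distance of a score from the mean, in SD units."""
    return (score - mean) / sd


def t_score(z):
    """Rescale a z score to a mean of 50 and an SD of 10."""
    return 50 + 10 * z


z = z_score(130, mean=100, sd=15)  # hypothetical score on a mean-100, SD-15 scale
print(z, t_score(z))               # 2.0 70.0
```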
IQR
Discards the distribution's upper and lower 25% and takes what remains. IQR = Q3 - Q1 (the middle 50%, from the 25th to the 75th percentile)
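A minimal sketch of IQR = Q3 - Q1 using `statistics.quantiles` from the Python standard library. Textbooks and software use slightly different rules for locating quartiles, so the exact cut points below reflect that function's default method and are an assumption, not the only correct answer.

```python
from statistics import quantiles


def iqr(scores):
    """Interquartile range: distance between the 75th and 25th percentiles."""
    q1, _median, q3 = quantiles(scores, n=4)  # three cut points: Q1, Q2, Q3
    return q3 - q1


scores = [1, 2, 3, 4, 5, 6, 7]   # hypothetical scores
print(iqr(scores))               # 4.0 (Q1 = 2, Q3 = 6 under the default method)
```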
Norm (testing)
Testing in which scores are compared with the average performance of others. Test norms are created during the standardization process and must be periodically updated.
standardization
develop specific (standardized) procedures for the administration, scoring, and interpretation of a test
To avoid bias, how should error be distributed in a psychological test?
Double blind, random sampling, want error to be unsystematic and random!
What are the five characteristics of a good theory?
Has explanatory power, broad scope, systematic, fruitful, Parsimonious
What is a scatterplot (scatter diagram)? How does it work?
a picture of the relationship between 2 variables. each point on the diagram shows where a particular individual scored on both X and Y
What is the Correlation Coefficient? With what concept should correlation not be confused?
Describes how much two measures or items covary, i.e. how closely variation in one variable tracks variation in the other. Not to be confused with causation
Understand and be able to differentiate and plot positive, negative, and 0 correlation

What is the principle of least squares? How does it relate to the regression line?
statistical procedure to find the best fit for a set of data points by minimizing the sum of the squared offsets (residuals) of the points from the plotted curve. The regression line is the line that minimizes this sum
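The principle can be illustrated with a small sketch that fits a straight line Y = a + bX using the standard closed-form least squares solution. The function names and data are hypothetical. The last lines check that nudging the line away from the fitted values makes the sum of squared residuals larger.

```python
def least_squares_fit(xs, ys):
    """Return intercept a and slope b of the least squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]            # hypothetical predictor scores
ys = [2, 4, 5, 4, 5]            # hypothetical outcome scores
a, b = least_squares_fit(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
print(round(a, 2), round(b, 2), round(best, 2))  # 2.2 0.6 2.4

# Any other line does worse:
print(sum_squared_residuals(xs, ys, a + 0.1, b) > best)  # True
```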
covariance
relationships between variables (How much both variables change together)
What is the Pearson product moment correlation? What meaning do the values -1.0 to 1.0 have?
It is a ratio used to determine the degree of variation in one variable that can be estimated from knowledge about variation in the other variable. The closer r is to +1, the stronger the positive correlation is. The closer r is to -1, the stronger the negative correlation is. If |r| = 1 the variables are perfectly correlated! (continuous variables)
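As a rough sketch, Pearson's r can be computed as the covariance of X and Y divided by the product of their standard deviations. The function names and data are made up, and population formulas (dividing by N) are assumed; the N cancels in r either way.

```python
from math import sqrt


def covariance(xs, ys):
    """Average product of paired deviations from the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def pearson_r(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1, 2, 3, 4, 5]
print(round(pearson_r(xs, [2, 4, 6, 8, 10]), 2))   # 1.0  (perfect positive)
print(round(pearson_r(xs, [10, 8, 6, 4, 2]), 2))   # -1.0 (perfect negative)
print(round(pearson_r(xs, [2, 4, 5, 4, 5]), 2))    # 0.77 (strong positive)
```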
residual
the difference between the predicted and the observed values
What is the standard error of estimate? What is its relationship to the residuals?
the standard deviation of the residuals
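A minimal sketch of the relationship, assuming predictions from some regression line are already in hand. The data are hypothetical. Dividing by N - 2 is a common textbook convention for simple regression and is an assumption here; some sources divide by N.

```python
from math import sqrt


def residuals(observed, predicted):
    """Observed minus predicted value for each case."""
    return [o - p for o, p in zip(observed, predicted)]


def standard_error_of_estimate(observed, predicted):
    """Standard deviation of the residuals (N - 2 in the denominator)."""
    res = residuals(observed, predicted)
    return sqrt(sum(r ** 2 for r in res) / (len(res) - 2))


observed = [2, 4, 5, 4, 5]               # hypothetical actual scores
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]    # hypothetical predicted scores
print(round(standard_error_of_estimate(observed, predicted), 3))  # 0.894
```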
shrinkage
the amount of decrease observed when a regression equation is created for one population and applied to another
What is restricted range? To what does it lead?
Using a sample of people who do not fit the test, or a test that is too easy or too hard. It reduces range and variability (leads to floor and ceiling effects)
What is factor analysis?
Studies interrelationships among items within a test. Data-reduction technique. Can be used as measure of internal consistency
What is the co-efficient of determination? What is the purpose of the co-efficient of determination?
The proportion of the total variation in scores on Y that we explain through X (r^2)
Know the different types of correlations and when they are used.
biserial correlation, point biserial correlation, phi coefficient, spearmans rho
biserial correlation
expresses the relationship between a continuous variable and an artificial dichotomous variable.
point biserial correlation
used when the dichotomous variable is true, meaning that the variable naturally forms two categories
phi coefficient
when both variables are dichotomous and at least one of the dichotomies is true
spearman's rho
method of correlation for finding the association between two sets of ranks. Tetrachoric correlation: used if both dichotomous variables are artificial
What is the regression formula? Understand the different components of the formula and how they are applied.
equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of y when x = 0)
simple linear regression
seeks to find the linear explanation for the relationship between 2 variables
multiple regression
multivariate analysis that considers the relationship among combinations of three or more variables; the goal is to find the linear combination of the predictor variables that provides the best prediction
Reliability
dependability, consistency, or repeatability of the test results (measuring tool)
-The proportion of true to total observed score variance
What contributes to measurement error?
test construction, test administration, test scoring and interpretation
What components make up Classical Test Score Theory?
assumes that each person has a true score that would be obtained if there were no errors in measurement. This is used to understand and improve the reliability of a test. X = T + E (observed score = true score + error)
observed score
true score plus error
In what ways can error impact the observed score?
error pulls the observed score away from the true score, either above or below it
Test reliability is usually estimated in one of what three ways?
test retest method, parallel forms method, internal consistency
test-retest method
evaluates the error associated with taking a test at two different times (Rorschach inkblot tests are not appropriate for this evaluation). There is a possibility of a carryover effect, when the first testing session influences scores from the second session (some people remember their answers from the first test), or of practice effects, when some skills improve with practice
Parallel forms method
this compares two equivalent forms of a test that measures the same attribute
Advantages: Reduces memory bias, One of the most rigorous assessment of reliability
Disadvantages: Hard to construct
Internal consistency
examines how people perform on similar subsets of items selected from the same form of the measure
Define parallel/alternate forms reliability. What are its advantages and disadvantages?
different forms of the same test (ACT/SAT). They are hard to make the same in difficulty but are better tests to administer.
Define split half reliability
a test is given and divided into halves that are scored separately. the results from each half are compared to one another
how do you measure split half reliability (internal consistency)
-this can cause problems when one half is more difficult than the other; if this is the case it's better to use the odd-even system, where one subscore is obtained from the odd-numbered items and the other from the even-numbered items
-to estimate the reliability you need to find the correlation between the two halves
-Spearman-Brown formula: corrects for the half length: corrected r = 2r/(1 + r), where r is the correlation between the two halves of the test; the corrected r estimates what the correlation would be if each half had the total # of items in the test
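The odd-even split and the Spearman-Brown correction described above can be sketched as follows. The function names and the item responses are hypothetical, and the half-test correlation of .60 is simply an assumed value.

```python
def odd_even_halves(item_scores):
    """Split one person's item scores into odd-item and even-item subscores."""
    odd = sum(item_scores[0::2])    # items 1, 3, 5, ...
    even = sum(item_scores[1::2])   # items 2, 4, 6, ...
    return odd, even


def spearman_brown(r_half):
    """Corrected r = 2r / (1 + r), where r is the half-test correlation."""
    return 2 * r_half / (1 + r_half)


print(odd_even_halves([1, 0, 1, 1, 0, 1]))   # (2, 2)
print(round(spearman_brown(0.60), 2))        # 0.75
```

In a full analysis the two subscores would be computed for every test taker, the two columns of subscores correlated, and that correlation passed to the correction.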
How do the different aspects of internal consistency differ?
-internal consistency: examines how people perform on similar subsets of items selected from the same form of the measure, i.e. the consistency of items within the same test. Evaluates the extent to which the different items on a test measure the same ability or trait
-Split half: corrected correlation between two halves of the test
-KR20: requires you to find the proportion of people who got each item "correct"
-Alpha: designed to use KR20 with tests where there is not a right or wrong answer (such as a likert scale test) more general reliability estimate
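A hedged sketch of how KR20 and alpha relate, using the conventional formulas (they are not spelled out in these notes). KR20 uses the proportion p of people getting each item correct; alpha replaces p(1 - p) with each item's variance, so it also works for Likert-type items. The data are invented, and population variances are an assumption that makes the two agree exactly on right/wrong items.

```python
from statistics import pvariance


def kr20(rows):
    """rows: one list of 0/1 item scores per person."""
    k, n = len(rows[0]), len(rows)
    p = [sum(row[i] for row in rows) / n for i in range(k)]  # proportion correct
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum_pq / total_var)


def cronbach_alpha(rows):
    """rows: one list of item scores per person (any numeric scale)."""
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical right/wrong answers: 5 people, 4 items.
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(answers), 2), round(cronbach_alpha(answers), 2))  # 0.8 0.8
```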
Understand the major components of inter-rater reliability.
three different ways to do this: most common method is to record the percentage of times that two or more observers agree. Kappa statistic is the best method for assessing the level of agreement among several observers
What is the Kappa statistic and how does it relate to reliability?
measures Inter rater reliability (observations of the samples with more types of judgment) indicates the actual agreement as a proportion of the potential agreement following correction for chance agreement
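A minimal sketch of the chance correction, using Cohen's kappa for two raters: kappa = (observed agreement - chance agreement) / (1 - chance agreement). Restricting it to two raters is an assumption of this example, and the ratings are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no judgments of the same 10 observations.
rater_a = ["yes"] * 5 + ["no"] * 5
rater_b = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.6
```

Here the raters agree on 80% of cases, but 50% agreement would be expected by chance, so kappa is lower than the raw percentage of agreement.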
What does the standard error of measurement do?
uses the standard deviation of errors as the basic measure of error. Allows us to estimate the degree to which a test provides inaccurate readings. The larger the standard error of measurement, the less certain we can be about the accuracy with which an attribute is measured
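For illustration, the conventional formula (not stated in these notes) is SEM = SD x sqrt(1 - r), where SD is the standard deviation of the test scores and r is the reliability coefficient. The values below are assumed.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


# Hypothetical test with SD = 15, at two levels of reliability.
print(round(standard_error_of_measurement(15, 0.91), 2))  # 4.5
print(round(standard_error_of_measurement(15, 0.64), 2))  # 9.0
```

The less reliable test has the larger SEM, so an individual's observed score is a less certain estimate of the true score.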
What factors should be considered when choosing a reliability coefficient?
-Homogeneity v heterogeneity of items: is the test measuring a multi-faceted or uni-faceted construct?
-Dynamic v static characteristics: is the true score fluctuating or relatively stable over time? Does it change frequently or from situation to situation?
What types of irregularities might make reliability coefficients biased or invalid?
How could you introduce bias basically. environment, personal, evaluator bias, not the same scoring material, tired participants, personal effects
How can one address/improve low reliability?
Increase the number of items; factor and item analysis (the reliability of a test depends on the extent to which all of the items measure one common characteristic); correction for attenuation: estimating what the correlation between tests would have been if there had been no measurement error
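The correction for attenuation can be sketched with its conventional formula (not given in these notes): the observed correlation between two tests divided by the square root of the product of their reliabilities. All values below are assumed.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated correlation between X and Y if neither had measurement error.

    r_xy: observed correlation; r_xx, r_yy: reliabilities of the two tests.
    """
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical: observed r = .36, reliabilities of .64 and .81.
print(round(correct_for_attenuation(0.36, 0.64, 0.81), 2))  # 0.5
```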
What is the purpose of factor and item analysis?
to see if a certain item is bringing the reliability down. see how many factors there are in the test
What example was given in class regarding reliability
rubber yardstick
Define and be able to apply the broad definition of validity.
the usefulness and meaning of the results. Can be defined as the agreement between a test score or measure and the quality it is believed to measure. It can also be defined as the answer to a question: does the test measure what it is supposed to measure?
What are the three main types of validity evidence?
construct related, criterion related, content related
What prerequisites exist for validity?
test needs to be RELIABLE. you can't have validity without reliability (you can have reliability without validity)
Define Face Validity. How does it differ from other aspects of validity?
the mere appearance that a measure has validity
Not technically validity but still important. When it's obvious what you are measuring.