Types of Central Tendencies
mean
median
mode
Mean
It is the sum of all the values in a group, divided by the number of values in that group.
Median
the number/value that is in the middle of a set of numbers/values
Mode
the number/value that appears the most often in a set of numbers/values
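As a minimal sketch of the three measures of central tendency defined above, the snippet below uses Python's standard `statistics` module. The list `scores` is hypothetical data chosen only for illustration.

```python
import statistics

# Hypothetical set of test scores (not from the source material).
scores = [70, 85, 85, 90, 100]

mean_score = statistics.mean(scores)      # sum of the values / number of values
median_score = statistics.median(scores)  # middle value once the scores are sorted
mode_score = statistics.mode(scores)      # value that appears most often

print(mean_score, median_score, mode_score)  # 86 85 85
```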
Standard deviation
a measure of variability within a set of scores
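A short sketch of how the standard deviation can be computed, assuming a small hypothetical set of scores. The notes do not say whether the population or the sample formula is intended, so both are shown.

```python
import statistics

# Hypothetical scores; the mean is 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Population SD divides the summed squared deviations by N.
sd_population = statistics.pstdev(scores)

# Sample SD divides by N - 1, so it is slightly larger.
sd_sample = statistics.stdev(scores)

print(sd_population, sd_sample)
```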
Normal curve
is the visual representation of a distribution of scores with three specific characteristics. (also known as a bell-shaped curve)
Centered on the mean
What does a percentile tell us?
a percentile is the percentage of people scoring less than a particular raw score.
It is a way to understand raw scores
Z score
Tells if a raw score is above or below the mean
most frequently used standard score
allows the comparison of scores across different distributions
z scores are calculated by
To transform a raw score into a z score, subtract the mean from the raw score and then divide that difference by the standard deviation of the set of scores.
The formula used is z = (X − X̄) / SD, where z = the z score, X = the raw score, X̄ = the mean of the set of test scores, and SD = the standard deviation of the set of test scores.
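The calculation described above can be sketched as a small function. The name `z_score` and the example values (mean of 75, standard deviation of 5) are assumptions made for illustration.

```python
def z_score(raw, mean, sd):
    """Return how many standard deviations `raw` lies from `mean`."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd

# Hypothetical distribution: mean = 75, SD = 5.
above = z_score(85, 75, 5)  # raw score above the mean gives a positive z
below = z_score(70, 75, 5)  # raw score below the mean gives a negative z

print(above, below)  # 2.0 -1.0
```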
How is a T score calculated from a z-score?
T = 50 +10z where T = T score and z = z score.
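A minimal sketch of the conversion, with `t_score` as a hypothetical helper name. Because of the formula, a z score of 0 (the mean) always maps to a T score of 50.

```python
def t_score(z):
    """Convert a z score to a T score using T = 50 + 10z."""
    return 50 + 10 * z

print(t_score(2.0))   # 70.0
print(t_score(-1.0))  # 40.0
```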
Where do z scores fit on the normal curve?
positive
negative
Positive z scores
always fall to the right of the mean (on the normal curve) and are in the upper half of the distribution of all the scores.
Negative z scores
always fall to the left of the mean and are in the lower half of the distribution.
What makes z-scores important and useful?
These standardized scores indicate how many standard deviations a data point is from the mean, allowing for comparison across different datasets. They help identify outliers, assess probabilities in a normal distribution, and facilitate the understanding of relative performance in statistics.
How to determine percentiles?
To determine it, sort the data from least to greatest, then use the formula:
Pr = (B / N) × 100, where
Pr = the percentile rank.
B = the number of observations with lower values.
N = the total number of observations.
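The formula above can be sketched as follows. The function name `percentile_rank` and the list of scores are hypothetical; sorting is not strictly required for the count, but it makes the data easier to inspect.

```python
def percentile_rank(scores, raw):
    """Percentage of observations with values lower than `raw`."""
    below = sum(1 for s in scores if s < raw)  # B in the formula
    total = len(scores)                        # N in the formula
    return below * 100 / total

# Hypothetical scores, sorted from least to greatest.
scores = sorted([55, 60, 65, 70, 75, 80, 85, 90, 95, 100])

# Six scores (55 through 80) fall below 85.
print(percentile_rank(scores, 85))  # 60.0
```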
Why are stanine scores useful?
to compare and scale test scores into a single-digit number on a scale from 1 to 9
useful because:
Easy to compute
Represent equal units
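To illustrate why stanines are easy to compute, here is a sketch that converts a z score to a stanine. It assumes the commonly used rule of doubling the z score, adding 5, rounding to the nearest whole number, and limiting the result to the range 1 to 9. That rule is not stated in these notes, and `stanine` is a hypothetical helper name.

```python
import math

def stanine(z):
    """Approximate stanine for a z score (assumed rule: round(2z + 5), kept within 1..9)."""
    rounded = math.floor(2 * z + 5 + 0.5)  # round half up
    return max(1, min(9, rounded))

print(stanine(0.0))   # 5, the middle stanine
print(stanine(1.0))   # 7
print(stanine(-3.0))  # 1, the lowest stanine
```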
What is the relationship between standard error of measurement (SEM) and reliability?
As reliability decreases, the SEM increases
as reliability increases, the SEM decreases
Standard error of measurement:
is a simple way to quantify how much a test score varies for an individual from time to time and from test to test.
standard deviation of repeated tests
Standard error of measurement formula
SEM = SD × √(1 − r), where
SEM = the standard error of measurement
SD = the standard deviation for the set of test scores
r = the reliability coefficient of the test
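The sketch below applies the formula to hypothetical values and also shows the relationship with reliability: with the standard deviation held constant, a higher reliability coefficient gives a smaller SEM. The function name `sem` and the numbers are assumptions for illustration.

```python
import math

def sem(sd, r):
    """Standard error of measurement: SD times the square root of (1 - r)."""
    if not 0 <= r <= 1:
        raise ValueError("reliability coefficient must be between 0 and 1")
    return sd * math.sqrt(1 - r)

# Hypothetical test with SD = 10 at two levels of reliability.
low_reliability = sem(10, 0.75)
high_reliability = sem(10, 0.96)

print(low_reliability)   # 5.0
print(high_reliability)  # roughly 2.0
```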
Item Response Theory (IRT)
theory is a perspective on how test items should be developed and evaluated
The theory focuses on and estimates the ability of the test taker independent of the difficulty of the items.
In IRT, what does the x-axis represent?
represents the construct, the latent or underlying trait or ability that the individual test taker brings to the item itself.
The underlying ability is referred to as theta and represented as θ.
above average ability to the right
below-average ability to the left
In IRT, what does the y-axis represent?
is the probability of a correct response given a certain level of ability or θ.
The lower this value, the more difficult the particular item and vice versa.
What is the primary advantage of IRT over Classical Theory?
Information function – provides detailed item-level information
IRT allows the estimation of reliability of a group of items used together without collecting data on that particular test or mix of items.
In IRT, how do we determine “worthiness” of an item?
Analyzing its psychometric properties
Use values “a” and “b”
a = discrimination
b = difficulty
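As a rough sketch of how a and b shape an item, the function below uses the two-parameter logistic model, a common IRT form. These notes do not name a specific model, so that choice is an assumption, as is the name `prob_correct`.

```python
import math

def prob_correct(theta, a, b):
    """Probability of a correct response under an assumed two-parameter logistic model.

    theta: ability of the test taker (0 = average)
    a: discrimination (slope of the curve)
    b: difficulty (ability level at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A test taker whose ability equals the item difficulty has a 50% chance.
print(prob_correct(theta=1.0, a=1.5, b=1.0))  # 0.5

# For an average test taker (theta = 0), a harder item (larger b) is less likely to be answered correctly.
easy_item = prob_correct(theta=0.0, a=1.5, b=-1.0)
hard_item = prob_correct(theta=0.0, a=1.5, b=1.0)
print(easy_item > hard_item)  # True
```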
In IRT, when does the test developer consider the test development process complete?
when each item fits the difficulty and discrimination level that the test author feels adequate.
What tool led to more use of IRT?
Computers
Computerized adaptive testing
What does a steep curve on an item characteristic curve (ICC) indicate?
Discrimination level
the steeper the curve, the stronger the relationship between ability and the chance of getting a question right.
When the item characteristic curve is steep, even a small change in ability leads to a change in the probability of answering the item correctly.
What does the slope tell us?
Discrimination level (identified with a)
What does a score of zero mean on the ability scale?
Average level of ability
What does a vertical straight line on an item characteristic curve (ICC) indicate?
An item with a perfect discrimination
What does a horizontal straight line on an item characteristic curve (ICC) indicate?
Little discrimination power
an unattractive item, one that has little discrimination power and difficulty that does not change as a function of ability.
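The contrast between a steep curve and a nearly horizontal one can be sketched numerically. The example assumes a two-parameter logistic curve, which these notes do not specify, and the discrimination values 3.0 and 0.1 are hypothetical.

```python
import math

def icc(theta, a, b=0.0):
    """Assumed logistic item characteristic curve with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def change_in_probability(a):
    """How much the chance of a correct answer rises as ability moves from -0.5 to 0.5."""
    return icc(0.5, a) - icc(-0.5, a)

steep_change = change_in_probability(a=3.0)  # steep curve: strong discrimination
flat_change = change_in_probability(a=0.1)   # nearly flat curve: little discrimination

# The steep item separates the two ability levels far more clearly.
print(round(steep_change, 3), round(flat_change, 3))
```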
In IRT, what is an important characteristic of reliability for good test items?
discrimination parameter
What is IRT’s greatest advantage?
helps us determine whether a test is providing accurate scores for test takers
What process has a standardized test undergone?
a process of development that includes defining its purpose, creating test items, conducting pilot testing, analyzing results for reliability and validity, and ensuring fairness across diverse populations. This process ensures that the test measures what it is intended to measure consistently and accurately.
What can be said of teacher-made achievement tests?
It is constructed by a teacher, and the effort placed on establishing validity or reliability, norming, or the development of scoring systems varies from nonexistent to thorough.
They are very situation specific and defined to suit a particular need.
What is a table of specifications?
A grid (with either one or two dimensions) that serves as a guide to the construction of an achievement test
A simple table tends to align the number of items with what?
columns
6 levels of abstraction
knowledge
comprehension
application
analysis
synthesis
evaluation
Knowledge in Bloom’s Taxonomy
focus on the recall of information
Comprehension in Bloom’s Taxonomy
focus on the understanding of information and require the test taker to interpret facts, compare and contrast different facts, infer cause and effect, and predict the consequences of a certain event
Application in Bloom’s Taxonomy
the use of information, methods, and concepts, as well as problem solving
Analysis in Bloom’s Taxonomy
require the test taker to look for and (if successful) see patterns among parts, recognize hidden meanings, and identify the parts of a problem.
Synthesis in Bloom’s Taxonomy
requires the test taker to use old ideas to create new ones and to generalize from given facts
Evaluation in Bloom’s Taxonomy
requires that the test taker compare and discriminate between ideas and make choices based on a reasonable and well-thought-out argument.
Bloom’s Taxonomy Knowledge key words
List, define, tell, describe, identify, show, label, collect, examine, tabulate, quote, name, who, when, where
Bloom’s Taxonomy Comprehension key words
Summarize, describe, interpret, contrast, predict
Bloom’s Taxonomy Application key words
Apply, demonstrate, calculate, complete, illustrate, show
Bloom’s Taxonomy Analysis key words
Analyze, separate, order, explain, connect
Bloom’s Taxonomy Synthesis key words
Combine, integrate, modify, rearrange, substitute
Bloom’s Taxonomy Evaluation key words
Assess, decide, rank, recommend, convince
What did Gesell do?
Gesell started the Child Study Center at Yale University.
documented the importance of maturation in the growth and development of young children.
He micro-studied maturation and created extensive film libraries of individual growth and development
What type of reliability would aptitude test developers be most interested in?
test-retest
Clerical Aptitude test
find individuals that are qualified to do clerical work
This test measures speed and accuracy in clerical work. The variables measured are accuracy in number comparison and name comparison.
Mechanical Aptitude test
tests focus on a variety of abilities that fall into the psychomotor domain
Ex: assembly tests and reasoning tests
Artistic Aptitude test
find/evaluate artistic talent
Readiness Aptitude test
tests whether an individual is ready to move up/on in school
To determine potential in a future setting, what kind of validity must an aptitude test have?
predictive validity
What are the steps in creating a standardized test?
development of preliminary ideas
test specifications
test items are written
items are used in a trial setting
items are rewritten
final tests are assembled
an extensive national standardization effort
preparation of all necessary materials
Norm-referenced test:
It allows you to compare one individual’s test performance with the test performance of other individuals.
Criterion-referenced test
test where there is a predefined level of performance used for evaluation
Differential Aptitude Test
measures students’ ability to learn or to succeed in many different areas.
used for students in grades 7 to 12 and adults
What concepts does the term “high stakes” in regard to testing refer to?
a situation where the results of a single test have significant consequences
What are two types of Aptitude Tests?
norm-referenced & criterion-referenced
What does plus mean regarding correlations?
signifies a positive correlation, meaning that when one variable increases, the other variable also increases (they move in the same direction)
What does minus mean regarding correlations?
indicates a negative correlation, meaning that when one variable increases, the other decreases (they move in opposite directions).
What determines the strength of a correlation?
correlation coefficient
How close the data points are to the line of best fit
Based on the size of the correlation coefficient
Value .8 to 1.0 refers to very strong relationship.
Value .6 to .8 refers to strong relationship.
Value .4 to .6 refers to moderate relationship.
Value .2 to .4 refers to weak relationship.
Value 0 to .2 refers to weak or no relationship.
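The bands above can be sketched in code. The example computes a Pearson correlation coefficient by hand and labels its strength from its absolute size. The data, the function names, and the decision to place a boundary value such as .8 in the higher band are assumptions, since the listed ranges overlap at their edges.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)

def strength(r):
    """Label the strength of a correlation; the sign only gives the direction."""
    size = abs(r)
    if size >= 0.8:
        return "very strong"
    if size >= 0.6:
        return "strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "weak or none"

# Hypothetical data: hours studied and quiz scores.
hours = [1, 2, 3, 4, 5]
quiz = [2, 4, 5, 4, 5]

r = pearson_r(hours, quiz)
print(round(r, 2), strength(r))  # 0.77 strong
```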
GRE: Graduate Record Exam
is used to determine whether students are ready for graduate school
What does the GRE: Graduate Record Exam assess?
Verbal reasoning, quantitative reasoning, analytical reasoning
GRE: Graduate Record Exam age group
college juniors and seniors making applications to graduate school
General Educational Development Test (GED)
was designed to “assess skills representative of the typical outcomes of a traditional high school education.”
What does the General Educational Development Test (GED) assess?
writing skills, social studies, science, literature and the arts, and mathematics.
General Educational Development Test (GED) age group
no specified level
Terranova tests
Designed to measure achievement in the basic skills taught in schools throughout the nation.
What do Terranova tests assess?
Reading (visual recognition, word analysis, vocabulary, comprehension); spelling, language (mechanics, expression); mathematics (computation, concepts, and applications); study skills; science; social studies
Terranova Test age group
kindergarten - 12
Iowa Assessments
provide a comprehensive assessment of student progress in the basic skills.
What do the Iowa Assessments assess?
sections for listening, word analysis, vocabulary, reading, language, and mathematics.
Iowa Assessments age group
k - 8
Denver Developmental Screening (Denver II)
It uses a visual form for recording scores that aligns with child’s chronological age and accounts for prematurity
Denver Developmental Screening (Denver II) age group
birth to age 6
Which validity is most used when validating an achievement test?
content validity