week 2- ppt

Last updated 8:16 AM on 5/18/26

239 Terms

New cards

Reliability

The consistency of an instrument in measuring a specific dimension.

New cards

reliability

________ in psychological testing applies whenever a psychological test that measures dimensions such as mental health, personality, or intelligence is developed or administered.

It ensures that the tests being developed or administered are consistent in measuring the dimensions that need to be measured.

New cards

Reliability testing

Done when developing psychological tests to reduce error variance.

New cards

Error Variance

Reliability testing is done when developing psychological tests to reduce what?

New cards

Error Variance

This is the statistical variability of scores caused by the influence of variables other than the independent variable (the variable the test was intended to measure).

New cards

Reliability

is the degree to which a measure is free from random errors

New cards

reliability testing

Test developers conduct ___________ to reduce those errors, so that the variability of scores can be considered true variance rather than variance caused by extraneous variables or errors.

New cards

errors

Test developers conduct reliability testing to reduce those ______, so that the variability of scores can be considered true variance rather than variance caused by extraneous variables or errors.

New cards

Test Score Theory

This theory assumes that each person has a true score that would be obtained if there were no errors in measurement. However, because measuring instruments are imperfect, the score observed for each person almost always differs from the person’s true ability or characteristic.
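
The model behind this theory is commonly written as observed score = true score + error (X = T + E). The sketch below uses made-up numbers and assumes the errors are unrelated to the true scores. Under that assumption, reliability can be read as the share of observed-score variance that is true variance. All names and data are illustrative and not part of the original notes.

```python
from statistics import pvariance

# Hypothetical true scores and measurement errors for four test takers.
# The errors are chosen so that they are uncorrelated with the true scores.
true_scores = [10, 10, 20, 20]
errors = [1, -1, 1, -1]

# Observed score = true score + error (X = T + E).
observed = [t + e for t, e in zip(true_scores, errors)]

true_var = pvariance(true_scores)    # variance of T
error_var = pvariance(errors)        # variance of E
observed_var = pvariance(observed)   # variance of X

# Reliability as the proportion of observed variance that is true variance.
reliability = true_var / observed_var
print(observed, true_var, error_var, observed_var, round(reliability, 3))
```

In real testing the true scores are never observed directly, which is why the reliability estimates later in this set rely on correlations between observed scores.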

New cards

Test Construction; Test Administration; Test Scoring and Interpretation.

Sources of Error Variance

New cards

Test Construction

One source of variance or difference during ___________ is item sampling or content sampling.

New cards

item sampling or content sampling

One source of variance or difference during test construction is

New cards

Item sampling or content sampling

Refers to variation among items within a test as well as to variation among items between tests.

New cards

Test Administration

Sources of error variance that occur during test administration may influence the testtaker’s attention or motivation.

New cards

Test environment, testtaker variables, and examiner-related variables.

Test Administration: Sources of Error Variance classifications

New cards

Test Administration: Sources of Error Variance classifications

Test environment, testtaker variables, and examiner-related variables.

These can cause the test taker to make mistakes in entering a test response, which increases the variance of test scores.

New cards

Test Environment

Examples of these are the following: Room Temperature; Level of Lighting; Amount of Ventilation; Noise.

New cards

Room Temperature

Level of Lighting

Amount of Ventilation

Noise

Examples of sources of error variance in the test environment

New cards

Testtaker Variables

Examples of these are the following: Pressing Emotional Problems; Physical Discomfort; Lack of Sleep.

New cards

Examiner Related Variables

Examiner’s physical appearance and demeanor

Examiners who provide clues

Level of professionalism.

New cards

Test Scoring and Interpretation

Sources of error variance

Scoring errors are less likely in simple tests, which are usually answered with pencil and paper.

Errors are more likely in tests with open-ended questions or observations, where the examiner must interpret the responses personally.

Interpreting test results from open-ended questions or observations is subjective, so errors in the score may be committed.

New cards

Reliability Estimates

Test-Retest Reliability Estimates

Parallel-Forms and Alternate-Forms Reliability Estimates

Split-Half Reliability

Other Methods of Estimating Internal Consistency

Measures of Inter-Scorer Reliability.

New cards

Test-Retest Reliability Estimates

This estimates the reliability of a measuring instrument by using the same instrument to measure the same thing at two points in time; This estimate is obtained by correlating pairs of scores from the same people on two different administrations.
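
As a rough illustration of "correlating pairs of scores", the sketch below computes a Pearson r between two administrations of the same test. The scores and variable names are hypothetical, and the helper `pearson_r` is written out here only so the example is self-contained.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores of the same five people at two points in time.
first_administration = [10, 12, 14, 16, 18]
second_administration = [11, 13, 13, 17, 19]

test_retest_r = pearson_r(first_administration, second_administration)
print(round(test_retest_r, 3))
```

A value close to 1 would suggest that people keep roughly the same relative standing across the two administrations.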

New cards

Test-Retest measure

The _________ is appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time, such as a personality trait.

New cards

Stable factors

Personality Trait; Intelligence Quotient;

Not easily changed

New cards

fluctuating factors

Past Learning; Skill Level.

Subject to change

New cards

Coefficient of stability

When the interval between testing is greater than six months, the estimate of test-retest reliability is often referred to as the ________.

New cards

six months

When the interval between testing is greater than _________, the estimate of test-retest reliability is often referred to as the coefficient of stability.

New cards

Test-Retest Reliability

This type of reliability testing is not appropriate for tests that measure a constantly changing characteristic, because scores are expected to differ between administrations even when the test itself is consistent.

For example, the reliability of a test measuring happiness should not be estimated using __________ because happiness varies with the situation experienced by an individual.

New cards

Parallel-Forms

_______ of a test exist when, for each form of the test, the means and the variances of observed test scores are equal.

New cards

Parallel-Forms

Mr. Bitoy, a psychologist, creates two forms of a test measuring confidence and administers both to the same standardization sample. The scores on the two forms are then correlated.

New cards

Parallel-Forms method

This method of testing reliability provides one of the most rigorous assessments of reliability commonly in use.

Often test developers find it burdensome to develop two forms of the same test, and practical constraints make it difficult to retest the same group of individuals

New cards

Alternate Forms

are simply different versions of a test that have been constructed so as to be parallel.

New cards

Alternate Forms equivalence

Although they do not meet the requirements for the legitimate designation “parallel,” alternate forms of a test are typically designed to be equivalent with respect to variables such as content and level of difficulty.

New cards

Split-Half Reliability Estimates

It is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once; It is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice.

New cards

Split-Half Reliability step 1

Divide the test into equivalent halves.

New cards

Split-Half Reliability step 2

Calculate a Pearson r between scores of the two halves of the test.

New cards

Split-Half Reliability step 3

Adjust the half-test reliability using the Spearman-Brown formula.
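
The three steps can be sketched in Python as follows. The item scores are invented, the odd-even split is just one of the acceptable splits, and the `pearson_r` helper is an assumption added so the example runs on its own.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical item scores: one row per test taker, one column per item (1 = correct).
item_scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

# Step 1: odd-even split (items 1, 3, 5 versus items 2, 4, 6).
odd_half = [sum(row[0::2]) for row in item_scores]
even_half = [sum(row[1::2]) for row in item_scores]

# Step 2: Pearson r between the two half-test scores.
r_half = pearson_r(odd_half, even_half)

# Step 3: Spearman-Brown adjustment for a test twice as long as each half.
r_full = (2 * r_half) / (1 + r_half)
print(round(r_half, 3), round(r_full, 3))
```

The adjusted value is higher than the half-test correlation because, other things being equal, a longer test tends to be more reliable.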

New cards

Divide the test into equivalent halves

When it comes to split-half reliability coefficients, there’s more than one way to split a test – but there are some ways you should never split a test.

New cards

Middle split

Simply dividing the test in the middle is not recommended because it’s likely this procedure would spuriously raise or lower the reliability coefficient.

New cards

Random assignment split

One acceptable way to split a test is to randomly assign items to one or the other half of the test.

New cards

Odd-even split

Another acceptable way to split a test is to assign odd-numbered items to one half of the test and even-numbered items to the other half.

New cards

Spearman Brown Formula

The Spearman-Brown formula allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.

New cards

Spearman Brown Formula general use

It is a specific application of a more general formula to estimate the reliability of a test that is lengthened or shortened by any number of items.
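
The general form is usually written as r_new = (n × r) / (1 + (n − 1) × r), where r is the reliability of the existing test and n is the factor by which its length changes. A small sketch, with the function name `spearman_brown` chosen here for illustration:

```python
def spearman_brown(r, n):
    """Estimated reliability when test length is multiplied by a factor n.

    r: reliability of the existing test
    n: new length divided by old length (2 doubles the test, 0.5 halves it)
    """
    return (n * r) / (1 + (n - 1) * r)

# Doubling a test with reliability 0.50 (the split-half case, n = 2).
doubled = spearman_brown(0.50, 2)
# Halving the same test.
halved = spearman_brown(0.50, 0.5)
print(round(doubled, 3), round(halved, 3))
```

With n = 2 this reduces to the split-half adjustment 2r / (1 + r).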

New cards

Spearman Brown Formula data

This formula is best used for continuous and ordinal data.

New cards

Inter-Item Consistency

Refers to the degree of correlation among all the items on a scale.

New cards

Inter-Item Consistency calculation

A measure of inter-item consistency is calculated from a single administration of a single form of a test.

New cards

Interitem consistency and homogeneity

An index of interitem consistency, in turn, is useful in assessing the homogeneity of the test.

New cards

Homogeneous tests

Tests are said to be homogeneous if they contain items that measure a single trait.

New cards

Kuder-Richardson formula 20 (KR-20)

This is best used to measure internal consistency of tests with dichotomous items.
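
KR-20 is commonly written as (k / (k − 1)) × (1 − Σpq / σ²), where k is the number of items, p is the proportion passing an item, q = 1 − p, and σ² is the variance of total scores. The sketch below uses invented 0/1 data and the population variance of total scores. Some texts use the sample variance instead, so treat that choice as an assumption.

```python
from statistics import pvariance

def kr20(item_scores):
    """KR-20 for rows of 0/1 item scores (one row per test taker)."""
    n_people = len(item_scores)
    k = len(item_scores[0])
    # p = proportion answering each item correctly, q = 1 - p.
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in item_scores) / n_people
        sum_pq += p * (1 - p)
    # Variance of the total scores across test takers.
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

# Hypothetical dichotomous responses (1 = correct, 0 = incorrect).
item_scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

kr20_estimate = kr20(item_scores)
print(round(kr20_estimate, 3))
```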

New cards

Dichotomous items

Examples are questions answerable by yes or no, male or female, or true or false.

New cards

Coefficient Alpha

Coefficient Alpha is appropriate for use on tests containing non-dichotomous items.

New cards

Coefficient Alpha use

Coefficient Alpha is widely used as a measure of reliability, in part because it requires only one administration of the test.
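
Coefficient alpha is commonly written as (k / (k − 1)) × (1 − Σ item variances / total-score variance). The sketch below applies that form to hypothetical 1-to-5 ratings. The data and the function name `coefficient_alpha` are assumptions made for illustration.

```python
from statistics import variance

def coefficient_alpha(item_scores):
    """Coefficient alpha for rows of item scores (one row per test taker)."""
    k = len(item_scores[0])
    # Variance of each item across test takers.
    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the total scores across test takers.
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings on three non-dichotomous items from one administration.
ratings = [
    [4, 5, 4],
    [3, 3, 4],
    [2, 2, 3],
    [5, 4, 5],
]

alpha = coefficient_alpha(ratings)
print(round(alpha, 3))
```

Only one set of responses is needed, which matches the point that alpha requires a single administration of the test.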

New cards

Measures of Inter-Scorer Reliability

Measures of Inter-Scorer Reliability.

New cards

Inter-Scorer Reliability

Is the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.

New cards

High Inter-Scorer Reliability coefficient

If the reliability coefficient is high, the prospective test user knows that test scores can be derived in a systematic, consistent way by various scorers with sufficient training.

New cards

Inter-Scorer Reliability example

An example is when judges in gymnastics score an athlete based on various criteria. The extent to which their scores are consistent with each other is called Inter-Scorer Reliability.
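
One simple way to put a number on this agreement is to correlate the judges' scores pairwise and average the results. This is only one of several possible indices, and the judges, scores, and helper below are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores given by three judges to the same four gymnasts.
judge_scores = {
    "judge_a": [9.0, 8.5, 7.0, 9.5],
    "judge_b": [8.8, 8.6, 7.2, 9.4],
    "judge_c": [9.1, 8.0, 7.5, 9.6],
}

# Correlate every pair of judges, then average the pairwise correlations.
pairwise = [
    pearson_r(judge_scores[first], judge_scores[second])
    for first, second in combinations(judge_scores, 2)
]
mean_inter_scorer_r = sum(pairwise) / len(pairwise)
print(round(mean_inter_scorer_r, 3))
```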

New cards

Validity

Defined as the agreement between a test score or measure and the quality it is believed to measure.

New cards

Validity question

Does the test measure what it is supposed to measure?

New cards

Evidence

Validity rests on evidence for the inferences made about a test score. This evidence may be construct-related, criterion-related, or content-related.

New cards

Validation

Validation is the process of gathering and evaluating evidence about validity.

New cards

Local validation studies

Necessary when test users plan to alter in some way the format, instructions, language, or content of the test.

New cards

Types of evidence

Construct-related; Criterion-related; Content-related.

New cards

Face Validity

Relates more to what a test appears to measure to the person being tested than to what the test actually measures.

New cards

Face Validity judgment

A judgement concerning how relevant the test items appear to be.

New cards

Lack of face validity

A test’s lack of face validity could contribute to a lack of confidence in the perceived effectiveness of the test.

New cards

Face Validity effect on test taker

Decrease in test taker’s cooperation or motivation.

New cards

Face Validity effect on administrators

Unwillingness of administrators or managers to buy in to the use of a particular test.

New cards

Content Validity

This measure of validity is based on an evaluation of the subjects, topics, or content covered by the items in the test.

New cards

Content Validity description

Describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample.

New cards

Content Validity and construct vision

Ideally, test developers have a clear vision of the construct being measured, and the clarity of this vision can be reflected in the content validity of the test.

New cards

Content Validity and key components

Test developers strive to include key components of the construct targeted for measurement, and exclude content irrelevant to the construct targeted for measurement.

New cards

Test Blueprint

From the pooled information, there emerges a Test Blueprint for the structure of the evaluation: a plan regarding the types of information to be covered by the items.

New cards

Criterion Validity

This is a measure of validity obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures.

New cards

Criterion Validity judgment

A judgement of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest.

New cards

Criterion

The standard against which a test or a test score is evaluated.

New cards

Adequate criterion

An adequate criterion is relevant, valid for the purpose for which it is being used, and uncontaminated.

New cards

Concurrent Validity

An index of the degree to which a test score is related to some criterion obtained at the same time (concurrently).

New cards

Predictive Validity

An index of the degree to which a test score predicts some criterion measure.

New cards

Concurrent Validity description

If test scores are obtained at about the same time as the criterion measures are obtained, measures of the relationship between the test scores and the criterion provide evidence of concurrent validity.

New cards

Predictive Validity description

Measures of the relationship between the test scores and a criterion measure obtained at a future time provide evidence of predictive validity.

New cards

Predictive Validity example

Measures of the relationship between college admission tests and freshman grade point averages provide evidence of the predictive validity of the admissions tests.

New cards

Base Rate

The extent to which a particular trait, behavior, or characteristic exists in the population, expressed as a proportion.

New cards

Hit Rate

The proportion of people a test accurately identifies as possessing or exhibiting a particular trait or characteristic.

New cards

Miss Rate

The proportion of people a test fails to identify correctly as having, or not having, a particular characteristic or trait.

New cards

False Positive

A miss in which the test predicted that the test taker DID possess the trait being measured when in fact the test taker DID NOT.

New cards

False Negative

A miss in which the test predicted that the test taker DID NOT possess the trait being measured when in fact he/she actually DID.
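
The base rate, hit rate, miss rate, false positives, and false negatives can be counted from paired predictions and actual outcomes. In the sketch below the data are invented. It also assumes that a hit is any correct classification and a miss is any false positive or false negative.

```python
# Hypothetical data: 1 = has the trait, 0 = does not have the trait.
predicted = [1, 1, 0, 0, 1, 0, 1, 0]   # what the test said
actual = [1, 0, 0, 1, 1, 0, 1, 0]      # what was actually true

n = len(actual)

# False positive: test said "has the trait" but the person does not.
false_positives = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
# False negative: test said "does not have the trait" but the person does.
false_negatives = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
misses = false_positives + false_negatives

base_rate = sum(actual) / n      # proportion who actually have the trait
miss_rate = misses / n           # proportion classified incorrectly
hit_rate = (n - misses) / n      # proportion classified correctly

print(base_rate, hit_rate, miss_rate, false_positives, false_negatives)
```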

New cards

Validity Coefficient

A correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.

New cards

Incremental Validity

May be used when predicting something like academic success in college.

New cards

Incremental Validity and hierarchical regression

Uses hierarchical regression: it first estimates how well a criterion can be predicted with the existing predictors, then evaluates how much the prediction improves when new predictors are added to the equation.
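
A minimal sketch of this idea with one existing predictor and one new predictor: compute R² for the existing predictor alone, compute R² with both predictors, and take the difference. The data are invented, and the two-predictor R² is computed from the pairwise correlations rather than with a regression library.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def r_squared_two_predictors(y, x1, x2):
    """R-squared for predicting y from x1 and x2, from pairwise correlations."""
    r_y1 = pearson_r(y, x1)
    r_y2 = pearson_r(y, x2)
    r_12 = pearson_r(x1, x2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Hypothetical data for five students.
high_school_gpa = [2.0, 2.5, 3.0, 3.5, 4.0]   # existing predictor
admission_test = [50, 70, 55, 80, 75]         # new predictor
college_gpa = [2.1, 2.8, 2.7, 3.6, 3.7]       # criterion

# Step 1: variance in the criterion explained by the existing predictor.
r2_base = pearson_r(college_gpa, high_school_gpa) ** 2
# Step 2: variance explained once the new predictor is added.
r2_full = r_squared_two_predictors(college_gpa, high_school_gpa, admission_test)
# The gain in R-squared is the new predictor's incremental contribution.
delta_r2 = r2_full - r2_base
print(round(r2_base, 3), round(r2_full, 3), round(delta_r2, 3))
```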

New cards

Construct Validity

This is a measure of validity that is arrived at by executing a comprehensive analysis of how scores on the test relate to other test scores and measures.

New cards

Construct Validity theoretical framework

How scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure.

New cards

Construct Validity judgment

A judgment about appropriateness of inferences drawn from test scores regarding individual standings on a variable called construct.

New cards

Intelligence construct

Intelligence is a construct that may be invoked to describe why a student performs well in school.

New cards

Anxiety construct

Anxiety is a construct that may be invoked to describe why a psychiatric patient paces the floor.

New cards

Evidence of Construct Validity

Evidence of homogeneity; Evidence of changes with age; Evidence of pretest-posttest changes; Evidence from distinct groups; Correlating scores on other tests in accordance with what would be predicted from a theory that covers the manifestation of the construct in question.

New cards

Evidence of Homogeneity

The test is homogeneous, measuring a single construct.

New cards

Evidence of Homogeneity example

A test of academic achievement that contains subtests in areas of mathematics, spelling, and reading comprehension.

New cards

Pearson r in homogeneity

The Pearson r could be used to correlate average subtest scores with the average total test score.

New cards

Subtests and homogeneity

Subtests that in the test developer’s judgment do not correlate well with the test as a whole might have to be reconstructed (or eliminated).
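
A hedged sketch of how such a check might look: correlate each subtest with the total score and flag subtests whose correlation falls below a cutoff. The scores and the 0.5 cutoff are hypothetical. In practice the decision rests on the test developer's judgment.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical subtest scores for five test takers.
subtests = {
    "mathematics": [10, 12, 14, 16, 18],
    "spelling": [11, 12, 15, 15, 19],
    "reading": [15, 11, 16, 10, 14],
}

# Total score per test taker = sum of that person's subtest scores.
totals = [sum(scores) for scores in zip(*subtests.values())]

# Correlate each subtest with the total score.
subtest_total_r = {name: pearson_r(scores, totals) for name, scores in subtests.items()}

# Illustrative cutoff only: flag subtests that correlate weakly with the whole test.
CUTOFF = 0.5
flagged = [name for name, r in subtest_total_r.items() if r < CUTOFF]
print({name: round(r, 3) for name, r in subtest_total_r.items()}, flagged)
```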