Comprehensive Notes on Psychological Assessment and Evaluation
Purpose of psychological assessment
Answer referral questions by gathering and integrating psychology-related data to inform evaluation using tools such as tests, interviews, case histories, etc.
Distinguish between related terms:
Psychological Assessment: data gathering, integration, and interpretation for evaluation.
Psychological Testing: measuring psychology-related variables via devices/procedures to sample behavior.
Educational Assessment (NAT, NCA): evaluate abilities/skills relevant to school success.
Retrospective Assessment: infer psychological aspects from past experiences.
Remote Assessment: online/tele-assessment data gathering when in-person isn't possible.
Ecological Momentary Assessment: in-the-moment, real-world evaluation of problems at the time and place they occur.
Planning and process of assessment
Planning Phases:
1) Referral
2) Intake Interview to Generate Hypothesis
3) Selection of Assessment Tools
Implementation Phase
4) Scoring and Interpretation of Results
5) Report Writing
6) Explanation of Results
Key concept: A systematic process aligns purpose, methods, tools, interpretation, and reporting of findings.
Purpose vs Method vs Interpretation vs Report:
Purpose: Determine referral question and hypothesis.
Method: Select tools/tests to answer the referral question.
Interpretation: Combine results to answer the referral question.
Report: Present results to the client to address the referral question.
Approaches to assessment
Collaborative Psychological Assessment: assessee and assessor co-create understanding.
Therapeutic Psychological Assessment: therapeutic self-discovery encouraged during assessment.
Dynamic Assessment: interactive model (evaluation → intervention → evaluation); common in educational settings.
Test: measuring device/procedure; Psychological Test: tool to measure psychology-related variables.
Interview: direct, face-to-face data gathering; note verbal and nonverbal cues; can be panel or motivational interviewing (therapeutic dialogue using empathy and cognition-altering techniques to affect motivation).
Portfolio: work products retained in various media (paper, canvas, film, etc.).
Case History Data: records/transcripts preserving archival information relevant to assessee; Case Study: narrative about a person/event based on case history data; Groupthink: tendency to reach consensus that may bias decisions.
Behavioral Observation: monitor actions to record quantitative/qualitative data; includes Naturalistic Observation in natural settings.
Role-Play Test: simulate a situation and evaluate expressed thoughts/behaviors/abilities.
Computer-Assisted Testing: computers assist with test administration and scoring; the term describes the process and tools, not the test-taker.
Computer Adaptive Testing (CAT): items adapt to the test-taker's ability during the test, improving precision and efficiency.
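A minimal sketch of the adaptive idea behind CAT, using hypothetical item difficulties: after each response the ability estimate is nudged up or down, and the next item is the unused one whose difficulty is closest to the current estimate. (Operational CAT uses IRT-based estimation; the fixed step size here is a simplification.)

```python
# Minimal CAT sketch with a hypothetical item bank (name -> difficulty).
def next_item(ability, bank, used):
    """Pick the unused item whose difficulty best matches the ability estimate."""
    candidates = [name for name in bank if name not in used]
    return min(candidates, key=lambda name: abs(bank[name] - ability))

def run_cat(bank, answers, start_ability=0.0, step=0.5):
    """Administer items adaptively; `answers` maps item name -> correct (bool)."""
    ability, used = start_ability, []
    for _ in range(len(answers)):
        item = next_item(ability, bank, used)
        used.append(item)
        # Step the estimate toward the response: up if correct, down if not.
        ability += step if answers[item] else -step
    return ability, used

bank = {"easy": -1.0, "medium": 0.0, "hard": 1.0}
ability, order = run_cat(bank, {"easy": True, "medium": True, "hard": False})
```

Starting at 0.0, the medium item is given first; a correct answer raises the estimate, so the hard item comes next, and so on.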
Test security and administration phases
Before the Test: ensure security, test selection, administrator competence, materials prepared, informed consent obtained.
During the Test: build rapport, explain purpose, provide clear instructions.
After the Test: safeguard results, scoring and interpretation, reporting, explaining results to test-taker.
Settings and sources
Educational Setting, Clinical Setting, Counseling Setting, Organizational Setting.
Test catalogues include: test manuals, professional books, reference books, journal articles, online databases.
Reports should articulate the conclusion and deconstruct the queried variable (e.g., what does "fitness to work" or "emotional damage" mean?).
Assessment hypothesis is grounded in the psychometrician’s theoretical orientation and guides method/tool selection.
Psychological Autopsy: retrospective reconstruction of psychological state post-mortem.
Reports should emphasize a strength-based format and honesty: beneficial, informative, truthful, compassionate, and humble (BITCH mnemonic).
Ethical and legal considerations (four pillars and professional standards)
Four Pillars of Ethical Psychological Assessment:
Right to Informed Consent: provide realistic information about test performance; for minors, inform guardians.
Right to Privacy and Confidentiality: privileged information protected by law; disclosures allowed in three main situations: harm to self, harm to others, court subpoenas.
Right to the Least Stigmatizing Label: use non-stigmatizing terminology in reporting.
Competence: foundation, acquisition, maintenance, and specialty; supervise, consult, and engage in continuing education.
Responsibilities and standards:
Relationship with clients; avoid exploitation; diversity and non-discrimination; communicate truthfully with the public.
General Ethical Principles: Beneficence/Nonmaleficence, Fidelity/Responsibility, Integrity, Justice, Respect for rights and dignity.
Areas of Competence (AFAMS): Foundation, Acquisition, Maintenance, Specialty; means of maintaining competence include supervision, education, consultation, training, self-directed learning.
Functional competencies in assessment: selection, use, integration, interpretation, reporting, explanation of use.
Informed consent, confidentiality, and ethics of child rights.
The APA ethical framework and levels of testing (Level A–C)
Level A: Basic/administration with manual guidance; general achievement tests.
Level B: Tests requiring knowledge of test construction/use and related fields (e.g., RPm, RPsy, aptitude tests).
Level C: Substantial understanding plus supervised experience (e.g., Projective tests, Stanford-Binet, MMPI).
Standards: must be followed by all psychologists.
Guidelines: aspirational rather than mandatory; aim to provide appropriate services.
Right to consent, privacy, and confidentiality (expanded)
Consent: voluntary agreement with understanding of purpose, process, risks/benefits, alternatives.
Privacy: respect for the individual; confidentiality limits when required by law, safety concerns, or court orders.
Disclosure considerations: harm to self/others, court subpoenas, professional obligations.
Value of consent for minors: guardians informed of results and recommendations.
Professional conduct and client welfare
Compassion: avoid harming clients; primum non nocere (first, do no harm).
Competence and ongoing development; seek consultation; be mindful of cultural differences; adhere to ethical governance.
Measurement scales and basic statistics (descriptive statistics and measurement concepts)
Scales of measurement:
Nominal: categorization (names, categories).
Ordinal: order/ranking.
Interval: equal intervals, no true zero.
Ratio: equal intervals with a true zero point, so ratios of scores are meaningful.
Raw score: unmodified tally of correct answers or points.
Frequency distribution: how scores occur; can be shown as histograms, bar graphs, etc.
Measures of central tendency:
Mean (x̄)
Median
Mode
Measures of variability:
Range
Interquartile Range (IQR) = Q3 - Q1
Semi-interquartile range = IQR/2
Average Deviation
Variance (σ²)
Standard Deviation (SD) = sqrt( Variance )
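The variability measures above can be checked numerically with Python's standard library, here on a small hypothetical score set:

```python
# Worked example of the variability measures on hypothetical scores.
import statistics

scores = [4, 6, 7, 8, 10, 11, 12, 14]

rng = max(scores) - min(scores)                 # Range
q1, _, q3 = statistics.quantiles(scores, n=4)  # quartiles (exclusive method)
iqr = q3 - q1                                  # Interquartile Range = Q3 - Q1
semi_iqr = iqr / 2                             # Semi-interquartile range
mean = statistics.mean(scores)
avg_dev = sum(abs(x - mean) for x in scores) / len(scores)  # Average Deviation
variance = statistics.pvariance(scores)        # population variance
sd = statistics.pstdev(scores)                 # SD = sqrt(variance)
```

For these scores the range is 10, the average deviation 2.75, the variance 9.75, and the SD is the square root of the variance, about 3.12.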
Distribution shapes and statistics
Skewness (asymmetry): Positive skew (tail to the right), Negative skew (tail to the left).
Kurtosis (peakedness): Platykurtic (flatter), Leptokurtic (more peaked), Mesokurtic (normal); the normal curve is often called Gaussian or Laplace-Gaussian.
Normal curve properties
Bell-shaped, symmetrical; mean = median = mode.
Z-scores convert raw scores to standard deviation units:
Z = \frac{X - \bar{X}}{SD}
T-scores convert Z to a scale with mean 50 and SD 10:
T = 50 + 10Z
Stanine: whole-value 1–9 scale with mean 5 and SD ≈ 2.
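The three standard-score conversions can be sketched as small helpers; the clamp in the stanine function mirrors its whole-value 1–9 range:

```python
# Standard-score conversions: Z, T, and stanine.
def z_score(x, mean, sd):
    """Raw score to standard deviation units."""
    return (x - mean) / sd

def t_score(z):
    """Z to a scale with mean 50 and SD 10."""
    return 50 + 10 * z

def stanine(z):
    """Whole-value 1-9 scale with mean 5 and SD ~2."""
    return max(1, min(9, round(5 + 2 * z)))

z = z_score(115, 100, 15)   # one SD above the mean -> 1.0
t = t_score(z)              # 60.0
s = stanine(z)              # 7
```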
Probability, and standard normal distribution concepts used to interpret test scores.
Reliability and measurement error (Classical Test Theory and extensions)
Reliability: consistency of a test across items, forms, or occasions.
Types of reliability:
Test-retest reliability: stability across time.
Parallel-forms (alternate-forms) reliability: equivalence across forms.
Split-half reliability: internal consistency across halves; often adjusted with the Spearman-Brown formula.
Inter-item consistency: internal consistency of items within a test.
Inter-scorer reliability: agreement among scorers.
Key formulas:
Cronbach's alpha (internal consistency):
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_i}{\sigma^2_X}\right)
where k = number of items, \sigma^2_i = variance of item i, and \sigma^2_X = variance of total test scores.
KR-20 (for dichotomous items):
\text{KR-20} = \frac{k}{k-1}\left(1 - \frac{\sum p_i q_i}{\sigma^2_X}\right)
where p_i = proportion correct on item i and q_i = 1 - p_i.
KR-21 (approximation assuming equal item difficulty):
\text{KR-21} = \frac{k}{k-1}\left(1 - \frac{M(k - M)}{k\,\sigma^2_X}\right)
where M is the mean total score.
Spearman-Brown prophecy formula (split-half, test length doubled):
\rho_{SB} = \frac{2r}{1 + r}
where r is the correlation between the two halves.
Standard Error of Measurement (SEM):
SEM = SD\sqrt{1 - r_{tt}}
where r_{tt} is the reliability coefficient.
Standard Error of the Difference (for comparing two scores):
SEM_{\Delta} = \sqrt{2}\cdot SEM
Confidence intervals around observed scores use the SEM (e.g., X \pm 1.96\,SEM for a 95% interval).
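These reliability formulas can be verified on a tiny hypothetical dataset (4 dichotomous items, 5 examinees); for dichotomous items, coefficient alpha equals KR-20:

```python
# Numeric check of the reliability formulas on hypothetical item data.
import math

data = [            # rows = examinees, columns = items (1 correct, 0 incorrect)
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
k = len(data[0])    # number of items

def var(xs):
    """Population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

totals = [sum(row) for row in data]
item_vars = [var([row[i] for row in data]) for i in range(k)]

# Cronbach's alpha (= KR-20 here, since every item is dichotomous)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / var(totals))

def spearman_brown(r):
    """Reliability of a doubled test from the half-test correlation r."""
    return 2 * r / (1 + r)

def sem(sd, r_tt):
    """Standard error of measurement from SD and reliability."""
    return sd * math.sqrt(1 - r_tt)
```

For example, `spearman_brown(0.5)` yields 2/3, and a test with SD 10 and reliability .84 has SEM = 4.0.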
Classical Test Theory concepts:
True score (T), Observed score (X), Error (E) with: X = T + E
Reliability is the ratio of true-score variance to total variance.
Other reliability extensions:
Domain Sampling Theory and Generalizability Theory: broader frameworks for reliability across facets/sources of variation.
Item Response Theory (IRT): models relating latent traits to item responses; useful for handling dichotomous and polytomous items; key ideas: difficulty, discrimination, and guessing parameters; supports adaptive testing.
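The three IRT parameters named above (difficulty b, discrimination a, guessing c) combine in the three-parameter logistic (3PL) model, sketched here:

```python
# 3PL item response function.
import math

def p_correct(theta, a, b, c):
    """Probability that a person with ability theta answers the item correctly."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# With no guessing (c = 0), a person whose ability equals the item's
# difficulty (theta == b) has exactly a 0.5 chance of a correct answer.
p = p_correct(theta=0.0, a=1.0, b=0.0, c=0.0)
```

Setting a = 0 flattens the curve (no discrimination), while raising c lifts its lower asymptote, modeling guessing on multiple-choice items.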
Reliability vs validity: reliability is a prerequisite for validity but does not guarantee validity; validity is about whether the test measures what it is supposed to measure.
Measurement error sources: item/sample content, test administration conditions, scoring/interpretation, etc.
Reliability considerations for different test types:
Speed tests vs power tests: speed tests often have restricted time; reliability may be estimated via alternate forms, test-retest, or split-half with adjustments (Spearman-Brown).
Validity: evidence and types
Validity refers to the degree a test measures what it claims to measure and predicts relevant outcomes.
Types of validity evidence:
Content validity: the extent to which test content covers the domain; assessed via Content Validity Ratio (CVR) and Content Validity Index (CVI).
CVR formula: CVR = \frac{n_e - N/2}{N/2} where n_e is the number of experts rating the item essential and N is the total number of experts.
CVR ranges from -1 to +1; positive values mean more than half the panel rated the item essential. Per-item retention decisions compare the CVR against a minimum critical value that depends on panel size (Lawshe's table).
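The CVR formula is a one-liner; here it is applied to a hypothetical panel of 10 experts, 8 of whom rate the item essential:

```python
# Content Validity Ratio: CVR = (n_e - N/2) / (N/2).
def cvr(n_essential, n_experts):
    """Lawshe's CVR for one item."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

value = cvr(8, 10)   # (8 - 5) / 5 = 0.6
```

A CVR of 0 means exactly half the panel called the item essential; -1 means no one did, +1 means everyone did.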
Criterion validity (concurrent/predictive): relationship with an external criterion measured at the same time (concurrent) or in the future (predictive).
Measured by correlation with the criterion (r). Higher |r| indicates stronger validity.
Construct validity: the extent to which the test scores relate to the theoretical construct; evidence from convergent and discriminant validity, factor analysis.
Convergent validity: scores correlate highly with other measures of the same construct.
Discriminant validity: scores do not correlate highly with measures of different constructs.
Factor analysis (Exploratory vs Confirmatory): identifies underlying factors and tests model fit.
Factor loading indicates the extent a factor explains a test score.
Ecological validity: how well a test predicts real-world behavior.
Face validity: whether a test appears to measure what it claims to measure; not sufficient alone for validity.
Test bias and fairness:
Test bias: systematic disadvantage to certain groups; needs to be identified and mitigated.
Test fairness: equitable use of tests across populations; consider culture, language, and access.
Validity coefficients interpretation (range-based guidance):
Content validity and reliability interplay with validity; higher validity coefficients imply stronger predictive power for intended criteria.
Validity and reliability: a valid test must be reliable, but a reliable test is not necessarily valid.
Test utility and decision analysis
Utility analysis evaluates whether benefits of using a test outweigh costs.
Common approaches:
Taylor-Russell tables: estimate the benefit of test use in selection given test validity, base rates, and selection ratios.
Naylor-Shine tables: evaluate the incremental gain in mean criterion performance by test use.
Brogden-Cronbach-Gleser method: compute the monetary gain of using a test under specific conditions.
Utility analysis factors:
Test validity (rxy)
Selection ratio (SR)
Base rate (BR) in the population
Costs and benefits (economic and non-economic)
Cut scores and decision strategies:
Fixed cut score: a single threshold to pass/fail.
Relative cut scores: norm-referenced thresholds for screening.
Multiple cut scores: for multi-stage or multi-instrument selection.
Multistage/multi-hurdle selection: sequential thresholds; compensatory models (e.g., multiple regression) versus non-compensatory models.
Methods for setting cut scores:
Angoff: expert judgments averaged to set cut scores.
Known groups method (contrast groups): use groups known to possess or lack the trait to set cutoffs.
IRT-based approaches: linking item difficulty to passing thresholds; bookmark and item-mapping methods.
Other methods: predictive yield, discriminant analysis, etc.
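Of the cut-score methods above, the Angoff method is the simplest to sketch: each judge estimates, for every item, the probability that a minimally competent examinee would answer it correctly; the cut score is the average of the judges' summed ratings. The ratings below are hypothetical:

```python
# Angoff method sketch: 3 judges rating 4 items (hypothetical probabilities
# that a minimally competent examinee answers each item correctly).
ratings = [
    [0.6, 0.7, 0.5, 0.8],   # judge 1
    [0.5, 0.6, 0.4, 0.9],   # judge 2
    [0.7, 0.8, 0.6, 0.7],   # judge 3
]

# Each judge's sum is that judge's implied passing score; average across judges.
judge_sums = [sum(row) for row in ratings]
cut_score = sum(judge_sums) / len(judge_sums)
```

Here the implied cut score is 2.6 out of 4 items; in practice, judges iterate after seeing real examinee data.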
Methods of test construction and item development
Test development lifecycle:
Conceptualization
Norm-referenced vs criterion-referenced design
Pilot work and test tryout
Item bank creation
Item formats: selected-response vs constructed-response
Scaling and standardization
Pilot testing and cross-validation (cross-validation/shrinkage)
Co-validation (co-norming) across tests
Item formats and response types:
Selected-response: multiple choice, true/false, Likert, matching
Constructed-response: short answer, essay, performance tasks
Item analysis and quality checks:
Item difficulty index (proportion correct)
Item reliability index (internal consistency per item)
Item validity index (predictive/criterion validity per item)
Item discrimination index (how well item differentiates high/low scorers)
Item-characteristic curves (ICCs)
Qualitative item analysis: think-aloud, expert panels
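The difficulty and discrimination indices above can be computed by hand for a single item; this sketch uses hypothetical right/wrong data and the common upper-lower group comparison for discrimination:

```python
# Item analysis for one item: responses (1 correct, 0 incorrect) are
# ordered by total test score, highest scorers first (hypothetical data).
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

# Difficulty index: proportion of all examinees answering correctly.
p = sum(item) / len(item)

# Discrimination index: proportion correct in the top half minus the
# proportion correct in the bottom half of scorers.
half = len(item) // 2
upper, lower = item[:half], item[half:]
d = sum(upper) / len(upper) - sum(lower) / len(lower)
```

Here p = 0.5 (moderate difficulty) and d = 0.6 (high scorers clearly outperform low scorers, as a good item requires).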
Test tryout and item revision:
Pilot items, field-test performance, revise based on analyses.
Cross-validation/revalidation with new samples; cross-validation may reduce validity estimates (shrinkage).
Item formats for computer administration:
Item banks; CAT (computerized adaptive testing); adaptive item branching (item branching).
Scales and scaling methods:
Norm-referenced vs criterion-referenced scaling
Age-based, group-based, stanine scaling
Standards for test construction:
Floor and ceiling effects: floor if test too hard; ceiling if too easy; address via item selection and test design.
Valid item criteria:
A good item is answered correctly by high scorers more often than by low scorers; items that reverse this pattern or show no difference are flagged for revision.
Special domains: intelligence, creativity, and aptitude
Intelligence (historical and theoretical perspectives):
Francis Galton, Alfred Binet, David Wechsler, Jean Piaget
Interactionism: heredity and environment interact to influence intelligence; factor-based views include Thurstone's Primary Mental Abilities and Spearman's g and s factors.
Spearman: two-factor theory (G general, S specific)
Gardner's multiple intelligences: linguistic, logical-mathematical, spatial, musical, interpersonal, intrapersonal, bodily-kinesthetic, naturalistic, etc.
Cattell-Horn-Carroll (CHC) theory: hierarchical model of broad and narrow factors beneath general intelligence; broad abilities include Gf (fluid) and Gc (crystallized), among others.
Horn's expansion: Gv (visual), Ga (auditory), Gq (quantitative), Gs (processing speed), Grw (reading/writing), Gsm (short-term memory), Glr (long-term memory/retrieval), etc.
Carroll's three-stratum theory: general (III), broad (II), narrow (I).
Thorndike: social, concrete, and abstract intelligence (distinct from Sternberg's triarchic theory of analytical, creative, and practical intelligence).
Wechsler and Stanford-Binet: major standardized intelligence batteries; WAIS/WAIS-R/WAIS-IV; SB Series; WISC for children; Wechsler scales emphasize verbal and performance measures; SB emphasizes reasoning with age-based norms.
Information-processing and cognitive theories of intelligence: processing speed, working memory, executive functions; PASS model (Planning, Attention, Simultaneous, Successive).
Creativity: original ideas, fluency, flexibility, elaboration; convergent vs divergent thinking; tests of creativity and problem solving.
Nonverbal intelligence and culture-fair/information tests: RPM, CFIT, culture-free tests; Flynn effect (rise in IQ scores over time).
Neuropsychological and clinical assessments
Neuropsychological evaluation: assesses brain-behavior relationships; includes history taking, MSE, neuropsychological tests; consider effects of medications, brain injury, diseases.
Common neuropsychological tests and constructs: executive function, abstract reasoning, memory (short-term/long-term), language, visuospatial abilities; commonly used batteries include Halstead-Reitan and others.
Neurology/Neuropsychology terms: neurons, CNS, PNS, contralateral control, lesion types (focal vs diffuse), acalculia, aphasia, amnesia, agnosia, apraxia, ataxia, etc.
Special evaluation topics: abuse/neglect, custody evaluations, forensic assessments, competency to stand trial, risk assessment, and duty to warn.
Psychological report structure and interviewing skills
A psychological report is the end product; integrates data to help life outcomes; tailor reports to client needs; cannot compensate for poor testing or incompetence.
Typical report sections:
Identifying Information
Referral Question
Evaluation Procedures
Background Information
Behavioral Observation / Mental Status Exam (MSE): ASEPTIC framework (Appearance/Behavior, Speech, Emotion, Perception, Thought Content, Insight/Judgement, Cognition)
Predisposing, Precipitating, Perpetuating, Protective factors
Ego Syntonic vs Ego Dystonic features
Interpretations and Impressions
Summary and Recommendations
Scaling and Immediacy, Probing, Priorities, Problem-Solving insights
Interviewing and counseling skills:
Assessment interviewing skills; Level 1–4 competencies; the importance of ethical conduct.
Counseling vs psychotherapy: counseling is shorter-term and aims to address life challenges; psychotherapy may address deeper issues.
Interview types: structured vs unstructured; stress/hypnotic/cognitive/collaborative interviews.
Interview response levels (Level 1–5) and active listening as the foundation for rapport.
Types of clients and ethical considerations with diverse populations.
Tests and measures in psychology
Key concepts in test development:
Norm-referenced vs criterion-referenced benchmarks.
Pilot work and item analysis; cross-validation and co-validation.
Internal consistency measures (KR-20/21, coefficient alpha, average proportional distance [APD]) and inter-scorer reliability.
Homogeneity vs heterogeneity; dynamic vs static traits; restriction/inflation of range; speed vs power tests.
Validity and bias:
Importance of validity evidence; test bias and fairness concerns; culture-fair and culture-informed assessments.
Common test domains and examples mentioned:
Intelligence tests: Stanford-Binet, WAIS/WAIS-R/WAIS-IV, WISC, RPM, CFIT, ITTER tests, PPVT, Leiter, Bayley scales, SB series, etc.
Achievement and aptitude tests: WIAT, WIAT-III, WJ-IV, K-ABC, KABC-II, CMMS, PPVT, IT-PAs, Bender Gestalt, Raven matrices, G-HDT, CPIT, etc.
Personality measures: NEO-PI-R, MMPI (and MMPI-3), CPI, 16PF, CPI-3rd edition, Guilford-Zimmerman, Rosenberg Self-Esteem, GSE, DRS, Hope Scale, LOT-R, SWLS, PANAS, etc.
Projective and semi-projective measures: Rorschach, TAT, Holtzman Inkblot, Drawings (Draw-a-Person, House-Tree-Person, Kinetic Family Drawing), Rotter’s incomplete sentences, Sentence Completion, etc.
Behavioral and neuropsychological measures: behavioral observations, behavioral assessment methods, lead roles in group settings, biofeedback, plethysmography, polygraph.
Statistical considerations in assessments
Normal distribution and standard scores (Z-scores, T-scores, Stanine):
Z-scores: Z = \frac{X - \mu}{\sigma}
T-scores: T = 50 + 10Z
Stanine: 1–9 scale with mean 5 and SD ~2.
Reliability and validity interpretation guidance:
Reliability coefficients in the .90s denote strong precision; .80s are generally adequate for many clinical decisions; .65–.70 may be a weak passing level for some uses.
Normal curve and distribution properties:
The normal curve is Gaussian; symmetry implies mean = median = mode.
Z-scores allow comparison across different distributions.
Practical measurement concerns:
Floor/Ceiling effects signal poor item range; revise item difficulties to better discriminate across ability levels.
Key historical and theoretical context (highlights)
Historical development: contributions from Galton, Binet & Simon, Wechsler, Spearman, Pearson, Rousseau, etc.
Culture and assessment: culture-specific tests, culture-free/fair tests; Flynn effect; the impact of culture on test norms and interpretation.
Important concepts: standardization, normative data, reliability, validity, test bias, fairness, and culturally informed assessment.
Practical implications and reporting
Reports should present information in strength-based language with clinically useful implications.
Interpretations should be grounded in theory, empirical evidence, and the specific referral question.
Consider the utility of the test data for decision-making in education, clinical, and organizational contexts.
Quick reference: common formulas used in assessment
Content Validity Ratio (CVR):
CVR = \frac{n_e - N/2}{N/2}
KR-20 (dichotomous items):
\text{KR-20} = \frac{k}{k-1}\left(1 - \frac{\sum p_i q_i}{\sigma^2_X}\right)
KR-21 (equal-difficulty approximation, M = mean total score):
\text{KR-21} = \frac{k}{k-1}\left(1 - \frac{M(k - M)}{k\,\sigma^2_X}\right)
Cronbach's Alpha (internal consistency):
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_i}{\sigma^2_X}\right)
Spearman-Brown correction (split-half):
\rho_{SB} = \frac{2r}{1 + r}
Standard Error of Measurement (SEM):
SEM = SD\sqrt{1 - r_{tt}}
SEM for a score difference:
SEM_{\Delta} = \sqrt{2}\,SEM
Z-score and T-score conversions:
Z = \frac{X - \mu}{\sigma}, \quad T = 50 + 10Z
Summary of core expectations for exam preparation
Understand distinctions among assessment types and planning stages.
Be familiar with ethical principles, consent, confidentiality, and cultural considerations.
Be able to explain and compute basic reliability and validity indices, including Cronbach’s alpha, KR-20/21, and Spearman-Brown, with correct formulas.
Recognize the role of different validity evidence and how to gather construct/criterion validity data (convergent, discriminant; factor analysis).
Know the differences between norm-referenced and criterion-referenced assessments, including how cut scores are set and how to interpret them.
Be able to discuss test utility analyses (Taylor-Russell, Naylor-Shine, Brogden-Cronbach-Gleser) and factors affecting utility.
Understand the structure of psychological reports and the importance of MSE and interview skills.
Be familiar with major theories of intelligence and their implications for testing (G, CHC, Gardner, etc.).
Note on breadth
This set of notes consolidates a wide range of topics covered in the transcript, including planning, ethics, measurement theory, test construction, statistics, intelligence, neuropsychology, and reporting. Use these as a framework for deeper review of each topic, and supplement with course manuals and practice problems where available.
End of notes