Comprehensive Notes on Psychological Assessment and Evaluation
Purpose of psychological assessment
Answer referral questions by gathering and integrating psychology-related data to inform evaluation using tools such as tests, interviews, case histories, etc.
Distinguish between related terms:
Psychological Assessment: data gathering, integration, and interpretation for evaluation.
Psychological Testing: measuring psychology-related variables via devices/procedures to sample behavior.
Educational Assessment (NAT, NCA): evaluate abilities/skills relevant to school success.
Retrospective Assessment: infer psychological aspects from past experiences.
Remote Assessment: online/tele-assessment data gathering when in-person isn't possible.
Ecological Momentary Assessment: in-the-moment, real-world evaluation of problems at the time and place they occur.
Planning and process of assessment
Planning Phases:
1) Referral
2) Intake Interview to Generate Hypothesis
3) Selection of Assessment Tools
Implementation Phase
4) Scoring and Interpretation of Results
5) Report Writing
6) Explanation of Results
Key concept: A systematic process aligns purpose, methods, tools, interpretation, and reporting of findings.
Purpose vs Method vs Interpretation vs Report:
Purpose: Determine referral question and hypothesis.
Method: Select tools/tests to answer the referral question.
Interpretation: Combine results to answer the referral question.
Report: Present results to the client to address the referral question.
Approaches to assessment
Collaborative Psychological Assessment: assessee and assessor co-create understanding.
Therapeutic Psychological Assessment: therapeutic self-discovery encouraged during assessment.
Dynamic Assessment: interactive model (evaluation → intervention → evaluation); common in educational settings.
Test: measuring device/procedure; Psychological Test: tool to measure psychology-related variables.
Interview: direct, face-to-face data gathering; note verbal and nonverbal cues; can be panel or motivational interviewing (therapeutic dialogue using empathy and cognition-altering techniques to affect motivation).
Portfolio: work products retained in various media (paper, canvas, film, etc.).
Case History Data: records/transcripts preserving archival information relevant to assessee; Case Study: narrative about a person/event based on case history data; Groupthink: tendency to reach consensus that may bias decisions.
Behavioral Observation: monitor actions to record quantitative/qualitative data; includes Naturalistic Observation in natural settings.
Role-Play Test: simulate a situation and evaluate expressed thoughts/behaviors/abilities.
Computer-Assisted Testing: computers assist with test administration and scoring; the term describes the process and tools, not the test-taker.
Computer Adaptive Testing (CAT): items adapt to the test-taker's ability during the test, improving precision and efficiency.
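A minimal sketch of the adaptive idea behind CAT, using hypothetical item difficulties: after each response the ability estimate is nudged up or down, and the next item is the unused one whose difficulty is closest to the current estimate. (Operational CAT uses IRT-based estimation; the fixed step size here is a simplification.)

```python
# Minimal CAT sketch with a hypothetical item bank (name -> difficulty).
def next_item(ability, bank, used):
    """Pick the unused item whose difficulty best matches the ability estimate."""
    candidates = [name for name in bank if name not in used]
    return min(candidates, key=lambda name: abs(bank[name] - ability))

def run_cat(bank, answers, start_ability=0.0, step=0.5):
    """Administer items adaptively; `answers` maps item name -> correct (bool)."""
    ability, used = start_ability, []
    for _ in range(len(answers)):
        item = next_item(ability, bank, used)
        used.append(item)
        # Step the estimate toward the response: up if correct, down if not.
        ability += step if answers[item] else -step
    return ability, used

bank = {"easy": -1.0, "medium": 0.0, "hard": 1.0}
ability, order = run_cat(bank, {"easy": True, "medium": True, "hard": False})
```

Starting at 0.0, the medium item is given first; a correct answer raises the estimate, so the hard item comes next, and so on.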
Test security and administration phases
Before the Test: ensure security, test selection, administrator competence, materials prepared, informed consent obtained.
During the Test: build rapport, explain purpose, provide clear instructions.
After the Test: safeguard results, scoring and interpretation, reporting, explaining results to test-taker.
Settings and sources
Educational Setting, Clinical Setting, Counseling Setting, Organizational Setting.
Test catalogues include: test manuals, professional books, reference books, journal articles, online databases.
Reports should articulate the conclusion and deconstruct the queried variable (e.g., what does "fitness to work" or "emotional damage" mean?).
Assessment hypothesis is grounded in the psychometrician’s theoretical orientation and guides method/tool selection.
Psychological Autopsy: retrospective reconstruction of psychological state post-mortem.
Reports should emphasize a strength-based format and honesty: beneficial, informative, truthful, compassionate, and humble (BITCH mnemonic).
Ethical and legal considerations (four pillars and professional standards)
Four Pillars of Ethical Psychological Assessment:
Right to Informed Consent: provide realistic information about test performance; for minors, inform guardians.
Right to Privacy and Confidentiality: privileged information protected by law; disclosures allowed in three main situations: harm to self, harm to others, court subpoenas.
Right to the Least Stigmatizing Label: use non-stigmatizing terminology in reporting.
Competence: foundation, acquisition, maintenance, and specialty; supervise, consult, and engage in continuing education.
Responsibilities and standards:
Relationship with clients; avoid exploitation; diversity and non-discrimination; communicate truthfully with the public.
General Ethical Principles: Beneficence/Nonmaleficence, Fidelity/Responsibility, Integrity, Justice, Respect for rights and dignity.
Areas of Competence (AFAMS): Foundation, Acquisition, Maintenance, Specialty; means of maintaining competence include supervision, education, consultation, training, self-directed learning.
Functional competencies in assessment: selection, use, integration, interpretation, reporting, explanation of use.
Informed consent, confidentiality, and ethics of child rights.
The APA ethical framework and levels of testing (Level A–C)
Level A: Basic/administration with manual guidance; general achievement tests.
Level B: Tests requiring knowledge of test construction/use and related fields (e.g., RPm, RPsy, aptitude tests).
Level C: Substantial understanding plus supervised experience (e.g., Projective tests, Stanford-Binet, MMPI).
Standards: must be followed by all psychologists.
Guidelines: aspirational rather than mandatory; aim to provide appropriate services.
Right to consent, privacy, and confidentiality (expanded)
Consent: voluntary agreement with understanding of purpose, process, risks/benefits, alternatives.
Privacy: respect for the individual; confidentiality limits when required by law, safety concerns, or court orders.
Disclosure considerations: harm to self/others, court subpoenas, professional obligations.
Value of consent for minors: guardians informed of results and recommendations.
Professional conduct and client welfare
Compassion: avoid harming clients; primum non nocere (first, do no harm).
Competence and ongoing development; seek consultation; be mindful of cultural differences; adhere to ethical governance.
Measurement scales and basic statistics (descriptive statistics and measurement concepts)
Scales of measurement:
Nominal: categorization (names, categories).
Ordinal: order/ranking.
Interval: equal intervals, no true zero.
Ratio: equal intervals with a true zero point, so ratios of scores are meaningful.
Raw score: unmodified tally of correct answers or points.
Frequency distribution: how scores occur; can be shown as histograms, bar graphs, etc.
Measures of central tendency:
Mean (x̄)
Median
Mode
Measures of variability:
Range
Interquartile Range (IQR) = Q3 - Q1
Semi-interquartile range = IQR/2
Average Deviation
Variance (σ²)
Standard Deviation (SD) = sqrt( Variance )
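The variability measures above can be checked numerically with Python's standard library, here on a small hypothetical score set:

```python
# Worked example of the variability measures on hypothetical scores.
import statistics

scores = [4, 6, 7, 8, 10, 11, 12, 14]

rng = max(scores) - min(scores)                 # Range
q1, _, q3 = statistics.quantiles(scores, n=4)  # quartiles (exclusive method)
iqr = q3 - q1                                  # Interquartile Range = Q3 - Q1
semi_iqr = iqr / 2                             # Semi-interquartile range
mean = statistics.mean(scores)
avg_dev = sum(abs(x - mean) for x in scores) / len(scores)  # Average Deviation
variance = statistics.pvariance(scores)        # population variance
sd = statistics.pstdev(scores)                 # SD = sqrt(variance)
```

For these scores the range is 10, the average deviation 2.75, the variance 9.75, and the SD is the square root of the variance, about 3.12.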
Distribution shapes and statistics
Skewness (asymmetry): Positive skew (tail to the right), Negative skew (tail to the left).
Kurtosis (peakedness): Platykurtic (flatter), Leptokurtic (more peaked), Mesokurtic (normal); the normal curve is often called Gaussian or Laplace-Gaussian.
Normal curve properties
Bell-shaped, symmetrical; mean = median = mode.
Z-scores convert raw scores to standard deviation units:
Z = \frac{X - \bar{X}}{SD}
T-scores convert Z to a scale with mean 50 and SD 10:
T = 50 + 10Z
Stanine: whole-value 1–9 scale with mean 5 and SD ≈ 2.
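The three standard-score conversions can be sketched as small helpers; the clamp in the stanine function mirrors its whole-value 1–9 range:

```python
# Standard-score conversions: Z, T, and stanine.
def z_score(x, mean, sd):
    """Raw score to standard deviation units."""
    return (x - mean) / sd

def t_score(z):
    """Z to a scale with mean 50 and SD 10."""
    return 50 + 10 * z

def stanine(z):
    """Whole-value 1-9 scale with mean 5 and SD ~2."""
    return max(1, min(9, round(5 + 2 * z)))

z = z_score(115, 100, 15)   # one SD above the mean -> 1.0
t = t_score(z)              # 60.0
s = stanine(z)              # 7
```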
Probability, and standard normal distribution concepts used to interpret test scores.
Reliability and measurement error (Classical Test Theory and extensions)
Reliability: consistency of a test across items, forms, or occasions.
Types of reliability:
Test-retest reliability: stability across time.
Parallel-forms (alternate-forms) reliability: equivalence across forms.
Split-half reliability: internal consistency across halves; often adjusted with the Spearman-Brown formula.
Inter-item consistency: internal consistency of items within a test.
Inter-scorer reliability: agreement among scorers.
Key formulas:
Cronbach's alpha (internal consistency):
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_i}{\sigma^2_X}\right)
where k = number of items, \sigma^2_i = variance of item i, and \sigma^2_X = variance of total test scores.
KR-20 (for dichotomous items):
\text{KR-20} = \frac{k}{k-1}\left(1 - \frac{\sum p_i q_i}{\sigma^2_X}\right)
where p_i = proportion correct on item i and q_i = 1 - p_i.
KR-21 (approximation assuming equal item difficulty):
\text{KR-21} = \frac{k}{k-1}\left(1 - \frac{M(k - M)}{k\,\sigma^2_X}\right)
where M is the mean total score.
Spearman-Brown prophecy formula (split-half, test length doubled):
\rho_{SB} = \frac{2r}{1 + r}
where r is the correlation between the two halves.
Standard Error of Measurement (SEM):
SEM = SD\sqrt{1 - r_{tt}}
where r_{tt} is the reliability coefficient.
Standard Error of the Difference (for comparing two scores):
SEM_{\Delta} = \sqrt{2}\cdot SEM
Confidence intervals around observed scores use the SEM (e.g., X \pm 1.96\,SEM for a 95% interval).
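These reliability formulas can be verified on a tiny hypothetical dataset (4 dichotomous items, 5 examinees); for dichotomous items, coefficient alpha equals KR-20:

```python
# Numeric check of the reliability formulas on hypothetical item data.
import math

data = [            # rows = examinees, columns = items (1 correct, 0 incorrect)
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
k = len(data[0])    # number of items

def var(xs):
    """Population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

totals = [sum(row) for row in data]
item_vars = [var([row[i] for row in data]) for i in range(k)]

# Cronbach's alpha (= KR-20 here, since every item is dichotomous)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / var(totals))

def spearman_brown(r):
    """Reliability of a doubled test from the half-test correlation r."""
    return 2 * r / (1 + r)

def sem(sd, r_tt):
    """Standard error of measurement from SD and reliability."""
    return sd * math.sqrt(1 - r_tt)
```

For example, `spearman_brown(0.5)` yields 2/3, and a test with SD 10 and reliability .84 has SEM = 4.0.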
Classical Test Theory concepts:
True score (T), Observed score (X), Error (E) with: X = T + E
Reliability is the ratio of true-score variance to total variance.
Other reliability extensions:
Domain Sampling Theory and Generalizability Theory: broader frameworks for reliability across facets/sources of variation.
Item Response Theory (IRT): models relating latent traits to item responses; useful for handling dichotomous and polytomous items; key ideas: difficulty, discrimination, and guessing parameters; supports adaptive testing.
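The three IRT parameters named above (difficulty b, discrimination a, guessing c) combine in the three-parameter logistic (3PL) model, sketched here:

```python
# 3PL item response function.
import math

def p_correct(theta, a, b, c):
    """Probability that a person with ability theta answers the item correctly."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# With no guessing (c = 0), a person whose ability equals the item's
# difficulty (theta == b) has exactly a 0.5 chance of a correct answer.
p = p_correct(theta=0.0, a=1.0, b=0.0, c=0.0)
```

Setting a = 0 flattens the curve (no discrimination), while raising c lifts its lower asymptote, modeling guessing on multiple-choice items.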
Reliability vs validity: reliability is a prerequisite for validity but does not guarantee validity; validity is about whether the test measures what it is supposed to measure.
Measurement error sources: item/sample content, test administration conditions, scoring/interpretation, etc.
Reliability considerations for different test types:
Speed tests vs power tests: speed tests often have restricted time; reliability may be estimated via alternate forms, test-retest, or split-half with adjustments (Spearman-Brown).
Validity: evidence and types
Validity refers to the degree a test measures what it claims to measure and predicts relevant outcomes.
Types of validity evidence:
Content validity: the extent to which test content covers the domain; assessed via Content Validity Ratio (CVR) and Content Validity Index (CVI).
CVR formula: CVR = \frac{n_e - N/2}{N/2} where n_e is the number of experts rating the item essential and N is the total number of experts.
CVR ranges from -1 to +1; positive values mean more than half the panel rated the item essential. Per-item retention decisions compare the CVR against a minimum critical value that depends on panel size (Lawshe's table).
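The CVR formula is a one-liner; here it is applied to a hypothetical panel of 10 experts, 8 of whom rate the item essential:

```python
# Content Validity Ratio: CVR = (n_e - N/2) / (N/2).
def cvr(n_essential, n_experts):
    """Lawshe's CVR for one item."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

value = cvr(8, 10)   # (8 - 5) / 5 = 0.6
```

A CVR of 0 means exactly half the panel called the item essential; -1 means no one did, +1 means everyone did.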
Criterion validity (concurrent/predictive): relationship with an external criterion measured at the same time (concurrent) or in the future (predictive).
Measured by correlation with the criterion (r). Higher |r| indicates stronger validity.
Construct validity: the extent to which the test scores relate to the theoretical construct; evidence from convergent and discriminant validity, factor analysis.
Convergent validity: scores correlate highly with other measures of the same construct.
Discriminant validity: scores do not correlate highly with measures of different constructs.
Factor analysis (Exploratory vs Confirmatory): identifies underlying factors and tests model fit.
Factor loading indicates the extent a factor explains a test score.
Ecological validity: how well a test predicts real-world behavior.
Face validity: whether a test appears to measure what it claims to measure; not sufficient alone for validity.
Test bias and fairness:
Test bias: systematic disadvantage to certain groups; needs to be identified and mitigated.
Test fairness: equitable use of tests across populations; consider culture, language, and access.
Validity coefficients interpretation (range-based guidance):
Content validity and reliability interplay with validity; higher validity coefficients imply stronger predictive power for intended criteria.
Validity and reliability: a valid test must be reliable, but a reliable test is not necessarily valid.
Test utility and decision analysis
Utility analysis evaluates whether benefits of using a test outweigh costs.
Common approaches:
Taylor-Russell tables: estimate the benefit of test use in selection given test validity, base rates, and selection ratios.
Naylor-Shine tables: evaluate the incremental gain in mean criterion performance by test use.
Brogden-Cronbach-Gleser method: compute the monetary gain of using a test under specific conditions.
Utility analysis factors:
Test validity (rxy)
Selection ratio (SR)
Base rate (BR) in the population
Costs and benefits (economic and non-economic)
Cut scores and decision strategies:
Fixed cut score: a single threshold to pass/fail.
Relative cut scores: norm-referenced thresholds for screening.
Multiple cut scores: for multi-stage or multi-instrument selection.
Multistage/multi-hurdle selection: sequential thresholds; compensatory models (e.g., multiple regression) versus non-compensatory models.
Methods for setting cut scores:
Angoff: expert judgments averaged to set cut scores.
Known groups method (contrast groups): use groups known to possess or lack the trait to set cutoffs.
IRT-based approaches: linking item difficulty to passing thresholds; bookmark and item-mapping methods.
Other methods: predictive yield, discriminant analysis, etc.
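Of the cut-score methods above, the Angoff method is the simplest to sketch: each judge estimates, for every item, the probability that a minimally competent examinee would answer it correctly; the cut score is the average of the judges' summed ratings. The ratings below are hypothetical:

```python
# Angoff method sketch: 3 judges rating 4 items (hypothetical probabilities
# that a minimally competent examinee answers each item correctly).
ratings = [
    [0.6, 0.7, 0.5, 0.8],   # judge 1
    [0.5, 0.6, 0.4, 0.9],   # judge 2
    [0.7, 0.8, 0.6, 0.7],   # judge 3
]

# Each judge's sum is that judge's implied passing score; average across judges.
judge_sums = [sum(row) for row in ratings]
cut_score = sum(judge_sums) / len(judge_sums)
```

Here the implied cut score is 2.6 out of 4 items; in practice, judges iterate after seeing real examinee data.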
Methods of test construction and item development
Test development lifecycle:
Conceptualization
Norm-referenced vs criterion-referenced design
Pilot work and test tryout
Item bank creation
Item formats: selected-response vs constructed-response
Scaling and standardization
Pilot testing and cross-validation (cross-validation/shrinkage)
Co-validation (co-norming) across tests
Item formats and response types:
Selected-response: multiple choice, true/false, Likert, matching
Constructed-response: short answer, essay, performance tasks
Item analysis and quality checks:
Item difficulty index (proportion correct)
Item reliability index (internal consistency per item)
Item validity index (predictive/criterion validity per item)
Item discrimination index (how well item differentiates high/low scorers)
Item-characteristic curves (ICCs)
Qualitative item analysis: think-aloud, expert panels
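The difficulty and discrimination indices above can be computed by hand for a single item; this sketch uses hypothetical right/wrong data and the common upper-lower group comparison for discrimination:

```python
# Item analysis for one item: responses (1 correct, 0 incorrect) are
# ordered by total test score, highest scorers first (hypothetical data).
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

# Difficulty index: proportion of all examinees answering correctly.
p = sum(item) / len(item)

# Discrimination index: proportion correct in the top half minus the
# proportion correct in the bottom half of scorers.
half = len(item) // 2
upper, lower = item[:half], item[half:]
d = sum(upper) / len(upper) - sum(lower) / len(lower)
```

Here p = 0.5 (moderate difficulty) and d = 0.6 (high scorers clearly outperform low scorers, as a good item requires).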
Test tryout and item revision:
Pilot items, field-test performance, revise based on analyses.
Cross-validation/revalidation with new samples; cross-validation may reduce validity estimates (shrinkage).
Item formats for computer administration:
Item banks; CAT (computerized adaptive testing); adaptive item branching (item branching).
Scales and scaling methods:
Norm-referenced vs criterion-referenced scaling
Age-based, group-based, stanine scaling
Standards for test construction:
Floor and ceiling effects: floor if test too hard; ceiling if too easy; address via item selection and test design.
Valid item criteria:
A good item is answered correctly by high scorers more often than by low scorers; items that reverse this pattern or show no difference are flagged for revision.
Special domains: intelligence, creativity, and aptitude
Intelligence (historical and theoretical perspectives):
Francis Galton, Alfred Binet, David Wechsler, Jean Piaget
Interactionism: heredity and environment interact to influence intelligence; factor-based views include Thurstone's Primary Mental Abilities and Spearman's g and s factors.
Spearman: two-factor theory (G general, S specific)
Gardner's multiple intelligences: linguistic, logical-mathematical, spatial, musical, interpersonal, intrapersonal, bodily-kinesthetic, naturalistic, etc.
Cattell-Horn-Carroll (CHC) theory: hierarchical model of broad and narrow factors beneath general intelligence; broad abilities include Gf (fluid) and Gc (crystallized), among others.
Horn's expansion: Gv (visual), Ga (auditory), Gq (quantitative), Gs (processing speed), Grw (reading/writing), Gsm (short-term memory), Glr (long-term memory/retrieval), etc.
Carroll's three-stratum theory: general (III), broad (II), narrow (I).
Thorndike: social, concrete, and abstract intelligence (distinct from Sternberg's triarchic theory of analytical, creative, and practical intelligence).
Wechsler and Stanford-Binet: major standardized intelligence batteries; WAIS/WAIS-R/WAIS-IV; SB Series; WISC for children; Wechsler scales emphasize verbal and performance measures; SB emphasizes reasoning with age-based norms.
Information-processing and cognitive theories of intelligence: processing speed, working memory, executive functions; PASS model (Planning, Attention, Simultaneous, Successive).
Creativity: original ideas, fluency, flexibility, elaboration; convergent vs divergent thinking; tests of creativity and problem solving.
Nonverbal intelligence and culture-fair/information tests: RPM, CFIT, culture-free tests; Flynn effect (rise in IQ scores over time).
Neuropsychological and clinical assessments
Neuropsychological evaluation: assesses brain-behavior relationships; includes history taking, MSE, neuropsychological tests; consider effects of medications, brain injury, diseases.
Common neuropsychological tests and constructs: executive function, abstract reasoning, memory (short-term/long-term), language, visuospatial abilities; commonly used batteries include Halstead-Reitan and others.
Neurology/Neuropsychology terms: neurons, CNS, PNS, contralateral control, lesion types (focal vs diffuse), acalculia, aphasia, amnesia, agnosia, apraxia, ataxia, etc.
Special evaluation topics: abuse/neglect, custody evaluations, forensic assessments, competency to stand trial, risk assessment, and duty to warn.
Psychological report structure and interviewing skills
A psychological report is the end product; integrates data to help life outcomes; tailor reports to client needs; cannot compensate for poor testing or incompetence.
Typical report sections:
Identifying Information
Referral Question
Evaluation Procedures
Background Information
Behavioral Observation / Mental Status Exam (MSE): ASEPTIC framework (Appearance/Behavior, Speech, Emotion, Perception, Thought Content, Insight/Judgement, Cognition)
Predisposing, Precipitating, Perpetuating, Protective factors
Ego Syntonic vs Ego Dystonic features
Interpretations and Impressions
Summary and Recommendations
Scaling and Immediacy, Probing, Priorities, Problem-Solving insights
Interviewing and counseling skills:
Assessment interviewing skills; Level 1–4 competencies; the importance of ethical conduct.
Counseling vs psychotherapy: counseling is shorter-term and aims to address life challenges; psychotherapy may address deeper issues.
Interview types: structured vs unstructured; stress/hypnotic/cognitive/collaborative interviews.
Interview response levels (Level 1–5) and active listening as the foundation for rapport.
Types of clients and ethical considerations with diverse populations.
Tests and measures in psychology
Key concepts in test development:
Norm-referenced vs criterion-referenced benchmarks.
Pilot work and item analysis; cross-validation and co-validation.
Internal consistency measures (KR-20/21, coefficient alpha, average proportional distance [APD]) and inter-scorer reliability.
Homogeneity vs heterogeneity; dynamic vs static traits; restriction/inflation of range; speed vs power tests.
Validity and bias:
Importance of validity evidence; test bias and fairness concerns; culture-fair and culture-informed assessments.
Common test domains and examples mentioned:
Intelligence tests: Stanford-Binet, WAIS/WAIS-R/WAIS-IV, WISC, RPM, CFIT, ITTER tests, PPVT, Leiter, Bayley scales, SB series, etc.
Achievement and aptitude tests: WIAT, WIAT-III, WJ-IV, K-ABC, KABC-II, CMMS, PPVT, IT-PAs, Bender Gestalt, Raven matrices, G-HDT, CPIT, etc.
Personality measures: NEO-PI-R, MMPI (and MMPI-3), CPI, 16PF, CPI-3rd edition, Guilford-Zimmerman, Rosenberg Self-Esteem, GSE, DRS, Hope Scale, LOT-R, SWLS, PANAS, etc.
Projective and semi-projective measures: Rorschach, TAT, Holtzman Inkblot, Drawings (Draw-a-Person, House-Tree-Person, Kinetic Family Drawing), Rotter’s incomplete sentences, Sentence Completion, etc.
Behavioral and neuropsychological measures: behavioral observations, behavioral assessment methods, lead roles in group settings, biofeedback, plethysmography, polygraph.
Statistical considerations in assessments
Normal distribution and standard scores (Z-scores, T-scores, Stanine):
Z-scores: Z = \frac{X - \mu}{\sigma}
T-scores: T = 50 + 10Z
Stanine: 1–9 scale with mean 5 and SD ~2.
Reliability and validity interpretation guidance:
Reliability coefficients in the .90s denote strong precision; .80s are generally adequate for many clinical decisions; .65–.70 may be a weak passing level for some uses.
Normal curve and distribution properties:
The normal curve is Gaussian; symmetry implies mean = median = mode.
Z-scores allow comparison across different distributions.
Practical measurement concerns:
Floor/Ceiling effects signal poor item range; revise item difficulties to better discriminate across ability levels.
Key historical and theoretical context (highlights)
Historical development: contributions from Galton, Binet & Simon, Wechsler, Spearman, Pearson, Rousseau, etc.
Culture and assessment: culture-specific tests, culture-free/fair tests; Flynn effect; the impact of culture on test norms and interpretation.
Important concepts: standardization, normative data, reliability, validity, test bias, fairness, and culturally informed assessment.
Practical implications and reporting
Reports should present information in strength-based language with clinically useful implications.
Interpretations should be grounded in theory, empirical evidence, and the specific referral question.
Consider the utility of the test data for decision-making in education, clinical, and organizational contexts.
Quick reference: common formulas used in assessment
Content Validity Ratio (CVR):
CVR = \frac{n_e - N/2}{N/2}
KR-20 (dichotomous items):
\text{KR-20} = \frac{k}{k-1}\left(1 - \frac{\sum p_i q_i}{\sigma^2_X}\right)
KR-21 (equal-difficulty approximation, M = mean total score):
\text{KR-21} = \frac{k}{k-1}\left(1 - \frac{M(k - M)}{k\,\sigma^2_X}\right)
Cronbach's Alpha (internal consistency):
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_i}{\sigma^2_X}\right)
Spearman-Brown correction (split-half):
\rho_{SB} = \frac{2r}{1 + r}
Standard Error of Measurement (SEM):
SEM = SD\sqrt{1 - r_{tt}}
SEM for a score difference:
SEM_{\Delta} = \sqrt{2}\,SEM
Z-score and T-score conversions:
Z = \frac{X - \mu}{\sigma}, \quad T = 50 + 10Z
Summary of core expectations for exam preparation
Understand distinctions among assessment types and planning stages.
Be familiar with ethical principles, consent, confidentiality, and cultural considerations.
Be able to explain and compute basic reliability and validity indices, including Cronbach’s alpha, KR-20/21, and Spearman-Brown, with correct formulas.
Recognize the role of different validity evidence and how to gather construct/criterion validity data (convergent, discriminant; factor analysis).
Know the differences between norm-referenced and criterion-referenced assessments, including how cut scores are set and how to interpret them.
Be able to discuss test utility analyses (Taylor-Russell, Naylor-Shine, Brogden-Cronbach-Gleser) and factors affecting utility.
Understand the structure of psychological reports and the importance of MSE and interview skills.
Be familiar with major theories of intelligence and their implications for testing (G, CHC, Gardner, etc.).
Note on breadth
This set of notes consolidates a wide range of topics covered in the transcript, including planning, ethics, measurement theory, test construction, statistics, intelligence, neuropsychology, and reporting. Use these as a framework for deeper review of each topic, and supplement with course manuals and practice problems where available.
End of notes