Psychological Assessment: Comprehensive Study Notes

  • Purpose of psychological assessment

    • Answer referral questions by gathering and integrating psychology-related data to inform evaluation using tools such as tests, interviews, case histories, etc.

    • Distinguish between related terms:

    • Psychological Assessment: data gathering, integration, and interpretation for evaluation.

    • Psychological Testing: measuring psychology-related variables via devices/procedures to sample behavior.

    • Educational Assessment (NAT, NCA): evaluate abilities/skills relevant to school success.

    • Retrospective Assessment: infer psychological aspects from past experiences.

    • Remote Assessment: online/tele-assessment data gathering when in-person isn't possible.

    • Ecological Momentary Assessment: "in the moment" evaluation of problems at the time and place they occur.

  • Planning and process of assessment

    • Planning Phase:

    • 1) Referral

    • 2) Intake Interview to Generate Hypotheses

    • 3) Selection of Assessment Tools

    • Implementation Phase:

    • 4) Scoring and Interpretation of Results

    • 5) Report Writing

    • 6) Explanation of Results

    • Key concept: A systematic process aligns purpose, methods, tools, interpretation, and reporting of findings.

    • Purpose vs Method vs Interpretation vs Report:

    • Purpose: Determine referral question and hypothesis.

    • Method: Select tools/tests to answer the referral question.

    • Interpretation: Combine results to answer the referral question.

    • Report: Present results to the client to address the referral question.

  • Approaches to assessment

    • Collaborative Psychological Assessment: assessee and assessor co-create understanding.

    • Therapeutic Psychological Assessment: therapeutic self-discovery encouraged during assessment.

    • Dynamic Assessment: interactive model (evaluation → intervention → evaluation); common in educational settings.

    • Test: measuring device/procedure; Psychological Test: tool to measure psychology-related variables.

    • Interview: direct, face-to-face data gathering; note verbal and nonverbal cues; can be panel or motivational interviewing (therapeutic dialogue using empathy and cognition-altering techniques to affect motivation).

    • Portfolio: work products retained in various media (paper, canvas, film, etc.).

    • Case History Data: records/transcripts preserving archival information relevant to the assessee; Case Study: a narrative about a person or event built from case history data; Groupthink: the tendency of deliberating groups to reach premature consensus, which can bias case decisions.

    • Behavioral Observation: monitor actions to record quantitative/qualitative data; includes Naturalistic Observation in natural settings.

    • Role-Play Test: simulate a situation and evaluate expressed thoughts/behaviors/abilities.

    • Computer-Assisted Testing: the computer assists in administering and scoring tests; "assisted" refers to help for the test user, not the test-taker.

    • Computer Adaptive Testing (CAT): items adapt to test-taker ability during the test.

  • Computer Adaptive Testing (CAT)

    • Tailors item difficulty to the test-taker's ability as the test proceeds, improving precision and efficiency (see the sketch below).
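    • A toy Python sketch of the adaptive loop (a minimal illustration assuming a 1PL/Rasch model; the item bank, fixed-step ability update, and fixed test length are made-up simplifications, not any published CAT algorithm):

```python
# Toy computerized adaptive testing (CAT) loop under a 1PL (Rasch) model.
import math
import random

def p_correct(theta: float, b: float) -> float:
    """Rasch model: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def run_cat(bank, true_theta, n_items=10):
    theta = 0.0                      # start the estimate at the mean
    remaining = list(bank)
    for _ in range(n_items):
        # Administer the unused item whose difficulty is closest to the
        # current estimate (maximum information under the 1PL model).
        item = min(remaining, key=lambda b: abs(b - theta))
        remaining.remove(item)
        correct = random.random() < p_correct(true_theta, item)
        # Illustrative fixed-step update; operational CATs re-estimate
        # ability by maximum likelihood or Bayesian (EAP) methods.
        theta += 0.5 if correct else -0.5
    return theta

bank = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
print(round(run_cat(bank, true_theta=1.0), 2))
```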

  • Test security and administration phases

    • Before the Test: ensure security, test selection, administrator competence, materials prepared, informed consent obtained.

    • During the Test: build rapport, explain purpose, provide clear instructions.

    • After the Test: safeguard results, scoring and interpretation, reporting, explaining results to test-taker.

  • Purpose, method, interpretation, and reporting (assessment planning framework)

    • Purpose: determine referral question and hypothesis.

    • Method: select tools/tests to answer the referral question.

    • Interpretation: synthesize results to answer the referral question.

    • Report: present findings to clients.

  • Settings and sources

    • Educational Setting, Clinical Setting, Counseling Setting, Organizational Setting.

    • Test catalogues include: test manuals, professional books, reference books, journal articles, online databases.

    • Reports should articulate the conclusion and deconstruct the queried variable (e.g., what does "fitness to work" or "emotional damage" mean?).

    • Assessment hypothesis is grounded in the psychometrician’s theoretical orientation and guides method/tool selection.

    • Psychological Autopsy: retrospective reconstruction of psychological state post-mortem.

    • Reports should use a strength-based format and be truthful, beneficial, informative, compassionate, and humble; the same qualities should guide how referral and research questions are framed.

  • Ethical and legal considerations (four pillars and professional standards)

    • Four Pillars of Ethical Psychological Assessment:

    • Right to Informed Consent: provide realistic information about test performance; for minors, inform guardians.

    • Right to Privacy and Confidentiality: privileged information protected by law; disclosures allowed in three main situations: harm to self, harm to others, court subpoenas.

    • Right to the Least Stigmatizing Label: use non-stigmatizing terminology in reporting.

    • Competence: foundational, acquisition, maintenance, and specialty competencies; supervise, consult, and engage in continuing education.

    • Responsibilities and standards:

    • Relationship with clients; avoid exploitation; diversity and non-discrimination; communicate truthfully with the public.

    • General Ethical Principles: Beneficence/Nonmaleficence, Fidelity/Responsibility, Integrity, Justice, Respect for People's Rights and Dignity.

    • Areas of Competence (AFAMS): Foundation, Acquisition, Maintenance, Specialty; means of maintaining competence include supervision, education, consultation, training, self-directed learning.

    • Functional competencies in assessment: selection, use, integration, interpretation, reporting, explanation of use.

    • Informed consent, confidentiality, and ethics of child rights.

  • The APA ethical framework and levels of testing (Level A–C)

    • Level A: Basic/administration with manual guidance; general achievement tests.

    • Level B: tests requiring technical knowledge of test construction, use, and supporting fields (e.g., aptitude tests); restricted to qualified users such as RPm/RPsy licensees.

    • Level C: Substantial understanding plus supervised experience (e.g., Projective tests, Stanford-Binet, MMPI).

    • Standards: must be followed by all psychologists.

    • Guidelines: aspirational rather than mandatory; aim to provide appropriate services.

  • Right to consent, privacy, and confidentiality (expanded)

    • Consent: voluntary agreement with understanding of purpose, process, risks/benefits, alternatives.

    • Privacy: respect for the individual; confidentiality limits when required by law, safety concerns, or court orders.

    • Disclosure considerations: harm to self/others, court subpoenas, professional obligations.

    • Value of consent for minors: guardians informed of results and recommendations.

  • Professional conduct and client welfare

    • Compassion: avoid harming clients; primum non nocere (first, do no harm).

    • Competence and ongoing development; seek consultation; be mindful of cultural differences; adhere to ethical governance.

  • Measurement scales and basic statistics (descriptive statistics and measurement concepts)

    • Scales of measurement:

    • Nominal: categorization (names, categories).

    • Ordinal: order/ranking.

    • Interval: equal intervals, no true zero.

    • Ratio: equal intervals with a true, meaningful zero.

    • Raw score: unmodified tally of correct answers or points.

    • Frequency distribution: how scores occur; can be shown as histograms, bar graphs, etc.

    • Measures of central tendency:

    • Mean (x̄)

    • Median

    • Mode

    • Measures of variability:

    • Range

    • Interquartile Range (IQR) = Q3 - Q1

    • Semi-interquartile range = IQR/2

    • Average Deviation

    • Variance (\sigma^2)

    • Standard Deviation (SD) = \sqrt{\text{Variance}}

    • Distribution shapes and statistics

    • Skewness (asymmetry): Positive skew (tail to the right), Negative skew (tail to the left).

    • Kurtosis (pointiness): Platykurtic, Leptokurtic, Mesokurtic; Normal curve often called Gaussian or Laplace-Gaussian.

    • Normal curve properties

    • Bell-shaped, symmetrical; mean = median = mode.

    • Z-scores convert raw scores to standard deviation units:
      Z = \frac{X - \bar{X}}{SD}

    • T-scores convert Z to a mean of 50 and SD of 10:
      T = 50 + 10Z

    • Stanine: whole-number standard scores from 1 to 9, with mean 5 and SD ≈ 2.

    • Probability and standard normal distribution concepts are used to interpret test scores.
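    • A minimal Python sketch of the score conversions above (the raw-score list is made up; the stanine line uses the common "2Z + 5, rounded and clipped to 1–9" approximation):

```python
# Raw scores -> z-scores, T-scores, and stanines.
import statistics

scores = [12, 15, 18, 20, 22, 25, 28]      # illustrative raw scores
mean = statistics.mean(scores)             # x-bar
sd = statistics.stdev(scores)              # sample standard deviation

for x in scores:
    z = (x - mean) / sd                    # Z = (X - x̄) / SD
    t = 50 + 10 * z                        # T = 50 + 10Z
    stanine = min(9, max(1, round(2 * z + 5)))
    print(f"X={x:2d}  z={z:+.2f}  T={t:5.1f}  stanine={stanine}")
```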

  • Reliability and measurement error (Classical Test Theory and extensions)

    • Reliability: consistency of a test across items, forms, or occasions.

    • Types of reliability:

    • Test-retest reliability: stability across time.

    • Parallel-forms (alternate-forms) reliability: equivalence across forms.

    • Split-half reliability: internal consistency across halves; often adjusted with the Spearman-Brown formula.

    • Inter-item consistency: internal consistency of items within a test.

    • Inter-scorer reliability: agreement among scorers.

    • Key formulas:

    • Cronbach's alpha (internal consistency):
      \alpha = \frac{k}{k-1} \left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right) where k = number of items, \sigma_i^2 = variance of item i, and \sigma_X^2 = variance of total test scores.

    • KR-20 (for dichotomous items):
      \text{KR-20} = \frac{k}{k-1} \left(1 - \frac{\sum p_i q_i}{\sigma_X^2} \right) where p_i = proportion correct on item i and q_i = 1 - p_i.

    • KR-21 (assumes items of equal difficulty):
      \text{KR-21} = \frac{k}{k-1} \left(1 - \frac{k\,\bar{p}(1-\bar{p})}{\sigma_X^2} \right)
      where \bar{p} is the average item difficulty (mean proportion correct) and \sigma_X^2 is the total-score variance.

    • Spearman-Brown prophecy formula (split-half):
      \rho_{SB} = \frac{2r}{1 + r}
      where r is the correlation between the two halves.

    • Standard Error of Measurement (SEM):
      SEM = SD \sqrt{1 - r_{tt}} where r_{tt} is the reliability coefficient.

    • Standard Error of the Difference (for comparing two scores with equal SEMs):
      SEM_{\Delta} = \sqrt{2} \cdot SEM

    • Confidence intervals around observed scores often use SEM (e.g., for a CI around a true score).
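    • A minimal Python sketch of these reliability computations on a small, made-up 0/1 item matrix (population variances are used, so KR-20 coincides with coefficient alpha; the split-half lines need Python 3.10+ for statistics.correlation):

```python
# Coefficient alpha, KR-20, Spearman-Brown, SEM, and a score CI.
import statistics

data = [            # rows = examinees, columns = items (0/1)
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
]
k = len(data[0])
totals = [sum(row) for row in data]
var_total = statistics.pvariance(totals)                 # sigma^2_X

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)
item_vars = [statistics.pvariance([row[i] for row in data]) for i in range(k)]
alpha = k / (k - 1) * (1 - sum(item_vars) / var_total)

# KR-20: p*q replaces the item variance (identical for 0/1 data).
p = [sum(row[i] for row in data) / len(data) for i in range(k)]
kr20 = k / (k - 1) * (1 - sum(pi * (1 - pi) for pi in p) / var_total)

# Spearman-Brown on an odd/even split of the items.
odd = [sum(row[0::2]) for row in data]
even = [sum(row[1::2]) for row in data]
r_half = statistics.correlation(odd, even)
sb = 2 * r_half / (1 + r_half)

# SEM (using alpha as r_tt), SEM of a difference, and a 95% CI.
sem = statistics.pstdev(totals) * (1 - alpha) ** 0.5
sem_diff = 2 ** 0.5 * sem
x_obs = totals[0]
print(f"alpha={alpha:.3f}  KR-20={kr20:.3f}  SB={sb:.3f}")
print(f"SEM={sem:.2f}  SEM_diff={sem_diff:.2f}  "
      f"95% CI for X={x_obs}: [{x_obs - 1.96*sem:.2f}, {x_obs + 1.96*sem:.2f}]")
```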

    • Classical Test Theory concepts:

    • True score (T), Observed score (X), Error (E) with: X = T + E

    • Reliability is the ratio of true-score variance to total variance.

    • Other reliability extensions:

    • Domain Sampling Theory and Generalizability Theory: broader frameworks for reliability across facets/sources of variation.

    • Item Response Theory (IRT): models relating latent traits to item responses; useful for handling dichotomous and polytomous items; key ideas: difficulty, discrimination, and guessing parameters; supports adaptive testing.
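    • A one-function Python sketch of an IRT item response function (3PL form with the standard parameterization: a = discrimination, b = difficulty, c = guessing; the example values are made up):

```python
# 3PL item response function: P(correct | theta).
import math

def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability that a person at ability theta answers the item correctly."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# An average-ability examinee on a moderately easy multiple-choice item.
print(round(p_3pl(theta=0.0, a=1.2, b=-0.5, c=0.20), 3))
```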

    • Reliability vs validity: reliability is a prerequisite for validity but does not guarantee validity; validity is about whether the test measures what it is supposed to measure.

    • Measurement error sources: item/sample content, test administration conditions, scoring/interpretation, etc.

    • Reliability considerations for different test types:

    • Speed tests vs power tests: speed tests often have restricted time; reliability may be estimated via alternate forms, test-retest, or split-half with adjustments (Spearman-Brown).

  • Validity: evidence and types

    • Validity refers to the degree a test measures what it claims to measure and predicts relevant outcomes.

    • Types of validity evidence:

    • Content validity: the extent to which test content covers the domain; assessed via Content Validity Ratio (CVR) and Content Validity Index (CVI).

      • CVR formula: CVR = \frac{n_e - N/2}{N/2} where n_e is the number of experts rating an item essential and N is the total number of experts.

      • CVR ranges from -1 to +1; positive values mean more than half the panel rated the item essential. Per-item retain/reject decisions compare CVR with Lawshe's critical values, which vary with panel size.
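      • Worked example (illustrative numbers): with N = 10 experts, of whom n_e = 8 rate an item essential,
        CVR = \frac{8 - 10/2}{10/2} = \frac{3}{5} = 0.60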

    • Criterion validity (concurrent/predictive): relationship with an external criterion measured at the same time (concurrent) or in the future (predictive).

      • Measured by correlation with the criterion (r). Higher |r| indicates stronger validity.

    • Construct validity: the extent to which the test scores relate to the theoretical construct; evidence from convergent and discriminant validity, factor analysis.

      • Convergent validity: scores correlate highly with other measures of the same construct.

      • Discriminant validity: scores do not correlate highly with measures of different constructs.

      • Factor analysis (Exploratory vs Confirmatory): identifies underlying factors and tests model fit.

      • Factor loading indicates the extent a factor explains a test score.

    • Ecological validity: how well a test predicts real-world behavior.

    • Face validity: whether a test appears to measure what it claims to measure; not sufficient alone for validity.

    • Test bias and fairness:

    • Test bias: systematic disadvantage to certain groups; needs to be identified and mitigated.

    • Test fairness: equitable use of tests across populations; consider culture, language, and access.

    • Interpreting validity coefficients:

    • Higher validity coefficients indicate stronger predictive power for the intended criterion; a test's reliability sets an upper bound on its attainable validity.

    • Validity and reliability: a valid test must be reliable, but a reliable test is not necessarily valid.

  • Test utility and decision analysis

    • Utility analysis evaluates whether benefits of using a test outweigh costs.

    • Common approaches:

    • Taylor-Russell tables: estimate the benefit of test use in selection given test validity, base rates, and selection ratios.

    • Naylor-Shine tables: evaluate the incremental gain in mean criterion performance by test use.

    • Brogden-Cronbach-Gleser method: compute the monetary gain of using a test under specific conditions.

    • Utility analysis factors:

    • Test validity (r_{xy})

    • Selection ratio (SR)

    • Base rate (BR) in the population

    • Costs and benefits (economic and non-economic)
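    • Worked example of the Brogden-Cronbach-Gleser gain (one common form; all numbers are illustrative): with N = 10 hires, average tenure T = 2 years, validity r_{xy} = .40, SD_y = 10{,}000 monetary units of job performance, mean standardized predictor score of those hired \bar{Z}_x = 0.80, and total testing cost C = 5{,}000,
      \Delta U = N \, T \, r_{xy} \, SD_y \, \bar{Z}_x - C = 10 \times 2 \times 0.40 \times 10{,}000 \times 0.80 - 5{,}000 = 59{,}000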

    • Cut scores and decision strategies:

    • Fixed cut score: a single threshold to pass/fail.

    • Relative cut scores: norm-referenced thresholds for screening.

    • Multiple cut scores: for multi-stage or multi-instrument selection.

    • Multistage/multi-hurdle selection: sequential thresholds; compensatory models (e.g., multiple regression) versus non-compensatory models.

    • Methods for setting cut scores:

    • Angoff: expert judgments averaged to set cut scores.

    • Known groups method (contrast groups): use groups known to possess or lack the trait to set cutoffs.

    • IRT-based approaches: linking item difficulty to passing thresholds; bookmark and item-mapping methods.

    • Other methods: predictive yield, discriminant analysis, etc.
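    • Worked Angoff example (illustrative numbers): three judges estimate, item by item, the probability that a minimally competent examinee answers each of 5 items correctly; the judges' summed probabilities are 3.2, 3.6, and 3.4, so the cut score is their average, (3.2 + 3.6 + 3.4)/3 = 3.4 points out of 5.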

  • Methods of test construction and item development

    • Test development lifecycle:

    • Conceptualization

    • Norm-referenced vs criterion-referenced design

    • Pilot work and test tryout

    • Item bank creation

    • Item formats: selected-response vs constructed-response

    • Scaling and standardization

    • Pilot testing and cross-validation (cross-validation/shrinkage)

    • Co-validation (co-norming) across tests

    • Item formats and response types:

    • Selected-response: multiple choice, true/false, Likert, matching

    • Constructed-response: short answer, essay, performance tasks

    • Item analysis and quality checks:

    • Item difficulty index (proportion correct)

    • Item reliability index (internal consistency per item)

    • Item validity index (predictive/criterion validity per item)

    • Item discrimination index (how well item differentiates high/low scorers)

    • Item-characteristic curves (ICCs)

    • Qualitative item analysis: think-aloud, expert panels
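    • A minimal Python sketch of the difficulty and discrimination indices above (the response matrix and the 50/50 high/low split are illustrative; 27% extreme groups are the common convention):

```python
# Item difficulty (p) and discrimination (D) from a person x item 0/1 matrix.
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
n = len(data)
ranked = sorted(data, key=sum, reverse=True)      # order by total score
upper, lower = ranked[: n // 2], ranked[n // 2:]  # high vs low scorers

for i in range(len(data[0])):
    p = sum(row[i] for row in data) / n                # difficulty index
    p_u = sum(row[i] for row in upper) / len(upper)    # p among high scorers
    p_l = sum(row[i] for row in lower) / len(lower)    # p among low scorers
    d = p_u - p_l                                      # discrimination index
    print(f"item {i + 1}: p={p:.2f}  D={d:+.2f}")
```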

    • Test tryout and item revision:

    • Pilot items, field-test performance, revise based on analyses.

    • Cross-validation/revalidation with new samples; cross-validation may reduce validity estimates (shrinkage).

    • Item formats for computer administration:

    • Item banks; computerized adaptive testing (CAT); item branching (adaptive routing based on prior responses).

    • Scales and scaling methods:

    • Norm-referenced vs criterion-referenced scaling

    • Age-based, group-based, stanine scaling

    • Standards for test construction:

    • Floor and ceiling effects: floor if test too hard; ceiling if too easy; address via item selection and test design.

    • Valid item criteria:

    • A good item is answered correctly by high scorers more often than by low scorers; items showing the reverse pattern (negative discrimination) are revised or discarded.

  • Special domains: intelligence, creativity, and aptitude

    • Intelligence (historical and theoretical perspectives):

    • Francis Galton, Alfred Binet, David Wechsler, Jean Piaget

    • Interactionism: heredity and environment interact in shaping intelligence; related factor theories include Thurstone's Primary Mental Abilities and Spearman's g and s factors.

    • Spearman: two-factor theory (G general, S specific)

    • Gardner's multiple intelligences: linguistic, logical-mathematical, spatial, musical, interpersonal, intrapersonal, bodily-kinesthetic, naturalistic, etc.

    • Cattell-Horn-Carroll (CHC) theory: hierarchical broad and narrow factors (Gf, Gc, etc.) organized beneath general intelligence.

    • Horn's expansion: Gv (visual), Ga (auditory), Gq (quantitative), Gs (processing speed), Grw (reading/writing), Gsm (short-term memory), Glr (long-term memory/retrieval), etc.

    • Carroll's three-stratum theory: general (III), broad (II), narrow (I).

    • Thorndike: three types of intelligence: social, concrete, and abstract.

    • Wechsler and Stanford-Binet: major standardized intelligence batteries; WAIS/WAIS-R/WAIS-IV; SB Series; WISC for children; Wechsler scales emphasize verbal and performance measures; SB emphasizes reasoning with age-based norms.

    • Information-processing and cognitive theories of intelligence: processing speed, working memory, executive functions; PASS model (Planning, Attention, Simultaneous, Successive).

    • Creativity: original ideas, fluency, flexibility, elaboration; convergent vs divergent thinking; tests of creativity and problem solving.

    • Nonverbal intelligence and culture-fair/culture-free tests: RPM, CFIT; Flynn effect (gradual rise in measured IQ across generations).

  • Neuropsychological and clinical assessments

    • Neuropsychological evaluation: assesses brain-behavior relationships; includes history taking, MSE, neuropsychological tests; consider effects of medications, brain injury, diseases.

    • Common neuropsychological tests and constructs: executive function, abstract reasoning, memory (short-term/long-term), language, visuospatial abilities; commonly used batteries include Halstead-Reitan and others.

    • Neurology/Neuropsychology terms: neurons, CNS, PNS, contralateral control, lesion types (focal vs diffuse), acalculia, aphasia, amnesia, agnosia, apraxia, ataxia, etc.

    • Special evaluation topics: abuse/neglect, custody evaluations, forensic assessments, competency to stand trial, risk assessment, and duty to warn.

  • Psychological report structure and interviewing skills

    • A psychological report is the end product of assessment; it integrates data to inform decisions about the client's life; tailor reports to the reader's needs; a well-written report cannot compensate for poor testing or incompetent interpretation.

    • Typical report sections:

    • Identifying Information

    • Referral Question

    • Evaluation Procedures

    • Background Information

    • Behavioral Observation / Mental Status Exam (MSE): ASEPTIC framework (Appearance/Behavior, Speech, Emotion, Perception, Thought Content, Insight/Judgement, Cognition)

    • Predisposing, Precipitating, Perpetuating, Protective factors

    • Ego Syntonic vs Ego Dystonic features

    • Interpretations and Impressions

    • Summary and Recommendations

    • Related interviewing techniques: scaling, immediacy, probing, prioritizing, and problem-solving.

    • Interviewing and counseling skills:

    • Assessment interviewing skills; Level 1–4 competencies; the importance of ethical conduct.

    • Counseling vs psychotherapy: counseling is shorter-term and aims to address life challenges; psychotherapy may address deeper issues.

    • Interview types: structured vs unstructured; stress/hypnotic/cognitive/collaborative interviews.

    • Interview response levels (Levels 1–5) and active listening as the foundation for rapport.

    • Types of clients and ethical considerations with diverse populations.

  • Tests and measures in psychology

    • Key concepts in test development:

    • Norm-referenced vs criterion-referenced benchmarks.

    • Pilot work and item analysis; cross-validation and co-validation.

    • Internal consistency measures (KR-20/21, coefficient alpha, average proportional distance [APD]) and inter-scorer reliability.

    • Homogeneity vs heterogeneity; dynamic vs static traits; restriction/inflation of range; speed vs power tests.

    • Validity and bias:

    • Importance of validity evidence; test bias and fairness concerns; culture-fair and culture-informed assessments.

    • Common test domains and examples mentioned:

    • Intelligence tests: Stanford-Binet, WAIS/WAIS-R/WAIS-IV, WISC, RPM, CFIT, ITTER tests, PPVT, Leiter, Bayley scales, SB series, etc.

    • Achievement and aptitude tests: WIAT, WIAT-III, WJ-IV, K-ABC, KABC-II, CMMS, PPVT, IT-PAs, Bender Gestalt, Raven matrices, G-HDT, CPIT, etc.

    • Personality measures: NEO-PI-R, MMPI (and MMPI-3), CPI (3rd edition), 16PF, Guilford-Zimmerman, Rosenberg Self-Esteem, GSE, DRS, Hope Scale, LOT-R, SWLS, PANAS, etc.

    • Projective and semi-projective measures: Rorschach, TAT, Holtzman Inkblot, Drawings (Draw-a-Person, House-Tree-Person, Kinetic Family Drawing), Rotter’s incomplete sentences, Sentence Completion, etc.

    • Behavioral and neuropsychological measures: behavioral observations, behavioral assessment methods, leaderless-group exercises, biofeedback, plethysmography, polygraph.

  • Statistical considerations in assessments

    • Normal distribution and standard scores (Z-scores, T-scores, Stanine):

    • Z-scores: Z = \frac{X - \mu}{\sigma}

    • T-scores: T = 50 + 10Z

    • Stanine: 1–9 scale with mean 5 and SD ≈ 2.

    • Reliability and validity interpretation guidance:

    • Reliability coefficients in the .90s denote strong precision; .80s are generally adequate for many clinical decisions; .65–.70 may be a weak passing level for some uses.

    • Normal curve and distribution properties:

    • The normal curve is Gaussian; symmetry implies mean = median = mode.

    • Z-scores allow comparison across different distributions.

    • Practical measurement concerns:

    • Floor/Ceiling effects signal poor item range; revise item difficulties to better discriminate across ability levels.

  • Key historical and theoretical context (highlights)

    • Historical development: contributions from Galton, Binet & Simon, Wechsler, Spearman, Pearson, Rousseau, etc.

    • Culture and assessment: culture-specific tests, culture-free/fair tests; Flynn effect; the impact of culture on test norms and interpretation.

    • Important concepts: standardization, normative data, reliability, validity, test bias, fairness, and culturally informed assessment.

  • Practical implications and reporting

    • Reports should present information in strength-based language with clinically useful implications.

    • Interpretations should be grounded in theory, empirical evidence, and the specific referral question.

    • Consider the utility of the test data for decision-making in education, clinical, and organizational contexts.

  • Quick reference: common formulas used in assessment

    • Content Validity Ratio (CVR):
      CVR = \frac{n_e - N/2}{N/2}

    • KR-20 (dichotomous items):
      \text{KR-20} = \frac{k}{k-1} \left(1 - \frac{\sum p_i q_i}{\sigma_X^2} \right)

    • KR-21 (equal-difficulty items):
      \text{KR-21} = \frac{k}{k-1} \left(1 - \frac{k\,\bar{p}(1-\bar{p})}{\sigma_X^2} \right)

    • Cronbach's Alpha (internal consistency):
      \alpha = \frac{k}{k-1} \left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2} \right)

    • Spearman-Brown correction (split-half):
      \rho_{SB} = \frac{2r}{1 + r}

    • Standard Error of Measurement (SEM):
      SEM = SD \sqrt{1 - r_{tt}}

    • SEM for score difference:
      SEM_{\Delta} = \sqrt{2}\, SEM

    • Z-score and T-score conversions:
      Z = \frac{X - \mu}{\sigma}, \quad T = 50 + 10Z

  • Summary of core expectations for exam preparation

    • Understand distinctions among assessment types and planning stages.

    • Be familiar with ethical principles, consent, confidentiality, and cultural considerations.

    • Be able to explain and compute basic reliability and validity indices, including Cronbach’s alpha, KR-20/21, and Spearman-Brown, with correct formulas.

    • Recognize the role of different validity evidence and how to gather construct/criterion validity data (convergent, discriminant; factor analysis).

    • Know the differences between norm-referenced and criterion-referenced assessments, including how cut scores are set and how to interpret them.

    • Be able to discuss test utility analyses (Taylor-Russell, Naylor-Shine, Brogden-Cronbach-Gleser) and factors affecting utility.

    • Understand the structure of psychological reports and the importance of MSE and interview skills.

    • Be familiar with major theories of intelligence and their implications for testing (G, CHC, Gardner, etc.).

  • Note on breadth

    • This set of notes consolidates a wide range of topics covered in the transcript, including planning, ethics, measurement theory, test construction, statistics, intelligence, neuropsychology, and reporting. Use these as a framework for deeper review of each topic, and supplement with course manuals and practice problems where available.

  • End of notes