Notes on Assessment Practices and Expert Judgment Methods in Forensic Psychology and Psychiatry
Forensic Assessment Practices and Judgment Methods: An International Snapshot
Topic and scope
International survey of forensic examiners who are members of professional associations.
Focus: use of structured assessment tools to aid expert judgment in forensic psychology/psychiatry evaluations.
Key questions: relative frequency of different referrals; what tools are used globally; frequency/type of structured tools; practitioners’ rationales for using/not using tools.
Main findings: most evaluations used tools (74.2%), and tools tended to be used in groups (average 4 per case); extreme diversity of tools (286 different tools observed).
Implications: encourages structured decision methods, improved reliability/validity, and integration of case-relevant information.
Core concepts and definitions
Actuarial tools: mechanical, formula-based tools.
Structured Professional Judgment (SPJ) tools: present evidence-based factors with guidelines; no fixed algorithm to combine data; lies between actuarial and unaided clinical judgment.
Forensic tools can be categorized into three classes (Heilbrun, Grisso):
Forensic Assessment Instruments (FAIs): tools designed to assess abilities or propensities directly tied to legal questions (e.g., CST abilities).
Forensically Relevant Instruments (FRIs): measures clinical constructs relevant to psycholegal concepts (e.g., psychopathy, malingering).
Clinical Assessment Instruments (CAIs): standard clinical tests used for diagnosis, symptoms, and intervention planning.
Strategic framing (Kahneman, 2011; Faust & Ahern, 2012; Grisso, 2003): emphasize balancing data breadth with the need for reliable, valid indicators; cognitive limits suggest focusing on essential information.
The Current Study: aims and rationale
What is the degree of adoption of structured tools in routine forensic practice internationally?
Under what conditions do professionals justify or not justify using these tools?
Methodological advance: respondents described their use in their two most recent forensic evaluations, providing sampling of cases rather than self-reported frequency alone.
Context: prior surveys often asked about tool usage in single evaluations or relied on self-estimated frequencies.
Methods: procedure, participants, and materials
Procedure
IRB approval obtained; online survey designed with REDCap.
Invitations sent to professional forensic mental health associations; two-week reminders.
Respondents answered questions about two most recent evaluations, referencing actual reports.
Estimated survey duration: ~15 minutes.
Definitions and questions
Forensic mental health evaluation: psychological/psychiatric assessment in a legal context.
Ask about referral question, information sources, use of standardized tools (tests, instruments, checklists, rating systems), which tools used, reasons for use/not use, report length, duration from referral to completion, and demographics.
Participants
Population: psychologist and psychiatrist members of professional forensic associations in the US, Canada, Australia/NZ, and Europe.
Total respondents: 434, reporting on 868 cases.
Education/experience: predominantly doctoral-level clinicians (≈91%), with master's-level clinicians at ≈7.4%; more psychologists (51%) than psychiatrists (6%). Experience: average of 16.56 years (SD = 12.01).
Board certification: about 16.4% (e.g., ABFP, Royal College, etc.).
Practice location: US (44.7%), Canada (6.9%), Australia/NZ (4.2%), Europe (2.8%).
Within the US, representation across 39 states and DC.
Results: what referrals occurred, tool usage, and information sources
Referral questions: Figure 1 shows the relative percentages of referrals across 868 cases.
Most common referral: Competence to Stand Trial (CST).
Next most frequent: criminal risk, insanity (criminal responsibility), and sentencing aid evaluations.
Civil proceedings (adult and child) also common.
Other categories (e.g., False Confession, Immigration, Fitness for Duty, Capacity to Waive Miranda Rights) were infrequent (some at <1% of cases).
The ten most frequent referral types (each with n ≥ 25 reports) were analyzed with detailed statistics in Table 1.
How long evaluations took and report length (per referral, Table 1)
Shortest completion time: Workplace/Employment Disability evaluations (~18 days).
Longest: Child Custody evaluations (~44 days).
Other referrals generally 25–36 days.
Shortest reports: CST (~13 pages).
Longest reports: Child Custody (~32 pages).
Use of structured tools (overall and by referral)
Overall: 74.2% of evaluations used one or more tools.
Among those using tools, most used several (average 4 tools; range up to 18).
Observed extreme diversity: 286 different tools across the ten referral areas; few cases used the same exact set of tools.
Tool usage by referral: criminal risk assessments and child protection evaluations most likely to use tools (≥ 89%); CST evaluations least likely to use tools (58.4%).
Other information sources used (Table 2)
Examinee interviews: nearly universal across referrals (≈99%).
Mental health/medical records: commonly used (≈72.1–100% depending on referral).
Justice system records (police reports, etc.): very high use in violence, insanity, etc. (up to 97%), lower in disability (~17.2%).
Collateral interviews (professional and non-professional): variable; most common in Child Custody; professional collaterals used in some categories (up to ~72.1% for some referrals).
Educational records: varied (≈3.4–40.3% depending on referral).
Additional observations and biological tests used variably depending on referral (e.g., trauma symptom inventories more common in Disability/Civil Tort; biological tests more common in Insanity).
Table 3: Ten most frequently used tools per referral type (by percent of evaluations using each tool)
Overall trend: MMPI variants and WAIS are among the most common CAIs; many tools are versionally aggregated (MMPI-2/MMPI-A/MMPI-2-RF, etc.).
CST: MMPI (15.2%), WAIS (11.8%), ECST-R (7.4%), etc.
Violence Risk: HCR-20 (35.6%), Static (65.9%), MMPI (21.8%), MCMI (19.4%), PAI (17.9%), etc.
Sex Offender Risk: PCL-R (35.6%), TOMM (9.9%), MMPI (27.3%), PAI (17.9%), etc.
Insanity: PCL-R (35.2%), MMPI (27.3%), MCMI (27.9%), Trauma measures (27.6%), etc.
Aid in Sentencing: PAI (14.9%), MMPI (14.9%), TOMM (16.7%), etc.
Disability: CAIs most prominent, e.g., MMPI (21%), MCMI (19.4%), TOMM (9.9%), etc.
Child Custody: MMPI (60.5%), WAIS (19.4%), PAI (25.6%), etc.
Civil Commitment: MMPI (44.4%), MCMI (40.7%), PAI (25.6%), etc.
Child Protection: MMPI (60.5%), MCMI (40.7%), PAI (25.6%), etc.
Civil Tort: MMPI (44.4%), MCMI (40.7%), PAI (25.6%), etc.
Notable trends
CAIs (e.g., WAIS, MMPI, PAI) are common across many referrals; FRIs (e.g., PCL-R, TOMM, HCR-20) are especially prominent in violence/sex-offense-related referrals; FAIs (e.g., CST-specific instruments like ECST-R, MacCAT-CA) appear mainly in CST and related areas.
Response style/malingering tools (e.g., TOMM, M-FAST, SIRS) appear among the top 10 in several categories, especially Insanity and CST, and to a lesser extent in Disability.
Some tools are highly tailored for specific referrals (e.g., HCR-20 and VRAG for Violence Risk).
Summary interpretation of tools
110 unique tools observed in Violence Risk Assessments alone.
Tools with broader use across referrals: MMPI, PAI, WAIS, MCMI, and trauma inventories (TSI, PCL-M).
There is no one-size-fits-all battery; tool selection is highly case-specific and varies widely across referrals.
Theoretical framing and interpretation
Heilbrun et al. three-category taxonomy for tools
FAIs (Forensic Assessment Instruments): tools designed to directly assess legal-relevant abilities or propensities (e.g., CST tools like ECST-R, MacCAT-CA).
FRIs (Forensically Relevant Instruments): measures of clinical constructs that are relevant to psycholegal questions (e.g., PCL-R, TOMM).
CAIs (Clinical Assessment Instruments): standard clinical tests (e.g., WAIS, MMPI, PAI).
Implication of diversity
Diversity offers case-specific flexibility and may reflect robust clinical training to select tests that fit hypotheses.
However, excessive diversity could undermine reliability and interrater agreement; attorneys and courts face a bewildering array of tools.
The authors propose balancing diversity with standardization by emphasizing psychometric soundness and guided tool selection.
Psychometric considerations and recommendations
Many tools used have substantial empirical support, but not all observed tools have strong psychometric properties.
Recommend using tools with sound psychometric properties when available and relevant to the referral question.
Propose adopting a checklist (from Heilbrun et al.) for tool selection:
Commercial publication and distribution
Available test manual
Demonstrated reliability and validity for the intended purpose
Peer review
Known decision-making formulas
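The checklist above can be expressed as a simple screening sketch (a minimal illustration, not the authors' implementation; the field and function names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class ToolProfile:
    """Hypothetical record of the five Heilbrun et al. criteria for one tool."""
    commercially_published: bool
    has_test_manual: bool
    reliability_validity_for_purpose: bool
    peer_reviewed: bool
    known_decision_formulas: bool

def passes_heilbrun_screen(tool: ToolProfile) -> bool:
    # A tool passes the screen only if every criterion is met.
    return all(vars(tool).values())

# Example: a tool lacking peer review fails the screen.
candidate = ToolProfile(True, True, True, False, True)
print(passes_heilbrun_screen(candidate))  # False
```

The point of the sketch is that the criteria are conjunctive: a tool that fails any one of the five should prompt caution before it anchors an expert opinion.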
Encourage training to prioritize tools with demonstrated empirical support and to progressively integrate CAIs/FRIs where needed to address multiple hypotheses.
How much data is enough? Implications for decision making
Forensic evaluators collect data from multiple sources (interviews, records, collateral info, tests).
The study notes that most cases used 3–5 tools on average, with some evaluations using many tools (up to 18 in some referrals).
The authors discuss decision science guidance: integrating too many data points can overwhelm judgment; recommended focus on four to six essential variables that are highly reliable and valid, ensuring minimal overlap between variables.
Example of aggregation: combining an intelligence test, an achievement test, an adaptive behavior test, and school records to form an overall cognitive capacity indicator, which becomes one of the four to six essential data points.
The goal is to avoid excessive data gathering while preserving diagnostic and prognostic validity.
Caveats: the four-to-six-variable guideline is a heuristic, not a rigid rule; the exact essential set will vary by case and referral question.
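The aggregation idea can be sketched in Python with standardized (z) scores, so that several overlapping cognitive measures contribute a single essential data point rather than three or four. The scores and norms below are hypothetical, not from the study:

```python
from statistics import mean

def z_score(value, norm_mean, norm_sd):
    """Standardize a raw score against its published norm mean and SD."""
    return (value - norm_mean) / norm_sd

# Hypothetical battery: each measure normed to mean 100, SD 15.
battery = [
    ("intelligence_test", 92),
    ("achievement_test", 88),
    ("adaptive_behavior", 85),
]
zs = [z_score(score, 100, 15) for _, score in battery]
cognitive_composite = mean(zs)  # one indicator among the 4-6 essential variables
print(round(cognitive_composite, 2))  # -0.78
```

Averaging on a common standardized scale is one simple way to collapse redundant measures; any weighting or aggregation rule used in real practice would need its own validation.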
Limitations and caveats acknowledged by the authors
Representativeness and generalizability
The sample comprises members of forensic associations; not all forensic clinicians are members, and members may be more likely to adhere to professional standards.
Representation across countries and within organizations varied; response rate could not be calculated, limiting generalizability.
Some table cells were based on small samples (as few as 25 cases), which can yield unstable estimates.
Online survey limitations
Self-reported data; potential biases in recall and reporting.
Methodological scope
The study reports on two most recent cases per clinician, which, while providing real-case data, may not capture longitudinal practice patterns.
Practical implications and recommendations for practice and policy
Positive takeaway: broad adoption of structured tools, consistent with improved decision making, reduced bias, and potentially higher interrater reliability.
Balancing breadth and rigor
Encourage use of tools with robust psychometric properties for core questions, especially where there is consensus on tool usefulness (e.g., violence risk, civil/child custody contexts).
Limit unnecessary testing where incremental validity is not demonstrated.
Training and guidelines
Promote a shift from unstructured flexibility to a combination of structured tools and structured decision methods that optimize efficiency.
Use the Heilbrun checklist as a practical screening device for tool selection.
Operational guidance
Develop and promote structured decision methods to integrate data efficiently and consistently.
Encourage reporting that clarifies which tools contributed incremental validity to the referral question.
Conclusion and overarching message
The field has evolved from largely clinical judgment-based practice to a landscape with many tools and diverse data sources.
The next step is to optimize practice by focusing on tools with strong psychometric properties, reducing unnecessary testing, and developing structured decision methods that improve efficiency and reliability without sacrificing validity.
Notes on terminology and context
SPJ vs actuarial: SPJ uses evidence-based factors with guidelines but does not fix a simple algorithm; actuarial uses fixed rules.
The study highlights a tension in forensic practice between the flexibility of SPJ and the reliability gains of actuarial or well-validated tools.
Quick reference to numbers and key figures (selected highlights)
Sample: N = 434 experts; cases = 868.
Tool usage: 74.2% used one or more tools; average tools per case = 4 (SD ≈ 2.95); range up to 18; observed 286 unique tools total.
By referral (percent using any structured tool): CST 58.4%; Violence Risk 89.0%; Sex Offender Risk 96.9%; Insanity 71.8%; Aid in Sentencing 82.1%; Disability 65.5%; Child Custody 79.1%; Civil Commitment 83.9%; Child Protection 92.6%; Civil Tort 66.7%; Average Across Referrals 74.2%.
Information sources (percentage using each source, by referral): Examinee interview ~99% across referrals; Justice system records often used (up to 97% in some referrals); Educational records variable (up to 40.3% in Disability); Non-professional/professional collateral sources variable (e.g., Child Custody showing high collateral use).
Tools by referral: Violence Risk had high use of HCR-20 (35.6%), VRAG (17.8%), MMPI (27.3%), PAI (17.9%), LS/CMI (10.5%); CST tools included ECST-R and MacCAT-CA among top 10; others varied by category.
Examples of specific tools frequently noted across referrals: MMPI variants, WAIS, PAI, PCL-R, HCR-20, VRAG, TOMM, LS/CMI, ECST-R, MacCAT-CA, SORAG, SVR-20, etc.
References and context for further reading
Foundational comparisons of clinical judgment vs actuarial approaches (Dawes, Faust, & Meehl, 1989; Grove et al., 2000; Grisso, 2003).
Structured professional judgment approaches for violence risk (Guy, Packer, & Warnken, 2012).
Checklist and quality criteria for forensic instruments (Heilbrun et al., 2002).
Cognitive and decision-making considerations in professional judgment (Kahneman, 2011; Simon, 1956).
Note on scope and application
The authors emphasize that findings reflect practice among forensic association members and may not generalize to all forensic practitioners.
The study provides a benchmark for future research and policy discussions about optimizing tool use, transparency, and training in forensic evaluations.
Administrative and acknowledgement details
Portions of results presented at AP-LS conference (2014).
Recognizes NSF support for the first author; authors' affiliations: Tess M. S. Neal (University of Nebraska Public Policy Center) and Thomas Grisso (UMass Medical School).
End of notes