Defining and Measuring Risk Assessment Accuracy

Accurate Assessments: Accuracy occurs when a prediction matches the subsequent outcome. There are two forms of accuracy:
- Positive Accuracy: Predicting that an individual will be reconvicted, and they do go on to commit a new offense and receive a new conviction.
- Negative Accuracy: Predicting that an individual will not be reconvicted, and they survive for a period (e.g., 2 to 5 years) in the community without further recorded offending.

Inaccurate Assessments: Inaccuracy occurs when the prediction and the outcome do not align:
- Predicting no criminal behavior or conviction when the individual is actually reconvicted.
- Predicting reconviction when the individual does not reoffend.

Quantifying Accuracy: Simply stating that a measure is 50% or 95% accurate is often meaningless without specifying what that percentage signifies. There is no single percentage used to measure accuracy; rather, researchers use a variety of metrics.

The Two-by-Two Outcome Grid in Prediction

The outcomes of risk predictions are categorized into a four-cell matrix based on the relationship between the prediction and the actual outcome:
- True Positive: A person is predicted to be reconvicted and is subsequently reconvicted. "Positive" refers to the occurrence of the event, and "true" refers to the correct prediction.
- False Positive: A person is predicted to be reconvicted but is not. This represents an overestimation of risk.
  - Impact: These errors can lead to unjustified reductions in civil liberties and human rights (e.g., harsher sentences or stricter conditions).
- False Negative: A person is predicted not to be reconvicted but does commit an offense.
  - Impact: These cases are highly visible and are typically the ones that "hit the front pages of the newspaper."
- True Negative: A person is predicted not to reoffend and indeed does not reoffend.

The Complexity of Uncertainty: In practice, risk assessment usually provides an estimate of uncertainty rather than a simple binary "yes/no." However, a two-by-two grid is used when applying a "cutoff" score (e.g., individuals above a certain score are deemed likely to offend).

Case Study: Gender as a Crude Risk Assessment Measure

The Scenario: Gender is used as the sole risk assessment metric. All men are predicted to be "high risk" (expected to reoffend) and all women are predicted to be "low risk" (not expected to reoffend).

The Data (Based on Real Numbers):
- Men in Sample: 454
  - Recidivists: 109
  - Non-recidivists: 345
- Women in Sample: 62
  - Recidivists: 3
  - Non-recidivists: 59

Success of the Measure:
- Recidivism Rates: The rate for men (24%) was about five times the rate for women (5%).
- Identifying Recidivists: Of the 112 people who reoffended in the sample, the crude measure correctly identified 109 (97%).
- Conclusion on Accuracy: Judged on detection alone, the measure appears to justify prioritizing services for men or giving them longer prison stays.

The Flip Side (Limitations):
- Overall Accuracy Rate: Combining true positives (109) and true negatives (59), the prediction was correct for only about one-third of the total sample (168 of 516).
- False Positives: 345 men were predicted to reoffend but did not, which is 85% of the 404 non-recidivists. This is a massive overestimation of risk.
- Consequences: This policy would be extremely expensive financially (e.g., keeping more people in prison) and ethically problematic because of the unnecessary reductions in liberty.

Statistical Discrimination: Area Under the Curve (AUC)

Limitations of the 2x2 Grid: Real risk assessment uses scores (e.g., 7 out of 10 vs. 5 out of 10) rather than simple categories. We need to know whether higher scores meaningfully discriminate between recidivists and non-recidivists.

AUC Definition: The Area Under the Curve (AUC) is essentially a statistical average of the two-by-two grid calculated for every possible cutoff score on a risk measure.

AUC Interpretation:
- The score ranges from 0 to 1.
- A value of 0.5 indicates accuracy equivalent to a coin flip (chance).
- The value represents the probability that a randomly selected recidivist will have a higher risk score than a randomly selected non-recidivist.

Reality of Accuracy Scores: Large meta-analyses (covering sexual offending, violence, and general offending) show that typical risk measures score in the range of 0.65 to 0.75. A score around 0.7 is considered moderate to high accuracy.

Why Accuracy Isn't Higher: Human behavior is complex and is influenced by individual factors, societal structures, law enforcement policies, and the fact that most offending goes undetected. Predicting behavior is inherently difficult, but performing above chance allows these tools to inform policy and resource allocation.

Bias and Ethnicity in Risk Assessment

The Racism Question: A major debate focuses on whether risk assessment algorithms are racist.

Measuring Bias: It is important to distinguish between differences in scores and differences in predictive validity:
- Higher Scores: A specific group (e.g., Indigenous populations) might score higher on a measure, but this may reflect historical factors or systemic discrimination in society (e.g., over-policing) rather than a bias in the measure itself.
- Predictive Validity: The real question is whether the connection between scores and outcomes is the same across groups. For instance, does the AUC remain consistent for both Indigenous and non-Indigenous populations?

Research Findings: A meta-analysis comparing Indigenous and non-Indigenous groups found:
- Consistent moderate predictive validity across both groups for most measures.
- Static Measures: Historical, unchangeable factors showed more potential for bias because they reflect the enforcement environment the individual lived in.

Abandoning Tools vs. Improvement: Experts argue against abandoning these tools unless they are replaced by a system with superior accuracy. Without algorithms, the system falls back on professional clinical judgment, which research shows is even more biased and is harder to quantify or test.

Best Practices and Recommendations

Practice Recommendations:
- Avoid Sole Reliance: Decisions should never be based solely on a score-based classification.
- Prioritize Dynamic Measures: Use measures that account for changeable aspects of a person's life to reduce the impact of historical bias.
- Cultural Competence: Increase training for assessors to understand and address human biases.

Research Recommendations:
- Evaluate all risk measures to determine whether they predict equally well across racial, ethnic, and gender groups.
- Study how risk instruments are actually applied in decision-making contexts to see whether they reduce or increase disparities.
- Investigate how the ethnicity of the rater (the person performing the assessment) affects bias.

Methods of Risk Communication

Categorical Labels (Low, Moderate, High):
- Subjectivity: These labels are interpreted differently by different people.
- Survey Findings: In a student survey, "Low" risk was estimated on average as about a 19% chance of reoffending, "Moderate" as about 50%, and "High" as about 80%.
- Inconsistency: Different measures use different numbers of categories (e.g., some have 3, some have 5), so a "high" label can mean different things across tools.

Absolute Recidivism Estimates: Providing an exact percentage chance of reoffending (e.g., 40%).
- Pros: Easy to apply to interpretive or threshold decisions (e.g., laws requiring an "imminent" risk threshold).
- Cons: Can be misleading, because the figure is a group property (the individual shares characteristics with a previously researched sample) rather than a precise individual prediction; not consistent across jurisdictions (e.g., NZ vs. Canada).

Relative Risk Measures: Expressing risk as a ratio (e.g., "three times as likely as the average offender").
- Pros: More reliable across different geographical samples and jurisdictions.
- Cons: Often misinterpreted by those who struggle with fractions and proportions; meaningless without knowing the "base rate" of the average offender.

Percentile Ranks: Comparing an individual to others (e.g., "in the 50th percentile of risk").

Consensus on Communication: Experts suggest combining measures (a category label, an absolute recidivism estimate, and a relative risk ratio together) to mitigate the weaknesses of each individual method.

The Common Risk Language Initiative

Goal: To standardize risk communication across different measures and assessors using five levels (Level 1 through Level 5) instead of subjective labels.

Level 2 Example:
- Recidivism Estimate: Between 5% and 30%.
- Action: Monitoring for compliance; minimal intervention.
- Prognosis: Generally good; unlikely to be a career offender.

Level 5 Example:
- Characteristics: Multiple, chronic, severe, and entrenched criminal lifestyles.
- Recidivism Estimate: 85% or higher (virtually certain).
- Action: Intensive supervision and monitoring; approximately 300 hours of intensive therapy/intervention.

Remaining Challenges:
- It is difficult to compare different types of offending. For example, Level 3 for general offending may correspond to a 40% recidivism rate, but the rate at Level 3 for sexual offending would necessarily be lower because sexual offending is a smaller component of overall crime.
- Potential for confusion remains for parole boards when an individual has different levels for different offense types (e.g., Level 4 for violence but Level 1 for sexual offending).
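
The arithmetic behind the gender case study earlier in these notes can be checked with a short Python sketch. It takes the four counts given in the notes (109, 345, 3, 59), fills the two-by-two grid, and derives the quoted figures. The names `Grid` and `pairwise_auc` are hypothetical, introduced only for this illustration. The AUC step is not in the original notes: it applies the "randomly selected recidivist vs. randomly selected non-recidivist" interpretation to the gender measure, scoring men as 1 and women as 0, and it assumes the common convention that tied pairs count as half.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid:
    """Counts in the two-by-two outcome grid (hypothetical helper)."""
    tp: int  # predicted to reoffend, did reoffend
    fp: int  # predicted to reoffend, did not reoffend
    fn: int  # predicted not to reoffend, did reoffend
    tn: int  # predicted not to reoffend, did not reoffend

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def overall_accuracy(self) -> float:
        # Share of all predictions that matched the outcome.
        return (self.tp + self.tn) / self.total

    def sensitivity(self) -> float:
        # Share of actual recidivists who were flagged.
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        # Share of actual non-recidivists who were wrongly flagged.
        return self.fp / (self.fp + self.tn)

    def positive_predictive_value(self) -> float:
        # Share of flagged people who actually reoffended.
        return self.tp / (self.tp + self.fp)


def pairwise_auc(recidivist_scores, non_recidivist_scores) -> float:
    """Probability that a random recidivist outscores a random
    non-recidivist. Ties count as half (an assumed convention)."""
    wins = 0.0
    for r in recidivist_scores:
        for n in non_recidivist_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(recidivist_scores) * len(non_recidivist_scores))


# Counts from the case study: men are "predicted to reoffend".
gender_grid = Grid(tp=109, fp=345, fn=3, tn=59)

# Score 1 = man (predicted high risk), 0 = woman (predicted low risk).
recidivists = [1] * 109 + [0] * 3
non_recidivists = [1] * 345 + [0] * 59
gender_auc = pairwise_auc(recidivists, non_recidivists)

print(f"Overall accuracy:    {gender_grid.overall_accuracy():.1%}")
print(f"Sensitivity:         {gender_grid.sensitivity():.1%}")
print(f"False positive rate: {gender_grid.false_positive_rate():.1%}")
print(f"Positive pred. val.: {gender_grid.positive_predictive_value():.1%}")
print(f"AUC (ties = 0.5):    {gender_auc:.2f}")
```

Run as written, this should report an overall accuracy of about 33%, a sensitivity of about 97%, and a false positive rate of about 85%, matching the figures in the case study. Under the stated tie assumption the AUC comes out at about 0.56, only slightly above chance, which illustrates why a measure that catches nearly every recidivist can still discriminate poorly.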