Estimation and Evidence in Forensic Anthropology: Age-at-Death

Abstract

This article examines age-at-death estimation from skeletal morphological changes, focusing on the pubic symphysis. It addresses inconsistencies in age estimation that stem from the ad hoc statistical methods used in previous studies. The study analyzes a large dataset ($n = 1766$) of pubic symphyseal scores from anatomical collections, war dead, and genocide victims, emphasizing statistical methods that provide accurate coverage. Coverage is the degree to which a stated percentage of individuals within a particular pubic symphyseal stage actually falls between the stated age limits. The article demonstrates that transition analysis, when used with an appropriate prior age-at-death distribution, provides accurate coverage, unlike percentile methods, range methods, and means with standard deviations. It suggests that more emphasis should be placed on collecting data on age changes in large samples than on inter-population variation in aging rates.

Keywords: forensic science, pubic symphysis, probit analysis, likelihood ratio

Introduction

Human osteological remains serve two purposes in forensic anthropology: estimation of an individual's characteristics and evidence in putative identifications. Both require a Bayesian framework. Estimation needs a prior distribution, while evidence requires the probability of obtaining the osteological data from the general population. This paper, the second in a four-part series, focuses on the Suchey–Brooks pubic symphyseal system for age estimation and evidence presentation. Future papers will cover categorical variables such as sex and race, and time-since-death estimation. The six Suchey–Brooks stages were chosen because extensive data are available and because populations may differ in their progression through these stages. The study uses data from over 1700 males of known age-at-death from various populations to examine the practical effects of timing differences. The single-indicator approach simplifies the methodological analysis.

Materials and Methods

Samples and Data

The sample consists of 1766 males with known ages-at-death (Table 1). Sub-samples are from the Los Angeles Coroner’s Office, the Terry Anatomical Collection, U.S. Korean War Dead, Balkan genocide victims, and the Department of Anatomy of the University of Chiang Mai, Thailand. The L.A. Coroner’s Office sample is well-documented with birth certificates and known times of death. Data were obtained from a handout by Suchey at the AAFS meetings in 1986. The Terry Anatomical Collection has less documentation, with ages-at-death reported upon entry into the collection, likely self-reported or provided by relatives. The Korean War Dead sample primarily consists of those killed in action, with known birth and death dates. Ages-at-death for the University of Chiang Mai sample were derived from Schmitt’s Figure 2 using ‘‘DataThief’’ (http://www.datathief.org/), which may introduce slight inaccuracies. Table 2 lists the age distribution within Suchey–Brooks stages for the Thai sample. The Balkan sample is challenging due to identification by relatives based on personal effects and clothing, without DNA analysis. This raises concerns about the validity of identifications. Therefore, the Balkan sample is sometimes treated as a test case with unknown ages. Suchey–Brooks scores for pubic symphyseal development were recorded for all samples.

Percentile Method

The percentile method uses sample statistics of age within a given stage to estimate age. Katz and Suchey provided a table listing the sample sizes within their six stages of the pubic symphysis, including the mean age, standard deviation of age, and the 95% range of age within each stage. Applying the reference sample's mean age and standard deviation to a target sample or case is a risky endeavor, as the age distribution within skeletal stages is unlikely to be Gaussian or symmetric. For this reason, using percentiles of age within stage, as advocated by Katz and Suchey, is a superior approach. Their use of the ‘‘95% range’’ is equivalent to listing the 2.5 and 97.5 percentiles of age. Even so, the percentile method has considerable disadvantages relative to transition analysis (described in the next section):

Standard Errors: Standard errors should be reported for the percentiles, because individual stages may contain relatively small samples.
Incomplete Description: Listing a few sample percentiles provides a rather incomplete description of the age distribution within a stage.
Kaplan–Meier Plots: Both of these problems can be addressed graphically by producing complete Kaplan–Meier plots of survivorship within stage and including confidence intervals on the survivorship.
Implicit Prior Distribution: Methods based on within-stage sample statistics (percentiles, ranges, and means with standard deviations) all contain an implicit prior distribution for age. This implicit prior is the actual age distribution of the reference sample itself. The authors point out that the perceived differences in aging between samples derive from the different age structures of the study populations.
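
The Kaplan–Meier suggestion can be sketched in a few lines. The example below is a minimal illustration, assuming every age within a stage is fully observed (no censoring) and using Greenwood's formula with a plain normal-approximation interval. The function name `km_within_stage` and the five ages are hypothetical and are not taken from the study data.

```python
import math

def km_within_stage(ages, z=1.96):
    """Kaplan-Meier survivorship of age within one stage.

    Assumes all ages are observed (no censoring). Returns one tuple
    (age, survivorship, lower, upper) per distinct age.
    """
    at_risk = len(ages)
    surv = 1.0
    greenwood = 0.0  # running sum used by Greenwood's variance formula
    rows = []
    for age in sorted(set(ages)):
        d = ages.count(age)  # number of individuals with this exact age
        surv *= (at_risk - d) / at_risk
        if at_risk > d:
            greenwood += d / (at_risk * (at_risk - d))
            se = surv * math.sqrt(greenwood)
        else:
            se = 0.0  # survivorship has reached zero at the oldest age
        rows.append((age, surv, max(0.0, surv - z * se), min(1.0, surv + z * se)))
        at_risk -= d
    return rows

# Hypothetical ages for a single stage, for illustration only.
example_rows = km_within_stage([19, 21, 21, 24, 30])
for row in example_rows:
    print(row)
```

With small within-stage samples the intervals are wide, which is the point of the first list item above.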

Transition Analysis

Transition analysis, a parametric method, models the passage of individuals from a given developmental stage to the next. In the Suchey–Brooks system, five transition distributions are modeled between the six ordered phases. With longitudinal data, characterizing these distributions would be simple, but cross-sectional data require assuming a distributional form for the transitions and fitting the transition analysis model by maximum likelihood. The method is introduced here in a simplified form and then elaborated. For instance, pubic symphyseal phases can be collapsed into two broad stages: phases I and II, and phases III to VI. Phase III marks the completion of the oval outline of the symphyseal face, so the collapsed stages distinguish incomplete from complete faces. A probit model was fit to the 1766 males using the glm function in R, yielding a mean age-at-transition of 27.30 years and a standard deviation of 7.41 years for the transition from the first collapsed stage to the second (that is, from phase II to phase III). Transition analysis can also be represented graphically, with age on the natural log scale for a log-normal transition distribution. For the log-normal, the mean age-at-transition is 26.52 years, which is 0.78 years less than for the normal distribution. The log-normal distribution, being asymmetric, avoids extremely young or negative ages-at-transition. The transition analysis method is compared to (nonparametric) kernel density estimation as a graphical check. A cumulative probit model can be applied to age on a log scale to represent more than one transition. An attractive feature of the cumulative probit, or proportional odds model with a probit link, is that it provides similar results when stages are collapsed. The probability of being in each stage at a given age can be obtained from the cumulative probit and compared with the kernel density estimates. The probit and kernel density methods agree well except for Suchey–Brooks stages V and VI. Collapsing these last two stages into one stage can bring the two methods into agreement.
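
As a rough illustration of how a cumulative probit on log age yields the probability of being in each stage, consider the sketch below. It assumes five log-normal transitions that share one standard deviation on the log scale. The values in `MU` and `SIGMA` are hypothetical placeholders chosen for illustration and are not the fitted parameters from this study.

```python
import math
from statistics import NormalDist

# Hypothetical mean log ages-at-transition (I->II, II->III, ..., V->VI)
# and a hypothetical common standard deviation on the log scale.
MU = [math.log(a) for a in (19.0, 22.0, 27.0, 38.0, 55.0)]
SIGMA = 0.35

def stage_probabilities(age, mu=MU, sigma=SIGMA):
    """Return P(stage I), ..., P(stage VI) at a given age in years."""
    phi = NormalDist().cdf
    x = math.log(age)
    # passed[k] is the probability of having made the k-th transition;
    # passed[0] = 1 because everyone is at least in stage I.
    passed = [1.0] + [phi((x - m) / sigma) for m in mu] + [0.0]
    return [passed[k] - passed[k + 1] for k in range(len(mu) + 1)]

probs_at_30 = stage_probabilities(30.0)
print([round(p, 3) for p in probs_at_30])
```

Because the transition means are ordered and the standard deviation is shared, the six probabilities are non-negative and sum to one at every age.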

Age Estimation from Transition Analysis and a Prior Age Distribution

Lucy et al. argued for using the reference sample both to obtain likelihood functions (which is what transition analysis provides) as well as the prior age-at-death distribution. In the current context, it may not be reasonable to use the reference sample to obtain a prior age-at-death distribution. Prior age-at-death distributions should be specified as reasonable guesses for an individual case before osteological analysis. Two main examples of age estimation are presented, one using the Balkan sample and the other using the Thai sample. For the Balkan sample, it would be reasonable to use an ‘‘age-at-missing’’ distribution reported in Komar, but a Gompertz model is used to fit the Balkan individuals as a prior age-at-death distribution, and similarly, a Gompertz model is used for the Thai individuals. Combining these prior age-at-death distributions with the probabilities from the transition analysis yields a function proportional to the posterior density of age. Dividing through by the integral across age gives the probability density function (PDF) of age conditional on Suchey–Brooks stage. From this PDF, one can find the highest posterior density region (HPDR) for any specified level of coverage. Coverage refers to the percentage of individuals expected to fall within the specified HPDR. In the current example, only 50% coverage is used. For comparison, the mean age by stage plus and minus $0.674$ standard deviation units from Suchey and Katz’s Table 1 (p. 211) is used. The correctness of coverage for the HPDR and confidence interval approaches is examined using a cumulative binomial test. The cumulative binomial is also used to test if the HPDR and confidence intervals are appropriately centered on the age distributions within stages.
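
The steps in this section (prior times stage probability, normalization, then a highest posterior density region) can be sketched numerically on an age grid. The example below is an illustration under stated assumptions: the Gompertz parameters `a` and `b`, the transition parameters passed to `hpdr`, and the grid settings are all hypothetical, and the function returns the outer limits of the region, which equals the region itself only when the posterior has a single peak.

```python
import math
from statistics import NormalDist

def gompertz_pdf(age, a=0.01, b=0.05, start=17.0):
    """Gompertz age-at-death density with hazard a*exp(b*t), t = age - start.

    The values of a and b are hypothetical, not fitted to any sample.
    """
    t = age - start
    if t < 0:
        return 0.0
    return a * math.exp(b * t) * math.exp(-(a / b) * (math.exp(b * t) - 1.0))

def stage_likelihood(age, lower_mu, upper_mu, sigma):
    """P(stage | age): past the lower transition, not yet past the upper."""
    phi = NormalDist().cdf
    x = math.log(age)
    return phi((x - lower_mu) / sigma) - phi((x - upper_mu) / sigma)

def hpdr(lower_mu, upper_mu, sigma, coverage=0.5,
         start=17.0, stop=100.0, step=0.05):
    """Approximate highest posterior density region on a grid of ages."""
    n = int(round((stop - start) / step))
    ages = [start + (i + 0.5) * step for i in range(n)]  # cell midpoints
    post = [gompertz_pdf(x, start=start)
            * stage_likelihood(x, lower_mu, upper_mu, sigma) for x in ages]
    total = sum(post) * step          # integral across age
    post = [p / total for p in post]  # posterior density of age given stage
    # Add grid cells from highest density downward until coverage is reached.
    order = sorted(range(n), key=lambda i: post[i], reverse=True)
    mass, kept = 0.0, []
    for i in order:
        kept.append(i)
        mass += post[i] * step
        if mass >= coverage:
            break
    return ages[min(kept)] - step / 2, ages[max(kept)] + step / 2, mass

# Hypothetical stage bounded by transitions at about 27 and 38 years.
low, high, mass = hpdr(math.log(27.0), math.log(38.0), 0.35)
print(round(low, 2), round(high, 2), round(mass, 3))
```

Changing the prior parameters while holding the transition parameters fixed shifts the region, which is why the choice of prior matters for coverage.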

Pubic Symphyseal Stages as Evidence in ‘‘Positive Identification’’ Cases

Steadman et al. have discussed using pubic symphyseal and auricular surface stages to calculate the likelihood ratio for a positive identification. Konigsberg et al. noted that the likelihood ratio should be reported when building the evidentiary basis for identifications. The likelihood ratio is calculated as the probability that an individual would be in the observed Suchey–Brooks stage conditional on the known age divided by the probability of obtaining the observed Suchey–Brooks stage from the population at large. Likelihood ratios will be calculated for a number of different samples and the probability of obtaining particular Suchey–Brooks stages can be estimated by the observed frequencies in each sample. The probability that an individual would be in an observed Suchey–Brooks stage conditional on ‘‘known’’ age is found from the transition analysis. The likelihood ratio can be converted to a base 10 logarithm. A score of zero then represents ‘‘evens’’ or the case where the observed Suchey–Brooks stage is as likely to come from the identified individual as from an individual selected at random. The evidentiary value of the Suchey–Brooks system is sample-specific, so plots of the log-likelihood ratio distributions for the four largest samples are examined. These likelihood ratios are calculated using transition analysis from the total sample less each of the particular samples under study. In each plot, the average distribution from 1000 permutations across the sample is shown to indicate the expected distribution under random ‘positive identifications.’
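
The likelihood ratio described above can be written out as a short calculation. This is a sketch under assumed inputs: the transition parameters `MU` and `SIGMA` and the stage frequencies in `STAGE_FREQ` are hypothetical placeholders, not values from any of the samples.

```python
import math
from statistics import NormalDist

# Hypothetical transition parameters on the log-age scale.
MU = [math.log(a) for a in (19.0, 22.0, 27.0, 38.0, 55.0)]
SIGMA = 0.35
# Hypothetical observed frequencies of stages I..VI in a sample.
STAGE_FREQ = [0.10, 0.12, 0.13, 0.25, 0.20, 0.20]

def stage_probabilities(age, mu=MU, sigma=SIGMA):
    """P(stage I), ..., P(stage VI) at a given age, cumulative probit on log age."""
    phi = NormalDist().cdf
    x = math.log(age)
    passed = [1.0] + [phi((x - m) / sigma) for m in mu] + [0.0]
    return [passed[k] - passed[k + 1] for k in range(len(mu) + 1)]

def log10_likelihood_ratio(stage, known_age, stage_freq=STAGE_FREQ):
    """log10 of P(stage | known age) / P(stage in the population at large).

    `stage` is 1..6 for Suchey-Brooks stages I..VI.
    """
    numerator = stage_probabilities(known_age)[stage - 1]
    denominator = stage_freq[stage - 1]
    return math.log10(numerator / denominator)

# Stage VI observed for a putative identification aged 80 years.
print(round(log10_likelihood_ratio(6, 80.0), 3))
```

A value above zero supports the identification, a value below zero counts against it, and zero corresponds to ‘‘evens’’.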

Results

Percentile Method

Figures 6–9 show the Kaplan–Meier survivorship estimates and 95% confidence intervals within the six Suchey–Brooks pubic symphyseal stages for the Los Angeles Coroner’s sample, the Terry Anatomical Collection, the Korean War Dead sample, and the Balkan sample. Table 3 lists sample sizes within stages for each sample, as well as the 2.5th, 25th, 50th, 75th, and 97.5th percentiles of age within stage. The percentiles were found using ‘‘method 7’’ from Hyndman and Fan, and the ages were rounded to the nearest integer. The 2.5th and 97.5th percentiles for the Los Angeles Coroner’s sample agree with the ‘‘95% range’’ published in Katz and Suchey’s Table 8, except for the lower limit of 36 years for stage VI given in Katz and Suchey. Additionally, there are two fewer individuals in the first stage than in Katz and Suchey because two individuals under 14 years were excluded. The 50th percentile of age within stage in Table 3 gives a quick summary of the central tendency of age within stage across samples. The contrast of Fig. 8 for the Korean War Dead sample with Figs. 6, 7, and 9 illustrates the often-cited example in which a direct application of ‘‘age-by-stage’’ information from the young Korean War Dead sample would underestimate ages for more typical forensic or anatomical samples, which contain older adults.
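
For readers unfamiliar with Hyndman and Fan's ‘‘method 7’’, the sketch below shows the linear interpolation it uses, which is also the default of R's quantile function. The function name `quantile_type7` and the ages are hypothetical and serve only as an illustration.

```python
import math

def quantile_type7(values, p):
    """Sample quantile by Hyndman and Fan's method 7 (linear interpolation)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p            # fractional zero-based position
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)    # guard the upper end when p = 1
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

# Hypothetical ages within one stage, for illustration only.
ages = [15, 18, 20, 22, 30]
for p in (0.025, 0.25, 0.5, 0.75, 0.975):
    print(p, round(quantile_type7(ages, p), 2))
```

With only a handful of individuals in a stage, the 2.5th and 97.5th percentiles sit very close to the sample minimum and maximum, which is one reason standard errors on the percentiles matter.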

Transition Analysis

Table 4 contains the transition analysis parameters (mean and standard deviation of the log age of transition) for the transitions between successive Suchey–Brooks stages for all individuals in the study ($n = 1766$), for all except the Balkan sample ($n = 1554$), and for just the Balkan ($n = 212$), Los Angeles Coroner’s Office ($n = 737$), Terry Anatomical ($n = 422$), and Korean War ($n = 358$) samples. The table also gives the standard errors for all parameters and the mean age-at-transition converted back to the original scale of years. From the standard errors and parameter values in Table 4, it may be tempting to test for differences in rates of aging between the samples. To address how well ‘‘non-Balkan standards’’ would apply to individuals from the Balkans, Fig. 10 plots the transition distributions based on the 1554 non-Balkan individuals and the 212 Balkan individuals. Figure 11 shows a comparable plot of what are known as normed likelihoods. These normed likelihoods are the probabilities of being in each Suchey–Brooks stage (based on the transition analyses), scaled such that the maximum probability (or, strictly, the maximum likelihood) is equal to 1.0. Figure 11 also shows dotted horizontal lines at normed likelihoods of 0.7965 and 0.1465. From a frequentist standpoint, the 95% confidence set for age conditional on Suchey–Brooks stage consists of all ages with normed likelihoods greater than 0.1465. Similarly, the 50% confidence set for age conditional on Suchey–Brooks stage consists of all ages with normed likelihoods greater than 0.7965.
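
The two cutoffs follow from the usual likelihood-ratio argument: minus twice the log of the normed likelihood is compared with a chi-square quantile on one degree of freedom, so the cutoff is the exponential of minus half that quantile. The sketch below reproduces the values under that assumption. The function name is hypothetical, and the chi-square quantile is obtained as the square of a standard normal quantile.

```python
import math
from statistics import NormalDist

def normed_likelihood_cutoff(confidence):
    """Normed-likelihood threshold for a confidence set (1 degree of freedom)."""
    # A chi-square quantile on 1 df is the square of a two-sided normal quantile.
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    return math.exp(-z * z / 2.0)

for level in (0.50, 0.95):
    print(level, round(normed_likelihood_cutoff(level), 4))
```

The results are approximately 0.7965 for the 50% set and 0.1465 for the 95% set, matching the dotted lines described for Fig. 11.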

Age Estimation from Transition Analysis and a Prior Age Distribution

As a test of transition analysis, the parameters have been applied to estimate age ranges for the 212 Balkan individuals and the 37 Thai individuals. In both analyses, the transition analysis parameters were calculated from the entire male sample, excluding the Balkan sample when testing on the Balkans and excluding the Thai sample when testing on Thailand. For both samples, it is necessary to have a prior age distribution, for which Gompertz models are used here. The youngest individual in the Balkan sample was 17 years old and the youngest individual in the Thai sample was 20 years old, so the Gompertz models begin at ages 17 and 20, respectively. Figure 12 compares the age distribution from Komar’s Srebrenica age-at-missing data with the age-at-death data for 199 Balkan individuals in the current study with ages between 20 and 75 (inclusive). As these age distributions are quite dissimilar, a Gompertz model fit to the current data is used to represent the prior age-at-death distribution. Figure 13 shows the 95% confidence intervals for the Kaplan–Meier estimate and the Gompertz model for the Balkan sample, while Fig. 14 shows a comparable graph for the Thai sample. Figure 15 shows a plot of coverage for the Balkan sample comparing the Suchey–Brooks confidence intervals to the transition analysis HPDRs. The stages have been randomly ‘‘jittered’’ to reduce overlap of the points, and the HPDRs from the transition analysis are plotted above the points and the Suchey–Brooks confidence intervals below. Figure 16 shows a coverage plot for the Thai sample.

Pubic Symphyseal Stages as Evidence in ‘‘Positive Identification’’ Cases

Figures 17–20 show the cumulative distributions of log-likelihood ratios for the four largest samples: the Los Angeles Coroner’s Office sample, the Terry Anatomical Collection, the Korean War Dead sample, and the Balkan sample. For the Terry Anatomical Collection the evidentiary values are generally lower, such that only 52.84% of the sample has likelihood ratios greater than 1.0. The Korean War sample is intermediate, with 73.18% of the cases having likelihood ratios greater than 1.0. The Balkan sample has a slightly higher percentage than the Korean War sample, with 75.47% of the cases having likelihood ratios greater than 1.0.

Discussion

The discussion in this paper is framed around a number of methodological issues that often arise in estimating age-at-death or in presenting osteological evidence that may help confirm identifications. The first problem is how to model the progression of individuals through an ordered (staged) system such as the Suchey–Brooks pubic symphyseal stages. Konigsberg and Herrmann have used an unrestricted cumulative probit to model progression through the Suchey–Brooks stages, Boldsen et al. have used a continuation ratio approach, and Samworth and Gowland have suggested using a shifted exponential. All of these models add a level of complexity that seems unnecessary in the current context. The percentile method, which was applied graphically in Figs. 6–9 and summarized in Table 3, cannot be presented as compactly as the transition analysis parameters. Furthermore, the percentile method takes essentially a ‘‘hidden Bayesian’’ approach in which the age distribution of the reference sample acts as a prior and influences the calculated percentiles. Some of the transition analysis parameters provided in Table 4 indicate significant differences between samples in mean ages-at-transition. Because the percentile method does not allow for different prior age-at-death distributions, while maximum likelihood (i.e., non-Bayesian) estimation of age-at-death implicitly uses an unreasonable uniform prior, it is better to use explicit priors when estimating age-at-death. Figures 13 and 14 show that a Gompertz model of mortality can be adequately fit to the samples of Balkan and Thai males, respectively, while Figs. 15 and 16 show that the 50% HPDR from transition analysis is accurate in its coverage and placement. On the evidentiary value of the Suchey–Brooks stages, a rather different tack from those previously explored has been taken here. In the statistical literature on ordinal categorical data, it is common to see some summary measure of the fit of a model, usually referred to as a pseudo-$R^2$. The graphs in Figs. 17–20 show, as one would generally expect, that the evidentiary value of the Suchey–Brooks system is rather low.