Population Affinity

Historically, human variation was categorized in a racist and typological way. Anthropologists assigned certain characteristics or traits as being indicative of certain groups (certain races), a practice known as racial typology. Importantly, typology in itself is not bad; as humans, we classify everything into types in order to make sense of the world. The “bad” comes from ignoring the spectrum of human biological diversity and claiming that races display discrete and unique traits, a claim that was used to justify white supremacist ideology. These early typological studies were also shaped by confirmation bias: anthropologists were looking for evidence that white males were in fact superior, and they created trait lists that ‘proved’ this ‘natural order.’

Now, we know that there is no one set of traits that exists in a vacuum for only one population or one race. All humans are one species and represent a wide range of phenotypic expressions due to our complex evolutionary history involving population bottlenecks, gene flow, mutation and adaptation, and our social structures as social primates. Human variation is clinal in distribution, meaning we see geographically patterned continuous variation. The clinal variation that feeds into forensic anthropology is seen in human skin color, body proportions and craniofacial morphology.

Ancestry estimation is the classification of an unknown individual’s most likely geographic origin into one or more reference groups using classification statistics and experience-based methods of analysis. It relates to ancestral origins, whether continental or ethnic. However, ancestry is not informative about local population structure and is instead geared towards the three-group model of estimating African, Asian or European origin. Population affinity is the estimation of group membership of an unknown individual based on morphological or genetic similarity to a well-defined group, using some statistical measure of difference such as the distances used in discriminant function analysis (DFA).

Population affinity can be estimated in the US because of positive assortative mating, which has been encouraged by systemic racism and by discrimination based on economic status or social class. The system of social race in our society and the appearance of race in the US are a direct result of this nonrandom mating, gene flow and migration. Race is a social construct based on the categorization of people into groups according to phenotypic characteristics. Because race is a social construct, it is not biologically real. However, because of the social institutions of our country, race absolutely has real biological consequences for individuals, especially non-white individuals. In our estimations of ancestry or population affinity, the end goal is always to assign an individual to the social race category to which they were most likely perceived to belong in life, in order to help with their identification after death.

There are different approaches to estimating population affinity, which can be broken down into metric and nonmetric, with the nonmetric approaches including morphological (macromorphoscopic) traits. Nonmetric traits were compiled via the trait list approach and have a bad reputation because they were used typologically. Nonmetric traits are usually discrete and scored as present/absent. They include features such as extra sutural bones, proliferative ossifications, ossification failure, suture variation (metopic suture) or foramina variation (multiple zygomaticofacial foramina). These types of traits are commonly used by bioarchaeologists in biodistance studies. Macromorphoscopic traits, on the other hand, are also nonmetric traits, but only those that have been standardized and defined. They are referred to as quasi-continuous because they include presence/absence data (postbregmatic depression) as well as data scored on ordinal scales of expression. This includes the scoring of bone shape (nasal bone shape), bony feature morphology (inferior nasal aperture), suture shape (zygomaticomaxillary), or feature prominence/protrusion (anterior nasal spine). These macromorphoscopic traits are used by forensic anthropologists in estimations of ancestry or population affinity.

Metric approaches include craniometrics, or the measurement of interlandmark distances (ILDs). This method measures the distance between defined cranial landmarks using spreading or sliding calipers. Geometric morphometrics also uses cranial landmarks, but records them as coordinates in 3D space using a digitizer, with the distance between any two landmarks calculated as a Euclidean distance. This approach can use ‘nonstandard’ ILDs, which can be harder to define objectively and to locate with low interobserver error; when used correctly, however, they capture more nuanced variation and improve classification accuracy.
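As a rough illustration of how an ILD is derived from 3D landmark coordinates, the sketch below computes the straight-line (Euclidean) distance between pairs of landmarks. The coordinate values are invented for the example, and the function name interlandmark_distance is hypothetical rather than part of any craniometric software.

```python
import math

# Hypothetical 3D landmark coordinates in millimetres (invented values).
landmarks = {
    "glabella": (0.0, 0.0, 0.0),
    "opisthocranion": (180.0, 0.0, 0.0),
    "nasion": (3.0, 0.0, -4.0),
    "prosthion": (3.0, 0.0, -72.0),
}

def interlandmark_distance(a, b, coords=landmarks):
    """Euclidean distance between two named landmarks."""
    return math.dist(coords[a], coords[b])

# Two ILDs derived from the same set of coordinates.
max_cranial_length = interlandmark_distance("glabella", "opisthocranion")
upper_facial_height = interlandmark_distance("nasion", "prosthion")
print(round(max_cranial_length, 1), round(upper_facial_height, 1))
```

Because every ILD is calculated from the stored coordinates, any pair of landmarks can be turned into a measurement after the fact, which is what makes nonstandard ILDs possible.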

DFA: classification of an unknown individual based on overall similarity to other individuals

Some of the most common statistical approaches to estimating population affinity include discriminant function analysis (DFA), artificial neural networks and random forest models. In DFA, a linear decision boundary is drawn to delineate groups. The discriminant function seeks to maximize the differences between groups relative to the variation within them. When used with only two groups, this is referred to as DFA, but when used with more than two groups it is referred to as canonical variate analysis. This linear separation creates ‘discrete’ groups, each around a centroid, which is the mean of the discriminant scores for that group. In estimating population affinity through statistical software such as FORDISC, morphoscopic trait scores or metric traits are input into the program and the individual is assigned to the group to which it is most similar. This similarity is based on the individual's distance from each group centroid in multivariate space, referred to as Mahalanobis distance. A key feature of this distance is that it weights the variables by their variances and covariances, which matters when variables are correlated, as craniofacial data generally are. Together, Mahalanobis distance and DFA rotate and scale the data so that the variables become uncorrelated and the groups are separated as much as possible. DFA focuses on the variance between groups relative to the variance within each group in order to assign an individual to a reference population. This is in contrast to PCA, which works similarly except that it ignores group membership, looking at all individuals at once and considering the total variance.
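The distance-to-centroid idea can be sketched in a few lines. The example below is a minimal illustration only, assuming two measurements, two reference groups with invented values, and a pooled within-group covariance matrix; it is not how FORDISC is implemented, and the names reference, pooled_covariance and classify are hypothetical.

```python
# Invented reference data: two measurements (mm) per individual, two groups.
reference = {
    "group_A": [(180.0, 130.0), (184.0, 133.0), (178.0, 128.0),
                (186.0, 135.0), (182.0, 129.0)],
    "group_B": [(172.0, 138.0), (175.0, 141.0), (170.0, 137.0),
                (177.0, 142.0), (171.0, 137.0)],
}

def centroid(rows):
    """Mean of each variable for one group."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def pooled_covariance(groups):
    """Pooled within-group 2x2 covariance, returned as (sxx, sxy, syy)."""
    sxx = sxy = syy = 0.0
    dof = 0
    for rows in groups.values():
        cx, cy = centroid(rows)
        for x, y in rows:
            sxx += (x - cx) ** 2
            sxy += (x - cx) * (y - cy)
            syy += (y - cy) ** 2
        dof += len(rows) - 1
    return sxx / dof, sxy / dof, syy / dof

def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance via the inverse of the 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def classify(unknown, groups):
    """Assign the unknown to the group with the nearest centroid."""
    cov = pooled_covariance(groups)
    d2 = {name: mahalanobis_sq(unknown, centroid(rows), cov)
          for name, rows in groups.items()}
    return min(d2, key=d2.get), d2

assigned, distances = classify((181.0, 131.0), reference)
print(assigned, distances)
```

The covariance term is what separates this from a plain Euclidean distance: because the two measurements are correlated within groups, a deviation that runs against that correlation counts for more than one that runs along it.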

When you get an output from FORDISC, you see the Mahalanobis distance, the typicality probability (TP) and the posterior probability (PP). The typicality probability tells you how typical the unknown individual is of the group to which it was assigned. This probability is based on the absolute Mahalanobis distance, and purely considers how close the unknown individual is to the classified group’s centroid without taking the other groups into consideration. A TP of 0.5 means that 50% of the total sample in that group is expected to be as far or farther from the centroid as the unknown individual. The posterior probability tells you the probability of membership in each group based on the relative Mahalanobis distances from all of the group centroids. The main assumption with this probability is that the unknown individual belongs to one of the reference groups available in the analysis, but of course this is not always the case. You may end up with a very high PP but a very low TP, indicating that the unknown individual is more similar to one group than to the others but is not actually typical of that group.
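The high-PP, low-TP situation is easier to see with numbers. The sketch below is an illustration under stated assumptions: the posterior probabilities assume equal priors and a shared covariance matrix, the typicality is a simple ranked (empirical) version, and all distances are invented. The function names are hypothetical and the output of real software may be calculated differently.

```python
import math

def posterior_probabilities(d2_by_group):
    """Posterior probabilities from squared Mahalanobis distances,
    assuming equal priors and a shared (pooled) covariance matrix."""
    smallest = min(d2_by_group.values())
    # Subtracting the smallest distance avoids underflow; it cancels out.
    weights = {g: math.exp(-0.5 * (d2 - smallest))
               for g, d2 in d2_by_group.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

def ranked_typicality(d2_unknown, d2_reference_members):
    """Share of a group's reference individuals that lie as far or
    farther from the group centroid than the unknown individual."""
    as_far = sum(1 for d2 in d2_reference_members if d2 >= d2_unknown)
    return as_far / len(d2_reference_members)

# Invented squared distances for an unknown that is far from every centroid.
unknown_d2 = {"group_A": 14.0, "group_B": 26.0, "group_C": 31.0}
# Invented squared distances of group_A's own reference individuals.
group_a_members_d2 = [0.4, 0.9, 1.3, 1.8, 2.2, 2.9, 3.5, 4.1, 5.6, 15.2]

pp = posterior_probabilities(unknown_d2)
tp = ranked_typicality(unknown_d2["group_A"], group_a_members_d2)
print(round(pp["group_A"], 3), tp)
```

Here the unknown is much closer to group_A than to the other two groups, so the posterior probability for group_A is very high, yet only one of the ten reference individuals is as far from the centroid, so the typicality is low.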

Considerations of DFA

Curse of dimensionality: Sample size and the number of variables are very important in DFA. You should not have more variables than the number of individuals in the smallest reference sample or you risk overfitting the data. Classification accuracy increases with the number of variables up to a certain point, after which it decreases. The extreme minimum is to have one less measurement than the sample size; however, a sample size that is 3 times larger than the number of variables is preferred. If there are too many variables, you can fix this with stepwise variable selection, which removes any variable that is not contributing to classification.
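These rules of thumb can be checked before running an analysis. The helper below is a hypothetical sketch that applies the two limits described above to the smallest reference sample; the function name, the group sizes and the preferred_ratio parameter are assumptions made for the example.

```python
def check_variable_count(group_sizes, n_variables, preferred_ratio=3):
    """Compare the number of variables with the smallest reference sample."""
    smallest = min(group_sizes.values())
    return {
        "smallest_group": smallest,
        # Extreme minimum: at most one less variable than the sample size.
        "within_hard_limit": n_variables <= smallest - 1,
        # Preferred: sample size at least 3 times the number of variables.
        "meets_preferred_ratio": smallest >= preferred_ratio * n_variables,
    }

# Invented reference sample sizes.
sizes = {"group_A": 45, "group_B": 30, "group_C": 62}
print(check_variable_count(sizes, n_variables=12))
print(check_variable_count(sizes, n_variables=8))
```

With the smallest group at 30 individuals, 12 variables pass the hard limit but not the preferred ratio, while 8 variables pass both.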

Outliers: DFA is sensitive to outliers because it uses the mean of the linear discriminant function scores to come up with the group classifications and centroids. Extreme outliers can drastically impact this mean to the point of also impacting the groupings and centroids, which then impacts the classification of an unknown individual.
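A small numeric sketch shows how much a single extreme value can move a centroid. The discriminant scores below are invented for the example.

```python
from statistics import mean

# Invented discriminant scores for one reference group.
scores = [-0.8, -0.3, 0.1, 0.4, 0.6]

centroid_clean = mean(scores)
# The same group with one extreme outlier added.
centroid_with_outlier = mean(scores + [9.0])

print(round(centroid_clean, 2), round(centroid_with_outlier, 2))
```

One outlier among six individuals shifts the centroid by 1.5 units, which is larger than the whole spread of the original scores, so distances from the unknown individual to this centroid would change accordingly.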

The axes in a discriminant function plot represent two of the dimensions (discriminant functions), and the points marking each group on that graph represent the group mean of the discriminant scores (the centroid), whose position depends on the two dimensions chosen.