Accountability and Coercion: Is Justice Blind when It Runs for Office?

Introduction

Trial judges in the US wield significant authority through their sentencing power.
In 39 states, judges are subject to reelection, raising concerns about impartiality.
The study develops a theory of sentencing behavior influenced by electoral considerations.
Voters are not fully informed and tend to notice underpunishment more than overpunishment.
The theory predicts elected judges will become more punitive as elections approach.
Analysis of 22,095 Pennsylvania criminal cases from the 1990s supports this prediction.
Electoral dynamics are estimated to account for 1,818 to 2,705 years of incarceration.

The Role of Trial Judges

The nearly 5,000 state trial court judges play a crucial role in administering criminal justice.
In 1998, they sentenced almost one million felons to over two million years in state jails and prisons.
Sentencing authority ultimately governs the coercive power of the state on individual defendants.
Trial judges have broad discretion over sentencing in nearly all noncapital cases.
Most cases are resolved via plea bargain, but the presiding judge must approve the proposed sentence.

Judicial Elections and Impartiality

In 39 states, trial judges stand for reelection through:
- Competitive partisan elections (8 states)
- Competitive nonpartisan elections (21 states)
- Noncompetitive retention elections (10 states)
In retention elections, judges run unopposed and must receive more "yes" than "no" votes.
In 7 states, judges are evaluated by the legislature, governor, or a judicial nominating commission.
Legal scholars are concerned that elections may compromise judicial independence.
Empirical studies suggest voters are uninformed about judge behavior, and trial judges have high retention rates.

Theory of Electoral Control and Sentencing

The article develops a theory specifying conditions under which trial judges alter sentencing behavior to improve electoral prospects.
Voters find it difficult to monitor judicial behavior, so a single instance of judicial malfeasance can influence an election.
Certain characteristics of the informational environment surrounding trial judge elections, however, make perceived underpunishment easier to observe than perceived overpunishment.
If judges discount the future value of retaining office, they will minimize the electoral consequences of underpunishment later in their terms.
The theory predicts unidirectional convergence: trial judges will become more punitive as their terms proceed.

Comparing with Existing Theories

Existing studies of electoral incentives predict bidirectional convergence: officials become more conservative or liberal to align with constituents.
The study predicts judges will become more punitive regardless of their position relative to constituents’ preferences.
Unidirectional convergence could also result from a bias in the selection of judges who are uniformly less punitive than their constituents.
Critical tests are devised to differentiate between these accounts.

Empirical Analysis: Pennsylvania

Incumbent trial judges in Pennsylvania face noncompetitive retention elections every ten years.
Analysis of over 22,000 Pennsylvania trial court sentences for aggravated assault, rape, and robbery convictions in the 1990s shows that sentences for these crimes are significantly longer the closer the sentencing judge is to standing for reelection.
The results suggest the superiority of the theoretical account over others.
A baseline estimate imputes at least 1,818 to 2,705 years of additional prison time due to electoral incentives.

Judicial Behavior and Electoral Accountability

The method of choosing trial court judges is a matter of substantial controversy.
Elections may tie judges too closely to the whims of public opinion, compromising judicial integrity.
Elected judges may base decisions on political effects rather than legal precepts or unbiased facts.

Representation and Electoral Connection

Electoral control is viewed as an agency problem where elections serve two fundamental roles:
- Selection devices: Voters pick agents whose preferences mirror their own, although limited information typically keeps them from fully solving the adverse selection problem.
- Incentive mechanisms: Elections induce officials to approximate constituents’ preferences.

Electoral Incentives

Officials wish to retain office to:
- Enjoy its perquisites.
- Influence policy in the future.
Effectiveness of electoral incentives increases with the official’s perception of the value of retention.
Officials continuously reevaluate the balance between their own preferences and electoral concerns over their terms.

Temporal Discounting

Officials may discount the future value of retaining office.
- At the beginning of their terms, they prioritize implementing their own preferred policies.
- Toward the end of a term, retaining office becomes a paramount concern.

Voter Evaluation

Voter evaluation of candidate performance is temporally proximate to each election.
Officials may ignore constituent preferences when voters are inattentive or have short memories.
Competitive electoral environments mitigate these effects through challenger scrutiny.

Bidirectional Convergence

The incentive effects of elections and their variation over time imply bidirectional convergence.
As election approaches, officials moderate their behavior to more closely approximate the wishes of a pivotal constituent (e.g., the median voter).
Officials more liberal than that constituent will drift rightward over the electoral cycle, while more conservative officials will drift leftward.
Convergence is constituency specific.

Electoral Incentives of Trial Judges

The informational and institutional environment of trial judge elections differs fundamentally from other elected officials.
Voters are uninformed about the most basic aspects of these officials’ behavior and responsibilities.
This paucity of information is exacerbated by:
- Lack of contextual cues like party labels in nonpartisan judicial elections.
- Absence of challengers in retention elections to provide information about incumbent performance.
- Restrictions on position taking by judicial candidates.

Fire Alarm Oversight

Information about adverse consequences of a single case, when publicized, can be decisive in swaying voter opinion.
This draws on McCubbins and Schwartz’s (1984) distinction between types of oversight:
- Sustained, active “police patrol” oversight by voters is costly.
- Passive “fire alarm” oversight occurs when well-informed actors publicize instances of perceived judicial malfeasance.

Asymmetric Information

In criminal justice, fire alarms nearly always correspond to perceived instances of underpunishment, not overpunishment (Canes-Wrone, Herron, and Shotts, 2001).
Underpunishment is more easily observed through news accounts of recidivism.
Victims’ families and groups have incentives to publicize specific instances of underpunishment.
Asymmetry motivates even moderate or liberal voters to assume the worst about defendants and judges.

Effectiveness of Oversight

The fact that fire alarms are rarely pulled indicates deterrence.
Voter ignorance in trial judge elections results from trial judge compliance with public opinion, rather than judicial autonomy.
High retention rates do not necessarily indicate judicial independence; they can signal total autonomy or total subservience.

Judges' Preferences

Elected judges are attentive to public opinion.
Judges also have their own preferences over criminal justice issues and wish to judge based on perceived culpability, remorse, or likelihood of recidivism.
The balance between the value of office and implementing their own preferences shifts increasingly toward the former as election approaches.
The fire-alarm nature of trial judge oversight suggests unidirectional convergence: judges will become more punitive, not more representative, over the course of their terms.
Hall (1992) finds that liberal state supreme court justices in states with short terms are less likely to dissent from decisions upholding the imposition of the death penalty, apparently to avoid revealing their ideological extremity.

Alternative Theories

An alternative theory yielding unidirectional convergence is preference-based: Judges in all districts may be uniformly more liberal than their constituents due to training (Roberts and Edwards, 1989; Roberts and Doob, 1990).
However, voters themselves put judges in office in the first place.
An experiment suggests that voters’ seemingly punitive tendencies are primarily a consequence of their informational environment.
Detailed accounts lead to a much smaller proportion believing the assigned sentence was too lenient.
Media coverage of criminal proceedings may explain the perception of judicial leniency.
The empirical analysis develops a set of tests that distinguish not only between the bidirectional and unidirectional convergence hypotheses, but also between a preference-based causal story for unidirectional convergence and the information-centered one.

Data and Method: Pennsylvania Courts of Common Pleas

The analysis draws on sentencing data from the Pennsylvania Commission on Sentencing, elections data from the Pennsylvania Department of State, and judges’ backgrounds from the Pennsylvania Manual.
When judgeships vacate, replacements are selected via a partisan competitive election.
- In the primary election, judges compete for one (87% of the time) or both of the major party nominations.
- In the general election, the top vote getter(s) will fill the one or more open seats in a particular judicial district.
Once elected, judges stand for reelection every ten years on the basis of a noncompetitive retention vote.
Not all judges in a district are on the same electoral calendar.

Institutional Setting

Conventional wisdom suggests judges will be most divorced from the electoral connection when they serve long terms and run in nonpartisan retention elections.
Pennsylvania trial judges operate in an institutional setting that will render them least sensitive to periodic voter review.
However, these very institutional conditions are the ones the theory predicts will produce unidirectional convergence toward punitiveness.

Criminal Cases in Pennsylvania

The manner in which criminal cases wind their way through the judicial system in Pennsylvania is enormously complex.
Common Pleas judges generally exercise enormous discretion in imposing sentences, with several constraints.
- All crimes carry with them statutory maximum sentences, and some have associated mandatory minima as well.
- The Pennsylvania Commission on Sentencing (PCS) offers voluntary sentencing guidelines for most felonies and misdemeanors.
Judges are obliged to take account of PCS instructions, but not to abide by them (42 Pa.C.S. § 9781).

Pennsylvania Sentencing Guidelines

The guidelines work as they do in many other states: PCS classifies crimes by offense gravity and defendants by prior record.
A judge can determine the recommended sentencing range by referring to a sentencing matrix, expanding the recommended penalty range upward or downward by 12 months in the presence of aggravating or mitigating factors.
Additional matrices exist for separate sentence enhancements such as possession of a deadly weapon during the commission of a crime.
For a given conviction, sentencing judges in Pennsylvania hand down both a minimum and maximum sentence.
In cases involving incarceration, the defendant is obliged to spend at least the minimum term in prison before becoming eligible for parole.
A state parole board may or may not grant the defendant parole up to the release time specified by the judge as the maximum sentence.
The manner in which Pennsylvania incarcerates and releases defendants falls between fully indeterminate sentencing and fully determinate sentencing.

Analyzing Sentencing Behavior

Must account for a judge’s discretion in a given case.
Attention restricted to a class of felonies for which judges always have some discretion in sentencing and typically assign prison time.
22,095 observations for discretionary sentences imposed from 1990 to 1999 according to guidelines issued in 1988, 1994, and 1997 (42 Pa.C.S. § 9781).
Judges assign two sentences for each case.
Dependent variable is the smaller of these two quantities, measured in months of incarceration, representing the determinate portion of the judge’s discretion over sentencing.
By statute, the smaller sentence imposed by the judge cannot exceed one-half the larger sentence, which itself cannot be greater than the statutory maximum.
For certain crimes, the law mandates a minimum prison sentence.
These rules place upper and lower boundaries on the range of a judge’s sentencing options, creating a censoring problem.
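As a minimal sketch of how these rules translate into observation-specific censoring points, the Python function below derives a lower and an upper limit for the minimum sentence. The function name, argument names, and example figures are illustrative assumptions, not the study's code; setting the lower limit to zero when no mandatory minimum applies is likewise an assumption.

```python
from typing import Optional, Tuple


def censoring_bounds(statutory_max_months: float,
                     mandatory_min_months: Optional[float] = None) -> Tuple[float, float]:
    """Return (lower, upper) limits on the minimum sentence, in months.

    Upper limit: the minimum sentence cannot exceed half the maximum sentence,
    and the maximum cannot exceed the statutory maximum.
    Lower limit: the mandatory minimum if one applies, otherwise zero (assumed).
    """
    lower = mandatory_min_months if mandatory_min_months is not None else 0.0
    upper = statutory_max_months / 2.0
    if lower > upper:
        raise ValueError("mandatory minimum exceeds half the statutory maximum")
    return lower, upper


# Hypothetical offense with a 240-month statutory maximum
print(censoring_bounds(240))        # no mandatory minimum
print(censoring_bounds(240, 60))    # 60-month mandatory minimum
```

Each case then carries its own pair of limits, which is what makes the censoring points observation-specific.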
Employ a two-limit tobit model with observation-specific left and right censoring points because OLS regression produces biased coefficient estimates due to censoring (Maddala, 1983, 160–62; Tobin, 1958).

Censoring

The two-limit tobit model also addresses a second problem, created by the 16% of cases in which no prison time was imposed: these cases are treated as left-censored, on the assumption that they represent punishment less than the minimum jail time.
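To make the censoring logic concrete, the following sketch writes out the log-likelihood of a two-limit tobit with observation-specific limits, using only the Python standard library. It illustrates the general technique under stated assumptions and is not the study's estimation code: `mu` stands for each case's linear prediction, `sigma` for a common error scale, and all names and numbers are hypothetical.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_logpdf(z: float) -> float:
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def tobit_loglik(y, mu, sigma, lower, upper):
    """Two-limit tobit log-likelihood with per-observation censoring points."""
    total = 0.0
    for yi, mi, lo, hi in zip(y, mu, lower, upper):
        if yi <= lo:
            # Left-censored: latent sentence at or below the lower limit
            total += math.log(norm_cdf((lo - mi) / sigma))
        elif yi >= hi:
            # Right-censored: latent sentence at or above the upper limit
            total += math.log(norm_cdf(-(hi - mi) / sigma))
        else:
            # Uncensored: ordinary normal density of the observed sentence
            total += norm_logpdf((yi - mi) / sigma) - math.log(sigma)
    return total


# Three hypothetical cases: left-censored, uncensored, right-censored
print(tobit_loglik(y=[0.0, 36.0, 120.0], mu=[10.0, 40.0, 110.0], sigma=20.0,
                   lower=[0.0, 0.0, 60.0], upper=[120.0, 60.0, 120.0]))
```

Maximizing this function over the regression coefficients (which determine `mu`) and `sigma` would give the tobit estimates. A no-prison case enters through the first branch, as a left-censored observation.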
Factors other than electoral proximity and statutory limits may explain assigned sentences: failure to control for them will bias inferences if the omitted variables are correlated with the included ones.
Employing case- and judge-level controls is a conservative strategy.
Crimes, defendants, and cases vary independently in ways that will affect judges’ use of their discretion.

Control Measures

The Pennsylvania Commission on Sentencing’s recommended minimum and maximum sentences provide ideal measures to control for the severity of the offense committed and the defendant’s prior criminal record.
They incorporate an enormous amount of information, including victim age, the crime’s location, and the level of violence.
Employ dummy variables for the applicable sentence guideline regime (1988, 1994, or 1997).
Supplemental controls for the nature of particular crimes include indicator variables that distinguish the type of crime (rape and robbery—the baseline category is aggravated assault) and whether it involved the possession or use of a deadly weapon.

Case Dispositions

Control for variation in the disposition of cases because in 51.5% of the cases in our sample, the defendant was convicted on more than one count.
Because judges can decide whether to impose sentences consecutively or concurrently and which counts to issue sentences on, we examine only the sentences associated with the most severe count on which the defendant was convicted and control for the number of counts.
Include indicator variables for negotiated and nonnegotiated guilty pleas (the baseline category is conviction at trial).

Judge's Ideology

Measuring judicial ideology is difficult. The study takes three approaches:
- Judges’ time-invariant ideological proclivities cannot be systematically correlated with where they happen to be in their own electoral cycles. This obviates the need to control for those proclivities.
- As a robustness check, characteristics of the judges serve as proxy controls for their punitive tendencies. The measures we employ are the judge’s age and age-squared, whether the judge was male or female, and whether the judge had prosecution experience.
- As a more comprehensive robustness check, we employ judge-specific fixed effects (i.e., one dummy variable per judge—425 variables total) to control for all time-invariant characteristics of the sentencing judge. This approach is the most conservative because it requires no a priori assumptions about how judges’ preferences are derived.

Electoral Proximity Hypothesis

Primary hypothesis concerns the effect of electoral proximity. Code proximity as the number of days elapsed in the judge’s term at the date of sentencing divided by 3,653.
The measure is thus scaled from zero to one, with zero representing 10 years until the next election, and one an imminent retention vote (Election Day).
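The coding rule above can be sketched in a few lines of Python. The function name and the example dates are hypothetical, and the sketch assumes that the start date of the judge's current term is known for each case.

```python
from datetime import date

TERM_DAYS = 3653  # divisor given in the text: roughly ten years in days


def electoral_proximity(term_start: date, sentencing_date: date) -> float:
    """Share of the ten-year term elapsed at sentencing (0 = term start, 1 = election)."""
    elapsed = (sentencing_date - term_start).days
    if not 0 <= elapsed <= TERM_DAYS:
        raise ValueError("sentencing date falls outside the judge's term")
    return elapsed / TERM_DAYS


# Hypothetical judge whose term began on 1 January 1990
start = date(1990, 1, 1)
print(electoral_proximity(start, date(1990, 1, 1)))  # start of the term
print(electoral_proximity(start, date(1995, 1, 1)))  # roughly the midpoint
```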
Expect a positive coefficient on this measure: as proximity increases, so should assigned sentences.
Another variable, whose relevance is explained in the discussion of the critical tests, appears in the summary statistics: a measure of district political conservatism on criminal justice issues.
Lacking a perfect measure, employ the district Republican share of the two-party vote in the previous statewide attorney general race.

Results

Presentation of empirical findings proceeds in two stages.
- Provide statistical results that confirm the primary hypothesis.
- Devise and implement a series of critical tests to distinguish the underlying causal mechanism because these results are consistent with several rival explanations.
Unidirectional convergence hypothesis predicts an increase in punitiveness associated with an increase in electoral proximity.

Unidirectional Convergence

Findings of unidirectional convergence constitute preliminary evidence in favor of the informational theory.
However, an alternative causal mechanism may have generated these estimates: an account in which all judges are more lenient than their constituents is also consistent with unidirectional convergence.
Also, bidirectional convergence may be at work, but with a sufficiently large proportion of judges to the left of their constituents to lend the appearance of unidirectional convergence (“lopsided bidirectional convergence”).
The study conducts a series of critical tests to distinguish the informational story from these accounts.
If lopsided bidirectional convergence generated the finding, then we would anticipate that at least some judges would become more lenient over the course of their terms.

Judge's Ideology

The theory predicts that no judges will become more lenient: the most punitive judges will simply exhibit minimal or no change during their terms, because their sentences are already sufficiently punitive to minimize the risk of a fire alarm being pulled.
The approach adopted to distinguish these accounts modifies the one suggested by Segal, Songer, and Cameron (1994).
They first conduct a logit analysis of judges’ votes on time-invariant judge characteristics, employing the linear prediction from the model as a judge’s ideology score.
The theoretically relevant quantity in the estimator is the interaction of ideology and electoral proximity: if the theory is valid, then judges of all ideological stripes should have nonnegative proximity effects.

Estimator Quantity

They employ a vector of judge characteristics appropriate for the study of federal appellate judges; in contrast, we employ the trial judge characteristics discussed in the previous section.
While they employ the ideology of the appointing president as a component of judicial ideology, the need to distinguish the incentive and selection effects of elections makes the inclusion of a district ideology measure inappropriate for this test; such a measure is appropriate for a separate test, discussed below.
A two-stage approach will tend to produce biased standard errors because the linear prediction used in the second stage is a stochastic regressor; the study therefore estimates the model using both the two-stage and a full information maximum likelihood (FIML) approach.

FIML Model

Parameter estimates from each model are displayed in Table 3. County fixed effects are employed as additional controls for unexplained heterogeneity.
The conditional effect of electoral proximity given an ideology score of zero is positive, although statistical significance is reduced slightly in the FIML estimation.
Also, the coefficient on the interaction with the ideology score is consistently negative, implying that the proximity effect is reduced for more punitive judges.
If lopsided bidirectional convergence is at work, however, then the effect of electoral proximity on more punitive judges should not only be smaller—it should be negative.
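The logic of this test can be illustrated with a short Python sketch of the conditional proximity effect in an interaction model. The coefficient values and ideology scores below are hypothetical placeholders, not the estimates reported in Table 3, and the sketch assumes that higher ideology scores denote more punitive judges.

```python
def proximity_effect(beta_proximity: float, beta_interaction: float,
                     ideology_score: float) -> float:
    """Marginal effect of proximity on sentence length at a given ideology score."""
    return beta_proximity + beta_interaction * ideology_score


def any_negative_effect(beta_proximity: float, beta_interaction: float,
                        ideology_scores) -> bool:
    """True if any judge in the sample would grow more lenient as the election nears."""
    return any(proximity_effect(beta_proximity, beta_interaction, s) < 0
               for s in ideology_scores)


# Hypothetical coefficients: positive main effect, negative interaction
b_prox, b_inter = 3.0, -1.0
scores = [-1.0, 0.0, 1.0, 2.0]  # hypothetical range of ideology scores
for s in scores:
    print(s, proximity_effect(b_prox, b_inter, s))
print(any_negative_effect(b_prox, b_inter, scores))
```

Under the informational theory the check should come back false across the observed range of ideology scores, whereas lopsided bidirectional convergence would require a negative effect for the most punitive judges.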

Robust Findings

Judges are shown to increase their sentences as reelection nears, and the effect cannot be attributed to lopsided bidirectional convergence. Overall, these tests provide strong support for the informational model.

Objections

Four plausible objections may be raised against these findings.
- Perhaps judges simply “learn” to be more punitive as they grow older and more experienced.
Judge age in column (2) of Table2 control of model estimates. Reestimated that quantity quantities, by, bench. Account the two. Table, the increases in, in, 1.51, specification. Reduce somewhat
From, 2, that, we effects that, find, for also larger, is, is that is, of of terms, that significant statistically, that effect, is that, confirms a and. However less of proximity strict a but. Of proximity with
Second district, the to district the we measure, that that
- The districts in voters, the in has might more in to it. Judge the may. Election, to by