Tillers on Evidence and Inference: Quantification of Reasonable Doubt?

Sunday, November 22, 2009

Quantification of Reasonable Doubt?

Below please find extracts (footnotes omitted) from P. Tillers & J. Gottfried, "United States v. Copeland: A Collateral Attack on the Legal Maxim that Proof Beyond a Reasonable Doubt Is Unquantifiable?," 5 Law, Probability and Risk 135 (Oxford University Press, 2006):

1. Judicial hostility towards ‘quantification’ of reasonable doubt

The U.S. constitutional guarantee of due process permits an accused to be convicted of a crime after trial, only if the evidence presented at the trial proves the accused’s guilt beyond a reasonable doubt in the eyes of the trier of fact. Occasionally, an actor in a criminal trial—typically a prosecutor, but sometimes a trial judge—will use numbers of one kind or another in an attempt to explain or clarify the reasonable doubt standard in some fashion. Appellate courts have condemned such attempts at quantification of reasonable doubt whenever they have encountered them. For example, in one case, a court condemned a prosecutor’s use of a bar graph that displayed, in percentages, the prosecutor’s view of the numerical equivalents of various levels of proof....

It could be argued that judicial hostility to quantification of reasonable doubt is only a transitory state of affairs. Many American judges now accept that mathematical and quantitative methods can shed light on many legal problems. All American judges now either accept or must accept that the results generated by mathematical and quantitative methods are often admissible at trial. Perhaps the ever-increasing use of mathematical and quantitative methods in litigation both foreshadows and reflects a transformation in judicial attitudes towards the hard sciences. Perhaps, such a change in the intellectual culture of the judiciary will create fertile judicial soil for the eventual ‘mathematization’ of the reasonable doubt standard. Perhaps so, but solid evidence that mathematization of the reasonable doubt standard will come to pass is hard to find.

Consider Judge Weinstein, a leading authority on the American law of evidence. He has long advocated more extensive forensic use of statistical methods. If any reputable judge were to advocate quantification of the reasonable doubt standard, one might expect that Weinstein would be the one to do so. A search of Weinstein’s judicial record does show that Weinstein has written two opinions that discuss quantification of the reasonable doubt standard. See United States v. Fatico, 458 F.Supp. 388, 409–11 (E.D.N.Y. 1978) and Vargas v. Keane, 86 F.3d 1273, 1281–84 (2nd Cir. 1996) (Weinstein, concurring, sitting ‘by designation’—i.e. temporarily—on the United States Court of Appeals for the Second Circuit). However, neither of these opinions directly embraces quantification of the reasonable doubt standard. ... If even math-friendly judges such as Jack Weinstein do not endorse the use of numbers in criminal trials to clarify or reformulate the reasonable doubt standard, the prospects for mathematical quantification at trial of the reasonable doubt standard would seem to be virtually nonexistent.

But what are we to make of another decision by the very same Jack B. Weinstein: United States v. Copeland, 369 F. Supp. 2d 275 (E.D.N.Y. 2005)? In Copeland, Weinstein used a numerical probability (expressed as a percentage) to quantify a standard of persuasion (‘reasonable probability’). Is Copeland compatible with the prevailing rule that reasonable doubt cannot be quantified in trials? If ‘reasonable probability’ can and should be quantified, why cannot and why should not ‘reasonable doubt’ be quantified? Does Copeland amount to a collateral attack on the rule prohibiting quantification of the reasonable doubt standard?

...

3. Myths about quantification of reasonable doubt

The myth of ‘trial by mathematics (or statistics)’

The language of some judicial opinions suggests that some judges believe that quantification of the reasonable doubt standard entails the vice of trial by statistics. Now trial by statistics—whatever it is—might or might not be a bad thing. But it is important to understand that ‘trial by mathematics’ does not necessarily entail ‘trial by statistics’. Assume that the phrase trial by mathematics refers to trials in which decisions at trial are governed by the (use of the methods of) probability calculus. Assume further that a judicial trial becomes a trial by mathematics, if the law quantifies burdens of persuasion in criminal trials by informing triers of fact that they may find a defendant guilty of crime (or find facts essential to criminal guilt) if and only if they believe that the probability of criminal guilt (or of each fact essential to guilt) exceeds some specified numerical probability. This sort of trial by mathematics—if it be trial by mathematics—does not necessarily involve statistics. Probabilities are not the same thing as statistically grounded probabilities. Yes, modern statistical analysis does involve the probability calculus. But, as the word ‘statistics’ implies, statistical analysis involves and requires systematic collection of data or observations, data and observations that can be summarized in the form of statistics. It is possible to talk—and talk coherently—about odds or probabilities without systematically gathering data, compiling statistics or analysing systematically gathered collections of data. In short, although it is not possible to do statistics without doing probability, it is possible to do probability without doing statistics. Hence, any uneasiness about the use of statistical methods in criminal trials does not explain the judiciary’s uneasiness about quantification of the reasonable doubt standard.
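The quantified rule described in the preceding paragraph can be written down in a few lines. What follows is only an illustrative sketch, not anything proposed in the article: the function name `may_convict` and the threshold value 0.95 are assumptions chosen for the example. The point it illustrates is that the rule takes the trier's probabilities as given and is indifferent to whether they came from statistics or from ordinary judgement.

```python
def may_convict(element_probabilities, threshold):
    """Apply a quantified standard of persuasion.

    element_probabilities: the trier's probability for each fact essential
    to guilt, however the trier arrived at those numbers.
    threshold: the specified numerical probability that each must exceed.
    """
    if not element_probabilities:
        raise ValueError("at least one essential fact is required")
    # "Exceeds" in the text is read here as a strict inequality.
    return all(p > threshold for p in element_probabilities)


# Hypothetical threshold, chosen only for illustration.
HYPOTHETICAL_THRESHOLD = 0.95

print(may_convict([0.99, 0.97], HYPOTHETICAL_THRESHOLD))  # True
print(may_convict([0.99, 0.90], HYPOTHETICAL_THRESHOLD))  # False
```

Nothing in the function gathers data or compiles statistics; it only compares numbers that the trier supplies.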

...

A scholarly debate about the virtues and vices of mathematical analysis of evidence has raged for more than three decades. The outcome of that debate remains unclear: it is unclear whether the proponents or the opponents of mathematical analysis of evidence and inference will ultimately prevail. (Tillers confesses that he’s betting on the advocates of heuristic mathematical analysis.) But one thing about that long-running and often acrimonious debate is relatively clear: most of that debate is immaterial to the question of quantification of the reasonable doubt standard. Scholarly arguments about mathematical analysis of evidence and inference largely have to do with the logic or structure of argument about and from evidence—i.e. the logic or structure of factual or evidential inference or evidential argument. Like other forms of inference, evidential inference involves at least one step—a step, e.g. from an evidential premise to a factual conclusion. (That a step is required is the reason why we call the step ‘inference’.) Disagreements about mathematical analysis of evidence and inference mainly involve disagreements about how inferences are or should be drawn. The sorts of quantitatively phrased standards of persuasion under discussion here do not implicate controversies about the structure of evidential inference because the type of quantification under discussion here specifies only how much uncertainty is acceptable at the end of the day—after the trier has used whatever logic it chooses to use to draw inferences from and about the available evidence. Quantified standards of persuasion of this sort appear to say nothing about the kind of logic or reasoning the trier should use to reach its final (uncertain) conclusion.

Quantification of bottom-line inferences does contemplate that the trier of fact will measure and express its uncertainty by using the language of probabilities and odds. But it is hard to see why a trier’s use of the language of probabilities and odds to describe the extent of its uncertainty about its ultimate factual conclusions compels the trier to use any particular method for drawing inferences from and about evidence, let alone a method of inferential analysis that is rooted in the standard probability calculus. ...

If numerical quantification of a standard of persuasion does not require that mathematics or numbers be used to analyse evidential inference, not much is left of the claim that quantification of a standard of persuasion amounts to trial by mathematics. It must be granted, of course, that quantification of the reasonable doubt standard in terms of odds, probabilities or chances...would require a trier such as a juror to use numbers when interrogating itself about the sufficiency and strength of the evidence against an accused. ...But so what? Numbers are not inherently evil things. The use of numbers to express the degree of a person’s uncertainty about a factual possibility does not require the use of higher mathematics—or even intermediate mathematics. Arithmetic will do. ...
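To make concrete how little arithmetic is involved, here is a minimal sketch of converting between odds and probabilities. The function names are hypothetical and the figures are chosen only for illustration; nothing here is taken from the article.

```python
from fractions import Fraction


def odds_to_probability(in_favour, against):
    """Odds of in_favour:against become the probability in_favour / (in_favour + against)."""
    return Fraction(in_favour, in_favour + against)


def probability_to_odds(p):
    """A probability p becomes the odds ratio p / (1 - p)."""
    p = Fraction(p)
    if p == 1:
        raise ValueError("certainty has no finite odds")
    return p / (1 - p)


print(odds_to_probability(19, 1))             # 19/20
print(probability_to_odds(Fraction(19, 20)))  # 19
```

Odds of 19 to 1 and a probability of 19/20 are two ways of stating the same degree of uncertainty; moving between them takes one addition and one division.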

The curious myth of ‘mathematical certainty’

Occasionally, it is said that mathematical analysis of evidence or mathematical accounts of inference is unacceptable because mathematical analysis aims for a kind of certainty—mathematical certainty—that is unattainable in ordinary affairs or in inferential deliberation. Is it conceivable that this sort of argument would be made about quantification of the reasonable doubt standard—that quantification of the reasonable doubt standard would somehow convert the standard into one that requires mathematical certainty of guilt? We hope not. But if the argument were to be made, it would be so preposterous that it might be difficult to know what to say about it.

The objection that mathematical analysis of evidence and inference entails a (spurious) mathematical certainty about evidence and inference fundamentally misconceives the entire point of using probability theory to analyse factual proof. ... The entire point of using probability theory is to talk coherently about uncertainty—not to eliminate uncertainty.

The myth of excessive mathematical precision

Courts often suggest that quantification of the reasonable doubt standard entails precise quantification of the standard—and that such precise quantification would be a bad thing because a quantitatively precise formulation of the burden of persuasion in criminal trials would be excessively precise. ... The objection to quantification of standards of persuasion on the ground that quantified standards are precise may seem to require no explanation. The notion of ‘precise quantification’, however, has various connotations, and each of these connotations seems to have different wellsprings.

In some instances, the thesis (or suspicion) that quantification of matters such as probable cause and reasonable doubt necessarily produces an excessive and spurious degree of precision about uncertainty may be rooted in the following two related assumptions:

(i) Any quantification of the reasonable doubt standard in terms of probabilities would have to use relatively granular rather than relatively coarse probabilities—such as probabilities that run to three or even to five or more decimal places, e.g. the probability 0.953 or the probability 0.95312, and
(ii) The degree of doubt and uncertainty about matters such as criminal guilt is necessarily, relatively imprecise; it is always comparatively coarse.

The objection to quantification of standards of persuasion is not well grounded if it rests only on these two propositions. It is very probably true that triers’ uncertainty about many types of facts that are legally essential to a finding of criminal guilt—about possible facts such as ‘intent to kill’—is ordinarily, relatively coarse. However, nothing in mathematical logic or in probability theory dictates that mathematical measures of uncertainty must be highly granular. Today there is an entire family of mathematical theories of uncertainty that are dedicated to the study of ‘imprecise probabilities’. Even before the advent of nonstandard mathematical approaches to uncertainty, it was well known that probabilities can be imprecise. ...
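One simple way to picture a coarse, imprecise probability is as an interval rather than a single number. The sketch below is an assumption-laden illustration, not the authors' method and not a full imprecise-probability theory: the class `ProbabilityInterval`, the function `compare_to_threshold`, the three-way outcome and all the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbabilityInterval:
    """A coarse degree of belief: 'somewhere between lower and upper'."""
    lower: float
    upper: float

    def __post_init__(self):
        if not 0.0 <= self.lower <= self.upper <= 1.0:
            raise ValueError("need 0 <= lower <= upper <= 1")


def compare_to_threshold(belief, threshold):
    """Compare an interval-valued belief with a numerical threshold."""
    if belief.lower > threshold:
        return "exceeds"          # the whole interval lies above the threshold
    if belief.upper <= threshold:
        return "does not exceed"  # no part of the interval lies above it
    return "indeterminate"        # the interval straddles the threshold


print(compare_to_threshold(ProbabilityInterval(0.90, 1.00), 0.85))  # exceeds
print(compare_to_threshold(ProbabilityInterval(0.80, 0.95), 0.90))  # indeterminate
print(compare_to_threshold(ProbabilityInterval(0.50, 0.70), 0.90))  # does not exceed
```

No decimal places beyond the first or second are needed to state such a belief, which is the sense in which a mathematical measure of uncertainty can remain coarse.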

The objection to ‘precise quantification’ of burdens of persuasion sometimes may have a basis entirely different from the (erroneous) notion that mathematical probabilities must be granular. Consider again the passage by the U.S. Supreme Court quoted above. In part of that passage, the Court emphasized that precision about probable cause is bad because probable cause ‘depends on the totality of the circumstances’. Pringle, 540 U.S. at 371. The evil hinted at by this part of the Court’s language is not any excessive granularity of probability judgements, but the invariability of the degree of probability that, the Court suggests, would be required for a finding of ‘probable cause’ were the probable cause requirement quantified.

The notion that mere use of the language of mathematical probability to describe the relationship between uncertainty and probable cause requires that ‘probable cause’ be assigned some invariant numerical (‘mathematical’) probability is almost silly beyond words. ...

...

The myth of an absolute disjunction between qualitative and quantitative judgements

Courts frequently declare that the reasonable doubt standard requires the trier of fact to make qualitative rather than quantitative judgements. To make sense of this proposition—to make it amount to more than the tautology that a verbal formulation of the reasonable doubt standard is not a numerical formulation—it is necessary to understand it as an assertion that judgements about states of the world are either qualitative or quantitative, but not both. If this is the kind of notion that is at work here, it is hard to understand.

Perhaps the thesis of a disjunction between quantitative and qualitative judgements rests on the premise that numbers somehow speak for themselves—and that, thus, no qualitative human thinking is required when numbers are involved in an argument or assessment. There are any number of difficulties with this idea. The first is that numbers often do not come into existence ‘on their own’. That is the case here, where numbers are not being used to tally—to enumerate—the number of entities (such as automobiles) in some domain (such as some street or city). Furthermore, even after numbers have appeared or have been made to appear, they must usually be interpreted by human actors and often arguments about the significance of the available numbers for the thesis in question must be constructed and assessed. Such activities seem to involve ‘qualitative’ mental processes as well as quantitative ones.

Perhaps the thesis of a disjunction between quantitative and qualitative judgements about evidence involves the notion that mathematical procedures for the assessment of evidence amount to mechanical recipes—‘algorithms’—that automatically—or, in any event, in a machine-like fashion—determine the probative value of evidence. But debates about the advantages and disadvantages of ‘algorithmic’ methods of analysing evidence are beside the point here: Algorithmic reasoning would not be required by a quantified legal standard of persuasion that merely specifies the level of certitude that must exist in the mind of a trier of fact if the trier is to take some action such as casting a vote in favour of a verdict of guilty in a criminal case. A quantified legal rule of this sort assumes that the trier somehow reaches a conclusion about his or her level of certitude. It does not describe the type of reasoning that the trier should use to reach a conclusion about his or her level of certitude or incertitude. See discussion [above].

The myth of the unquantifiability of degrees of belief

More than half a century ago, the dean of all scholars of the Anglo-American law of evidence—John Henry Wigmore—wrote:

The truth is that no one has yet invented or discovered a mode of measurement for the intensity of human belief. Hence there can be yet no successful method of communicating intelligibly to a jury a sound method of self-analysis for one’s belief. If this truth be appreciated, courts will cease to treat any particular form of words as necessary or decisive in the law for that purpose; for the law cannot expect to do what logic and psychology have not yet done.

9 John H. Wigmore, EVIDENCE IN TRIALS AT COMMON LAW Section 2497 (3d ed. 1940)

Wigmore’s language is sweeping. The sentiment it expresses is practically hypermodern. Just as Kenneth Arrow argued that interpersonal comparisons of preferences are impossible, Wigmore seemed to suggest that interpersonal comparisons of the strength of credal states—interpersonal comparisons of the strength of beliefs about states of the world—are impossible. Indeed, Wigmore seemed to go yet further: he seemed to assert that ‘intrapersonal’ comparisons of the strength of credal states are also impossible—that individuals cannot compare the degree of their own uncertainty about the truth or falsity of different propositions about the world. In short, Wigmore seemed to suggest that in the end, we just feel that this or that proposition is true or false and that we cannot tell others or even ourselves just how strongly we feel that this or that proposition is in fact true or false.

If it is true that both intrapersonal and interpersonal comparisons of degrees of persuasion or degrees of uncertainties are impossible, it seems to follow that all legal rules mandating a certain level of certitude on the part of the trier of fact in specified situations are both meaningless and useless. ...

But American law on standards of persuasion does not bear traces of such hyper-skepticism. A legally mandated standard of persuasion for criminal trials—the reasonable doubt standard—does exist. Furthermore, American law mandates the use of various other standards of persuasion for other kinds of cases and situations. ...

That’s the way things stand. But do legal standards of persuasion amount to a shell game? Do they amount to a kind of verbal sound and fury signifying nothing?

The thesis that the strength of human credal states is not knowable or communicable cannot be comprehensively evaluated in a paper such as this; this comment would have to become a treatise. However, it should be noted that it is not self-evident that Wigmore’s radical thesis about credal states is true. ...

The immediate impetus for Wigmore’s expression of skepticism about the ability of people to determine and describe the degree of their uncertainty was not Wigmore’s wish to demonstrate the futility of using numbers to quantify standards of persuasion: the immediate impetus for Wigmore’s skeptical outburst was instead his desire to demonstrate the futility of using words to explain the reasonable doubt standard. Of course, had Wigmore been asked, he would also have condemned the use of numbers to describe the meaning of the reasonable doubt standard. But the point remains that Wigmore’s critique cuts at least as much against verbalization as against quantification of the reasonable doubt standard. We emphasize this point because it suggests an important insight about the true nature of debates about quantification of standards of persuasion such as the reasonable doubt standard.

The true question is not whether a standard such as the reasonable doubt standard should be quantified or not quantified. The question of quantification is tied up with the more general question of the advantages and disadvantages of using both words and numbers to describe a standard of persuasion such as the reasonable doubt standard. When the question of quantification is framed in this way, we can more readily appreciate that words as well as numbers can be used and are used to grade—quantify!—degrees of certainty or uncertainty. The debate about quantification is not really about quantification. If we reject (as we should) the radical thesis that uncertainty is not subject to any discernible gradations, the debate about quantification is really about the kind of language that should be used to grade and quantify uncertainty and to communicate to triers of fact in legal proceedings, society’s judgement about the kind and amount of factual uncertainty that society views as acceptable or unacceptable in criminal trials.

The myth of the (allegedly) necessary—but (allegedly) spurious—objectivity of quantifications of reasonable doubt

This myth is a noxious but hardy weed. It first erupted—in modern legal memory—in 1971, when Laurence Tribe made his renowned attack on trial by mathematics. Mathematical analysis of evidence, he argued, can perhaps do a nice job of handling ‘hard variables’, but quantitative analysis (in the form of probability theory) either cannot quantify soft variables or does a lousy job of quantifying them. Although Tribe does not define hard variables, he intimates that they amount to readily enumerable—readily countable—phenomena.

The notion that soft variables cannot be quantified is a myth. For example, I can and do make uncertain judgements about how my neighbour will feel next time I see her—and, if asked, I can and will tell you what I think are the chances that I am right. ...

Given the withering scholarly criticism that has been directed at the myth that probability theory deals with objective or hard facts and therefore cannot regulate uncertain (or inconclusive) reasoning about nonobjective phenomena, one might think that even judges would now refrain from asserting that quantitative methods cannot deal with ‘soft variables’. But it is not so—at least not universally so: the wrong-headed notion that mathematical measures of the strength of evidence can measure only the strength of evidence (or judgements about the strength of evidence) about ‘objective’ phenomena has resurfaced in judicial discussions of the reasonable doubt standard.

&&&

To see what the authors have to say in the remainder of the article (in particular, what they have to say about what "genuine issues" are raised by proposals for mathematical formulations of the reasonable doubt standard), you will have to read the article itself. I suspect that what they have to say will both surprise and interest you, but then, I am biased on this point.
