Tillers on Evidence and Inference: Are People Just a Bundle of Traits: viz., Are Character Traits Reference Classes that Indicate How Individuals Likely Behave in Unknown Situations?

Saturday, March 21, 2009

I have long thought, often in an inchoate way, that thinking about the prediction of -- or inferences about -- human behavior provides important clues to the problem(s) of reference classes and, more broadly, to the manner in which experience teaches human beings -- or the manner in which human beings use experience -- to draw inferences about the world. Recently a member of my Advanced Evidence seminar asked the seminar members to read some things I wrote many years ago. On re-reading my own stuff, I concluded that what I said then was not entirely stupid, and I thought I would post that material on this blog for your consideration.

The two batches of material are from Section 37 of my revision of the first volume of the fourth edition of Wigmore's multi-volume treatise on the American law of evidence. The first batch of material -- the much longer batch -- consists of footnote 8 of Section 37.6 of 1A Wigmore on Evidence (P. Tillers rev. 1983). The second extract consists of roughly two pages of text from Section 37.7.

&&&

Relative frequency, or frequentist, theories are supported by the general intuition that the known frequency of the association of two or more types of phenomena is a rational basis for making estimates of the probability of the association of those types of phenomena with each other in other cases. We have the intuition that the probability of such association in novel cases is normally a function, at least in part, of the relative frequency of such association of the various types of phenomena in cases about which we do have knowledge. Hence, we may rely on past frequency of association if we observe one phenomenon in a novel case and wish to estimate the probability of the occurrence of the other type of phenomenon.

This general inclination to give credence to perceived regularities in the course of life and nature is the intuitive basis for the varieties of formal frequentist theories of probability. A primitive idea of relative frequency theory of probability is expressed in the Humean belief that something like a habit of thought, arising out of a perceived regularity of nature in known cases, gives some kind of reason to suppose it is probable that the sun will rise tomorrow.

It is incontestable that our belief or perception of the existence of regularities in life and nature is a good basis on some occasions for our beliefs and estimates of the probability of a course of events that we have not been able to observe in some relatively "direct" fashion. But does this predisposition to give weight to some observed or perceived regularities mean that we should accept formal relative frequency theories of probabilities as an adequate general explanation of our interpretation of evidence? We think not — even though we believe that the human animal learns about conditions of existence through experience. To explain this view, we briefly examine more refined explanations of relative frequency theory.

What is formal relative frequency theory? Burks has given this description: "[T]he frequency theory of empirical probability is the theory that atomic empirical probability statements should be analyzed into frequency probability statements and reasoned about by means of the calculus of frequency probability, which is an interpretation of the traditional calculus of probability." Burks, Chance, Cause, Reason: An Inquiry into the Nature of Scientific Evidence 148 (1977) (emphasis omitted). If the meaning of this definition is not entirely apparent, the more colloquial description given by John Maynard Keynes may prove to be more enlightening:

"The essence of this theory can be stated in a few words. To say, that the probability of an event's having a certain characteristic is x/y is to mean that the event is one of a number of events, a proportion x/y of which have the characteristic in question; and the fact [original emphasis] that there is [original emphasis] such a series of events possessing this frequency in respect of the characteristic, is purely a matter of experience to be determined in the same manner as any other question of fact. That such series do exist happens to be a characteristic of the real world as we know it, and from this the practical importance of the calculation of probabilities is derived.

"The two principal tenets . . . are . . . that probability is concerned with series or groups of events, and that all the requisite facts must be determined empirically." Keynes, Treatise on Probability 102-103 (1973) (reprint of first edition of 1921).

The central tenet of frequentist theories of probability is the assertion that probability judgments are functions of the relative frequency with which two phenomena are associated with each other. More precisely stated, a relative frequency theory of probability supposes that the probability of event A, given B, is a function of the relative frequency with which A is known to occur when B occurs. The probability of A, given B, therefore, is a function of the ratio A/B. One theory of relative frequency, for example, provides that if we assume the existence of a "collective" of events, which we will call K, and if we wish to determine the relative frequency of a given type of event, which we will call E, in the collective K, we may say after a certain number of observations, which we will call n, of that type of event within the collective, that the relative frequency of E in K (P(E/K)) is equivalent to the limit that E/Kn approaches as n becomes large without bound. See Hacking, Logic of Statistical Inference 5 (1965) (description of refined version of theory developed by von Mises).
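In conventional notation, the limiting relative frequency just described might be written as follows. This is only a sketch: the symbol $n_E$, standing for the number of occurrences of E among the first n observed members of the collective K, is introduced here for illustration and is not the notation of the sources cited.

```latex
P(E \mid K) \;=\; \lim_{n \to \infty} \frac{n_E}{n}
```

On this reading, the ratio "A/B" of the simpler formulation corresponds to the finite fraction $n_E / n$ before any limit is taken.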

In a more primitive form, the idea of relative frequency takes the form of direct (linear) extrapolation from existing observed regularities and recurrences. In this form, the idea of relative frequency may rely on the idea of direct linear extrapolation based upon simple enumeration: We count the frequency with which one event (or type of event) is associated with another event (or type of event). Call them A and B. To determine the probability of A, given B, we suppose (somewhat arbitrarily) that, absent other information, the observed relative frequency of A and B will hold in novel or unobserved instances and, given B, we therefore suppose that the likelihood of there also being an instance of A is equal to A/B.

The proponents of relative frequency theory have advanced various, divergent formulas to describe the precise relationship between the relative frequency of A and B and a judgment of the probability of an instance of A, given B.

We do not profess to grasp the mathematical considerations that have been urged in support of different mathematical formulas by which the lessons of known experience are extrapolated to novel situations. But we do believe, however presumptuously, that any such comprehension is immaterial for present purposes because we believe, on good authority, that such attempts to establish such formulas exhibit two related characteristics that illustrate why any rule of statistical inference is a bad candidate for general and universal validity.

A rule of statistical inference is precisely a rule that works only if (1) it describes correctly the pattern of past experience, and (2) it describes a pattern that we know how to extrapolate to novel cases. Any proposed rule of statistical inference (from known experience to unknown cases), however, cannot by itself establish that a description of past experience is correct or accurate (in any meaningful sense) or that any pattern discernible in known cases should be extended in any particular fashion to unknown cases.

Why is this so?

The reason may be suggested by an assumed observation of the sequence "1, 2, . . . ?" What number follows 2? Our ability to answer depends, first, on our conviction that we have indeed observed "1" and "2" at the beginning of the series. (But how do we ever know that it is the beginning of the series that we have seen when indeed we have also observed other numbers at other times, on a digital printout, let us say?) But assume we know what we have seen. We must assume a certain principle of extrapolation. (Further observe that we may construct an innumerable variety of sequences into which "1, 2, . . . ?" would fit.)
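The point about "1, 2, . . . ?" can be made concrete with a small sketch. The four rules below are hypothetical choices made purely for illustration (none comes from the text); each reproduces the observed terms, yet each predicts a different third term, and the family() helper suggests how one might generate as many further rules as one likes.

```python
from typing import Callable, Dict, List

observed: List[int] = [1, 2]

def family(c: int) -> Callable[[int], int]:
    """Hypothetical rule a_n = n + c*(n-1)*(n-2); every integer c fits 1, 2."""
    return lambda n: n + c * (n - 1) * (n - 2)

# Illustrative candidate rules, indexed from n = 1.
rules: Dict[str, Callable[[int], int]] = {
    "count up by one": lambda n: n,
    "double each time": lambda n: 2 ** (n - 1),
    "n^2 - 2n + 2": lambda n: n * n - 2 * n + 2,
    "family with c = 10": family(10),
}

def fits(rule: Callable[[int], int], data: List[int]) -> bool:
    """True if the rule reproduces every observed term in order."""
    return all(rule(i) == x for i, x in enumerate(data, start=1))

# Keep only the rules consistent with what was observed, then extrapolate.
next_terms = {
    name: rule(len(observed) + 1)
    for name, rule in rules.items()
    if fits(rule, observed)
}

for name, value in next_terms.items():
    print(f"{name}: the next term would be {value}")
```

All four rules survive the consistency check, so the observations alone do not select among them; some principle of extrapolation has to be supplied from outside the data.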

Similar difficulties arise if we assume that we have observed certain patterns of pairing of phenomena in the past, such as "a, b; a, c; a, b; a, d; a, b; . . .", and we wish to know P(b|a) (the probability of b, given a). To extrapolate from the known cases, we must, first, be sure we have seen what we think we have seen. Second, we must say that some rule describes the pattern we have seen. But what rule? If we say that the rule is restricted to a description of what we have actually seen, we are not told how the rule should be extended to new cases. If, however, the rule purports to furnish a description of all pairings of a, b and not just those observed, some principle of generalization has been used. The difficulty, however, is that we may construct an infinite number of rules that are consistent with our known observations of a, b but that nonetheless yield different descriptions of the frequency of the pairing (and the distribution of the pairing) of a, b in the entire sequence (of both known and unknown cases). As we gather new observations, the difficulty merely reconstructs itself in a different form, one in which some series may be ruled out but one in which an infinite number of possible series remains. We need to make quite a few assumptions about patterns of divergence, convergence, uniformity, and stability before our aggregation of experience will in fact lead to a diminishing number of possibilities and an increased faith in particular series-descriptions. To be sure, in particular situations, particular sequences of events seem to make particular implications for the future (or for other situations) almost overpowering and practically irresistible. But as true as this point may be, it is largely immaterial, for we are here concerned about a theory of probability that explains all of our reasoning, and our attachment to particular sequences of connections, however powerful, is no basis for saying that we have discovered how to reason in all cases. Indeed, our problems of inference arise precisely because we seem to have no such powerful inferential sequence available to us in the case at hand, and we need to know what to do.

But what of the principle of direct enumeration and direct extrapolation? Suppose we take a straightforward approach and assume that the probability of an instance of A in the future, once we know of B, is the ratio of the observed relative frequency of As to Bs in the past. Can we not use this principle, absent other information, on the general assumption that what has held true in the past (in known cases) will hold true in future or unknown cases in the same way? Contrary to all common sense, there are serious complications with this approach: "[T]he seemingly straightforward estimation of a probability order relation through induction by enumeration . . . requires . . . specification of the relevant set of past observations — the reference class of events; and an ability to recognize what is a confirming instance of the occurrence of an event in the data." Fine, Theories of Probability 112 (1973). See also Lonergan, Insight 302 (1978) (reprint of second edition of 1958) ("statistical laws presuppose some classifications of events").
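Direct enumeration itself is easy to sketch, and the sketch shows where the complication enters. The example below uses the five pairings mentioned earlier ("a, b; a, c; a, b; a, d; a, b"); the function name and the two "recognizers" are assumptions made for illustration. The second recognizer, which treats "d" as a confirming instance of "b", is a purely hypothetical classification choice.

```python
from fractions import Fraction
from typing import Callable, List, Tuple

# The observed pairings described in the text.
pairs: List[Tuple[str, str]] = [
    ("a", "b"), ("a", "c"), ("a", "b"), ("a", "d"), ("a", "b"),
]

def relative_frequency(
    data: List[Tuple[str, str]],
    given: str,
    is_instance: Callable[[str], bool],
) -> Fraction:
    """Share of cases with first member `given` whose second member counts as a hit."""
    relevant = [second for first, second in data if first == given]
    hits = sum(1 for second in relevant if is_instance(second))
    return Fraction(hits, len(relevant))

# Strict recognition: only "b" itself confirms.
strict = relative_frequency(pairs, "a", lambda x: x == "b")

# Hypothetical lenient recognition: "d" is judged analogous to "b".
lenient = relative_frequency(pairs, "a", lambda x: x in {"b", "d"})

print(strict, lenient)
```

The same five observations yield 3/5 under one recognition rule and 4/5 under the other, so the count does not fix the estimate until someone has decided what is to be counted.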

The problem of classification exhibits the fundamental weakness of any frequentist theory of probability. The power of any frequentist theory of probability ultimately depends on our ability to enumerate cases in appropriate and meaningful ways, and our power to enumerate, of course, depends on our ability to recognize whether particular events belong within some group, class, or type of event of which we wish to make some sort of enumeration. This act of classification, however, is not always a self-executing act whose legitimacy cannot be questioned. We find, thus, that the act of classification may depend upon the perception of an analogy between one event and another, which, sometimes, leads us to say that both events belong within some common class of events. See de Finetti, Probability, Induction and Statistics: The Art of Guessing 154 (1972) ("The special case of statistics, according to our interpretation, differs from those illustrated in the preceding examples only in that the observable events E1, . . ., Eq instead of being diversified are analogous, or (according to a terminology that I regard as inadmissible) identical" (original emphasis)).

It is true that statistical enumeration does in some cases serve as a powerful predictive tool. But cf. Northrop, The Logic of the Sciences and the Humanities 115 (1947) ("David Hume, who was the first Western thinker perhaps to fully realize the exact character of purely empirically given knowledge, pointed out that such knowledge exhibits no necessary connections. This means that a science which restricts itself to directly observable entities and relations automatically loses predictive power. The science tends, even when deductively formulated, to be merely descriptive and to accomplish little more so far as prediction is concerned than to express the hope that the sensed relations holding between the entities of one's subject matter today will recur tomorrow. This is an excessively weak and deductively empty type of predictive power. Little can be deduced from mere subjective psychological hope"). The genuine power of statistical reasoning within some domains neither demonstrates that statistical reasoning, based on the principle of enumeration of like or "identical" cases, is powerful within all domains nor that all inferential reasoning is based on it. The phenomenon of "stable measurement" accounts for the successes of statistical reasoning. Cf. de Finetti, Probability, Induction and Statistics: The Art of Guessing 145 (1972). See also Hempel, Aspects of Scientific Explanation 386 (1965) ("The mathematical theory of statistical probability is intended to provide a theoretical account of the statistical aspects of repeatable processes of a certain kind which are referred to as random processes or random experiments"). In many cases, however, such stable measurement does not exist and cannot be made to exist unless we choose to be arbitrary. In these cases, the value of statistical measurement is uncertain. 
(The difficulty of determining whether an event is to be regarded as being of a particular type has helped to inspire some mathematical theories about "fuzzy sets" in which this problem is explicitly acknowledged. See Zadeh, Fuzzy Sets, 8 Information & Control 388 (1965). Our argument, however, makes plain that the mathematical concept of a fuzzy set will not resolve all of our difficulties. Here, no less than elsewhere, the facts will not always speak for themselves, and our interpretive rules may be seen as being somehow prior to the data being observed and classified.)

The difficulty suggested by the problem of measurement and classification may be described in another way. Suppose that sets of phenomena are infinitely rich and diverse and that we "partition" the phenomena in different ways when we use different names and classifications to denote what we observe in a given set of complex phenomena. What reason do we have to suppose that the partitioning of a particular set of phenomena has been made in a useful way? Here again, sometimes it may happen that the phenomena in question will seem almost to partition or classify themselves in a useful and convincing way, but in other cases this will not happen and then we are again faced with the problem of arbitrary classifications and partitions. Cf. de Finetti, Probability, Induction and Statistics: The Art of Guessing 155 (1972) ("The validity of a given property (such as some form of the law of numbers) never depends on similarities or on any external features of the events concerned, but only on the probability scheme accepted for them. The external features are relevant only in the role they play in determining our opinion about the probabilities"). If we suppose that we can always avoid having to decide which particular partition of a number of possible partitions is "correct," we are quite mistaken, because we will find that in many cases statistical extrapolations from different partitions of the data will lead to quite different conclusions. This may be illustrated by the infamous problem involving a Swedish pilgrim to Lourdes. What is the probability that the pilgrim is Catholic if 95 percent of Swedes are not Catholic and if 95 percent of pilgrims to Lourdes are Catholic? See, e.g., Ayer, Probability and Evidence 51-52 (1972). This problem has been called the problem of intersecting or overlapping reference classes. The discussions of this problem and similar problems show that statistical reasoning alone is here helpless and leads to contradiction.
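The Swedish pilgrim problem can be illustrated with two invented populations. Every head count below is hypothetical and chosen only so that both worlds satisfy the two stated statistics (5 percent of Swedes are Catholic, 95 percent of pilgrims to Lourdes are Catholic); nothing here is real data, and the function and variable names are likewise assumptions.

```python
from fractions import Fraction
from typing import Dict, Optional, Tuple

# Keys are (is_swede, is_pilgrim, is_catholic); values are hypothetical head counts.
World = Dict[Tuple[bool, bool, bool], int]

world_a: World = {
    (True, True, True): 10,     # Swedish pilgrims, all Catholic
    (True, False, True): 40,
    (True, False, False): 950,
    (False, True, True): 940,
    (False, True, False): 50,
}

world_b: World = {
    (True, True, False): 10,    # Swedish pilgrims, none Catholic
    (True, False, True): 50,
    (True, False, False): 940,
    (False, True, True): 950,
    (False, True, False): 40,
}

def freq_catholic(
    world: World,
    swede: Optional[bool] = None,
    pilgrim: Optional[bool] = None,
) -> Fraction:
    """Relative frequency of Catholics within the chosen reference class."""
    total = 0
    hits = 0
    for (is_swede, is_pilgrim, is_catholic), count in world.items():
        if swede is not None and is_swede != swede:
            continue
        if pilgrim is not None and is_pilgrim != pilgrim:
            continue
        total += count
        if is_catholic:
            hits += count
    return Fraction(hits, total)

for name, world in (("A", world_a), ("B", world_b)):
    print(
        name,
        freq_catholic(world, swede=True),                # reference class: Swedes
        freq_catholic(world, pilgrim=True),              # reference class: pilgrims
        freq_catholic(world, swede=True, pilgrim=True),  # the intersection
    )
```

Both worlds agree on the two published frequencies, yet the frequency within the intersecting class is 1 in one world and 0 in the other. On these assumptions, the two statistics by themselves leave the answer for the individual pilgrim entirely open.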

Carl Hempel describes the problem of intersecting classes thus:

"[The ambiguity of statistical explanation] derives from the fact that a given individual event . . . will often be obtainable by random selection from any one of several `reference classes' . . ., with respect to which the kind of occurrence . . . instantiated by the given event has very different statistical probabilities. Hence, for a proposed probabilistic explanation with true explanans which confers near-certainty upon a particular event, there will often exist a rival argument of the same probabilistic form which confers near-certainty upon the nonoccurrence of the same event" (Aspects of Scientific Explanation 394-395 (1965)).

The problem of intersecting classes may be stated in a somewhat different form: If probability statements are a product of frequency statements, it is logically incompatible with the axioms of probability theory to state, for example, both that the probability of X is .95 and that the probability of not-X is .95, for P(X) = 1 - P(not-X). Hence, some modification of the implications of conflicting frequency observations must be made on some basis not generated by relative frequencies alone. See Hempel, Aspects of Scientific Explanation 72-73 (1965).
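The arithmetic behind the incompatibility can be set out in one line. This is a brief sketch in which $\neg X$ stands for not-X, a notational choice made here rather than one taken from the sources cited.

```latex
P(X) + P(\neg X) = 1,
\qquad\text{whereas}\qquad
0.95 + 0.95 = 1.90 \neq 1 .
```

If each observed frequency were read directly as the probability of the single event, the two assignments could not both stand.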

Carnap tried to resolve the paradox of conflicting reference classes by two devices. First, he devised a theory of what has been called tautological probability, which defines probability as the distribution of a selected characteristic or event over some chosen reference class. There is then no conflict because a separate reference class has, by definition, a separate probability distribution. This solution, though logically permissible, achieves a Pyrrhic victory because the notion of probability is made unusable in reality. Second, Carnap advocated a principle of "total evidence," by which he meant, apparently, that one would rely on all available information, which in this context presumably means that one should choose a reference class that includes (by definition) only "Swedish pilgrims to Lourdes" and not "Swedes" or "pilgrims to Lourdes." This latter solution, called the "requirement of maximal specificity," is also impracticable, for reasons we cannot review here. See, e.g., Hempel, Aspects of Scientific Explanation 394-402 (1965).

In summary, the choice of a classification of events, the selection of events as falling within any such classification, and, furthermore, the selection of a principle or rule by which the relative frequency of such events is extended to cases of interest — none of these choices is self-executing. Rather, each choice requires the exercise of human judgment, by which means some pattern is imposed on human experience. Accordingly, it is untenable to say that experience alone is the basis for inference, and accordingly, it seems clear that the principle of relative frequency — statistical frequency — is not sufficient to explain the interpretation of evidence. (Affirmatively, we assert that the principle of relative frequency achieves power only if the organizing activities of the observer are recognized to be indispensable.)

The foregoing objections constitute an argument against simple empiricism. The empiricist notion is that experience speaks for itself by exhibiting a certain regularity, but the intrinsic complexity of phenomena prevents any sort of mechanical extrapolation from experience. Any phenomenon can be partitioned, classified, or "experienced" in innumerable different ways. Thus, it follows that of three experiences, all three will be the same or similar in certain respects -- in an infinite number of respects -- that all three will be different in certain respects -- in an infinite number of respects -- and that what has just been said will also hold for any two of the three experiences. It is also true that these differentiating and nondifferentiating characteristics of experiences appear themselves as a composite of innumerable characteristics and that what has just been said of "experiences" also holds true for the characteristics by which we attempt to differentiate and relate different experiences. Furthermore, it is also true that experience may well disclose an infinite number of regularities and patterns since experience (we suppose) may be decomposed in an infinite variety of ways. If these assumptions are correct, it follows that experiences do not of themselves exhibit or establish their differences and similarities and that experiences of themselves do not even tell us whether we are seeing the same thing (the same connection) we saw before. In other words, an experience so diverse cannot determine whether we have experienced any regularity in the course of nature. If the conclusion seems absurd, it is because strict empiricism is absurd. Empiricism avoids these difficulties only by assuming the legitimacy of classification of experience, by assuming we know how to classify, or by assuming that nature classifies itself.

Different responses are possible to these difficulties of frequentist theories of probability. One response is subjective probability theory, which essentially abandons the attempt to base probability in the objective features of the world. We prefer a different response. We do not think the inadequacies of frequentist theories mandate a flight into solipsism. What is required is our recognition that our inferences from evidence always involve some sort of "contribution" by the factfinder, by which experience is organized into certain patterns that are not themselves inexorably given by experience. There is something like a "web of belief," by which we organize, wittingly and unwittingly, our experience of regularity. Thus, there now exist what are called "presupposition" theories of probability, which are, in part, what their name implies. See Burks, Chance, Cause, Reason: An Inquiry into the Nature of Scientific Evidence 647-650 (1977) (summarizing theories). These theories recognize that any theory of relative frequency or induction on the basis of relative frequency requires or presupposes (by its very existence, perhaps) some principle of enumeration that guides the methods of our enumeration of the data. We might well say that the development of such a principle of enumeration is the precise object of a theory of statistics or induction. Without such a principle, our counting is pointless and aimless.

There are other difficulties with relative frequency theory, but we view these as being of secondary importance. See, e.g., Ayer, Probability and Evidence 51-53 (1972) (making a distinction between statistical statements and judgments of credibility that deal with what has been called the "unique case" situation; we choose to relate this difficulty to the problem of intersecting classes or, what is the same thing, to the question of making appropriate partitions of infinitely varied data; see discussion above).

In some cases, we may not know the genesis of our principle or procedure of counting, but we still know that the human organism does add such a principle of enumeration (expressly or implicitly) when it does choose to count and to rely on such counting. Hence, we may also assert that in some cases we have conceptual presuppositions (whatever their source) that may tell us (expressly or implicitly) that counting in a mechanistic fashion (by any rule) is either inappropriate or insufficient. If so, it is not at all odd to suppose that in some cases an observer is simply incapable of organizing the evidence before him clearly enough even to imagine the possibility of counting, and it is not at all absurd to suppose that some evidence will never in principle be understood by the observer to be sufficiently "atomistic" to be capable of being counted. In such a case, it is not demonstrably irrational to suppose that forced counting is a less accurate and reliable method for performing the task in question than some other less "precise" method of inference from evidence. We need not indulge in a metaphysical hypothesis that the observer and factfinder must be wrong in this supposition, and we need not imagine that we can improve his factfinding skill by making him more "rational" in the sense understood by one who believes in the universal validity and applicability of relative frequency theory. Relative frequency theory is a pretty model that will not work in some places.

&&&

[L.J.] Cohen's theory deepens and advances our understanding of the complex way in which the mind of the observer, through generalizations and the like, uses beliefs and principles to evaluate the probative force of a given piece of evidence, and he shows us that there are problematic features to the observer's use of generalizations. However, in our view, Cohen does not take us far enough, either qualitatively or quantitatively. Qualitatively, he does not take us far enough because, all provisos considered, he still takes the view that the interpretive conceptual principle that speaks to the probative force of a piece of evidence in essence still amounts to a statement that describes (within its appropriate domain) certain events that occur with a certain frequency relative to other events. However, there are conceptual interpretive structures that, though speaking to the probative force of a piece of evidence, take a quite different form. The term "generalization" implies that the beliefs and theories and concepts of the observer always amount to a generalized description of the course of nature that constitutes an extrapolation from regularities noted by the observer in a limited number of instances. However, "experience" can work in quite different ways and may lead to the formation of conceptual and interpretive systems that cannot easily be described as statements that describe the relative frequency of various types of events under various conditions. We do not know what sort of name to give to such conceptual and interpretive structures, but we may illustrate what we mean by an example. This example suggests that a different kind of inferential process may often apply to the assessment of the probable course of a person's behavior.

Ordinary "generalizations," like scientific theories, may have a complex logical structure and may involve complex, though largely implicit, logical operations and calculations. Consider, for example, a defamation action in which the plaintiff alleges that the defendant called him a "son of a bitch" during a radio broadcast. At the trial, the plaintiff offers into evidence a tape of that program. The tape, unfortunately, is inaudible at certain points, and the jury can only hear the words "You are a son of a ****." The question, then, is whether the defendant used the word "bitch" or some other word, such as "gun." Now suppose that the jury has the hunch that the defendant said "bitch." Does it have this hunch simply because it adheres to certain implicit generalizations about the frequency with which the word "bitch" follows the words "You are a son of a"? No matter what kind of a generalization of this sort we look for, we are likely to oversimplify how the jury is making its calculations. It is quite likely that the jury's hunch in large part rests on intuitive but complex assessments of the rules that govern our modes of speech. Thus, for example, the jury may rule out the possibility that the defendant said, "You are a son of a water," not because it has observed that "water" is rarely used by persons in such a context but rather because it has decided that such a phrase has no sensible meaning and that no rational person, adept in the English language, would ordinarily say such a meaningless thing. The jury thus rules out this possibility on some sort of logical ground. Similarly, the jury is likely to rule out other candidates such as "ape," not only because "ape" is rarely used in such a sentence, but also because its use there would not be expected because of the ungrammatical construction that would result ("son of a ape"). 
In short, intuitive reasoning about language usage and its conventions reduces the likely candidates here to "gun" and "bitch." (Of course the jury's reasoning and hunches become more complicated if the antecedent conversation and antecedent statements by the defendant are taken into account. Here again, however, the jury is likely in part to rely on various implicit rules about appropriate conventional usage of language, and these rules of usage, like explicitly formulated rules of grammar and usage, are rules that the jury somehow knows it must interpret in order to predict what specific usage is likely to occur in a situation the jury has not previously encountered. Thus, if the antecedent conversation had related to family trees and royalty, the jury may infer from this conversation that the word chosen was "Mountbatten." And the jury might reach this conclusion even if it had never before heard talk about the Mountbatten family or about royal lineage and thus had had no previous experience with the phrase "son of a Mountbatten." It is a rule or principle of usage, implicitly but logically extended to a novel situation, that informs the jury that "Mountbatten" is the type of word (like "Smith") that may be used in this sentence.)

Though we may be incapable of formulating all the principles and operations that lead a judge or jury to the hunch that the word "Mountbatten" or "bitch" was probably used (since, for example, the tone of voice used by the speaker, as well as the pacing of the words, has a bearing on the question), it does not follow that the judge (for example) lacks all capacity to investigate and explicate the bases of that inference of his; and it is also possible that such self-conscious investigation, though necessarily partial, may yet lead the judge both to revise his hunch and also to have more confidence in the reliability of whatever hunch he eventually does have.

It is important to recognize that the use of language does not present an isolated or unusual example of the complex character of the inferential processes that are normally involved in the assessment of factual issues. Brief reflection on the types of factual issues submitted for adjudication quickly reveals that many judicial factfinding efforts -- if not most of them -- involve attempts to assess the probable behavior of some person or persons within a social context. As with language, social behavior is governed by complex systems of unspoken rules that, while not inexorably determining how a human being behaves within a particular social context, do suggest how persons normally behave and may be expected to behave within such a context, and it is surely the case that we at least implicitly refer to such complexes of rules and principles in trying to determine what the actor probably did.

These rules, no less than the rules of grammar and of language usage, are surely quite complex and have a kind of logic of their own; they do not amount to simple generalizations, based on prior experience with a similar situation, about what sort of behavior usually occurs in a particular setting. As in the case of language, introspection and reflection, while presently incapable of any exhaustive description of the social and interpersonal rules people follow, may help us see whether the peculiar features of a particular social setting are likely to have affected the behavior of an actor given the sort of "logic" of social relations that he, like ourselves, tends to follow.
