[This is a graduate student paper from Dec., 1993. --mh]

Confirmation Theory: A Metaphysical Approach

by Michael Huemer

I. Problem

The purpose of confirmation theory, ultimately, is to solve the problem of induction. This problem, or its solution, has two parts: first, to codify induction, that is, to state rules of inductive inference comparable to the rules of deductive logic; second, to justify inductive inference, or explain why this sort of reasoning is rational. The motivation of the first part of the problem seems straightforward enough. The reason it constitutes a philosophical 'problem' is the great difficulty that arises in carrying the project out (as we shall see below).

The source of the second part of the problem is not immediately obvious. Just as we do not waste our time attempting to justify deduction per se, it is not at first clear why we should feel required to 'justify' induction. But the requirement derives from an argument of David Hume's that appears to show that inductive conclusions are never justified. If there is such an argument (one having that appearance, that is), then in the light of it one would wonder how inductive knowledge is possible.

But at the outset of our excursion into confirmation theory, I think it advisable to set down the constraint that any theory of confirmation that asks to be taken seriously must allow the existence of inductive knowledge, where knowledge entails logical justification. Though Hume is credited (or blamed) with the denial of the possibility of inductive knowledge, I do not think any rational person could be content with such a position, and I can explain why using a principle Hume himself deployed to great effect in a different context.(1) It is a truism that a weaker evidence cannot defeat a stronger; that, in case of any controversy, we should prefer the more plausible proposition to the one less so; and that when any argument attempts to refute some proposition, we should weigh the plausibility of its premises against that of the thing it seeks to refute, and accept only that which is more credible. And if someone presents to me an epistemological theory which happens to have the consequence that I don't know about anything but what I have directly experienced, and then asks me: Which would be more surprising, that this theory of confirmation is wrong, or that learning from experience is impossible?, I do not see how there could be any doubt how I should answer. Particularly in view of the course of philosophy through history, I do not see how it could be said of any philosophical argument that it had a greater credibility than every inductive argument in existence. Consequently, our epistemology must accommodate induction, not vice versa.

A. Hume's argument

Hume's 'refutation' of induction essentially goes as follows:(2)

1. There are only three possible kinds of knowledge: (a) 'relations of ideas,' which are things that are true by definition, (b) direct observations, and (c) knowledge based on inductive reasoning, where an inductive inference is a generalization from experience.
2. Any generalization from experience presupposes 'the Uniformity Principle' -- i.e., that the course of nature is uniform, or that the future will resemble the past.
3. So inductive knowledge can only be justified if this presupposition is justified.
4. The Uniformity Principle is not true by definition.
5. Nor is its truth directly perceived.
6. And since all inductive inference presupposes the Uniformity Principle, any inductive argument for it would be circular.
7. So the Uniformity Principle cannot be justified. (from 1,4,5,6)
8. Hence, no inductive conclusion is justified. (from 3,7)

B. Goodman's contribution

Nelson Goodman controverts Hume's characterization of induction as simply generalizing from past experience in accord with the 'Uniformity Principle' with his (Goodman's) "grue" example.(3) Consider the generalization, "All emeralds are grue," where "grue" means "observed before the year 2000 and green, or not observed before 2000 and blue." Past observations of grue emeralds are not taken to confirm that all emeralds are grue -- "grue" is not a projectible predicate. So this is an example of a case in which we do not expect that "the future will resemble the past."

A natural reaction to the example is to suppose that only predicates that contain no references to particular times (or places or people, I suppose) can be projectible. The easiest plausible reply to such a proposal, though not the one Goodman chooses (he instead rejoins that all hypotheses can be phrased so as to contain references to particular times), is to remove the time reference, defining "grue" simply as "observed and green, or unobserved and blue"; the predicate is still unprojectible. The second easiest reply is to point out that there are many projectible hypotheses that refer to particular times and individuals, such as "All Barry's classes this semester will run late."

But I don't think either Goodman's correction of Hume or any attempts to meet Goodman's counter-example with modifications to the Uniformity Principle importantly alter the situation. "The Uniformity Principle" in Hume's skeptical argument just stands in for a principle of induction -- whatever principle would adequately describe induction -- and the argument goes through whatever you put in for the content of this principle (unless it's an analytic truth). It's pretty clear that a more detailed specification of the respects in which nature is uniform will still state an unobserved, matter-of-fact proposition which is unjustifiable through inductive means on pain of circularity.

C. Generalization of the problem

I think we can say still more on behalf of the problem of induction. Hume does not need to assume that there is a 'principle of induction' in the sense of a single thing that is either (a) common to every induction or (b) presupposed in an induction in addition to the explicit premises. I do not think that his 'Uniformity Principle' has these characteristics, and it is open to doubt whether any principle does.

For an empiricist, the problem of induction in its most general form arises merely from the characteristic inductive inference has of going beyond the data. This characteristic is essential to induction; if an inference did not extend our knowledge beyond what is contained in the data (premises), then it would be classified as deductive. Another way of saying this is that the intended conclusion of an inductive inference is not the only one compatible with the data.

The significance of this feature of the inductive argument, and the reason I think it is problematic for empiricists, is that it generates the possibility of empirically equivalent theories competing with the desired conclusion. That is: let the inductive argument be "e; therefore, h." Because of the nature of induction, h is at least a de facto(4) theoretical claim. Also because of the nature of induction, there are other ways e could be true besides by being a result of h. Let h' be a hypothesis describing a sort of world where h is false and e is true. (At its most trivial, h' can be just (e & ¬h).) Then the proponent of the inductive argument evidently prefers h over h'. The empiricist/skeptic wants to know why. Why is h' a worse theory than h? Notice that

(1) the reason cannot, presumably, derive from analysis of concepts or deductive logic, because h' is logically consistent; and

(2) the reason cannot, so it seems, be based on observations, because both h and h' accommodate the observations equally well.(5)

The problem is somewhat more forcefully stated if we imagine general empirical equivalence between two theories. Two theories are 'generally empirically equivalent' when for every observational statement, e, both theories predict e with equal strength. By an "observational statement" I mean one that could (in principle) be conclusively verified or refuted by direct observation. Now the problem arises for the empiricist philosopher: What (if any) rational grounds can there be for preferring one of a set of (consistent) empirically equivalent theories over others? There appears to be no empiricist answer, other than retreat into skepticism. Although there might be a non-empiricist answer, it is not yet obvious what it would be.

We can view Goodman's contribution most clearly in this light. "All emeralds are green" and "All emeralds are grue" are empirically equivalent up to the year 2000. Goodman and Hume are therefore entitled to wonder what grounds we pre-2000 people can have for regarding one of the hypotheses as superior to the other. Surely not an empirical reason? And from consideration of Goodman's example, the reader can no doubt see that empirically equivalent theories can also be adduced in competition with the conclusion of any inductive argument. Not so incidentally, this is how other sorts of skepticism usually work: the Cartesian skeptic engineers a hypothesis (such as the deceiving god or brain-in-a-vat story) to be empirically equivalent (predict the same sensory experiences) to the common sense view of the world, and then challenges us to come up with a reason for preferring one hypothesis over the other. If the skeptic has done his job right, we are forced to admit the skeptical hypothesis to be conceptually possible and to accord with (more tendentiously: to be confirmed by) all the evidence of the senses, and we consequently find it difficult to adduce any reasons against it.

The foregoing suggests a new formulation of the skeptical argument against induction, which is better than Hume's, and which we must undertake to refute:

1. All epistemic justification is either a priori or empirical.
2. For any given inductive argument there exist alternate, empirically equivalent conclusions that are compatible with the premises but incompatible with each other.
3. So inductive knowledge can only be justified if we can have reason to prefer one such hypothesis over the others. (from 2)
4. We can have no empirical reason for preferring one of these hypotheses over others, because they are empirically equivalent.
5. And we can have no a priori reason for such preference because
(a) a priori reasons always issue in necessary truths, whereas
(b) all the hypotheses in question will be logically and metaphysically possible.
6. So, given some inductive premises, we can have no reason for preferring one of the alternate possible conclusions over the others. (from 1,4,5)
7. Therefore, inductive knowledge can never be justified. (from 3,6)

II. Failed attempts at a solution

There have been several, often ingenious attempts to resolve the problem of induction, but, apart from my own view, none has been successful. I will try to survey these failed solutions below, though I am not sure that all the views I describe were intended as solutions to the problem of induction.

A. Goodman's failure

Nelson Goodman makes a perplexing attempt to reduce the second part of the problem of induction to the first.(6) In order to justify induction, he says, what we have to do is to formulate general rules and compare them with particular inferences. Individual inductions will then be justified by their conformity with the rules, while at the same time, he says, the rules will be justified by their conformity with our practices, that is, with the inferences that we actually make.

Now if Goodman's account is intended merely to describe a way of resolving certain difficult cases, then it need involve no circularity, and is, moreover, probably correct. That is, Goodman may only be saying: if we encounter a particular inference whose validity is difficult to evaluate, we can appeal to some rules that are known independently of that particular inference, and if we encounter a rule that is difficult to evaluate, we can appeal to particular inferences that are known independently to be valid. There would be no circular argument there. However, there would also be no refutation of skepticism, for the skeptic doubts of induction in general, not just certain difficult cases; thus, he is not going to grant unproblematic knowledge of any rules or any particular inferences, however obvious they may seem to us.

Because Goodman seems to think he is addressing and resolving the justificatory problem of induction, it is more likely he meant his proposal as a general means of justifying induction, to be applied to justify every rule and every particular inference. In this case, it is subject to the following immediately obvious objections:

(1) Circularity. In general, if p is to justify q, then p must first be known. It is therefore impossible for p to justify q and q to at the same time justify p, because in that case each would have to be known prior to the other. This is just what Goodman wants to claim, with respect to validity of individual inductive arguments and general rules of induction.

(2) Gives carte blanche to arbitrary practices. For instance, suppose that we had inductive practices corresponding to the opposite of every rule that we presently accept and the opposite of every particular inference that we presently accept.(7) We could still apply Goodman's procedure and wind up 'justifying' our practices since we would be able to bring our rules into accord with our particular judgements. The only things that couldn't be justified by a Goodmanian procedure would be practices that didn't follow rules (N.B. it's not clear why non-rule-bound practices should be unjustified) or inconsistent practices. Surely this is the wrong result.

(3) Goodman's suggestions, of course, would fail to impress the skeptic, for at the first stage of the procedure, where Goodman proposes to 'justify' some rule through its conformity with accepted practices, the skeptic would say, "Wait a minute. I agree this rule describes the sort of inferences we actually make, but I don't think it's valid. Instead, I think the inferences we actually make are all wrong." And in the second stage, where Goodman proposes to justify some particular inference by appeal to general rules, the skeptic would object once again, "Wait a minute. All general rules of induction are wrong. Therefore, I cannot accept their use to justify particular inferences."

(4) The procedure Goodman describes, even if it were a valid means of justification, would not help us any, since no one has been able to carry it out. That is, if this is the only way of justifying induction, then for all the time that mankind has existed without having formulated the rules of induction, our inductive inferences have been irrational. Now someone might claim that the mere existence of some rules that correspond to our inductive practices, even if we don't know them, could render our practices justified. But it's not clear why this would be so, and in any case it hasn't been shown that some such rules exist.

Goodman's defense

On the face of it, Goodman's proposal seems outrageous, but he does give an argument for why the procedure he describes would constitute a justification of induction -- hence, an argument for thinking that what we accept is valid. It is a linguistic argument:

The task of formulating rules that define the difference between valid and invalid inductive inferences is much like the task of defining any term with an established usage. If we set out to define the term "tree", we try to compose out of already understood words an expression that will apply to the familiar objects that standard usage calls trees, and that will not apply to objects that standard usage refuses to call trees. A proposal that plainly violates either condition is rejected; while a definition that meets these tests may be adopted and used to decide cases that are not already settled by actual usage. Thus the interplay we observed between rules of induction and particular inductive inferences is simply an instance of this characteristic dual adjustment between definition and usage, whereby the usage informs the definition, which in turn guides the usage. (pp. 68-9; emphasis added)

Goodman's argument, in short, is that if we can formulate some rules that correspond to the inductive inferences people call valid, then these rules will define the expression "valid inductive inference" because they state what we take a valid inductive inference to be; and therefore, an appeal to these rules to show that some particular inference is valid is legitimate. Again:

The problem of induction is not a problem of demonstration but a problem of defining the difference between valid and invalid predictions. (p. 68; emphasis added)

Against the principle of charity, I suggest that we take him at his word and, in particular, take his talk about definitions seriously. Attempting to define knowledge into existence is a fairly typical linguistic philosophy ploy. If linguistic analysis had been around during Copernicus' time, I suppose some philosopher might have argued:

In order to determine whether the earth 'orbits' the sun, we must first determine the meaning of the expression "orbits," and for that we must consider its ordinary usage. Now ordinary usage refers to the sun as orbiting the earth. So Copernicus is simply committing an abuse of language in saying that the earth orbits the sun.

I doubt Copernicus would have been impressed. The problem is that if ordinary people consistently call things that have the feature F, "X"'s, this may indicate, not that "X" just means "a thing that has F", but simply that people believe things that have F to also have a second property, that of being X, which belief they could be mistaken in. Otherwise, every dispute over whether A is B would become a linguistic dispute over the meanings of "A" and "B".

B. Answer to Hempel

Before he gives his account of confirmation,(8) Carl Hempel enunciates these three conditions on the confirmation relation that he thinks any theory of confirmation ought to satisfy:

Entailment condition: If e entails h, then e confirms h.

Consequence condition: If e confirms each of a set of sentences, K, then e confirms every logical consequence of K.

Consistency condition: e is consistent with the class of all hypotheses that e confirms.

Hempel's theory of confirmation, which is supposed to satisfy the above conditions, says that for any observational evidence, e, and any hypothesis, h, e confirms h if and only if e entails the 'development' of h with respect to the objects mentioned in e. The 'development' of a hypothesis with respect to a set of objects is to be understood as what the hypothesis would assert if those were the only objects in existence; or what the hypothesis does assert about those objects. For instance, the development of "Everything is pink" with respect to the Empire State Building and Mikhail Gorbachev would be, "The Empire State Building and Mikhail Gorbachev are pink." The development of "There is a perfect being" with respect to Bob Smith would be "Bob Smith is a perfect being." To put it another way: the development of h with respect to O plus the proposition that nothing but O exists entails h; and h plus the assumption that nothing but O exists entails the development of h with respect to O; and the development of h is an observational sentence.

Hempel's theory suffers under the following objections:

(1) Consider the observation that this pen (which I have before me) occupies(9) this region of space (a certain pen-shaped volume). This observation, under Hempel's theory, must confirm both "Every region of space is occupied" and "Everything occupies this region of space." In symbolic logic,

Ops confirms:

(x)(∃y) Oyx, and
(x) Oxs.(10)

But the two hypotheses thus confirmed are contradictory. About the other regions of space besides the one this pen occupies, one hypothesis says they are all filled, while the other says they are all empty.

The first consequence of this is that Hempel violates his own adequacy conditions, sc. the 'consistency condition.' Second, I think we should see it as a counter-example to the theory even independent of the consistency condition (which condition I think false): That is, it is implausible in any case that the observation of this pen in this region of space confirms either of the silly hypotheses in question.

(2) To speak more generally, Hempel's account is quite prodigal in doling out confirmation. Let F(a) be an observation report saying anything about an object, a (not necessarily an atomic sentence); and let Q be any proposition whatsoever. Then on Hempel's criterion, F(a) confirms (x)(¬F(x) → Q) (because F(a) entails (¬F(a) → Q)). This enables one, in short, to infer from the observation of something having a certain property, anything we please about the objects that don't have the property. One application of this Hempelian liberality is the 'Ravens paradox': we can confirm that all ravens are black by the observation of, say, white shoes. Hempel is apparently willing to live with this consequence, but I daresay most of us continue to find it counter-intuitive.

A parallel result is that the observation of several green emeralds before the year 2000 would confirm that all emeralds are grue, which is counter-intuitive. Hempel considers this counter-example in his "Postscript" and appears to give in. He does not seem, though, to regard himself as abandoning his theory of confirmation when he concedes that in actuality, contra the implications of his initial account, observations of grue emeralds do not confirm "all emeralds are grue". He thinks the solution is that unprojectible predicates like "grue" must be omitted from the language of science. But he has failed to see that the emeralds case is no different in principle from the ravens case. Both paradoxes result from Hempel's allowing the observation of an X to confirm anything you please about non-X's. He's willing to accept this for non-ravens; why is he unwilling to say the same thing about things observed before the year 2000? That is, since Hempel accepts that observations of non-ravens confirm anything and everything about ravens, why doesn't he accept that observations before the year 2000 confirm anything and everything about things after the year 2000?

Note also that Hempel would presumably be forced to deny that, once we've confirmed something, we are entitled to combine it with other things we know and deduce consequences (which will also be confirmed); for otherwise, we could confirm "(x) if x is not a raven, then there is life on Mars" by observing ravens and then combine it with the knowledge that there are non-ravens to confirm there is life on Mars.

(3) Normally, the observation of some type of object is taken to confirm that there are other objects similar to it. For example, the discovery of a black hole would confirm that there are other black holes. But on Hempel's criterion the observation of a black hole would disconfirm that there are other black holes, because it confirms that this object (this black hole) is the only thing in existence: Observing any object, a, confirms (x)x=a because (presumably) one observes that a=a, which is the development of the hypothesis.(11)

Similarly, we can see that Hempel's criterion makes the observation of any sort of thing disconfirm, and never confirm, the existence of anything even ever so slightly different from it. For instance, the observation of a six foot tall man would disconfirm that there are any people as tall as 6'1" (because it entails the development of "every man is less than 6'1" tall" with respect to the man observed), and the observation of a five-foot and a six-foot man would disconfirm that there is anybody between five and six feet tall; whereas intuitively, the evidence ought to confirm the hypotheses in these cases.

Again, take the hypothesis that there are at least two black holes, which should be confirmed by the observation of one black hole, but would not be on Hempel's theory. The development of "there are at least two black holes" with respect to some class of objects, presumably, would be that that class of objects contains at least two black holes. So the observation of a single black hole would never entail the development of the hypothesis that there are at least two.

(4) Hempel is again too restrictive in only considering generalizations on observations. There is, on his view, apparently no way that any purely theoretical claims (using non-observational predicates) could ever be confirmed. It is scarcely necessary to point out that scientific theory is filled with imperceptible entities with imperceptible properties, like protons, magnetic fields, and potential energy.

(5) Finally, Hempel makes no attempt to actually justify induction. There is no explanation of why it would be rational to believe, or to increase one's degree of belief in, things that have been 'confirmed' under his criterion of confirmation. Furthermore, in the absence of any such justification, it is difficult to feel attached to his account as an accurate description of how we reason.

Nor does Hempel have very much to say as to why we should accept his account, which he describes as a 'definition' of confirmation, other than that it satisfies his three conditions. It would be nice, for instance, if he were to take some typical examples of inductive and scientific arguments and show us how they conform to his model; but then, the examples we have considered above suggest this might be a difficult matter.

C. Mill's methods

Mill's methods of inductive inquiry, which appear in many logic texts today (having thus fared better than the rest of his logic), have more of the flavor of deductive reasoning than either of the preceding descriptions of induction. Because they are pretty sensible, it is possible someone might consider them to explain the possibility of inductive knowledge. I intend to argue, not that they aren't good methods of inquiry, but that the identification of Mill's methods does not resolve the problem of induction.

Summary of the methods:

1. Method of agreement: If E has one condition that invariably precedes it, then that condition is its cause.
2. Method of difference: If E occurs in the presence of C but fails to occur when C is removed, all other conditions being held constant, then C is the cause of E.
3. Method of residues: If E occurs in a given situation, and the portion of E due to some particular antecedent is known, then the remainder of E is the effect of the remaining antecedents.
4. Method of concomitant variations: If variation in C is accompanied by variation in E, then C is the cause (or effect) of E.

Shortcomings of the methods:

(1) The methods are insufficiently general. They only serve to discover cause-effect relations among observable events. They would thus fail to explain, for instance, inferences to the existence of theoretical entities, or to the existence of physical objects on the basis of sensations. To show that I am not attacking a straw man by this objection, I quote Mill: "The four methods which it has now been attempted to describe are the only possible modes of experimental inquiry..."(12)

It's worth noting, apropos of the methods' inability to discover the existence of physical objects, that Mill managed to convince himself that physical objects were mere 'permanent possibilities of sensation',(13) evidently preferring to demote their status rather than to admit the indictment of his own theory of induction. It is pretty obvious, though, that that sort of thing is not what the rest of us mean by "physical objects," and that Mill was being irrational in rejecting the obvious in deference to the relatively tenuous (cf. my remarks in §I on the 'adequacy condition' on theories of confirmation).

(2) The conditions of applicability of the methods are never literally satisfied, at least vis-a-vis the methods of agreement and difference. It is never the case that a set of phenomena have only one antecedent in common, and it is never the case that a single antecedent to an effect is removed without any change in any of the other antecedents. I don't doubt that in practice it is often possible to rule out a priori most of the irrelevant factors, so that we may say we have only one plausibly causally relevant common factor (for the method of agreement), or only one plausibly relevant difference (for the method of difference), but Mill fails to give any theoretical explanation of how we can know which of the infinitely many conditions present in any given situation should be ignored and which should be tested or controlled. If we further factor in, against Mill's naive view, the possibility of unobservable conditions, it becomes not merely practically but in principle impossible to follow the strictures required by Mill's methods.

Since the strict application of the Methods is impossible, perhaps Mill would say we should only approximately apply them, or perhaps they require some supplement. But in the former case, it is not clear our conclusions would still be rationally justified; whereas in the latter, the nature and basis of the required supplementary principles remains unidentified.

One fall-back for empiricists is to claim that in a given inductive inference, what sorts of factors are likely to be causally relevant is itself previously determined through other inductions. For instance, when doing a physics experiment, I don't bother to control for the day of the week because I have previous experience that the day of the week doesn't affect the outcomes of physics experiments. But note that this sort of reply would only work if we assumed -- contrary to fact -- that there are some basic inductions that we can start from which do conform perfectly to the strictures of Mill's methods.

(3) Neither Mill nor the logic books that paraphrase his methods say very much in the way of their justification. Perhaps their logic is considered self-evident. Well, if we assume that a given phenomenon under investigation must have a cause, that the cause must be among a certain set of observable factors, and that it must be a single thing (and not a disjunctive or conjunctive property), then I think the logic of the methods is evident; in fact, in that case, the cause could be deduced in accordance with the methods. But that is a lot to assume. If, then, we do not assume all of this but seek to establish it by means of the Methods, how can we explain the rationality of our mode of inference? From the fact that we have always observed E to be preceded by C, plus even the fact that we have observed E not to occur in some otherwise identical circumstances when C was absent, does it follow that there is a necessary connection between C and E? I take it it does not deductively follow, since the combination of such observations as just described with a case of C existing unaccompanied by E in the future or in an unobserved part of the world is consistent, and would falsify the 'necessary connection' thesis. So there are ways the world might be such that the sort of observations premised in Mill's methods (esp., for the use of the methods of agreement and difference) occur but there is no causation of the sort the methods would conclude; and this brings us back to our question of section I, of why we should reject the (empirically equivalent) hypothesis of that sort of world, in favor of that of the causally connected world.

Mill himself is remarkably unsatisfying on this matter. He interprets causation, essentially, as mere constant conjunction,(14) and he says that all induction is ultimately founded upon the principle of the uniformity of nature.(15) I can see why he would say this. Given the uniformity of nature, we can infer from the fact that C has always been observed to be followed by E that C is always followed by E and always will be; and if that is all there is to causation, we can infer that C causes E. But when it comes to the justification of the Uniformity Principle, Mill naively pronounces that it is itself justified by inductive argument, thus running afoul of Hume's problem (recall section IA above).

D. The Bayesian Approach

According to the Bayesian theory of confirmation, the most impressive theory so far, inductive reasoning is reasoning in accordance with the probability calculus.(16) The mathematical theory of probability contains four axioms:

(1) P(a) ≥ 0, for any proposition, a.
(2) P(t) = 1, if t is a tautology.
(3) P(a v b) = P(a) + P(b), if a and b are mutually exclusive.
(4) P(a & b) = P(a) × P(b|a).

"P(ba)" is read "the probability of b, given a" and means the probability that b would be true if a were true.

From (4) follows Bayes' Theorem by three trivial steps:

P(e & h) = P(h & e).
P(e) × P(h|e) = P(h) × P(e|h).
P(h|e) = [P(h) × P(e|h)] / P(e).

Suppose that h is some hypothesis and e is some piece of evidence. The Bayesians interpret P(h|e) as the degree of belief you may assign to h upon discovering that e is true, and they claim that inductive inference in general is explained by the application of Bayes' Theorem. P(e) and P(h) are supposed to be the initial credence you give to e and h, respectively, before discovering e. e confirms h on the Bayesian account just in case P(h|e) > P(h).
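To fix ideas, here is a minimal sketch of these definitions in Python. The particular numbers at the bottom are illustrative assumptions of mine, not anything the Bayesian account itself supplies.

```python
def posterior(p_h, p_e_given_h, p_e):
    """Bayes' Theorem: P(h|e) = P(h) * P(e|h) / P(e)."""
    return p_h * p_e_given_h / p_e

def confirms(p_h, p_e_given_h, p_e):
    """On the Bayesian account, e confirms h just in case P(h|e) > P(h)."""
    return posterior(p_h, p_e_given_h, p_e) > p_h

# Assumed numbers for illustration: h entails e (so P(e|h) = 1) and e is
# initially improbable (P(e) = 0.05), the best case mentioned just below.
p_h, p_e_given_h, p_e = 0.01, 1.0, 0.05
print(posterior(p_h, p_e_given_h, p_e))  # 0.2, i.e. twenty times P(h)
print(confirms(p_h, p_e_given_h, p_e))   # True
```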

From the Theorem it is evident that h will be best confirmed when P(e|h) is high and P(e) is low -- that is, when h strongly predicts something which is initially improbable. In the best case, when h entails e, P(e|h) = 1.

I will accept most of the Bayesians' assumptions, but will still have ample criticisms to make. In particular, I accept that:

(1) Beliefs come in degrees. It is a matter of introspection that one believes some things more strongly than others, and we describe the more strongly believed propositions as more 'probable'. It is a matter for stipulation that degrees of belief be measured from 0 to 1, with 0 being strongest disbelief and 1 being the strongest possible belief.

(2) We should try to conform our degrees of belief to the probability calculus. Although some object that this is psychologically implausible, the objection is no more damaging than the parallel objection against canons of deductive logic would be.

I would not accept this on 'Dutch Book' grounds, since "fair betting odds" is a highly dubious definition of degrees of belief -- surely degrees of belief are not dependent on an institution as culturally specific as gambling. Instead, I accept the axioms of probability as self-evident.

(3) "P(h|e)" and the rest of the terms appearing in Bayes' Theorem are correctly interpreted by the Bayesians, as referring to the rational degree of belief in h after e has been discovered, &c.

However, the Bayesian theory suffers from a few unfortunate difficulties.

Objections:

(1) Unknown conditional probabilities: Bayesians implicitly assume that there is some way of determining the quantities on the right hand-side of Bayes' Theorem independently of P(h|e). Unfortunately, this is rarely unproblematic. Let's start by considering P(e|h):

Suppose I am drawing some marbles out of a large bag, and the first five I take out are black, and now I want to know what is the probability of the next one also being black. This conforms to one typical form of induction. Bayesian confirmation theory tells me that in the circumstance described I should assign as the probability of the sixth marble being black: the initial probability of its being black (P(h)), times the probability of the first five marbles being black given that the sixth one is black (P(e|h)), divided by the initial probability of the first five being black (P(e)). I think it's fair to say that this isn't terribly helpful. How in the world do I determine the probability of the first five marbles being black given that the sixth one is black? Not, presumably, by another application of Bayes' Theorem, which would just lead us in a circle. I could try changing the hypothesis so that it entails the evidence -- in which case I will know P(e|h) = 1 -- e.g., I could ask, what is the probability that the first six marbles will have been black, given that the first five were (N.B. merely conjoining the old hypothesis with the evidence)? This is the only situation (i.e. entailment) in which P(e|h) is unproblematic, but this subterfuge only hands us over to the problem of

(2) Unknown prior probabilities: The traditional arrangement of Bayes' Theorem, in which P(h|e) appears by itself on the left and everything else on the right, gives the subliminal impression that the quantities mentioned on the right hand side of the equation are previously given data by means of which we can proceed to calculate P(h|e), and the term "prior probabilities" (for unconditional probabilities) strengthens this impression. But this is, as a matter of fact, a substantial assumption which calls out for justification. For many important cases, including my "six black marbles" example, I contend that one or both of the 'prior' probabilities that we are supposed to plug into Bayes' Theorem is epistemically posterior to the conditional probabilities. This is because the way people estimate (or calculate) the probabilities of conjunctive facts or of sequences of events is by means of multiplying prior and conditional probabilities of individual facts or events. For instance:

Suppose I want to calculate the probability of being dealt a royal flush. This is not something that is just immediately given for me. Rather, I will take the probability of first being dealt a 10, jack, queen, king, or ace (=20/52); multiply it by the probability of next getting one of the remaining four types of card, of the same suit, given the first card having been as described (=4/51); then multiply that by the probability of receiving one of the three remaining cards required, given the first two cards (=3/50); and so on.(17) Similarly, in my "black marbles" example, if I have some antecedent estimate of the probability of pulling out a black marble, how I assign the probability of pulling five black marbles or of pulling six black marbles will depend on this plus my estimates of how previous drawings of black marbles affect the probabilities of future drawings of black marbles. If, for example, my belief is that the drawings are probabilistically independent (like coin flips) then I will just multiply individual probabilities to get probabilities of sequences. If I believe in induction, then I will give (uniform) sequences somewhat higher probabilities. If I subscribe to the gambler's fallacy, I will give (uniform) sequences lower probabilities.
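To make the direction of computation vivid, here is a sketch in Python of both calculations, using exact fractions. The royal flush numbers are those just given; the marble numbers (a 1/2 chance per draw, treated as independent) are assumptions adopted purely for illustration.

```python
from fractions import Fraction as F

# Royal flush: the probability of the five-card sequence is assembled by
# multiplying conditional probabilities, exactly as described above.
royal_flush = F(20, 52) * F(4, 51) * F(3, 50) * F(2, 49) * F(1, 48)
print(royal_flush)  # 1/649740

# Black marbles: assume a 1/2 chance of black on each draw and treat the
# draws as probabilistically independent (like coin flips). The sequence
# 'priors' are then derived from these conditional judgements.
p_five_black = F(1, 2) ** 5  # P(first five black)
p_six_black = F(1, 2) ** 6   # P(first six black)
print(p_six_black / p_five_black)  # 1/2: P(6th black | first five black)
```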

The Bayesian seeks to reverse this process: he wants me to have the 'prior' probability of drawing six black marbles initially given and use it to determine the probability of the sixth marble being black given that the first five were.

Notice how easy computing confirmation would be if I were not right about this: if unconditional probabilities were generally independently known (prior to conditional probabilities), then we could forget about Bayes' Theorem and just calculate P(h|e) directly from axiom 4 above: i.e., for any h and e, I could just take my prior P(h & e) and divide by my prior P(e), and this gives me the desired conditional probability. Alas, this does not work since I have no independent way of knowing P(h & e).

(3) The return of grue: Like Hempel, the Bayesians are committed to saying that observations of green emeralds (before the year 2000) confirm "All emeralds are grue." This is because, from Bayes' Theorem, P(h|e) > P(h) if and only if P(e|h) > P(e), and the Bayesians interpret confirmation in terms of the relation P(h|e) > P(h).

Take h to be "All emeralds are grue" and e to be "All emeralds observed before the year 2000 are green". Presumably, the initial probability of e is less than one. Since h entails e, P(e|h) = 1. Therefore, P(e|h) > P(e); so e confirms h.
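The point can be checked with made-up numbers (nothing in the argument depends on the particular values assumed here):

```python
# Assumed for illustration: P(h) = 0.001 for "All emeralds are grue",
# P(e) = 0.5 for "All emeralds observed before 2000 are green".
# Since h entails e, P(e|h) = 1.
p_h, p_e = 0.001, 0.5
p_h_given_e = p_h * 1.0 / p_e
print(p_h_given_e)        # 0.002
print(p_h_given_e > p_h)  # True: e confirms h on the Bayesian account
```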

The problem is that although in this case e does raise the probability of h, the discovery of e does not raise the probability of the excess content of h above e, or of the remaining, unobserved instances of h. Intuitively, h makes a claim about emeralds observed before the year 2000; it also makes a claim about emeralds only observed after 2000, and a claim about emeralds never observed. e confirms the first part of h, but it does not confirm (in fact, disconfirms) the remainder of h.

For Bayesianism to solve the problem of induction, it would have to show that for typical inductive arguments, the evidence confirms the excess content of the hypothesis above the observations.

This notion of "excess content" is worth looking into. Karl Popper and David Miller claim(18) that for any h and e, the excess content of h above e is equal to (h v ¬e), for reasons which are unnecessary to examine since they're wrong. Intuitively, the excess content of (A & B) above A should be B, not ((A & B) v ¬A).

My proposal is this:

(a) If h can be written as a conjunction (e & x), where e and x are propositions about different things (separate and distinct classes), then the excess content of h above e is x;

(b) If e entails h, then the excess content of h above e is nothing (or a tautology);

(c) Otherwise, the excess content of h above e is h.

Thus, for example, "All ravens are black" can be stated, "All observed ravens are black, and all unobserved ravens are black." Since observed ravens and unobserved ravens are disjoint classes, the excess content of "All ravens are black" above "All observed ravens are black" is "All unobserved ravens are black."

Now I realize there may sometimes be some difficulty determining what objects a proposition is 'about'. Roughly, a proposition is about the smallest set of objects whose (non-tautological) properties and relations are part of its truth-conditions. For instance, "Bill or Ted stole the lamppost" will count as being about Bill, Ted, and the lamppost. Attempting to precisify the definitions of "about" and "excess content" further so as to avoid all clever logical tricks that might come up would probably be a fruitless project.

I think we have enough now to see the Bayesian's problem. Suppose that x is the excess content of h above e, where e is some observational evidence. In that case, since x and e are about separate and distinct things, neither entails the other, so we can't get an easy determination of either conditional probability that way. Even if we assume that we know P(x) and P(e), we won't know (or at least, Bayesian theory doesn't tell us) P(e|x), so we can't calculate P(x|e) (at least, not by means of Bayes' Theorem). So there is no Bayesian principle for telling us whether e confirms the excess content of h above e.

(4) The return of empirical equivalence: In section I, I claimed that the problem of induction requires us to justify the preference of one theory over other, empirically equivalent theories. Suppose that h and h' are two empirically equivalent theories and e is the evidence presented for one of them. Then all the Bayesians have to tell us is that

P(h|e) = [P(h) × P(e|h)] / P(e), and

P(h'|e) = [P(h') × P(e|h')] / P(e).

But since h and h' are empirically equivalent, P(e|h) = P(e|h'), and we can see that the right hand sides of each of the above equations are identical except for the prior probabilities of the hypotheses, P(h) and P(h'). In other words, on Bayesian principles, P(h|e) > P(h'|e) only if P(h) > P(h'), for h empirically equivalent to h'. So all now turns on what the Bayesian can tell us about these prior probabilities.
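A quick numerical sketch (all values assumed for illustration) makes the dependence plain: for empirically equivalent hypotheses, the ratio of the posteriors simply equals the ratio of the priors.

```python
# h and h' are empirically equivalent: P(e|h) = P(e|h') = 0.9 (assumed).
p_e_given_hypothesis = 0.9
p_e = 0.3
for p_prior in (0.2, 0.05):  # assumed priors P(h) and P(h')
    print(p_prior * p_e_given_hypothesis / p_e)
# Prints 0.6 and 0.15: a 4:1 ratio of posteriors, exactly the 4:1 ratio
# of priors. The evidence does nothing to discriminate between them.
```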

Unfortunately, the so-called 'personalist Bayesians' maintain that there are no rational constraints on the distribution of prior probabilities, other than the very weak constraints provided by the axioms of the probability calculus mentioned above. I look upon this line as rather an abandonment of the project than a solution to the problem of induction. If it were correct, a skeptic could always avoid the conclusion of an inductive argument by assigning a low prior probability to the conjunction of the premise with the conclusion but a high prior to the conjunction of the premise with the negation of the conclusion. Moreover, instead of implying that all different distributions of priors are equally rational, why wouldn't the contention that there are no further logical constraints on degrees of belief besides the austere probability calculus rather imply that all different distributions of priors are equally irrational? That is, if there is no logical principle determining what the a priori probability of something is, then instead of just making up a number between 0 and 1 arbitrarily, hadn't I better suspend judgement on the matter entirely, refusing to assign any degree of probability? The Bayesians' oft-relied upon claims about the accumulation of evidence tending to wash out differences in priors are irrelevant here; the problem is that I cannot rationally start with any prior distribution, so I can never get started on the process of confirmation and conditionalization.

More objectively-minded Bayesians are apt to invoke such principles as the Principle of Indifference for the assignment of priors. This principle says that when we don't have any evidence favoring any of a set of alternatives over the others, we should count each alternative equally likely. Now I don't want to criticize this approach too much, since I think the Principle true and plan to use it later on. In the event that we have no reason to prefer A over B nor B over A, it seems reasonable that we would not expect A any more than B, nor B any more than A. Expecting each of them equally is the only natural attitude; otherwise we should be called upon to explain our preference for one of them. (Note that the Principle of Indifference is thus seen to make sense only when probabilities are construed as degrees of expectation or belief.) However, the principle is subject to different, incompatible uses, as is well-known, some of which have the effect of rendering induction impossible. At first glance, it seems the most natural way of applying the principle would be to say, "I shall be indifferent with respect to all the different possible distributions of properties across space-time." But this sort of initial probability distribution would preclude inductive learning.

To explain what I mean by this, let's consider a simplistic example. Suppose that (out of idleness) I am planning on flipping a coin a hundred times, and I previously have no knowledge about how coin-flipping works except that it always results in either heads or tails. Then I know there are 2^100 possible outcomes of my 'experiment'. To apply the principle of indifference, it seems, I would give each of these possibilities an equal chance of occurring. But if I do this, I will be unable to partake of any inductive learning. For suppose that after 99 flips I have gotten 99 consecutive heads. My expectation of the next flip being heads given this 'evidence' will equal my initial probability of getting 100 heads (=1/2^100) divided by my initial probability of getting the first 99 heads (=1/2^99), which is 1/2, the same as it was before.

Bayesians who favor induction would be likely to recommend a different way of using the principle of indifference, for instance: there are 100 possible proportions of heads that might result from my experiment (i.e., 1/100 of the flips are heads, or 2/100, or ... or 100/100). If I assign an equal probability to each of these possibilities, then I will be able to learn from experience in accordance with Bayes' Theorem. Unfortunately, Bayesians have been unable to explain the rationale for using the principle of indifference in this sort of way (or some way that allows for inductive learning), as opposed to the application first described.
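Here is a sketch of both applications of the principle, in exact arithmetic. The second version spreads indifference over the 101 possible numbers of heads (0 through 100, a slight variant of the 100 proportions mentioned above), with all arrangements assumed equally likely given the total.

```python
from fractions import Fraction as F
from math import comb

N = 100

# Way 1: indifference over all 2^100 sequences. The conditional
# probability of heads on flip 100, given 99 straight heads, is:
print(F(1, 2**N) / F(1, 2**(N - 1)))  # 1/2: no inductive learning

# Way 2: indifference over the number of heads k = 0..100, with each of
# the C(100, k) arrangements equally likely given k.
def p_first_m_heads(m):
    """P(first m flips are all heads) under the proportion-uniform prior."""
    return sum(F(1, N + 1) * F(comb(N - m, k - m), comb(N, k))
               for k in range(m, N + 1))

print(p_first_m_heads(100) / p_first_m_heads(99))  # 100/101: learning
```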

Conclusion

I think we can see that none of the above theories of inductive reasoning comes close to addressing our problem. As a test to see whether any description of induction is accurate, we can ask, "Does it imply that observations of green emeralds before 2000 confirm that all emeralds are grue?" Each of the theories considered above, except for Goodman's, clearly implies that green emeralds do confirm "all emeralds are grue": (a) because the development of "all emeralds are grue" for emeralds before 2000 is that they are green; (b) by the method of agreement, correlations between being an emerald and being grue confirm that the property of being an emerald causes the property of being grue; and (c) because the probability of emeralds observed before 2000 being green given that all emeralds are grue is greater than its probability otherwise. Only Goodman, because he made up "grue", has avoided this consequence; but his theory, we found, gives no logical justification for induction.

The only one of the above theories that would plausibly justify induction is the Bayesian theory, which can appeal to self-evident axioms of probability. Unfortunately, it is in need of considerable help with the problem of determining 'prior' probabilities as well as some conditional probabilities. Nevertheless, it will be useful in justifying the following account of induction.

III. Theory

In general, inductive reasoning arises out of the fact that certain observations and sets of observations that we make, as we feel, require explanation, and a hypothesis is justified on some evidence when it produces a plausible explanation for what would otherwise be some surprising observations.

Before I explain and justify that contention, I cannot help making some preliminary remarks about philosophical methodology, since it is primarily misguided methodology and general epistemology that, I think, have made and will continue to make it difficult for people to accept the true account of induction. These remarks must be kept brief but will serve to explain some aspects of my approach.

A. Remarks on philosophical method

Remark 1: No a priori commitment to empiricism.

Many philosophers of science have taken it as axiomatic that all knowledge must be based purely on experience. As explained in §I above, it is this idea that creates a problem of induction. In fact, the arguments there elaborated (in IA and IC) are logically unexceptionable; if we accept the premises about the possible nature of knowledge, then we are forced into skepticism. I therefore have no intention of accepting the impossible mission of reconciling empiricist scruples with the possibility of inductive knowledge. Nor do I perceive the justification for the great credence that empiricism has received during the last two centuries. Inasmuch as ethics, metaphysics, mathematics, theoretical science, all inductive conclusions, and even knowledge of the external world, are problems for empiricists to account for -- inasmuch, I say, as all of the interesting kinds of knowledge we find ourselves to have are difficult or impossible to explain on strict empiricist assumptions -- hadn't we better acknowledge that this theory (which cannot on principle appeal to any a priori justification and also lacks any empirical support) is wrong?

Remark 2: Philosophy for people, not computers.

I am not going to attempt to provide an algorithm for inductive reasoning, such that we could program a computer to evaluate our scientific theories according to it, and retire the scientists. I seriously doubt that that sort of thing can be done. Inductive inferences are made by conscious beings; these beings are capable of, and do, exercise a certain amount of judgement; sometimes their judgements conflict, sometimes they are uncertain, and sometimes they are wrong. I will not regard it as a serious fault of my view of induction that I fail to prevent these things from happening. I seek to describe inductive inference as practiced by people, and to explain its justification; I do not seek to generate epistemic utopia.

As a corollary of this, we should not insist that philosophical concepts possess a degree of precision comparable to mathematical concepts, or that philosophical principles should be immune from misinterpretation. Vague principles are just as capable of being true as precise ones.

Remark 3: Against 'formal' criteria.

Another thing I will not try to do is to give a purely syntactic or 'formal' (whatever that means) criterion of confirmation. As desirable as it might be (to mathematicians and computer programmers) to have such a thing, I am afraid, alas, that the making of an inductive inference, unsurprisingly enough, requires an actual understanding of its meaning. We ought not to refuse to recognize facts because they are inconvenient for certain of our epistemological ambitions.

Remark 4: On behalf of metaphysics.

Against the strict verificationist criterion of meaning imposed by positivists earlier this century to rule out 'metaphysics', I propose the comparatively liberal criteria of meaning according to which any of the following is sufficient for the concept of X to be meaningful:

a) if we find it (by introspection) possible to think about X and believe things about X;

b) if we are able to classify some things as X and others as non-X;

c) if the things we call X have something in common because of which we call them X;

d) if X's are different from non-X's.

These criteria are the only apology I give for the metaphysics which is to follow. I do not subscribe to the theory that philosophy must make do with as few distinct concepts, or as few non-empirical concepts, as possible; instead, I think the possibility of distinguishing correct and incorrect uses of a concept is sufficient to establish a prima facie justification for its invocation whenever useful.

Remark 5: A liberal helping of 'the light of nature'.

I will have to make frequent appeal to self-evident facts. Although some will want to criticize this sort of thing and demand 'proofs', I blame the nature of my subject matter. I cannot help the fact that philosophy is based on intuition.

Although the above terse remarks are unlikely to alter the opinions of any positivists or empiricists, they will perhaps at least forestall unnecessary, predictable objections.

B. Inference to the best explanation explained

Inference to the best explanation involves these three elements: that the evidence premised is initially improbable (or unexpected), that the hypothesis (the conclusion of the argument) would explain it,(19) and that the hypothesis is the 'best' of the potential explanations. When, and only when, these conditions are satisfied, I say, there is a valid inductive argument from evidence to hypothesis. We have, then, three notions to clarify.

1. Explanation

There are at least three senses in which one thing can be said to imply another, for instance, X implies Y: (1) meaning that Y is a precondition on X; (2) meaning that Y is a consequence deriving from X; or (3) meaning merely that whenever X is true Y is also true. As examples of the first two: "The King of France is bald" implies that he exists, in the sense that his existence is a precondition on his baldness; and the axioms of geometry imply the theorems, in the sense that the theorems follow from them, and the axioms explain why the theorems are true.(20) It is the second relation that we seek between theories and empirical evidence, namely that of the observed facts being based upon the facts described by the theory. Although I do not think it is possible to strictly define this relation, it is possible to give substantial necessary conditions on it, namely: h explains e only if

1. h and e are true;
2. h is 'prior' to e; and
3. P(e|h) >> P(e|¬h).

I won't claim these conditions are sufficient, since I am sure philosophers will press counter-examples.(21) Fortunately, these conditions will prove enough to make out the justification of an inference to the best explanation.

The first condition requires no comment. The purpose of the second condition is to capture the asymmetry of the explanatory relation, something which is ignored by the standard deductive-nomological model of explanation. It is meant to invoke a metaphysical concept of priority, which is different from (but encompasses) temporal priority. Metaphysical priority is the relation of one fact's being of a more basic, or more fundamental, level than another. Although the explicit identification of this concept is likely to make it a target for philosophical suspicion, it is commonly implicitly invoked in other contexts. Besides being invoked, as I think, in inductive reasoning, metaphysical priority is invoked in reductionist theses -- physicalism, for instance, claims that laws and facts of physics are the most basic, or metaphysically prior, facts -- and in any claim that one thing depends on another. There is, again, no definition of priority (other than what has just been said), but several instances of it can be named so as to give the reader the idea by letting him see the similarity in these instances:

1. Events earlier in time, of course, are prior to later events;
2. Properties of and relations between parts (at a given time) are generally prior to properties of a whole (at that time);
3. Categorical properties are prior to dispositional, or causal, properties (again, at a single time);
4. Necessities are prior to contingencies (another way to think of this is that necessities are considered as if they had the earliest time-index, because they are 'eternal truths');
5. Descriptive properties of things are prior to their value properties;
6. Existence of substances or objects is prior to the existence (or instantiation) of properties, relations, or events.

The above six statements are not stipulations but synthetic judgements that I make, based on the sense in each case that the thing which I call 'prior' cannot depend on the 'posterior' thing, but the posterior thing might depend on the prior.

The third condition on explanation is stated as it is because if h only slightly raised the probability of e, we would typically still not consider ourselves to have an explanation. It is also there because, for pragmatic reasons, we only want to spend our time considering theories that are strongly confirmed, not ones that are only slightly confirmed. So we want P(e|h) to be much greater than P(e).
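
The two formulations come to roughly the same thing, as a step worth making explicit. By the total-probability expansion,

    P(e) = P(e|h)P(h) + P(e|¬h)P(¬h),

so if h is not already close to certain -- as it will not be, in advance of the evidence -- then P(e) is approximately P(e|¬h), and P(e|h) >> P(e|¬h) holds just in case P(e|h) >> P(e).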

2. Probabilities

Let's assume in the present context that probabilities are rational degrees of belief. Contra Carnap, the assignment of prior probabilities must violate empiricist strictures against synthetic, a priori assumptions. For suppose we assign a low (but positive) prior to some proposition, x, as we will sometimes have to if we can distinguish sufficiently many alternatives: then the denial of x has a high subjective probability, or, in other words, we believe it strongly. But ¬x is synthetic, because by assumption its probability is less than one and every analytic truth's probability is one. The belief in ¬x is also, so far, a priori, for to be based on experience is for a belief to receive its high subjective probability by conditionalization on some observations; whereas we are assuming ¬x has a high absolutely prior probability. Let us also suppose that ¬x is true. Then, assuming that we are justified in our assignment of prior probabilities, our belief that ¬x constitutes an item of synthetic, a priori knowledge.

Carnap maintained that logical probability judgements are all analytic.(22) If he meant that the propositions to which we give these probabilities are all analytic, then of course that is not true. If, on the other hand, he meant that statements of the form "The probability of h is P" are always analytic, he was also wrong. In fact, such statements are always synthetic, since they say that we are entitled to repose a certain degree of belief in h. 'Analytic' truths are supposed to be ones in which the concept (or the definition) of the subject contains the concept of the predicate, but that is not the case here, since one could very well understand the proposition h without even having the concept of probability, let alone knowing that h had a probability of P.

The first rule of the assignment of a priori probabilities is to respect your intuitions. I do not know how to specify this 'rule' in any more detail, or even whether it can be further specified, but I will give some examples to illustrate my meaning: it is evident a priori that forces acting on bodies cause them to move. In contrast, it is initially implausible that a force acting on a body causes a different body to change its color. It is initially plausible that conscious beings desire pleasure, but improbable that they desire pain. It is initially improbable that an event can directly cause a spatially distant event; that motions and forces can cause states of consciousness; or that "grue" can cause anything. Contra Hume (et al.), I think these examples and others show that we can and do have intuitions, completely a priori, about what sorts of things can cause what other sorts of things. The strengths of these intuitions are reflected in initial probabilities. The reason it may be impossible to systematize these sorts of judgements is that they are not determined 'formally' but depend upon a grasp of the specific natures of the objects of thought. Knowing that forces probably cause motions, not changes of color, depends on understanding exactly what "force", "motion", and "color" mean.

But since we do not have theoretical intuitions about everything, the second rule of the assignment of prior probabilities is to apply the principle of indifference to the possible states of affairs at the most basic (metaphysically prior) level of reality, when one has no reason for preferring one alternative over any other. For this purpose, alternatives should be as finely individuated as possible (that is, as finely as you, the observer, can discriminate). The purpose of this latter specification is to avoid the sort of inconsistency that could result from attempting to apply the principle of indifference simultaneously to different partitions of the same space of possibilities.

Third, in the absence of reasons for holding one thing to affect the probability of another, different events or propositions are assumed to be probabilistically independent. This principle has the same intuitive motivation as the principle of indifference: if we don't have any evidence, nor any a priori reasons either, linking A to B, then why would discovering A change our degree of expectation of B? If we change our degree of belief in B, we shall reasonably be called upon to explain why we did so.
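
A toy case may make the second and third rules, and the inconsistency the fine-grainedness requirement avoids, concrete. Everything in the following sketch -- the two tosses of an unknown coin, the numbers -- is merely illustrative, not part of the theory:

    from fractions import Fraction

    # Coarse partition of two tosses: by the number of heads, {0, 1, 2}.
    coarse = {k: Fraction(1, 3) for k in (0, 1, 2)}

    # Finest discriminable partition: the four particular outcomes.
    outcomes = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
    fine = {o: Fraction(1, 4) for o in outcomes}

    # Indifference applied to both partitions at once is inconsistent:
    print(coarse[1])                               # 1/3 for 'exactly one head'
    print(fine[("H", "T")] + fine[("T", "H")])     # 1/2 for 'exactly one head'

    # Indifference over the finest partition also yields the third rule's
    # default of independence: P(second toss heads | first toss heads) = 1/2.
    p_first_h = fine[("H", "H")] + fine[("H", "T")]
    print(fine[("H", "H")] / p_first_h)            # 1/2

The rule of taking the finest partition settles which of the two conflicting assignments to use; and on that assignment, learning the first toss's outcome leaves the expectation for the second unchanged, just as the third rule requires in the absence of any known link between the tosses.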

Fourth, negative propositions are generally more probable than positive propositions (ceteris paribus). The idea behind this is that a proposition counts as 'positive' only because it singles out one specific alternative out of a wide, possibly indefinite range of possibilities. For instance, "The sky is blue" is a positive statement whereas "The sky is not blue" is negative, just because "blue" is a narrower category (encompassing fewer possibilities) than "non-blue". "The sky is azure" is, similarly, more positive than "The sky is blue." That the presumption (i.e., the greater initial probability) lies with the negative claim then follows from the principle of indifference.

Finally, simple propositions are generally more probable than complex ones. The idea behind this is that in a complicated hypothesis, there is more to go wrong. A proposition gets to be 'complex' because it requires the existence of many entities, properties, or relations. Such propositions get low probabilities because their probabilities are products of the probabilities of their individual ontological commitments.
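
For illustration, with invented numbers: suppose h commits us to three ontologically independent entities, c1, c2, and c3, to each of which, taken alone, the foregoing rules assign a prior probability of 1/10. Then, by the default of independence,

    P(h) ≤ P(c1) × P(c2) × P(c3) = (1/10)^3 = 1/1000,

whereas a rival hypothesis positing only one such entity starts at 1/10.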

3. The best explanation

In general, there are two reasons why h could be a 'better explanation' of e than h′: first, because h is just more initially credible than h′; second, because h predicts e more strongly than h′ does, thus being more of an explanation. Accordingly, the best explanation of e is the one that has the highest product of P(h) times P(e|h).
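
This criterion is just a ranking by posterior probability. For any potential explanation h of the fixed evidence e, Bayes' Theorem gives

    P(h|e) = P(h) × P(e|h) / P(e),

and since P(e) is the same for every candidate explanation of e, the hypothesis with the highest product P(h) × P(e|h) is the hypothesis with the highest P(h|e).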

Notice that from this and the preceding remarks about probabilities, we get an interpretation of Occam's Razor and the preference for simple theories: viz., the Razor enjoins us to pick the simplest and least positive theory we can that would still constitute an explanation of the observed evidence. Introducing new complexities into a theory always lowers its initial probability, so it can only be justified if it improves the explanation, i.e., raises P(e|h), sufficiently.

C. Problems of induction solved

From the preceding analysis of inference to the best explanation, the reader can no doubt see that my justification of induction will be an appeal to Bayes' Theorem. Since on my account there is an inference to the best explanation (from e to h) only when e is initially improbable but h renders e much more probable, Bayes' Theorem directly implies that, when there is an inference to the best explanation, the probability of h given e will be much greater than the initial probability of h. What I have to do now is explain why my account escapes the difficulties I saddled the Bayesians with above in section IID, and how I answer the skeptical arguments rehearsed in section I.
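
Before proceeding, let me spell out the implication just mentioned. In ratio form, Bayes' Theorem gives

    P(h|e) / P(h) = P(e|h) / P(e).

When e is initially improbable (P(e) small) and h renders e highly probable (P(e|h) near 1), the right-hand side is large, and so the evidence multiplies the probability of h by a large factor.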

1. Answer to Hume's argument: This is simple. The premise that inductive inference is based on 'the Uniformity Principle' is false. An inductive inference follows the form of inference to the best explanation, and that is not a premise or a presupposition of the argument; it is just the form of the inference. Furthermore, the knowledge that that form of inference is valid is based on the self-evident (but synthetic) principles of probability rehearsed above.

2. Answer to my argument about empirical equivalence (§IC): The premise that a priori reasons must issue in necessary truths is false; sometimes they only issue in probable truths. And when two empirically equivalent theories potentially explain a phenomenon, we may prefer one of them because of its higher initial probability.

3. Problem of unknown conditional probabilities removed: My extension of the principle of insufficient reason lets us assume probabilistic independence when conditional probabilities (or probabilities of conjunctive facts) are otherwise unknown. This does not, however, destroy the possibility of induction; the invocation of the different 'levels' of reality saves us from that. Probabilistic independence will obtain only at the most fundamental level of things; but indifference among combinations of facts on this level will force us not to be indifferent about combinations of facts on more derivative levels. And the intuitive idea here is that the perceptual observations we make are the 'superficial' level, whereas scientific (and other) theories attempt to get at more basic facts. Because the theories will imply certain things about observations, our probabilities for the theories will (partially) determine our probabilities for observations. For instance, suppose that most possible fundamental theories imply the uniformity of nature in some observable respect or other (I don't know whether this is true); then, since we start with equal probabilities for the theories, or for the different possible combinations of theoretical-level facts, we must give a specially high probability to uniform series of observations -- that is, observations of a certain character must increase the probability of further observations' being similar.
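
A toy model may make this mechanism vivid. Suppose -- purely for illustration; none of the particulars belongs to the theory -- that the fundamental level consists of nine possible laws, each fixing a different propensity of a certain coin to land heads, and that we are indifferent among the laws. The tosses are the derivative, 'observational' level; they are independent given any one law, but not independent unconditionally:

    from fractions import Fraction

    # Nine candidate fundamental 'laws': propensities 1/10, 2/10, ..., 9/10.
    laws = [Fraction(i, 10) for i in range(1, 10)]
    # The second rule: indifference at the fundamental level.
    credence = {w: Fraction(1, len(laws)) for w in laws}

    def p_heads(credence):
        # Probability of heads on the next toss, given the credence over laws.
        return sum(p * w for w, p in credence.items())

    def conditionalize_on_heads(credence):
        # Bayesian updating of the credence over laws upon observing heads.
        z = p_heads(credence)
        return {w: p * w / z for w, p in credence.items()}

    print(p_heads(credence))             # 1/2 before any observation
    for _ in range(10):
        credence = conditionalize_on_heads(credence)
    print(float(p_heads(credence)))      # about 0.86 after ten heads running

A run of heads shifts credence toward the high-propensity laws, and so raises the expectation that the next toss will resemble the ones already observed -- which is just the point made above: indifference and independence at the basic level generate, rather than destroy, inductive correlations at the observational level.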

4. Problem of unknown prior probabilities removed (sort of): Well, at least on my theory prior probabilities aren't completely subjective. Although it is difficult to figure out what they are, we won't be forced to conclude that induction is impossible (as on some interpretations of the principle of indifference), and we don't have the Bayesians' problem of determining probabilities for conjunctive facts or series of events, as just discussed.

Further specification of the means of deciding initial probabilities (especially vis-a-vis my first 'rule' of using our intuitions) is a topic for future research. But even if it proves difficult or impossible to be more specific here, this would certainly not show that the account given thus far is untrue. Rather, it would only indicate to me that, as a matter of fact, there are no precise rules for determining initial probabilities. If that happens to be the way things are, then it will, of course, be fruitless to demand that philosophers provide such a set of rules.

5. Grue hypothesis not confirmed: There is no suggestion that "All emeralds are grue" would, on my theory, be the best explanation -- or any explanation at all -- of emeralds observed before the year 2000 being green. "All emeralds are grue," just by itself, cannot in fact explain emeralds before 2000 being green, because it is not metaphysically prior to the latter. If anything, the hypothesis in this case is posterior to the evidence, on the basis of either temporal or part/whole relations (the color of the pre-2000 emeralds is a part of the hypothesis, and it is temporally prior to the remaining instances of the hypothesis).

But on this showing, no universal proposition can ever explain its instances, because the instances are always prior to the generalization, being, as I claim, parts of it; and this is contrary to a very common conception of explanation. I do think this result is correct though. You can't explain facts by just repeating them, or repeating them with a bunch of other facts added on. You have to cite something different from the explanandum. In cases where a universal proposition appears explanatory, and in which it does get confirmed by its instances, I think the real, suppressed explanans is the existence of some causal connections between properties. For instance, when we think that "all metals expand when heated" both explains and is confirmed by the expansion of various particular metals, what is really explaining and getting confirmed is that the property of being a metal plus heat causes expansion. The fact that all metals expand when heated is just a deduction from this.

"All emeralds are grue" cannot get this kind of confirmation either, unless we consider it antecedently plausible that there could be some sort of causal relation between being an emerald and grue. That there would be a causal connection between emeraldness and green (perhaps mediated by some third factor) is much more plausible, so it would be a better explainer.

6. Empirically equivalent theories not equally confirmed: If e confirms h, and h′ is empirically equivalent to h, there is no suggestion on my theory that h′ has to be equally confirmed, or even confirmed at all; for if h explains e, it doesn't follow that h′ also explains e, even though h′ might predict e. To take an obvious sort of example, suppose that atomic theory explains and predicts several observations of ours. Now suppose some positivist-inspired individual proposes a theory that is just the conjunction of all the observational consequences of atomic theory, with the denial that there really are any atoms superadded. This new, instrumentalist theory will be empirically equivalent to the atomic theory that the rest of us believe. It would not, however, even if it were true, explain any observations, since it is not metaphysically prior to the observations (for the latter are a part of the hypothesis).

Incidentally, I do not really mean to deny that we can confirm the conjunction of all the observational consequences of atomic theory in a sense. I have been focusing on direct confirmation in the preceding discussion of inference to the best explanation. But any consequence of a directly confirmed theory can be said to be indirectly confirmed. So we can confirm the observational consequences of atomic theory indirectly, by first getting an inference to the best explanation directly confirming the atomic theory itself. Hopefully, the ambiguity between confirmation and direct confirmation will cause no serious confusion.

Let's also consider two other problems of confirmation theory to see how they are resolved:

7. The ravens paradox: This paradox is generated from Nicod's criterion plus (in Hempel's terminology) the equivalence condition. Nicod's criterion of confirmation says observation of an A that is B confirms that all A's are B, and the observation of a non-A is irrelevant to (neither confirms nor disconfirms) "All A's are B." The equivalence condition says that evidence that confirms a hypothesis confirms anything logically equivalent to the hypothesis. The resulting paradox is that from Nicod's criterion, the observation of a non-black non-raven confirms "All non-black things are non-ravens;" so by the equivalence condition it also confirms "All ravens are black;" but also by Nicod's criterion the observation of a non-raven should be irrelevant to "All ravens are black." Hempel's answer to the paradox, as noted above, was to accept that observations of non-ravens confirm hypotheses about ravens.

On my account it is not at all clear that observation of an A that is B would generally confirm that all A's are B. It could only do so via some plausible explanatory hypothesis, such as that A's cause B's. But it seems very unlikely that non-blackness causes non-ravenhood.

That it is the first part of Nicod's criterion -- the part about an A that is B confirming all A's are B's -- that should be rejected is also supported by the consideration of other examples, such as the grue case.

8. The everything-confirms-everything problem: This paradox derives from two plausible-sounding conditions on confirmation: (a) prediction condition: if we verify a prediction of a hypothesis, then we have confirmed the hypothesis; (b) consequence condition: if we confirm a hypothesis, then we thereby confirm any consequence of the hypothesis. These conditions imply that anything confirms anything, for suppose A and B are any two propositions. A confirms (A & B) by the prediction condition; and thereby it confirms B by the consequence condition, since B is a consequence of (A & B).

But in my view, (A & B) does not explain A, once again because it lacks metaphysical priority; and therefore A will not confirm (A & B) (at least not directly). I intend for my theory to satisfy the consequence condition but not the prediction condition: once you have confirmed a hypothesis by inference to the best explanation, you are licensed in deducing consequences from it, which are indirectly confirmed. Thus (A & B) might, in some cases, get indirect confirmation from A, provided it was a consequence of some explanatory hypothesis; but it would not get direct confirmation, and usually would not get any confirmation.

D. Examples supporting the theory

The best way of seeing whether a theory of confirmation -- or, indeed, any sort of philosophical theory -- is correct is to consider typical examples. Every theory of confirmation that I know of but one has clear counter-examples against it, some of which we have already described. But I believe that my theory will suffer from no counter-examples. Every inference that intuitively we would take to be a valid induction can be seen as an inference to the best explanation, and every inference to the best explanation will intuitively seem to be a valid induction.

1. Induction by simple enumeration: Consider the type of inference instanced when we observe a large number of ravens, all black, and conclude that all ravens are black. Here our implicit hypothesis is a vague causal hypothesis: either being a raven causes one to be black, or blackness causes ravenhood, or some third factor causes both ravenhood and blackness. In this case we would probably go with the last hypothesis: not the first, because ravens come into the world already black; and not the second, for the same reason, plus the fact that there are black non-ravens. But constant conjunctions can be explained by any of these sorts of hypotheses.

The evidence in this case, that all observed ravens are black, is improbable, at least given the denial of the hypothesis, since (by my third rule of probability assignments) the default assumption would be that the probability of a series of ravens all being black is the probability of a single raven's being black raised to the power of the number of ravens observed -- a small number if the series is long.
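
To illustrate with an invented figure: if, given ¬h, each observed raven independently has a probability of 1/2 of being black, then

    P(twenty observed ravens all black | ¬h) = (1/2)^20 ≈ 0.000001,

whereas the causal hypothesis renders the same series certain, or nearly so.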

The hypothesis is metaphysically prior to the evidence in virtue of my fourth criterion of priority (see §III.B.1), viz. that necessities are prior to contingencies, because causal relations between properties or event-types (N.B. not particulars) are necessary. The probability of all observed ravens being black given the hypothesis is one. And the initial probability of the hypothesis should be fairly good since it's intuitively plausible that a common factor (which we now know must be in raven genes) both produces ravens and determines them to be black.

Not meaning to take Goodman's line on this, I suspect that the 'projectible' predicates generally correspond to the properties that normal cognizers actually recognize, because we only choose to name a property if it seems intuitively credible that it is the sort of thing that can partake of causal relations. For this reason, gruesome problems do not arise in a practical context, because people's intuitions tend to agree.

2. Mill's methods in general: What I called "induction by simple enumeration" corresponds to the method of agreement, but the rest of Mill's methods are equally easily explicable. The hypothesis that C is a necessary cause of E would explain why, when we remove C, E does not occur; and the evidence here is improbable (at least given the denial of the hypothesis) on the basis of the a priori principles that every event (or at least almost every event) has a cause and that like causes tend to have like effects. Similarly, the hypothesis that C causes E would explain a correlation between variations in C and variations in E. Finally, the method of residues isn't really a form of induction at all. It just expresses an intuitive principle stating that the composition of causes generally compounds their effects. An application of the method of residues is an attempted deduction, or a deduction relative to this implicit assumption.

3. Theoretical entities: When we explain macroscopic properties of substances, such as solidity or liquidity, in terms of forces acting between atoms, or again explain these in terms of the composition of the atoms (e.g., electron structure), the theoretical entities invoked count as potential explainers in virtue of part-whole priority. Sometimes theories invoke historical priority, as in sociobiological explanations of properties of organisms in terms of their evolution. An example of the priority of existential claims would be the positing of fields to explain forces between objects; here the existence of a field (as some kind of entity) is prior to the (causal) relations between objects by my sixth criterion of priority. Finally, the priority of categorical properties over causal properties is invoked in such examples as the positing of wave properties of light to account for interference patterns produced by interacting light rays -- or, of course, any postulation of properties of an object to explain its behavior.

In each of these cases, the evidence explained is intuitively taken to confirm the hypothesis that explains it; and in all of these cases, we can see that my view upholds the validity of such an inference.

4. Existential inferences: Recall that I harried Hempel with the fact that he couldn't explain the inference from the observation of any object (such as a black hole) to the existence of similar objects (objection 3, §IIB). Can I explain this sort of inference? Well, this is a difficult one, but I think the basis for the inference is essentially this: that if there were only one black hole in the universe, it would be very unlikely that we should have seen it, given that we have not searched most of the universe and that we would give the single black hole an equal probability of being located anywhere. Thus the hypothesis that there are many black holes greatly increases the prior probability of our observing one.
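
A rough formalization, in which f and N are symbols of my own introduced only for illustration: let f be the fraction of the universe we have effectively searched, and suppose each black hole is located uniformly and independently of the others (per the second and third rules of probability assignment). Then

    P(we have seen one | exactly one exists) = f,
    P(we have seen at least one | N exist) = 1 - (1 - f)^N,

and for small f the latter grows roughly in proportion to N. Our having observed a black hole is thus far more probable on the many-holes hypothesis than on the one-hole hypothesis.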

A second basis for this sort of inference is that the observation of some type of object demonstrates the physical possibility of such objects, which we might have doubted before; and further, it suggests the existence of mechanisms that produce such objects. For instance, observation of a unicorn would definitely confirm that there are more unicorns because it would suggest a whole species, as a 'unicorn-producing mechanism' (similar to the means whereby other organisms are known to be produced). The observation of a black hole (so far as one can observe it) would suggest the existence of a mechanism whereby such an object can be produced, and that would increase the likelihood that the same mechanism produced other black holes.

5. Other minds confirmed: I can't take the time to go into the problem of other minds in much detail, but we can see how the existence and contents of other people's consciousness are confirmed by an inference to the best explanation of their behavior. The intrinsic, categorical properties of people are metaphysically prior to their dispositional properties, and hence to their behavior. We may assume that the hypothesis of people's having certain mental states (desire for ice cream, fear of heights, &c.) predicts that they will behave in certain ways that we can observe each other to do. And these forms of behavior are initially improbable because they are very complex and orderly; the chances against a random assemblage of physical parts without consciousness turning out to be able to play chess, for instance, are astronomical. Thus mental states explain behavior, and are thereby confirmed.

6. The second law of thermodynamics: The standard explanation of the entropy law illustrates my conception of probability: Imagine there's a box filled with some gas, and the temperature is higher on the left side than on the right.

The material is free to flow around the box, from one side to another. In the kinetic theory of heat, the molecules on the left are moving faster, on average, than the ones on the right. Periodically, due to random motion, a molecule will pass over the imaginary line down the middle of the box. If it is passing from the left to the right then the chances are that it will be a fast molecule (since most of the molecules on the left are fast). Similarly, mostly slow molecules will cross from the right side. This will remain true until the temperatures on both sides are equalized.

This example supports my theory on two counts. First, in respect of the correct method of determining prior probabilities. The physicists are saying, in essence: allowing for the higher average temperature on one side of the box, assign a uniform probability distribution over the trajectories of the molecules on each side, assuming the trajectories of distinct molecules to be probabilistically independent; and from that, generate the probable macroscopic state of the gas at a later time. Hence, the principle of indifference is applied to the fundamental state of the gas -- that is, to the properties of its parts at an earlier time. To find the probability of diffusion occurring we do not, for example, apply the principle of indifference with respect to the possible macroscopic states, evidently because we implicitly recognize the microstates to be more basic.
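
A crude simulation exhibits the method in miniature. Everything in it -- the speeds, the counts, the exchange rule -- is an invented toy rather than the physicists' actual dynamics; it merely applies indifference and independence at the level of the parts, and reads the macroscopic result off afterwards:

    import random

    random.seed(0)
    # Toy microstate: 50 'fast' molecules on the hot left side, 50 'slow'
    # molecules on the cold right side.
    left = [2.0] * 50
    right = [1.0] * 50

    def temperature(side):
        # Kinetic-theory stand-in: temperature as mean squared speed.
        return sum(v * v for v in side) / len(side)

    print(temperature(left), temperature(right))   # 4.0 and 1.0 initially

    # Random motion: at each step, a molecule chosen indifferently from each
    # side crosses the midline; each crossing is independent of the rest.
    for _ in range(10000):
        i = random.randrange(len(left))
        j = random.randrange(len(right))
        left[i], right[j] = right[j], left[i]

    print(temperature(left), temperature(right))   # both near 2.5

The uniform random choice of which molecule crosses is the principle of indifference applied to the parts; the near-equal final temperatures are the derived, probable macrostate. At no point is any indifference assumption made about the macrostates themselves.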

Second, this explanation of diffusion is taken as a confirmation of the kinetic theory of heat, viz. that substances are composed of molecules and that temperature is (roughly) a measure of their motion. I don't know exactly what the probability of diffusion occurring would be given the denial of the kinetic theory of heat; but even if it is moderate, the kinetic theory renders diffusion all but certain. This is a pretty clear case of an inference to the best explanation, where the explanatory hypothesis states more fundamental facts that greatly increase the probability of the explanandum.

This concludes our consideration of examples. Let us turn at last to

IV. Summary & conclusion

I think I have just given the rough solution to the problem of induction. My reasons for thinking my description of induction correct are, first, that it seems intuitively plausible to me; second, that it does validate induction, whereas it appears otherwise very difficult to explain why induction is justified; and third, that consideration of several typical examples of inductive inferences reveals them to accord with my account, whereas it reveals clear counter-examples to every other view of induction I know of.

Certain sorts of reader may be inclined to attack the account on the grounds, first, of its unabashed metaphysics, especially in the appeals to metaphysical priority and necessary connections between properties (causation); second, relatedly, of the explicit a priorism invoked in the assignment of 'prior probabilities'; and third, of the lack of precision and detail in my 'rules' for determining confirmation and my explanations of concepts. I am well aware of these objections, but I do not regard them as serious. The first two were discussed briefly above (§IIIA), and they strike me as mere prejudices. The third objection, while quite true, and perhaps the most important one to discuss, does not strike me as weighing against the truth of my account, but only as perhaps recommending to me further study and elucidation, where possible. And as I have suggested previously, though it is a matter beyond the scope of this paper to discuss in any detail, in the all too likely event that it is not in fact possible to specify the principles of induction with precision, or to analyze certain of the concepts it must make implicit use of into anything simpler, the demand that we do so is a permanent roadblock to philosophical knowledge, and will forever be used to discredit true theories and support false ones. The only way I know of to satisfy the demand for precise analysis is to say something false, to which I think the history of confirmation theory, as of most of philosophy, bears adequate witness.

At no point in the elaboration of my account have I been driven to say anything that could be described as counter-intuitive, and this should be noted well, for it is not an easy state of affairs to achieve. Positivists typically wind up denying we can confirm the existence of any theoretical entities; Hempel tells us that observations of anything not a raven automatically confirm anything about ravens; Mill and other empiricists compose physical objects of 'sense data'; Bayesians must say green emeralds confirm that all emeralds are grue; standard accounts of explanation allow later events to explain earlier ones; and so on. I do not know what these philosophers think justifies their theories. For my part, I do not know how a philosopher can hope to do better than to accord with all of our intuitions.


Notes

1. In "Of Miracles", An Enquiry Concerning Human Understanding, §X.

2. Enquiry Concerning Human Understanding, §IV.

3. "The New Riddle of Induction," Fact, Fiction, and Forecast (Cambridge, Mass.: Harvard University Press, 1955), chapter III.

4. By "de facto theoretical" I mean not actually observationally confirmed, though perhaps observable in principle. Otherwise, the inductive argument would be superfluous.

5. At least. If h' = (e & ¬h) then h' accommodates the data perfectly.

6. "The New Riddle of Induction," op. cit.

7. Exactly how we interpret "opposite" here, whether as contrary or contradictory, or whatever, is immaterial for the purposes of the illustration.

8. In "Studies in the Logic of Confirmation," Mind, vol. 54, 1945, pp. 1-26 and 97-121.

9. For an object to 'occupy' a region of space, the object must fill up the space without extending outside it.

10. The universal quantifier in the second hypothesis ranges over regions of space. To let all quantifiers range over both physical objects and regions of space, we could say, "(x) if x is a region of space, then (∃y) O(y,x)." The argument will be unchanged.

11. In a footnote Hempel says, without explanation, that his account of confirmation applies to a language in which "the use of ... the identity sign is not permitted," but I am disinclined to let him off the hook of this counter-example for so ad hoc and arbitrary a reason. We might try rephrasing the hypothesis that there are other black holes as, "there are black holes that don't have this spatio-temporal position."

12. A System of Logic, Book III, chapter VIII, section 7.

13. An Examination of Sir William Hamilton's Philosophy, chapter XI.

14. A System of Logic, Book III, chapter V, §2.

15. Book III, chapter III, §1.

16. For introduction to Bayesian confirmation theory, see Howson and Urbach, Scientific Reasoning: The Bayesian Approach (La Salle, Ill.: Open Court, 1989) and John Earman, Bayes or Bust? (Cambridge, Mass.: MIT Press, 1992).

17. I get 1 in 649,740 on this basis. The reader can check my calculations.

18. "A Proof of the Impossibility of Inductive Probability", Nature 302 (1983), pp. 687-688.

19. "Would explain" because to say that the hypothesis does explain the evidence would already be to imply its truth.

20. The fact that different axiomatizations of geometry are possible that would yield the same set of logical implications, understanding implication in the third sense, does not alter the point. If one chooses less evident mathematical facts to derive the more basic principles, then one will just be giving an inferior axiomatization.

21. The main sort would be the case where A causes both B and C, and B is prior to C. In this case observation of B would probabilify C, but B wouldn't be the explanation of C; A would. I wouldn't know how to modify my conditions to rule out this sort of counter-example.

22. See Rudolf Carnap, An Introduction to the Philosophy of Science.