By Kevin Korb   Email: korb@csse.monash.edu.au

Calibration of probability judgment refers to how well one's probabilistic predictions match up with the results. If you are perfectly calibrated, then when you predict an outcome with probability p that outcome will, over the long run, occur with frequency p.
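
As an illustration of checking calibration empirically (the function name and sample data below are purely illustrative), here is a short Python sketch that groups predictions by their stated probability and reports how often the predicted outcome actually occurred in each group; for a well-calibrated tipster the two numbers should roughly agree:

    from collections import defaultdict

    def calibration_table(predictions):
        """Group (probability, outcome) pairs by stated probability,
        rounded to one decimal place, and report the observed frequency
        of the predicted outcome in each group.

        predictions: iterable of (p, o) pairs, where p is the stated
        probability and o is 1 if the predicted outcome occurred, else 0.
        """
        groups = defaultdict(list)
        for p, o in predictions:
            groups[round(p, 1)].append(o)
        # Perfect calibration: observed frequency equals stated probability.
        return {p: sum(os) / len(os) for p, os in sorted(groups.items())}

    # Predictions stated at 0.9 that come true only two times in three
    # show up as an observed frequency well below 0.9.
    sample = [(0.9, 1), (0.9, 1), (0.9, 0), (0.5, 1), (0.5, 0)]
    print(calibration_table(sample))   # {0.5: 0.5, 0.9: 0.666...}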

Psychologists have studied the (in)ability of humans to reason with probabilities and have found that, by and large, we tend to be overconfident: when we put the probability at 0.5, outcomes tend to bear that estimate out, but when we put it at 0.9 (or 0.1), the predicted outcome occurs markedly less (more) often than that. (See Lichtenstein, Fischhoff & Phillips [1977] Calibration of probabilities: The state of the art. In Jungermann & de Zeeuw (eds.), Decision Making and Change in Human Affairs. D. Reidel.) To be sure, expertise tends to diminish the overconfidence effect, although not in all domains (cf. Garb [1989] Clinical judgment, clinical training, and professional experience. Psychological Bulletin, 105, 387-396).

The reward function in probabilistic footy tipping is clearly a function of two main inputs: your knowledge of the game and your knowledge of the limits of your knowledge of the game. Someone who knows AFL very well may be able to predict the winners of matches with high reliability. Without knowing just what that reliability is, however, such a tipster is likely to make those predictions with too much confidence. In info-theoretic tipping the result can be disastrous, allowing a fairly ignorant, but well-calibrated, tipster to beat a fairly knowledgeable, but uncalibrated, one.

The calibration value now reported on the footy tipping pages is a crude, but simple, measure:

    C = (1/n) * SUM_{i=1..n} |o_i - p_i|

where n is the number of predictions made thus far, o_i is 1 for a win and 0 for a loss on the i-th prediction, and p_i is the probability attached to the i-th prediction. This value will be 0 for perfect predictors (no longer possible, since extreme probabilities are now disallowed) and about 1/2 for perfectly ignorant predictors who know their ignorance (i.e., who always tip 0.5). Values above 1/2 therefore indicate poor calibration.
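
For concreteness, here is a minimal Python sketch of this measure as the mean absolute difference between outcomes and stated probabilities (the function name and sample data are illustrative only):

    def calibration(outcomes, probs):
        """Crude calibration measure: the mean absolute difference between
        outcomes and stated probabilities. 0 means perfect prediction,
        about 1/2 corresponds to a tipster who always tips 0.5, and values
        above 1/2 indicate poor calibration."""
        n = len(outcomes)
        return sum(abs(o - p) for o, p in zip(outcomes, probs)) / n

    # o_i = 1 for a win, 0 for a loss; p_i is the stated probability.
    print(calibration([1, 1, 0, 1], [0.8, 0.9, 0.7, 0.85]))   # 0.2875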

This calibration measure is symmetric with respect to over- and underconfidence -- i.e., it will not tell you which of the two you are guilty of. Overconfidence itself can be measured directly by a straightforward modification. Let O = the calibration measure computed only over games where o_i = 0, and let U = the measure computed only over games where o_i = 1. Then overconfidence = O - U. A positive value indicates overconfidence; a negative value indicates timidity.
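
A sketch of this split, building on the illustrative calibration measure above and again with made-up data:

    def overconfidence(outcomes, probs):
        """Overconfidence = O - U, where O is the calibration measure over
        losses (o_i = 0) only and U is the measure over wins (o_i = 1) only.
        Positive values suggest overconfidence, negative values timidity."""
        losses = [p for o, p in zip(outcomes, probs) if o == 0]
        wins = [p for o, p in zip(outcomes, probs) if o == 1]
        O = sum(abs(0 - p) for p in losses) / len(losses)   # mean p_i over losses
        U = sum(abs(1 - p) for p in wins) / len(wins)       # mean (1 - p_i) over wins
        return O - U

    # High stated probabilities on games that were lost push O up,
    # signalling overconfidence.
    print(overconfidence([1, 1, 0, 0], [0.9, 0.8, 0.85, 0.7]))   # 0.625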

Other, more exact, measures can be devised, but these give a pretty good indication of one's calibration and can readily be used to reverse engineer performance and so determine how knowledgeable tipsters are about the game.