Mathematical Details

T-W-D

T = how many games you've submitted a tip for
W = how many games you've tipped correctly
D = how many games you've tipped 0.5 for

Boldness

If we tip probability p_i on the outcome of game i and r_i is the match result (r_i = 1 for a win, 0 for a loss, 1/2 for a draw), then your boldness over all the games tipped so far is:

    B = \sum_i (p_i - r_i) \log_2 ( p_i / (1 - p_i) )

The boldness is the result of taking the difference between the expected score and the score actually obtained: writing s_i = 1 + r_i \log_2 p_i + (1 - r_i) \log_2 (1 - p_i) for the bits scored on game i, and E[s_i] for the score expected if your own probability p_i is right, the above equation can be rewritten as:

    B = \sum_i ( E[s_i] - s_i )

Thus a tipper is considered bold if awarded fewer bits than they expected, whereas a conservative tipper obtains a score higher than expected. A positive boldness indicates you are too bold (suggesting you should tone your probabilities down towards 0.5), whereas a negative boldness suggests you should be a little more daring.
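
As a concrete illustration, here is a minimal Python sketch of the boldness calculation above; the function name and the (p, r) input format are illustrative choices, not part of the competition software:

    import math

    def boldness(tips):
        """Sum of (expected score - actual score) over tipped games.

        tips: iterable of (p, r) pairs, where 0 < p < 1 is the tipped
        probability and r is the result (1.0 win, 0.0 loss, 0.5 draw).
        """
        # (p - r) * log2(p / (1 - p)) is the per-game gap between the
        # score you expected and the score you actually received.
        return sum((p - r) * math.log2(p / (1 - p)) for p, r in tips)

    # Backing teams at 0.9 and losing one game in three leaves you
    # with a positive boldness, i.e. too bold:
    print(boldness([(0.9, 1.0), (0.9, 1.0), (0.9, 0.0)]))  # approx. 2.22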

Calibration

By Kevin Korb   Email: korb@csse.monash.edu.au

Calibration of probability judgment refers to how well one's probabilistic predictions match up with the results. If you are perfectly calibrated, then when you predict an outcome with probability p that outcome will, over the long run, occur with frequency p.

Psychologists have studied the (in)ability of humans to reason with probabilities and have discovered that, by and large, we tend to be overconfident: i.e., when we put the probability at 0.5, the results tend to justify that estimate, but when we put the probability at 0.9 (or 0.1), the frequency of the predicted outcome is markedly lower (or higher) than that. (See Lichtenstein, Fischhoff & Phillips [1977], Calibration of probabilities, in Jungermann & de Zeeuw (eds.), Decision Making and Change in Human Affairs, D. Reidel.) To be sure, expertise tends to diminish the overconfidence effect, although not in all domains (cf. Garb [1989], Clinical judgment, clinical training and professional experience, Psychological Bulletin, 105, 387-396).

The reward function in probabilistic footy tipping is clearly a function of two main inputs: your knowledge of the game and your knowledge of the limits of your knowledge of the game. Someone who knows AFL very well may be able to predict the winners of matches with high reliability. Without knowing just what that high reliability is, however, the tipper is likely to make those predictions with too much confidence. The result in info-theoretic tipping may be disastrous, allowing a fairly ignorant, but well-calibrated, tipster to beat a fairly knowledgeable, but uncalibrated, one.

The calibration value now reported on the footy tipping pages is a crude, but simple, measure:

    C = (1/n) \sum_{i=1}^{n} | o_i - p_i |

where n is the number of predictions thus far, o_i is 1 for a win and 0 for a loss on the i-th prediction, and p_i is the i-th probability value. This value will be 0 for perfect predictors (no longer possible, since extreme probabilities are now disallowed) and about 1/2 for perfectly ignorant predictors who know their ignorance. Values above 1/2 therefore indicate poor calibration.
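
In Python this measure is nearly a one-liner; the sketch below assumes the mean-absolute-difference form given above, with illustrative names:

    def calibration(outcomes, probs):
        """Crude calibration: mean absolute gap between the outcomes
        (1 for a win, 0 for a loss) and the tipped probabilities."""
        return sum(abs(o - p) for o, p in zip(outcomes, probs)) / len(probs)

    # A perfectly ignorant tipper who knows it sits at exactly 1/2:
    print(calibration([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # 0.5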

This calibration measure is symmetric with respect to over- and underconfidence, i.e., it will not tell you which of the two is your problem. Overconfidence itself can be directly measured by a straightforward modification. Let O be the calibration measured only over games where o_i = 0 and let U be the calibration measured only over games where o_i = 1. Then overconfidence = O - U. A positive value indicates overconfidence; a negative value indicates timidity.
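
Continuing the sketch above, the overconfidence split might be coded like this (assuming at least one win and one loss have been recorded, so neither subset is empty):

    def overconfidence(outcomes, probs):
        """O - U: calibration over losses minus calibration over wins.
        Positive means overconfident, negative means timid."""
        losses = [p for o, p in zip(outcomes, probs) if o == 0]
        wins = [p for o, p in zip(outcomes, probs) if o == 1]
        O = sum(abs(0 - p) for p in losses) / len(losses)
        U = sum(abs(1 - p) for p in wins) / len(wins)
        return O - U

    # Tipping 0.9 every week but winning only half the games:
    print(overconfidence([1, 0, 1, 0], [0.9, 0.9, 0.9, 0.9]))  # 0.9 - 0.1 = 0.8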

Other, more exact, measures can be devised, but these give a pretty good indication of one's calibration and, together with performance, can readily be used to reverse engineer how knowledgeable tipsters are about the game.