Why did my forecast do that?

Forecasters frequently want to know why their forecast had so much (or so little) effect. For example, Topic Leader jessiet recently asked:

I made a prediction just now of 10% and the new probability came down to 10%. That seems weird- that my one vote would count more than all past predictions? I assume it’s not related to the fact that I was the question author?

The quick answer is that she used Power mode, which is our market interface, and that’s how markets work: your estimate becomes the new consensus. Sound crazy? Note that markets beat out most other methods for the past three years of live geopolitical forecasting on the IARPA ACE competition. For two years, we ran one of those markets, before we switched to Science & Technology. So how can this possibly work? Read on for (a) How it works, (b) Why you should start with Safe mode, (c) The scoring rule underneath, and (d) An actual example.

It works because in a market:

(a) changes cost you points, so you can only do so many,
(b) big changes cost you progressively more, and
(c) the farther and more often you move the forecast towards the right answer, the more you gain.

In particular, our market has a proper scoring rule, so you expect to gain the most by moving it to your best estimate of the real chances. (Note: This property has been worked out rigorously by our economist — it’s not necessarily true of all markets out there.)
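To see what "proper" means in practice, here is a minimal sketch. It assumes a simple Yes/No question and borrows the logarithmic rule given in the Scoring Rule section. The function names `expected_gain` and `best_target` are ours, made up for illustration, and are not part of SciCast.

```python
import math

SCALE = 100.0  # points per doubling, matching the formula in the Scoring Rule section


def expected_gain(belief: float, old: float, new: float) -> float:
    """Expected points for moving a Yes/No forecast from `old` to `new`,
    assuming the true chance of "Yes" is `belief`."""
    gain_if_yes = SCALE * math.log2(new / old)
    gain_if_no = SCALE * math.log2((1 - new) / (1 - old))
    return belief * gain_if_yes + (1 - belief) * gain_if_no


def best_target(belief: float, old: float) -> float:
    """Try every whole-percent target and keep the one with the highest expected gain."""
    candidates = [pct / 100 for pct in range(1, 100)]
    return max(candidates, key=lambda new: expected_gain(belief, old, new))


# If you really believe 10%, compare a few places you could move a 23% forecast.
for target in (0.05, 0.10, 0.15, 0.23):
    gain = expected_gain(0.10, 0.23, target)
    print(f"move to {target:.0%}: expected gain {gain:.1f}")

print("best whole-percent target:", best_target(belief=0.10, old=0.23))
```

Under these assumptions the search lands on your own belief: overshooting or undershooting both lower the expected gain.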

At this point, domain experts often say, “Fine, but I don’t want to play your game — I just want to say what I know.” That’s why we offer Safe Mode, and suggest people start there.

Safe Mode

In Safe mode, you declare your belief, and the market picks a safe trade that moves the estimate towards your belief. This can be done optimally (given some assumptions), but right now SciCast simply moves the estimate at most halfway towards your belief, and never spends more than 1% of your available points. You start with 5,000 points, so your first trade can spend up to 50. As the graph below shows, you can make at least 200 trades this way before your 1% allowance becomes too small to move anything in Safe mode. (And even then, you would still have about 500 points available for direct Power mode edits.)
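The Safe mode rule can be sketched in a few lines. This is only an illustration, not SciCast's actual code: `safe_mode_target` is a hypothetical name, the question is assumed to be Yes/No, and the "pull back until the worst-case loss fits the budget" step is our guess at how the 1% limit could be enforced with the logarithmic rule from the Scoring Rule section.

```python
SCALE = 100.0  # points per doubling, as in the Scoring Rule section


def safe_mode_target(current: float, goal: float, available: float,
                     fraction: float = 0.01) -> float:
    """New chance after a hypothetical Safe mode trade.

    Moves halfway from `current` towards `goal`, then pulls back if the
    worst-case loss would exceed `fraction` of the `available` points.
    """
    budget = fraction * available
    halfway = current + 0.5 * (goal - current)
    shrink = 2 ** (-budget / SCALE)  # largest ratio change the budget can pay for
    if halfway < current:
        # Lowering: the loss comes if "Yes" and equals SCALE * log2(current / new).
        return max(halfway, current * shrink)
    # Raising: the loss comes if "No" and equals SCALE * log2((1 - current) / (1 - new)).
    return min(halfway, 1 - (1 - current) * shrink)


# A fresh account (5,000 points) asking to go from 26% towards 20%.
print(f"{safe_mode_target(0.26, 0.20, available=5000):.3f}")

# A nearly tapped-out account (500 points) asking to go from 20% towards 10%.
print(f"{safe_mode_target(0.20, 0.10, available=500):.3f}")

# Worst case for the budget: every trade spends its full 1% and nothing comes back.
available = 5000.0
for _ in range(200):
    available -= 0.01 * available
print(f"available after 200 full-size Safe trades: {available:.0f}")
```

The last loop is deliberately pessimistic. Real Safe mode trades often spend less than the cap, and points return when questions resolve.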

Scoring Rule

You can skip this section, but some people want to know exactly how the costs and gains are determined. We use Robin Hanson’s well-established Logarithmic Market Scoring Rule (LMSR). Your gain or loss is determined only by your edit and the actual outcome. If you raised the chance of the outcome that actually happens, you gain. If you reduced it, you lose. The amount is determined by:

gain = 100 * log2(new_chance / old_chance)

That’s it. But as noted, it’s one of the few proper scoring rules out there, and the logarithm uniquely has some desirable independence properties that become important for combinatorial forecasts. (LMSR can use any base and any coefficient. We choose base-2 and 100 so that every doubling will gain 100 points, if correct.) A trivial but nice property is that reversing your edit will return you to where you were before.
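For readers who prefer code, the formula translates directly. This is a small sketch, with `lmsr_gain` as our own illustrative name rather than anything from the SciCast codebase.

```python
import math


def lmsr_gain(old_chance: float, new_chance: float, scale: float = 100.0) -> float:
    """Points gained (negative means lost) on an outcome whose chance you moved
    from `old_chance` to `new_chance`, if that outcome actually happens."""
    return scale * math.log2(new_chance / old_chance)


doubling = lmsr_gain(0.10, 0.20)  # one doubling of the chance
reverse = lmsr_gain(0.20, 0.10)   # the opposite edit
print(f"doubling: {doubling:+.1f}, reverse: {reverse:+.1f}, net: {doubling + reverse:+.1f}")

# Any base works if the coefficient is adjusted: natural log with 100 / ln(2).
same_rule = (100.0 / math.log(2)) * math.log(0.20 / 0.10)
print(f"natural-log version of the doubling: {same_rule:+.1f}")
```

The doubling and its reversal cancel, which is the "return you to where you were" property mentioned above.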

Note that it is a ratio rule. Your gain (if correct) is the same whether you raise 1% to 2%, or 40% to 80%. But your costs are not, because in the first case you have reduced the opposite chances from 99% to 98% with a ratio about 1, while in the second you have reduced them from 60% to 20% with a ratio of 1/3. The first edit will cost less than 2 points. The second, almost 160!

Cost is a bowl-shaped function: moving toward the edges gets progressively steeper!
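The two edits above, and the bowl shape, can be checked with a short sketch. It assumes a Yes/No question, and `edit_outcomes` is a name we made up for this post.

```python
import math


def edit_outcomes(old: float, new: float, scale: float = 100.0):
    """Return (points if "Yes", points if "No") for moving a Yes/No forecast."""
    if_yes = scale * math.log2(new / old)
    if_no = scale * math.log2((1 - new) / (1 - old))
    return if_yes, if_no


# Same ratio, same gain if right, very different cost if wrong.
for old, new in [(0.01, 0.02), (0.40, 0.80)]:
    if_yes, if_no = edit_outcomes(old, new)
    print(f"{old:.0%} -> {new:.0%}: gain if right {if_yes:.1f}, cost if wrong {-if_no:.1f}")

# The "bowl": the same one-point step costs more the closer you get to the edge.
for old in (0.50, 0.90, 0.98):
    _, if_no = edit_outcomes(old, old + 0.01)
    print(f"{old:.0%} -> {old + 0.01:.0%}: cost if wrong {-if_no:.1f}")
```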

Also let me reiterate: your gain or loss is locked in when you make the trade. What other people do after that is irrelevant. Of course, if you make lots of edits on the same question, you will have to add up all the possible gains and losses to determine your position, but each edit is a binding contract.

Of course, until the question resolves, we don’t know what the real outcome will be. So when you make the trade, we reduce your available points by the possible loss. On a multiple-choice question, it’s determined by the worst-case loss. If you have been increasing the chance of only one option, then your worst case is any other option. But if you have been reducing the chances of various options, then your worst-case outcome is the one you have most reduced, ratio-wise.
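Here is a rough sketch of that bookkeeping for a multiple-choice question. The names `position` and `points_held_back` are hypothetical, and the example numbers are invented. In the first example we also assume that raising one option shrinks the others in proportion.

```python
import math


def position(edits, scale: float = 100.0):
    """Add up one forecaster's edits on a multiple-choice question.

    `edits` is a list of (old, new) pairs, each a list of chances per option.
    Returns the total points the forecaster would receive under each option.
    """
    n_options = len(edits[0][0])
    return [
        sum(scale * math.log2(new[i] / old[i]) for old, new in edits)
        for i in range(n_options)
    ]


def points_held_back(edits) -> float:
    """Worst-case loss across options (0 if no option loses points)."""
    return max(0.0, -min(position(edits)))


# Raise option A only; B and C shrink in proportion (an assumption of this sketch).
raise_a = [([0.50, 0.30, 0.20], [0.70, 0.18, 0.12])]
# Lower B a lot and C a little; A absorbs the difference.
lower_b_c = [([0.50, 0.30, 0.20], [0.75, 0.10, 0.15])]

for name, edits in [("raise A", raise_a), ("lower B and C", lower_b_c)]:
    totals = [round(points, 1) for points in position(edits)]
    print(f"{name}: per-option totals {totals}, held back {points_held_back(edits):.1f}")
```

In the first case B and C tie for worst, because both were cut by the same ratio. In the second, B is the worst case because it was reduced the most, ratio-wise.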

Example

Let’s take a look at a piece of the Trade History for a question on 23andMe. (Link to Trade History here.)

Reading from the bottom up, we can see and infer that:

ejh444 used Safe mode to declare “Unlikely.” Since it was already in the “Unlikely” bins, Safe mode would have asked whether ejh444 wanted it lower or higher. We infer ejh444 said “lower.” Safe mode moved the chances halfway from 26% to the bin boundary at 20%, leaving the forecast at 23%. Then,
jessiet used Power mode to move it to 10%.
1. Cost = 100*log2(.10/.23) ≈ -120, a loss of about 120 points, invested now and deducted if in fact the question resolves “Yes.”
2. Potential gain = 100*log2(.90/.77) = 22.5 points, if in fact the question resolves “No.”
bw used Power mode to put it back to 20%
1. Gain of 100 points if true, because log2(2) = 1
2. Cost of 17 points. Moving towards the center is “downhill.”
dilettante tried a Safe mode trade, but apparently 1% of dilettante’s available points is not enough to do anything.
dilettante switched to Power mode and reduced it by 2 percentage points, from 20% to 18%.
1. This would have cost 15 points. Moving it to 19% would have cost about 7.4.
2. We can infer from Safe mode’s failure that dilettante had fewer than 740 points available.
Miku tried to raise it a lot, but 1% of Miku’s available points is too few even to nudge it downhill by 1%.
1. That would cost less than 2 points, so Miku has fewer than 200 available.
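The arithmetic in this walk-through can be replayed with a short script. It is a sketch under stated assumptions: the 23% starting point for jessiet is the halfway point between 26% and 20%, the 18% end point for dilettante is inferred from "reduced it 2%", and `edit_outcomes` is our own helper name.

```python
import math

SCALE = 100.0  # points per doubling


def edit_outcomes(old: float, new: float):
    """Return (points if "Yes", points if "No") for moving a Yes/No forecast."""
    if_yes = SCALE * math.log2(new / old)
    if_no = SCALE * math.log2((1 - new) / (1 - old))
    return if_yes, if_no


# (forecaster, old chance, new chance); 0.23 and 0.18 are inferred, see above.
trades = [
    ("ejh444", 0.26, 0.23),
    ("jessiet", 0.23, 0.10),
    ("bw", 0.10, 0.20),
    ("dilettante", 0.20, 0.18),
]

results = {}
for who, old, new in trades:
    results[who] = edit_outcomes(old, new)
    if_yes, if_no = results[who]
    print(f"{who:>10}: {old:.0%} -> {new:.0%}  if Yes {if_yes:+7.1f}  if No {if_no:+7.1f}")

# The smaller move dilettante could not afford in Safe mode.
cost_to_19 = -edit_outcomes(0.20, 0.19)[0]
print(f"20% -> 19% risks {cost_to_19:.1f} points, which is 1% of {cost_to_19 * 100:.0f} points")
```

Negative numbers are the points at risk and positive numbers are the potential gains, so each row should line up with the costs and gains listed above.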

The Leaderboard

“Wait! The Leaderboard says dilettante has thousands of points!”

Yes. The leaderboard is the expected value of dilettante’s whole portfolio, including investments. Top scorers are usually heavily invested — that’s how they have grown their portfolios. Most of dilettante’s points are unavailable because they are already invested. To make a big new investment, he or she would have to wait for open questions to resolve, or cash in previous investments.

One can “cash in” by reversing or moderating previous edits. But that is a topic for another time.

The Official SciCast Blog

A crowdsourced science & technology forecasting project from George Mason University.
