A matchup occurs when two teams or two players meet in competition. How can you predict the outcome? Outcomes are never a certainty, so the best you can hope for is to estimate the probability of each possible outcome. For example, if you predict that team A should win with 60% probability and team B should win with 40% probability, it means that in an ideal world in which team A and team B play a large number of games under the same circumstances, A will win about 60% of the time.
Here's an example. Suppose you are offered a series of wagers, each costing 100, that pay 200 if team A wins and 0 if team B wins. Should you take them? Yes, if you can accurately predict that the probability of team A winning is higher than 50%. Say you took this wager 1000 times. Then you put up 100,000, and if team A wins more than 500 times, you are paid 200 each time, for a total of more than 100,000, the amount you put up. Suppose that you estimated the probability to be 60%. Then if you were accurate, about 600 times you will be paid 200, which is a total of 120,000. The law of large numbers guarantees that if your 60% prediction is accurate, and if you can make more and more wagers, then your average return will come closer and closer to 120 for every 100 that you put up. In other words, your profit may go up or down along the way, but on average it will approach 20% of what you stake.
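To make the arithmetic concrete, here is a minimal Python sketch of the wager described above. The function names, the fixed random seed, and the use of a simple simulation are my own choices for illustration, not part of any particular betting system. It computes the long-run profit rate for a given win probability and then checks it against a simulated run of wagers.

```python
import random


def expected_profit_rate(p_win, stake=100.0, payout=200.0):
    """Long-run profit per unit staked, if p_win is accurate.

    Each wager costs `stake` and pays `payout` on a win, 0 on a loss.
    """
    expected_payout = p_win * payout
    return (expected_payout - stake) / stake


def simulate_profit_rate(p_win, n_wagers, stake=100.0, payout=200.0, seed=0):
    """Simulate n_wagers independent wagers and return the realized profit rate."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = sum(1 for _ in range(n_wagers) if rng.random() < p_win)
    total_staked = stake * n_wagers
    total_paid = payout * wins
    return (total_paid - total_staked) / total_staked


if __name__ == "__main__":
    print("break-even at 50%:", expected_profit_rate(0.5))
    print("expected at 60%:  ", expected_profit_rate(0.6))
    for n in (100, 10_000, 1_000_000):
        print(n, "wagers, realized:", simulate_profit_rate(0.6, n))
```

With only a few wagers the realized profit can land well away from 20%, and it settles toward that figure as the number of wagers grows. That is the sense in which the estimate of the probability is really an estimate of your average profit.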
So much for motivation.
The problem, of course, is how to make a prediction. Suppose team A is a good team that has won 75% of its games against all other opponents, and is playing team B for the first time. If team B is an average team, you might think that team A should have a 75% probability of beating team B. This may or may not be true, but it's a good guess. But what if team B is not average? Say team B has only won 25% of its games. Then the probability of team A winning should be larger than 75%; but how much larger? Or what if team B has won 60% of its games, or 80% of its games? When you are placing bets, it's important to have an accurate estimate of the probability, and to be confident in it, because in effect you are estimating your average profit.
The matchup problem is not limited to team competition. It also applies to individual competitions, and to contests with multiple possible outcomes that take place within games. Here I am thinking of the batter and pitcher. The possible outcomes for a batter-pitcher matchup are, in simplified terms, whether the batter gets a hit or not. The same kind of conundrum arises. Given the batting average of the batters faced by a pitcher, you can predict whether the probability of the batter getting a hit against that pitcher will be higher or lower than his average. But what is the actual estimate of the batting average in that particular situation, facing that particular pitcher? And how does it change depending on whether the batter is batting in his home ballpark or not, or whether there are runners on base, or even whether the outfield is playing the batter in, or out, or in a shift?
There are some intuitive guesses for formulas that put information together into a prediction for the probability of a matchup. I struggled with them for a time, and I will describe them in future posts. Then I will show that there is one formula that is mathematically sound (with caveats), and has proven to be accurate in many situations. In computer science, it is known as Naive Bayes. It's the best way to use the available information, in my opinion, and it was the inspiration for AccuBaseball.
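As a rough preview of what a Naive Bayes style combination can look like, here is a small Python sketch for the team example above. It is only an illustration under assumptions of my own: each team's winning percentage is treated as independent evidence, the baseline for an average team is 50%, and the two records are combined by multiplying odds. The function names are hypothetical, and this is not necessarily the exact formula developed in the later posts.

```python
def win_odds(p):
    """Convert a winning percentage (as a fraction strictly between 0 and 1) to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("winning percentage must be strictly between 0 and 1")
    return p / (1.0 - p)


def matchup_probability(p_a, p_b):
    """Estimated probability that team A beats team B.

    Assumes a 50% baseline (odds of 1) for an average team and treats the
    two teams' records as independent evidence, so the combined odds for A
    are A's odds of winning times B's odds of losing.
    """
    combined_odds = win_odds(p_a) / win_odds(p_b)
    return combined_odds / (1.0 + combined_odds)


if __name__ == "__main__":
    # Team A has won 75% of its games; try opponents of varying strength.
    for p_b in (0.25, 0.50, 0.60, 0.80):
        print(f"B wins {p_b:.0%} of its games ->",
              f"P(A beats B) = {matchup_probability(0.75, p_b):.3f}")
```

Under these assumptions, an average opponent leaves team A at 75%, a weaker opponent pushes the estimate above 75%, and a stronger one pulls it below, which matches the intuition described earlier. Whether the independence assumption is reasonable in a given situation is one of the caveats.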