February 1, 2012

The Matchup Problem

A matchup occurs when two teams or two players meet in competition. How can you predict the outcome? Outcomes are never certain, so the best you can hope for is to estimate the probability of each possible outcome. For example, if you predict that team A should win with 60% probability and team B with 40% probability, it means that in an ideal world in which team A and team B play a large number of games under the same circumstances, A will win about 60% of the time.

Here's an example. Suppose you can take a series of wagers, each costing 100, that pay 200 if team A wins and 0 if team B wins. Should you take them? Yes, if you can accurately predict that the probability of team A winning is higher than 50%. Say you take this wager 1000 times. Then you put up 100,000. If team A wins exactly 500 times, you are paid 200 each time, a total of 100,000, so you only break even; you come out ahead only if team A wins more than 500 times. Suppose that you estimated the probability to be 60%. If your estimate is accurate, you will be paid 200 about 600 times, a total of 120,000. The law of large numbers guarantees that if your 60% prediction is accurate, then as you make more and more wagers, your return will come closer and closer to 120 for every 100 that you put up. In other words, your profit may go up or down, but on average it will approach 20%.
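
The arithmetic above is an expected-value calculation. Here is a minimal Python sketch of it; the names `STAKE`, `PAYOUT`, `expected_return` and `simulate` are hypothetical, chosen only for illustration. `expected_return` computes the average amount paid back per wager, and `simulate` plays the wager many times with Python's standard `random` module so you can watch the observed return settle toward the expected one.

```python
import random

STAKE = 100   # amount put up on each wager
PAYOUT = 200  # paid back if team A wins; nothing if team B wins

def expected_return(p_win: float) -> float:
    """Average amount paid back per wager when team A wins with probability p_win."""
    return p_win * PAYOUT + (1 - p_win) * 0

def simulate(p_win: float, n_wagers: int, seed: int = 1) -> float:
    """Play n_wagers simulated wagers; return the amount paid back per 100 staked."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(n_wagers) if rng.random() < p_win)
    return 100 * wins * PAYOUT / (STAKE * n_wagers)

print(expected_return(0.5))  # break even: equal to the stake
print(expected_return(0.6))  # a 20% average profit on the stake
for n in (100, 1_000, 100_000):
    print(n, simulate(0.6, n))
```

With a 50% chance the expected return equals the stake; with a 60% chance it is 120 per 100. The simulated returns wander for small numbers of wagers and tighten around 120 as the count grows.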

So much for motivation.

The problem, of course, is how to make a prediction. Suppose team A is a good team that has won 75% of its games against all other opponents, and is playing team B for the first time. If team B is an average team, you might think that team A should have a 75% probability of beating team B. This may or may not be true, but it's a good guess. But what if team B is not average? Say team B has won only 25% of its games. Then the probability of team A winning should be larger than 75%, but how much larger? Or what if team B has won 60% of its games, or 80%? When you are placing bets, it's important to have an accurate estimate of the probability, because in effect you are estimating your average profit.
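
One well-known answer to "how much larger?" is Bill James's log5 formula, which the planned contents below list as "the log5 solution". In odds form it divides team A's odds of winning, p / (1 - p), by team B's, assuming both winning percentages were earned against average opposition. The minimal sketch below illustrates that formula under that assumption; it is not necessarily the method developed in later posts, and the function names are hypothetical.

```python
def odds(p: float) -> float:
    """Probability -> odds in favor, p / (1 - p); assumes 0 < p < 1."""
    return p / (1 - p)

def prob(o: float) -> float:
    """Odds in favor -> probability."""
    return o / (1 + o)

def log5(p_a: float, p_b: float) -> float:
    """Probability that A beats B, given each team's winning percentage
    against average opposition (log5 written in odds form)."""
    return prob(odds(p_a) / odds(p_b))

for p_b in (0.25, 0.5, 0.6, 0.8):
    print(f"A (0.750) vs B ({p_b:.3f}): {log5(0.75, p_b):.3f}")
```

For team A at 75%, this prints about 0.900, 0.750, 0.667 and 0.429 against teams that win 25%, 50%, 60% and 80% of their games. Against an average team the estimate stays at 75%, matching the intuitive guess above. The same formula is often written as (pA - pA·pB) / (pA + pB - 2·pA·pB).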

The matchup problem is not limited to team competition. It also applies to individual competitions, and to competitions with multiple possible outcomes that take place within games. Here I am thinking of the batter and the pitcher. The outcome of a batter-pitcher matchup is, in simplified terms, whether or not the batter gets a hit. The same kind of conundrum arises. Given the batting average that a pitcher allows to the batters he faces, you can predict whether a batter's probability of getting a hit against that pitcher will be greater or less than his own average. But what is the actual estimate of his batting average in that particular situation, facing that particular pitcher? And how does it change depending on whether the batter is batting in his home ballpark, whether there are runners on base, or even whether the outfield is playing the batter in, out, or in a shift?

There are some intuitive guesses for formulas that put information together into a prediction for the probability of a matchup. I struggled with them for a time, and I will describe them in future posts. Then I will show that there is one formula that is mathematically sound (with caveats), and has proven to be accurate in many situations. In computer science, it is known as Naive Bayes. It's the best way to use the available information, in my opinion, and it was the inspiration for AccuBaseball.
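
As a general illustration of naive Bayes for a yes/no outcome such as a hit: if each factor (batter, pitcher, ballpark, and so on) is assumed to give conditionally independent evidence, the combined odds equal a baseline (prior) odds multiplied by each factor's odds ratio relative to that baseline. The sketch below shows that textbook combination rule under that assumption. It is not the AccuBaseball model itself, the function names are hypothetical, and all the averages are made-up numbers.

```python
from math import prod
from typing import Sequence

def odds(p: float) -> float:
    """Probability -> odds in favor, p / (1 - p); assumes 0 < p < 1."""
    return p / (1 - p)

def prob(o: float) -> float:
    """Odds in favor -> probability."""
    return o / (1 + o)

def naive_bayes_prob(league_p: float, factor_ps: Sequence[float]) -> float:
    """Combine per-factor hit probabilities into one estimate.

    league_p  -- baseline probability (the prior), e.g. the league average
    factor_ps -- probability observed under each factor alone, e.g. the
                 batter's average and the average the pitcher allows
    Assumes the factors are conditionally independent given the outcome.
    """
    prior = odds(league_p)
    # Each factor contributes its odds ratio relative to the baseline.
    ratios = [odds(p) / prior for p in factor_ps]
    return prob(prior * prod(ratios))

# Made-up illustrative numbers, not real player data.
LEAGUE_AVG = 0.260
BATTER_AVG = 0.300   # batter against all pitchers
PITCHER_AVG = 0.240  # average the pitcher allows to all batters
PARK_AVG = 0.270     # league average in this ballpark

print(f"batter vs pitcher: {naive_bayes_prob(LEAGUE_AVG, [BATTER_AVG, PITCHER_AVG]):.3f}")
print(f"in this ballpark:  {naive_bayes_prob(LEAGUE_AVG, [BATTER_AVG, PITCHER_AVG, PARK_AVG]):.3f}")
```

A factor equal to the baseline has an odds ratio of 1 and leaves the estimate unchanged, so new factors can be added one at a time. With a baseline of 0.5 and two team factors pA and 1 - pB, the rule reduces to Bill James's log5 formula.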

January 27, 2012

The Rule Book

To play a game you need two players (or one player can play both sides), the game data, and three dice. There is a data sheet for each team, which assigns grades to each player, as well as a season page, which contains tables for stealing, batting, and running. The dice should be three different colors, to make it easier to read out the numbers of a roll. Follow these steps for each batter, keeping score, and repeat until the game is over.

  1. Either player can make substitutions to their lineup, provided the rules of baseball are followed.
  2. The pitcher can intentionally walk the batter.
  3. Runners on base can steal or run on the pitch. When stealing:
    • Roll dice on the stealing table, depending on the stealing and holding grades, to find out whether the runner is safe or out.
  4. The batter can bunt, or swing. When swinging:
    • Tally the batter and pitcher grades: adjust for number of batters faced, park, number of outs, runners on base, and hitting and pitching streaks.
    • Roll dice on the batting table, depending on the tally, to find out what the batter does: strike out (K), ground out (G), fly out (F), walk (W), single (S), double (D), triple (T), or home run (H).
  5. Roll dice on the running table, depending on what the batter did, to find out what runners do.
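
Why three dice of different colors? If the dice are read in a fixed color order, which is an assumption since the rules above do not say how a roll is read, there are 6 × 6 × 6 = 216 equally likely results, so a table can express any probability in steps of 1/216 (about 0.46%). The sketch below converts such a roll to an index from 0 to 215 and looks it up in a hypothetical batting-table row. The row's cutoffs and all function names are invented for illustration; the actual AccuBaseball tables are not given in this post.

```python
import random
from bisect import bisect_right
from collections import Counter
from itertools import product

# Hypothetical batting-table row (invented numbers, not AccuBaseball data):
# each result owns the roll indexes below its cumulative cutoff, out of 216.
ROW = [("K", 40), ("G", 90), ("F", 140), ("W", 160),
       ("S", 195), ("D", 206), ("T", 208), ("H", 216)]
CUTOFFS = [cutoff for _, cutoff in ROW]

def roll(rng: random.Random) -> tuple[int, int, int]:
    """Roll three distinguishable dice, read in a fixed color order."""
    return (rng.randint(1, 6), rng.randint(1, 6), rng.randint(1, 6))

def roll_index(dice: tuple[int, int, int]) -> int:
    """Map an ordered roll to 0..215 by reading the dice as base-6 digits."""
    a, b, c = dice
    return 36 * (a - 1) + 6 * (b - 1) + (c - 1)

def lookup(index: int) -> str:
    """Return the result whose cumulative range contains this index."""
    return ROW[bisect_right(CUTOFFS, index)][0]

def table_counts() -> Counter:
    """Count how many of the 216 equally likely rolls give each result."""
    return Counter(lookup(roll_index(d)) for d in product(range(1, 7), repeat=3))

rng = random.Random(2012)
dice = roll(rng)
print(dice, lookup(roll_index(dice)))
print(table_counts())
```

Because every ordered roll is equally likely, `table_counts()` shows each result occurring exactly as many times as its width in the row; in this made-up row, a strikeout covers 40 of the 216 rolls, about 18.5%.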

January 26, 2012

Introduction and Table of Contents

I am a mathematician and a baseball fan. For several years I thought about how to put these two interests together by looking for ways to use math to better understand baseball. This blog is about a baseball simulation game that I invented, called AccuBaseball. It combines a new mathematical model of baseball with easy-to-follow rules, and plays historical or fictitious games with statistical fidelity.

TABLE OF CONTENTS (planned)

Prologue

  • the matchup problem
  • early experiments with matchup prediction
  • probability in Strat-O-Matic™
  • the log5 solution
  • pitcher vs batter matchups
  • rediscovery of naive Bayes
  • concept of playable simulation
  • credits

Theory

  • naive Bayesian theory
  • calculating odds ratios for pitchers and batters
  • using odds ratios to predict matchups
  • odds ratios for arbitrary factors
  • bin-conditional probabilities
  • possible extensions to multi-dimensional binning
  • steal estimation and counter-factual
  • observations on the evolution of play: dead ball vs modern bb

Construction

  • the Retrosheet project
  • computing odds ratios for batters and pitchers
  • computing odds ratios for outs, bases, park
  • converting an odds ratio to a grade
  • determining rookie pitcher/batter grades
  • scoring and binning a season for batting table
  • computing base running probabilities
  • mapping probability to dice rolls
  • grading stealers and holders
  • determining plausible rosters
  • deriving the management book, stealing, pinch hitting, relieving

Playing the board game

  • the rule book
  • the AB scoring method
  • selecting a roster and lineup
  • scoring a matchup
    • rookies in play
    • adjusting for outs, bases, park, streak
    • adjusting for incomplete fielding
  • rolling on a batting table
    • bunting
    • intentional walk
    • the pitcher at the plate
  • rolling on a running table
    • running after bunt
    • starting the runners
    • implicit errors, e.g. G1
  • rolling on a steal table
    • double steal
  • example play(s)
  • management by the book, stealing, pinch hitting, relieving
  • playing solitaire
  • guidance for fielding new teams or playing teams out of season

Playing the Computer Game