Thanks for checking this out! I have held off writing an article on algorithm creation for a few reasons. The main reason is that algorithm design is not something that can be discussed in a few paragraphs or a brief conversation over email or Direct Message. Not to mention, in the normal world, there is usually too much going on with sports and my non-sports life to sit down and get into it. However, being in quarantine, there is one thing we all have right now… time! So, there probably will be no better time than the present to put this article together. Because it is a lengthy topic, I will break this article into multiple parts. I don’t know how many parts it will end up being when it is all done. I will simply start discussing and see where we end up. I will, however, keep all parts of the article on this single link & page. It’ll avoid clutter on the website and keep everything in one place. I will also make it a sticky on the Articles page, so it is always right at the top for new visitors to easily locate. Let’s do it!
Part I – Let’s Get Started!
I am regularly asked: how do you create an algorithm? To truly answer that takes an hours-long discussion. Not to mention, I have always avoided discussing my personal algorithms. It’s something I have guarded since I began writing them. If everyone knew my recipes, why would they need me to cook?!? LOL! So, my go-to answer is to tell people to check out Joe Peta’s book “Trading Bases”. It’s a fun read, and in the book Joe Peta discusses how he came about creating his MLB model. The nice part is that Peta takes you step by step through his model’s development. It also helps that his model can be built in Excel, so most people can easily put it together. If you have not read the book, I strongly advise grabbing it. I will be referencing the book as this article develops. I am not sure how long this article will be when it is all done. The good news is, right now, most of us have plenty of time on our hands!
I will move on to Part II of this article on Friday. In the meantime, I will give you some homework to get up to speed on what I will be discussing. If you want, grab Joe Peta’s book and read Chapters 1 & 2. Those chapters will bring you to Chapter 3, titled “Cluster Luck”. Don’t get into Chapter 3 just yet. In Part II (Friday) of my article, I want to break down the topics in Chapters 1 & 2 before moving to the complicated topic of quantifying luck. Then I will let you know what chapters to read over the weekend, and we will reconvene here next Tuesday for Part III! The parts of the article will continue until we have developed our MLB algorithm. If you have questions along the way, please submit them using the form on the Contact page here at the website. Your questions and feedback are always welcome, as they will help me put together content for this article. Happy reading!
Part II – How It Begins
I know some of the mathematical and statistical discussions in “Trading Bases” can be confusing or complex (to avoid typing the title every time, from here on I will just refer to it as the “book”). It is for this reason that I am only taking a couple of chapters at a time for each part of this article. I want to make sure you have plenty of time to read and even re-read the chapters, and that you grasp the key concepts that Peta covers. At the time of this writing, we are all essentially in quarantine. So, all we have is time right now. Therefore, I will take the time to proceed slowly and make sure you understand each step fully.
Every algorithm begins with a theory. The theory is the variable (or variables) that you feel correlates with an event’s outcome. Your goal is for your calculation to correlate more accurately with an event’s outcome than the sportsbook’s calculation does. If you achieve your goal, your algorithm would then be “+EV”, or carry a positive expected value. In his book, Peta describes the “Daily puzzle to be solved in the form of the Vegas line”. I love that sentence because it’s how I’ve always looked at designing an algorithm. I saw it as me cracking the oddsmaker’s code and designing a better code to beat him. I love the competition of it all. Plus, the competition is on my favorite playing field, mathematics!
Since nobody can predict the future, we attempt to attach probabilities to specific outcomes of an event. The tried and true example for any probability discussion is the old-fashioned coin toss. Technically, a coin toss is a 50/50 proposition. When you flip a coin, there is a 50% chance of heads and a 50% chance of tails. However, that’s in a perfect environment, assuming a perfect coin and random environmental variables. What if the person flipping the coin accidentally dropped it before flipping it, and when they dropped it, the edge of the coin got dinged? Will the ding change the way the air hits the coin and, in doing so, favor one result over the other? If it does, the coin toss is no longer 50/50; rather, heads may have a 51.1% chance against tails having a 48.9% chance. It’s not much, but if the sportsbook offered +100 odds on both heads and tails, you would now have a +EV bet by taking heads on this specific coin. In theory, there could be an algorithm for coin toss assessment. You would have to weigh the coin, measure the coin, look for imperfections, assess who is flipping the coin, how they place the coin on their thumb when they flip it, wind, air pressure and gravity conditions, etc. All these factors can be broken down into numbers. The numbers are then weighted based on how heavily each variable correlates to the final result. A calculation, or algorithm, is then put together to dictate the probability of a simple coin toss. The goal of an algorithm is to take all the variables an outcome depends on, find the proper weighting of those variables and use them to attach a probability to an event. I know, it sounds complex… it is! It’s why very few people take the time to try to create an algorithm and why even fewer are successful. However, I put my pants on the same way you do each day. We both breathe the same air, and my college degree was not in mathematics or statistics. So, if I can do it, you can too!
You just have to want it bad enough like I did!
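To make “+EV” concrete, here is a minimal Python sketch of the expected value of a bet at American odds. It uses the dinged-coin numbers above. The function names are illustrative only and are not taken from the book.

```python
def profit_per_unit(american_odds: int) -> float:
    """Profit on a winning 1-unit stake at the given American odds."""
    if american_odds > 0:
        return american_odds / 100
    return 100 / -american_odds


def expected_value(win_prob: float, american_odds: int) -> float:
    """Expected profit per 1-unit stake: win the payout, or lose the stake."""
    return win_prob * profit_per_unit(american_odds) - (1 - win_prob)


# The dinged coin: heads lands 51.1% of the time and both sides are offered at +100.
ev_heads = expected_value(0.511, 100)
ev_tails = expected_value(0.489, 100)
print(f"EV heads: {ev_heads:+.3f} units")  # EV heads: +0.022 units
print(f"EV tails: {ev_tails:+.3f} units")  # EV tails: -0.022 units
```

A 1.1% edge over a fair coin turns into roughly 2.2 cents of expected profit per dollar wagered. That is small, but it is positive, and that is the whole game.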
The core of “Trading Bases” is that Peta wanted to figure out an MLB team’s win percentage so he could bet season win totals. As part of this journey, in Chapter 2, Peta tries to figure out why the Tampa Bay Devil Rays were so strong the previous season. In an effort to break down the Rays’ success, Peta brings up the golden rule of baseball: for every 2 hits, you can expect the team to score 1 run. The actual ratio varies by season but typically falls between 1.95 and 2.05 hits per run scored (i.e. about 2 hits per run). While the golden rule is simply that 2 hits equal 1 run in baseball, Peta correctly surmises that other factors also dictate a team’s success in scoring runs. Peta also looks at walks, on base percentage (“OBP”), slugging percentage (“SLG”) and isolated power (“ISO”). Now we are starting to build a list of variables that drive a team’s run production. Once we know a team’s run production, we can begin to calculate its expected win percentage.
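The golden rule above is simple arithmetic. The following minimal Python sketch shows how much the season-to-season variation in the ratio matters. The 1,400-hit team total is a made-up number for illustration.

```python
def runs_from_hits(hits: float, hits_per_run: float = 2.0) -> float:
    """Estimate runs scored from hits using the 'golden rule' ratio."""
    return hits / hits_per_run


season_hits = 1400  # hypothetical team total, for illustration only
for ratio in (1.95, 2.00, 2.05):
    print(f"{ratio:.2f} hits per run -> about {runs_from_hits(season_hits, ratio):.0f} runs")
```

With 1,400 hits, moving from 1.95 to 2.05 hits per run swings the estimate from about 718 runs down to about 683. That gap of roughly 35 runs is why the other factors (walks, OBP, SLG, ISO) matter.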
Now, I know we want to jump right into the algorithm creation phase, but we will get there. First, I want you to absorb the key point of these two chapters… success in baseball is correlated to a team’s runs scored. A team’s runs scored is correlated to its efficiency at the plate and the resulting hitting. I know it sounds simple, and this part of algorithm development is simple. For example, whether a game goes over the total is a result of how much the teams score. I know, rocket science!! However, we first need to know this initial step so we can move on to thinking about the next step… what will dictate a team’s ability to score in this game? It’s at this second layer that we sink our teeth into the meat of an algorithm’s development. Here’s another example… given the success of the horse racing algorithm, I am getting a lot of questions on how to design one. I would ask you, what dictates which horse wins a specific race? Answer: the horse that gets to the finish line in the shortest amount of time. What is the determining factor in how quickly something gets from one point (the gate) to another (the finish line)? Answer: its speed. So, the core of a horse algorithm is simple: you need to calculate the average expected speed of each horse in the race to figure out the probabilities of who will win. Again, it’s easy to know the final factor that dictates the outcome of the event. Now the question becomes… how do we calculate it (the final factor) with a high degree of confidence?
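One way to turn “expected speed per horse” into win probabilities is a Monte Carlo simulation. The following Python sketch illustrates the idea; it is not my horse racing model. It assumes each horse’s race-day speed is normally distributed around its expected speed, with the fastest simulated horse winning. The horses, speeds and spreads are hypothetical.

```python
import random


def win_probabilities(horses: dict[str, tuple[float, float]],
                      trials: int = 100_000, seed: int = 1) -> dict[str, float]:
    """Estimate each horse's win probability by simulating many races.

    horses maps a name to (expected speed, standard deviation of speed).
    ASSUMPTION: race-day speed is normally distributed around the expectation,
    and the fastest simulated horse reaches the finish line first.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in horses}
    for _ in range(trials):
        speeds = {name: rng.gauss(mu, sd) for name, (mu, sd) in horses.items()}
        wins[max(speeds, key=speeds.get)] += 1
    return {name: count / trials for name, count in wins.items()}


# Hypothetical field: (expected speed in feet per second, standard deviation).
field = {"Horse A": (56.0, 1.0), "Horse B": (55.5, 1.0), "Horse C": (55.0, 1.5)}
for name, p in win_probabilities(field).items():
    print(f"{name}: {p:.1%}")
```

The spread term matters. A less consistent horse (Horse C here) wins more often than its average speed alone would suggest.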
In Part III of this article, I will get into the formula for using MLB stats to predict a team’s expected win probability. We will then compare that probability to the sportsbook’s implied probability based on the lines it is hanging. It’s at this point that we find our edges in the numbers. If our algorithm is better than the sportsbook’s, we can print money!
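Here is a minimal Python sketch of how a sportsbook’s moneyline converts into an implied probability. The -150/+130 line is hypothetical. The two sides add up to more than 100% because the book builds its margin (the vig) into both prices. A model probability therefore has to clear the implied number, not just 50/50.

```python
def implied_probability(american_odds: int) -> float:
    """Win probability implied by an American moneyline (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


# Hypothetical two-way line: favorite -150, underdog +130.
p_fav = implied_probability(-150)
p_dog = implied_probability(130)
print(f"favorite {p_fav:.1%}, underdog {p_dog:.1%}, total {p_fav + p_dog:.1%}")
# favorite 60.0%, underdog 43.5%, total 103.5%
```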
For Part III of the article, I would advise you to read Chapters 3 & 4 of the book. I will then have Part III of this article posted for Monday evening. Also, I will say the same thing at the end of each of these parts… if you have questions or feedback, please send them over to me. The best way is to use the form on the Contact page at the website or to send me an email. Sometimes others have the same question too, and it’ll be good to cover.
Part III – The Pythagorean Theorem
When I left you at the end of Part II, I said to read Chapters 3 & 4. In re-reading the chapters myself, I found they cover a lot of ground. I think for the purposes of Part III of this article, I am just going to cover Chapter 3. Chapter 3 discusses an important concept, and I want to give it the necessary coverage and emphasis. So, let’s get into it…
I love the first part of Chapter 3, where Peta discusses analyzing retail trader information (which he had access to through his work at investment bank UBS). Peta uses the data to time the financial markets and make trades. Essentially, this is what I do with The Sharp Plays. I monitor public and sharp action at two global sportsbooks and use that information in various ways/angles to give myself (and hopefully all of you) an edge. Anyway, I enjoyed this part of the book just because it illustrates another connection between financial market trading and sports market trading. Moving on…
The core of the chapter discusses Bill James and the world of baseball analytics. I won’t get into Bill James a lot for the purposes here, but suffice it to say he is the father of modern analytics, primarily baseball analytics. Bill James created a formula that works across all sports. It’s known as “The Pythagorean Theorem”. The real Pythagorean Theorem is simply a^2+b^2=c^2. You may remember the formula from geometry class. The Bill James version takes a team’s runs (points, goals) scored and the same team’s runs (points, goals) allowed. You then use these two bits of data in the following formula:
RS^2 / (RS^2 + RA^2) = Team’s Win Percentage
RS = Runs Scored
RA = Runs Allowed
An easy way to set this formula up in Excel is to lay it out as follows…
|Test Team A
|If you have entered the formula correctly, the answer would be: 0.562544, or a 56.25% win percentage
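If you prefer code to a spreadsheet, the same formula fits in a few lines of Python. Here is a minimal sketch. The exponent is a parameter so you can swap “2” for the 1.83 MLB value. The 800/700 team is hypothetical, because the inputs for Test Team A aren’t shown above.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Bill James' formula: RS^k / (RS^k + RA^k), with k = 2 by default."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical team: 800 runs scored, 700 runs allowed.
print(f"{pythagorean_win_pct(800, 700):.4f}")        # 0.5664
print(f"{pythagorean_win_pct(800, 700, 1.83):.4f}")  # using the 1.83 exponent for MLB
```

A smaller exponent pulls the result toward .500, so the 1.83 version comes out a bit lower than the “2” version for the same team.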
The above formula has been tested for MLB, and it has been shown to be more accurate if you change the superscript in the formula from “2” to “1.83”. When Peta ran this formula for the MLB season in the book, he found that only six MLB teams had final results that were off by more than three wins. Given this analysis, we can surmise that if we knew a team’s runs/points/goals scored & allowed in advance, we could accurately compute its win percentage for the season.
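Peta’s check, how many teams finished more than three wins away from their Pythagorean expectation, is easy to automate. The following minimal Python sketch uses the 1.83 exponent over a 162-game season. The three team lines are hypothetical, not real season data.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 1.83) -> float:
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def flag_outliers(teams: dict[str, tuple[int, int, int]], games: int = 162,
                  threshold: float = 3.0, exponent: float = 1.83) -> list[str]:
    """Return teams whose actual wins miss the Pythagorean expectation by more than `threshold`."""
    flagged = []
    for name, (runs_scored, runs_allowed, actual_wins) in teams.items():
        expected_wins = pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games
        gap = actual_wins - expected_wins
        print(f"{name}: expected {expected_wins:.1f}, actual {actual_wins}, gap {gap:+.1f}")
        if abs(gap) > threshold:
            flagged.append(name)
    return flagged


# Hypothetical season lines: (runs scored, runs allowed, actual wins).
teams = {
    "Team A": (780, 690, 90),
    "Team B": (700, 700, 86),
    "Team C": (650, 760, 70),
}
print("Off by more than three wins:", flag_outliers(teams))
```

Team B scored exactly as many runs as it allowed (an expected 81 wins) but won 86. That is the kind of over-performance that tends not to repeat.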
While the book is about MLB, the formula can be used across all sports. The superscript changes by sport, but the base formula is still the same. We can cover the superscripts for other sports, along with the 1.83 for MLB, in more detail later on. For now, and for the purposes of this article, whenever I use the Bill James formula I will use “2” for the superscript regardless of the sport I am discussing. Using “2” just keeps things standardized for the article and avoids any confusion in these early stages.
How will we use this formula, since we don’t know what a team will score and allow before a season begins? Chapter 4 of the book is titled “Players’ Performance Projected”. It is in this chapter that we will look at how to project players’ performance. When we project player performance, we can project what that means for scoring, and when we have a scoring projection, we can project a team’s performance (using the formula above). It is in this way that we will build the model step by step. Yes, building a model is time-consuming and laborious. However, my goal with this article is to go slowly and step by step. I am leaving Part III intentionally short because I would like you to work the above formula into a spreadsheet and mess around with it before I get to Part IV on Thursday. So, go to your favorite spreadsheet program (Google Sheets, which is free, would work just fine too) and set up the formula above. Then enter the data I have shown and make sure you get the same answer as me. Once you do, you will have a working portion of the model in spreadsheet form. Give your spreadsheet model a fancy name and save it for future updates as we go along!
For Part IV of the article, let’s move on to Chapter 4, which discusses projecting player performance. Now, the book has 25 chapters. Don’t worry: the majority of the book covers how Peta performed with his model and some tweaks along the way. So, basically, once we get past Chapter 5, we will start to cover 3-5 chapters per “Part” of this article. However, these early stages are the core of building a model, so I want to move slowly and step by step. I also want to allow you time to mess around with key formulas on your own, like the one above, before we move on to more formulas. HOMEWORK: To prepare for Part IV on Thursday, not only read Chapter 4, but try the “Bill James Pythagorean Theorem” for yourself. Let’s look at it for another sport in the process as well. Let’s try football! The big off-season move in the NFL is Tom Brady going to the Bucs. Go out and find the Points For and Points Against for the Bucs last season (2019-20). Then… (1) When you put the data for the Bucs into the above formula, ask yourself: did the Bucs overperform or underperform? You can tell simply by comparing their expected win percentage with their actual win percentage. (2) With Tom Brady at the helm, how would you adjust the Bucs’ Points For and Points Against from last season to predict the upcoming season’s performance? I will do this calculation myself and share my findings in Part IV. Hope you are having fun with this! As I said, this is a slow process, and it does take time and effort to create a model. If creating a model were easy, as the saying goes, everyone would do it and make money off of it. What differentiates model makers is how badly they want it and how hard they work on their model!
DATA: Data can be found on most league websites or a website like TeamRankings.com. One of my favorites for MLB is Baseball-Reference.com and last season’s stats can be found at https://www.baseball-reference.com/leagues/MLB/2019.shtml.
Part IV – What Makes A Model Tick
Look at you, moving right along in building an MLB model! A model which could apply to other sports as well. Before moving on, let’s quickly recap what we have covered so far…
1) Hits in baseball convert to runs at roughly a 2:1 ratio.
2) Runs scored and runs allowed have a direct correlation to win percentage.
Now we need to calculate how many hits, and thereby runs, a team is projected to have in a given season. We also need to know how many runs a team is expected to give up. How can we make both of those assessments? We have to take things down to the player level. It’s at this point that we dive into Chapter 4 of “Trading Bases”, titled “Player Performance Projected”, and our homework from Part III. It is also at this point that I expect to lose 80% or more of my readers. Why would I lose so many people? Because this is the point where the actual work begins and where I can no longer take you by the hand. It’s at this point that you have to make decisions and engage in critical thinking on your own. For some, this will be the best part of the process. For others, it’ll be the point where they say thanks, but no thanks. Just remember, if making a profitable betting model were easy, straightforward and everyone used the same recipe… what good would the model be? The uniqueness of the model is how you find your edge. It’s also why I never share even the slightest portion of my recipes. If 200 people had my recipe and began attacking the markets using it, the value would be long gone and the recipe would be useless for 98% of those people. At that point, I would have to devise a new model to take advantage of the pricing inequalities created by the first model and find new angles. LOL! Is your head spinning yet?? Hopefully not, let’s move on…
In Chapter 4 of Trading Bases, Joe Peta discusses how he put together player projections to calculate how many runs a team would score and how many runs they would give up. For the purposes of this article, I won’t recap what Peta does. You can simply read that in Chapter 4. Instead, I will get into making your own assessments. Now, the article here so far relates to MLB and thereby moneylines. However, I will get into spreads and totals in later parts of this article. So, those interested only in spreads, hang in there!
OK, you are now at the point where you need to assess things at the player level. You can do this by taking the data on players from the previous year and using it for the year ahead. The problem, however, is that players age, get injured, move to new teams, change styles, etc. All of those things have an effect on the players and, of course, on the offensive and defensive performance of a team. So, whether you assess players based on Peta’s method (discussed in Chapter 4) or some method of your own is up to you. In my opinion, the best tool to assess player changes year over year is PECOTA. It stands for Player Empirical Comparison and Optimization Test Algorithm. It is used by MLB front offices all over the league and is an excellent tool to assess how a player will evolve year over year. You can check the data out at https://www.baseballprospectus.com/pecota-projections/. Subscription prices to access PECOTA are very reasonable, and the data can be used not just in your model design, but also for fantasy baseball.
Bottom line, for your algorithm to be based on a current season assessment, you first need to update each team’s roster for player moves (signings, trades, retirements). Once you have updated each team’s roster with the current season’s players and their previous season’s stats, you have to adjust those stats for a new season. That means making assessments of each player’s performance expectation for the season ahead. At this point, PECOTA comes in very handy to assess players for the upcoming season. You do not need to do individual assessments for bullpen pitchers. For bullpens, Joe Peta found in his studies that using the WAR (Wins Above Replacement) average for bullpens from the previous season is sufficient. Bullpen performance has a lot of randomness to it. So, you can use the average bullpen runs allowed from the previous season as your standard for all teams… then tweak slightly for exceptionally good and bad bullpens. To compute runs scored, you need to analyze every offensive player for hitting and quality of hitting. For runs allowed, you will assess what you expect the starting pitchers to give up combined with what you expect the bullpens to give up. Using the previous season’s stats as the base dramatically minimizes the work. Instead of going from the ground up on each player, you are merely tweaking their previous year’s performance. Once you cover offensive production, starting pitcher performance and bullpen runs allowed, you are good to go! Now you have the necessary information (your estimated runs scored & allowed) to plug into the Bill James Pythagorean Theorem.
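The roll-up described above can be sketched in a few lines of Python. The sketch makes several simplifying assumptions. Runs scored come from the lineup’s adjusted hits divided by 2. Runs allowed are the starters’ adjusted runs plus a league-average bullpen figure with an optional tweak. Every player name and number is hypothetical. This shows the structure only; it is not Peta’s exact method.

```python
from dataclasses import dataclass


@dataclass
class Hitter:
    name: str
    projected_hits: float  # last season's hits, adjusted for the year ahead


@dataclass
class Starter:
    name: str
    projected_runs_allowed: float  # runs allowed in this pitcher's starts, adjusted


def project_team(hitters, starters, avg_bullpen_runs_allowed,
                 bullpen_tweak=0.0, hits_per_run=2.0, exponent=2.0):
    """Return (runs scored, runs allowed, Pythagorean win %) for one roster."""
    runs_scored = sum(h.projected_hits for h in hitters) / hits_per_run
    runs_allowed = (sum(s.projected_runs_allowed for s in starters)
                    + avg_bullpen_runs_allowed + bullpen_tweak)
    win_pct = runs_scored ** exponent / (runs_scored ** exponent + runs_allowed ** exponent)
    return runs_scored, runs_allowed, win_pct


# Hypothetical roster numbers, for illustration only.
hitters = [Hitter(f"Hitter {i}", hits) for i, hits in
           enumerate([170, 165, 160, 160, 155, 150, 150, 145, 145], start=1)]
starters = [Starter(f"Starter {i}", ra) for i, ra in
            enumerate([95, 90, 90, 85, 80], start=1)]
rs, ra, pct = project_team(hitters, starters, avg_bullpen_runs_allowed=240,
                           bullpen_tweak=-10)  # slightly better-than-average pen
print(f"RS {rs:.0f}, RA {ra:.0f}, win % {pct:.3f}")  # RS 700, RA 670, win % 0.522
```

Keeping each player’s projection as its own record is what makes later tweaks (injuries, trades, your own adjustments) a one-line change.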
You might ask me, what’s the best way to make player assessments? I have given you the paints; now you have to create your masterpiece. You will want to create those updated rosters and adjust expected hits for offensive players. Perhaps you will use the expected evolution of each player based on PECOTA. You will then want to do the same for the starting pitchers. At that point you will have a recipe similar to what a lot of people do using PECOTA. The real question is how you will adjust the recipe further, or what assumptions you will make, to differentiate your model from everyone else’s. Therein lies the key and what will make your model special and hopefully profitable! If doing this for all the MLB teams is too much for you, do it for your favorite team… just one team. Update their roster, make the player assessments, compute the expected runs scored and allowed and see how your team will do this season. Then put your money where your mouth is with an OVER/UNDER bet on your team’s season win total. While doing this for all of MLB is key to having a model, if you just want to mess around, do it for one or two teams only. You could also create a spreadsheet using every MLB team’s runs scored & allowed for 2019 and see who overperformed and who underperformed. It’s a fun exercise and should take less than 30 minutes. Simply create a spreadsheet, list the teams, enter the runs scored and allowed, enter their win percentage and then put in the Bill James Pythagorean Theorem. It’s fun to check out and a good test of your Excel skills!
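For the single-team exercise, the last step is comparing projected wins to the book’s season win total. Here is a minimal Python sketch with hypothetical projections and totals. Requiring a minimum gap before betting is a sensible tweak, but it is left out here to match the text.

```python
def season_total_pick(runs_scored, runs_allowed, book_win_total,
                      games=162, exponent=2.0):
    """Compare Pythagorean projected wins with a sportsbook season win total."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    projected_wins = scored / (scored + allowed) * games
    if projected_wins > book_win_total:
        return "OVER", projected_wins
    if projected_wins < book_win_total:
        return "UNDER", projected_wins
    return "PASS", projected_wins


# Hypothetical projections and win totals.
pick, wins = season_total_pick(700, 670, 82.5)
print(f"{pick}: {wins:.1f} projected wins vs. 82.5")  # OVER: 84.5 projected wins vs. 82.5
pick, wins = season_total_pick(650, 700, 80.5)
print(f"{pick}: {wins:.1f} projected wins vs. 80.5")  # UNDER: 75.0 projected wins vs. 80.5
```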
Also, just a quick side note related to MLB correlations… Peta notes that every 10 runs a team scores in a season equates to roughly 1 win.
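Here is a quick Python sanity check of the 10-runs-per-win rule of thumb against the Pythagorean formula. The 700/670 team that adds 50 runs of offense is hypothetical.

```python
RUNS_PER_WIN = 10  # Peta's rule of thumb from the text


def wins_from_extra_runs(extra_runs: float) -> float:
    """Approximate extra wins from extra runs scored over a season."""
    return extra_runs / RUNS_PER_WIN


def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed) * games


# Hypothetical 700 RS / 670 RA team that adds 50 runs of offense.
rule_of_thumb = wins_from_extra_runs(50)
pythag_gain = pythag_wins(750, 670) - pythag_wins(700, 670)
print(f"rule of thumb: +{rule_of_thumb:.1f} wins, Pythagorean: +{pythag_gain:.1f} wins")
```

For this example, both approaches land within about half a win of each other, which is why the shortcut is handy for quick what-ifs.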
ALGORITHM LAB: I had you do some homework so we could perform a mini-assessment for this part of the article using what we have learned up to this point. To prepare for Part IV, I had you pull some stats on the Tampa Bay Bucs and make some calculations using the Pythagorean Theorem. I also asked you to assess what you think Tom Brady will do to the stats (Points For & Against) for the Tampa Bay Bucs this season. My assessment is below…
|Pythagorean Theorem Prediction | Points For | Points Against | Expected Record | Actual Record
|Tampa Bay (2019) | 458 | 449 | 8-8 for .500 | 7-9 for .438
|Tom Brady Effect | 538 | 407 | 10-6 for .625 | n/a
|William Hill Win Total (O/U) = 9
OK, in the second row of the table, I list Tampa’s statistics from last season. Tampa scored 458 points and allowed 449 points. When we input the data into the Bill James formula from Part III, we calculate an expected win percentage of .509922. A roughly 50% win percentage would have resulted in a Bucs record of 8-8. The Bucs finished the season 7-9, meaning they underperformed based on their statistics. Funny thing is, many of my followers will remember a Premium Play on Tampa -1.5 over Atlanta on December 29th. Why would this game stand out? The game opened with a Winston INT that resulted in points for Atlanta and another turnover that led to an Atlanta field goal. After the 1st Quarter, we found our Tampa -1.5 bet down 10-0! In the 2nd Quarter, however, Tampa came roaring back. The Bucs put up 22 points and went into the half with a 22-16 lead. In the second half, with this being the final game of the season for both teams, it looked like both teams were mailing it in. The problem is the Falcons ended up getting 6 points in the 4th Quarter (one FG came with no time left), and the game went to OT. In OT, Tampa got the ball first. Perfect! Let’s just get Tampa down the field and into the end zone! Well, on the first play of OT, a Winston Pick 6! Game over… Atlanta wins 28-22. UGH!!!!!! The point of this long story is that Winston had another one of his Pick 6 mistakes, and it cost Tampa an 8-8 record. Granted, in this game the Tampa kicker also missed three field goals. However, the point remains that Winston both cost his team points on offense last season and gave up points to the other team. So, if everything else stayed the same for the Bucs, what will the removal of Winston and the introduction of Brady do for Tampa’s numbers?
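Homework question (1) can be checked in a few lines of Python using Tampa’s 2019 numbers from the text (458 points for, 449 against, 7-9 record) and the “2” exponent used throughout this article.

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.0):
    scored = points_for ** exponent
    allowed = points_against ** exponent
    return scored / (scored + allowed)


GAMES = 16
points_for, points_against = 458, 449  # Tampa Bay, 2019 season (from the text)
actual_wins = 7

expected_pct = pythagorean_win_pct(points_for, points_against)
expected_wins = expected_pct * GAMES
verdict = "underperformed" if actual_wins < expected_wins else "overperformed"
print(f"expected win % {expected_pct:.6f}")                         # expected win % 0.509922
print(f"expected wins {expected_wins:.1f}, actual {actual_wins}")  # expected wins 8.2, actual 7
print(f"The Bucs {verdict}.")                                       # The Bucs underperformed.
```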
In my analysis above, I felt Tom Brady would result in an average of 5 more points per game than Tampa was already scoring. Las Vegas oddsmakers (some analysis on this: https://www.boydsbets.com/nfl-player-values-against-the-point-spread/) typically value Brady as worth 3.5 points to the spread in a game over a standard QB. I added another 1.5 points for the fact that Winston is a below-average QB. So, I get 5 points per game for Brady over Winston on offense (5 × 16 = 80 more points, or 538 points scored). I also feel that the removal of Winston will mean 42 fewer points scored by Tampa’s opponents this year (449 − 42 = 407 points allowed). Winston was notorious for his turnovers. I made the assumption that had Winston been replaced with an average QB, the Bucs would have given up roughly 2.5 fewer points per game, or about 42 points over the 16-game season. When I adjust Tampa’s scoring to take into account the “Tom Brady Effect”, I calculate that Tampa’s win percentage improves to 63.6%. When I multiply 63.6% by the number of games in a season (as of now 16 games per https://bleacherreport.com/articles/2884302-nfl-reportedly-planning-to-play-16-game-schedule-may-delay-2020-season), I get 16 × 0.636 = 10.18 games won. Obviously, you can only have 10 wins or 11 wins. So, we round down to 10 wins. Based on my calculations of Tampa’s performance, I calculate they will have 10 wins this season with the addition of Brady, all things being equal. William Hill is currently hanging a win total of “9”. I would therefore see an edge in taking the OVER 9.
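Here is the “Tom Brady Effect” adjustment as a short Python script. It reproduces the steps in the paragraph above: +5 points per game on offense, 42 fewer points allowed over 16 games, round down to whole wins, then compare with William Hill’s total of 9. The adjustment sizes are my judgment calls from the text, not outputs of any formula.

```python
import math

GAMES = 16
base_points_for, base_points_against = 458, 449  # Tampa Bay, 2019
brady_points_per_game = 3.5 + 1.5  # Brady over a standard QB, plus Winston's below-average play
fewer_points_allowed = 42          # fewer opponent points without Winston's turnovers

adj_for = base_points_for + brady_points_per_game * GAMES  # 538.0
adj_against = base_points_against - fewer_points_allowed    # 407
win_pct = adj_for ** 2 / (adj_for ** 2 + adj_against ** 2)
projected_wins = win_pct * GAMES
whole_wins = math.floor(projected_wins)  # the text rounds down to whole wins

book_total = 9  # William Hill season win total from the text
if whole_wins > book_total:
    pick = "OVER"
elif whole_wins < book_total:
    pick = "UNDER"
else:
    pick = "PASS"
print(f"{adj_for:.0f} for / {adj_against} against -> {win_pct:.1%}, "
      f"{projected_wins:.2f} wins ({whole_wins}), pick {pick}")
# 538 for / 407 against -> 63.6%, 10.18 wins (10), pick OVER
```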
I didn’t mean to confuse you by throwing football into this discussion. I did it just to show you the versatility of the Bill James Pythagorean Theorem and to use an example we can all understand.
I could write a book (and would have to) to properly cover projecting player performance, simply because there are so many ways to do it. However, that’s the good thing. It means there is no set way. How you develop your assessments is what makes your model unique. If you find a way to assess players that correlates properly to outcomes, you will have created a mathematical money machine! It’s going to take effort, time and working through failure. “Nothing in the world is worth having or worth doing unless it means effort, pain, difficulty… I have never in my life envied a human being who led an easy life.” – T. Roosevelt. However, when you create that model and realize you cracked a little piece of the code, it makes it all worthwhile and will start you on your model-building journey. There is still a lot more ground to cover in this article. Part IV just brings us to the top of the mountain. You now have much of what you need to create a model. Now we have to navigate down the other side: how to use the model for individual games, how to tweak it and how to expand it to other sports.
For Part V of the article, I would advise reading Chapters 5, 6 and 7 in “Trading Bases” for Tuesday. I will have Part V on Tuesday, instead of Monday, to give you plenty of time to mess around with the above and read the three chapters in the book. We are getting ready to round the turn into the stretch run. We are almost there! See you back here on Tuesday night!
Part V – Individual Game Analysis
If you have taken the time and put in the work, by this point you have a functional algorithm. Even if you just programmed the formulas from this article into an Excel spreadsheet, you now have a working model. Sure, your model will need to be tweaked and adjusted as you go, but you should have a base. Remember, the best algorithms need constant adjustments to keep their edge sharp. Your algorithm is in a daily battle with the oddsmaker. The oddsmaker is adjusting his algorithms/models to cancel out your edge. So, you have to be ready to always adjust your game plan along with it!
OK, we reached the summit of “Designing An Algorithm”. Now we trek down the other side of the mountain with our algorithm in hand. The algorithm created at this point is set up to analyze futures betting and the big picture for an MLB season. How can we now zoom in and use our season assessment of teams on a daily basis? If you really want a functional MLB model for assessing individual games (the same theories can be carried over to other sports, by the way), you need to break down your data for each individual game. In designing your MLB model, I would advise having a separate Excel worksheet for each team. So, you would have a Yankees worksheet, a Red Sox worksheet, etc. On each of these worksheets would be the team roster, pitching staff and your bullpen assessment… laid out in the format and with the data you used to calculate your full season assessments. Doing so will allow you to make easy adjustments immediately for injuries, trades, demotions, etc. throughout the season. Also, on each team’s worksheet, it would help to have a “Today’s Lineup” section where you can input the day’s batting lineup and the starting pitcher. When you set up the day’s lineup and starting pitcher, you would again use the predicted data that you calculated for each player.
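If you build in code rather than Excel, the one-worksheet-per-team idea maps naturally onto one record per team. The following Python sketch is one hypothetical way to lay it out. The field names, the per-game units and the placeholder numbers are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TeamSheet:
    """One team's 'worksheet': projections plus a Today's Lineup section (hypothetical layout)."""
    name: str
    hitters: dict[str, float] = field(default_factory=dict)   # player -> projected hits per game
    starters: dict[str, float] = field(default_factory=dict)  # pitcher -> projected runs allowed per 9
    bullpen_runs_per_9: float = 4.5  # placeholder bullpen assessment
    todays_lineup: list[str] = field(default_factory=list)
    todays_starter: Optional[str] = None

    def remove_player(self, player: str) -> None:
        """Handle an injury, trade or demotion by dropping the player everywhere."""
        self.hitters.pop(player, None)
        self.starters.pop(player, None)
        if player in self.todays_lineup:
            self.todays_lineup.remove(player)
        if self.todays_starter == player:
            self.todays_starter = None

    def set_today(self, lineup: list[str], starter: str) -> None:
        """Fill in Today's Lineup, refusing players who have no projection."""
        unknown = [p for p in lineup if p not in self.hitters]
        if unknown or starter not in self.starters:
            raise ValueError(f"No projection for: {unknown or [starter]}")
        self.todays_lineup = list(lineup)
        self.todays_starter = starter


# Hypothetical numbers for illustration.
sheet = TeamSheet("Yankees", hitters={"Hitter 1": 1.1, "Hitter 2": 0.9},
                  starters={"Starter 1": 3.8})
sheet.set_today(["Hitter 1", "Hitter 2"], "Starter 1")
sheet.remove_player("Hitter 2")  # e.g. placed on the injured list
print(sheet.todays_lineup)  # ['Hitter 1']
```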
At this point, using the “Today’s Lineup” section that you created, you can analyze the team’s win probability for that day based on the lineup. You can now calculate that lineup’s expected hits and thereby its expected runs scored (using the 2 hits to 1 run ratio). You also know the expected runs allowed for the game based on the starting pitcher and your previously calculated bullpen assessment for that team. So, you have all the data you need on the team to get its Bill James Pythagorean Theorem estimated win percentage based on today’s lineup. Yes, you are using the theorem on just one game’s data, but it works exactly the same as though you were assessing the team based on full season data. You are basically analyzing the resulting runs scored and allowed if the team played with the exact same lineup for the whole 162-game schedule. Joe Peta gets into a further explanation of this topic in Chapter 5. To avoid writing my own book, my goal is that you use Peta’s information from the book as the source for the intricate details of these topics, with my article here filling in some blanks or expanding on ideas. When you have completed your full assessment of a single team using its lineup for the day, you move on to the next team and every team playing that day. Yes, it’s a lot of work. Don’t panic, I have a shortcut coming up.
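Here is a minimal Python sketch of the single-game calculation. It assumes two simplifications of my own. Each hitter has a projected hits-per-game figure. Runs allowed blend the starter’s and the bullpen’s runs-per-9 rates by the innings each is expected to pitch. All inputs are hypothetical. The last line shows why one game’s numbers work the same as a full season’s: the formula depends only on the ratio of runs scored to runs allowed.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def todays_win_pct(lineup_hits_per_game, starter_runs_per_9, starter_innings,
                   bullpen_runs_per_9, hits_per_run=2.0):
    """Single-game Pythagorean estimate from today's lineup and starting pitcher."""
    runs_scored = sum(lineup_hits_per_game) / hits_per_run
    bullpen_innings = 9 - starter_innings
    runs_allowed = (starter_runs_per_9 * starter_innings
                    + bullpen_runs_per_9 * bullpen_innings) / 9
    return pythagorean_win_pct(runs_scored, runs_allowed)


# Hypothetical: nine hitters at 1.0 projected hits per game, a 4.00 runs/9 starter
# expected to go 6 innings, and a 4.50 runs/9 bullpen for the last 3.
p = todays_win_pct([1.0] * 9, 4.0, 6, 4.5)
print(f"{p:.3f}")  # 0.538

# Scaling both sides by 162 games leaves the result unchanged.
print(abs(pythagorean_win_pct(4.5 * 162, 25 / 6 * 162) - p) < 1e-12)  # True
```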
In Chapter 5, Peta has a breakdown of calculations on how to assess individual game probabilities based on your calculated team win probabilities. I will use Peta’s example of a calculated .600 winning team playing a calculated .400 winning team. Remember, before you calculate the win probabilities for the game, be sure to boost the home team’s win percentage. Peta uses an 8% boost for the home team. The boost is applied simply by taking the home team’s calculated win percentage as a decimal and multiplying it by 1.08. The result is the home team’s win percentage with an 8% boost. OK, so let’s break down how we get to each team’s predicted win percentage in an individual game. Peta uses the following example…
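.600 Win Percentage Team: .600 × (1 - .400) = .600 × .600 = .360 (the .600 team wins and the .400 team loses)
.400 Win Percentage Team: .400 × (1 - .600) = .400 × .400 = .160 (the .400 team wins and the .600 team loses)
Now we take those calculations and enter them into another formula to get each team’s projected win percentage for this game…
Probability the .600 Team Wins the Game: .360 / (.360 + .160) = .692
There’s a 69.2% chance the .600 team wins today.
Probability the .400 Team Wins the Game: .160 / (.360 + .160) = .308
There’s a 30.8% chance the .400 team wins today.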
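The same game-probability math can be sketched in Python. The head-to-head step is written in the form of Bill James’s log5 method, which reproduces the 69.2%/30.8% split above. The function names are hypothetical. The optional 1.08 multiplier is the 8% home boost described earlier; the .600 vs .400 example is shown without it.

```python
HOME_BOOST = 1.08  # the 8% home-field boost described above


def apply_home_boost(win_pct: float, boost: float = HOME_BOOST) -> float:
    """Multiply the home team's win percentage by the boost factor."""
    return win_pct * boost


def game_win_probability(team_a: float, team_b: float) -> tuple[float, float]:
    """Head-to-head win probabilities from two win percentages (log5 form)."""
    a_wins_b_loses = team_a * (1 - team_b)
    b_wins_a_loses = team_b * (1 - team_a)
    total = a_wins_b_loses + b_wins_a_loses
    return a_wins_b_loses / total, b_wins_a_loses / total


p_fav, p_dog = game_win_probability(0.600, 0.400)
print(f"{p_fav:.1%} / {p_dog:.1%}")      # 69.2% / 30.8%
assert abs(p_fav + p_dog - 1.0) < 1e-12  # the two sides must total 100%

# With a home team: boost the home side first, e.g. a .500 home team vs a .500 visitor
home, away = game_win_probability(apply_home_boost(0.500), 0.500)
```

Keeping the boost as a separate step makes it easy to test a different home edge later without touching the head-to-head formula.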
The two numbers should total 100%. If not, you did something wrong. Again, a full and thorough explanation of this process, using these calculations, can be found in Chapter 5 of “Trading Bases”. If you haven’t gotten the book, I believe it is $12.99 on Kindle and well worth the investment. It will fill many of the gaps in this article. OK, so now we know the favorite’s probability of winning the game is 69.2%. What does this mean for the moneyline? We have to calculate that with another formula. All these formulas can be programmed into an Excel spreadsheet so that when you enter each team’s calculated win percentage, it automatically adds the 8% boost to the home team and calculates everything right down to the moneyline. At least a basic knowledge of Excel is vital to having a functioning model here. If you are not familiar with it, I would strongly advise the “Excel For Dummies” book. The “For Dummies” series is excellent despite the silly name. So, back to the moneyline. Here’s how to calculate it from your implied win probability.
Calculate a Favorite’s Moneyline based on Implied Probability. The formula is…
-(Implied probability / (100 – Implied probability)) x 100 = Moneyline
For a team with a 69.2% win probability, the moneyline calculation looks like this… -(69.2/(100-69.2))*100 = -225
What that means is any line on this team cheaper than -225 (such as -200 or -180) is a value.
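Here is the favorite’s moneyline formula as a small Python function. The favorite branch is the formula above; the underdog branch, +((100 - p) / p) × 100, is the standard counterpart and is my addition, not something from the article or the book. The function name is hypothetical.

```python
def fair_moneyline(win_prob_pct: float) -> int:
    """American moneyline implied by a win probability given in percent (0-100)."""
    if not 0 < win_prob_pct < 100:
        raise ValueError("win probability must be strictly between 0 and 100")
    if win_prob_pct >= 50:
        # Favorite: -(p / (100 - p)) x 100, the formula above
        return round(-(win_prob_pct / (100 - win_prob_pct)) * 100)
    # Underdog: +((100 - p) / p) x 100
    return round((100 - win_prob_pct) / win_prob_pct * 100)


print(fair_moneyline(69.2))  # -225
print(fair_moneyline(30.8))  # 225
```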
So, if the team we calculate as the favorite in this match is -180 at the sportsbook, we have a value and we would want to bet on that team based on our calculations. There you have it! You have now taken the team’s predicted performance based on our assessment and calculated what the moneyline should be. We compare that to what the bookmaker is hanging and if the lines differ, we play the value.
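Going the other direction, you can convert the book’s moneyline into the win probability it implies and compare that to your model. This is a sketch with hypothetical function names; it ignores the vig, so the two sides of a real line will add up to slightly more than 100%.

```python
def implied_probability(moneyline: int) -> float:
    """Win probability (0-1) implied by an American moneyline, ignoring the vig."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def is_value(model_prob: float, book_line: int) -> bool:
    """True when the model rates the team better than the book's price implies."""
    return model_prob > implied_probability(book_line)


print(round(implied_probability(-180), 3))  # 0.643
print(is_value(0.692, -180))                # True: 69.2% beats the 64.3% the book implies
print(is_value(0.692, -250))                # False: -250 implies about 71.4%
```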
What about that shortcut? Yes, it is a lot of work to adjust the lineup for each team each day and then use that to calculate the line for every game. Granted, it is the proper way to do things, BUT there is a shortcut.
The first shortcut is to have software built for you, along with a data feed that does all the heavy lifting in minutes. However, the expense makes that unrealistic for most everyday bettors. So, instead I would advise you to do what Peta does. First, use your full season projections for each team to calculate the expected moneyline for each game. Again, if you have your spreadsheet set up, this can be done in minutes using only Excel. Using the full season win percentages you previously calculated doesn’t require any lineup adjustments, so it goes quickly. When you do this, you will narrow a ten game schedule down to 1-3 games that show a decent edge against the line. Then you only analyze those 1-3 games further using the “Today’s Lineup” assessment I discussed above. Running the day’s games through your season projections first, and then doing the “Today’s Lineup” projections only on the games that survive, cuts your daily workload by 70-90%. Much easier for the everyday bettor.
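A minimal sketch of that screening pass, assuming each entry is one side of a game with your season-projection win probability and the book’s moneyline. The 3-point edge threshold, the team names and the function names are hypothetical placeholders; pick a threshold that fits your own model.

```python
def implied_probability(moneyline: int) -> float:
    """Win probability (0-1) implied by an American moneyline, ignoring the vig."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def screen_slate(sides: list[tuple[str, float, int]], min_edge: float = 0.03) -> list[tuple[str, float]]:
    """Keep sides whose model probability beats the book's implied probability
    by at least min_edge (a hypothetical 3 percentage point threshold)."""
    shortlist = []
    for name, model_prob, book_line in sides:
        edge = model_prob - implied_probability(book_line)
        if edge >= min_edge:
            shortlist.append((name, round(edge, 3)))
    return shortlist


# Made-up slate: (side, season-projection win probability, book moneyline)
slate = [
    ("Team A", 0.55, -120),
    ("Team B", 0.48, +130),
    ("Team C", 0.62, -200),
]
print(screen_slate(slate))  # [('Team B', 0.045)]
```

Only the sides that come back from the screen would then get the full “Today’s Lineup” treatment.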
You have now added individual game assessment capability to your full season algorithm. Congratulations! You have a true MLB betting model!
Some takeaways from this section of “Trading Bases”… I love how Joe Peta explains the low margins in MLB in this section of the book. People do not realize how advantageous MLB is for the bettor and how finding a dime line makes all the difference. The smaller the edge the book has against you, the easier it is to beat the book! The value of betting MLB versus other sports is a great part of the book that often gets glossed over!
There is also a VERY IMPORTANT point in Chapter 7 that I LOVED! It’s where Peta discusses one of his game assessments in which he didn’t think Baltimore would win, because he calculated their chances of winning at less than 50%. However, because his calculation showed that Baltimore had a better chance of winning (even though less than 50%) than what the sportsbook was hanging, Baltimore was a VALUE! I wish people would understand this concept more than anything else I discuss!!! VALUE is the key to betting. Most sports bettors believe that if a team is a value, it must be a guaranteed win. No, it’s simple math. If your assessment is more closely correlated to the actual outcome than the bookmaker’s, you will make a fortune betting sports. Sometimes that will mean you bet a team that has a 25% chance of winning. You don’t expect to win that particular bet, but if the oddsmaker has set the line as though the team has a 5% chance of winning, you will make a fortune on those situations over the long term. Of course, that assumes your model is better correlated to the actual results than the book’s model!
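The Baltimore point is easiest to see as expected value. This sketch uses the hypothetical 25%-versus-5% case from the paragraph above; a 5% implied probability corresponds to a +1900 moneyline. The function name is hypothetical, and the math assumes your 25% estimate is actually right.

```python
def expected_value(win_prob: float, moneyline: int, stake: float = 1.0) -> float:
    """Expected profit (in units) of a bet at an American moneyline."""
    if moneyline > 0:
        profit_if_win = stake * moneyline / 100
    else:
        profit_if_win = stake * 100 / -moneyline
    return win_prob * profit_if_win - (1 - win_prob) * stake


# A +1900 underdog implies a 5% chance (100 / (1900 + 100)).
print(round(expected_value(0.25, 1900), 2))  # 4.0 units of expected profit per unit risked
print(round(expected_value(0.05, 1900), 2))  # roughly zero: no edge when your number matches the book
```

You lose this bet three times out of four, yet the expected profit is strongly positive; that is what VALUE means.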
Now I know my article leaves a lot of the leg work to you. Listen, if I took you step by step I would literally have to write a book, just like Joe Peta. I wish I had that sort of time. However, Peta already wrote the book. My goal is through the combination of reading his book and this article that you will have all the tools in your hands to make a proper model.
For Part VI of this article, I would advise you to read Chapter 8 through the end of Chapter 11. It is four total chapters, but we are starting to move away from the heavy lifting and now getting into algorithm function and use. Things will begin to move faster at this point. I will have Part VI of this article posted here on Friday afternoon! I hope you enjoy!
Part VI – Risk/Bankroll Management
The most important part of any gambling endeavor is your bankroll management. In finance terms, it is referred to as “Risk Management”. Whatever you want to call it, managing your bankroll, or failing to, will dictate your success or failure. You will hear me continually say that bankroll management is more important than the bets you make. Someone possessing the greatest sports betting model ever created could find themselves bankrupt if they do not practice proper bankroll management and discipline. One of my favorite passages from “Trading Bases” is as follows (hence the highlighted, bold, red text): “It means that no matter what the endeavor, if you have an edge, a competitive advantage or a carefully constructed model with a positive expected return (+EV), you must avoid wiping yourself out with a single bet. Never make a bet on one day that imperils your ability to exist the next day.” Basically, your betting for today should be calculated in a way that, win or lose, you will be able to bet the same way tomorrow. If that is not the case, something is wrong and you need to reassess immediately!
I knew the everyday gambler had very poor bankroll management. I have worked with dozens of sportsbooks as a consultant in various capacities, and the operations I worked with were built on gamblers chasing and pressing their bets. However, I didn’t realize the complete lack of bankroll control until my interactions on Twitter. Seeing people talk about being busted after losing 4-5 units on a cold run (which isn’t really that cold) or a couple of units on a bad day made me realize that some folks are lost causes. Those folks will spend the next 10, 20, 30+ years just donating to the bottom line of sportsbooks and casinos. I don’t want to see it happen, but unfortunately it’s inevitable. Sportsbooks are cash cows for one reason: people suck at managing their emotions and their bankroll. That is why the sportsbooks devour the average bettor’s finances every year. One simple fix could dramatically minimize the damage. Even if you still suck as a handicapper, proper bankroll management will limit the damage the sportsbook does to you each year.
Why don’t people take part in this simple fix? Some people are betting because they need money. So, even though they may have $1,000 or $5,000 to their name, they will bet $200-$500 a game because they have to make the mortgage payment or they need income. Sports betting, or gambling in general, seems like the “easy” way to achieve these goals. In reality, gambling to cover money you don’t have is a recipe for disaster. Other bettors find that bankroll management takes the fun and excitement out of betting. In that sense they are correct; that is what bankroll/risk management is intended to do. The goal of bankroll management is to make betting more of a business than a leisure activity.
I hate to bring up a tough topic, but we learn best through our failures. On November 24th, 2019 I released a Robin Hood Selection. The play lost. What followed was the worst draw down of any wagering activity I engaged in for 2019. The Robin Hood Selection and the subsequent Robin Hood Club lost 19 units. It sucked! However, such a draw down was not out of the statistical realm. Not based on my opinion, but based on statistical analysis, a professional bettor is expected to have at least one draw down of roughly 25 units or more in a given year. In October, before any of the Robin Hood Selection issues happened, I posted an article by Pinnacle on the topic. You can read the article at https://TheSharpPlays.com/sports-betting-drawdowns-by-pinnacle-sports/. The point of the article is that even professional sports bettors, those betting with an edge against the book, can expect to experience at least one substantial draw down annually. At the end of the year a professional bettor will still finish plus money overall, as I did despite the 19 unit draw down. HOWEVER, the road to profitability each year won’t be a straight line up. Similar to the price movement of a stock, the bankroll of a professional bettor will be a series of peaks and valleys along an otherwise upward trajectory. Apple stock doesn’t go straight from $10 to $200. Apple stock goes from $10 to $30, back to $20, then up to $50, up to $80 and then down to $40. Over that stretch, Apple was a great investment, moving from $10 to $40. What you don’t see is that along the way, Apple lost 50% of its value from its highest point ($80 back to $40). Sports betting profits for professional bettors follow a similar path to the Apple stock example. If you are only prepared for the peaks and not the valleys, prepare to drown when the next valley hits.
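If you want to measure draw downs in your own records, here is a short Python sketch. The price path is the hypothetical Apple example from the paragraph above, the unit series is made up, and the function names are mine.

```python
def max_drawdown_pct(values: list[float]) -> float:
    """Largest peak-to-trough decline as a fraction of the peak (values must be positive)."""
    peak = values[0]
    worst = 0.0
    for value in values:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def max_drawdown_units(cumulative_units: list[float]) -> float:
    """Largest peak-to-trough drop in a running profit/loss total, in units."""
    peak = cumulative_units[0]
    worst = 0.0
    for total in cumulative_units:
        peak = max(peak, total)
        worst = max(worst, peak - total)
    return worst


apple_path = [10, 30, 20, 50, 80, 40]  # hypothetical prices from the paragraph above
print(max_drawdown_pct(apple_path))     # 0.5 (from $80 back to $40)

season_units = [0.0, 4.0, 9.0, 2.0, 6.0, 11.0]  # made-up running total in units
print(max_drawdown_units(season_units))         # 7.0 (from +9 down to +2)
```

The units version is the one to use for a betting log, since a running profit total can sit at or below zero, where a percentage of the peak is meaningless.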
Hopefully the above has provided the emphasis as to why bankroll management is important. So, what’s the best way to manage bankroll? There are all sorts of theories out there from the standard 100 unit bankroll, to the Kelly Criterion to what Joe Peta uses. For the purposes of this article I will discuss what I do and why. However, I strongly advise you to research bankroll management strategies and find the one that best works for you!
I use the old school 100 unit bankroll with a 1 unit maximum bet. It’s simple and it is easy. Are there more effective strategies? Yes, but this one works for me! My method ensures that if the 25 unit draw down hits me immediately as I start the year, before I even have a chance to accumulate house money, I will still have 75 units in reserve to keep attacking. Do I adjust my bankroll? I used to adjust my bankroll as it grew, but now I am happy where I am in bankroll terms. However, when I was working toward a certain base unit wager, I would adjust my bankroll every quarter. At the end of the quarter, if I was over 110 units in total bankroll on hand, I would take out the profit above 110 units. If I was under 110 units, I would leave it alone. Why leave 110 units in there and not 100? I liked keeping the 10 units of extra reserve in my bankroll when I was up money. Obviously, if I have over 110 units, it means I had a good run. It also means, based on standard probabilities, I will be due for a cold run. I know eventually a cold streak will hit, and those 10 units will help me weather the storm and maintain my bankroll goals. The 10 units will allow me to stay close to 100 units total when the cold streak hits. Remember there will ALWAYS be cold streaks!!!!! I don’t care how good you are! Now that I am not growing my bankroll, what do I do? At the end of the year I remove the profit above 100 units and start over with 100 units for the new year. Rinse and repeat, year after year.
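The 100 unit plan and the quarterly skim above can be sketched like this. Function names and sample balances are hypothetical; amounts are in units, where one unit is 1% of the starting bankroll.

```python
def unit_size(starting_bankroll: float, units: int = 100) -> float:
    """Dollar value of one unit in a 100-unit bankroll."""
    return starting_bankroll / units


def skim_profit(bankroll_units: float, keep_units: float) -> tuple[float, float]:
    """Withdraw anything above keep_units; return (bankroll after, units withdrawn)."""
    if bankroll_units > keep_units:
        return keep_units, bankroll_units - keep_units
    return bankroll_units, 0.0


print(unit_size(5000))             # 50.0 -> a $50 unit, so a 1 unit max bet is $50
print(skim_profit(123.5, 110.0))   # quarterly skim while growing: (110.0, 13.5)
print(skim_profit(104.0, 110.0))   # under 110, leave it alone: (104.0, 0.0)
print(skim_profit(137.0, 100.0))   # year-end reset to 100: (100.0, 37.0)
```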
Do I keep my entire bankroll in sports betting accounts? Of course not. It’s too risky. I have a special bank account which holds my reserve. I usually have 30 units on hand with sportsbooks and the other 70 units in reserve with a bank. When my combined sportsbook balance gets above 40 units, I withdraw the funds into the account which holds my 70 units of reserve. Pretty simple in principle, right? My friends, that is the entirety of my bankroll management plan: a wager range of 0.20 units to 1 unit, a 100 unit starting bankroll each year, profits taken out at the end of the year, 70% of my bankroll kept with a bank and, when I was growing my bankroll, adjustments made on a quarterly basis. It’s not flashy, but it works and it will make sure you never go bust… because if you get to -50 units, it’s time to hang it up. At that point, betting is clearly more a leisure activity than a professional one for you.
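Here is the 30/70 split and the 0.20 to 1 unit wager range as a sketch. I am assuming that when the sportsbook balance passes 40 units, you withdraw back down to the 30 unit working balance; the article doesn’t say exactly how much is pulled, and the function names are hypothetical.

```python
WORKING_UNITS = 30.0  # kept at the sportsbooks (assumed withdrawal target)
TRIGGER_UNITS = 40.0  # sweep once combined sportsbook balances exceed this
MIN_BET, MAX_BET = 0.20, 1.0


def sweep_to_reserve(book_units: float, reserve_units: float) -> tuple[float, float]:
    """Move the excess above the working balance to the bank once the trigger is passed."""
    if book_units > TRIGGER_UNITS:
        excess = book_units - WORKING_UNITS
        return WORKING_UNITS, reserve_units + excess
    return book_units, reserve_units


def clamp_stake(units: float) -> float:
    """Keep every wager inside the 0.20 to 1 unit range."""
    return min(max(units, MIN_BET), MAX_BET)


print(sweep_to_reserve(42.5, 70.0))        # (30.0, 82.5)
print(sweep_to_reserve(38.0, 70.0))        # (38.0, 70.0), below the trigger
print(clamp_stake(1.5), clamp_stake(0.1))  # 1.0 0.2
```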
In the section of the book I had you read, Chapters 8 through 11, Peta also discusses ways you can analyze the quality of your model by breaking down which bets you are winning and which bets you are losing. I will not reiterate that here; Peta does a fine and clear job of it, so I will leave that to him to describe to you in the book. However, I will say it is always good to analyze the results of your model to see what you are winning, what you are losing and how you are winning or losing. It often allows you to filter your model down to its sweet spot (i.e. is your model best at picking dogs, favorites, home teams, etc.?).
Concluding the topic of bankroll/risk management, I cannot say it enough, but I will say it one more time… managing your bankroll is more important than the bets you make! It is the difference between a winning gambler and everyone else!
Part VII of this article will be posted on Tuesday, April 14th. I would advise you to read Chapters 12, 13, 14, 15 and 16 for Tuesday. I know, it sounds like a lot, but Peta’s chapters are not that long. Plus, the heavy lifting on the complex concepts is now over, so it should not be too painful. For those to whom it applies, I hope you have a very Happy Easter or a Happy Passover!
Part VII – Tweaking the Model
At this point, we have created a model and have a bankroll management plan to attack the start of the season. The next step is tweaking that model to optimize its predictive ability so that it best correlates to each team’s actual performance. How do we tweak a model? Well, there are thousands of things you can do to tweak your model for optimum performance. I will give you a few examples, but again, this is one of those phases of model building that will vary by creator. Everyone must do their own tweaking. If I tell you what to do, then 500 models will be created with the same formula and that will negate any edge. When it comes to making adjustments to his model, all Peta says in the book is that he “made what he thought were appropriate adjustments.” Even Peta, for the purposes of his book, does not expand on what those adjustments were because again… that’s what makes his model special. Tweaks typically involve adjusting your player or team variables and the weighting of those variables. You might find that your individual team predictions are 10% lower across the board for every team in the league (compared to each team’s actual performance). Well, you could boost all your predictions with a 10% additional weighting. Sometimes the universe doesn’t make sense as to why something might be the case. Don’t fight it, just work with it! Think of this phase of model building like cracking a code or combination. Adjusting your model can be a lot of work, but if you get that combination right, you have unlocked the vault of sports betting profit! Trust me, the juice is worth the squeeze!
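The across-the-board adjustment can be computed instead of eyeballed. One small detail: if predictions run 10% low (predicted = 0.9 × actual), the factor that restores them is 1 / 0.9 ≈ 1.11 rather than exactly 1.10, and a ratio of totals handles that automatically. The numbers and function names below are hypothetical.

```python
def calibration_factor(predicted: list[float], actual: list[float]) -> float:
    """Single multiplier that scales total predictions to match total actual results."""
    return sum(actual) / sum(predicted)


def recalibrate(predictions: list[float], factor: float) -> list[float]:
    """Apply the same multiplier to every prediction."""
    return [p * factor for p in predictions]


# Made-up runs-per-game predictions that run 10% low across the board
predicted = [3.6, 4.5, 4.05]
actual = [4.0, 5.0, 4.5]
factor = calibration_factor(predicted, actual)
print(round(factor, 3))                                        # 1.111
print([round(x, 2) for x in recalibrate(predicted, factor)])   # [4.0, 5.0, 4.5]
```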
With regard to our MLB model, the primary method of making adjustments is editing the team roster sheets you created (discussed previously in this article). The roster sheets are the individual team pages containing each team’s roster with your predicted stats for each player. Because you took the time to create those sheets before the season, you can now easily make adjustments or even expand into other statistics. I can’t stress enough that all this takes time. Twenty years ago, few bettors were doing this, so it was easy to find ways to gain edges. Now, thanks to technology, bettors have sharpened their games. However, so have the books. It’s a constant cat and mouse game. Your first season will require the most work to get your model going. Once you have a profitable model, the workload for future seasons is typically 10-15% of what you spent in the creation/season 1 phase. So, don’t worry, it does get better!
A side note for this section: if you are looking for the exponents to use for other sports (e.g., the way we use 1.83 for the Bill James Pythagorean Theorem in MLB), you can visit https://en.wikipedia.org/wiki/Pythagorean_expectation. I will discuss other sports later in this article.
I would also like to add another side note. On the last page of Chapter 12, Peta talks about one of the most important things in the whole book. Peta discusses Warren Buffett’s saying that “they don’t ring a bell at the top”. Some of you reading this are young guys in your 20s. While I hardly think I am old, I am definitely older than you. Regardless of your age, make a point to cherish every moment. It’s not easy, it takes effort, but when you look back you want to minimize the regrets you have. We will all have regrets, but this section of the book was especially poignant for me. I am trying not to write a book report on Peta’s book, but rather to use it like a textbook for this article. However, I felt this was one of the important lessons in the book, and it goes well beyond sports. I talk about my aunt all the time. She was a major presence in my life. She passed away a few years ago. As you know, my wife and I do a scholarship in her memory. Coincidentally, right now we are actually going through this year’s applications. Anyway, my aunt texted me one day asking if I could drive her to a routine medical appointment and just generally checking in, as she always did. I told her I would take her to the appointment and we chatted briefly about the news of the day. It was the last text message I ever got from her. There was no “ringing of a bell” to let me know this was the last time. She was in good health, so there was no inkling I would not have the opportunity to text with her again. It is just how life works… you never know what the future holds. I wish there were a bell. You never know, and it isn’t easy, but always try to end conversations or enjoy those special moments as though you may never get to experience them again, or never again with those same people. Be it a family trip to a favorite place, a conversation with a loved one, whatever. Don’t take for granted that you will do it right “next time” because you may not get that opportunity.
Do it right the first time and work to do it right every time!
Life lesson concluded, now back to model design. Peta explains that once the MLB season is a quarter of the way through, he removes his pre-season projections and replaces them with the actual stats for the current season. What a deal! Now you don’t have to use your predictions; you just use the actual stats. The reasoning is that there is a strong enough correlation between performance over 40 games and performance over the full season. So, now you can go into your roster sheets and make some adjustments. First, save your predicted stats so that at the end of the season you can compare your projections to the real stats and see how well you did. Doing so will help you adjust next year’s predictions. Second, make a new roster sheet for each team using the players’ current stats. You will now use the new rosters for your win percentage and other calculations. Why not just wait until the season is a quarter of the way through and start there with the actual stats? Typically, any good model will be VERY STRONG in the first part of the season. As the season goes along, the oddsmakers make adjustments. So, you want a model ready to go for Game 1 of those first 40 games. If your model is strong, the first part of the season will provide the best returns of the season. Just look at Peta’s model: the beginning of the season was winning, but then the model began to struggle. Those of you who followed me last season remember my MLB totals algorithms won 51 units in the first half of the season… then lost 10 units in the second half. It was a great +41 unit season despite the second half, BUT had I not bet the early part of the season, I would have missed out on huge profits. My models clearly took advantage of the bookmakers’ inefficiencies. You want to be able to do the same with your model.
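Two pieces of that process are easy to automate: deciding when to switch from projections to actual stats, and scoring how close your saved projections were. This sketch assumes a 40 game cutoff (roughly a quarter of 162 games), uses mean absolute error as the accuracy score (my choice, not Peta’s), and uses made-up players and batting averages.

```python
QUARTER_SEASON_GAMES = 40  # roughly a quarter of a 162-game schedule


def stat_source(games_played: int) -> str:
    """Use projections early in the season, actual stats once about 40 games are in."""
    return "actual" if games_played >= QUARTER_SEASON_GAMES else "projected"


def mean_absolute_error(projected: dict[str, float], actual: dict[str, float]) -> float:
    """Average miss between saved projections and real stats (shared players only)."""
    players = projected.keys() & actual.keys()
    if not players:
        raise ValueError("no players in common")
    return sum(abs(projected[p] - actual[p]) for p in players) / len(players)


print(stat_source(25), stat_source(41))  # projected actual

projected_avg = {"Player A": 0.280, "Player B": 0.250}
actual_avg = {"Player A": 0.300, "Player B": 0.240}
print(round(mean_absolute_error(projected_avg, actual_avg), 3))  # 0.015
```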
Side note: Interleague play can mess with models because of the rule differences between the leagues (such as whether the designated hitter is used). Peta discusses some techniques for addressing these issues, and I think those adjustments are spot on. I will refer you to the book and his discussion of how he adjusted his roster analysis for Interleague play.
In the chapters we read for this part of the article, Peta brings up a discussion he had with a friend. I am paraphrasing, but basically, Peta’s friend asked… “Why do we always bet on crap teams?” I laughed out loud at this passage. It is a question I receive CONSTANTLY for any content I release, paid or free. It almost never comes before the games. It always comes after the games, when people can tell me why we should not have bet on this or that team. I also get the “why do we always bet UNDERs… UNDERs never win!”. What those people suffer from is confirmation bias. Confirmation bias is the practice of using information that supports your belief while ignoring information that counters it. If I released that the KC Chiefs UNDER was the sharp side and the game went OVER, I would undoubtedly receive the genius message that you never bet a KC game UNDER. Yet the Chiefs’ over/under record this past season, including playoffs, was 10 OVERs and 9 UNDERs. The OVERs are hardly blowing the UNDERs away! People don’t look at how Premium Plays are 42-19 while we were regularly betting underdogs and “crappier” teams. Instead, people only look at how today we bet another crappy team for a Premium Play, but this time it lost. They don’t remember that the week before we were 2-0 on “crappy” teams in Premium Plays. How this relates to model building is twofold. First, you need to have an open mind and let the model do its thing. So many people cannot bet “crappy” teams. These folks would much rather bet the Yankees every game, the Lakers, the Chiefs, Alabama, Duke, etc. Sure, there are times to bet those teams, but most days everyone else is betting those teams too. If everyone else is betting a team, it is theoretically impossible for that team to have value. If everyone is betting Duke, the book doesn’t say, let’s take care of these wonderful bettors and give them a sweet line today.
As betting comes in, the line moves in favor of Duke’s opponent (it was probably already shaded toward the opponent to begin with), which by nature adds value to the opponent. At a certain point, no matter how good Duke may be, the line will be way out of touch with reality. That is when it is time to bet the other team, since it is a big value or “overlay”. If I had Duke going up against a high school team, of course we would all bet Duke, but what if the high school team is +1250 points? Of course the high school team is a value. There is only so much time in a game, and Duke would never run it up that outrageously. Not to mention, it’s not possible to score 1251 points in 40 minutes to cover (that’s roughly 1 point every 2 seconds). My point in this dramatic example is that no matter how unequal the teams may be, there is a point where the point spread or the moneyline becomes a value for one of the teams. Your goal is to find that point with your model. Will you bet on crappy teams? Yes, you will bet on crappy teams, and you will bet UNDERs in games involving high scoring teams more often than not, because the public kills any value on the popular side. If you can’t handle that, you shouldn’t be betting, or at least not in any serious way.
Another good lesson is in Chapter 16, where Peta asks why books don’t publish more information on the action like the financial markets do. Peta feels that books should publish how many tickets or how much money is being bet on games. Peta wrote his book in 2010, but since then this information has popped up all over the place. Today, there are popular apps and websites that provide this type of betting information, both free and for a price. Many bettors rely heavily on this information and base their wagers on their reading of the data. In Chapter 16, notice that Peta doesn’t say that knowing this information would be helpful to the bettor. He says it would be helpful to THE BOOK!!! Peta is spot on! The information helps generate action, and that’s all books care about… the more action, the more their edge comes into play and the more money they make. It’s simple math for them, but so many bettors walk into the wood chipper every day using ticket and money percentages to decide what they are betting. You are a sportsbook’s dream! Books put this information out to increase betting, just like Peta says. You see tickets on one side and money on the other, and you assume the money side is the sharp side. You see tickets and money heavy on one side, and you assume that’s the public side. You see money and tickets on one side, but the line moves the other way, and you assume the line move reveals the sharp side. Sure, sometimes you will be right and sometimes you will be wrong… about as often as if you flipped a coin to make your bet. Books know this information has ZERO value, otherwise they would not put it out. Trust me, I work with books, including books that put out this information, and we have discussed the reality of publishing it!!! The information has value to the books for increasing volume, not to the bettor!!!
I discuss this in my article “Why Ticket/Money Percentages Don’t Tell The Whole Story”. Bottom line, unless you can see who is making the actual wagers at a sportsbook and each bettor’s past performance on those types of wagers, at best you are guessing when you use ticket & money percentages. Unless you know why the book is really moving the line, at best you are guessing, and again, it’s no better than a coin toss. This means that when your read of the ticket and money percentages or line moves is correct, confirmation bias will kick in and you will say to yourself… yep, I figured out the sharp side perfectly. I am a GENIUS! These ticket & money percentages, reverse line moves, etc. are invaluable!! Then the next time you do the same analysis and lose, that losing memory gets pushed aside. Confirmation bias will allow you to keep seeing reason in following (and paying for) ticket and money percentages. Still don’t believe me? Track your betting performance using ticket and money percentages. For those same games, flip a coin to pick a side and track that performance too. Then after about 100 picks in your spreadsheet, compare the coin toss record to your record from handicapping with ticket and money percentages. If your ticket and money percentages record is not at least 5% better than the coin toss win percentage, it might be time to re-think the $39, $59, $149, etc. you are paying for the data. Just keep a penny in your pocket instead.
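The coin-toss test can be run in a few lines. This sketch assumes each game is logged with the side that won and the side your ticket/money read picked, plus a simulated coin flip; names and sample data are hypothetical, and the 5% bar is read as 5 percentage points. One statistical caution: over 100 picks, a pure coin’s win rate has a standard deviation of about 5 percentage points (√(0.5 × 0.5 / 100) = 0.05), so a 5 point gap over 100 picks can still be luck, and a larger sample gives a clearer answer.

```python
import random
from typing import Optional


def win_rate(picks: list[str], winners: list[str]) -> float:
    """Share of games where the pick matched the winning side."""
    hits = sum(pick == winner for pick, winner in zip(picks, winners))
    return hits / len(winners)


def coin_picks(n_games: int, seed: Optional[int] = None) -> list[str]:
    """Flip a coin for each game to choose home or away."""
    rng = random.Random(seed)
    return [rng.choice(("home", "away")) for _ in range(n_games)]


# Hypothetical log: winners of each game and the side your percentages picked
winners = ["home", "away", "away", "home", "home", "away", "home", "away"]
method = ["home", "home", "away", "home", "away", "away", "away", "away"]
coin = coin_picks(len(winners), seed=7)

gap = win_rate(method, winners) - win_rate(coin, winners)
print(f"method {win_rate(method, winners):.0%}, coin {win_rate(coin, winners):.0%}, gap {gap:+.0%}")
if gap < 0.05:  # the article's 5% bar, read here as 5 percentage points
    print("Not clearly better than a coin toss")
```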
For Part VIII, which I will post on Friday (April 17), I would advise reading Chapters 17, 18, 19, 20 and 21. It’s five chapters, but most of them are brief, so it should not be too heavy a lift. Read two chapters a night and you will be good! We are slowly coming to the end of the book and therefore the article. I expect there to be about 10-12 parts to this article. Upcoming discussions include applying the model to other sports (including NFL power ratings), analyzing your model and a question and answer segment. Hope you have a good few days and see you back here on Friday!
Part VIII – Analyzing Your Model
So we have created our model, we have set up a bankroll management strategy, we have discussed tweaking the model along the way (which I will discuss further here too)… now what? Now it is time to analyze the performance of our algorithm. To me, this is one of the best parts of model design. You are taking the actual results and assessing where your model is good and where it is bad. Obviously, we want to maximize the good and minimize the bad, but how do we do it?
First, proper DETAILED record keeping is essential for dissecting all the minutiae of your model. I would suggest creating a new spreadsheet where you will log all the wagers your model suggests. I would suggest starting with the following columns for each bet:
2) Team You Bet
3) Opening Line & Closing Line
4) Your Unit Rating
5) Was Your Bet on the Home or Away Team
6) Is the Team American or National League (even break down by division too)
7) Starting Pitcher
8) Day or Night Game
9) Result of the Bet
10) Score of the Game and the Total (with the total you can see if there is any correlation to OVER/UNDER in your bets on certain teams or pitchers)
11) First Five Innings Result and Score (It will allow you to analyze the 1st 5 Innings results for sides and totals to see about correlation there. Sometimes a model may be much better for the 1st Half than the full game or vice versa. Knowing this could prove valuable)
12) Notes (anything worth noting on the outcome of the game)
Of course you can add any other columns you think would be good. You could have umpire tracking in your games, stats by pitcher in the game, any multitude of things, but I figure the above will start you off. Once you create the spreadsheet, the rest is easy. You just log your action and then you can run all kinds of sorting and analysis within the Excel spreadsheet. How you sort and what variables you look at is entirely up to you. If you are not familiar with all the capabilities of Excel, I would strongly suggest the “Excel for Dummies” book I mentioned earlier. There is also an “Excel Formulas and Functions for Dummies” book. Once you are adept at how to use Excel, now you can run the analysis on your results. Often the analysis will result in finding little gems to develop future betting angles. Perhaps one specific pitcher is performing for you. Maybe your model is great when it bets on home teams but not so good when it bets on visiting teams. Maybe your high rated plays are awful but the lower rated plays are on fire. All of these assessments can help you dig into your model and tweak it further. Even if your model has been great up to this point, taking the time to digest the analysis of your performance can be invaluable. It can help to make a great model even better! So, take the time to run the analysis and look for angles within your results. Peta’s analysis allowed him to find things such as the lack of correlation the model had with the Minnesota Twins results. It also showed him how the individual unit ratings were performing. Having a winning model is excellent but knowing why your model is performing and how is vital to the long term success and operation of your model.
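If you prefer scripting to Excel sorting, the same kind of breakdown can be sketched in Python. This minimal example logs only three of the suggested columns (team, home/away, and the result in units) and groups by home versus away; the class, field and function names are hypothetical, and you would extend it with the other columns.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Bet:
    team: str
    home: bool
    profit_units: float  # positive for a win, negative for a loss, 0 for a push


def summarize_by(bets, key):
    """Group bets by key(bet) and return {group: (wins, losses, net units)}."""
    totals = defaultdict(lambda: [0, 0, 0.0])
    for bet in bets:
        row = totals[key(bet)]
        if bet.profit_units > 0:
            row[0] += 1
        elif bet.profit_units < 0:
            row[1] += 1
        row[2] += bet.profit_units
    return {group: (wins, losses, round(units, 2))
            for group, (wins, losses, units) in totals.items()}


log = [
    Bet("Team A", True, 0.91),
    Bet("Team B", False, -1.0),
    Bet("Team C", True, -1.0),
    Bet("Team D", True, 1.5),
]
print(summarize_by(log, key=lambda b: "home" if b.home else "away"))
# {'home': (2, 1, 1.41), 'away': (0, 1, -1.0)}
```

Swapping the `key` function (by team, by starting pitcher, by unit rating) gives you the other sorts described above.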
A very important item to remember, and one which Peta discusses in these chapters, is that you WILL NOT win every week or every month. Every year your model should be a consistent winning performer. However, there will be periods within that year, even back to back months, where the model loses. It’s inevitable, so accept it and be prepared to deal with it. There is no model ever created (I am assuming nobody has a time machine yet) that can predict the future perfectly. Too many bettors feel a losing month is a failure and that the information, strategy or method must also be a failure. It is such a flawed way of thinking. If I posted plays from an algorithm and the algorithm opened with a 12-3 record but then lost 5 straight games, you know the level of outcry that would occur. People would be saying the model sucks, there is something wrong, it is trash, fade the algorithm, etc. People who say that would be considered morons! I don’t have to worry about offending them because they already stopped reading this article when work & effort became part of the process. My point, as was Peta’s point, is that even with 5 straight losses, the model is still hitting 60% (12-8)!!! So, unless I was laying -150 or more on every play (at -150, a 60% hit rate only breaks even), the model would be profitable despite the 0-5 run in this example. Any betting system, even the most successful, will have losing periods (usually even a severe losing period or two) over the course of the season/year. A professional knows it’s going to happen, has a bankroll management system to handle such fluctuations and has the fortitude to deal with losing. A simpleton cries and complains, wondering why the model isn’t hitting 70% at all times. Anyway, I digress. Point being, in any betting activity and regardless of how good you are, you are going to have losing streaks, bad ones too. Peta lost two straight months in the book, BUT he was still up money for the season.
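The 12-8 example is quick to check with a sketch like this. Function names are hypothetical, and flat 1 unit stakes are assumed.

```python
def record_profit(wins: int, losses: int, moneyline: int, stake: float = 1.0) -> float:
    """Net units for a record at a single American moneyline with flat stakes."""
    if moneyline < 0:
        won = wins * stake * 100 / -moneyline
    else:
        won = wins * stake * moneyline / 100
    return won - losses * stake


def break_even_rate(moneyline: int) -> float:
    """Win rate needed to break even at a given moneyline."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


print(round(record_profit(12, 8, -110), 2))  # 2.91
print(round(record_profit(12, 8, -150), 2))  # 0.0
print(break_even_rate(-150))                 # 0.6
```

At the usual -110 a 12-8 run is comfortably profitable; only at -150 and steeper does 60% stop paying.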
Don’t experience a losing month and automatically assume there is a flaw with the model. Even two losing months, like Peta experienced, did not mean he had a bad model. It is just how betting life works. No betting activity sees profits go up in a straight line. As Peta says, investing in the markets or in sports will see “lumpy” returns. Also, remember that just because your model rates a game highly does not mean the play is guaranteed. It just means it carries a higher long term win probability; short term, anything can happen. The key is sticking with it because, as we found out in the later chapters I had you read for Part VIII, eventually things regress to the mean. Eventually, the profits returned to Peta’s model quite strongly. Patience, discipline, risk management and accepting that the model would lose at times, even for extended periods, were vital to Peta’s success.
Side note to the above paragraph is a quote from Peta in the book, “Even when an investable edge exists, the profit stream is rarely smooth; in fact, it’s almost always lumpy. Proper risk management during the lean times allows for the harvesting of gains when they eventually emerge.” Love the quote because it is spot on for anything you do involving sharp or advantage betting!
Also, when Peta looked back and analyzed his model, he found different analytical categories that he felt would help optimize the data he was analyzing. For example, when it came to pitchers, Peta began to look at using xFIP and SIERA for his model. Your model will typically start by using standard analytics, but as you analyze things, you will notice new angles or weights to improve the model. For example, let’s say you make an NFL model. Your analysis shows your model is strong, and when you dive into what makes it strong, you notice that rushing stats have a strong correlation to the actual results. Perhaps you could look at advanced analytics connected to rushing yards and see if those would help improve your model. There is a lot of trial and error as well as investigation and research that goes in. Remember though, if you put in the time and crack the code, your model will reward you well! You will put in the data each day and basically have a computerized money machine. How cool is that?!?! LOL!
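One simple way to check whether a stat like rushing actually tracks results is a correlation coefficient. The sketch below computes a plain Pearson correlation in Python; the per-game rushing and point margins are made-up numbers, and a high correlation on a small sample is a lead to investigate, not proof that the stat drives the results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-game data: rushing-yard margin vs. final point margin.
rush_margin = [45, -20, 10, 80, -35, 5, 60, -50]
point_margin = [7, -3, 3, 14, -10, -1, 10, -7]
print(f"rushing vs. result correlation: {pearson(rush_margin, point_margin):.2f}")
```

Values near +1 or -1 suggest a strong linear relationship worth digging into; values near 0 suggest the stat adds little on its own.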
Once again, a model is work, but it can be fun work as you review the performance analysis and tweak the model’s analytics. Once you have gotten to this point in the model, you have put the heavy work behind you. The process of tweaking and analyzing is easy by comparison. It is something you will do every month and every year as your model rolls. Also, don’t forget to give your model a name. I just use version numbers. It allows me to track the adjustments along the way. So, when you create the model initially, make it version 1.0. When you tweak the model in minor ways, it might become version 1.1. Be sure to track those changes so you know what makes one model version different from another. If you make a major change to your model, then you move it to version 2.0, and so on. Keeping track of the changes will also allow you to roll things back to a previous version if you want to, or if the betting environment changes.
For Part IX of the article, I would advise you to read through the end of the book… Chapters 22, 23, 24, 25 and the Epilogue. For Part IX, I will wrap up the book’s coverage of model creation and tie up some final thoughts on the book. For Part X, I will discuss designing models for other sports. It’s a long topic, so of course I won’t be breaking it down in as much detail as I did for MLB. Instead, I will break down some of the major sports with things to think about for your model in each sport. My goal will be to give you the basics so you can begin to brainstorm your football, basketball, etc. model. Your homework at the conclusion of Part X will be to send in any questions. I will then use Part XI to cover those questions, and I expect to conclude the article fully at that time. We are almost there! Hope you have a great weekend!
Part IX – The Conclusion
I hope you enjoyed reading “Trading Bases”. It is a pretty easy read; it’s educational and quite fun. Anytime someone asks me about a book on model building, “Trading Bases” by Joe Peta is the one I always suggest. Hopefully, you now see why. While we have come to the conclusion of the book, the article will continue for another two parts after today. There is not much to cover in the final chapters for the purposes of model building, but there are some important betting lessons I will cover here in Part IX.
VINDICATION… the model had been fighting the Twins all season. When you have a model, there will always be one or two teams that defy rational thought. Despite their lack of statistical quality, these teams will somehow come out a lot better (or worse) than they should. Sometimes this can be due to an assessment error, but assuming your model is good, usually it’s old fashioned luck. Teams and players get lucky. There is no way to fully account for luck because, by definition, luck is unpredictable. However, luck always wears out eventually. Such a situation occurred for Joe Peta with the Minnesota Twins. The Twins killed the model’s profit up to August. Then, in August, the Twins’ collapse finally came… the regression to the mean. Over the next two months, through the end of the season, the Twins action actually ended up being profitable for the model! In other words, all that luck came crashing down in the final two months. The model was right all along, but luck kept the Twins propped up. In the end, the Twins paid the piper and the persistence of the model won the day.
When I release selections, paid or free, sometimes they follow or fade a certain team. People will ask “Why are sharps always betting Tampa Bay?” or “Why do we constantly fade Oakland?”. The reason, 98% of the time, is simply that the sharps clearly value Tampa at a higher rate than the market and consider Oakland overvalued. So, just like Peta with the Twins, the sharp money will stick to its models, even though it may be getting killed in the short term. The sharps trust their models, so they know that Oakland’s overvalued pricing will eventually show itself and they will end the season profitable betting against Oakland (provided that’s what their model says to do in the individual games). Same goes for Tampa. If the model says to bet Tampa Bay, even if Tampa is 0-5 ATS, they don’t handicap the model, they roll with it! The lesson in this paragraph is another tough one to teach people. Far too many bettors give up on profitable angles because, short term, those angles are not paying off. So, they jump off. However, what inevitably happens is Tampa goes 10-1 ATS to finish the season and they miss those profits. Trust your model!! Don’t shy away from bets just because the model is losing on a team. If your model is legit, the regression to the mean will eventually occur. Also, even if your model faded a team that had a winning season, let it go. You won’t have a model that is consistently perfect betting on every team. Don’t handicap the model! Let the model do the handicapping for you, win or lose! So, this coming season in any sport, if a team your model keeps backing is not performing, don’t cry. Keep betting the team if that’s what the model says. Again… don’t handicap the model, let the model do its thing! If you have a proper bankroll management plan, you should not be sweating the fact you are 0-5 on a team. It might be annoying, but it should not be a major source of frustration.
It’s all about the long term, and the purpose of the model is to take human thought out of individual betting positions & decisions. Don’t add that thought back in by handicapping what your model says to do. Sometimes you will be right with your handicapping (second guessing the model) and sometimes wrong. In the end, it all comes out in the wash. All that thought will have just been a waste of time. Trust the model!
I will compare this back to the hottest product of 2019, LOL, the Sharp Consensus. Holding a record of 42-19, the Sharp Consensus angle was an excellent performer! Do the math: 42 wins in 61 plays is a 68.9% winning angle. However, many gamblers tried to make it even better! If there was a Sharp Consensus play on a bad team, they would say, I am going to pass on this one. If the play lost, they now had confirmation bias (remember that from earlier?) that handicapping the Top Sharp Consensus plays was a good idea. If the next time they handicapped the Sharp Consensus angle they missed a winner, well, that time was forgotten. Bottom line, the Top Sharp Consensus angle was a 68.9% winner; it’s pretty good by itself. Just roll with the wins and the losses. Take the thinking out of gambling, especially when you have a winning angle! When you remove the thinking, you remove a lot of the emotion of betting, and that is CRUCIAL to long term success!!
The return on Peta’s model was 32.82 units, and since he used a 100 unit bankroll, this is a 32.82% ROI. If this was a hedge fund, he would be the talk of Wall Street for that year. Since it is sports betting, the average bettor will say, “only 32 units in a season!” while at the same time this same bettor has never had a winning season. The goal in sports betting for any season, based on a 100 unit bankroll, is usually an 8-40% ROI. If you are a +EV bettor, you expect at least an 8% ROI. If you have had a monster season, you could see around a 40% ROI. Anything above 40% is usually not consistently repeatable, and anything under 8% might have you checking whether it was just luck. The 8-40% range is usually the neighborhood that confirms you indeed have an edge on the odds. So, any bettor starting with 100 units and hoping to have 300 units by the end of the season is slightly (I am being kind) delusional. Is it possible? Sure, you could play some wild parlays, get hot and cash in big. Unless you are betting 10 and 20 unit plays, which is insane, it’s going to be tough to pull off that +200 unit return with 1-2 unit bets. Even if you magically did, such performance is not repeatable on a consistent/annual basis. So, instead of expecting to set records, grind out those profits. Turn sports betting into an income generator, not an income eraser!
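The return math above is simple enough to put in a couple of lines. This sketch expresses season profit as a percentage of a 100 unit starting bankroll, the way the text measures it, and labels it against the rough 8-40% range; the cutoffs are the rules of thumb described above, not hard statistical thresholds.

```python
def return_on_bankroll(net_units, bankroll_units=100):
    """Season profit as a percentage of the starting bankroll."""
    return 100 * net_units / bankroll_units


def classify(pct):
    # Rough rule-of-thumb bands from the text, not statistical thresholds.
    if pct < 8:
        return "below 8%: could still be luck"
    if pct <= 40:
        return "8-40%: consistent with a real edge"
    return "above 40%: unlikely to repeat consistently"


for units in (32.82, 5.0, 200.0):
    pct = return_on_bankroll(units)
    print(f"{units:+.2f} units -> {pct:.2f}% ({classify(pct)})")
```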
When it comes to grinding, in the final chapters of the book, Peta talks about how he came to bet the Detroit Tigers to beat the NY Yankees in the MLB Playoffs. The series price on the Tigers showed tremendous value. So, Peta went all-in on the bet (sarcasm)! Peta bet the Tigers to win the series against the Yankees for… 1 unit! I love this part because I hope it illustrates to the average bettor the importance of bankroll management. Yes, as I conclude the book and begin wrapping up this article, I have to touch on bankroll management again. You might have read his analysis of the Tigers at this point and thought… time to go in HARD on Detroit!!! You’d be wrong. Regardless of the edge, you always maintain discipline in your betting. In Peta’s case, the rule was to never bet more than 2 units on any bet. You see, when risking just 1 unit of a 100 unit bankroll, you have a chance to pick up roughly a 1% gain, which is, relatively speaking, a decent bump. People love to pick up a 1% profit in a stock trade; how fun to do it betting sports. However, when it comes to sports, there is this feeling that a bet would need to be 5% of your bankroll to be worth your time. It is just not true. Those little 1% victories add up. Also, those little 1% losses should not hurt… and that’s really the key. Once again, you need to remove emotion from your betting. It’s something few gamblers even come close to doing. I will again bring up an example using the Robin Hood Club. At one point, the Robin Hood Club was down 5 units. In the whole scheme of betting, being down 5 units is nothing. Yet the panicked messages I got, people beside themselves wondering how they would cover the loss, talking about losing money they could not afford, etc., left me shaking my head. Stop! If you are going to bet seriously and refuse to bet within your means, you are going to get blown up. It might be later rather than sooner, but blowing up your finances is INEVITABLE!
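Peta’s discipline boils down to a hard cap on bet size. A minimal sketch of that rule, assuming a 100 unit bankroll (so 1 unit is 1% of it) and the 2 unit maximum mentioned above; the dollar bankroll and the model’s suggested sizes are hypothetical.

```python
def stake_units(model_units, max_units=2.0):
    """Clamp whatever size the model suggests to a hard per-bet cap."""
    return max(0.0, min(model_units, max_units))


def units_to_dollars(units, bankroll, bankroll_units=100):
    """With a 100 unit bankroll, one unit is 1% of the bankroll."""
    return units * bankroll / bankroll_units


bankroll = 5_000  # hypothetical bankroll in dollars
for suggested in (1.0, 3.5, 10.0):
    units = stake_units(suggested)
    print(f"model suggests {suggested} u -> bet {units} u "
          f"(${units_to_dollars(units, bankroll):,.0f})")
print(f"a 5 unit drawdown is ${units_to_dollars(5, bankroll):,.0f}, "
      f"{5 / 100:.0%} of the bankroll")
```

Seen this way, the Robin Hood Club’s 5 unit slide is a 5% dip, not a crisis.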
It’s just the mathematics of betting. Long term, you are a winning proposition for the sportsbook! The book may pay you this month or this season, but that’s OK with the book. The book plays it smart. The book is in it for the long haul with you, and it knows those little short term losses to you are nothing compared to the financial devastation it will wreak on you long term. Some of you probably could have paid for all your kids to go to college given the money you have lost betting sports. I don’t mean to rub this in or be nasty. I am doing it to be blunt and direct! My hope is you will be able to step back, realize how bad a bettor you are when betting your way, and try something new. Even if you don’t take up building a model, at least refine your bankroll management strategy. Just that one fix, as I said in the earlier part of this article on bankroll management, will make a world of difference… even if you still can’t pick the winner in a one horse race!
Side note to all this coverage on profits, bankroll and betting: when it came to the playoffs, Peta used the full season stats for his model assessment. Peta also realized that the top three pitchers in the rotation would be seen more than the #4 and #5 pitchers. So, when he calculated his series wagers, he assumed runs allowed based on a team using only pitchers #1-#3. It’s a good way to look at it since they will get the bulk of the action. Therefore, weighting your algorithm to the performance of #1-#3 is ideal for optimizing the analysis. I will let Peta go into the specifics in his book, but I wanted to point this out for your reference.
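Here is a rough Python sketch of the playoff adjustment: compute runs allowed per nine innings from only the top three starters instead of the full rotation. The pitcher lines are invented, the list is assumed to be ordered best to worst, and bullpen innings are ignored, so treat this as an illustration of the weighting idea rather than Peta’s actual calculation.

```python
# Hypothetical regular-season lines for a five-man rotation, ordered best to worst:
# (pitcher, innings pitched, runs allowed). All numbers are made up.
rotation = [
    ("SP1", 210.0, 70),
    ("SP2", 195.0, 75),
    ("SP3", 180.0, 78),
    ("SP4", 160.0, 85),
    ("SP5", 120.0, 72),
]


def runs_allowed_per_9(starters):
    """Innings-weighted runs allowed per nine innings for a group of starters."""
    innings = sum(ip for _, ip, _ in starters)
    runs = sum(r for _, _, r in starters)
    return 9 * runs / innings


full = runs_allowed_per_9(rotation)
top3 = runs_allowed_per_9(rotation[:3])
print(f"full rotation: {full:.2f} RA/9, top three only: {top3:.2f} RA/9")
```

The lower top-three figure would then feed the runs-allowed side of your series calculation.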
I am going to post Part X of this article on Friday, April 24th. I will then post Part XI, what I expect to be the final part of the article, the following Friday on May 1st. Waiting a full week between Parts X and XI will allow people to catch up on their reading, complete all their homework and then send any questions. That leads me to the homework. In preparation for the conclusion of the article on May 1st, I would ask you to please begin to think of any questions about the article or model building. I will have you do the same thing following Part X, which will be posted Friday. I will then select various questions and post them here to further the discussion as Part XI. Usually when one person has a question, others have similar questions, and it will allow me to go through those items with all of you still reading this article. I believe Part XI will then be the conclusion of the article. I hope you have enjoyed the ride thus far! Start putting together those questions!
Part X – Model Development Beyond Baseball
I’ll start by tempering your expectations for this part of the article. To truly cover the topic of models for other sports would require another article, probably a book. So, my goal is to touch on how the model you created above could serve as a base for other sports. I will leave the heavier lifting to those of you looking to create the model. However, if you have come this far in the article, you have the work ethic and motivation to put the time in to crack the code. It should not be too hard to get the wheels moving and transition to basketball, football, hockey, etc. Also, don’t forget, “The Google” is an amazing resource. I get asked questions all the time. One of them was “What is the Bill James exponent for football?” Well, if you copy that exact sentence or even similar wording right into “The Google” and press enter, the first entry is Wikipedia, which explains that it is 2.37. I am always happy to answer questions and enjoy the discussions, don’t get me wrong! So, don’t hesitate to contact me. However, sometimes it wouldn’t hurt just to run your question in different ways through Google before checking in. There have been many times that people sent me a question, and I just Googled it and gave them the answer I found. LOL! Sometimes you can cut out the middle man and get a much faster response. Again, there is literally a world of information out there, and sports analytics has many very good resources.
When I began writing this article, the question I got the most was how to set up the model for point spread betting. I will touch on a few basics to hopefully set you on your way. First of all, the Bill James Pythagorean Theorem (Pythagorean expectation) can be used for all major sports. The only difference is the exponent. In baseball, we use an exponent of 1.83. Research has produced commonly cited exponents of 13.91 for the NBA, 2.37 for the NFL and 2.05 for hockey. So, we have the setup for the Bill James Pythagorean Theorem; now the question becomes, how do we predict the points/goals for and points/goals against so we can apply our model to these other sports?
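For reference, the Bill James formula is expected win % = PF^x / (PF^x + PA^x), where PF and PA are points (or runs/goals) for and against and x is the sport’s exponent. A minimal Python sketch using the exponents listed above; the sample scoring inputs are hypothetical.

```python
# Exponents as cited above; treat them as starting points, not settled values.
EXPONENTS = {"mlb": 1.83, "nba": 13.91, "nfl": 2.37, "nhl": 2.05}


def pythagorean_win_pct(points_for, points_against, sport):
    """Bill James-style expected win percentage for the given sport's exponent."""
    x = EXPONENTS[sport]
    return points_for ** x / (points_for ** x + points_against ** x)


print(f"MLB 800 RS / 700 RA: {pythagorean_win_pct(800, 700, 'mlb'):.3f}")
print(f"NBA 112 / 108 per game: {pythagorean_win_pct(112, 108, 'nba'):.3f}")
print(f"NFL 27 / 20 per game: {pythagorean_win_pct(27, 20, 'nfl'):.3f}")
print(f"NHL 3.2 / 2.8 per game: {pythagorean_win_pct(3.2, 2.8, 'nhl'):.3f}")
```

The large NBA exponent is why a small per-game scoring edge in basketball translates into a much bigger win percentage than the same relative edge in baseball.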
In baseball, we learned that for every two hits a team gets, it will typically score one run. Does a similar relationship exist in other sports? Why yes, it does! In hockey, about 9% of all shots result in goals (on average). So, we could expect a team with 30 shots to score 2.7 goals. Obviously you can’t score 0.7 of a goal, but you can use 2.7 in the Bill James formula to calculate expected win percentage. Therefore, one angle could be to use player statistics to calculate the expected shots per game for a team. You could then tweak the shots on the high or low end to see how it changes a team’s performance. By fine-tuning your shot projections and then extrapolating expected goals, you can limit the statistical variance in your calculations. You can even create one calculation to see the team’s low end of expected shots and another to assess the high end. You could then see what a team’s win probability would be if the team achieves the low end of shots and if it achieves the high end of shots. If both probabilities show value against the moneyline the book is hanging, you have a play! Obviously, we also have to work goalies into this equation, but my point is you can analyze hockey similarly to baseball… run it through the same probability and head to head formulas you use for baseball and get a moneyline calculation.
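The hockey approach above can be sketched in a few lines of Python. It assumes the 9% shooting rate and 2.05 exponent quoted in the text, uses hypothetical shot projections and a hypothetical +110 price, and, as noted above, ignores goaltending entirely. A play is flagged only if both the low-shot and high-shot scenarios beat the book’s implied probability.

```python
SHOOTING_PCT = 0.09  # share of shots that become goals, per the text
NHL_EXPONENT = 2.05  # hockey exponent cited above


def expected_goals(shots):
    return shots * SHOOTING_PCT


def win_prob(goals_for, goals_against, x=NHL_EXPONENT):
    """Bill James-style win probability from expected goals for and against."""
    return goals_for ** x / (goals_for ** x + goals_against ** x)


# Hypothetical projections: our team's shot range and the opponent's shots.
low_shots, high_shots, opp_shots = 30, 34, 28
# Book price on our team: +110 (hypothetical). Implied probability, vig included.
needed = 100 / (110 + 100)

probs = [win_prob(expected_goals(s), expected_goals(opp_shots))
         for s in (low_shots, high_shots)]
print(f"win prob {probs[0]:.3f} (low) to {probs[1]:.3f} (high); book implies {needed:.3f}")
print("play" if min(probs) > needed else "pass")
```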
In football, if you take the yardage a team achieves on offense and divide by 15, you can get an approximation of their points. Obviously, the same applies to how many yards the defense allows. So, if you can put together a calculation to predict yards allowed and yards gained for a team, you can extrapolate points, which means you can further extrapolate win percentage (thanks again Bill James!). Now you have all the numbers you need to use the same head to head and probability calculations to compute moneylines in football. Don’t forget also, the example in Part IV of this article where I assess how Tom Brady would change the Tampa Bay Bucs performance. It is just another means of doing an NFL analysis to get points scored and allowed.
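The football version swaps shots for yards. This sketch applies the divide-by-15 rule of thumb and the 2.37 exponent from the text to hypothetical yardage projections for a single game; projecting those yards well is where the real work lies.

```python
YARDS_PER_POINT = 15  # rule of thumb from the text
NFL_EXPONENT = 2.37


def yards_to_points(yards):
    return yards / YARDS_PER_POINT


def win_prob(points_for, points_against, x=NFL_EXPONENT):
    """Bill James-style win probability from projected points for and against."""
    return points_for ** x / (points_for ** x + points_against ** x)


# Hypothetical projections: yards each offense should gain against this opponent.
team_a_yards, team_b_yards = 380, 330
pa, pb = yards_to_points(team_a_yards), yards_to_points(team_b_yards)
print(f"projected score {pa:.1f}-{pb:.1f}; Team A win prob {win_prob(pa, pb):.3f}")
```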
For basketball, shots and field goal percentages are obviously the keys. It doesn’t break down much further. Basketball is easier because shot attempts and the resulting FG% are commonly part of the standard stats posted for teams. The catch with basketball is working out a calculation to account for the variance that can occur in a team’s FG%. Unlike yards or hits, FG% can vary wildly from game to game. What determines it? Do teams tend to regress to the mean after a game in which they shot lights out? I tend to find that knowing the team’s average FG% range is the key, and then you work from there. If a team is predicted to shoot on the low end of its average FG% range and is playing a team that is predicted to shoot on the high end of its range, BUT the low % team still shows value based on the odds, you have a bet! You have a bet because even in a worst case expected scenario, where your team underperforms and its opponent overperforms, your team still shows value against the current line. The model builder will want to pull together stats, play around and work to unlock that code. I cannot give you the recipe because then everyone would have Grandma’s secret chocolate chip cookies. You wouldn’t want the recipe either; it would no longer be a secret. Play around and create your own recipe that nobody else uses and which gives you the edge on the odds! It’s a fun journey and trust me, when you get to the end and realize what you have discovered, it is like drinking from the Holy Grail after a long journey!
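A crude Python sketch of the worst-case test described above. It assumes every made field goal is worth 2 points (ignoring threes and free throws), treats shot attempts as fixed, and uses the 13.91 exponent cited earlier; the FG% ranges and the +120 price are hypothetical. The bet qualifies only if Team A still clears the book’s implied probability when A shoots at the bottom of its range and B shoots at the top of its range.

```python
NBA_EXPONENT = 13.91


def points_from_shots(fga, fg_pct):
    # Crude assumption: every make is worth 2 points; threes and free throws ignored.
    return fga * fg_pct * 2


def win_prob(points_for, points_against, x=NBA_EXPONENT):
    return points_for ** x / (points_for ** x + points_against ** x)


# Hypothetical inputs: shot attempts and each team's usual FG% range (low, high).
a_fga, a_low, a_high = 90, 0.45, 0.50
b_fga, b_low, b_high = 85, 0.44, 0.48
needed = 100 / (120 + 100)  # book has Team A at +120 (hypothetical), vig included

# Worst case for Team A: A shoots at the bottom of its range, B at the top of its.
worst = win_prob(points_from_shots(a_fga, a_low), points_from_shots(b_fga, b_high))
best = win_prob(points_from_shots(a_fga, a_high), points_from_shots(b_fga, b_low))
print(f"Team A win prob: worst {worst:.3f}, best {best:.3f}; book implies {needed:.3f}")
print("bet Team A" if worst > needed else "no bet")
```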
There are statistical relationships in all sports that are similar to those in baseball. The key is taking the time to investigate them. When you have your model set up to predict points/goals/runs for and against in other sports, you just plug them into the Bill James formula and analyze the game like you would an individual baseball game. No difference. It will give you the probability of one team beating another, which will then give you the moneyline for that game. You then wager based on the value the sportsbook’s lines provide. So, this part of the article covers adapting moneyline analysis to other sports… what about the spreads and totals?
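Turning a win probability into a moneyline is a standard conversion. A minimal sketch that returns the fair (no-vig) American line for a probability; you would then compare it to the sportsbook’s number to judge value.

```python
def prob_to_american(p):
    """Fair (no-vig) American moneyline for win probability p, where 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if p >= 0.5:
        return -100 * p / (1 - p)
    return 100 * (1 - p) / p


for p in (0.60, 0.55, 0.40):
    print(f"win prob {p:.0%}: team {prob_to_american(p):+.0f}, "
          f"opponent {prob_to_american(1 - p):+.0f}")
```

If your model makes a team a 60% winner (fair line -150) and the book is hanging +105, the gap between those two numbers is the value you are looking for.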
The easiest way to handle a spread model is to create team power ratings. Yes, these are the Sagarin or KenPom type ratings everyone often uses. Most sportsbooks create their own power ratings to start a season and then tweak those ratings as the season goes along. How do you create your own version of power ratings? Again, I will do my best to consolidate a long topic into a short one. There is a shortcut you can use to get going in the right direction. I’ll use the NFL. Among the bookmaking powers that be, it is usually considered that the difference between the top team in the NFL and the bottom team is 20-21 points. Now that we know what the spread should be between top and bottom, we have to fill in the other 30 teams in between. We also have to figure out what single metric or combined calculation we will use to assess, and thereby rank, the quality of a team. In other words, you have to find a metric (or metrics) that you feel correlates with how well a team performs. Once you calculate that metric for each team, you rank those teams and then set the top team 20 points of separation away from your bottom team. Now you have ranked your teams. What does this look like in practice?
Let’s say you feel “Yards Per Point” is a good metric to decide how a team performs. As discussed above and which you can see in the table below, the average for the NFL is 15 yards per point. In reality, your calculation would use a number of different variables to add up to a single number which would then be used to rank each team. For the purposes here, I am just showing how you can use a combined or single metric, lay it out in a table, set the #1 team to a power rating of 1 and the worst team to a power rating of 21. You can really use any number scale so long as you then shake out your ratings in between with top to bottom separation being a 20-21 point spread. So, let’s just use 2019 YPP for lack of a more complicated alternative. Hopefully, the below example will illustrate what I am describing in creating power ratings. So, if you did this assessment using Yards Per Play as your metric, your power ratings might look like this (using TeamRankings.com YPP for 2019)…
You can now use these power ratings to calculate your spreads each week. As the season goes along, you would tweak these ratings based either on your calculated predictions or actual statistics. So, in this example, if San Fran and Oakland played on a neutral field, the spread should be San Fran -20. If San Fran was the home team, we would add 2-4 points depending on how we assessed their home field advantage. How did I come up with the separation between power ratings? I started with 0.6 points of separation between each step, with teams that had the same YPP getting the same rating. When I did, I did not quite get to 21 by the end of my rankings. So, I increased it to 0.8 points of separation and that worked like a charm. In other words, my fancy technique was just trial and error for this analysis. Again, don’t base your power ratings solely on Yards Per Play. Also, the separation between power ratings should be a little more thought out than my 0.8 trial and error, which I used just so things fit nicely for this illustration. For your metric, you want to use a combination of factors that together create a single variable for each team; rank the teams by that combined variable and then plug them into a table with the top team at 1 and the bottom team at 20 or 21, depending on what you think works best for you. Then work on filling in the ratings in between. You will now have created a crude power ratings setup. Through weekly analysis, you will then refine your power ratings and your power rating model. It might take a season to dial in how best to predict the metrics you want to use, and thereby your power ratings, but it can be done.
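The power-rating procedure can be sketched in Python. Instead of trial and error, this version computes the step between distinct metric values so the top team lands at 1 and the bottom team at 21, then derives a home spread from the rating difference. The five teams and their yards-per-play values are hypothetical, and the 3 point home field default is just the midpoint of the 2-4 range mentioned above.

```python
def power_ratings(metric_by_team, top=1.0, bottom=21.0):
    """Spread a 'higher is better' metric across a top-to-bottom rating scale.

    Lower rating = better team. Teams with identical metric values share a
    rating. The step between distinct values is computed, not found by trial and error.
    """
    distinct = sorted(set(metric_by_team.values()), reverse=True)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct metric values")
    step = (bottom - top) / (len(distinct) - 1)
    rating_for_value = {value: top + i * step for i, value in enumerate(distinct)}
    return {team: rating_for_value[value] for team, value in metric_by_team.items()}


def home_spread(home, away, ratings, home_field=3.0):
    """Points the home team should be favored by (negative means home underdog)."""
    return ratings[away] - ratings[home] + home_field


# Hypothetical yards-per-play values for five made-up teams.
ypp = {"Team A": 6.6, "Team B": 6.1, "Team C": 6.1, "Team D": 5.5, "Team E": 4.9}
ratings = power_ratings(ypp)
for team, rating in sorted(ratings.items(), key=lambda kv: kv[1]):
    print(f"{team}: {rating:.2f}")
neutral = home_spread("Team A", "Team E", ratings, home_field=0)
print(f"Team A vs Team E, neutral field: Team A -{neutral:.1f}")
```

Once the ratings exist, each week’s spreads really are just subtraction plus a home field adjustment.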
Obviously the power ratings example above can apply to NCAA football, basketball and the NBA. You just need to figure out how you will rank the teams and what metric to use. Then it is a matter of figuring the proper top to bottom spread and filling in the gaps in between. Yes, like any of this, it is a process, takes time, will have you experience multiple failures along the way and be a lot of work. However, imagine carrying a list of numbers on your phone (in the case of NFL Power ratings) whereby you made your weekly adjustments and now you can just do some basic subtraction, create your point spreads and walk into the book to place your bets. It’s pretty cool. You have a secret weapon in your pocket and the book will never know what hit them!
So, that was a crash course on other sports modeling to create moneylines and to create power ratings for spread analysis. The question now becomes, how do you assess totals? Well, let’s use baseball for starters. Your baseball model has an individual game roster where you calculate expected runs scored and runs allowed based on a specific lineup. Yeah, you see where I am going and yes, it can be that easy. Let’s say you calculate that the Twins will score 5 runs and allow 5 runs against the Tigers who will score 3 runs and allow 4 runs. Well, that means on average the Twins will score 5 runs but Detroit is expected to only allow 4 runs. OK, the average between those two is 4.5. We then see the Twins are expected to allow 5 runs but Detroit will only score 3 runs. It gives us an average of 4 runs. Well, we now have an expectation for Detroit scoring 4 runs and Minnesota scoring 4.5 runs. We could reasonably extrapolate that we expect this total to be 8.5 using our metrics. If the total is 9, you have a value to the UNDER. Even just a half run in baseball totals, due to their size, is a good advantage. So, I have done nothing new to the MLB model we created earlier. I just used the same data put out by the individual game analysis used to calculate moneylines and calculated a total at the same time. Just remember that totals are affected also by weather/wind and umpires, so those variables need to make their way into your calculation. However, for analyzing totals in MLB, the bulk of the work is already done by your model.
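The totals calculation above is just two averages and a sum. A minimal sketch using the Twins/Tigers numbers from the example; it leaves out weather, wind and umpires, which you would still need to fold in.

```python
def project_runs(a_scores, a_allows, b_scores, b_allows):
    """Average each offense's projected runs with the opposing defense's runs allowed."""
    a_runs = (a_scores + b_allows) / 2
    b_runs = (b_scores + a_allows) / 2
    return a_runs, b_runs


def total_lean(projected_total, book_total):
    if projected_total < book_total:
        return "UNDER"
    if projected_total > book_total:
        return "OVER"
    return "no edge"


twins, tigers = project_runs(a_scores=5, a_allows=5, b_scores=3, b_allows=4)
total = twins + tigers
print(f"Twins {twins}, Tigers {tigers}, projected total {total}")  # Twins 4.5, Tigers 4.0, projected total 8.5
print(f"Book total 9: lean {total_lean(total, 9)} by {9 - total} runs")
```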
What about totals in other sports? Again, the models you created to calculate points/goals scored & allowed for the other sports, so you could calculate individual game moneylines, are all you need for a start. You then need to assess the other factors (if any) that can affect totals beyond just players (like officials and weather). However, most of your work for totals is done because the basis for your model uses a calculation whose goal is to figure out win probability based on points scored and allowed! The transition to totals calculation is therefore quite easy!
I was asked about other books like “Trading Bases”. Many people have spoken highly of “Statistical Sports Models in Excel” and “Statistical Sports Models in Excel Volume 2” by Andrew Mack. I purchased and skimmed these books and they are quite cool. The books provide illustrations, formulas and analytical techniques for creating sports models. I plan to dive deeply into these books when I finish this article. The nice part of both of these books is that, as the title says, they are done entirely in Excel. You don’t need any programming knowledge, and Andrew Mack (the author) walks you through the process. Remember, no matter how good the models may be, any good model maker will not reveal all their secrets. We saw this with Peta and his “adjustments” in the book. So, like anything, the Excel books by Mack could provide amazing insight and tools to set you on the way. Ultimately, however, unearthing the next Holy Grail of betting will rely solely on your ability to advance the thinking and assessment further than what is provided in any book. The books appear to be a great course on formulas to use in Excel, the format for setting up your spreadsheet, and how and why you would use specific analytical techniques. I am looking forward to really diving into both books in the weeks ahead. All set up and ready to go in my Kindle app!
The final part of this article, Part XI, will be posted next Friday, May 1st, instead of on Tuesday as has been the usual schedule. I want to allow a decent amount of time for people to catch up on their reading, spend time creating their model and work on everything in between. Then, by Tuesday or Wednesday, you can submit any questions. So, your homework for Part XI is to think of any questions you have from this article and email them to me (you can also use the form on the Contact page here on the website). I will use Part XI to cover some common questions and to wrap up everything within this article. Thank you for sticking with it for so long, and I hope the information will be useful to you! Have an excellent weekend and I will see you back here next Friday night!
Part XI – Wrapping It All Up
Well, we have come to the end of our journey! I hope you have enjoyed the article. It has definitely helped pass some quarantine time for me and I hope the same is true for you. The topic of building an algorithm is something I have been asked to cover for over a year now. Usually, there was so much going on that finding the time just didn’t work out. Well, thanks to quarantine, I have had plenty of time the past month. I hope this article will help you in your algorithm adventures moving forward! Yes, I will be keeping the article permanently on the Articles page here at the website, so you can refer to it anytime.
In Part X, I asked for any questions you might have had on the process or the topic of algorithms. I received many questions along the way and have answered a lot of them in different parts of the article as it developed. So, I wasn’t sure if I would get any. However, I was pleasantly surprised that there were a few, so let me jump into those…
Do we need to use power ratings or can we just use a moneyline conversion when dealing with point spreads?
While there are of course benefits to using power ratings (at a minimum, they provide a different technique and view on the event), moneyline to spread conversion works just fine. You can simply look at the moneyline the book is hanging and compare it to the calculated moneyline from your model. If the moneyline the book is hanging is way off from your model’s calculated moneyline, in theory, so is the spread. At that point, you can decide whether taking or laying the points is better than betting the game on the moneyline. How can you tell which is better? Unfortunately, there’s no shortcut or substitute for live assessment. Until you see how your moneyline model performs and track how those same teams fared ATS, you will only be speculating. So, the short answer to this question is… you do not need to create another model to analyze point spreads. Simply compare your moneyline to the book’s moneyline, and if there is a large difference, chances are the spread is also a value for the team your model shows as having value on the moneyline. Then you need to assess the live performance of the ATS wagers your model identified through the moneyline to moneyline comparison. Keeping detailed records is vital to proper model analysis down the line.
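A sketch of the moneyline to moneyline comparison: convert both the model’s line and the book’s line to implied probabilities and flag games where the gap is large. The slate and the 5 percentage point threshold are hypothetical choices, and the book’s implied probability still includes the vig. The flagged games are the ones whose ATS results you would then track.

```python
def implied_prob(american_odds):
    """Implied win probability from American odds (vig not removed)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def flag_value(games, threshold=0.05):
    """List (team, gap) where the model's probability beats the book's by >= threshold."""
    flagged = []
    for game in games:
        gap = implied_prob(game["model_ml"]) - implied_prob(game["book_ml"])
        if gap >= threshold:
            flagged.append((game["team"], round(gap, 3)))
    return flagged


# Hypothetical slate: the model's calculated moneyline vs. the book's.
games = [
    {"team": "Team A", "model_ml": -150, "book_ml": +105},
    {"team": "Team B", "model_ml": -120, "book_ml": -125},
    {"team": "Team C", "model_ml": +110, "book_ml": +150},
]
for team, gap in flag_value(games):
    # Log these plays with both moneyline and ATS results to compare later.
    print(f"{team}: model is {gap:.1%} above the book")
```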
Should we be concerned about sports with a smaller sample size? Is it harder to assess football where teams only have 16 games in a season versus baseball which has 162 games?
Obviously, in any statistical assessment, the more data you have, the better you’re able to assess your model. However, I would not let the smaller sample size dissuade you from creating an NFL model, BUT I would be sure to temper my expectations. In baseball, when you have a model that might have had 300-500+ plays over the course of a season, you have a nice sample size to say whether your model is good or not. In football, with maybe 50 plays, you have to be prepared for the possibility that your results were just luck (good or bad). So, don’t give up after ten NFL plays that go 0-10. Look at my 1st Quarter algorithm this past year in the NFL. The algorithm sucked at the beginning of the season, but by the time the season was done, everyone was asking for the 1st Quarter algorithm each week. It became automatic! So, don’t give up, but also don’t celebrate based on the results of a small sample size. Again, creating a model takes patience, time and effort… three things many gamblers do not want to put into their gambling (or their lives). Most bettors just want to quickly come up with a play, bet money, win money and live the leisurely life. Beyond short-term luck, I have yet to find a profitable gambler who did not put in a ton of time and pay their dues along the way. Trust me though, if you work hard at this, you will be rewarded with the coolest toy ever! You will have a model to predict the events you bet on. Doesn’t get any better!
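One way to see why 50 plays says so much less than 500 is a simple binomial calculation. The sketch below assumes each play is an independent coin flip for a bettor with no edge (p = 0.5, ignoring the vig) and asks how often pure luck produces a 60% season. The last line asks how often a bettor with a genuine 55% edge starts 3-7 or worse. The function name is hypothetical.

```python
from math import comb


def prob_at_least(wins: int, plays: int, p: float = 0.5) -> float:
    """P(X >= wins) for X ~ Binomial(plays, p): chance of at least `wins` wins."""
    return sum(
        comb(plays, k) * p**k * (1 - p) ** (plays - k)
        for k in range(wins, plays + 1)
    )


# A bettor with NO edge hitting 60% or better by luck alone:
small = prob_at_least(30, 50)    # an NFL-sized season of plays
large = prob_at_least(300, 500)  # a baseball-sized season of plays
print(f"50 plays:  {small:.2%}")
print(f"500 plays: {large:.6%}")

# A genuine 55% bettor winning 3 or fewer of their first 10 plays:
bad_start = 1 - prob_at_least(4, 10, 0.55)
print(f"55% bettor starting 3-7 or worse: {bad_start:.1%}")
```

Roughly one no-edge bettor in ten looks like a 60% winner over 50 plays, while over 500 plays that almost never happens; and a real 55% bettor opens 3-7 or worse about as often.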
How do we create a model for live or in-play betting? Should we allow our pre-game model assessment to play a part in how we bet in-play?
We have a very interesting topic here, and hardly one I can answer in just a few sentences, but I’ll try! The key to any in-play betting is to assess where the teams stand at that specific point in the game and whether the performance up to that point is attributable more to luck or to skill. Whether you have a model you are basing your assessment on or you are just making a fun in-play bet, you need to know why one team is beating the other: skill, luck, injuries, a bad call, turnovers, etc. Therefore, you want your in-play model to look at the pregame stats, to provide an initial expectation, and you also want it to have a means to analyze the actual stats up to that point in the game. You then want your model to assess whether the stats up to that point support the current result. If they do not, the losing team is probably a very big value. The winning team is rarely a value, because the public typically piles on whichever team is winning at the time. I am not saying there is never value on a winning team. It’s just that winning-team value is very limited and rare compared to losing-team value. What does an in-play assessment look like? Let’s say Team A is winning a football game 14-0 five minutes into the first quarter, while Team B is beating them heavily in yardage. You could assume Team A is winning because of a disproportionate amount of luck. Obviously, there is more involved in making such an assessment; this is just a simple illustration. Let’s also say your model showed Team B as a value going into this game. Because of the current score, the price on the losing team (Team B) will not reflect their actual performance. Instead, the price is based solely on the score. Often the score IS NOT reflective of true performance in a game. Score is only one quantification of performance.
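The 14-0 example can be sketched as a simple check: is the team trailing on the scoreboard actually winning the yardage battle, and did the pregame model already like that team? This is a deliberately minimal illustration, not a real in-play model. The `InGameStats` fields, the function name and the 75-yard `YARDS_MISMATCH` cutoff are all hypothetical, and a real assessment would weigh many more stats than yardage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InGameStats:
    # Hypothetical snapshot of an in-progress football game.
    points_a: int
    points_b: int
    yards_a: int
    yards_b: int


# Hypothetical cutoff: how many more yards the trailing team must have gained
# before we suspect the scoreboard reflects luck more than play.
YARDS_MISMATCH = 75


def trailing_team_if_score_misleading(g: InGameStats) -> Optional[str]:
    """Return the trailing team ("A" or "B") when it out-gains the leader
    by at least YARDS_MISMATCH yards; otherwise None."""
    if g.points_a > g.points_b and g.yards_b - g.yards_a >= YARDS_MISMATCH:
        return "B"
    if g.points_b > g.points_a and g.yards_a - g.yards_b >= YARDS_MISMATCH:
        return "A"
    return None


# Team A leads 14-0 early while Team B has dominated the yardage.
snapshot = InGameStats(points_a=14, points_b=0, yards_a=40, yards_b=150)
pregame_value = "B"  # the side the pregame model liked
flag = trailing_team_if_score_misleading(snapshot)
if flag is not None and flag == pregame_value:
    print(f"Team {flag}: pregame value plus in-play value")
```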
Granted, the score is what matters to the public bettor, but to the professional bettor, how the teams achieved that score is much more important. If a team won every category but the score, yes, they lost that day, but their actual team quality is clearly higher than the score indicates. That means the losing team may be vastly undervalued in their next game (or, in the case of in-play, for the rest of the game) and provide a great betting opportunity. Back to the example: your model had Team B as a value coming into the game. During the game, Team B went down 14-0, but it appears to be due to bad luck. So, now you almost get a double shot of value on Team B in-play: first, they were a pregame value, and now they are an in-play value due to bad luck. Now the question becomes how do you bet it? It depends on your bankroll and pregame betting. However, allowing yourself to add another half unit or full unit in-play is not a bad idea. At most, you are putting 2 units on a single bet, and that is hardly reckless, especially if you have a winning, performing model. In that case, your 2% wager is a smart investment because it rests on a calculation that is +EV long-term. So, the above is the theory behind proper in-play betting. Now you want to figure out the variables that would help you determine whether the team winning at the time is lucky or good. Luck runs out (if not in that game, then in subsequent games); cream always rises to the top.
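The “+EV” and “2 units” reasoning above can be written out as arithmetic. The sketch below computes expected profit per unit staked from a model win probability and an American price, and caps combined pregame plus in-play exposure at 2 units. It assumes, as the 2% figure implies, that 1 unit equals 1% of bankroll. The function names, the $5,000 bankroll, the 45% probability and the +250 price are hypothetical.

```python
def profit_per_unit(odds: int) -> float:
    """Profit on a 1-unit winning bet at American odds."""
    return odds / 100 if odds > 0 else 100 / -odds


def expected_value(model_prob: float, odds: int) -> float:
    """Expected profit per unit staked, using the model's win probability."""
    return model_prob * profit_per_unit(odds) - (1 - model_prob)


def total_stake(bankroll: float, pregame_units: float, inplay_units: float,
                max_units: float = 2.0) -> float:
    """Dollar exposure, treating 1 unit as 1% of bankroll and capping total units."""
    units = min(pregame_units + inplay_units, max_units)
    return bankroll * units / 100


# Hypothetical: model gives trailing Team B a 45% chance at an in-play price of +250.
ev = expected_value(0.45, 250)
stake = total_stake(bankroll=5000, pregame_units=1, inplay_units=1)
print(f"EV per unit: {ev:+.3f}, total exposure: ${stake:,.2f}")
```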
How should we split our bankroll between futures and daily betting?
I would usually say you’d want to have, at most, 10-15% of your bankroll dedicated to futures bets. Since futures bets are based 100% on predicted data, leaving you unable to use in-season data (actual performances), you don’t want to be overleveraged. You want the vast majority of your bankroll free so you can adjust your predictions as the season goes and take advantage of those actual results. You and your model will become more intelligent as you see how the players and teams actually perform that season. Therefore, you want the bulk of your bankroll free to attack as you adjust.
Let’s say we have built our model to do NFL season projections and come up with win totals. How should we think about calculating odds to assess who will win a division?
Basically, once you break down the expected win percentages of each team going into the season, you just separate the teams by division and order them by win percentage. You then have an expected order of finish in each division. Now, you have four teams in a division. If all teams were even, each would have a 25% chance of winning the division. True odds would be +300 in that scenario. Since all teams are not even, what do we do? There are some incredibly advanced mathematical ways to calculate the odds of winning a division. However, I fear I might lose too many people and it won’t be helpful. It requires analyzing performance with an emphasis on division games. So, let’s simplify it a little. We have already calculated our expected season win percentage for each team. We have then separated the teams by division and ranked them in order of our calculated win percentage. Next, calculate how many wins that percentage translates to for each team over a 16-game season. Your spreadsheet should look something like this…
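| Team | Overall Win Pct. (model) | Expected Wins (16 games) |
|---|---|---|
| Miami | 65% | 10.4 |
| Buffalo | 55% | 8.8 |
| NY Jets | 35% | 5.6 |
| New England | 15% | 2.4 |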
In this example, our model has Miami at the top of the AFC East with a calculated win percentage of 65%. That means Miami is theoretically expected to win 10.4 games, followed by Buffalo with 8.8 wins, the NY Jets with 5.6 wins and New England with 2.4 wins. Clearly, our model shows Miami head and shoulders above the rest of the division, with a 1.6-win advantage over Buffalo. In a normal scenario, the average price for a division favorite is +120. To calculate odds for the other teams, you then move that +120 price up roughly $0.50 for every 0.75 wins each team is calculated to trail the top team. So, in the example above, Miami would be roughly +120, Buffalo +220 (1.6 wins behind = roughly +$1.00), NY Jets +440 (4.8 wins behind = +$3.20) and New England +653 (8.0 wins behind = roughly +$5.33). Again, this is not a perfect calculation, but it is a much faster shortcut that puts you in the ballpark for a valid assessment. If you calculated the above for the AFC East and your book had Miami at +180, that is obviously outside a 50-cent variance (the minimum I would consider when using this method), and you have a value!
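Here is the division shortcut as a small script, using the AFC East numbers from the table. It converts model win percentages to expected wins over 16 games, measures how far each team trails the leader, and adds 50 cents to the +120 favorite price for every 0.75 wins of deficit. It then applies the 50-cent minimum variance against a book price. The function and constant names are hypothetical. Applied exactly, the rule gives Buffalo about +227, which the text rounds to roughly +220.

```python
GAMES = 16             # NFL regular-season length used in the example
FAVORITE_PRICE = 120   # average division-favorite price, +120
CENTS_PER_STEP = 50    # move the price up 50 cents...
WINS_PER_STEP = 0.75   # ...for every 0.75 wins behind the division leader
MIN_VARIANCE = 50      # minimum gap vs. the book before calling it a value


def division_odds(win_pct: dict[str, float]) -> dict[str, float]:
    """Shortcut American odds for each team, ordered from division leader down."""
    wins = {team: pct * GAMES for team, pct in win_pct.items()}
    leader_wins = max(wins.values())
    return {
        team: FAVORITE_PRICE + CENTS_PER_STEP * (leader_wins - w) / WINS_PER_STEP
        for team, w in sorted(wins.items(), key=lambda kv: -kv[1])
    }


afc_east = {"Miami": 0.65, "Buffalo": 0.55, "NY Jets": 0.35, "New England": 0.15}
odds = division_odds(afc_east)
for team, price in odds.items():
    print(f"{team:<12} +{price:.0f}")

book_miami = 180  # the book's price on Miami from the example
if book_miami - odds["Miami"] > MIN_VARIANCE:
    print("Miami is a value at the book's price")
```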
How do we go about creating a golf model?
It’s funny how many people asked me about creating a golf model this past week. Again, it’s tough for me to answer such a complicated question in a few sentences. Instead, for those interested, I would turn to our textbook author for this article… Joe Peta. Joe Peta actually wrote a book about golf analytics, “Joe Peta’s Tour Guide Presents A 2019 Masters Preview”. I have not yet read the book, but I just purchased it for my Kindle app the other day. Peta has also added a 2020 addendum to the book, which can be found on his Twitter page at (https://twitter.com/MagicRatSF/status/1238249168633765891?s=20). I read the few pages Amazon puts up as a preview of the 2019 book, and it gets fully into golf analytics and tournament assessment. At a minimum, I would assume it would get you going on your golf model journey. Hopefully, the book will take you even deeper. Once you know the analytics that are key to performance, the next step is getting those analytics for each player, ranking the players by either that analytic or a combination of analytics, then calculating each player’s overall chance to win the tournament. Once you have each player’s probability to win the overall tournament, you can use that calculation to assess individual match-up bets (similar to using the win probabilities in the head to head formulas we used above for a baseball game). First things first though: take a look at Peta’s book, learn about the key analytics and his method of breaking them down, then go from there. I am sure you will find a few of your own tweaks to Joe’s methods along the way too. I mean, after all, at this point you have become a modeling master! 😉 😀
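As one possible way to go from outright win probabilities to a match-up price, the sketch below assumes a Plackett-Luce model. In that model, each player’s outright chance is proportional to an underlying strength, and the chance that A finishes ahead of B is p_A / (p_A + p_B). This is a swapped-in simplification, not necessarily the head-to-head formula used earlier in the article or Peta’s method. It also ignores ties, which are common in golf match-ups. The player names and probabilities are hypothetical.

```python
def head_to_head(p_a: float, p_b: float) -> float:
    """P(player A finishes ahead of player B) from their outright win
    probabilities, assuming a Plackett-Luce model (no ties)."""
    return p_a / (p_a + p_b)


# Hypothetical outright win probabilities from a golf model.
outright = {"Player A": 0.08, "Player B": 0.04}
p = head_to_head(outright["Player A"], outright["Player B"])
print(f"Player A over Player B: {p:.1%}")
```

The resulting probability can then be turned into a fair moneyline and compared with the book’s match-up price, just as with a baseball game.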
Well my friends, that’s it! The article “Designing An Algorithm (A Step by Step Guide)” has come to a conclusion. I wish all of you the best of luck in your model development and in your action. I am sure in the months ahead I will have other articles that will connect back to this one. Obviously, algorithms and model building are key to what I do. So, the topic will definitely continue into the future in various ways. Thanks for following along with me here and if you have questions, you know I am happy to help! Just send an email or use the form on the Contact page here at the website. Stay safe & healthy! Good luck in your model creation journey and in your action!