Basketball upsets, high school genetics, and personal genomics

In 2007 the eighth-seeded Golden State Warriors defeated the top-seeded Dallas Mavericks, the first time an eighth seed had beaten a one seed in a best-of-seven series and only the third time overall. For several reasons this is considered one of the biggest upsets in basketball history. The Mavericks were a perennial favorite to make deep playoff runs and had been in the NBA Finals the previous year, which by many accounts they should have won. What's more, the Mavericks won 67 games during the regular season, one of the highest regular season win totals in history, while the Warriors barely played .500 basketball and made it into the playoffs on the last day of the season after a decade of futility.

Professional basketball is one of the more predictable sports because the number of possessions per game, and per seven-game series, gives the better team time to exert its advantages. So was the Warriors' victory a statistical aberration the likes of which we have never seen before and have not seen since, or is there another explanation?

Basketball also differs from other sports in that teams have very different playing styles and can run into "bad matchups". In addition, the general public and even NBA analysts tend to let a team's reputation over the last couple of years [1] and its record over the course of a season unduly influence their estimate of the team's chances of winning. It doesn't matter how a team has played over the last ten years; all that matters is how well the team is predicted to play the next seven games.

A closer examination of the circumstances of both teams in this particular series makes a Warriors victory seem not just less unlikely, but fairly likely. For starters, the Warriors beat the Mavericks in all three meetings during the regular season.
Come playoff time, regular season series records are often thrown out because of the small sample size (a valid point), players sitting out for rest or injury (also valid), or playoff basketball being different from regular season basketball [2] (also fairly valid). However, the Mavericks lost only 15 games during the regular season, so the Warriors alone accounted for one fifth of their losses. In addition, the Warriors' second victory of the season over the Mavericks ended Dallas's 17-game win streak, and I have to assume the Mavericks were trying to win that game.

With this information it doesn't seem completely impossible that the Warriors might win four out of seven games, but there's more. It was a well-kept secret that after the midseason trade for Stephen Jackson the Warriors were no longer the same old team they had always been, and were statistically playing some of the best basketball in the league. Meanwhile, the Mavericks were twiddling their thumbs waiting for the playoffs to start. What's more, Don Nelson, the coach of the Warriors, was the former coach of the Mavericks and knew that their star player, Dirk Nowitzki, struggled when guarded by smaller players. With all of this additional information, betting on the Warriors, who were 11 to 1 to win the series, would have been a safe bet, and in fact one of the most famous sports gamblers, Haralabos Voulgaris, did just that and calls it his greatest gambling moment.

As we just saw, additional circumstances can change what is viewed as an assured outcome. Something similar shows up in an introductory genetics course in the form of epistasis. A gene may typically have a certain effect on phenotype, but when another gene is present it can modify or mask that effect. A famous example is hair color. A person may have the gene for dark hair, but if they don't have the protein needed to transfer the pigment to the hair follicle, the person will be an albino.
While this is a harmless result, it's not hard to imagine scarier scenarios. Researchers are now fairly certain the human genome contains fewer than 30,000 protein-coding genes, but we don't know the function of the majority of them or all the ways they can interact. In addition, the genome contains an unknown number of non-protein-coding genes.

With the advent of whole genome sequencing we can find associations between certain genes and diseases, and potentially identify patients who are at high risk. It is fairly harmless to tell a person with a predicted predisposition to heart disease to eat healthier and take statins, but taking preventive measures for breast cancer is a different story. Although a high percentage of patients with a certain genotype may develop a disease, because of epistasis an individual patient with this high-risk gene may actually have no elevated risk, or even less risk than the general population. As a result, predicting disease through personal genomics will remain an inexact science, just like sports betting, until we have all the information, and don't expect that any time soon.

1. Mark Cuban recently said that the Clippers can change their owner and players but they will always be the same old Clippers (this is a bad thing). People have pointed out how ridiculous that sounds, but these are likely the same people who thought the Warriors were the same old Warriors in 2007.

2. This is one of the favorite things for talking heads to say come playoff time, and while I would normally be as nervous as a long-tailed cat in a room full of rocking chairs about taking their statements to the bank, here they do have a point. NBA players pace themselves during the regular season because of how grueling an 82-game schedule is. Whether this is due to laziness or an attempt to be fresh during the playoffs and avoid injury depends on who the player is and who you ask. As a result, the intensity of playoff basketball is much different.
Shots that were once open are now heavily guarded. What used to be an easy fast break is now a hard foul. Furthermore, star players play more minutes in playoff games since they are must-win games. While a poor bench may affect your regular season record, it has less of an impact in the playoffs if your starters can play over 40 minutes per game.
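
For readers who want to put rough numbers on the main argument (a seven-game series favors the better team, and 11-to-1 odds only need a modest chance of winning to pay off), here is a minimal sketch. It rests on assumptions that are mine, not part of the story above: every game is independent, the underdog has the same win probability p in each game, and home court is ignored. The function names are made up for illustration.

```python
from math import comb


def series_win_prob(p, wins_needed=4):
    """Chance of winning a best-of-seven (first to `wins_needed`) series.

    Assumes each game is independent with the same win probability p,
    which is a simplification (no home court, injuries, or adjustments).
    """
    q = 1.0 - p
    # The series ends on the team's final win; k counts its losses before that.
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    )


def break_even_prob(odds_against):
    """Win probability at which a bet at `odds_against` to 1 breaks even."""
    return 1.0 / (odds_against + 1.0)


if __name__ == "__main__":
    print(f"break-even at 11 to 1: {break_even_prob(11):.3f}")
    for p in (0.30, 0.40, 0.50):
        print(f"per-game p = {p:.2f} -> series win prob = {series_win_prob(p):.3f}")
```

Under these assumptions, a team that wins only 30% of individual games still takes the series about 12.6% of the time, which is already above the roughly 8.3% break-even point for 11-to-1 odds. If you believed the Warriors were closer to a 40% or 50% team against Dallas in particular, the bet looks better still.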