Abstract
This paper describes a system and method for ordering teams in a sport that uses unbalanced schedules, in the same manner that teams' records, expressed as winning percentage, are used for sports with balanced (round-robin) schedules. The rating is called Performance Against Strength of Schedule (PASOS), and is based upon assigning values to each chain of head-to-head(-to-head) wins.
This is the result of a study undertaken to determine why the formula used by the National Collegiate Athletic Association (NCAA) to rank basketball and baseball teams seemed to work well for the former but not for the latter. In the process, we (the author with much assistance from Boyd Nation) characterized the differences between the scheduling topology of the two sports and identified a system that was immune to the problems we’d identified.
It turned out that in the process we had invented a very general method: for any competition that does not use a round-robin schedule, it approximates what the winning percentages would have been had a round-robin schedule been played.
Background
If all teams contending for a championship played a balanced schedule then winning percentage could be used to approximate the probability that a given team would win a contest with a team chosen at random from the field. When that value is known for all teams, the probability that a team i will win a contest with a specific team j is given by:
|
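A common formula of this kind is Bill James's "log5" rule. The sketch below assumes that form, which may differ from the one intended here. The function name `win_probability` and its arguments are hypothetical; `p_i` and `p_j` stand for each team's probability of beating a team chosen at random from the field.

```python
def win_probability(p_i: float, p_j: float) -> float:
    """Chance that team i beats team j under the log5 rule (an assumed form).

    p_i and p_j are each team's probability of beating a randomly chosen
    team from the field, for example a balanced-schedule winning percentage.
    """
    favorable = p_i * (1.0 - p_j)      # i wins its draw, j loses its draw
    unfavorable = p_j * (1.0 - p_i)    # j wins its draw, i loses its draw
    total = favorable + unfavorable
    if total == 0.0:
        # Both teams at 0.0 or both at 1.0: the rule gives no answer.
        raise ValueError("probability is undefined for these inputs")
    return favorable / total


if __name__ == "__main__":
    print(win_probability(0.75, 0.5))
```

Under this rule a team's chance against an exactly average opponent (0.5) equals its own value, and two equally rated teams are each given an even chance.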
If every team played every other team an equal number of times at a neutral site, winning percentage would be the appropriate choice for the probability values. But there are 52,975 pairs of teams in division one basketball, and only 27 games allowed per team. In division one baseball, there are 41,041 team pairs, and 56 games allowed per team. So we define the objectives for the study as:
|
|
[ W × V ]* is just the value of a win over team i according to the characterization by V of team i's opponents, and PA characterizes a team by its performance against that characterization of its opponents.
|
PA_i is a measurement of how well team i has performed, and dSOS_i is a measurement of how well team i's opponents have performed.
The value vector V is arbitrary: it defines a property that can be passed along the chain of head-to-head wins from loser to winner. If V_i = 1, then dSOS is essentially Opponents' Winning Percentage and PA is a combination of Winning Percentage and OWP.
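The following is a minimal sketch of one possible reading of this description, not the defining equations themselves. It assumes that the value of a win over a team is V summed over that team's wins and divided by its games played, that dSOS averages those values over the opponents a team has beaten, and that PA divides the same sum by games played. The function name, the data layout, and these normalizations are all assumptions.

```python
def performance_against(wins, games, v):
    """Sketch of a Performance Against calculation (assumed normalization).

    wins[i]  -- list of opponents team i has beaten, one entry per win
    games[i] -- total games played by team i
    v[i]     -- the value being passed from loser to winner
    Returns (win_value, dsos, pa), each a dict keyed by team.
    """
    teams = list(games)
    # Assumed: value of a win over team i is V summed over i's own wins,
    # scaled by i's games played. With v[j] == 1 this is i's winning percentage.
    win_value = {
        i: sum(v[j] for j in wins.get(i, [])) / games[i] for i in teams
    }
    dsos, pa = {}, {}
    for i in teams:
        beaten = wins.get(i, [])
        total = sum(win_value[j] for j in beaten)
        # Only wins carry weight, so a winless team gets zero for both.
        dsos[i] = total / len(beaten) if beaten else 0.0
        pa[i] = total / games[i]
    return win_value, dsos, pa


if __name__ == "__main__":
    # Hypothetical three-team example: A beat B and C, B beat C.
    wins = {"A": ["B", "C"], "B": ["C"], "C": []}
    games = {"A": 2, "B": 2, "C": 2}
    v = {"A": 1, "B": 1, "C": 1}
    print(performance_against(wins, games, v))
```

Under these assumptions PA equals winning percentage times dSOS, so an undefeated team has PA equal to dSOS and a winless team has PA equal to zero.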
|
With that definition of V, the Performance Against SOS equations can be written as:
|
The intermediate term [ W × V ]* is given a unique value for reporting purposes. It is the value of a win over team i in the PASOS measurement, and that would be a useful input to a team's scheduling process.
Both aSOS and dSOS consist of elements that are a "percentage of a percentage", and PASOS is a percentage of a percentage of a percentage, so to translate the values of PASOS into the same range as Winning Percentage we define adjusted Winning Percentage as:
|
|
Only the dSOS value applies to the team corresponding to the winning percentage; the aSOS is used to calculate the PASOS (aWP) values of the team's opponents. Plotting dSOS and WP against PASOS rank shows how teams with the same winning percentage are separated:
The PASOS uses only wins and losses (criterion 1); a winless team has a zero value since SOS is weighted only by wins (criterion 2); and an undefeated team has PASOS = dSOS (criterion 3).
It is worth noting that the PASOS definition of schedule strength is not very different from the RPI's. Although there is no fixed ratio that applies to every team, overall the contributions to dSOS are roughly 2/3 opponents' winning percentage and 1/3 OOWP (plus a smaller fraction of OOOWP). The most important difference is that there are no duplications in the PASOS definitions of WP, OWP, and OOWP.
The effect of a team's winning percentage on its own SOS in the RPI definition is clearly visible when compared to the PASOS definition with no duplication. The difference between the trend lines indicates the degree to which a team "played itself", and at the lower values lost to itself!
|
then use the probability formula to calculate the expected winning percentage for the favored team.
The motivation for this study was to discover why the RPI appears to work very well for basketball but not as well for baseball, so the first step was to quantify the errors in the RPI's SOS for basketball.
The RPI is more accurate for basketball than baseball because there is less duplication overall between opponents and opponents' opponents, so the errors in the OWP and OOWP calculation tend to be in the same direction. In basketball, the errors reinforce each other, and in baseball the errors cause OWP and OOWP to partially cancel each other. A conjecture would be that this is due to there being more teams (326 compared to 287) and fewer games (27 compared to 56) in basketball.
The second step was to apply the PASOS to the 2003-04 basketball season.
The combination of winning percentage and schedule strength is the same, and we note that the smaller range for the SOS function is another indication of the difference in the sports' schedule topologies.
For basketball, the correlation between the expected and actual winning percentages of higher-ranked teams is not quite as strong as for baseball.
Here we've assigned a value to a win based upon the opponent's rank: 1 for a win against a top-25 team, 1/2 for a top-50 opponent that's not in the top 25, 1/4 for a top-100 opponent that's not in the top 50, 1/8 for a win over a team in the 101-200 range, and zero for a win over a team ranked below 200. The difference between this report and the PA is that we also count losses: 1/8 for each loss to a top-25 team, 1/4 for a loss to a top-50 team, 1/2 for a loss to a top-100 team, and 1 for each loss to a team ranked below 100. All teams can be ordered by this metric.
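The tier values just described can be tallied as in the sketch below. The function names and the input layout (lists of opponent ranks) are hypothetical. The text does not say how win values and loss values are combined into a single number, so the sketch returns the two totals separately rather than guessing.

```python
from fractions import Fraction


def win_credit(opponent_rank: int) -> Fraction:
    """Value of a win, by the rank tier of the beaten opponent."""
    if opponent_rank <= 25:
        return Fraction(1)
    if opponent_rank <= 50:
        return Fraction(1, 2)
    if opponent_rank <= 100:
        return Fraction(1, 4)
    if opponent_rank <= 200:
        return Fraction(1, 8)
    return Fraction(0)


def loss_charge(opponent_rank: int) -> Fraction:
    """Value counted for a loss, by the rank tier of the winning opponent."""
    if opponent_rank <= 25:
        return Fraction(1, 8)
    if opponent_rank <= 50:
        return Fraction(1, 4)
    if opponent_rank <= 100:
        return Fraction(1, 2)
    return Fraction(1)


def tiered_report(win_ranks, loss_ranks):
    """Return (total win value, total loss value) for one team."""
    credits = sum((win_credit(r) for r in win_ranks), Fraction(0))
    charges = sum((loss_charge(r) for r in loss_ranks), Fraction(0))
    return credits, charges


if __name__ == "__main__":
    # Hypothetical team: wins over teams ranked 10, 40, 150, 250;
    # losses to teams ranked 5 and 120.
    print(tiered_report([10, 40, 150, 250], [5, 120]))
```

Exact fractions are used so that sums of eighths and quarters are not blurred by floating-point rounding.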
A more precise ordering can be obtained by setting V to be the rating used to produce such a report and applying the Performance Against algorithm. Instead of a range of opponents' values such as 1-25, we just use the exact value for each opponent. For the RPI:
Applied to the PASOS itself, the result is:
Using the Performance Against algorithm to re-rank teams can provide a ranking that is independent of the SOS definition used by the system. The ordinal ranking by adjusted Winning Percentage doesn't look much like the ordinal ranking by RPI value:
When a team's ranking is adjusted based upon its wins against teams ranked higher or lower than it, combined with its opponents' adjusted ratings based upon wins over teams ranked higher or lower than the opponents, relative orders become much more nearly identical. In other words, while the RPI SOS and PASOS SOS are different, the dSOS provided by PA(aWP) and PA(RPI) wind up being almost the same despite the different rating systems, especially in the important rankings (the top 50). The same results obtain when a different rating system is used (in this case Boyd Nation's Iterative Strength Ratings, ISR):
The same relationship holds for PA(RPI) Rank as a function of PA(ISR).
The PA(ISR) doesn't correlate as well to the RPI or aWP via the PA function because its value doesn't depend only upon winning percentage and SOS. More precisely, its versions of Winning Percentage and therefore SOS are calculated using an adjustment that accounts for home field advantage. The RPI correlations are also less than perfect because of the bonuses and penalties that are included in addition to Winning Percentage and SOS.
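One common way to quantify how nearly identical two ordinal rankings are is Spearman's rank correlation. The text does not say which measure was used for the comparisons above, so the sketch below is only an illustration of the idea. It assumes each ranking is a dict mapping team name to ordinal rank with no ties; the function name and the example teams are hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation between two tie-free ordinal rankings.

    rank_a and rank_b map each team to its ordinal rank (1 = best).
    Returns 1.0 for identical orders and -1.0 for exactly reversed orders.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same teams")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two teams to compare rankings")
    # Sum of squared rank differences over all teams.
    d_squared = sum((rank_a[t] - rank_b[t]) ** 2 for t in rank_a)
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical ranks for the same three teams under two systems.
    by_system_one = {"A": 1, "B": 2, "C": 3}
    by_system_two = {"A": 2, "B": 1, "C": 3}
    print(spearman_rho(by_system_one, by_system_two))
```

Restricting both dicts to a subset such as the top 50 and re-ranking within it gives a comparison focused on the rankings that matter most.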
See this conjecture regarding the mathematical reason the PA function improves the correlation between arbitrary rating systems.
The last property of the generalized Performance Against method provides a mechanism for comparing systems that incorporate factors other than wins and losses. Typically these involve game location (as the ISR does) and/or margin of victory.
Summary
Because only wins can contribute to a team’s rating, each game can contribute only once. The reason the RPI appears to be more accurate in basketball is that the errors in OWP and OOWP tend to be in the same direction.
An opponent's aSOS explicitly defines the value of a win over that team. This means that the effect of any particular win is easily visible, quantifying the concept of a "quality win".
The PA is a generalization of the "gory details" reports used within rating systems to qualify a team's ranking. Instead of "top 25" that varies among systems, it uses the actual values within the system for each opponent.
References and Credits