System and Method for Rating Teams In Sports with Unbalanced Schedules

With Application to Arbitrary Rating System Analysis

© Copyright 2004, Paul Kislanko

Abstract This paper describes a system and method for ordering teams in a sport with unbalanced schedules in the same way that winning percentage orders teams in sports with balanced (round-robin) schedules. The rating is called Performance Against Strength of Schedule, and is based upon assigning values to each chain of head-to-head(-to-head) wins.

Contents
  1. Background
  2. Principles
  3. Case Study: The Ratings Percentage Index
  4. The Performance Against Algorithm
  5. PA-Strength of Schedule and the adjusted Winning Percentage
  6. Validation
  7. Generalization
  8. Summary and Credits

This is the result of a study undertaken to determine why the formula used by the National Collegiate Athletic Association (NCAA) to rank basketball and baseball teams seemed to work well for the former but not for the latter. In the process, we (the author with much assistance from Boyd Nation) characterized the differences between the scheduling topology of the two sports and identified a system that was immune to the problems we’d identified.

It turned out that in the process we invented a very general method. For any competition that does not use a “round robin” schedule, it approximates what the winning percentages would have been had a round-robin schedule been played.


Background

If all teams contending for a championship played a balanced schedule then winning percentage could be used to approximate the probability that a given team would win a contest with a team chosen at random from the field. When that value is known for all teams, the probability that a team i will win a contest with a specific team j is given by:
$$\frac{P \times (1 - Q)}{P \times (1 - Q) + Q \times (1 - P)}$$
where P is the probability that team i wins a game against a randomly-selected team and Q is the probability that team j wins against a randomly-selected team.
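
As an illustration, the formula can be written as a small function. This is a minimal sketch; the function name and the handling of the degenerate case (both teams certain to win, or both certain to lose) are assumptions, not part of the original study.

```python
def head_to_head_probability(p: float, q: float) -> float:
    """Probability that team i beats team j.

    p: probability that team i beats a randomly selected team
    q: probability that team j beats a randomly selected team
    """
    numerator = p * (1.0 - q)
    denominator = p * (1.0 - q) + q * (1.0 - p)
    if denominator == 0.0:
        # Happens only when p == q == 0 or p == q == 1 (assumed handling).
        raise ValueError("probability is undefined for p == q == 0 or p == q == 1")
    return numerator / denominator


# Two evenly matched teams split their games.
print(head_to_head_probability(0.5, 0.5))   # 0.5
# A .750 team against a .500 team.
print(head_to_head_probability(0.75, 0.5))  # 0.75
```

Against an exactly average opponent (Q = .500) the formula returns P itself, which is consistent with reading P as the chance of beating a team chosen at random.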

If every team played every other team an equal number of times at a neutral site, winning percentage would be the appropriate choice for the probability values. But there are 52,975 pairs of teams in division one basketball, and only 27 games allowed per team. In division one baseball, there are 41,041 team pairs, and 56 games allowed per team. So we define the objective for the study as follows.
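
The pair counts can be checked directly. This is a small sketch that assumes 326 basketball teams and 287 baseball teams (the team counts quoted in the Validation section); the function names are illustrative only.

```python
def team_pairs(n_teams: int) -> int:
    """Number of distinct pairs of teams: n * (n - 1) / 2."""
    return n_teams * (n_teams - 1) // 2


def max_opponent_coverage(n_teams: int, games_per_team: int) -> float:
    """Upper bound on the fraction of possible opponents one team can meet,
    reached only if every game is against a different opponent."""
    return min(1.0, games_per_team / (n_teams - 1))


print(team_pairs(326))  # 52975 (basketball)
print(team_pairs(287))  # 41041 (baseball)
print(round(max_opponent_coverage(326, 27), 3))
print(round(max_opponent_coverage(287, 56), 3))
```

Even in the best case a team meets only a small fraction of the field, which is why winning percentage alone cannot stand in for a round-robin result.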

What we’d like to do is find an adjustment to winning percentage that characterizes the games that do get played by how representative the opponents are of the field as a whole. This is usually done by defining some measurement of schedule strength to combine with winning percentage in some way. It would be natural to require the adjusted winning percentage to have the same properties as winning percentage, and for the adjustment due to SOS to share those properties.


Principles

  1. The input to the algorithm should be only number of wins and losses
    The reason we only want to use wins and losses is because we want the rating system to have the same properties as winning percentage, and that’s all that is used to calculate it.
    $$WP = \frac{W}{W + L}$$
  2. A winless team should have a zero value
    This follows from the fact that until a team beats another team, there is no evidence that it should be ranked higher than any team.
  3. An undefeated team should have a value proportional to the “quality” of its opponents
    This is why there are so many ranking systems: there are an infinite number of ways to characterize the “quality” of the teams against which a Winning Percentage was achieved.

Of course, there’s a continuous range between winless and undefeated, but a rating system that conforms to both principle 2 and principle 3 will necessarily have some means of combining them to produce intermediate values, which amounts to determining the value of a win by team i over team j.


Case Study: The Ratings Percentage Index

The RPI is usually defined as the sum of 25 percent of a team's winning percentage (WP), 50 percent of the team's opponents' winning percentage (OWP), and 25 percent of the team's Opponents' Opponents' Winning Percentage (OOWP). Sometimes it is described as 25 percent winning percentage and 75 percent strength of schedule, with SOS defined as 2/3 × OWP + 1/3 × OOWP.
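
The two descriptions are the same formula, as a short sketch shows. The function names are illustrative, and the WP, OWP and OOWP inputs are taken as already computed.

```python
def rpi(wp: float, owp: float, oowp: float) -> float:
    """RPI as 25% WP + 50% OWP + 25% OOWP."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp


def rpi_sos(owp: float, oowp: float) -> float:
    """Strength of schedule as 2/3 OWP + 1/3 OOWP."""
    return (2.0 / 3.0) * owp + (1.0 / 3.0) * oowp


def rpi_from_sos(wp: float, owp: float, oowp: float) -> float:
    """RPI as 25% WP + 75% SOS; algebraically identical to rpi()."""
    return 0.25 * wp + 0.75 * rpi_sos(owp, oowp)


# Hypothetical inputs, for illustration only.
print(rpi(0.700, 0.550, 0.520))
print(rpi_from_sos(0.700, 0.550, 0.520))
```

Note that a winless team (WP = 0) still receives 75 percent of the formula from its opponents' records.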

There are several problems with this definition as an alternative to Winning Percentage for ranking teams. The most obvious is that it fails criterion 2, since 75 percent of the formula is based upon how well other teams have performed, and criterion 3, because the "quality" of an undefeated opponent is not based upon how well its opponents have performed. That eliminates it as a useful measure of the probability that a team would win against a lower-ranked team, but even if that were corrected there would still be problems because of the way OWP and OOWP are defined.

The RPI definition of OWP is a weighted average of the opponents' winning percentages (where opponents' winning percentages do not include games between the team and the individual opponent). The weight is the number of games played against the opponent, so just playing a game adds to the team's OWP by the RPI definition.

This would be equivalent to defining a batting average in baseball as the average of the BA for each game played. A 0 for 5 day followed by a 3 for 4 day would give (.000 + .750)/2 = .375 instead of 3 for 9 = .333. In basketball, a player in a 3-game tournament who hits 2 of 10 shots, then 3 of 6, then 4 of 10 would have a shooting percentage of (.200 + .500 + .400)/3 = .367, when in fact for the tournament she was 9 for 26 = .346.
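
The difference between averaging per-game rates and pooling the totals can be seen in a few lines. This is an illustrative sketch using the batting example above; the function names are assumptions.

```python
def average_of_rates(lines):
    """Average the per-game rates (the mistake described above).

    lines: list of (successes, attempts) pairs, one per game.
    """
    return sum(made / attempts for made, attempts in lines) / len(lines)


def pooled_rate(lines):
    """True percentage: total successes divided by total attempts."""
    total_made = sum(made for made, _ in lines)
    total_attempts = sum(attempts for _, attempts in lines)
    return total_made / total_attempts


batting = [(0, 5), (3, 4)]  # 0 for 5, then 3 for 4
print(round(average_of_rates(batting), 3))  # 0.375
print(round(pooled_rate(batting), 3))       # 0.333
```

The two agree only when every game has the same number of attempts, which is the analogue of every opponent having played the same number of games.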

There's no other formula in all of sports statistics that makes this mistake.

[Figure: True OWP vs RPI OWP]

"True OWP" is just the total of opponents' wins minus the team's losses divided by the total of opponents' games played minus the team's games played - in other words, a true percentage. The errors are not very large in the RPI's OWP component, but they become significant when used in the RPI definition of OOWP.
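
The two OWP definitions can be compared on a toy schedule. This is a sketch under stated assumptions: results are held in a hypothetical wins matrix `W` (with `W[i][j]` the wins by team i over team j), each distinct opponent's record is counted once in the "true" version, and opponents with no games against other teams are skipped in the RPI-style version.

```python
def records(W):
    """Return (wins, losses, games) per team from the wins matrix W."""
    n = len(W)
    wins = [sum(W[i]) for i in range(n)]
    losses = [sum(W[k][i] for k in range(n)) for i in range(n)]
    games = [w + l for w, l in zip(wins, losses)]
    return wins, losses, games


def true_owp(W, i):
    """(opponents' wins - team's losses) / (opponents' games - team's games)."""
    wins, losses, games = records(W)
    opponents = [j for j in range(len(W)) if W[i][j] + W[j][i] > 0]
    numerator = sum(wins[j] for j in opponents) - losses[i]
    denominator = sum(games[j] for j in opponents) - games[i]
    return numerator / denominator


def rpi_style_owp(W, i):
    """Average of opponents' winning percentages (excluding games against
    team i), weighted by the number of games played against each opponent."""
    wins, losses, games = records(W)
    weighted_sum, total_weight = 0.0, 0
    for j in range(len(W)):
        played = W[i][j] + W[j][i]
        other_games = games[j] - played
        if played == 0 or other_games == 0:
            continue
        other_wins = wins[j] - W[j][i]
        weighted_sum += played * other_wins / other_games
        total_weight += played
    return weighted_sum / total_weight


# Toy schedule for teams A, B, C, D (hypothetical results).
W = [
    [0, 1, 2, 0],  # A beat B once and C twice
    [0, 0, 1, 3],  # B beat C once and D three times
    [1, 0, 0, 0],  # C beat A once
    [0, 1, 0, 0],  # D beat B once
]
print(true_owp(W, 0), rpi_style_owp(W, 0))
```

In this toy schedule team A plays its weakest opponent three times, so the game-weighted average of percentages falls well below the pooled percentage.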

The RPI's definition of OOWP is the average of the opponents' OWPs. This doesn't take into account that the team's OOWP includes the records of the team itself and the records of teams that are also included in the OWP.

[Figure: True OOWP vs RPI OOWP]

The influence of winning percentage duplication in the RPI's definition of OOWP is obvious. The "true opponents' opponents' winning percentage" in this graph does not include the team for which OOWP is being calculated, but the RPI's average of opponents' OWPs does, so the RPI's version of OOWP is off by a varying amount that depends upon how often the team's WP is included as its own "opponent's opponent's winning percentage." The significance of the error in the calculation becomes clear when we observe that the combination of WP and OOWP in the formula carries as much weight as the OWP portion of the RPI.

[Figure: RPI as WP+OOWP averaged with OWP]

Most of the teams that have a high RPI ranking have a high RPI because their WP+OOWP is high, and their OOWP is only high because it includes the team's WP in its OOWP.

The RPI fails criteria 2 and 3 simply because a team's ranking depends more upon how its opponents have done against other teams than upon its own results. It turns out that the RPI also doesn't conform to criterion 1, since the RPI values depend more on the schedules than on the wins and losses.


The Performance Against Algorithm

For any column vector V that assigns value V_i to each team i, let square matrices W and G be defined by W_{i,j} = wins by team i over team j and G_{i,j} = games played by team i against team j, and define [ result ]* to mean that result_i is divided by the number of games played by team i.

Then
PA = [ W × [ W × V ]* ]*
defines the PA algorithm.

SOS_V = [ G × V ]*
is equivalent to team i's strength of schedule with respect to rating V.

[ W × V ]* is just the value of a win over team i according to the characterization by V of team i's opponents, and PA characterizes a team by its performance against that characterization of its opponents.

dSOS = [ G × [ W × V ]* ]*

PA_i is a measurement of how well team i has performed, and dSOS_i is a measurement of how well team i's opponents have performed.

The value vector V is arbitrary: it defines a property that can be passed along the chain of head-to-head wins from loser to winner. If V_i = 1 for every team i, then [ W × V ]* is just Winning Percentage, so dSOS is essentially Opponents' Winning Percentage and PA is a combination of Winning Percentage and OWP.
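
The definitions above translate almost line for line into code. This is a minimal sketch in plain Python, assuming there are no ties so that G can be built as W plus its transpose; the helper names and the toy wins matrix are hypothetical.

```python
def mat_vec(M, v):
    """Matrix-vector product M x v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def per_game(values, games_played):
    """The [ ]* operation: divide each team's value by its games played."""
    return [v / g if g else 0.0 for v, g in zip(values, games_played)]


def performance_against(W, V):
    """Return (PA, dSOS, SOS_V, win_value) for wins matrix W and values V."""
    n = len(W)
    # Assumption: no ties, so games between i and j are wins in both directions.
    G = [[W[i][j] + W[j][i] for j in range(n)] for i in range(n)]
    games = [sum(row) for row in G]
    win_value = per_game(mat_vec(W, V), games)     # [ W x V ]*
    pa = per_game(mat_vec(W, win_value), games)    # [ W x [ W x V ]* ]*
    dsos = per_game(mat_vec(G, win_value), games)  # [ G x [ W x V ]* ]*
    sos_v = per_game(mat_vec(G, V), games)         # [ G x V ]*
    return pa, dsos, sos_v, win_value


# Toy schedule for teams A, B, C, D (hypothetical results).
W = [
    [0, 1, 2, 0],
    [0, 0, 1, 3],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
V = [1.0, 1.0, 1.0, 1.0]
pa, dsos, sos_v, win_value = performance_against(W, V)
print(win_value)  # with V = 1 this is each team's winning percentage
print(pa)
```

With V = 1 the intermediate vector is each team's winning percentage, so PA rewards a team for the winning percentages of the teams it has actually beaten.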

(aPA is a 1:1 mapping of PA into the same range as winning percentage as is described below.)


PA-SOS and the adjusted Winning Percentage

When V is defined as V_i = OWP_i, the result is the Performance Against Strength of Schedule. Opponents' Winning Percentage is defined as:

$$OWP_i = \frac{\left(\sum \text{team } i\text{'s opponents' wins}\right) - \text{team } i\text{'s losses}}{\left(\sum \text{team } i\text{'s opponents' games}\right) - \text{team } i\text{'s games}}$$

With that definition of V, the Performance Against SOS equations can be written as:
aSOS =  [ W × OWP ]*
dSOS =  [ G × aSOS ]*
PASOS =  [ W × aSOS ]*

The intermediate term [ W × V ]* is given its own name, aSOS, for reporting purposes. It is the value of a win over team i in the PASOS measurement, which would be a useful input to a team's scheduling process.
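
The three equations can be sketched end to end from a wins matrix. This is an illustrative implementation rather than the code used in the study. It assumes no ties, counts each distinct opponent's record once in OWP, and treats a zero denominator as a zero value; the function name and the toy schedule are hypothetical.

```python
def pasos(W):
    """Return (OWP, aSOS, dSOS, PASOS) for a wins matrix W."""
    n = len(W)
    wins = [sum(W[i]) for i in range(n)]
    losses = [sum(W[k][i] for k in range(n)) for i in range(n)]
    games = [w + l for w, l in zip(wins, losses)]
    G = [[W[i][j] + W[j][i] for j in range(n)] for i in range(n)]

    def owp(i):
        # (sum of opponents' wins - team's losses) /
        # (sum of opponents' games - team's games)
        opponents = [j for j in range(n) if G[i][j] > 0]
        numerator = sum(wins[j] for j in opponents) - losses[i]
        denominator = sum(games[j] for j in opponents) - games[i]
        return numerator / denominator if denominator else 0.0

    def star(M, v):
        # [ M x v ]*: matrix-vector product divided by games played
        return [
            sum(m * x for m, x in zip(M[i], v)) / games[i] if games[i] else 0.0
            for i in range(n)
        ]

    OWP = [owp(i) for i in range(n)]
    a_sos = star(W, OWP)     # aSOS  = [ W x OWP ]*
    d_sos = star(G, a_sos)   # dSOS  = [ G x aSOS ]*
    pa_sos = star(W, a_sos)  # PASOS = [ W x aSOS ]*
    return OWP, a_sos, d_sos, pa_sos


# Toy schedule for teams A, B, C, D (hypothetical results).
W = [
    [0, 1, 2, 0],
    [0, 0, 1, 3],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
OWP, a_sos, d_sos, pa_sos = pasos(W)
for name, value in zip("ABCD", pa_sos):
    print(name, round(value, 4))
```

Because every term in PASOS is multiplied by a win count, a team with no wins gets exactly zero, and a team with no losses has W equal to G in its row, so its PASOS equals its dSOS.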

Both aSOS and dSOS consist of elements that are a "percentage of a percentage", and PASOS is a percentage of a percentage of a percentage, so to translate the values of PASOS into the same range as Winning Percentage we define adjusted Winning Percentage as:
$$aWP_i = \sqrt[3]{PASOS_i}$$
and
$$dSOSP_i = \sqrt{dSOS_i}$$
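
The rescaling can be illustrated numerically. In this small sketch the function names are assumptions; the example values show that a product of three factors of .500 maps back to .500.

```python
import math


def adjusted_winning_percentage(pasos_value: float) -> float:
    """aWP: cube root of PASOS, a percentage of a percentage of a percentage."""
    return pasos_value ** (1.0 / 3.0)


def adjusted_dsos(dsos_value: float) -> float:
    """dSOSP: square root of dSOS, a percentage of a percentage."""
    return math.sqrt(dsos_value)


# Three factors of .500 multiply to .125; the cube root restores .500.
print(round(adjusted_winning_percentage(0.5 * 0.5 * 0.5), 3))  # 0.5
# Two factors of .500 multiply to .250; the square root restores .500.
print(round(adjusted_dsos(0.5 * 0.5), 3))                      # 0.5
```

Both roots are increasing functions, so the rescaling changes the values but not the order of the teams.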

Only the dSOS value applies to the team corresponding to the winning percentage; the aSOS is used to calculate the teams' opponents' PASOS (aWP) values. Plotting dSOS and WP against PASOS rank so that teams with the same winning percentage are separated:

The PASOS uses only wins and losses (criterion 1); a winless team has a zero value since SOS is weighted only by wins (criterion 2); and an undefeated team has just PASOS = dSOS (criterion 3).

It is worth noting that the PASOS definition of schedule strength is not very different from the RPI's. Although there is no fixed ratio that applies to every team, overall the contributions to dSOS are roughly 2/3 opponents' winning percentage and 1/3 OOWP (plus a smaller fraction of OOOWP). The most important difference is that there are no duplications in the PASOS definitions of WP, OWP, and OOWP.

The effect of a team's winning percentage on its own SOS in the RPI definition is clearly visible when compared to the PASOS definition with no duplication. The difference between the trend lines indicates the degree to which a team "played itself", and at the lower values lost to itself!


Validation

The algorithm was validated by correlating the expected winning percentage to actual winning percentage and by applying it to a different sport (basketball).

For a game between team i and team j let
P = aWP_i  ;   Q = aWP_j

then use the probability formula to calculate the expected winning percentage for the favored team.
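
A comparison of this kind can be sketched as follows. This is not the code used in the study: the function names, the equal-width intervals, and the sample ratings and results are all hypothetical, and games between teams with equal aWP are skipped because there is no favorite.

```python
def expected_favorite_win_probability(p, q):
    """Probability formula from the Background section, for the favored team."""
    hi, lo = max(p, q), min(p, q)
    return hi * (1 - lo) / (hi * (1 - lo) + lo * (1 - hi))


def calibration_bins(games, awp, n_bins=5):
    """Group games by the favorite's expected win probability.

    games: list of (winner, loser) pairs; awp: dict of team -> aWP.
    The range 0.5..1.0 is split into n_bins equal intervals. Returns one row
    per non-empty interval with the game count, mean expected probability,
    and the favorite's actual winning percentage.
    """
    counts = [0] * n_bins
    expected_sum = [0.0] * n_bins
    favorite_wins = [0] * n_bins
    for winner, loser in games:
        p, q = awp[winner], awp[loser]
        if p == q:
            continue  # no favorite
        e = expected_favorite_win_probability(p, q)
        b = min(int((e - 0.5) / 0.5 * n_bins), n_bins - 1)
        counts[b] += 1
        expected_sum[b] += e
        favorite_wins[b] += 1 if p > q else 0
    return [
        {"bin": b, "games": counts[b],
         "expected": expected_sum[b] / counts[b],
         "actual": favorite_wins[b] / counts[b]}
        for b in range(n_bins) if counts[b]
    ]


# Hypothetical ratings and results, for illustration only.
example_awp = {"A": 0.9, "B": 0.6, "C": 0.3}
example_games = [("A", "B"), ("A", "C"), ("C", "A"), ("B", "C")]
for row in calibration_bins(example_games, example_awp):
    print(row)
```

A well-calibrated rating would show the "actual" column tracking the "expected" column across intervals once enough games fall into each one.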

The correlation would be higher if the intervals were changed to include an equal number of games, and the graph would be more nearly logarithmic.

Application to Basketball

The motivation for this study was to discover why the RPI appears to work very well for basketball but not as well for baseball, so the first step was to quantify the errors in the RPI's SOS for basketball.

The error in the RPI formula for OWP is actually slightly higher in basketball than for baseball, but overall the character is the same.

There's actually much less of an error in the RPI's OOWP calculation, but the influence of a team's winning percentage on its own OOWP is still visible.

The RPI is more accurate for basketball than baseball because there is less duplication overall between opponents and opponents' opponents, so the errors in the OWP and OOWP calculation tend to be in the same direction. In basketball, the errors reinforce each other, and in baseball the errors cause OWP and OOWP to partially cancel each other. A conjecture would be that this is due to there being more teams (326 compared to 287) and fewer games (27 compared to 56) in basketball.


The second step was to apply the PASOS to the 2003-04 basketball season.

The combination of winning percentage and schedule strength is the same, and we note that the smaller range for the SOS function is another indication of the difference in the sports' schedule topologies.

For basketball, the correlation between the expected winning percentage of higher-ranked teams and their actual winning percentage is not quite as strong as for baseball.


Generalization

A team's record against the top or bottom 25, 50, or 100 is often used as an additional metric within a rating system using a report such as:
                    vs PAS rank                                   Road vs   Home vs
Team    PAS Rank    1-25     26-50    51-100   101-200   200+     1-100     101+      Total
Texas   1           10-3     10-9     10-1     20-0      0-0      7-4       9-0       16.875
  Wins values       10       5        2.5      2.5       0.0                          20.0
  Losses values     -.375    -2.25    -0.5     -0.0      -0.0                         -3.125

Here we've assigned a value to a win based upon the opponent's rank: 1 for a win against a top 25 team, 1/2 for a top 50 opponent that's not in the top 25, 1/4 for a top 100 opponent that's not in the top 50, 1/8 for a win vs a team in the 101-200 range, and zero for a win over a team in the 200+ range. The difference between this report and the PA is that we also count losses: 1/8 for each loss to a top 25 team, 1/4 for a loss to a top 50 team, 1/2 for a loss to a top 100 team, and 1 for each loss to a team ranked 101 or lower. All teams can be ordered by this metric.
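
The Texas line can be reproduced with a few lines of arithmetic. This is an illustrative sketch; the dictionary layout and function name are assumptions, while the weights and records are those given in the report.

```python
# Value of a win and cost of a loss, by the opponent's rank range.
WIN_VALUE = {"1-25": 1.0, "26-50": 0.5, "51-100": 0.25, "101-200": 0.125, "200+": 0.0}
LOSS_VALUE = {"1-25": 0.125, "26-50": 0.25, "51-100": 0.5, "101-200": 1.0, "200+": 1.0}

# Texas's (wins, losses) against each rank range, from the report.
texas = {
    "1-25": (10, 3),
    "26-50": (10, 9),
    "51-100": (10, 1),
    "101-200": (20, 0),
    "200+": (0, 0),
}


def bucket_score(record):
    """Return (wins value, losses value, total) for a team's record by range."""
    wins_value = sum(WIN_VALUE[b] * w for b, (w, _) in record.items())
    losses_value = sum(LOSS_VALUE[b] * l for b, (_, l) in record.items())
    return wins_value, losses_value, wins_value - losses_value


print(bucket_score(texas))  # (20.0, 3.125, 16.875)
```

The step changes at the range boundaries are what the Performance Against algorithm removes by using each opponent's exact value instead of its range.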

A more precise ordering can be obtained by setting V to be the rating used to produce such a report and applying the Performance Against algorithm. Instead of a range of opponents' values such as 1-25, we just use the exact value for each opponent. For the RPI:

Applied to the PASOS itself, the result is:

Using the Performance Against algorithm to re-rank teams can provide a ranking that is independent of the SOS definition used by the system. The ordinal ranking by adjusted Winning Percentage doesn't look much like the ordinal ranking by RPI value:

but the PA(RPI) and PA(aWP) rankings are nearly the same:
(Note that since these are ordinal rankings "better" is lower.)

When a team's ranking is adjusted based upon its wins against teams ranked higher or lower than it, combined with its opponents' adjusted rating based upon wins over teams ranked higher or lower than the opponents, relative orders become much more nearly identical. In other words, while the RPI SOS and PASOS SOS are different, the dSOS provided by PA(aWP) and PA(RPI) wind up being almost the same despite the different rating systems, especially in the important rankings (the top 50). The same results obtain when a different rating system is used, in this case Boyd Nation's Iterative Strength Ratings (ISR):

shows the base correlation, but when PA(aWP) rank is compared to PA(ISR) rank we get

The same relationship holds for PA(RPI) Rank as a function of PA(ISR).

The PA(ISR) doesn't correlate as well to the RPI or aWP via the PA function because its value doesn't depend only upon winning percentage and SOS. More precisely, its version of Winning Percentage, and therefore of SOS, is calculated using an adjustment that accounts for home field advantage. The RPI correlations are also less than perfect because of the bonuses and penalties that are included in addition to Winning Percentage and SOS.

See this conjecture regarding the mathematical reason the PA function improves the correlation between arbitrary rating systems.

Summary

The last property of the generalized Performance Against method provides a mechanism for comparing systems that incorporate factors other than wins and losses. Typically these involve game location (as the ISR does) and/or margin of victory.

References and Credits

  1. Frequently Asked Questions includes an example PASOS calculation.
  2. The study was suggested and guided by several articles published by Boyd Nation, without whose comments and suggestions it could not have been completed.
    1. A Look at the Distance Matrix August 15, 2000
    2. Theoretical Winning Percentage, Part I August 22, 2000
    3. The Real 2000 RPI's October 17, 2000
    4. There are other Sports? April 1, 2003
  3. Much of this material was previously published on SEbaseball.com.
  4. Game results for the studies were obtained from: