Abstract 
This paper describes a system and method that can be used to order teams in a sport that uses unbalanced schedules, in the same manner that a team's record expressed as winning percentage is used for sports with balanced (round-robin) schedules. The rating is called Performance Against Strength of Schedule, and is based upon assigning values to each chain of head-to-head wins.


Introduction
This is the result of a study undertaken to determine why the formula used by the National Collegiate Athletic Association (NCAA) to rank basketball and baseball teams seemed to work well for the former but not for the latter. In the process, we (the author, with much assistance from Boyd Nation) characterized the differences between the scheduling topology of the two sports and identified a system that was immune to the problems we'd identified.
It turned out that in the process we invented a very general method, applicable to any competition that does not involve a round-robin schedule, that approximates what the winning percentages would have been had a round-robin schedule been played.

If every team played every other team an equal number of times at a neutral site, winning percentage would be the appropriate choice for the probability values. But there are 52,975 pairs of teams in division one basketball, and only 27 games allowed per team. In division one baseball, there are 41,041 team pairs, and 56 games allowed per team. So we defined the objectives for the study as:

1. The rating must depend only upon wins and losses, not upon the schedules themselves.
2. A team's rating must depend primarily upon its own performance, so that a winless team receives the minimum rating.
3. The rating of an undefeated team must be determined by how well its opponents have performed.
The RPI is usually defined as the sum of 25 percent of a team's winning percentage (WP), 50 percent of the team's opponents' winning percentage (OWP), and 25 percent of the team's Opponents' Opponents' Winning Percentage (OOWP). Sometimes it is described as 25 percent winning percentage and 75 percent strength of schedule, with SOS defined as 2/3 × OWP + 1/3 × OOWP.
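The two forms of the definition are equivalent, which a short sketch makes explicit (the function names are ours for illustration; the weights are exactly those above):

```python
def rpi(wp, owp, oowp):
    # RPI as usually defined: 25% WP, 50% OWP, 25% OOWP
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

def rpi_via_sos(wp, owp, oowp):
    # Equivalent form: 25% winning percentage plus 75% strength of
    # schedule, with SOS = 2/3 x OWP + 1/3 x OOWP
    sos = (2.0 / 3.0) * owp + (1.0 / 3.0) * oowp
    return 0.25 * wp + 0.75 * sos
```

Multiplying out 0.75 × (2/3 × OWP + 1/3 × OOWP) gives 0.50 × OWP + 0.25 × OOWP, so the two forms always agree.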
There are several problems with this definition as an alternative to Winning Percentage for ranking teams. The most obvious is that it fails criterion 2, since 75 percent of the formula is based upon how well other teams have performed, and criterion 3, because the "quality" of an undefeated opponent is not based upon how well its opponents have performed. That eliminates it as a useful measure of the probability that a team would win against a lower-ranked team, but even if that were corrected there still would be problems because of the way OWP and OOWP are defined.
The RPI definition of OWP is a weighted average of the opponents' winning percentages (where an opponent's winning percentage does not include games between the team and that opponent). The weight is the number of games played against the opponent, so each game played adds another term to the team's OWP by the RPI definition.
This would be equivalent to defining a batting average in baseball by the average of the BA for each game played. A 0 for 5 day followed by a 3 for 4 day would give (.000 + .750)/2 = .375 instead of 3 for 9 = .333. In basketball, a player in a 3-game tournament who hits 2 of 10 shots, then 3 of 6, then 4 of 10 would have a shooting percentage of (.200 + .500 + .400)/3 = .367, when in fact for the tournament she was 9 for 26 = .346.
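The batting-average version of the mistake can be checked directly (a minimal sketch; the game lines are the hypothetical ones above):

```python
# Each game is a (hits, at_bats) pair: the 0-for-5 day and the 3-for-4 day
games = [(0, 5), (3, 4)]

# Average of the per-game averages (the RPI-style calculation)
avg_of_avgs = sum(h / ab for h, ab in games) / len(games)

# True percentage: pool the hits and at-bats first
pooled = sum(h for h, _ in games) / sum(ab for _, ab in games)
```

The averaged version overstates the true percentage because the 3-for-4 day gets as much weight as the 0-for-5 day despite involving fewer at-bats.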
There's no other formula in all of sports statistics that makes this mistake.
"True OWP" is just the total of opponents' wins minus the team's losses, divided by the total of opponents' games played minus the team's games played: in other words, a true percentage. The errors are not very large in the RPI's OWP component, but they become significant when used in the RPI definition of OOWP.
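The two calculations can be compared on a hypothetical four-team season, with W[i][j] the number of wins by team i over team j and G[i][j] the games played between them (a sketch; the season data is made up):

```python
import numpy as np

# Hypothetical 4-team season: W[i][j] = wins by team i over team j
W = np.array([[0, 2, 1, 0],
              [1, 0, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 2, 0]])
G = W + W.T  # games played between each pair

def rpi_owp(W, G, i):
    # RPI-style OWP: game-weighted average of each opponent's winning
    # percentage, with games against team i removed from that record.
    num, den = 0.0, 0
    for j in range(len(W)):
        if j == i or G[i][j] == 0:
            continue
        pct = (W[j].sum() - W[j][i]) / (G[j].sum() - G[j][i])
        num += G[i][j] * pct
        den += G[i][j]
    return num / den

def true_owp(W, G, i):
    # "True" OWP: total opponents' wins minus the team's losses, divided
    # by total opponents' games minus the team's games: a real percentage.
    opps = [j for j in range(len(W)) if G[i][j] > 0]
    wins = sum(W[j].sum() for j in opps) - W[:, i].sum()
    games = sum(G[j].sum() for j in opps) - G[i].sum()
    return wins / games
```

For team 0 in this example the RPI-style average comes out .333 while the true percentage is .500, because the weighted average of per-opponent percentages over-counts the short series against the strong opponent.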
The RPI's definition of OOWP is the average of the opponents' OWPs. This doesn't take into account that the team's OOWP includes the record of the team itself and the records of teams that are also counted in the OWP.
Most of the teams that have a high RPI ranking have a high RPI because their WP+OOWP is high, and their OOWP is only high because it includes the team's own WP.
The RPI fails to pass criteria 2 and 3 simply because a team's ranking depends more upon how its opponents have done against other teams than upon the team's own performance. It turns out that the RPI also doesn't conform to criterion 1, since the RPI values depend more on the schedules than on the wins and losses.
For any column vector V that assigns value V_{i} to each team i, let square matrices W and G be defined by W_{i,j} = wins by team i over team j and G_{i,j} = games played by team i against team j, and define [ result ]^{*} to mean that result_{i} is divided by the number of games played by team i.
Then

dSOS = [ G × [ W × V ]^{*} ]^{*}

PA = [ W × [ W × V ]^{*} ]^{*}
[ W × V ]^{*} is just the value of a win over team i according to the characterization by V of team i's opponents, and PA characterizes a team by its performance against that characterization of its opponents.
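In code, the star operator and the PA characterization can be sketched as follows (the season matrices are hypothetical; with V = 1, the intermediate term reduces to each team's winning percentage):

```python
import numpy as np

# Hypothetical 4-team season: W[i][j] = wins by team i over team j,
# G[i][j] = games played between teams i and j
W = np.array([[0, 2, 1, 0],
              [1, 0, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 2, 0]])
G = W + W.T

def star(result, G):
    # [ result ]^* : divide each team's entry by its games played
    return result / G.sum(axis=1)

V = np.ones(len(W))          # V_i = 1 for every team
win_value = star(W @ V, G)   # [ W x V ]^* : with V = 1 this is just WP
PA = star(W @ win_value, G)  # performance against that characterization
```

Beating team i is worth win_value[i], so PA averages the values of a team's wins over all of its games.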

The value vector V is arbitrary: it defines a property that can be passed along the chain of head-to-head wins from loser to winner. If V_{i} = 1, then dSOS is essentially Opponents' Winning Percentage and PA is a combination of Winning Percentage and OWP.
For PASOS, V_{i} is taken to be team i's winning percentage. With that definition of V, the Performance Against SOS equations can be written as:

aSOS = [ W × V ]^{*}

dSOS = [ G × aSOS ]^{*}

PASOS = [ W × aSOS ]^{*}
The intermediate term [ W × V ]^{*} is given a name of its own for reporting purposes: it is the value of a win over team i in the PASOS measurement, and that would be a useful input to a team's scheduling process.
Both aSOS and dSOS consist of elements that are a "percentage of a percentage", and PASOS is a percentage of a percentage of a percentage, so to translate the values of PASOS into the same range as Winning Percentage we define adjusted Winning Percentage as:

aWP = ( PASOS )^{1/3}
Only the dSOS value applies to the team corresponding to the winning percentage; the aSOS is used to calculate the team's opponents' PASOS (aWP) values. Plotting dSOS and WP against PASOS rank shows how teams with the same winning percentage are separated by their schedule strengths.
The PASOS uses only wins and losses (criterion 1); a winless team has a zero value, since SOS is weighted only by wins (criterion 2); and an undefeated team has just PASOS = dSOS (criterion 3).
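A sketch of the whole pipeline, under our reading of the equations above (aSOS = [ W × V ]^{*} with V = WP, dSOS = [ G × aSOS ]^{*}, PASOS = [ W × aSOS ]^{*}, and aWP the cube root of PASOS); the season data is hypothetical, with team 1 winless and team 3 undefeated so the criteria can be checked:

```python
import numpy as np

# Hypothetical season: team 1 is winless, team 3 is undefeated
W = np.array([[0, 2, 1, 0],
              [0, 0, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 2, 0]])
G = W + W.T
games = G.sum(axis=1)

WP = W.sum(axis=1) / games   # winning percentage (the value vector V)
aSOS = (W @ WP) / games      # [ W x V ]^* : value of a win over each team
dSOS = (G @ aSOS) / games    # schedule strength actually faced
PASOS = (W @ aSOS) / games   # performance against that schedule strength
aWP = PASOS ** (1.0 / 3.0)   # back into the range of a winning percentage
```

The winless team's W row is all zeros, so its PASOS is exactly zero; the undefeated team's W row equals its G row, so its PASOS equals its dSOS, as the criteria require.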
It is worth noting that the PASOS' definition of schedule strength is not very different from the RPI's. Although there is no fixed ratio that applies to every team, overall the contributions to dSOS are roughly 2/3 opponents' winning percentage and 1/3 OOWP (plus a smaller fraction of OOOWP). The most important difference is that there are no duplications in the PASOS definitions of WP, OWP, and OOWP.
The effect of a team's winning percentage on its own SOS in the RPI definition is clearly visible when compared to the PASOS definition with no duplication. The difference between the trend lines indicates the degree to which a team "played itself", and at the lower values lost to itself!
The algorithm was validated by correlating the expected winning percentage to actual winning percentage and by applying it to a different sport (basketball).
For a game between team i and team j let

p_{i,j} = aWP_{i} ( 1 - aWP_{j} ) / [ aWP_{i} ( 1 - aWP_{j} ) + aWP_{j} ( 1 - aWP_{i} ) ]

then use the probability formula to calculate the expected winning percentage for the favored team.
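A minimal sketch, assuming the probability formula is the standard log5-style combination of two winning-percentage-range ratings:

```python
def win_probability(p, q):
    # Probability that a team rated p beats a team rated q, where both
    # ratings live in the range of a winning percentage (0 to 1)
    num = p * (1.0 - q)
    return num / (num + q * (1.0 - p))
```

Two evenly matched teams come out at 0.5, and the formula is symmetric: win_probability(p, q) + win_probability(q, p) = 1.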
The motivation for this study was to discover why the RPI appears to work very well for basketball but not as well for baseball, so the first step was to quantify the errors in the RPI's SOS for basketball.
The RPI is more accurate for basketball than for baseball because there is less duplication overall between opponents and opponents' opponents, so the errors in the OWP and OOWP calculations tend to be in the same direction. In basketball the errors reinforce each other, and in baseball the errors cause OWP and OOWP to partially cancel each other. A conjecture would be that this is due to there being more teams (326 compared to 287) and fewer games (27 compared to 56) in basketball.
The second step was to apply the PASOS to the 2003-04 basketball season.
The combination of winning percentage and schedule strength is the same, and we note that the smaller range for the SOS function is another indication of the difference in the sports' schedule topologies.
For basketball, the correlation between expected winning percentage by higher ranked teams to actual winning percentage is not quite as strong as for baseball.

Here we've assigned a value to a win based upon the opponent's rank: 1 for a win against a top-25 team, 1/2 for a top-50 opponent that's not in the top 25, 1/4 for a top-100 opponent that's not in the top 50, 1/8 for a win over a team in the 101-200 range, and zero for a win over a team ranked below 200. The difference between this report and the PA is that we also count losses: 1/8 for each loss to a top-25 team, 1/4 for a loss to a top-50 team, 1/2 for a loss to a top-100 team, and 1 for each loss to a team ranked below 100. All teams can be ordered by this metric.
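A sketch of that scoring, assuming wins add their value and losses subtract their penalty (the bucket boundaries are as described above; the function names are ours):

```python
def win_value(opp_rank):
    # Credit for a win, by the opponent's rank bucket
    if opp_rank <= 25:
        return 1.0
    if opp_rank <= 50:
        return 0.5
    if opp_rank <= 100:
        return 0.25
    if opp_rank <= 200:
        return 0.125
    return 0.0

def loss_penalty(opp_rank):
    # Penalty for a loss: cheap against strong teams, costly against weak
    if opp_rank <= 25:
        return 0.125
    if opp_rank <= 50:
        return 0.25
    if opp_rank <= 100:
        return 0.5
    return 1.0

def team_score(win_ranks, loss_ranks):
    # Order teams by total win credit minus total loss penalty
    return (sum(win_value(r) for r in win_ranks)
            - sum(loss_penalty(r) for r in loss_ranks))
```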
A more precise ordering can be obtained by setting V to be the rating used to produce such a report and applying the Performance Against algorithm. Instead of a range of opponents' values such as 1-25, we just use the exact value for each opponent. For the RPI:
Applied to the PASOS itself, the result is:
Using the Performance Against algorithm to rerank teams can provide a ranking that is independent of the SOS definition used by the system. The ordinal ranking by adjusted Winning Percentage doesn't look much like the ordinal ranking by RPI value:
When a team's ranking is adjusted based upon its wins against teams ranked higher or lower than it, combined with its opponents' adjusted ratings based upon wins over teams ranked higher or lower than the opponents, the relative orders become much more nearly identical. In other words, while the RPI SOS and the PASOS SOS are different, the dSOS values provided by PA(aWP) and PA(RPI) wind up being almost the same despite the different rating systems, especially in the important rankings (the top 50). The same results obtain when a different rating system is used (in this case Boyd Nation's Iterative Strength Ratings (ISR)):
The same relationship holds for PA(RPI) Rank as a function of PA(ISR).
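The re-ranking step can be sketched as follows: take any rating vector V (RPI, ISR, or aWP values), push it through the PA operator, and compare ordinal ranks (the season matrices and the two rating vectors are hypothetical):

```python
import numpy as np

# Hypothetical season: W[i][j] = wins by team i over team j
W = np.array([[0, 2, 1, 0],
              [0, 0, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 2, 0]])
G = W + W.T
games = G.sum(axis=1)

def pa(V):
    # Performance Against an arbitrary rating vector V:
    # PA = [ W x [ W x V ]^* ]^*
    win_value = (W @ V) / games
    return (W @ win_value) / games

def ordinal_rank(values):
    # Rank 1 = best (highest value)
    order = np.argsort(-values)
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

# Two different (made-up) rating systems for the same four teams
system_a = np.array([0.62, 0.30, 0.55, 0.71])
system_b = np.array([0.58, 0.25, 0.60, 0.80])
ranks_a = ordinal_rank(pa(system_a))
ranks_b = ordinal_rank(pa(system_b))
```

Even when the input ratings disagree, the PA-adjusted orderings agree wherever the win-loss record dominates: the undefeated team ends up first and the winless team last under both systems.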
The PA(ISR) doesn't correlate as well to the RPI or aWP via the PA function because its value doesn't depend only upon winning percentage and SOS. More precisely, its version of Winning Percentage, and therefore of SOS, is calculated using an adjustment that accounts for home field advantage. The RPI correlations are also less than perfect because of the bonuses and penalties that are included in addition to Winning Percentage and SOS.
See this conjecture regarding the mathematical reason the PA function improves the correlation between arbitrary rating systems.
The last property of the generalized Performance Against method provides a mechanism for comparing systems that incorporate factors other than wins and losses. Typically these involve game location (as the ISR does) and/or margin of victory.