Women's Hoops Blog

Inane commentary on a game that deserves far better

Monday, February 14, 2005

As March approaches, folks start paying increasing attention to RPI numbers. Sometimes, I would say, too much attention.

The basic RPI formula used by the Selection Committee is pretty simple. It is just a weighted average of your winning percentage (25%), your opponents' winning percentage (50%), and your opponents' opponents' winning percentage (25%). The latter two make up the Strength of Schedule portion of the formula.
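For anyone who wants to see the arithmetic, here is a minimal sketch of that weighted average. The function names and the sample team's numbers are hypothetical, and rescaling the schedule portion to a 0-to-1 number is my own assumption, not something the Committee publishes.

```python
def rpi(wp, owp, oowp):
    """Basic RPI: 25% own winning pct, 50% opponents', 25% opponents' opponents'."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp


def strength_of_schedule(owp, oowp):
    """The two schedule terms, rescaled to 0..1 (the rescaling is an assumption)."""
    return (0.50 * owp + 0.25 * oowp) / 0.75


# Hypothetical team: wins 80% of its games against a slightly
# above-average schedule.
print(round(rpi(0.800, 0.550, 0.520), 4))
print(round(strength_of_schedule(0.550, 0.520), 4))
```

Note that a team's own record accounts for only a quarter of the number; the rest is entirely about whom it played.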

Computing the formula involves first answering some complicated questions. You must decide, for example, whether every game counts on both sides of the ledger -- whether a win by your team adds not only a win to your winning percentage but also a loss to your opponents' winning percentage. Different answers to questions like that seem to explain why the various services publish slightly different numbers even though they all use the same basic formula.
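To show how much that one choice can matter, here is a rough sketch that computes opponents' winning percentage both ways. The four-team schedule, the function names, and the `exclude_games_vs_team` flag are all made up for illustration, and the sketch makes no claim about which convention the Committee or any service actually uses.

```python
def record(games, team, exclude=None):
    """Wins and losses for `team`, optionally skipping games involving `exclude`."""
    wins = losses = 0
    for winner, loser in games:
        if exclude is not None and exclude in (winner, loser):
            continue
        if winner == team:
            wins += 1
        elif loser == team:
            losses += 1
    return wins, losses


def opponents_wp(games, team, exclude_games_vs_team):
    """Average winning pct of `team`'s opponents, counted once per meeting."""
    total, count = 0.0, 0
    for winner, loser in games:
        if team not in (winner, loser):
            continue
        opponent = loser if winner == team else winner
        skip = team if exclude_games_vs_team else None
        wins, losses = record(games, opponent, exclude=skip)
        if wins + losses == 0:
            continue  # opponent has no other games to count
        total += wins / (wins + losses)
        count += 1
    return total / count if count else 0.0


# Hypothetical results, written as (winner, loser).
games = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]

print(opponents_wp(games, "A", exclude_games_vs_team=False))
print(opponents_wp(games, "A", exclude_games_vs_team=True))
```

In this toy schedule, team A's opponents' winning percentage is about 0.500 if A's own wins count against them and 0.750 if they do not, so the same results produce noticeably different schedule numbers.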

Moreover, the Selection Committee itself apparently makes some "secret adjustments" based on a system of bonus points. No one knows exactly how that system works, so the RPI numbers we find on the internet are not the same as the RPI numbers that the Committee has in front of it.

The RPI has some flaws. The most commonly mentioned are: (1) it doesn't distinguish between home and away games (the men's Committee changed that this year); (2) it doesn't distinguish between games early in the year and late in the year; (3) it takes no account of margin of victory; and (4) it takes no account of injuries to key players.

To these, I would add some more technical criticisms.

First, I think the weighting is off -- too much weight goes to opponents' winning percentage and not enough to opponents' opponents' winning percentage. When Minnesota played Iowa this year, our SOS number rose substantially, but when we played Penn State, our SOS fell. Penn State was the better team, but it wasn't counted as one because it had a low winning percentage against a very tough schedule.
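Here is a small sketch of the weighting point. The two opponents and their numbers are invented (they are not Iowa's or Penn State's actual figures), and the alternative 25/50 split is just one illustrative choice, not a proposal anyone has adopted.

```python
def schedule_value(opp_wp, opp_opp_wp, w_owp, w_oowp):
    """What one opponent contributes to your SOS under the given weights, scaled 0..1."""
    return (w_owp * opp_wp + w_oowp * opp_opp_wp) / (w_owp + w_oowp)


# Hypothetical opponents:
# X has a good record against a weak schedule,
# Y has a poor record against a very tough schedule.
opponents = {"X": (0.700, 0.450), "Y": (0.450, 0.650)}

for name, (wp, owp) in opponents.items():
    current = schedule_value(wp, owp, 0.50, 0.25)   # the RPI's weights
    swapped = schedule_value(wp, owp, 0.25, 0.50)   # illustrative alternative
    print(name, round(current, 3), round(swapped, 3))
```

Under the actual weights, playing X helps your SOS more than playing Y; swap the weights and the order flips, because Y's tough schedule finally counts for more than its record.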

Second, I think the formula distinguishes too much between really bad and just ordinarily bad teams. To a team in the top 50, there shouldn't be much difference between playing the #150 team and the #250 team -- it should be an easy win either way. But playing the #250 team can drag your SOS number down dramatically, while playing the #150 team doesn't.

Finally, there is a recurring quirk that results from conference play. This year, for example, Duke plays UNC and Maryland twice each. When UNC and Maryland play each other, no matter who wins, the game adds two wins and two losses to the "opponents' winning percentage" portion of Duke's RPI number, and a dozen or so wins and losses to the "opponents' opponents' winning percentage" portion. As you repeat that over and over again during conference play, everyone's SOS number converges on 50%.
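The convergence is easy to see in a toy model. The sketch below assumes a hypothetical 12-team conference that plays a double round-robin and nothing else, with the better team winning every game, and it counts every game on both sides of the ledger. None of that matches a real season; it is only meant to isolate the effect.

```python
from itertools import combinations


def double_round_robin(n):
    """Teams 0..n-1 meet twice each; the lower-numbered team always wins."""
    games = []
    for better, worse in combinations(range(n), 2):
        games.extend([(better, worse), (better, worse)])  # (winner, loser)
    return games


def winning_pct(games, team):
    wins = sum(1 for winner, _ in games if winner == team)
    played = sum(1 for winner, loser in games if team in (winner, loser))
    return wins / played


def avg_opponent_wp(games, team):
    """Average winning pct of the team's opponents, one entry per game played."""
    opponents = [loser if winner == team else winner
                 for winner, loser in games if team in (winner, loser)]
    return sum(winning_pct(games, opp) for opp in opponents) / len(opponents)


games = double_round_robin(12)
for team in (0, 11):  # the best and the worst team
    print(team, round(winning_pct(games, team), 3),
          round(avg_opponent_wp(games, team), 3))
```

In this model the best team wins every game and the worst team wins none, yet their opponents' winning percentages land at roughly 0.455 and 0.545. Every conference game hands out exactly one win and one loss, so the pool of opponents' records can never stray far from 50%.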

Partly as a result of that convergence, teams' RPI numbers are often separated by a fraction of a percent. And because the numbers are so close, even a single game late in the season can change your SOS and RPI ranking substantially.

What does this all mean?

It means that the RPI is only a starting point for analysis. It means that bracket projections based solely on RPI are meaningless. It means that an argument like "our RPI is 37 and yours is 40, so we should get a bid ahead of you" is, without more, unpersuasive.

To make decisions about who gets in and who gets what seed, you have to look closer. You have to look at records against top 10, top 25, top 50 teams (and that is where the RPI probably has its greatest use). You have to look at a team's best wins and its worst losses. You have to make a subjective adjustment for all the factors that the RPI misses.

Luckily, the Committee does all that -- it does not simply rely on RPI. It takes a close look at a variety of factors, and for the most part, it reaches the right result.