|The CFR OP Trophy|
We have just completed the fourth year of Organized Play for Championship Formula Racing and the ranking system has largely remained the same over that period of time. One of the main things I tried to do with the original system was to give different values to different races based on the competition. In theory, a win against a bunch of strong competitors should be worth more than a win against a group of rookies. The trick is judging the competition.
What I have done up to now is to count a driver's best race finishes over the last two years. I would then use that value to judge that driver's ability. The ability of the drivers in a race would add or subtract points from the value of that race.
The main problem with that concept is that it is circular. If I win 5 races against relative newbies, I'm going to look like a world beater. But I can't judge the value of those wins until I know how strong my opponents were, and I can't know that until I have judged the value of their results...
I also had a siloed community problem. If a group of drivers mostly race only against each other, how do I compare them to others?
Late this season, I decided to try to improve the system.
My main goal was to improve the system's ability to value different races and tournaments. Specifically, and in rough order of importance:
- Improve the system's ability to value individual drivers
- Do something about silos
- Prevent radically high tournament ratings
- Create a more predictable and consistent ranking system
- Provide points for ladder sub-tournaments
- Rebalance tournaments in relation to races
Ratings vs. Rankings
CFR Organized Play is all about figuring out which driver had the best season -- what I would call a ranking. It is not about figuring out who the best driver is -- what I would call a rating.
This is a distinction that happens (perhaps unintentionally) in sports all the time. The winner of the NBA Finals may not be the team that was arguably the best in the NBA that year. But it was the team that won the playoffs and thus had the best season as defined by the NBA.
The NBA and most professional sports leagues are able to easily ignore ratings when crowning a champion because they can fairly structure seasons and playoffs to give everyone a similar chance and similar competition.
CFR Organized Play does not do that. Instead, it simply observes everything going on as organized by various people and has to judge a champion. This is where ratings come into play. The system uses ratings to try to figure out how much value to give every race because it cannot rely on some overall structure to keep things even.
I bring this up as a preface to talking about ELO and ratings because, at the end of the day, these are just tools for trying to fairly figure out a ranking. Driver ratings have never been goals in and of themselves for CFR Organized Play. And... I'll talk about that more a little further down this page.
ELO to the Rescue
I've been using ELO a lot over the last couple of years in a slightly related project to rank and categorize the best F1 drivers of all time (more on that much later). So I felt comfortable using ELO as the basis for judging how good individual drivers are. For those not already familiar, ELO can be used to figure out how likely one person is to beat another based on the difference between their ELO scores. After each event, the actual results are compared to what ELO thought would happen, and adjustments are made to each player's ELO score.
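For readers who want to see the arithmetic, here is a minimal sketch of the textbook two-player ELO calculation. The 400-point scale, the base of 10, and the default K of 32 are the conventional chess values and are assumptions here, not necessarily the numbers used for CFR Organized Play. The function names are made up for illustration.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the standard ELO logistic curve.

    scale=400 is the conventional chess value (an assumption here).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def updated_rating(rating, expected, actual, k=32.0):
    """Move a rating toward the observed result.

    actual is 1.0 for a win and 0.0 for a loss; k=32 is an assumed default.
    """
    return rating + k * (actual - expected)


# Hypothetical example: a 1600 driver against a 1400 driver.
favorite = expected_score(1600, 1400)   # roughly 0.76
underdog = expected_score(1400, 1600)   # roughly 0.24
# If the underdog wins anyway, their rating rises by k * (1 - underdog).
underdog_after_upset = updated_rating(1400, underdog, 1.0)
```

The bigger the upset, the bigger the adjustment: a favorite who wins as expected gains only a few points, while an underdog who wins gains a lot.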
A complication with using ELO is that, as described above, it works best if it is updated after every game or short event. But I don't always get race results immediately after the race, and PBeM races take months to end. How does that work with ELO if, in the middle of that months-long race, some of the drivers participate in 3 live, in-person events? What's their ELO?
So, I decided that I would only recalculate ELO at the end of every season. Every race that season would assume each driver had the ELO they started the season with. I would then add up all of the ELO adjustments from their races that season and calculate a new ELO for next season. I'm sure this means that my ELO scores are not as accurate as they could be, but I do not think calculating on the fly would be feasible.
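The end-of-season approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual CFR Organized Play code: it treats a multi-driver race as a set of head-to-head results (every driver "beats" everyone who finished behind them), uses the conventional 400-point scale, and applies a single fixed K. All names here are hypothetical.

```python
from itertools import combinations


def expected_score(rating_a, rating_b, scale=400.0):
    """Standard ELO win probability for A over B (scale=400 is assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def season_update(start_ratings, races, k=32.0):
    """Return next season's ratings.

    start_ratings: dict of driver -> rating at the start of the season.
    races: list of finishing orders, each a list of driver names, winner first.
    Every race is scored against the start-of-season ratings; adjustments
    are only summed and applied once, after the last race.
    """
    deltas = {driver: 0.0 for driver in start_ratings}
    for finishing_order in races:
        # combinations() keeps input order, so `ahead` finished before `behind`.
        for ahead, behind in combinations(finishing_order, 2):
            e_ahead = expected_score(start_ratings[ahead], start_ratings[behind])
            gain = k * (1.0 - e_ahead)   # assumed pairwise scoring
            deltas[ahead] += gain
            deltas[behind] -= gain
    return {d: start_ratings[d] + deltas[d] for d in start_ratings}


# Hypothetical season: three evenly rated drivers, two races.
start = {"A": 1500.0, "B": 1500.0, "C": 1500.0}
new = season_update(start, [["A", "B", "C"], ["A", "C", "B"]])
```

Because the ratings are frozen during the season, the order in which results arrive does not matter, which is what makes this workable when a PBeM race overlaps several live events.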
Playing around with K
ELO calculations have a variable called K that controls how far a rating moves after each result, and it tends to get tweaked by people who use ELO in different situations. I ended up using K as a way to express my confidence in a particular driver's ELO rating, which is how I addressed silos.