Tuesday, October 13, 2020

Changes For 2021 Organized Play

The CFR OP Trophy

We have just completed the fourth year of Organized Play for Championship Formula Racing and the ranking system has largely remained the same over that period of time.  One of the main things I tried to do with the original system was to give different values to different races based on the competition.  In theory, a win against a bunch of strong competitors should be worth more than a win against a group of rookies.  The trick is judging the competition.

What I have done up to now is to count a driver's best race finishes over the last two years.  I would then use that value to judge that driver's ability.  The ability of the drivers in a race would add or subtract points from the value of that race.

The main problem with that concept is that it is circular.  If I win 5 races against relative newbies I'm going to look like a world beater.  But I can't judge the value of those wins until I figure out the value of those wins... 

I also had a siloed community problem.  If a group of drivers mostly race only against each other, how do I compare them to others?

Late this season, I decided to try to improve the system.


Goals

My main goal was to try to improve the system's ability to value different races and tournaments.  Specifically and in relative order of importance:

  • Improve the system's ability to value individual drivers
  • Do something about silos
  • Prevent radically high tournament ratings
  • Create a more predictable and consistent ranking system
  • Provide points for ladder sub-tournaments
  • Rebalance tournaments in relation to races


Ratings vs. Rankings

CFR Organized Play is all about figuring out which driver had the best season -- what I would call a ranking.  It is not about figuring out who the best driver is -- what I would call a rating.  

This is a distinction that happens (perhaps unintentionally) in sports all the time.  The winner of the NBA Finals may not be the team that was arguably the best team in the NBA that year.  But it was the team that won the playoffs and thus had the best season as defined by the NBA.

The NBA and most professional sports leagues are able to easily ignore ratings when crowning a champion because they can fairly structure seasons and playoffs to give everyone a similar chance and similar competition.

CFR Organized Play does not do that.  Instead, it simply observes everything going on as organized by various people and has to judge a champion.  This is where ratings come into play.  The system uses ratings to try to figure out how much value to give every race because it cannot rely on some overall structure to keep things even.

I bring this up as a preface to talking about ELO and ratings because at the end of the day, these are just tools for trying to fairly figure out a ranking.  Driver ratings have never been goals in and of themselves for CFR Organized Play.  And... I'll talk about that more a little further down this page.


ELO to the Rescue

I've been using ELO (the Elo rating system, originally designed for chess) a lot over the last couple of years in a slightly related project to rank and categorize the best F1 drivers of all time (more on that much later).  So I felt comfortable using ELO as the basis of how good individual drivers are.  For those not already familiar, ELO can be used to figure out how likely one person is to beat another person based on the difference in ELO scores between the players.  After each event, the actual results are compared to what ELO thought would happen and adjustments are made to each player's ELO scores.
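For readers who want to see the arithmetic, here is a minimal sketch of the standard two-player ELO calculation.  It uses the conventional 400-point scale from chess; the function names are made up for illustration, and nothing here shows how this ranking system adapts ELO to races with many drivers.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B under standard ELO (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_adjustment(rating_a: float, rating_b: float, actual: float, k: float = 20.0) -> float:
    """Rating change for A after one result.

    actual is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k = 20.0 is only a placeholder default.
    """
    return k * (actual - expected_score(rating_a, rating_b))


# Two evenly rated players: each is expected to score 0.5,
# so a win moves the winner up by half of K.
print(expected_score(1000, 1000))            # 0.5
print(elo_adjustment(1000, 1000, 1.0, 20))   # 10.0
```

The bigger the upset, the bigger the adjustment: beating a much higher rated opponent moves a rating far more than beating a much lower rated one.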

A complication with using ELO is that, as described above, ELO works best if it is updated after every game or short event.  But I don't always get race results immediately after the race, and PBeM races take months to end.  How does that work with ELO if, in the middle of that months-long race, some of the drivers participate in 3 live, in-person events?  What's their ELO?

So, I decided that I would only recalculate ELO at the end of every season.  Every race that season would assume each driver had the ELO they started the season with.  I would then add up all of the ELO adjustments from their races that season and calculate a new ELO for next season.  I'm sure this means that my ELO scores are not as accurate as they could be, but I do not think calculating on the fly would be feasible.
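The once-a-season update could be sketched roughly as follows.  This is an illustration under assumptions, not the actual calculation: it treats each race as a set of head-to-head results between every pair of finishers, uses one K for everyone, and the names `season_update` and `races` are hypothetical.  What it does show is the key idea above: every race reads the ratings from the start of the season, and the adjustments are only summed and applied at the end.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard ELO expected score for A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def season_update(start_elo: dict, races: list, k: float = 20.0) -> dict:
    """Return next season's ratings.

    start_elo: driver name -> ELO at the start of the season.
    races: list of finishing orders (driver names, winner first).
    ASSUMPTION: each pair of finishers counts as one head-to-head result.
    """
    delta = defaultdict(float)
    for order in races:
        for i, ahead in enumerate(order):
            for behind in order[i + 1:]:
                # Always use season-start ratings, never mid-season ones.
                change = k * (1.0 - expected_score(start_elo[ahead], start_elo[behind]))
                delta[ahead] += change
                delta[behind] -= change
    return {driver: rating + delta[driver] for driver, rating in start_elo.items()}


start = {"A": 1000.0, "B": 1000.0, "C": 1000.0}
results = [["A", "B", "C"], ["A", "C", "B"]]
print(season_update(start, results))
```

Because the adjustments are only applied once, the order in which race results arrive does not matter, which is the property that makes slow PBeM races manageable.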


Playing around with K

ELO calculations have a variable called K that tends to get tweaked by people who use ELO in different situations.  K controls how much a rating can move after a single result.  What I ended up doing with K is using it as a way to express my confidence in a particular driver's ELO rating, which is how I ended up addressing silos.

First I wanted to measure my silo problem and make sure it exists.  So I crunched some numbers.  

So, yes.  Most drivers in a Detroit or San Marino race only ever race in those series.

In the charts on the right, you can see every community I identified and the percentage of "silo" drivers in an average race.  Red cars represent drivers who never leave that series and blue cars represent drivers who have participated in at least one race in another community.

Why is this important?  As good as ELO is, if two groups of game players never mix, their ELO really only tells you how good they are within their community.  The less crossover, the less confident I should really be about the accuracy of an ELO score.  Let's think of it this way.

We have a chess tournament with 4 players.  The top 2 players end up playing each other while the bottom 2 end up playing each other.  If the opponents in this tournament never change, ELO will tell you that the best and 2nd worst player at this tournament are equally good because they both won their games.  It will also tell you that the 2nd best and worst player are the same.

Of course once you mix the opponents up it will not take ELO long to figure out what is really going on.
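The chess example can be played out in a few lines.  This is a toy simulation under simple assumptions: everyone starts at 1000, the stronger player always wins, K is 20, and the player names are just labels.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard ELO expected score for A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play(ratings: dict, winner: str, loser: str, k: float = 20.0) -> None:
    """Update both ratings in place after one decisive game."""
    change = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += change
    ratings[loser] -= change


ratings = {"best": 1000.0, "second": 1000.0, "third": 1000.0, "worst": 1000.0}

# Siloed phase: the same two pairings over and over.
for _ in range(10):
    play(ratings, "best", "second")
    play(ratings, "third", "worst")
siloed = dict(ratings)
print(siloed["best"] == siloed["third"])    # True: ELO cannot tell them apart
print(siloed["second"] == siloed["worst"])  # True

# Mixed phase: a single round against new opponents separates them.
play(ratings, "best", "third")
play(ratings, "second", "worst")
print(ratings["best"] > ratings["third"])   # True
```

Within each silo the ratings look perfectly reasonable.  It is only the comparison across silos that is meaningless until someone crosses over.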

So what I wanted to do was change my K value depending on how confident I felt about ELOs.  What this does is reduce the amount a driver's ELO changes when racing against drivers who don't get out much.

I started thinking about this from a community perspective.  But I figured out that it is really a bit more complicated than the chart on the right.  For instance, large ladder series like Redscape and P1 look very different if you look at the races at the top end instead of the bottom end -- where new drivers usually come in.

So what I ended up doing was calculating a K value for every driver in the rankings.  That K value is based on how many races the driver has raced outside of their main community.  A driver's K ends up being 5, 10, 15, or 20.
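As a rough illustration, a per-driver K lookup might look like the sketch below.  The four possible values (5, 10, 15, 20) come from the description above, but the cutoffs in `thresholds` are invented placeholders, since the actual number of outside races needed for each step is not given here.

```python
def driver_k(outside_races: int, thresholds=(1, 3, 6)) -> int:
    """Map a count of races outside a driver's main community to K.

    Returns 5, 10, 15, or 20.
    ASSUMPTION: the cutoffs (1, 3, 6) are hypothetical, chosen only
    so the example runs; more outside races means a higher K.
    """
    k = 5
    for cutoff in thresholds:
        if outside_races >= cutoff:
            k += 5
    return k


for n in (0, 1, 3, 6):
    print(n, driver_k(n))
# 0 5
# 1 10
# 3 15
# 6 20
```

A driver who never leaves their home series gets the smallest K, so results involving that driver move ratings the least.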

Remember that K values have no direct impact on rankings.  No one will get more points or fewer points from a race against drivers with higher or lower K values.  Also note that I'm not throwing any shade on Detroit and San Marino or any future outpost of CFR.


Smoothing out the Scores

So what am I doing with all these ELO scores?  The average of the top 10 ELO scores in any given race defines that race's score multiplier.  The average of the top 20 ELO scores in any given tournament or season defines that tournament's score multiplier.  This is similar to how the system works now but with a couple of important changes.

First off, I'm not using raw ELO.  I assign the highest ELO in the land a value of 1.75 and the lowest a value of 0.5.  And then I scale everyone else's values in between.  ELOs above 1000 get to be above 1 while ELOs below 1000 are below 1.  These numbers are tweaked to provide what I consider to be enough value difference without ending up with a race or tournament that has a really high or low value compared to everything else.
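One way to picture the scaling is the sketch below.  The end points (0.5, 1, and 1.75) and the top-10 average come from the description above.  The straight-line interpolation on each side of 1000 is an assumption, as are the function names, and the example ELO values are made up.

```python
def scale_elo(elo: float, lowest: float, highest: float, pivot: float = 1000.0) -> float:
    """Scale a raw ELO to a value between 0.5 and 1.75.

    ASSUMPTION: linear on each side of the pivot, so lowest -> 0.5,
    pivot -> 1.0, highest -> 1.75.  Requires lowest < pivot < highest.
    """
    if elo >= pivot:
        return 1.0 + 0.75 * (elo - pivot) / (highest - pivot)
    return 1.0 - 0.5 * (pivot - elo) / (pivot - lowest)


def event_multiplier(elos: list, lowest: float, highest: float, top_n: int = 10) -> float:
    """Average the scaled values of the top_n drivers in an event.

    Use top_n=10 for a race and top_n=20 for a tournament.
    elos must contain at least one rating.
    """
    best = sorted(elos, reverse=True)[:top_n]
    scaled = [scale_elo(e, lowest, highest) for e in best]
    return sum(scaled) / len(scaled)


# Made-up field: the highest ELO anywhere is 1300 and the lowest is 700.
print(scale_elo(1150, 700, 1300))                # 1.375
print(event_multiplier([1300, 700], 700, 1300))  # 1.125
```

Because every scaled value sits between 0.5 and 1.75, the average can never leave that range, which is what keeps any one race or tournament from getting a runaway multiplier.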

A corollary to the above scaling is that I'm no longer adjusting scores if the event is live as opposed to asynchronous, and I'm not rewarding an event or race for having more or fewer drivers.  This should remove the possibility for people to game the system and makes things more straightforward and less complicated.

Also, because ELOs do not change mid-season, race and event scores will not change after the event is scored.  Because the current system was constantly adjusting the weight of races and events, scores would change seemingly randomly over the course of a season.  The new approach will make everything much more predictable and consistent.


What Value Tournaments?

The next question I wanted to deal with is how much a tournament should be worth in relation to races.  At the same time I also thought about how many races and tournaments should count towards the Organized Play championship.

There wasn't any magic or complicated math here.  I picked a bunch of different values until I ended up in a spot I liked.

Going forward, only the top 3 finishers in an event score points, and they score 1/2 the value of a race.  So, while the winner of a race will get 23 points times that race's multiplier, the winner of a tournament will gain 11.5 points times that tournament's multiplier.
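The arithmetic works out as in the sketch below.  The 23 points for a race win, the top-3 cutoff, and the 1/2 factor come from the paragraph above.  Points for other finishing positions are not listed here, so the function takes the race value for a position as an input rather than guessing at a table, and the 1.2 multiplier is a made-up example.

```python
RACE_WIN_POINTS = 23.0  # race winner's base points, as stated above


def race_points(base_points: float, multiplier: float) -> float:
    """Points for a race finish: base value times the race's multiplier."""
    return base_points * multiplier


def tournament_points(position: int, race_base_points: float, multiplier: float) -> float:
    """Points for a tournament finish.

    Only positions 1 through 3 score, at half of what that position
    would be worth in a race, times the tournament's multiplier.
    """
    if position > 3:
        return 0.0
    return 0.5 * race_base_points * multiplier


# With a multiplier of 1.2 (made up), a race win is worth about 27.6
# points and a tournament win about 13.8.
print(race_points(RACE_WIN_POINTS, 1.2))
print(tournament_points(1, RACE_WIN_POINTS, 1.2))
```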

In the end I decided that points from the top 2 tournaments and top 5 races seemed right still, so that stays as it was.
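Putting a season together is then a matter of keeping a driver's best results.  A minimal sketch, assuming each score in the lists has already had its multiplier applied (the function name and sample numbers are made up):

```python
def season_total(race_scores: list, tournament_scores: list,
                 best_races: int = 5, best_tournaments: int = 2) -> float:
    """Sum a driver's top 5 race scores and top 2 tournament scores."""
    top_races = sorted(race_scores, reverse=True)[:best_races]
    top_tournaments = sorted(tournament_scores, reverse=True)[:best_tournaments]
    return sum(top_races) + sum(top_tournaments)


# Seven races and three tournaments; only the best 5 and best 2 count.
print(season_total([10, 20, 5, 8, 12, 30, 1], [11.5, 4, 9]))  # 100.5
```

A driver with fewer than 5 races or 2 tournaments simply counts everything they have.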


What Exactly is a Tournament Anyways?

At this point, I've dealt with all of the really important things I wanted to deal with.  But several people had brought up an interesting point.  In the 2 large ladder series, we have groups of people who participate in a series of races but do not get counted as a tournament of their own.  

So when I was tinkering with tournament points I kept this in mind and broke out a couple seasons of ladder series to see how this would go.  Going forward, I will be counting ladder sub-series as their own series and not part of the greater event.

It doesn't devalue the higher series and gives some tournament points to more people.


Math?

If you want some more formulas and math... check out this page.

2 comments:

  1. When does this new approach take effect?

  2. Since it is never defined, I am left to guess that ELO is an abbreviation for: Elves Laughing Outloud (ELO).
