Chess.com ratings are so inaccurate

basketstorm
wiredtearow wrote:

What I was trying to argue is that the more games you play, the more you'll reach a more accurate rating.

No. Not even after 10 thousand games. This isn't coin flipping. I'm sorry but you don't know what you are talking about.

wiredtearow
theNobody14161 wrote:
wiredtearow wrote:
theNobody14161 wrote:
theNobody14161 wrote:
David wrote:

All that means is that you've had a bad run of results lately and your rating has dipped below where it has historically been. You can see this in your rating graph. If you play more games, your rating should return to its usual level

We will know in about two weeks, provided I have time to play

So, finally got time to play, and yeah, raising the ratings of my opponents seems to help my rating climb considerably. While it is perhaps more evidence of a rating settling point issue inherent to rating calculations on chess.com and pertinent to this topic, I also wonder if there has ever been any attempt to standardize ratings. Is a 1200 FIDE today the same as a 1200 FIDE 20 years ago if there is nothing to stop rating deflation? Others seem to comment that 1200 on chess.com isn't quite the same as 1200 on lichess, just because the player pools are different.

I'd say that it's slightly inaccurate until you reach a rating where your odds of winning are 50-50. If you're at a level where the odds are imbalanced, you're either too strong or too weak for that rating. The beauty of it is that, over time, the more games you play, the more likely it is that you'll reach a more accurate rating. It's a self-correcting system that only needs to focus on keeping the games fair, and that's exactly what they're trying to do.

When it comes to standardizing ratings, FIDE doesn't necessarily need to align/standardize the rating system for itself, chess.com, and lichess. It has no real responsibility or need to do that. It could stay the way it is where people's ratings in FIDE, chess.com, and lichess are segregated. It's more straightforward that way to me. These are completely different chess environments offering very different experiences so it makes sense that the ratings from these stay individual from each other.

The whole point of my post is that, no, ratings will not converge in a self-correcting manner. They will converge at different points depending on whether your opponents are rated higher than you or not.

Why is everyone misinterpreting my comment when I said that ratings are "self correcting"? I clarified it in a later comment but yeah. This isn't what I was referring to when I said that the rating system is self correcting.

wiredtearow
basketstorm wrote:
wiredtearow wrote:

What I was trying to argue is that the more games you play, the more you'll reach a more accurate rating.

No. Not even after 10 thousand games. This isn't coin flipping. I'm sorry but you don't know what you are talking about.

I know that it isn't a coin flip. But am I crazy in saying that it's a universal experience that after 500+ games, people actually struggle with people rated higher than them? Is my experience unusual?

You say that I don't know what I'm talking about. But you're the one who is refusing to elaborate your point even though it is the bolder claim. Bolder claims need to be substantiated more since they challenge the norm. So go on, elaborate it if you know what you're talking about.

basketstorm

Why did you dislike my comment? This is very rude. Before disliking my comment do some actual calculations and account for every factor that we have here on chess.com and you will see that I was right.

basketstorm
wiredtearow wrote:
basketstorm wrote:
wiredtearow wrote:

What I was trying to argue is that the more games you play, the more you'll reach a more accurate rating.

No. Not even after 10 thousand games. This isn't coin flipping. I'm sorry but you don't know what you are talking about.

I know that it isn't a coin flip. But am I crazy in saying that it's a universal experience that after 500+ games, people actually struggle with people rated higher than them? Is my experience unusual?

You say that I don't know what I'm talking about. But you're the one who is refusing to elaborate your point even though it is the bolder claim. Bolder claims need to be substantiated more since they challenge the norm. So go on, elaborate it if you know what you're talking about.

I already explained everything and provided calculations, data, and references to Arpad Elo's book. It is you who is making bold claims. My claims are backed by real data and calculations.

If you don't feel comfortable with math, that's ok. Turn on the logic then: no matter how long you average INACCURATE data, you will not gain any accuracy; the result will fluctuate forever.
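
One way to read the averaging argument above is as the difference between random noise and systematic bias. The sketch below is only an illustration under made-up numbers (a "true" value of 1500, a constant bias of +50, noise with a standard deviation of 100); the function name and all parameters are hypothetical and are not taken from chess.com or from Elo's book.

```python
import random

def average_of_measurements(true_value, bias, noise_sd, n, seed=0):
    """Average n noisy measurements that all share the same constant bias."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Each measurement = truth + systematic bias + zero-mean random noise.
        total += true_value + bias + rng.gauss(0.0, noise_sd)
    return total / n

few = average_of_measurements(1500, bias=50, noise_sd=100, n=10)
many = average_of_measurements(1500, bias=50, noise_sd=100, n=100_000)
print(f"10 samples:      {few:.1f}")
print(f"100000 samples:  {many:.1f}")
```

Under these assumptions, more samples shrink the random scatter, but the average settles near 1550 rather than 1500: averaging removes noise, not bias. Whether rating errors on a real site behave like bias or like noise is exactly what the thread is arguing about.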

wiredtearow
basketstorm wrote:

Why did you dislike my comment? This is very rude. Before disliking my comment do some actual calculations and account for every factor that we have here on chess.com and you will see that I was right.

I've already elaborated my position. It's your turn to elaborate yours. The thing is, I'm actually open to learning more about it. It's just that you're not doing a good job of convincing me. Trust me. I'm not the type to push for something that's clearly proven wrong already. You're just being rude and unconvincing.

wiredtearow
basketstorm wrote:
wiredtearow wrote:
basketstorm wrote:
wiredtearow wrote:

What I was trying to argue is that the more games you play, the more you'll reach a more accurate rating.

No. Not even after 10 thousand games. This isn't coin flipping. I'm sorry but you don't know what you are talking about.

I know that it isn't a coin flip. But am I crazy in saying that it's a universal experience that after 500+ games, people actually struggle with people rated higher than them? Is my experience unusual?

You say that I don't know what I'm talking about. But you're the one who is refusing to elaborate your point even though it is the bolder claim. Bolder claims need to be substantiated more since they challenge the norm. So go on, elaborate it if you know what you're talking about.

I already explained everything and provided calculations, data, and references to Arpad Elo's book. It is you who is making bold claims. My claims are backed by real data and calculations.

If you don't feel comfortable with math, that's ok. Turn on the logic then: no matter how long you average INACCURATE data, you will not gain any accuracy; the result will fluctuate forever.

Oh yeah. In this case, I agree fully. Especially considering that some players actually are on and off. Returning players will be rusty, cheating will have to be accounted for. I have no argument that rating will ALWAYS fluctuate. But doesn't it stabilize again after a few games? Will it not self correct to give you your "new actual" rating? Like how do you even address this? Are you suggesting that chess.com should have "assessment games" again for returning players?

wiredtearow

I guess my point is, what's wrong if rating fluctuates? It does its job better that way. It helps you assess how you fare against other people.

If anything, the idea that it fluctuates actually helps it become more accurate since it constantly adjusts based on your capability relative to other players.

ChewbaccaPlaysChess123

yeah, my rapid rating is 570 but whenever I do game review on one of my games it shows I play like 1000+.

basketstorm

No, it does not stabilize and it does not self-correct. You would think that it does, especially when everyone claims that. But I'll repeat: there's pool isolation (by rating bands, by time zones, by time controls) plus inherent inaccuracies, and you cannot turn that into something stable and accurate, no way. Just try running some simulations and you will see.

Returning players: chess.com applies a temporary RD increase (which affects the K-factor) for returning players. That still does not solve the issue.

As to suggestions: first, players must be unrated (or their rating should simply not affect the ratings of other players) until they have played N games (something like 30-50) against rated players, to avoid the initial inaccuracy introduced at sign-up. Matchmaking should also be forcibly wider, not just within a narrow band of ratings.

And special pool-joining games must be arranged regularly. For that, chess.com needs to build player graphs, outline the individual pools, and pair players between pools. After such special games, the ratings of the whole pool must be recalculated. That would mean that your rating would go up or down every day even if you didn't play that day, just because you belong to a certain pool and the system got more information about the actual strength of your pool. This is not part of Glicko. Glicko is overly simplistic: it is advertised as an improvement over Elo, but in fact Arpad Elo himself predicted all the issues that we have in online chess and proposed solutions. Glicko is inferior to Elo's ideas.
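
The "run some simulations" suggestion can be tried with a toy model. The sketch below is not chess.com's system: it assumes plain Elo with a fixed K of 20 (chess.com uses Glicko with a rating deviation), two five-player pools that never play each other, and invented "true strength" values. Every name and number in it is hypothetical.

```python
import random

def expected_score(r_a, r_b):
    """Standard Elo expected score of a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def simulate_pool(true_strengths, start=1200.0, k=20.0, games=20000, seed=0):
    """Play random games inside one isolated pool; return the final ratings."""
    rng = random.Random(seed)
    ratings = [start] * len(true_strengths)
    for _ in range(games):
        a, b = rng.sample(range(len(ratings)), 2)
        # The game outcome is drawn from the (hidden) true strengths.
        p_win = expected_score(true_strengths[a], true_strengths[b])
        score = 1.0 if rng.random() < p_win else 0.0
        # The rating change is computed from the displayed ratings.
        delta = k * (score - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta
    return ratings

weak_pool = simulate_pool([800, 900, 1000, 1100, 1200])
strong_pool = simulate_pool([1200, 1300, 1400, 1500, 1600], seed=1)

print("weak pool mean:  ", round(sum(weak_pool) / len(weak_pool), 3))
print("strong pool mean:", round(sum(strong_pool) / len(strong_pool), 3))
```

In this model each update adds to one player exactly what it takes from the other, so the average rating of an isolated pool stays at its 1200 starting value however strong the pool really is. The ordering inside each pool roughly follows the true strengths, but nothing ties the two scales together unless players from different pools actually meet.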

wiredtearow
basketstorm wrote:

No, it does not stabilize and it does not self-correct. You would think that it does, especially when everyone claims that. But I'll repeat: there's pool isolation (by rating bands, by time zones, by time controls) plus inherent inaccuracies, and you cannot turn that into something stable and accurate, no way. Just try running some simulations and you will see.

Returning players: chess.com applies a temporary RD increase (which affects the K-factor) for returning players. That still does not solve the issue.

As to suggestions: first, players must be unrated (or their rating should simply not affect the ratings of other players) until they have played N games (something like 30-50) against rated players, to avoid the initial inaccuracy introduced at sign-up. Matchmaking should also be forcibly wider, not just within a narrow band of ratings.

And special pool-joining games must be arranged regularly. For that, chess.com needs to build player graphs, outline the individual pools, and pair players between pools. After such special games, the ratings of the whole pool must be recalculated. That would mean that your rating would go up or down every day even if you didn't play that day, just because you belong to a certain pool and the system got more information about the actual strength of your pool. This is not part of Glicko. Glicko is overly simplistic: it is advertised as an improvement over Elo, but in fact Arpad Elo himself predicted all the issues that we have in online chess and proposed solutions. Glicko is inferior to Elo's ideas.

Yeah I guess that's a good idea. I would imagine that some people actually start playing unrated and ease up on the gas once they reach a certain rating. In that case, it doesn't reflect their rating anymore. What do you think about removing dormant accounts from the calculation? Meaning, players who haven't played rapid in 3 months should be removed from the percentile calculation?

David

Lichess uses Glicko-2 - if your system is as superior as you claim, I'm sure they'll be happy to implement it. Chess.com would rather invest its resources into something that might attract new players to the game rather than satisfy some obscure corner of the existing user base.

wiredtearow
David wrote:

Lichess uses Glicko-2 - if your system is as superior as you claim, I'm sure they'll be happy to implement it. Chess.com would rather invest its resources into something that might attract new players to the game rather than satisfy some obscure corner of the existing user base.

Yeah I don't really know how the 2 rating systems compare but I'm fully with this idea. For what it's worth, chess.com ratings actually work. Why are people so stressed with being rated accurately? As long as I get matched with people who are on par with me, and if my rating helps me assess my capabilities against other players, what's actually the problem?

basketstorm

The leaderboard histograms integrate to different numbers than the ones displayed under the graphs. That tells me that their information on percentiles might be outdated or incorrect. Anyway, these are just general stats, so it shouldn't matter too much.

basketstorm
David wrote:

Lichess uses Glicko-2 - if your system is as superior as you claim, I'm sure they'll be happy to implement it. Chess.com would rather invest its resources into something that might attract new players to the game rather than satisfy some obscure corner of the existing user base.

Glicko-2 adds volatility, just another factor to calculate the K-factor, but that is still the same increment-based system which does no maintenance to the player pool.
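
For readers unfamiliar with the term, "increment-based" refers to updating a rating by a step proportional to the gap between the actual and the expected result. The sketch below uses the plain Elo formula with a fixed K as a stand-in; Glicko and Glicko-2 derive the step size from the rating deviation (and, in Glicko-2, volatility) rather than from a fixed K, so this is a simplified illustration and the function names are hypothetical.

```python
def expected_score(rating, opponent):
    """Standard Elo expected score against a given opponent rating."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def elo_update(rating, opponent, score, k):
    """Return the new rating after one game (score: 1 win, 0.5 draw, 0 loss)."""
    return rating + k * (score - expected_score(rating, opponent))

# Same win against an equally rated opponent, two different step sizes.
small_step = elo_update(1200, 1200, 1.0, k=16)
large_step = elo_update(1200, 1200, 1.0, k=40)
print(small_step, large_step)
```

A larger step size, as for a returning player with a temporarily raised RD, moves the rating faster in either direction. The update only ever looks at the two players in the game, which is the sense in which it does no maintenance of the pool as a whole.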

basketstorm

And I don't think lichess has resources to do any global changes. Chess.com with their revenues - maybe

David

Here's the thing: the ratings might not be perfectly accurate but they are accurate enough - there's no benefit in doing a whole bunch of work to change a system that works perfectly fine for the overwhelming majority of situations.

There's an enormous difference between 1|0 and 2|1, yet results for both are grouped together as "Bullet" - you could argue the same for Blitz and Rapid. Do we really want a rating for every single time control that anyone has ever played? Chess.com might get around to adding a Hyperbullet rating if the 30s type hyper bullet games gain more traction, or even just to get Arkadiy Khromaev off the top of the leaderboard, but it makes zero sense to dedicate any significant amount of computing or development resources into a project such as this.

basketstorm
David wrote:

Here's the thing: the ratings might not be perfectly accurate but they are accurate enough - there's no benefit in doing a whole bunch of work to change a system that works perfectly fine for the overwhelming majority of situations.

There's an enormous difference between 1|0 and 2|1, yet results for both are grouped together as "Bullet" - you could argue the same for Blitz and Rapid. Do we really want a rating for every single time control that anyone has ever played? Chess.com might get around to adding a Hyperbullet rating if the 30s type hyper bullet games gain more traction, or even just to get Arkadiy Khromaev off the top of the leaderboard, but it makes zero sense to dedicate any significant amount of computing or development resources into a project such as this.

Oh no, my comment is gone. I was saying that there is a point. FIDE found that their rating differences did not represent the strength differences (as reflected in game outcomes) and did not hesitate to recalculate all ratings below 2000. It happened this year. Players got new ratings (bumped up for most).

Game outcomes are the real data, not the value you obtained through increments (the rating) in the hope that things will balance out on their own. If you have the game outcomes, you know the strength differences, and if there is enough cross-play you can build the graph and assign ratings from scratch.

Time controls - yes, better to have a single control per category, or more categories; it's not wise to mix. For example, FIDE typically uses 3|2 for Blitz.
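
The idea of assigning ratings from scratch from game outcomes can be sketched with a Bradley-Terry fit, which uses the same win-probability curve as Elo. This is one possible approach, not something the thread attributes to FIDE or chess.com. The function name, the 1500 anchor, and the sample results are invented, and the sketch assumes no draws, a connected graph of games, and at least one win and one loss per player.

```python
import math

def fit_ratings(results, iterations=500, anchor=1500.0):
    """Fit Elo-scale ratings to a list of (winner, loser) pairs."""
    players = sorted({p for game in results for p in game})
    wins = {p: 0 for p in players}
    pair_games = {}  # frozenset({a, b}) -> number of games between a and b
    for winner, loser in results:
        wins[winner] += 1
        key = frozenset((winner, loser))
        pair_games[key] = pair_games.get(key, 0) + 1

    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        new = {}
        for p in players:
            denom = 0.0
            for key, n in pair_games.items():
                if p in key:
                    (q,) = key - {p}
                    denom += n / (strength[p] + strength[q])
            new[p] = wins[p] / denom  # standard iterative Bradley-Terry update
        # Fix the scale: keep the geometric mean of the strengths at 1.
        log_mean = sum(math.log(v) for v in new.values()) / len(new)
        strength = {p: v / math.exp(log_mean) for p, v in new.items()}

    return {p: anchor + 400.0 * math.log10(s) for p, s in strength.items()}

# Hypothetical results: A beats B 3-1, B beats C 3-1, A beats C 9-1.
results = (
    [("A", "B")] * 3 + [("B", "A")] * 1
    + [("B", "C")] * 3 + [("C", "B")] * 1
    + [("A", "C")] * 9 + [("C", "A")] * 1
)
ratings = fit_ratings(results)
for name, value in ratings.items():
    print(name, round(value))
```

The sample data was chosen so that the fit puts A, B, and C roughly 191 points apart, with B at the anchor. The result depends only on who beat whom, not on the order of the games or on any starting rating, but it needs enough cross-play: two groups that never meet cannot be placed on a common scale this way either.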

wiredtearow
basketstorm wrote:
David wrote:

Lichess uses Glicko-2 - if your system is as superior as you claim, I'm sure they'll be happy to implement it. Chess.com would rather invest its resources into something that might attract new players to the game rather than satisfy some obscure corner of the existing user base.

Glicko-2 adds volatility, just another factor to calculate the K-factor, but that is still the same increment-based system which does no maintenance to the player pool.

Yeah, to me, even though the rating might be "inaccurate" or not 100% correct, I do think that the rating system does what it's supposed to do. Let's say that after some point, a player's level of performance isn't the same anymore. I just think that the current rating system will eventually match them to people on par with their skill level. In that sense, I still find the rating system adequate. It's not really game breaking.

Most people who complain about the ratings are frustrated players who are stuck at a certain Elo, and I would imagine that they won't disappear no matter how accurately we calculate the ratings. In that sense, I agree with David. Complaining about a completely working system that has very minor flaws is just nitpicking.

And I feel like, more than actually praying that chess.com addresses this gap in the rating system, I do think that you just want your point to be validated. In that case, I think that there's truth in what you're saying but I don't really find huge faults in the rating system that is worth addressing urgently. More than the rating system, I do hope that they spend their efforts in continuously improving their cheat detection system.

Vardhansshah

Thanks