andre_beton

The History of the Best Rikishi of All Times (Video)


I spent some time trying to visualise the (modern) history of professional sumo using machine learning and developed an algorithm (similar in idea to Elo etc.) that models it quite well in my opinion, although I am far from being an expert, especially on anything before the 1990s. I would welcome any feedback so that I can make further improvements and perhaps use it to create tournament predictions, too. Anyway, the video:

 


Wow, this is incredibly good stuff, especially on the visualization end! Some people on the forum have worked with Elo-type data for rikishi (Doitsuyama, Gurowake, and myself included). I will definitely have a look at my old data and compare it with yours. It would be interesting to know more about your approach. In particular, I was curious about the machine learning part. If I had the skills for that, an ML approach is something I would have tried years ago.
One thing I noticed is that the average ratings have gone up quite substantially over the last 100+ years. Of course, one could make the case that sumo wrestlers are much better now than before, but it could also be a statistical artifact of rating inflation, which plagues so many Elo-type approaches.


My approach is inspired by Elo, Trueskill, WHR, etc. The main conceptual improvement over Elo is that my algorithm 'smooths' back and forward until convergence; that is, ratings depend on bouts both before and after a given date. This makes it much more useful from a historian's perspective.
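
To make the 'smooth back and forward until convergence' idea concrete, here is a minimal Python sketch. It is not the algorithm used for the video, whose details are not given in this thread. It assumes a Bradley-Terry win model on a natural-log scale, one rating per wrestler per time period, and a quadratic penalty on rating changes between neighbouring periods. The function name `fit_smoothed_ratings` and the parameters `drift`, `prior`, `lr` and `tol` are hypothetical.

```python
import math

def fit_smoothed_ratings(bouts, n_periods, drift=1.0, prior=0.1,
                         lr=0.1, tol=1e-6, max_iter=10000):
    """Toy forward/backward smoothing (an assumption, not the author's method).

    bouts: list of (period_index, winner, loser).
    Returns {name: [rating per period]} on a natural-log scale.
    """
    names = sorted({n for _, w, l in bouts for n in (w, l)})
    r = {n: [0.0] * n_periods for n in names}
    for _ in range(max_iter):
        grad = {n: [0.0] * n_periods for n in names}
        # Likelihood term: each bout pushes the winner up and the loser down.
        for t, w, l in bouts:
            p_win = 1.0 / (1.0 + math.exp(r[l][t] - r[w][t]))
            grad[w][t] += 1.0 - p_win
            grad[l][t] -= 1.0 - p_win
        # Smoothness term: each rating is pulled towards its neighbours in
        # time (earlier AND later), plus a weak pull towards zero.
        for n in names:
            for t in range(n_periods):
                grad[n][t] -= prior * r[n][t]
                if t > 0:
                    grad[n][t] -= drift * (r[n][t] - r[n][t - 1])
                if t < n_periods - 1:
                    grad[n][t] -= drift * (r[n][t] - r[n][t + 1])
        # Plain gradient ascent; stop once no rating moves by more than tol.
        step = 0.0
        for n in names:
            for t in range(n_periods):
                delta = lr * grad[n][t]
                r[n][t] += delta
                step = max(step, abs(delta))
        if step < tol:
            break
    return r

ratings = fit_smoothed_ratings(
    [(0, "A", "B"), (2, "A", "B"), (2, "A", "B")], n_periods=3)
```

In this toy data there are no bouts in period 1, yet A's period-1 rating still comes out positive because it is pulled towards both neighbouring periods. Plain Elo would simply carry the period-0 value forward.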

Would you not classify Elo as machine learning?

Re inflation: the main reason here is that during the earlier years the data only includes the top division, and later more and more divisions are included. This of course leads to a wider spread of the ratings. It is a very interesting problem to figure out whether today's rikishi are better than their predecessors; I think a lot of work has been done on this question for chess.


Wajima appears to be named Hanakago (his heya).

Edited by Asojima

Yep, something went wrong with Wajima Hiroshi & Arase Nagahide. My limited pre-Eurosport Sumo knowledge clearly shows here.


I am no data scientist, and I must admit that I had never heard of Trueskill or WHR before you mentioned them. I noticed that when Takanohana did not fight for almost a year in 2001/02, your graph showed a decline in his rating. But that makes perfect sense when your algorithm also takes the future into account. When I was hooked on Elo (until around 2017), I did it mainly to be good at sumo games, and there you have to work with the data of the past. But again, your perspective is super-interesting to me and I definitely appreciate the effort.

The "problem" with all these approaches is that they all have some face validity. In order to assess how accurately they depict reality, you would need to use them for predictive purposes. Did you check whether the variant you have chosen was superior to other approaches, e.g., using a different k factor?


@Randomitsuki The approach I use is (perhaps slightly counter-intuitively) also usable for predictive purposes. Of course, I cannot time-travel. However, smoothing forward and backward also yields better current (as in today's) ratings, because past bouts can be assessed more accurately. So I optimised the hyperparameters (my equivalents of the k factor) by testing predictive power out-of-time (i.e. without using 'future' bouts) from a certain cut-off date (I think it was Jan 1st 2018).
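
An out-of-time test of this kind can be sketched as follows. This is only an illustration using plain Elo with a tunable k factor, not the smoothing model from the video. The names `out_of_time_log_loss` and `best_k`, the starting rating of 1500 and the toy bout list are assumptions. Each bout on or after the cut-off is predicted from earlier bouts only, and the candidate with the lowest average log loss is chosen.

```python
import math

def expected_score(r_a, r_b):
    """Standard Elo win probability of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def out_of_time_log_loss(bouts, k, cutoff, start=1500.0):
    """bouts: chronologically sorted (iso_date, winner, loser) tuples.

    Every bout is predicted before it updates the ratings, so a prediction
    never uses 'future' results. Only bouts on or after `cutoff` are scored.
    """
    ratings = {}
    loss, n = 0.0, 0
    for date, winner, loser in bouts:
        r_w = ratings.get(winner, start)
        r_l = ratings.get(loser, start)
        p = expected_score(r_w, r_l)  # predicted chance of the actual winner
        if date >= cutoff:
            loss -= math.log(p)
            n += 1
        ratings[winner] = r_w + k * (1.0 - p)
        ratings[loser] = r_l - k * (1.0 - p)
    return loss / n if n else float("nan")

def best_k(bouts, candidates, cutoff):
    """Pick the k factor with the lowest out-of-time log loss."""
    return min(candidates, key=lambda k: out_of_time_log_loss(bouts, k, cutoff))

# Hypothetical toy data: A beats B once a year.
toy_bouts = [(f"20{y}-01-01", "A", "B") for y in range(10, 26)]
chosen = best_k(toy_bouts, [0, 16, 32], cutoff="2018-01-01")
```

Log loss is used here because it rewards well-calibrated probabilities rather than just correct picks; other scoring rules would work in the same loop.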


I guess Hakuho's slump right after Asashoryu's intai reflects an inherent flaw of the Elo system, i.e. fewer points to be gained against less high-profile opponents.

Anyway, amazing piece of work! (Applauding...)


As will easily be detected, I am not a data scientist. And I am not quite sure what factors are included in the calculation of the sumo strength rating.

But it strikes me as odd that in the case of Kisenosato, the peak of his "sumo strength" came in March of 2013, followed by another slightly lower peak in the first half of 2016, followed by a steady decline. That seems odd given that Kisenosato won two straight yusho in early 2017, the only top-division championships he secured in his whole career. I realise that the graph shows a narrowing of the gap in sumo strength between then-yokozuna Hakuho, Harumafuji, and Kisenosato, but Kisenosato's sumo strength remains lower than that of the other two at a time (albeit a brief time) when he was dominating the sport.

Perhaps if I knew what you mean by "sumo strength rating", it would make more sense...

But clearly a great deal of work has been put into this! (Bow...)

Edited by Amamaniac


@Amamaniac Thanks for the feedback.

1. The Sumo Strength Rating indicates the likelihood of winning a bout. You can find the exact formula in the description on YouTube.

2. I create the ratings using all the data available to me, which has included more than just the top division since the 1950s and more and more of the lower divisions since then. Hence, even a dominant top-division rikishi may lose rating points if the divisions get 'closer' to each other (because of how newly promoted and relegated rikishi perform).

3. One point of making this is exactly so we don't have to rely on winning records and the official rankings. They are misleading with regard to 'pure ability' because e.g. rikishi do not participate in tournaments due to injuries, or someone has an easier fight schedule.
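
For readers who want a feel for how a rating difference can translate into a win likelihood (point 1 above), here is the standard Elo conversion in Python. This is an assumption for illustration only: the exact formula behind the Sumo Strength Rating is in the video description and may differ. The function names and the 400-point `scale` are the usual Elo conventions, not something stated in this thread.

```python
import math

def win_probability(rating_a, rating_b, scale=400.0):
    """Standard Elo chance that A beats B (assumed, not the video's formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

def rating_gap(p, scale=400.0):
    """Inverse: the rating difference that corresponds to win probability p."""
    return scale * math.log10(p / (1.0 - p))

print(round(win_probability(1700, 1500), 3))  # 0.76
```

Under this convention a 200-point gap means the stronger wrestler is expected to win roughly three bouts out of four.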


Thoroughly enjoyed the video! Great work :)

Edited by Hakuho

Amazing piece of work! Thank you for sharing!

I wonder who tops the area-under-the-curve list (basically the product of rating and time active)? Would that be difficult to calculate?
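
Given each wrestler's rating history as (time, rating) points, the area under the curve is a short calculation. The sketch below uses the trapezoidal rule in Python. The function names, the optional `baseline` (for example, counting only the area above some reference rating) and the toy histories for "A" and "B" are assumptions for illustration, not data from the video.

```python
def area_under_curve(points, baseline=0.0):
    """Trapezoidal area under a rating history.

    points: chronologically sorted (time_in_years, rating) pairs.
    baseline: rating level subtracted before integrating (hypothetical knob).
    """
    area = 0.0
    for (t0, r0), (t1, r1) in zip(points, points[1:]):
        area += (t1 - t0) * ((r0 + r1) / 2.0 - baseline)
    return area

def rank_by_area(histories, baseline=0.0):
    """Sort wrestler names by area under their rating curve, largest first."""
    return sorted(histories,
                  key=lambda name: area_under_curve(histories[name], baseline),
                  reverse=True)

# Made-up histories: A peaks higher, B stays active longer.
histories = {
    "A": [(0, 1800), (2, 2200), (4, 1800)],
    "B": [(0, 1700), (5, 1900), (10, 1700)],
}
print(rank_by_area(histories, baseline=1500))
```

The choice of baseline matters: with a baseline of 0 long careers dominate, while a high baseline rewards only time spent at an elite level.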


This needs to be submitted to the sumo museum and have it keep going all day. 


@Moderators.  This topic definitely belongs in the Sumo Information forum.

Edited by Jakusotsu
Done.

On 10/06/2020 at 05:45, Randomitsuki said:

One could make the case that sumo wrestlers are much better now than before, but it could also be a statistical artifact of ratings inflation which plagues so many Elo-type approaches.

In almost any sport you can name, the former is certainly the case, but I don't see the same kind of massive jumps in technical knowledge, physicality or training methods in sumo. I think it's one of the few sports in which you could put any top rikishi from the last half century or so into the ring and have a good battle.

Definitely looks like rating inflation to me.


@John Gunning Definitely there wasn't such a big jump in skill as the ratings would indicate. The reason for the 'inflation' is that for earlier years lower division bout results are not available to me. If I only train the model on top division results, inflation isn't an issue, but the ratings become slightly less accurate (in the out-of-time tests) because there is no information available for upcoming new top division wrestlers.

The mean of the rating is at 1500. In the early days the mean corresponds to an average top-division wrestler; later it corresponds to someone in the third or fourth division.

This is also the reason for rating inflation in chess, I believe: more and more players (with less skill) were added to the system over time.
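
The effect of a fixed mean on a growing population can be shown with a few made-up numbers. In the Python sketch below the relative skills and the helper `centre_at` are hypothetical; the only thing taken from the posts above is that the mean rating is pinned at 1500.

```python
def centre_at(relative_skills, mean=1500.0):
    """Shift a list of relative skills so that their average equals `mean`."""
    shift = mean - sum(relative_skills) / len(relative_skills)
    return [s + shift for s in relative_skills]

# Three top-division wrestlers only.
top_only = centre_at([100, 0, -100])
# The same three wrestlers after weaker lower-division wrestlers are added.
with_lower = centre_at([100, 0, -100, -300, -400, -500])

print(top_only)    # [1600.0, 1500.0, 1400.0]
print(with_lower)  # [1800.0, 1700.0, 1600.0, 1400.0, 1300.0, 1200.0]
```

The best wrestler's skill is unchanged, yet his displayed rating moves from 1600 to 1800 simply because the population average now sits lower.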

 


If Randomitsuki is no data expert, then I am a cave man. That being said, the pretty obvious inflation might be possible to normalize, might it not? Take the top wrestlers with their respective peaks (or even groups of top wrestlers of a certain time with their group average peaks). Such data points should more or less group around a linear score progression over time. If, say, top Ozeki (or all Ozeki) of a certain time are analyzed accordingly, one might generate a second kind of linear increase, which is not necessarily parallel to the top one. With such estimated increases and their difference in increase factor, one might calculate dynamic corrective factors to apply to all data points at any given time. This application might not make sense for the actual calculations that generate the Elo, but maybe for tweaking the numbers before the actual data output into a visual timeline, like you did.
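
One way to read this proposal is sketched below in Python. It fits a least-squares line through the peak ratings of the top group and another through a second group, then rescales any rating so that both fitted levels land where they sit in a chosen reference year. The names `fit_line` and `make_corrector`, the reference year and the peak values are all hypothetical; this is an interpretation of the idea, not something that has been tried on the real data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def make_corrector(top_peaks, second_peaks, ref_year):
    """Build f(year, rating) that removes the two fitted trends.

    top_peaks, second_peaks: lists of (year, peak_rating).
    Assumes the two fitted lines do not cross within the years of interest.
    """
    top = fit_line(*zip(*top_peaks))
    second = fit_line(*zip(*second_peaks))

    def level(line, year):
        return line[0] * year + line[1]

    hi_ref, lo_ref = level(top, ref_year), level(second, ref_year)

    def correct(year, rating):
        hi, lo = level(top, year), level(second, year)
        # Linear map sending this year's two levels onto the reference levels.
        return lo_ref + (rating - lo) * (hi_ref - lo_ref) / (hi - lo)

    return correct

# Made-up peaks: the top group inflates faster than the second group.
correct = make_corrector([(1960, 2000), (2020, 2300)],
                         [(1960, 1800), (2020, 2000)], ref_year=2020)
```

Because the correction is applied after fitting, it only changes how the timeline is drawn and leaves the win probabilities of the underlying model untouched.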

Do I make any sense?

Amazing work! I like the visuals. Add a koto sound "ping" to every moment one line crosses another one, with the sound's pitch based on how high up we are in the graph.

