Posted

I love my Herculean sumo projects. While several of my giant stats-heavy projects have not yet come to a satisfactory conclusion (e.g. comparable strength ratings for all rikishi since 1934; talent prediction based on Elo points, age, and number of bashos), my somewhat declining gaming career re-awakened a very old idea for yet another project of epic proportions. Voila, after an estimated 200 hours of hard work programming and compiling data over the last few months, the project is now finished. It allows me to make my daily gaming predictions in mere minutes, instead of the 2+ hours a day they used to take.

Now what have I done?

Step 1: I made a list of potential predictors of bout outcomes. The list comprises no fewer than 25 factors that might influence winning or losing. Many of these predictors are of course highly similar, i.e. strongly correlated. The 25 factors (in no particular order) are as follows:

a) difference in rank between higashi and nishi rikishi (with ranks ranging from 1 for Y1e to 70 for the J14w)

b) difference in body weight

c) difference in body height

d) difference in Elo points gained during the basho

e) difference in Elo points gained throughout the last year

f) difference in career winning percentages

g) difference in winning percentages of the corresponding division

h) difference in winning percentages throughout the last year

i) difference in winning percentages for corresponding technical preferences (computed through a method described in this old thread)

j) difference in winning percentages for corresponding body weight (e.g. if the higashi rikishi is heavier, the difference will be between the historical winning percentage of the higashi rikishi against lighter opponents, and the winning percentage of the nishi rikishi against heavier opponents)

k) difference in winning percentages for corresponding body size (using the same method as with j))

l) difference in Elo points (strength rating)

m) difference in winning percentage during the basho

n) difference in winning or losing streak length

o) difference of wins in the last five bouts

p) difference in virtual ranks (computed from actual rank and number of wins)

q) difference in proximity to demotion (with high values if the virtual rank is near the bottom division border)

r) difference in winning percentage of the rikishi one has already faced during the basho

s) difference in ranks of the rikishi one has already faced during the basho

t) difference in average Elo points of the rikishi one has already faced during the basho

u) difference in head-to-heads

v) ratio of head-to-heads

w) difference in head-to-heads in the last year

x) ratio of head-to-head records in the last year

y) length of streak in head-to-head records.

Step 2:

I computed all these values for all bashos since Haru 2002 (the inception of the Sekitori-Toto game). I used a statistics program called SPSS and a method called stepwise multiple regression. This analysis shows me which of the above factors make a significant contribution to predicting bout outcomes. This analysis provides me with a value between 0 and 1 for each bout (the higher the number, the more likely the higashi rikishi is to win). I can then use these values to predict the outcomes of daily bouts, and compare these with my actual gaming performance.

The results:

As of now (after 20164 bouts), there are six significant predictors from the list above. The best predictor by far is the difference in Elo points. It is followed by career winning percentage, difference in head-to-head records, size difference, weight difference, and winning percentage throughout the basho. While some of these predictors did not surprise me (Elo and career winning percentage as indicators of overall strength, head-to-head as an adapted overall indicator, winning record as a basho-specific indicator), I was surprised by the two indicators of body weight and body size. By and large, lighter and taller rikishi are at an advantage!

I used several different types of predictions which I won't go into in this post (maybe upon request). And I found out that the automated prediction would have boosted my Sekitori-Toto performance quite remarkably. I don't think that a fully automated prediction will be able to compete against the very best players, but it might come darn close. After all, the automated prediction does not see Iwakiyama hobbling off the dohyo, and it does not read impending kyujo announcements by Kintamayama.

The best news, for me, however, is that in the future picking my bouts will involve copying one page from the Doitsubase, pressing a key, and voila, all predictions are there within a second or two.

Posted

Splendid!

I have always been interested in the methods of your investigations (as far as you have shared them in the forum), so I wish you good luck (hm, or maybe better "good elimination of the luck", since this is what it is all about..?) and I hope all this work pays off in good sumo-game results. I hope you don't lose the fun, however - I had the impression that in the end you enjoy spending those 2 hours a day on your picks.

I have a couple of questions on the details - pure curiosity - I don't have the knowledge to challenge or elaborate on your work...

This analysis provides me with a value between 0 and 1 for each bout

If I got you right, the sum of all 25 should give 1 as a result, right?

Can you give me just one real example of how the first five factors look in practice?

The best predictor by far is the difference in Elo points. It is followed by career winning percentage, difference in head-to-head records, size difference, weight difference, and winning percentage throughout the basho.

If the conclusion in the end is that the prediction depends on an "aggregated" factor like these ELO points, then why did it fail to use the ELO points alone for predictions, instead of embedding them in a more sophisticated system that leads back to their values? Is it just the height and weight?

I used several different types of predictions which I won't go into in this post (maybe upon request).

Do the different analyses lead to different results? And, by the way, why did you choose exactly the "stepwise multiple regression" method? What makes it so suitable for this specific task?

And last - it seems that the results of this analysis could most easily be used for Seki-toto entries, and, probably, for the other daily games. Do you plan some "enhancement" that will make it useful for your pre-basho picks?

Posted
I hope you don't lose the fun, however - I had the impression that in the end you enjoy spending those 2 hours a day on your picks.

Spending two or more hours on your picks can be quite distressing when there is a higher workload in the job, or when family and friends demand attention. Moreover, much of the two hours went into assembling those data that are now just at my fingertips. For instance, Elo points or head-to-heads are now directly read out from my files, rather than having to be manually entered from the Doitsubase.

This analysis provides me with a value between 0 and 1 for each bout

If I got you right, the sum of all 25 should give 1 as a result, right?

Can you give me just one real example of how the first five factors look in practice?

OK, here is a short introduction to how regression analysis works. You have one variable (often called the criterion or dependent variable) that you want to predict (in this case it is the bout outcome). Then you have the 25 variables that might or might not predict the outcome (called the predictors or independent variables). Regression analysis identifies a numerical value for each predictor (the so-called regression weight). If you want to make a prediction, you need a regression equation. The basic form is

Y = I + w1 * X1 + w2 * X2...

Y is the value you want to predict (the bout outcome). I is the so-called intercept. It is also calculated through the regression analysis. The intercept in my analyses is very close to 0.500. This means that without any influence of the predictors the winning probability for a rikishi is exactly in the middle between 0 (a loss) and 1 (a win). w1, w2, and so on are the regression weights. X1, X2, etc. are the actual values of the variables. A concrete example will be provided below.

There are several requirements that need to be fulfilled for multiple regression analysis to work properly. Actually, I violated one of them (the dependent variable should be continuous, but bout outcomes only take the values 0 for a loss and 1 for a win). Another requirement is that the independent variables should not be correlated with each other. Results can be severely skewed if this principle is violated. That is why I used the stepwise method.

In the standard method, all variables are thrown into the mix simultaneously. In the stepwise method, the analysis proceeds in several steps. In the first step, the very best predictor is identified (that would be the Elo points difference, according to my results). This variable explains a small part of the overall pattern of sumo results (called the explained variance). In the next step, the remaining 24 variables are tested against the remaining variance that wasn't explained by the first predictor. If there is a variable that can explain a significant part of that variance, it is added as well, and a third step ensues. The analysis goes on until no further significant predictor can be found. It all works a little bit like a casting show...
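For readers who want to see the "casting show" in code, here is a minimal forward-selection sketch in plain Python. It is not the SPSS procedure used above: it only adds predictors (SPSS stepwise can also remove them again), the entry threshold `f_enter=4.0` is a rough stand-in for a 5% significance test, and the data, the variable names (`elo_diff`, `h2h_diff`, `noise`) and the generating weights are all made up for illustration.

```python
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_rss(columns, y):
    """Least-squares fit of y on an intercept plus the given columns.
    Returns (weights, residual sum of squares)."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    w = solve(xtx, xty)
    rss = sum((yi - sum(wj * xj for wj, xj in zip(w, r))) ** 2
              for r, yi in zip(rows, y))
    return w, rss

def forward_stepwise(predictors, y, f_enter=4.0):
    """Add one predictor per step, always the one with the largest partial F;
    stop when no remaining candidate reaches f_enter."""
    selected = []
    _, rss = fit_rss([], y)  # intercept-only model
    while True:
        best = None
        for name in predictors:
            if name in selected:
                continue
            cols = [predictors[s] for s in selected + [name]]
            _, new_rss = fit_rss(cols, y)
            df = len(y) - len(cols) - 1
            f = (rss - new_rss) / (new_rss / df)
            if best is None or f > best[1]:
                best = (name, f, new_rss)
        if best is None or best[1] < f_enter:
            return selected
        selected.append(best[0])
        rss = best[2]

# Synthetic bouts (NOT real sumo data): outcome depends strongly on elo_diff,
# weakly on h2h_diff, and not at all on noise.
random.seed(0)
n = 2000
elo_diff = [random.gauss(0, 100) for _ in range(n)]
h2h_diff = [random.gauss(0, 2) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]
outcome = []
for e, h in zip(elo_diff, h2h_diff):
    p = min(1.0, max(0.0, 0.5 + 0.002 * e + 0.03 * h))
    outcome.append(1.0 if random.random() < p else 0.0)

selected = forward_stepwise(
    {"elo_diff": elo_diff, "h2h_diff": h2h_diff, "noise": noise}, outcome)
print(selected)  # predictors in order of entry
```

The order of entry is the interesting part: the predictor that explains the most variance enters first, and each later candidate is judged only on what it adds beyond the ones already in the model.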

The multiple regression analysis then yields an intercept, and the regression weights for all significant variables.

Now here is the concrete example. For reasons that you might understand, I will not give away the weights for the overall analysis since many people could use these data and get the same results that took me 200 hours to gain. However, I will show you the actual results for one particular analysis, viz. the currently best prediction for Juryo bouts on nakabi:

I have entered all Juryo bouts on all day 8's between Haru 2002 and Kyushu 2008, and these are the values given by the multiple regression analysis:

The intercept is 0.5362. There are two significant predictors. The first one is the Elo point gain in the last year. The regression weight for this predictor is 0.0004. The second significant variable is the head-to-head record, with a regression weight of 0.0308 (you can already see from this example that regression weights do not meaningfully add up; their size only has a relative meaning).

Now let's take a real bout from Kyushu 2008, day 8 in Juryo, e.g. the bout between Tamawashi and Toyozakura.

Before this day, Tamawashi had a strength rating of 1912 points, Toyozakura had a strength rating of 1845. One year before, Tamawashi had a value of 1743 and Toyozakura of 1868. That is, Tamawashi had gained 169 points in the last year, and Toyozakura had lost 23 points within the last year. Hence, the value (X1) for this variable would be 169 - (-23) = 192 points (the relative gain of Tamawashi over Toyozakura). Their overall head-to-head before this bout was 3-0 in Tamawashi's favor. X2 would therefore be 3 - 0 = 3 points.

The winning probability (Y) for Tamawashi would then be

Y = 0.5362 + (0.0004 * 192) + (0.0308 * 3) = 0.7054

In other words, Tamawashi would have a 70.54% chance of winning the bout. Which is kind of bad, since Tamawashi actually lost.
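The arithmetic of this example can be replayed in a few lines of Python. The numbers are the ones given in the post; the function name `predict` is just an illustrative choice.

```python
def predict(intercept, weights, values):
    """Linear regression prediction: Y = I + w1*X1 + w2*X2 + ..."""
    return intercept + sum(w * x for w, x in zip(weights, values))

# Elo gain over the last year for each rikishi (current minus year-ago rating).
gain_tamawashi = 1912 - 1743    # 169
gain_toyozakura = 1845 - 1868   # -23

x1 = gain_tamawashi - gain_toyozakura  # 192, relative Elo gain
x2 = 3 - 0                             # head-to-head difference

y = predict(0.5362, [0.0004, 0.0308], [x1, x2])
print(round(y, 4))  # 0.7054
```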

The best predictor by far is the difference in Elo points. It is followed by career winning percentage, difference in head-to-head records, size difference, weight difference, and winning percentage throughout the basho.

If the conclusion in the end is that the prediction depends on an "aggregated" factor like these ELO points, then why did it fail to use the ELO points alone for predictions, instead of embedding them in a more sophisticated system that leads back to their values? Is it just the height and weight?

That's the nice thing about stepwise multiple regression. It always shows you if an additional predictor should be included or not. My predictions only using Elo points simply weren't as accurate as with additional variables.

I used several different types of predictions which I won't go into in this post (maybe upon request).

Do the different analyses lead to different results?

They surely do. Originally I thought that a detailed prediction (that is, separate regression equations for Juryo on day 1, Juryo on day 2, etc.) would yield better results, but it turned out that a global prediction (using the six overall significant predictors for all bouts in a basho) is currently better. This has to do with the sample size. The prediction for, say, Juryo day 8 is based on "only" 600 bouts, and there is still too much random noise to yield a good prediction. The global prediction, however, is based on more than 20000 bouts, and much of the noise is eliminated.

And last - it seems that the results of this analysis could most easily be used for Seki-toto entries, and, probably, for the other daily games. Do you plan some "enhancement" that will make it useful for your pre-basho picks?

Actually, I intend to do something similar for pre-basho games. But until now I have not even thought about potential predictors, let alone starting to write programs that capture the necessary data...

Posted
If you want to make a prediction, you need a regression equation. The basic form is

Y = I + w1 * X1 + w2 * X2...

Y is the value you want to predict (the bout outcome). I is the so-called intercept. It is also calculated through the regression analysis. The intercept in my analyses is very close to 0.500. This means that without any influence of the predictors the winning probability for a rikishi is exactly in the middle between 0 (a loss) and 1 (a win). w1, w2, and so on are the regression weights. X1, X2, etc. are the actual values of the variables. A concrete example will be provided below.

But what if there is a non-linear correlation between independent and dependent variable? I'm just asking because, you know, your main predictor (rating difference) just happens to be a non-linear function to the expected result by definition of the ELO system...

Posted
But what if there is a non-linear correlation between independent and dependent variable? I'm just asking because, you know, your main predictor (rating difference) just happens to be a non-linear function to the expected result by definition of the ELO system...

Drats, that's correct... Unfortunately, my statistics knowledge is too limited to know how I should account for that. Any idea? Or do you know how robust multiple regression is with respect to such violations?

Posted

Can someone point me to a thread that defines the ELO system? The only ELO I know is Electric Light Orchestra ...

I noticed the height/weight thing wasn't as most people expected when I put together my head-to-head page for benchsumo. I used that myself for a while ... unsurprisingly, when I let everyone have the data easily, I slid down the banzuke ...

Warm regards,

Kofuji

Posted
Can someone point me to a thread that defines the ELO system? The only ELO I know is Electric Light Orchestra ...

When in doubt, Wiki...

As for Doitsuyama's sumo-specific implementation, see here (archive.org copy). I assume that Randomitsuki uses more or less the same thing, although perhaps not the same standard deviation value (?).

Posted
But what if there is a non-linear correlation between independent and dependent variable? I'm just asking because, you know, your main predictor (rating difference) just happens to be a non-linear function to the expected result by definition of the ELO system...

Drats, that's correct... Unfortunately, my statistics knowledge is too limited to know how I should account for that. Any idea? Or do you know how robust multiple regression is with respect to such violations?

I would take the independent variable "rating difference" as a fixed component of the final equation and keep it out of the regression analysis. This means in practice that the dependent variable of your analysis should then be "difference between bout outcome and ELO prediction".
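A rough sketch of that suggestion, under an explicit assumption: the expected score below uses the common logistic Elo curve with a 400-point scale, as known from chess. The sumo ratings discussed in this thread may be based on a different curve (a later post mentions a standard deviation value), so treat `elo_expected` and its `scale` parameter as placeholders.

```python
def elo_expected(rating_a, rating_b, scale=400.0):
    """Expected score of A against B under the standard logistic Elo curve
    (assumed here; the sumo-specific ratings may use another formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

def elo_residual(outcome, rating_a, rating_b):
    """New dependent variable: actual outcome (1 = win, 0 = loss) minus the
    Elo expectation. The remaining predictors would be regressed on this."""
    return outcome - elo_expected(rating_a, rating_b)

# Ratings from the Tamawashi (1912) vs. Toyozakura (1845) example; Tamawashi lost.
expected = elo_expected(1912, 1845)
residual = elo_residual(0, 1912, 1845)
print(round(expected, 3), round(residual, 3))
```

The regression would then only have to explain what Elo alone gets wrong, and the non-linear Elo curve stays intact instead of being forced into a straight line.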

Posted
Now let's take a real bout from Kyushu 2008, day 8 in Juryo, e.g. the bout between Tamawashi and Toyozakura.

Before this day, Tamawashi had a strength rating of 1912 points, Toyozakura had a strength rating of 1845. One year before, Tamawashi had a value of 1743 and Toyozakura of 1868. That is, Tamawashi had gained 169 points in the last year, and Toyozakura had lost 23 points within the last year. Hence, the value (X1) for this variable would be 169 - (-23) = 192 points (the relative gain of Tamawashi over Toyozakura). Their overall head-to-head before this bout was 3-0 in Tamawashi's favor. X2 would therefore be 3 - 0 = 3 points.

The winning probability (Y) for Tamawashi would then be

Y = 0.5362 + (0.0004 * 192) + (0.0308 * 3) = 0.7054

In other words, Tamawashi would have a 70.54% chance of winning the bout. Which is kind of bad, since Tamawashi actually lost.

It seems strange to me that the intercept is not exactly 0.5000. Let's take your example, but instead predict the winning probability for Toyozakura. Then X1 = -192 points and X2 = -3 points. This gives a winning probability of 36.70%. If you add this to the 70.54% chance that Tamawashi wins, the sum exceeds 100%. That is not possible unless you allow both of them to win. The intercept should be exactly 0.5000 to make the chances sum to 100%.

Posted
It seems strange to me that the intercept is not exactly 0.5000. Let's take your example, but instead predict the winning probability for Toyozakura. Then X1 = -192 points and X2 = -3 points. This gives a winning probability of 36.70%. If you add this to the 70.54% chance that Tamawashi wins, the sum exceeds 100%. That is not possible unless you allow both of them to win. The intercept should be exactly 0.5000 to make the chances sum to 100%.

Actually, the intercept reflects the baseline probability that the East side wins over the West side, all other things being equal. As Toyozakura was on the West side, his intercept would have been 46.38%. Of course, there is no theoretical reason to believe that the East side is generally better than the West side (even when rank differences are accounted for). But multiple regression does not know that; it just looks for the numbers that best represent and explain a given pattern. That is why the performance for predicting past bouts (i.e. the bouts the regression was fitted on) is higher than the actual performance of predicting future bouts.

Predicting past bouts has given me incredibly good numbers: close to 61% of sekitori bouts correct. That would be better than the best players in the world. However, there is no reason to believe that multiple regression will predict 61% correctly in the future, because you cannot expect random noise (like 53.62% rather than 50%) to replicate in the future.
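The East/West point can be checked numerically with the Juryo day 8 weights quoted earlier. The sketch below only restates the thread's own numbers; the names `linear_score`, `p_east` and so on are illustrative.

```python
WEIGHTS = (0.0004, 0.0308)   # Elo gain in the last year, head-to-head difference
EAST_INTERCEPT = 0.5362      # fitted intercept, i.e. the East-side baseline

def linear_score(intercept, x1, x2):
    """Regression value for one rikishi given his intercept and predictor values."""
    return intercept + WEIGHTS[0] * x1 + WEIGHTS[1] * x2

# Tamawashi (East) as in the original example.
p_east = linear_score(EAST_INTERCEPT, 192, 3)

# Toyozakura treated as if he were the East rikishi: the two values overshoot 100%.
p_west_naive = linear_score(EAST_INTERCEPT, -192, -3)
print(round(p_east + p_west_naive, 4))   # 1.0724

# Using the West-side baseline (1 - 0.5362 = 0.4638) restores a sum of 100%.
p_west = linear_score(1.0 - EAST_INTERCEPT, -192, -3)
print(round(p_east + p_west, 4))         # 1.0
```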

  • 2 weeks later...
Posted
But what if there is a non-linear correlation between independent and dependent variable? I'm just asking because, you know, your main predictor (rating difference) just happens to be a non-linear function to the expected result by definition of the ELO system...

Drats, that's correct... Unfortunately, my statistics knowledge is too limited to know how I should account for that. Any idea? Or do you know how robust multiple regression is with respect to such violations?

One standard procedure is to include squared values of the independent variable, thus y = a1*X1 + a2*(X1^2) + ... If a2 is insignificant, you can drop the squared term again (i.e. a linear approximation is appropriate).

May I ask whether either you or Doitsu have the strength ratings published somewhere? I'd be very interested in how they evolve basho per basho. For example, where Baruto is currently compared to the Ozeki.

Posted
May I ask whether either you or Doitsu have the strength ratings published somewhere? I'd be very interested in how they evolve basho per basho. For example, where Baruto is currently compared to the Ozeki.

The data are unpublished. Anyway, according to my ratings Baruto is already on par with Chiyotaikai, Kaio, and Kotooshu. Kotomitsuki is some 60 points away, Harumafuji some 160 points.

Posted
May I ask whether either you or Doitsu have the strength ratings published somewhere? I'd be very interested in how they evolve basho per basho. For example, where Baruto is currently compared to the Ozeki.

The data are unpublished. Anyway, according to my ratings Baruto is already on par with Chiyotaikai, Kaio, and Kotooshu. Kotomitsuki is some 60 points away, Harumafuji some 160 points.

It should be mentioned that the ratings of Baruto, Chiyotaikai, Kaio and Kotooshu are borderline ozeki material, more like good sekiwake ratings. Kotomitsuki is in ozeki territory with Harumafuji scratching yokozuna level (maybe helped by a temporary boost, maybe not - we'll see).

Posted
The data are unpublished. Anyway, according to my ratings Baruto is already on par with Chiyotaikai, Kaio, and Kotooshu. Kotomitsuki is some 60 points away, Harumafuji some 160 points.

I'm curious what you use for the different factors? After seeing this thread, I implemented my own ELO calcs, but I'm not quite happy with the K-factors I have... But I don't yet have the time (or skill) to refine it in depth.

Either way, I just plan to use it to call bouts I'm clueless on.

Posted

Best read when drunk I presume?

Why do we have 3 Germans discussing mathematical variables on a given Wednesday under a particular Moon on a Barometer reading of X (if X is comparable to a tangent equivalent to the mass of ...) of - a sumo bout prediction prog?

Especially on such gems as

difference in body height

difference in career winning percentages

difference in winning percentages of the corresponding division

difference in winning percentages throughout the last year

difference in winning percentages for corresponding technical preferences (computed through a method described in this old thread)

difference in winning percentages for corresponding body weight (e.g. if the higashi rikishi is heavier, the difference will be between the historical winning percentage of the higashi rikishi against lighter opponents, and the winning percentage of the nishi rikishi against heavier opponents)

difference in winning percentages for corresponding body size (using the same method as with j))

difference in Elo points (strength rating)

difference in winning percentage during the basho

difference in winning or losing streak length

difference of wins in the last five bouts

can I please buy a Skoda?

Posted (edited)
The data are unpublished. Anyway, according to my ratings Baruto is already on par with Chiyotaikai, Kaio, and Kotooshu. Kotomitsuki is some 60 points away, Harumafuji some 160 points.

I'm curious what you use for the different factors? After seeing this thread, I implemented my own ELO calcs, but I'm not quite happy with the K-factors I have... But I don't yet have the time (or skill) to refine it in depth.

Either way, I just plan to use it to call bouts I'm clueless on.

To those who'd like to know - the K factor is the weight assigned to a single bout. A very commonly used K factor is 20. It represents the sum of potential point gains for the two competing parties. Let's say that two rikishi A and B of equal strength are facing each other. If rikishi A wins, he'll gain 10 points and B loses 10 points. If B wins, he'll gain 10 points, and A will lose the same amount. 10 + 10 = 20.

Now in very lopsided bouts, say Hakuho vs. Hokutoriki, a K factor of 20 might mean that Hakuho gains only 2 points when he wins (with Hokutoriki losing 2), whereas Hokutoriki would gain 18 points if he emerged victorious (with Hakuho losing 18). 2 + 18 = 20.
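The point gains described above follow from the usual Elo update rule, new rating = old rating + K * (result - expected score). A minimal sketch, with made-up ratings and an expected score of 0.9 chosen to match the "2 vs. 18 points" example (the real expectation for any given pairing depends on the rating formula in use):

```python
def elo_update(rating, expected, won, k=20.0):
    """Standard Elo update: the gain is K times (actual result - expected score)."""
    result = 1.0 if won else 0.0
    return rating + k * (result - expected)

# Two equally strong rikishi: expected score 0.5 each, so the winner gains 10.
print(elo_update(1800.0, 0.5, won=True) - 1800.0)    # 10.0
print(elo_update(1800.0, 0.5, won=False) - 1800.0)   # -10.0

# Lopsided bout (hypothetical ratings): the favourite is expected to score 0.9.
favourite_gain = elo_update(2300.0, 0.9, won=True) - 2300.0   # about 2
underdog_gain = elo_update(1900.0, 0.1, won=True) - 1900.0    # about 18
print(round(favourite_gain, 6), round(underdog_gain, 6))
```

A larger K (such as the 40 mentioned for lower divisions) simply scales all these gains up, which makes ratings react faster to new results but also makes them noisier.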

Doitsuyama was the first to implement Elo-based rikishi ratings. He computed ratings for sekitori only. AFAIK he was using a K of 20, but a higher value for shin-Juryo to account for their sometimes stellar rise through the ranks.

Zentoryu implemented Elo ratings for rikishi from all divisions. IIRC he also used a K of 20 for sekitori bouts, but a K of 40 for Makushita and below (to account for the fact that lower-division guys have only seven bouts per basho).

In my own humble attempts I worked with a K of 20 throughout all divisions. But I was often thinking of using a higher K value. Sometimes I am annoyed by how slowly the system adapts to sudden changes. For instance, most shin-Juryo and shin-Makuuchi actually perform much stronger than my Elo ratings seem to indicate.

Edited by Randomitsuki
Posted

In banking, the observed dependent variable is often binary (will the loan default or not?), yet similar models are constructed for the probability of default. In that case, however, logistic rather than linear regression is used, resulting in a function that separates two regions in multidimensional space, one with almost sure 1 and one with almost sure 0. The interesting things then happen in the region in between, where probabilities of default are meaningful, like 0.1%, 2% etc.
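To illustrate the difference from the linear equation used earlier in the thread, here is a small sketch of how a logistic model turns the same kind of weighted sum into a probability. The weights below are hypothetical and not fitted to any data; only the predictor values (192 and 3) are borrowed from the Tamawashi example.

```python
import math

def logistic(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def win_probability(weights, values, intercept=0.0):
    """Logistic-regression style prediction: p = logistic(I + w1*X1 + w2*X2 + ...)."""
    z = intercept + sum(w * x for w, x in zip(weights, values))
    return logistic(z)

weights = [0.004, 0.3]   # hypothetical weights, for illustration only

p_a = win_probability(weights, [192, 3])     # one rikishi's view of the bout
p_b = win_probability(weights, [-192, -3])   # the opponent's view (signs flipped)
print(round(p_a, 3), round(p_b, 3), round(p_a + p_b, 3))
```

Two properties are worth noting: the output can never leave the 0 to 1 range, however extreme the predictor values get, and with an intercept of 0 the two rikishi's probabilities add up to exactly 1.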

As regards stepwise selection of predictors, my experience shows that slight changes to the regression settings (the level of confidence at which a predictor enters the model; slight perturbations of predictors, i.e. replacing X by a strongly correlated X'; the level of confidence at which a predictor leaves the model) can have a strong impact on the final model. Besides its predictive power, the composition of selected predictors might also change significantly. That said, I suppose you have tried many models and selected the one that makes the most natural sense and has sufficiently good predictive power. Am I right?

If you have done so much work with data, have you tried to run regression for any subsets of data separately, e.g. for past 3 years, 3 years before, etc.?

Anyway, your analysis has one advantage in comparison with banking models. If any predictor and its role in a banking model leaks, its predictive power usually deteriorates quickly because of model gaming. In ozumo probably nobody would try to game your model. And of course... (I am not worthy...)
