Posted

I know of scgroup which has day-to-day information (but seems a tad off at least with basho numbers, if not more) for sekitori from 1994 or so. It has kimarite, rank, and all that fun jazz. However, it still seems incredibly lacking to me (not for lack of effort, and not criticizing), as in there is such a wealth of information that could be available, or should be available, but I have the sad feeling is not.

Does anyone know of databases that contain the following information:

  • Shikona changes (along with which banzuke the change took effect, along with prior shikona and new shikona)
  • Date of birth of all sekitori rikishi (or lower division) going back at least as far as the scgroup data goes back
  • Weights of the rikishi in each tournament during that period (by basho, not top weight, or whatnot)
  • The head-to-head data that they always show during the sumo broadcasts (though I realize that I could figure that out with the data provided, so this is really not that necessary)
  • Heights of every rikishi (by basho if it changes, I s'pose)
  • Injury times, durations, and type (elbow, wrist, back, whatever) - this is probably the least available and least likely to be accurate info, but it'd be a major boon to analyzing trends
  • Favoured techniques, such as whether someone is migi-yotsu, or an oshi-man, or whatnot
  • Basho required to become a sekitori
  • Oyakatas (including changes with dates) and the style of sumo of that rikishi, etc.
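On the head-to-head point in the list above: given day-by-day results of the kind scgroup offers, that record can be derived rather than looked up. A minimal sketch, assuming each bout is available as a (basho, day, winner, loser) row. The row layout and the placeholder names are assumptions for illustration, not real results and not the format of any existing site.

```python
from collections import defaultdict

# Hypothetical bout rows: (basho, day, winner, loser).
# The names are placeholders, not real rikishi or real results.
BOUTS = [
    ("2007.01", 3, "RikishiA", "RikishiB"),
    ("2007.03", 7, "RikishiA", "RikishiB"),
    ("2007.05", 1, "RikishiB", "RikishiA"),
    ("2007.05", 2, "RikishiC", "RikishiA"),
]


def head_to_head(bouts):
    """Count wins for every ordered (winner, loser) pair that actually met."""
    wins = defaultdict(int)
    for _basho, _day, winner, loser in bouts:
        wins[(winner, loser)] += 1
    return wins


def record(wins, a, b):
    """Return (wins of a over b, wins of b over a)."""
    return wins.get((a, b), 0), wins.get((b, a), 0)


if __name__ == "__main__":
    table = head_to_head(BOUTS)
    print(record(table, "RikishiA", "RikishiB"))
```

Keeping the counts keyed by ordered pair means the same table answers the question from either rikishi's side.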

For those of you familiar with baseball statistics, there is SABR (Society for American Baseball Research) or somesuch. It'd be nice to have all that data to create SJSR (Society for Japanese Sumo Research), though it wouldn't be as catchy in creating a word like 'Sabermetrics' as 'Sjsrmetrics' doesn't exactly roll off the tongue.

I would love to be able to sift through data to determine things like:

  • After knee surgery, how long does the average rikishi take to recover?
  • How much does it impact his career stats?
  • How much does aging affect a rikishi's ability to compete?
  • Do certain rikishi perform noticeably better in a specific venue?
  • Are heavier rikishi more likely to get injured?
  • What injuries are most common by age or by size?
  • Are specific injuries more likely to occur in regards to certain styles of sumo?
  • Is speed up the banzuke correlated to injury likelihood and general durability?
  • Are certain styles of sumo more represented in makuuchi?
  • Are those styles also heavily represented in the lower ranks?
  • Does an oyakata have a visible impact on the success of the rikishi under him?

And once I started answering some of those questions, I'm sure more would come up. But the problem is data. There seems to be a severe lack of data in contrast to the amount of potential data that could be taken. If some of this data doesn't exist, wouldn't it be interesting to start gathering information so that in 10 years we could answer some of these questions? Are some of the data points I'd want to get even able to be found?

It would be wonderful to be able to provide data-based conclusions about new rikishi, to analyze injury risk, to determine which rikishi a newcomer most resembles up to that point, and other analysis. But I need some help. Am I the only one interested in this?

(the only real metric I've seen was the one on Yokozuna strength in this forum, that was pretty impressive in itself, but I want more)

Posted (edited)

I'd suggest starting with Moti's site. Won't answer all of your questions, but a great starting point and lots of links to keep you going.

The Sumo Colosseum has lots of good stuff, as does Sumo "who". Someone already mentioned Hakkeyoi! to you in another thread. It's a fine resource. Each of these places has links to more great stuff, and a little snooping around SF here and the SML archives on Banzuke.com will bring many more great sites to your attention.

Happy hunting (Laughing...)

Edited by Otokonoyama
Posted

All of those sites I've visited at one time or the other, but none manages to actually provide the data in a usable format. I mean, for the purposes that they're designed they work great. Dichne's site is wonderful at tracking certain rikishi. Hakkeyoi is great for certain other types of searches (specifically involving the banzuke), but they aren't too compatible, nor do they really attempt to offer a big picture so much as specified information of interest.

Coordinating the information to provide some sort of big picture would be insanely difficult, and probably not worth the time given the huge gaps and the continuing problems involved (the constant re-synching of data in different formats, the processor cycles spent filling in the gaps like a giant logic problem, etc.).

I find it incredibly difficult to believe that there's no repository of this stuff in Japanese even. I would assume that some otaku hidden deep in some darkened hole of a room has been recording this information in Excel since history began, I just haven't been able to find a site with my searches.

And while some of my questions' answers are hinted at by the data there, it's nowhere NEAR conclusive, or a large enough sample to determine correlation, or potential other factors, because the data is, for the most part, pretty cherry-picked.

There are currently hundreds of rikishi in sumo, from young to old, thin to fat, yotsu to oshi, and competing in thousands of bouts a year. It isn't quite on the level of baseball, but I can live with that. What I have trouble with is the lack of data. I'll try to do some more searches upon my return and see if I can come up with something.

Check out a site like the Hardball Times to see the magic you can perform with good data.

Posted
I find it incredibly difficult to believe that there's no repository of this stuff in Japanese even. I would assume that some otaku hidden deep in some darkened hole of a room has been recording this information in Excel since history began, I just haven't been able to find a site with my searches.

Have patience my friend. As we speak, a fantastic, unprecedented database is being honed and made ready. And no, it isn't done by me. I'm too lacking in all fields needed.

But patience, sir, and you will be rewarded big time.

Posted
Am I the only one interested in this?

I can answer this question..... YES!

No he isn't. Just for the sake of it, anyone who IS interested, please add your signature.

I am. Very interested.

Posted

Do we have an ETA on this database? And will it have a pre-established front-end (like Hakkeyoi.net), or will we be granted access to the raw data (

Posted
(I will have patience, I promise, you just need patience for my patience!)

Quote of the day! (Laughing...) (hey, where has that :rofl: smilie gone??)

@Kintamayama: *signature added*

Posted

You might know these websites.....

I do not know whether these homepages will help you or not, but....

http://www.fsinet.or.jp/~sumo/sumo.htm

Kiroku no Tamatebako

http://park11.wakwak.com/~tsubota/door1.html

Sumo hyouronka no page

http://shivare.hp.infoseek.co.jp/ozumo/index.html

銀河大角力協会

http://gans01.fc2web.com/

oozumou hoshitorihyou

The person who makes "oozumou hoshitorihyou" told me he went to Kokuritsu Kokkai-toshokan (one of the biggest libraries in Japan) many times to get the information. He spent a huge amount of time on it.

http://gans01.blog70.fc2.com/

↑ This is his weblog. He is introducing sumo articles from the Meiji and Taisho periods.

The person who makes "Sumo hyouronka no page" is still figuring out some of the information, because some of the data are very old and written in old Japanese, or have parts missing. Some data are totally missing because of the War.

http://www.ndl.go.jp/en/index.html

Kokuritsu kokkai toshokan

You can find most of the information at this library.

Posted
Do we have an ETA on this database? And will it have a pre-established front-end (like Hakkeyoi.net), or will we be granted access to the raw data (
Posted
Just for the sake of it, anyone who IS interested, please add your signature.

Here it goes:

(Laughing...) (Laughing...) (Blushing...) ;-) :-P

Posted

Gacktoh, those sites contain MASSIVE amounts of information, sweet Jesus!

I need to find a job that will pay me to compile all that data into spreadsheets, or beg and plead the site owners to send me the raw data.

Kintamayama, the 'front-end' is the thing that allows you to sift through the data. It can range from very limiting (as in you pick a rikishi, a tournament, and a day, and it tells you if he won or lost) to rather complex (like hakkeyoi.net), but either way it limits the ways in which you can manipulate the data.

If you have the raw data, on the other hand, you can make your own front-end. I know how to work Excel reasonably well, so I can pop the data into a few spreadsheets and find whatever it is I want with a little bit of elbow grease and some creativity.

In short, for the regular user, a front-end is very enabling, as it allows people to fiddle with the data relatively easily, and tends to make the results pretty accessible and easy to understand.

On the other hand, the raw data is better for people who want to go beyond the front-end, to find things that the interface doesn't allow them to search for.

So I hope that he gives access to his database for the rest of us sumo fans who want more more MORE data to fiddle with, and who can create methods of sorting that data to answer questions nobody else has asked.
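To make the raw-data point concrete, here is a rough sketch of a question a fixed front-end usually cannot answer but a flat file can: win rate per rikishi per venue. Everything about the input is an assumption for illustration: the CSV column names (basho, venue, rikishi, result), the W/L coding, and the placeholder rows are invented, not taken from any of the sites mentioned.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export. Column names and rows are invented for illustration.
RAW_CSV = """basho,venue,rikishi,result
2007.01,Tokyo,RikishiA,W
2007.01,Tokyo,RikishiA,L
2007.03,Osaka,RikishiA,W
2007.03,Osaka,RikishiA,W
2007.03,Osaka,RikishiB,L
"""


def win_rate_by_venue(csv_text):
    """Return {(rikishi, venue): wins / bouts} from rows marked W or L."""
    tally = defaultdict(lambda: [0, 0])  # [wins, bouts]
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = tally[(row["rikishi"], row["venue"])]
        entry[0] += row["result"] == "W"  # True counts as 1
        entry[1] += 1
    return {key: wins / bouts for key, (wins, bouts) in tally.items()}


if __name__ == "__main__":
    for (rikishi, venue), rate in sorted(win_rate_by_venue(RAW_CSV).items()):
        print(rikishi, venue, round(rate, 3))
```

With real data the sample per rikishi and venue would need to be large before reading anything into the differences.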

Posted
As we speak, a fantastic, unprecedented database is being honed and made ready.

I was going to create a sumo database on my new sumo site (not online yet), but after hearing this I think I will wait to see this new database.

Posted
Does anyone use mixi? I'm sure there must be a sumo group in that.

i'm on mixi, one of the sumo groups i'm in has almost 6000 members

i don't know if my japanese is quite to the level of figuring out these statistical questions (i don't even understand what you're talking about in english half the time) but i'll give it a shot and post a topic, see if anybody replies

the problem with mixi is that there are so many people that the amount of replies soon becomes unwieldy and it gets tough to review them all, especially in japanese (Sign of approval...)

Posted

I'm on mixi as well, but the prior problems exist.

My Japanese is middling at best when it comes to reading, which causes huge problems when trying to sift through massive blocks of chatter, especially with internet lingo, even if I use rikaichan or another tool to aid me in my pursuit of truth, justice, and what the dickens that Kanji is. But were you to ask the questions above, if I may give it a shot as to what sort of Japanese you'd use (please feel free to correct the incorrect Japanese which is bound to come up):

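大相撲のデータベースが作りたいですので、インフォメーションを探しています。データベースのようにデータがある方がいいですけれども、組織があるだけでいいです。このデータがあれば、教えてくれて下さいませんか?

(In English: I would like to build a database of ozumo, so I am looking for information. Data in database-like form would be best, but anything with some organization to it is fine. If any of the following data exists, could you please tell me?)

  • 四股名変更 【前の四股名、後の四股名、新しい四股名が初めて使った場所とか】 (Shikona changes: the previous shikona, the new shikona, the basho where the new shikona was first used, and so on)
  • 力士の誕生日【関取が一番必要けど、序の口から幕下もあればお願いします】、今の時代だけでは無く、昔のデータも (Rikishi dates of birth: sekitori are needed most, but jonokuchi through makushita too if available; not only the current era but older data as well)
  • 力士の体重【毎場所の体重、それで太った方と痩せた方が見えるようになるから】 (Rikishi weights: the weight at every basho, so that it becomes visible who gained and who lost weight)
  • NHK大相撲放送と同じような力士対力士の星取り【例えば、豊ノ島対安馬、0-6とか、でも関取だけではなく、幕下とかのデータのみあればお願いします】 (Head-to-head records like those shown on the NHK sumo broadcasts: for example, Toyonoshima vs. Ama, 0-6; not just sekitori, but makushita and the like too if such data exists)
  • 力士の身長【若い力士の身長が変われば、何場所で何センチ高くなったとかのデータもあればお願いします】 (Rikishi heights: if a young rikishi's height changes, data on how many centimetres he grew over how many basho, if available)
  • 力士の傷害やけが【身体のどこ、何のけが、なった日にち、手術したかしてない、したらどの手術した、手術の日にち】 (Rikishi injuries: which part of the body, what kind of injury, the date it happened, whether there was surgery, which surgery if so, and the date of the surgery)
  • 力士の良く使う技【左よつとか押し相撲とか (Techniques a rikishi often uses, such as hidari-yotsu or oshi-zumo)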

Posted
Am I the only one interested in this?

I can answer this question..... YES!

No he isn't. Just for the sake of it, anyone who IS interested, please add your signature.

I am. Very interested.

I'm interested too. I've been starting to collect my own data but I don't have the time or training to put together the kind of package described.

Also, what is mixi?

I've got to get my wordtank repaired so I can do some proper reading practice again if this data is available only on Japanese sites.

Posted

This announcement made me drool all over the keyboard.

I still have some Python scripts for a kimarite analysis that I wanted to post here. I never got around to finishing the job. Now I wonder if I should wait for that super-hyper-mega-database thing...

Posted
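  • 四股名変更 【前の四股名、後の四股名、新しい四股名が初めて使った場所とか】 (Shikona changes: the previous shikona, the new shikona, the basho where the new shikona was first used, and so on)
  • 力士の誕生日【関取が一番必要けど、序の口から幕下もあればお願いします】、今の時代だけでは無く、昔のデータも (Rikishi dates of birth: sekitori are needed most, but jonokuchi through makushita too if available; not only the current era but older data as well)
  • 関取まで何場所がかかった (Number of basho taken to reach sekitori)
  • 力士の傷害やけが【身体のどこ、何のけが、なった日にち、手術したかしてない、したらどの手術した、手術の日にち】 (Rikishi injuries: which part of the body, what kind of injury, the date it happened, whether there was surgery, which surgery if so, and the date of the surgery)
  • 親方変更【前の親方、次の親方、変更の日にち】 (Oyakata changes: the previous oyakata, the next oyakata, and the date of the change)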

You can find these data on this website, Kiroku no Tamatebako. Go to the Sekitori meikan page, where most sekitori are listed in a-i-u-e-o (kana) order. For example, for Chiyonofuji, go to "ち" and click "千代の富士 貢": http://www.fsinet.or.jp/~sumo/profile/1/19750901.htm

About rikishi's birthday.....

http://ameblo.jp/mononofu-sumo/

This weblog is one of the best.

  • 力士の良く使う技【左よつとか押し相撲とか (Techniques a rikishi often uses, such as hidari-yotsu or oshi-zumo)

Posted

So much data... I am envious of the free time of some of these webmasters...

Dear God -- anyone want to sponsor me to make sumo statistics full-time? I swear to God I would be good at it. Thank you SO MUCH Gacktoh -- that information is stellar!

Posted

I really must learn SQL, but I have no use for it outside of sumo and baseball statistics. That seems like a waste, no? But oh so tempting...

Posted
I really must learn SQL, but I have no use for it outside of sumo and baseball statistics. That seems like a waste, no? But oh so tempting...

There's nothing much to learn. It's about the simplest computer language in the world (a query language, not really a programming language). It takes you less than a day to cover the basics, and you rarely need more.
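For anyone wondering what "the basics" look like in practice, here is a small sketch using Python's built-in sqlite3 module, so nothing needs installing. The table layout (basho, winner, loser, kimarite) and the placeholder rows are assumptions made up for the example, not the schema of the upcoming database or of any site in this thread.

```python
import sqlite3

# Hypothetical bout rows: (basho, winner, loser, kimarite).
# Placeholder names only, not real results.
ROWS = [
    ("2007.01", "RikishiA", "RikishiB", "yorikiri"),
    ("2007.01", "RikishiC", "RikishiA", "oshidashi"),
    ("2007.03", "RikishiA", "RikishiC", "yorikiri"),
    ("2007.03", "RikishiB", "RikishiC", "uwatenage"),
]


def kimarite_counts(rows):
    """Load the rows into an in-memory table and count bouts per kimarite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bouts (basho TEXT, winner TEXT, loser TEXT, kimarite TEXT)"
    )
    conn.executemany("INSERT INTO bouts VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT kimarite, COUNT(*) AS n FROM bouts "
        "GROUP BY kimarite ORDER BY n DESC, kimarite"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for kimarite, n in kimarite_counts(ROWS):
        print(kimarite, n)
```

SELECT, WHERE, GROUP BY and ORDER BY cover most of the questions asked earlier in the thread; joins only come in once rikishi details live in a separate table from the bouts.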
