Octofuji

Sumo DB and computing


It was interesting to see the recent discussion on the forum about various programming languages.

I've been wondering for a while how the regular stats contributors query the database.

Is it by writing a script to parse the HTML that comes back from the relevant Sumo DB web pages?

(I'm assuming there's not an API anywhere that people are using, as I think I would have picked this up in passing, but I thought it was worth asking on the off chance).
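
For the HTML-parsing approach asked about above, a minimal sketch using only Python's standard-library `html.parser` might look like this. It assumes the data sits in ordinary `<table>`/`<tr>`/`<td>` markup; the sample HTML and the name "Exampleyama" are made up for illustration and are not copied from an actual Sumo DB page.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace/newlines into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_rows(html):
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup, for illustration only.
SAMPLE = """
<table>
  <tr><th>Rank</th><th>Shikona</th><th>Result</th></tr>
  <tr><td>M1e</td><td>Exampleyama</td><td>8-7</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row)
```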

I use my Visual Basic system for downloading news (and other) pages and sites, with a sumodb filter subroutine that extracts the data from the HTML of the text-only pages.

Edited by Akinomaki

If you came from that discussion, you probably already know that I use Python to scrape; more specifically requests combined with URL manipulation logic and BeautifulSoup4 to do whatever I want to do. IDE of choice is PyCharm and I just output to text or csv files, but I might move to VSCode if I can figure out how to work it (in terms of support for languages, it seems to be the replacement for Eclipse, which was my first proper IDE). 
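
The "URL manipulation plus output to text or CSV" workflow could be sketched like this. The `Rikishi.aspx?r=` pattern is taken from the sumodb link quoted later in the thread; the column names are placeholders, and the fetching (with requests) and parsing (with BeautifulSoup4) steps are deliberately left out so the example runs offline.

```python
import csv
import io
from urllib.parse import parse_qs, urlencode, urlsplit

SUMODB_BASE = "http://sumodb.sumogames.de/"


def rikishi_url(sumodb_id):
    """Build a sumodb rikishi page URL, e.g. .../Rikishi.aspx?r=11927."""
    return SUMODB_BASE + "Rikishi.aspx?" + urlencode({"r": sumodb_id})


def rikishi_id_from_url(url):
    """Recover the numeric id from a Rikishi.aspx URL's query string."""
    query = parse_qs(urlsplit(url).query)
    return int(query["r"][0])


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text (write it to a file in real use)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With requests, the string returned by `rikishi_url(...)` would be what gets passed to `requests.get`.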

The problem I have so far is that before I took over the PDY thread, I didn't have a defined use case to scrape the DB for, so queries were written on an ad hoc basis. It's been a dream of mine to replicate the DB to be able to perform more powerful lookups and analysis like what Andy on Tachiai does, but I haven't had the time for it lately and was stuck on the data structure to adopt before that. 
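
As one possible starting point for the data-structure question, a local replica could begin with three tables: rikishi, basho and bouts. The schema below is a hypothetical sketch using Python's built-in `sqlite3`; the table and column names are my own, and it is not how sumodb or Tachiai actually store their data. Playoff bouts are not modelled.

```python
import sqlite3

# Hypothetical schema; SQLite does not enforce REFERENCES unless
# "PRAGMA foreign_keys = ON" is set on the connection.
SCHEMA = """
CREATE TABLE rikishi (
    sumodb_id  INTEGER PRIMARY KEY,   -- the id from Rikishi.aspx?r=...
    shikona_en TEXT NOT NULL,
    shikona_jp TEXT
);
CREATE TABLE basho (
    basho_id   TEXT PRIMARY KEY       -- 'YYYYMM', e.g. '202303'
);
CREATE TABLE bout (
    basho_id   TEXT    NOT NULL REFERENCES basho(basho_id),
    day        INTEGER NOT NULL,      -- 1..15
    division   TEXT    NOT NULL,
    east_id    INTEGER NOT NULL REFERENCES rikishi(sumodb_id),
    west_id    INTEGER NOT NULL REFERENCES rikishi(sumodb_id),
    winner_id  INTEGER REFERENCES rikishi(sumodb_id),
    kimarite   TEXT,
    PRIMARY KEY (basho_id, day, east_id)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and apply the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def wins_in_basho(conn, sumodb_id, basho_id):
    """Count the bouts one rikishi won in one basho."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM bout WHERE basho_id = ? AND winner_id = ?",
        (basho_id, sumodb_id),
    ).fetchone()
    return count
```

Keeping one row per bout makes cross-basho queries (streaks, head-to-head records) plain SQL over a single table.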

Thank you both. It's good to know for sure that there's not an API that I'm missing.

I will try something similar with PHP/PhpStorm (BeautifulSoup does look nice, but I don't know any Python).

At the moment I don't have a particular use case in mind, other than tracking cross-basho wrestler streaks. But perhaps something else will occur to me once I get into the data.
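
Tracking a streak across basho boundaries mostly comes down to putting each rikishi's bout results in chronological order and scanning them once. A minimal sketch, assuming the results have already been collected as hypothetical `(basho_id, day, won)` tuples; how fusen or absences should affect a streak is left to the caller.

```python
def longest_win_streak(results):
    """Return (length, start, end) of the longest run of consecutive wins.

    `results` is an iterable of (basho_id, day, won) tuples. basho_id is a
    'YYYYMM' string, so sorting the tuples gives chronological order even
    across basho. start/end are (basho_id, day) pairs, or None if no wins.
    """
    best = (0, None, None)
    run_len, run_start = 0, None
    for basho_id, day, won in sorted(results):
        if won:
            if run_len == 0:
                run_start = (basho_id, day)
            run_len += 1
            if run_len > best[0]:
                best = (run_len, run_start, (basho_id, day))
        else:
            run_len = 0
    return best
```

For example, wins on days 14 and 15 of one basho followed by a win on day 1 of the next count as a single three-bout streak.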

It would be nice to have an API but I think that's a lot of dev work and possibly hosting bandwidth that would not be fair to expect Doitsuyama and co to take on. And that's before considering whether the number of users of that API, on a regular and not just one-off basis, would be high enough to justify that work in the first place. 

Yes, all good points. It's astonishing that the database exists (and is maintained) at all. My initial thought was to create and host an API myself, but then reality struck and I started worrying about the above concerns, not to mention the difficulty of keeping a replicated DB in sync.

I've written an API (in golang) along with an initial dashboard (React) for sumo data. The initial dataset is based off of sumodb, but from now on it automagically updates from the NSK site, plus any manual tweaks that might be needed.

I want to add a lot more endpoints, but haven't got time right now. Give it a try, let me know if you find any problems or issues, or if you have specific data you want an endpoint for.
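
A client for a JSON API like this can stay small. The sketch below uses only the standard library; the base URL is the site mentioned later in the thread, but `/api/example` is a hypothetical path (the real endpoints are listed in the site's API guide). The `{"error": ...}` check mirrors the `{"error":"NOT_FOUND"}` response reported below.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

BASE_URL = "https://www.sumo-api.com/"


class ApiError(Exception):
    """Raised when the API answers with an {"error": ...} payload."""


def build_url(path, **params):
    """Join a path onto BASE_URL and append any query parameters."""
    url = urljoin(BASE_URL, path.lstrip("/"))
    if params:
        url += "?" + urlencode(params)
    return url


def decode(body):
    """Parse a JSON body (str or bytes), raising ApiError on error payloads."""
    data = json.loads(body)
    if isinstance(data, dict) and "error" in data:
        raise ApiError(data["error"])
    return data


def get_json(path, timeout=10, **params):
    """GET an endpoint and return the decoded JSON.

    Non-2xx responses raise urllib.error.HTTPError before decode() runs.
    """
    req = Request(build_url(path, **params), headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return decode(resp.read())
```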

On 28/03/2023 at 13:22, Seiyashi said:

It would be nice to have an API but I think that's a lot of dev work and possibly hosting bandwidth that would not be fair to expect Doitsuyama and co to take on. And that's before considering whether the number of users of that API, on a regular and not just one-off basis, would be high enough to justify that work in the first place. 

I'll definitely look into it when I've the time. Might not be for a while, though.

6 hours ago, thatsumoguy said:

I've written an API (in golang) along with an initial dashboard (React) for sumo data. The initial dataset is based off of sumodb, but from now on it automagically updates from the NSK site, plus any manual tweaks that might be needed.

I want to add a lot more endpoints, but haven't got time right now. Give it a try, let me know if you find any problems or issues, or if you have specific data you want an endpoint for.

[screenshot]

The selection labels for Basho and Division are swapped, though the functionality is fine. I appreciate the effort, although I don't foresee needing it any time soon.

If you open https://www.sumo-api.com/api-guide directly, you get {"error":"NOT_FOUND"}; you have to reach the page by clicking through from another page.
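
One common cause of this symptom, offered as a guess rather than a diagnosis of this particular site: in a single-page app, dashboard pages such as `/api-guide` exist only as client-side routes, so a direct request for them reaches a server that has no such route. The usual fix is for the server to return the app's `index.html` for any path that is not an API call or a static file. A minimal sketch of that routing rule; the `/api/` prefix, the file name and the suffix list are all assumptions.

```python
API_PREFIX = "/api/"        # assumption: API routes live under /api/
INDEX_FILE = "index.html"   # assumption: entry page of the built React app
STATIC_SUFFIXES = (".js", ".css", ".png", ".ico", ".svg", ".json")


def route(path):
    """Decide how a request path should be served.

    Returns ("api", path) for API calls, ("static", path) for asset files,
    and ("spa", INDEX_FILE) for everything else, so a deep link such as
    /api-guide loads the app, which then renders the right page itself.
    """
    if path.startswith(API_PREFIX):
        return ("api", path)
    if path.endswith(STATIC_SUFFIXES):
        return ("static", path)
    return ("spa", INDEX_FILE)
```

Note that `/api-guide` does not start with `/api/`, so it falls through to the app rather than to the API handler.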

Nice work, I hope you can continue with it.

Of course, I need both Japanese and English data to create my picture lists and to have kensho and spirited rikishi shown in romanized form.

Edited by Akinomaki

13 hours ago, Akinomaki said:

If you open https://www.sumo-api.com/api-guide directly, you get {"error":"NOT_FOUND"}; you have to reach the page by clicking through from another page.

Nice work, I hope you can continue with it.

Of course, I need both Japanese and English data to create my picture lists and to have kensho and spirited rikishi shown in romanized form.

Yes, I'm aware of that problem. I should have a fix for it soon. As for Japanese names, all currently active rikishi have the kanji shikona, but it isn't always returned. Is there an endpoint where you need it but it isn't included?
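
Since the kanji shikona isn't always present in a response, client code that builds lists in both scripts probably wants a fallback. A small sketch; the field names `shikonaEn` and `shikonaJp` are assumptions, not confirmed field names from the API.

```python
def display_name(rikishi, lookup=None):
    """Format 'English (Kanji)', falling back when the kanji is missing.

    `rikishi` is a dict that may or may not contain the kanji name.
    `lookup` is an optional dict of English name -> kanji used to fill gaps,
    for example built from an earlier response that did include the kanji.
    """
    en = (rikishi.get("shikonaEn") or "").strip()
    jp = (rikishi.get("shikonaJp") or "").strip()
    if not jp and lookup:
        jp = lookup.get(en, "")
    if en and jp:
        return f"{en} ({jp})"
    return en or jp
```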

14 hours ago, Yarimotsu said:

[screenshot]

The selection labels for Basho and Division are swapped, though the functionality is fine. I appreciate the effort, although I don't foresee needing it any time soon.

Ah, I thought I'd fixed that, but I must have been too hasty and fixed the wrong component. D'oh.

Well I'm glad I asked the question now, I didn't really expect a positive answer though! This is really good.

Does automagically mean every day during the basho?

Also, is the `rikishId` your own identifier? If there was a way to include the ID used on sumodb (e.g. http://sumodb.sumogames.de/Rikishi.aspx?r=11927 for Terunofuji) that would be the icing on the cake.

21 minutes ago, Octofuji said:

Well I'm glad I asked the question now, I didn't really expect a positive answer though! This is really good.

Does automagically mean every day during the basho?

Also, is the `rikishId` your own identifier? If there was a way to include the ID used on sumodb (e.g. http://sumodb.sumogames.de/Rikishi.aspx?r=11927 for Terunofuji) that would be the icing on the cake.

Yes, it updates the match results during the basho, with at most 1 minute between the result and my database being updated, provided the NSK link doesn't flake out.
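
An update lag of at most a minute suggests a polling loop along these lines. This is only a sketch of the general technique, not the author's Go code: `fetch` and `store` are hypothetical callables, and a result is written only when it differs from what was previously seen.

```python
import time


def poll_once(fetch, store, seen):
    """Fetch current results and store only new or changed ones.

    fetch() returns a dict of bout key -> result; `seen` holds what was
    stored previously and is updated in place. Returns the changed keys.
    """
    try:
        current = fetch()
    except OSError:
        # Network hiccup (e.g. the NSK page flaking out): skip this round.
        return []
    changed = [key for key, result in current.items() if seen.get(key) != result]
    for key in changed:
        store(key, current[key])
        seen[key] = current[key]
    return changed


def poll_forever(fetch, store, interval_seconds=60):
    """Poll on a fixed interval; lag is roughly one interval plus fetch time."""
    seen = {}
    while True:
        poll_once(fetch, store, seen)
        time.sleep(interval_seconds)
```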

The sumodb id is in my database (since I used it as the base), I can make it available if that's useful.

Edit: that sumodb id only exists on the rikishi object itself; everywhere else uses my own id. There would be a way to look up other details off the back of that original ID, but it might be costly (computationally).

Edited by thatsumoguy

2 minutes ago, thatsumoguy said:

Yes, it updates the match results during the basho, with at most 1 minute between the result and my database being updated, provided the NSK link doesn't flake out.

The sumodb id is in my database (since I used it as the base), I can make it available if that's useful.

Excellent!

I think the sumo DB rikishi ID would be useful, especially if people want to generate formatted posts with links to the SumoDB rikishi page.
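
If the sumodb id only lives on the rikishi object, one way to keep lookups cheap on the client side is to fetch the rikishi list once and build a dictionary index, rather than searching the list for every bout. This sketch assumes hypothetical field names `id` and `sumodbId`, and formats a plain link using the `Rikishi.aspx?r=` pattern quoted above.

```python
SUMODB_RIKISHI_URL = "http://sumodb.sumogames.de/Rikishi.aspx?r={}"


def build_index(rikishi_list):
    """Map internal id -> sumodb id, skipping entries without a sumodb id."""
    return {
        r["id"]: r["sumodbId"]
        for r in rikishi_list
        if r.get("sumodbId") is not None
    }


def sumodb_link(internal_id, index, label):
    """Return 'label (url)' text for a forum post, or just the label."""
    sumodb_id = index.get(internal_id)
    if sumodb_id is None:
        return label
    return f"{label} ({SUMODB_RIKISHI_URL.format(sumodb_id)})"
```

Building the index is a single pass over the list; after that each lookup is a constant-time dictionary access.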

I've had a play around with the basho service and it's very responsive and nice to work with. Once I've got some client code I'm happy with I'll share it on here.

One minor suggestion for the future - it would be nice to indicate the yusho winner in the basho API. I don't think there's any way to work this out from the data in the case of a playoff. (It's not relevant for what I'm doing though).
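
The point about playoffs can be made concrete. From the regular bouts alone you can find the rikishi with the most wins, but if two or more are tied, the yusho was decided in a playoff that the per-day records don't capture. A sketch, assuming a hypothetical mapping of shikona to final win count:

```python
def yusho_from_records(wins_by_rikishi):
    """Return (winner, tied) from final win counts.

    If exactly one rikishi has the top score, winner is that rikishi and
    tied is empty. If several share it, a playoff decided the yusho, so
    winner is None and tied lists the playoff participants.
    """
    if not wins_by_rikishi:
        return None, []
    top = max(wins_by_rikishi.values())
    leaders = sorted(name for name, w in wins_by_rikishi.items() if w == top)
    if len(leaders) == 1:
        return leaders[0], []
    return None, leaders
```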

29 minutes ago, Octofuji said:

I've had a play around with the basho service and it's very responsive and nice to work with. Once I've got some client code I'm happy with I'll share it on here.

One minor suggestion for the future - it would be nice to indicate the yusho winner in the basho API. I don't think there's any way to work this out from the data in the case of a playoff. (It's not relevant for what I'm doing though).

Glad it's proving useful. Yes, yusho, special prizes, etc. are on my list of things to do. You might also notice that for historical banzuke, the individual match results aren't in the records field. I need to run a script to generate them for everything before 202303.
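
The `202303` cutoff follows the year-month format used for basho ids. Because the six honbasho are held in the odd-numbered months, the ids a back-fill script would need to walk can be generated rather than listed by hand. A sketch; the start year is arbitrary, and cancelled tournaments are not excluded, so the caller would have to skip those.

```python
def basho_ids(start_year, before):
    """Yield 'YYYYMM' ids for Jan/Mar/May/Jul/Sep/Nov basho before `before`.

    `before` is an exclusive 'YYYYMM' bound such as '202303'. Cancelled
    tournaments are not filtered out here.
    """
    end_year = int(before[:4])
    for year in range(start_year, end_year + 1):
        for month in (1, 3, 5, 7, 9, 11):
            basho_id = f"{year}{month:02d}"
            if basho_id >= before:
                return
            yield basho_id
```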

On 01/04/2023 at 23:53, thatsumoguy said:

Glad it's proving useful. Yes, yusho, special prizes, etc. are on my list of things to do. You might also notice that for historical banzuke, the individual match results aren't in the records field. I need to run a script to generate them for everything before 202303.

Thanks, that would be really good. Will the banzuke update each day of future bashos as well as the torikumi? So that, for example, on Day 4 of the basho you will see the first four results for everyone?
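
Showing "the first four results on Day 4" from per-day data is a prefix count over each rikishi's daily results. A sketch, assuming a hypothetical list with one code per day: 'W' win (fusen wins included), 'L' loss (fusen losses included), 'A' absent, and '' for a day with no bout scheduled.

```python
def record_through_day(daily, day):
    """Return (wins, losses, absences) counting days 1..day.

    `daily` has one entry per day, index 0 = Day 1. Days with no bout
    scheduled ('') are not counted as anything.
    """
    counts = {"W": 0, "L": 0, "A": 0}
    for code in daily[:day]:
        if code in counts:
            counts[code] += 1
    return counts["W"], counts["L"], counts["A"]


def format_record(wins, losses, absences):
    """'8-7' style, or '1-3-3' style when there are absences."""
    if absences:
        return f"{wins}-{losses}-{absences}"
    return f"{wins}-{losses}"
```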

Also, how do you manage to distinguish, for lower ranked wrestlers, between absences (not fusen) and days when they just have no bouts scheduled? For example for Tochiseiryu who went 1-3-3 last month.

Edit: I can see that sumo.or.jp makes this distinction while the sumo DB doesn't.

Edited by Octofuji
Answered own question!

2 hours ago, Octofuji said:

Edit: I can see that sumo.or.jp makes this distinction while the sumo DB doesn't.

Yes, exactly: when I grab the data from the NSK site, there is a clear difference between an absence and a non-scheduled day. This is actually proving a problem for back-generating the records, however... I'm working on that. Once this is sorted, I will develop a dashboard to show the hoshitori for each division, and the endpoints will contain the data too.
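
For back-generating, a final record still pins down how many absences there were, just not on which days. Sekitori are scheduled 15 bouts and lower-division rikishi 7, and fusen losses already count as losses, so whatever is missing from the total must be absences. That is how a lower-division 1-3 record becomes 1-3-3. A sketch under those assumptions; playoffs and any unusual scheduling are ignored.

```python
# Bouts scheduled per basho; any division not listed is treated as a
# lower division with 7 bouts.
BOUTS_SCHEDULED = {"Makuuchi": 15, "Juryo": 15}


def inferred_absences(division, wins, losses):
    """Number of absences implied by a final record.

    This gives only the count: without per-day data (as on the NSK site)
    you cannot tell which blank days were absences and which simply had
    no bout scheduled.
    """
    scheduled = BOUTS_SCHEDULED.get(division, 7)
    missing = scheduled - wins - losses
    if missing < 0:
        raise ValueError("more bouts than scheduled")
    return missing
```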

Yes, everything should stay up to date for all future tournaments. I monitor what is going on when possible and I actually noticed some mistakes in some records from the last basho, which I am correcting.

I haven't read through this comprehensively, but well done @thatsumoguy. I got frustrated with NSK's blocking of JavaScript/refusal to accept APIs and bailed.

5 minutes ago, Godango said:

I haven't read through this comprehensively, but well done @thatsumoguy. I got frustrated with NSK's blocking of JavaScript/refusal to accept APIs and bailed.

Yeah, sometimes my requests were OK, other times not... but their site always had the results, so there had to be a way to make it more reliable. So I copied the exact request from the browser, trimmed out the unnecessary parts and made sure my code made that request... et voilà, results from their AJAX call every time.
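
Replaying the exact request the browser makes usually means copying its headers as well as its URL. A minimal sketch with `urllib.request`: the URL and header values are placeholders (copy the real ones from the browser's developer tools), and `X-Requested-With: XMLHttpRequest` is included only because some AJAX endpoints check it, not because NSK is known to require it. A short retry loop covers the occasional failed request.

```python
import time
from urllib.request import Request, urlopen

# Placeholder values: replace with what the browser's network tab shows.
AJAX_URL = "https://example.invalid/ajax-endpoint"
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (copied from your browser)",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
    "Referer": "https://example.invalid/results-page",
}


def build_request(url=AJAX_URL, data=None):
    """Recreate the browser's request; passing `data` (bytes) makes it a POST."""
    return Request(url, data=data, headers=BROWSER_HEADERS)


def fetch_with_retries(request, attempts=3, delay_seconds=2.0, opener=urlopen):
    """Try the request a few times, waiting longer after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            with opener(request, timeout=10) as resp:
                return resp.read()
        except OSError:  # URLError and HTTPError are subclasses of OSError
            if attempt == attempts:
                raise
            time.sleep(delay_seconds * attempt)
```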
