Doitsuyama

Sumo Reference Updates


So I'm just stating for a fact that I'm going to make the data accessible and queryable on a more reliable basis. That said, if you are getting advertiser income through it then I have no desire to chip away at that. Any suggestions as the site maintainer for how I can improve the reliability situation without harming the sumodb's income?


It's not just bots; the browser can also only access one thing at a time. When I open a page, the win/loss marks are missing: they are images, and not all of them get loaded. Reloading after some time leaves something else unloaded, quickly opening another rikishi page when the first one isn't what I wanted gives me the 403 error, and the 404 error also occurs frequently.


A few people have mentioned the shiroboshi/kuroboshi markers not loading for them, and also getting 403 errors when they've definitely not accessed more than 6 pages in 10 seconds, but neither has happened to me yet, though I've not been using the site heavily.

I still occasionally get the 404 error like previously when the site goes "down" or whatever is happening (was last down for me around midnight, so around 20 hours ago), but it's much better than it was last week (though it was basically down the whole week so anything is an improvement!)

EDIT: OK, now I'm getting 403 errors, and this was the first page I'd accessed in hours, so there's something a little weird going on there.

Edited by Hidenotora

It's definitely happening to me, and it's getting annoying. I use university wifi, and because that IP is shared by a bunch of people, to use the site properly I need to either boot up the VPN on my computer or switch to mobile, which SumoDB is definitely not optimized for.


I started a YouTube channel about sumo a couple months ago because I had a lot of ideas stemming off the data in sumodb, and especially the query system (which doesn't work with the Internet Archive, although the rest of the site appears to). I would 100% donate to such a cause, not only for the communal good but because I have personal reasons to want it routinely accessible.

On 22/04/2022 at 12:30, Kaitetsu said:

It's definitely happening to me, and it's getting annoying. I use university wifi, and because that IP is shared by a bunch of people, to use the site properly I need to either boot up the VPN on my computer or switch to mobile, which SumoDB is definitely not optimized for.

I don't know your exact experience, of course, but rest assured using university wifi is at most a tiny slice of any problems you're having. I'm currently out of town, which means I went from a gaming desktop on a hardwired FiOS connection to a laptop of similar spec using two different home wifis (one FiOS, one cable), and the connection issues are the same in all three places.

On 20/04/2022 at 00:15, Hoshotakamoto said:

So I'm just stating for a fact that I'm going to make the data accessible and queryable on a more reliable basis. That said, if you are getting advertiser income through it then I have no desire to chip away at that. Any suggestions as the site maintainer for how I can improve the reliability situation without harming the sumodb's income?

Really, if the site is frequently down, how much ad income is he getting anyway? It might be that the best way to help is to simply do what you can to make this information more reliably available, and if he finds a way to keep sumodb up consistently, be willing to take yours down so he gets that ad income back. 

6 hours ago, Sumo Spiffy said:

Really, if the site is frequently down, how much ad income is he getting anyway? It might be that the best way to help is to simply do what you can to make this information more reliably available, and if he finds a way to keep sumodb up consistently, be willing to take yours down so he gets that ad income back. 

I don't really understand how the site works (other than it won't load the front page for me at this moment), but it says it is "hosted by our sponsor adplorer" and then I don't see any ads. So I don't know if anyone is actually benefiting from hosting it, but I didn't want to just come out with a slap in the face and be like "i'm going to steal all your traffic now that I have all the same data".

1 hour ago, Hoshotakamoto said:

I don't really understand how the site works (other than it won't load the front page for me at this moment), but it says it is "hosted by our sponsor adplorer" and then I don't see any ads. So I don't know if anyone is actually benefiting from hosting it, but I didn't want to just come out with a slap in the face and be like "i'm going to steal all your traffic now that I have all the same data".

It's very respectable of you to not want to screw him over, but unless he's said something to the contrary, do we even know if he cares whether or not anyone does this? If it's this difficult to keep the site running reliably, maybe it's more trouble than he wants to deal with and he only does as much as he can because there's no similar alternative. I only just found this forum and some other places where sumo conversations take place, so I don't know how much people talk to each other and/or him.


Forking a copy of the Sumo Reference data has been done (without consent, AFAIK) quite a few times over the years - I think the most notable copy still up is the Japanese-only one at http://大相撲.jp/. The problem as always is that any new maintainer will have to invest the same effort into keeping the contents up to date as Doitsuyama does, with all the attendant side effects. (For instance, at the aforementioned site the rikishi IDs only match those of the DB up to the date of the forking.) Additionally the quality of the data will invariably diverge because error corrections are unlikely to be made on all forks.

Edited by Asashosakari

Wakanoumi's given name is correctly given as Shunpei on the English side of the DB, but the Japanese data mistakenly has it as すんぺい (Sunpei) for both shikona that he has used.

On 21/04/2022 at 18:40, Akinomaki said:

It's not just bots; the browser can also only access one thing at a time. When I open a page, the win/loss marks are missing: they are images, and not all of them get loaded. Reloading after some time leaves something else unloaded, quickly opening another rikishi page when the first one isn't what I wanted gives me the 403 error, and the 404 error also occurs frequently.

Performance at least is much better now; pages are loading very fast for me. But I see that one page (especially the first after a while) may load several elements like images. So I changed the limit from 6 requests in 10 seconds to 20 requests in 20 seconds. I can of course change that again if needed, and if performance doesn't suffer.
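
To illustrate why a single page view could trip the old limit, here is a minimal sketch of a sliding-window limiter. It is not the site's actual configuration, which isn't shown in this thread: the sliding-window approach, the names `SlidingWindowLimiter` and `count_rejected`, and the page with 9 images are all assumptions made for the example.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window_seconds` span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._accepted = deque()  # timestamps of accepted requests

    def allow(self, now):
        # Forget accepted requests that have aged out of the window.
        while self._accepted and now - self._accepted[0] >= self.window_seconds:
            self._accepted.popleft()
        if len(self._accepted) < self.max_requests:
            self._accepted.append(now)
            return True
        return False  # a real server would answer 403 here


def count_rejected(limiter, timestamps):
    """Return how many of the given request times the limiter rejects."""
    return sum(0 if limiter.allow(t) else 1 for t in timestamps)


# Hypothetical page view: 1 HTML document plus 9 images, all requested at t=0.
page_view = [0.0] * 10

old_rejected = count_rejected(SlidingWindowLimiter(6, 10), page_view)
new_rejected = count_rejected(SlidingWindowLimiter(20, 20), page_view)
print(old_rejected, new_rejected)  # prints "4 0"
```

Under these assumptions, the 6-in-10 limit rejects four of the ten requests for one uncached page, while 20-in-20 lets the whole page through.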

Edited by Doitsuyama

Was about to comment that after the implementation of the initial 6/10 rate limiting, the UX for the DB tended to be: load into DB landing page, immediately click onto link that the user wants to visit (e.g. banzuke, torikumi, kabu), then immediately get 403ed because it was too fast. That hasn't been a problem since about mid basho, so I assume that was corrected with the new 20/20 rate limit?


The 403s are not an issue. If I get one, I just wait a short while and hit refresh.
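
For anyone scripting their access rather than clicking, the same "wait and refresh" habit can be sketched as below. This is only an illustration: the function name `fetch_with_retry`, the 5-second starting delay, the doubling back-off and the stand-in `fetch` callable are all assumptions, not anything the site prescribes.

```python
import time


def fetch_with_retry(fetch, url, max_attempts=4, base_delay=5.0, sleep=time.sleep):
    """Call fetch(url) -> (status, body); wait and retry while status is 403."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(url)
        if status != 403:
            return status, body
        if attempt < max_attempts:
            sleep(delay)  # wait a short while before "hitting refresh"
            delay *= 2    # back off further each time
    return status, body


# Stand-in for a real HTTP call: rejects twice, then succeeds.
responses = iter([(403, ""), (403, ""), (200, "<html>ok</html>")])
waits = []  # records the delays instead of actually sleeping
result = fetch_with_retry(
    lambda url: next(responses),
    "https://example.invalid/page",  # placeholder URL
    sleep=waits.append,
)
print(result, waits)  # prints "(200, '<html>ok</html>') [5.0, 10.0]"
```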

5 hours ago, Seiyashi said:

Was about to comment that after the implementation of the initial 6/10 rate limiting, the UX for the DB tended to be: load into DB landing page, immediately click onto link that the user wants to visit (e.g. banzuke, torikumi, kabu), then immediately get 403ed because it was too fast. That hasn't been a problem since about mid basho, so I assume that was corrected with the new 20/20 rate limit?

No, I just did that today. I think since about mid basho you hit the DB so frequently that resources like images never left your cache and didn't need to be loaded.


The head-to-head between Kotokanyu and Kyokushuzan shows a total of 3-1 even though they've only met three times. The interim results after the first two matches are correctly listed as 1-0 and 2-0, but their third meeting shows up when queried for "head-to-head 3-0 before the bout", so Kotokanyu is going from 2-0 to 3-0 in phantom fashion somewhere.

Since I ran into this completely at random, I'm now left to wonder if there are any other such unnoticed discrepancies.
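
One way to hunt for others would be to replay each pair's bouts in order and compare the recomputed running score with the stored "before the bout" value. The sketch below assumes a made-up record layout (`Bout`, `stored_wins_before`, `stored_losses_before`) and placeholder rikishi "A" and "B"; the real database schema isn't known here.

```python
from collections import namedtuple

# Hypothetical record layout: winner/loser names plus the head-to-head
# score (from the winner's point of view) that the database claims
# stood *before* the bout.
Bout = namedtuple("Bout", "winner loser stored_wins_before stored_losses_before")


def find_h2h_discrepancies(bouts):
    """Replay bouts in order and report where stored running totals disagree."""
    tally = {}  # (a, b) -> number of wins a has over b so far
    problems = []
    for index, bout in enumerate(bouts):
        wins = tally.get((bout.winner, bout.loser), 0)
        losses = tally.get((bout.loser, bout.winner), 0)
        stored = (bout.stored_wins_before, bout.stored_losses_before)
        if stored != (wins, losses):
            # (bout position, stored score, recomputed score)
            problems.append((index, stored, (wins, losses)))
        tally[(bout.winner, bout.loser)] = wins + 1
    return problems


# Made-up sample with the same shape of error, not real bout data.
sample = [
    Bout("A", "B", 0, 0),
    Bout("A", "B", 1, 0),
    Bout("A", "B", 3, 0),  # recomputed score before this bout is 2-0
]
print(find_h2h_discrepancies(sample))  # prints "[(2, (3, 0), (2, 0))]"
```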

Edited by Asashosakari

This fusen bout between Nionoumi and Chiyooga got the wrong hoshitori markers.

 

Oshoma's yusho saw his Kyokai profile get the given name update, and they have it as Deki (でき) rather than Degi, so it looks like Nikkan had that part right after all:
 

On 07/09/2021 at 23:06, Yubinhaad said:
On 07/09/2021 at 02:52, Yubinhaad said:
On 07/09/2021 at 14:10, Yubinhaad said:

I saw that reading last night, but I assumed it was a typo. Looking at the heya media just now, they give it as the shorter version, おうしょうま.

 

I just now noticed that the heya also says the given name reading is Degi (でぎ), not Deki as reported by Nikkan. (Sigh...)

 


I was looking at some Ozeki from Niigata and found this interesting bit.

[Attached image: sXGKbgy.jpg]

They all apparently lead to the same rikishi file or at least identical files.
