a year ago
a while ago I asked Brody whether Railway allows scraping; the short answer was "we'll take down your project if we get a take down notification".
but still, can I take the risk of hosting it on Railway? like, will you guys remove it immediately, or only if the project gets a notification?
asking this because I'm having a problem with another provider, not related to the project itself but to billing, and I really don't want to go to GCP or AWS for this.
35 Replies
a year ago
twitter scraper?
a year ago
no, the football thing
a year ago
is your bot a good bot
a year ago
does it respect robots.txt? no, but I clearly won't DDoS the site with requests
a year ago
would the robots.txt rules disallow your bot?
a year ago
yes, it uses a User-agent: * rule to disallow the paths I want to scrape
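For anyone reading later, here is a minimal sketch of how to check whether a `User-agent: *` rule blocks a given path, using Python's standard `urllib.robotparser`. The robots.txt content, the paths, the `example.com` host, and the `my-scraper` agent name are all made up for illustration; the real site's rules are not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (assumption: the real file is not quoted in the thread).
ROBOTS_TXT = """\
User-agent: *
Disallow: /players/
Disallow: /prices/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/players/123", "/about"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "my-scraper", url))
```

Because the rule group is `User-agent: *`, it applies to any agent name that has no more specific group of its own.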
a year ago
hmmmm
a year ago
and btw the website i want to scrape gets its data from a bigger company (by scraping)
a year ago
so they're scraping anyway
a year ago
is there really no API for the data you want?
a year ago
not from the official company
a year ago
its EA btw
a year ago
what's the max rps your bot would do
a year ago
it depends. for scraping I can cap it at 10 to 30 rps (it only scrapes at intervals longer than one hour, so reducing it is no problem). I also download some images.
there's a single specific endpoint that needs to be real-time when my customer requests it, but only premium users would have access to it
a year ago
and if I'm correct, most endpoints I'm getting the data from seem to be cached by Cloudflare anyway
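One way to check that guess: Cloudflare normally reports its cache result in the `CF-Cache-Status` response header. Below is a small sketch that reads it from a headers mapping. The helper names are hypothetical, and treating only `HIT` as "served from cache" is a simplifying assumption.

```python
from typing import Mapping, Optional


def cloudflare_cache_status(headers: Mapping[str, str]) -> Optional[str]:
    """Return the CF-Cache-Status value in upper case, or None if the header is absent."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lower case.
        if name.lower() == "cf-cache-status":
            return value.strip().upper()
    return None


def served_from_edge_cache(headers: Mapping[str, str]) -> bool:
    """Assumption: only a HIT counts as served from Cloudflare's cache."""
    return cloudflare_cache_status(headers) == "HIT"


if __name__ == "__main__":
    # Hypothetical response headers for illustration.
    print(served_from_edge_cache({"CF-Cache-Status": "HIT"}))
    print(served_from_edge_cache({"Content-Type": "application/json"}))
```

A response marked `MISS` or `DYNAMIC` was fetched from the origin server, so those are the requests that actually add load to the site.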
a year ago
should be no harm then
a year ago
but yes we will not hesitate to take it down if we receive a report
a year ago
ok and another question, would my project get taken down or my whole account?
a year ago
whole team
a year ago
we’re walking the line of a big big if here
a year ago
if you’re not abusing the site, then they won’t issue a takedown request
a year ago
therefore, railway won’t take down your app
a year ago
If your app doesn’t respect the robots.txt, you’re taking a risk
a year ago
but that risk isn’t huge
a year ago
30 rps is a lot though, so up to you to implement mitigations
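As a rough idea of what such a mitigation could look like, here is a minimal client-side throttle that spaces requests so they never exceed a configured rate. The `Throttle` class and its parameters are hypothetical names for this sketch, not anything Railway provides; the clock and sleep functions are injectable only to make it easy to test.

```python
import time
from typing import Callable


class Throttle:
    """Enforce a minimum interval between calls so the rate stays at or below max_rps."""

    def __init__(
        self,
        max_rps: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_rps <= 0:
            raise ValueError("max_rps must be positive")
        self.min_interval = 1.0 / max_rps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()  # earliest time the next request may go out

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot one interval after this request's slot.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


if __name__ == "__main__":
    throttle = Throttle(max_rps=10)  # assumption: 10 rps, the lower bound mentioned above
    for i in range(3):
        throttle.wait()
        print("request", i)  # a real scraper would send its HTTP request here
```

Calling `throttle.wait()` before every outgoing request keeps a single worker under the cap; several workers would need to share one instance behind a lock.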
a year ago
Is there a reason why you need updates at that frequency?
a year ago
could you update at a lower frequency/get your data from a different source?
a year ago
if you know they’re getting their data from EA at a high frequency, why not do the same?
a year ago
30 rps is just the max peak it might get, pretty sure that most of the time it'll be below that
a year ago
there isn't any other source available that gets all the data I need, and yes for now the interval is about every 12 hours but I might decrease that if needed by any customer
a year ago
it needs auth and overall it's a mess to deal with, but maybe in the future
a year ago
and just to be clear: yes, I'm accounting for every request I make to the server so I don't overwhelm it
a year ago
so final answer, run it on railway
Status changed to Solved brody • 12 months ago