10 months ago
a while ago I asked Brody whether Railway allows scraping, and the short answer was "we'll take down your project if we get a takedown notification".
but still, can I take the risk of hosting it on Railway? like, will you guys remove it immediately, or only if the project gets a notification?
asking this because I'm having problems with another provider (not related to the project itself, just billing stuff), and I really don't want to go to GCP or AWS for this.
35 Replies
10 months ago
twitter scraper?
10 months ago
no, the football thing
10 months ago
is your bot a good bot
10 months ago
does it respect robots.txt? no, but clearly I won't DDoS the site with requests
10 months ago
would the robots.txt rules disallow your bot?
10 months ago
yes, the robots.txt uses a user-agent: * rule to disallow the paths I want to scrape
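For anyone wanting to check this kind of thing themselves, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the paths, the `my-bot` agent name, and the `is_allowed` helper are all made up for illustration; the real site's rules are not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, not the real site's file).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /players/
Disallow: /prices/
"""


def is_allowed(robots_text: str, user_agent: str, url: str) -> bool:
    """Return True if robots_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network call is made here.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/players/123"))
    print(is_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/about"))
```

A `User-agent: *` group applies to any bot that has no more specific group of its own, which is why a custom agent name is still covered by it.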
10 months ago
hmmmm
10 months ago
and btw the website i want to scrape gets its data from a bigger company (by scraping)
10 months ago
so they're scraping anyway
10 months ago
is there really no API for the data you want?
10 months ago
not from the official company
10 months ago
its EA btw
10 months ago
what's the max rps your bot would do
10 months ago
it depends. for scraping I can reduce it to 10 or 30 rps (it only scrapes at intervals longer than one hour, so I can reduce it no problem). I also download some images.
there's a single specific endpoint that needs to be realtime when my customer requests it, but only premium users would have access to it
10 months ago
and if I'm correct, most endpoints I'm getting the data from seem to be cached in Cloudflare anyway
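One way to sanity-check the caching assumption is to look at the `CF-Cache-Status` response header, which Cloudflare sets on responses it proxies. The sketch below only inspects a headers mapping; the `served_from_cloudflare_cache` helper name is made up for illustration, and how the headers are fetched is left to whatever HTTP client the scraper already uses.

```python
from typing import Mapping


def served_from_cloudflare_cache(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers report a Cloudflare cache HIT."""
    # Header names are case-insensitive, so normalize the keys before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    status = normalized.get("cf-cache-status", "")
    # Values such as MISS, EXPIRED, BYPASS or DYNAMIC mean the origin was contacted.
    return status.strip().upper() == "HIT"


if __name__ == "__main__":
    print(served_from_cloudflare_cache({"CF-Cache-Status": "HIT"}))
    print(served_from_cloudflare_cache({"cf-cache-status": "DYNAMIC"}))
```

A request that comes back as anything other than `HIT` still reaches the origin server, so those are the ones that count toward the load on the site.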
10 months ago
should be no harm then
10 months ago
but yes we will not hesitate to take it down if we receive a report
10 months ago
ok and another question, would my project get taken down or my whole account?
10 months ago
whole team
10 months ago
we’re walking the line of a big big if here
10 months ago
if you’re not abusing the site, then they won’t issue a takedown request
10 months ago
therefore, railway won’t take down your app
10 months ago
If your app doesn’t respect the robots.txt, you’re taking a risk
10 months ago
but that risk isn’t huge
10 months ago
30 rps is a lot though, so up to you to implement mitigations
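As one possible mitigation, a simple client-side throttle can cap the request rate. This is only a sketch under assumptions: the `Throttle` class and its parameters are made up for illustration, it spaces calls evenly instead of allowing bursts, and it is not thread-safe.

```python
import time
from typing import Callable


class Throttle:
    """Spaces calls evenly so that at most max_rps of them start per second."""

    def __init__(
        self,
        max_rps: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_rps <= 0:
            raise ValueError("max_rps must be positive")
        self.min_interval = 1.0 / max_rps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may proceed

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following call one interval after this one proceeds.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


if __name__ == "__main__":
    throttle = Throttle(max_rps=10)
    for i in range(3):
        throttle.wait()  # call this right before each outgoing request
        print("request", i)
```

Calling `throttle.wait()` before every outgoing request keeps the peak at or below the configured rate, including image downloads if they go through the same throttle.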
10 months ago
Is there a reason why you need updates at that frequency?
10 months ago
could you update at a lower frequency/get your data from a different source?
10 months ago
if you know they’re getting their data from EA at a high frequency, why not do the same?
10 months ago
30 rps is just the max peak it might get, pretty sure that most of the time it'll be below that
10 months ago
there isn't any other source available that has all the data I need, and yes, for now the interval is about every 12 hours, but I might shorten that if a customer needs it
10 months ago
getting it from EA directly needs auth, and overall it's a mess to deal with, but maybe in the future
10 months ago
and just to be sure: yes, I'm keeping track of every request I make to the server so I don't overwhelm it
10 months ago
so final answer, run it on railway
10 months ago
!s
Status changed to Solved brody • 10 months ago