so remember my scraper project thingy?

a while ago I asked Brody whether Railway allows scraping, and the short answer was "we'll take down your project if we get a takedown notification".
but still, can I take the risk of hosting it on Railway? like, will you guys remove it immediately, or only if the project gets a notification?

asking this because I'm having problems with another provider (not related to the project itself, just billing stuff), and I really don't want to go to GCP or AWS for this.

Solved

35 Replies



a year ago

twitter scraper?


no, the football thing


a year ago

is your bot a good bot


does it respect robots.txt? no, but obviously I won't DDoS the site with requests


a year ago

would the robots.txt rules disallow your bot?


the robots.txt uses a User-agent: * rule to disallow the paths I want to scrape
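For anyone reading along, here is a rough sketch of how a wildcard rule like that can be checked before a request goes out. It uses Python's standard `urllib.robotparser`. The robots.txt contents, the paths, the `example.com` host, and the `my-scraper` user agent are all made up for illustration, since the real site's rules aren't shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (assumption: the real file is not quoted here).
ROBOTS_TXT = """\
User-agent: *
Disallow: /players/
Disallow: /prices/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/players/123", "/about"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "my-scraper", url))
```

Because the rule is under `User-agent: *`, it applies to any bot that doesn't have its own more specific section, which is why the scraper here counts as disallowed on those paths.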


a year ago

hmmmm


and btw the website i want to scrape gets its data from a bigger company (by scraping)


so they're scraping anyway


a year ago

is there really no API for the data you want?


not from the official company


its EA btw


a year ago

what's the max rps your bot would do


it depends, for scraping I can reduce it to 10 or 30 rps (it only scrapes at intervals longer than one hour, so I can reduce it no problem). I also download some images.

there's a single specific endpoint that needs to be in realtime when my customer requests it but only premium users would have access to it


and if I'm correct, most endpoints I'm getting the data from seem to be cached in Cloudflare anyway


a year ago

should be no harm then


a year ago

but yes we will not hesitate to take it down if we receive a report


ok and another question, would my project get taken down or my whole account?


a year ago

whole team


a year ago

we’re walking the line of a big big if here


a year ago

if you’re not abusing the site, then they won’t issue a takedown request


a year ago

therefore, railway won’t take down your app


a year ago

If your app doesn’t respect the robots.txt, you’re taking a risk


a year ago

but that risk isn’t huge


a year ago

30 rps is a lot though, so up to you to implement mitigations


a year ago

Is there a reason why you need updates at that frequency?


a year ago

could you update at a lower frequency/get your data from a different source?


a year ago

if you know they’re getting their data from EA at a high frequency, why not do the same?


30 rps is just the max peak it might get, pretty sure that most of the time it'll be below that


there isn't any other source available that gets all the data I need, and yes for now the interval is about every 12 hours but I might decrease that if needed by any customer


it needs auth and overall it's a mess to deal with, but maybe in the future


and just to be sure: yes, I'm accounting for every request I make to the server so I don't overwhelm it
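One possible way to enforce a request ceiling like the 10 to 30 rps mentioned above is a small client-side limiter that spaces requests out. This is only a sketch, not what the scraper in this thread actually does: the `RateLimiter` class and its parameters are hypothetical names, and it evens out requests rather than allowing bursts.

```python
import time


class RateLimiter:
    """Spaces calls so that at most max_rps of them start per second."""

    def __init__(self, max_rps, clock=time.monotonic, sleep=time.sleep):
        if max_rps <= 0:
            raise ValueError("max_rps must be positive")
        self.min_interval = 1.0 / max_rps
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next request may start; return seconds slept."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now = self._clock()
        else:
            delay = 0.0
        # Schedule the earliest start time for the following request.
        self._next_allowed = now + self.min_interval
        return delay


if __name__ == "__main__":
    limiter = RateLimiter(max_rps=10)  # assumed ceiling; tune per target site
    for i in range(3):
        limiter.wait()
        print("request", i, "at", round(time.monotonic(), 2))
```

Calling `limiter.wait()` before every outgoing request (page fetches and image downloads alike) keeps the peak under the configured rate even when many jobs fire at once.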


a year ago

so final answer, run it on railway


a year ago

!s


Status changed to Solved brody about 1 year ago

