so remember my scraper project thingy?
passos
MODERATOROP

10 months ago

a while ago I asked Brody whether Railway allows scraping, and the short answer was "we'll take down your project if we get a takedown notification".
but still, can I take the risk of hosting it on Railway? like, will you guys remove it immediately, or only if the project gets a notification?

asking this because I'm having problems with another provider (billing stuff, not related to the project itself), and I really don't want to go to GCP or AWS for this.

Solved

35 Replies

passos
MODERATOROP

10 months ago

N/A


brody
EMPLOYEE

10 months ago

twitter scraper?


passos
MODERATOROP

10 months ago

no, the football thing


brody
EMPLOYEE

10 months ago

is your bot a good bot


passos
MODERATOROP

10 months ago

does it respect robots.txt? no, but I clearly won't DDoS the site with requests


brody
EMPLOYEE

10 months ago

would the robots.txt rules disallow your bot?


passos
MODERATOROP

10 months ago

it uses a "User-agent: *" rule to disallow the paths I want to scrape
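
A minimal sketch of how a scraper could check this before fetching, using Python's standard `urllib.robotparser`. The robots.txt content, the paths, the bot name, and the `is_allowed` helper are all hypothetical stand-ins, not the real site's rules:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, standing in for the target site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /players/
Disallow: /prices/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A "User-agent: *" group applies to any bot that has no group of its own.
    print(is_allowed(ROBOTS_TXT, "football-bot", "https://example.com/players/42"))
    print(is_allowed(ROBOTS_TXT, "football-bot", "https://example.com/about"))
```

With a wildcard group like this, every bot without its own group is covered by the same Disallow rules, which is the situation described above.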


brody
EMPLOYEE

10 months ago

hmmmm


passos
MODERATOROP

10 months ago

and btw the website i want to scrape gets its data from a bigger company (by scraping)


passos
MODERATOROP

10 months ago

so they're scraping anyway


brody
EMPLOYEE

10 months ago

is there really no API for the data you want?


passos
MODERATOROP

10 months ago

not from the official company


passos
MODERATOROP

10 months ago

its EA btw


brody
EMPLOYEE

10 months ago

what's the max rps your bot would do


passos
MODERATOROP

10 months ago

it depends. for scraping I can cap it at 10 or 30 rps (it only scrapes at intervals longer than one hour, so reducing it is no problem). I also download some images.

there's one specific endpoint that has to be fetched in real time when a customer requests it, but only premium users would have access to it


passos
MODERATOROP

10 months ago

and if I'm correct, most endpoints I'm getting the data from seem to be cached in Cloudflare anyway
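
One way to sanity-check that assumption is to look at the `CF-Cache-Status` response header that Cloudflare adds. The sketch below is only illustrative: the `cloudflare_cache_hit` helper is a made-up name, and it deliberately counts only `HIT` as "served from cache", treating every other value (`MISS`, `DYNAMIC`, `BYPASS`, and so on) as a request that may have reached the origin:

```python
from typing import Mapping


def cloudflare_cache_hit(headers: Mapping[str, str]) -> bool:
    """Return True only if the response headers report a Cloudflare cache HIT.

    Header names are matched case-insensitively. A missing header or any
    status other than HIT is treated conservatively as "not cached".
    """
    for name, value in headers.items():
        if name.lower() == "cf-cache-status":
            return value.strip().upper() == "HIT"
    return False


if __name__ == "__main__":
    print(cloudflare_cache_hit({"CF-Cache-Status": "HIT"}))
    print(cloudflare_cache_hit({"cf-cache-status": "DYNAMIC"}))
```

Endpoints that never report `HIT` are the ones where the scraper's load lands on the origin server, so those are the ones worth throttling hardest.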


brody
EMPLOYEE

10 months ago

should be no harm then


brody
EMPLOYEE

10 months ago

but yes we will not hesitate to take it down if we receive a report


passos
MODERATOROP

10 months ago

ok and another question, would my project get taken down or my whole account?


brody
EMPLOYEE

10 months ago

whole team


adam
MODERATOR

10 months ago

we’re walking the line of a big big if here


adam
MODERATOR

10 months ago

if you’re not abusing the site, then they won’t issue a takedown request


adam
MODERATOR

10 months ago

therefore, railway won’t take down your app


adam
MODERATOR

10 months ago

If your app doesn’t respect the robots.txt, you’re taking a risk


adam
MODERATOR

10 months ago

but that risk isn’t huge


adam
MODERATOR

10 months ago

30 rps is a lot though, so up to you to implement mitigations
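
As one possible mitigation, a scraper can space out its requests so it never exceeds a chosen rate. This is a rough sketch, not anything Railway provides: the `Throttle` class and its parameters are hypothetical, and the clock and sleep functions are injectable only to make it easy to test:

```python
import time
from typing import Callable


class Throttle:
    """Space out calls so that at most `max_rps` of them start per second."""

    def __init__(
        self,
        max_rps: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / max_rps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


if __name__ == "__main__":
    throttle = Throttle(max_rps=10)
    for i in range(3):
        throttle.wait()  # call this right before each HTTP request
        print("request", i)
```

Calling `wait()` before every request keeps the peak at the configured rate even when many pages are queued at once.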


adam
MODERATOR

10 months ago

Is there a reason why you need updates at that frequency?


adam
MODERATOR

10 months ago

could you update at a lower frequency/get your data from a different source?


adam
MODERATOR

10 months ago

if you know they’re getting their data from EA at a high frequency, why not do the same?


passos
MODERATOROP

10 months ago

30 rps is just the max peak it might get, pretty sure that most of the time it'll be below that


passos
MODERATOROP

10 months ago

there isn't any other source that has all the data I need, and yes, for now the interval is about 12 hours, but I might shorten it if a customer needs that


passos
MODERATOROP

10 months ago

getting it from EA directly needs auth and overall it's a mess to deal with, but maybe in the future


passos
MODERATOROP

10 months ago

and just to be clear: yes, I'm keeping track of every request I make to the server so I don't overwhelm it


brody
EMPLOYEE

10 months ago

so final answer, run it on railway


brody
EMPLOYEE

10 months ago

!s


Status changed to Solved brody 10 months ago

