Is scraping allowed?

I'm planning on scraping some data from a few sites, and Railway has always been vague on this matter. Is it allowed? I really hope Railway allows it; I don't want to spin up an EC2 instance 😭

12 Replies



a year ago

What sites are you scraping, and did they give you permission?


Football sites, and no, but their data is public.


a year ago

Data on Twitter is public, but scraping that would be a gigantic no. Football sites are fine, go for it, as long as you keep a respectable request rate and abide by robots.txt where applicable.


ok! thanks


Just to be sure: if their robots.txt doesn't allow it, then Railway is against it?

User-agent: Mediapartners-Google
Disallow:

User-agent: Googlebot
Disallow:

User-agent: AdsBot-Google
Disallow:

User-agent: Googlebot-Image
Disallow:

User-agent: *
Disallow: /*?*

cc @Brody (sorry for the ping, I don't know if the thread pops up again once it's closed)


a year ago

You have to respect robots.txt like any good robot would. We don't want to have to deal with takedown requests, though we will comply with them.
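
For anyone reading later, here is a minimal sketch of how a single rule such as the `Disallow: /*?*` line quoted above could be checked before fetching a URL. It assumes the common wildcard convention where `*` matches any run of characters and a trailing `$` anchors the end of the URL. `rule_matches` and the example.com URLs are made-up names for illustration, not anything Railway or the site provides.

```python
import re
from urllib.parse import urlsplit


def rule_matches(pattern: str, url: str) -> bool:
    """Return True if one robots.txt path pattern matches the URL's path and query."""
    # An empty Disallow value blocks nothing, so it never matches.
    if not pattern:
        return False

    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    # A trailing '$' anchors the match at the end of the URL.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern

    # '*' matches any run of characters; everything else is literal.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        regex += "$"

    # re.match anchors at the start, so rules behave as prefix patterns.
    return re.match(regex, target) is not None


# Hypothetical URLs checked against the "User-agent: *" rule from the thread.
blocked = rule_matches("/*?*", "https://example.com/matches?team=1")
allowed = not rule_matches("/*?*", "https://example.com/matches")
```

Under that rule, any URL with a query string is off limits to generic crawlers, while plain paths are not. This only tests one rule in isolation, so a real crawler would still need to parse the whole file and pick the group that applies to its user agent.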


a year ago

And unfortunately, "it's unlikely for you to be sent a takedown request" is not an excuse.


a year ago

Oh, and you should also include an email address in your user agent (UA) so that web admins can email you to have their site put on your no-crawl list.
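
A rough sketch of the two habits mentioned in this thread, a contact email in the user agent and a modest request rate, using only Python's standard library. The bot name, email address, and two-second delay are placeholders and assumptions, not values recommended by Railway.

```python
import time
import urllib.request

# Hypothetical bot name and contact address; replace with your own.
USER_AGENT = "football-stats-bot/0.1 (contact: you@example.com)"


def build_request(url: str) -> urllib.request.Request:
    """Build a request that identifies the scraper and how to reach its owner."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_all(urls, delay_seconds=2.0):
    """Fetch URLs one at a time, pausing after each request."""
    pages = []
    for url in urls:
        with urllib.request.urlopen(build_request(url), timeout=10) as response:
            pages.append(response.read())
        # Assumed delay; tune it to what the target site can comfortably handle.
        time.sleep(delay_seconds)
    return pages
```

Fetching sequentially with a fixed pause is the simplest way to keep the rate predictable. If the site's robots.txt declares a crawl delay, that value would be a better choice than the default used here.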


Yeah, I'm a bad robot, so I'm guessing I should spin this up elsewhere. Could I at least host the database or the API on Railway?


a year ago

Yeah, I don't see any issue with that.

