a year ago
I'm planning on scraping some data from a few sites, and Railway has always been vague on this matter. Is it allowed? I really hope Railway allows it, I don't want to spin up an EC2.
12 Replies
a year ago
N/A
a year ago
what sites are you scraping and did they give you permission?
a year ago
football sites and no but their data is public
a year ago
data on Twitter is public too, but scraping that would be a gigantic no. football sites are fine, go for it, as long as you keep your request rates respectable and abide by robots.txt if applicable.
a year ago
ok! thanks
a year ago
just to be sure, if their robots.txt doesn't allow it then Railway is against it?
User-agent: Mediapartners-Google
Disallow:
User-agent: Googlebot
Disallow:
User-agent: AdsBot-Google
Disallow:
User-agent: Googlebot-Image
Disallow:
User-agent: *
Disallow: /*?*
a year ago
cc @Brody (sorry for ping, idk if the thread pops again when closed)
a year ago
you gotta respect the robots.txt like any good robot would, we don't want to have to deal with takedown requests, though we will comply.
a year ago
and unfortunately "it's unlikely for you to be sent a takedown request" is not an excuse
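For anyone reading the robots.txt quoted earlier in the thread: the empty `Disallow:` lines put no restriction on the named Google bots, while `Disallow: /*?*` under `User-agent: *` blocks every other crawler from any URL that has a query string. Below is a minimal Python sketch of that matching, assuming the `*` and `$` wildcard rules from RFC 9309. The helper names `rule_matches` and `is_disallowed` are made up for illustration, and this is not a full robots.txt parser (it ignores `Allow` lines and longest-match precedence).

```python
import re


def rule_matches(pattern: str, path_and_query: str) -> bool:
    """Return True if a robots.txt path pattern matches the given path.

    Assumes RFC 9309 semantics: '*' matches any run of characters,
    a trailing '$' anchors the end, and matching starts at the
    beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' in place of each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path_and_query) is not None


def is_disallowed(disallow_patterns: list[str], path_and_query: str) -> bool:
    """Check a path against Disallow patterns; empty patterns block nothing."""
    return any(
        rule_matches(p, path_and_query) for p in disallow_patterns if p
    )


# Rules for "User-agent: *" from the robots.txt quoted in this thread.
generic_rules = ["/*?*"]
# Rules for Googlebot and friends: an empty Disallow means no restriction.
google_rules = [""]
```

With these rules, a path such as `/fixtures?page=2` is off limits for a generic scraper, while a plain `/fixtures` is not.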
a year ago
oh and you should also have an email in your UA so that web admins can email you to get put on a no crawl list
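A rough sketch of what that could look like in Python using only the standard library. The bot name, the contact address, and the two-second delay are placeholders picked for illustration, not values Railway prescribes.

```python
import time
import urllib.request

# Hypothetical values: swap in your own bot name and a mailbox you actually read.
CONTACT_EMAIL = "you@example.com"
USER_AGENT = f"football-stats-bot/0.1 (+mailto:{CONTACT_EMAIL})"
# Assumed pause between requests; tune it to what the target site can tolerate.
MIN_DELAY_SECONDS = 2.0


def build_request(url: str) -> urllib.request.Request:
    """Build a request whose User-Agent carries a contact email."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_all(urls: list[str], delay: float = MIN_DELAY_SECONDS) -> list[bytes]:
    """Fetch URLs one at a time, sleeping between requests to keep the rate low."""
    pages = []
    for url in urls:
        with urllib.request.urlopen(build_request(url), timeout=10) as resp:
            pages.append(resp.read())
        time.sleep(delay)
    return pages
```

Calling `fetch_all(["https://example.com/fixtures"])` would send one request with that User-Agent and then wait before the next one, so an admin who spots the traffic in their logs has an address to write to.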
a year ago
yeah I'm a bad robot so I'm guessing I should spin this up elsewhere. Could I at least host the database or the API on Railway?
a year ago
yeah I don't see any issue with that