is scraping allowed?

2 years ago

what sites are you scraping and did they give you permission?

2 years ago

football sites and no but their data is public

2 years ago

data on twitter is public too, but scraping that would be a gigantic no. football sites are fine, go for it, as long as you keep your request rates respectable and abide by robots.txt if applicable.
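
for anyone wondering what "respectable request rates" could look like in practice, here is a minimal sketch of a throttle that enforces a minimum gap between requests. The `Throttle` class and the 2-second interval in the usage note are assumptions for illustration, not anything Railway prescribes.

```python
import time


class Throttle:
    """Enforce a minimum gap (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None:
            delay = max(0.0, self._next_allowed - now)
            if delay > 0:
                self._sleep(delay)
        # The request starts after the delay, so the next slot is measured from there.
        self._next_allowed = now + delay + self.min_interval
        return delay
```

you would create one `Throttle(2.0)` per target site and call `wait()` right before every fetch. If the site's robots.txt gives a `Crawl-delay`, using that value as the interval seems like the safer choice.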

2 years ago

ok! thanks

2 years ago

just to be sure, if their robots.txt doesn't allow it then Railway is against it?

User-agent: Mediapartners-Google
Disallow:

User-agent: Googlebot
Disallow:

User-agent: AdsBot-Google
Disallow:

User-agent: Googlebot-Image
Disallow:

User-agent: *
Disallow: /*?*
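
my reading of that file, as a rough sketch: the empty `Disallow:` lines leave the four Google crawlers unrestricted, while the `User-agent: *` group blocks any URL that contains a query string. The code below hand-rolls the `*` and `$` wildcard matching described in RFC 9309. The function names and example paths are made up, and it only handles `Disallow` rules (no `Allow` precedence), so treat it as an illustration and not a full parser.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow rule matches the path (query string included)."""
    for rule in disallow_rules:
        if not rule:
            continue  # an empty "Disallow:" blocks nothing
        if pattern_to_regex(rule).match(path_and_query):
            return True
    return False


# Rules copied from the file above.
GENERIC_RULES = ["/*?*"]   # User-agent: *
GOOGLEBOT_RULES = [""]     # User-agent: Googlebot

# Hypothetical paths, for illustration only.
print(is_disallowed("/fixtures?team=42", GENERIC_RULES))
print(is_disallowed("/fixtures/2024", GENERIC_RULES))
```

so under this reading a generic scraper could still fetch plain paths, but not anything with `?` in the URL.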

2 years ago

cc @Brody (sorry for ping, idk if the thread pops again when closed)

2 years ago

you gotta respect the robots.txt like any good robot would. we don't want to have to deal with takedown requests, though we will comply with them.

2 years ago

and unfortunately "it's unlikely for you to be sent a takedown request" is not an excuse

2 years ago

oh and you should also put a contact email in your User-Agent (UA) so that web admins can email you to get put on a no-crawl list
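
a minimal sketch of what that could look like with Python's standard `urllib.request`. The bot name, version, email address, URL, and the exact User-Agent layout are all placeholders I picked for illustration; there is no single required format.

```python
from urllib.request import Request


def build_request(url: str, bot_name: str, contact_email: str) -> Request:
    """Build a request whose User-Agent names the bot and a contact address."""
    # Hypothetical layout: "<bot>/<version> (+mailto:<email>)".
    user_agent = f"{bot_name}/0.1 (+mailto:{contact_email})"
    return Request(url, headers={"User-Agent": user_agent})


# Placeholder values; nothing is sent over the network here.
req = build_request("https://example.com/fixtures", "football-stats-bot", "you@example.com")
print(req.get_header("User-agent"))
```

passing `req` to `urllib.request.urlopen` would then send that header with the request. Note that `Request` stores header names capitalized as `User-agent`, which is why the lookup is spelled that way.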

2 years ago

yeah I'm a bad robot so I'm guessing I should spin this up elsewhere. Could I at least host the database or the API on Railway?