Error: "Failed to clone: authentication required"

ss-witBIZCLASS

8 months ago

Seems to be similar to https://help.railway.app/questions/error-failed-to-clone-authentication-d01aebc9

I'm getting this error inconsistently, but at least daily. Redeploys fix it, but it's still annoying to get the noisy notification email and have to re-kick the build to get it to work.

Closed

15 Replies

8 months ago

Hello,

Can you link to a specific deployment that shows that error?


Status changed to Awaiting User Response railway[bot] 8 months ago



Status changed to Awaiting Railway Response railway[bot] 8 months ago


8 months ago

Thank you, I've raised this to the applicable people.


Status changed to Awaiting User Response railway[bot] 8 months ago


ss-witBIZCLASS

8 months ago

Occurred again today
@ https://railway.app/project/21b7eb49-e84c-40c9-9278-73e17ff7b318/service/7b0f1a97-a6b3-44bc-aaba-471e7cccdb7d?id=ed5a2243-2f3a-410e-9410-9e2e4dcbe902

Notably, it seems to happen when many PRs get merged at once, kicking off lots of builds. We have lots of services that get kicked off simultaneously -- maybe an API rate limit somewhere is being overwhelmed?


Status changed to Awaiting Railway Response railway[bot] 8 months ago


8 months ago

That's very good insight. How many builds do you think are triggered when you see these errors?


Status changed to Awaiting User Response railway[bot] 8 months ago


ss-witBIZCLASS

8 months ago

Looks like 26 at the current count (spread across a few projects, but all listening to the same GitHub branch push!)


Status changed to Awaiting Railway Response railway[bot] 8 months ago


8 months ago

Oh wow, that's a lot. It definitely sounds plausible that it could be a rate limit from GitHub.


Status changed to Awaiting User Response railway[bot] 8 months ago


ss-witBIZCLASS

8 months ago

Yup, hence wanting to post the additional info

Fwiw, while we could definitely do the work to reduce the number of services, ideally this would Just Work so I don't have to worry about it.

Perhaps services that are hooked up to the same GitHub branch could dedupe their API calls? Or Railway-side rate limiting could delay requests that would otherwise overload the GitHub API?
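To illustrate the kind of dedupe-plus-throttle I mean (purely a sketch on my side, not anything Railway actually exposes -- the repo name, concurrency cap, and sleep are all made up):

```python
import asyncio

# Hypothetical sketch only: none of this is a real Railway or GitHub API.
# Idea: when many services watch the same repo/branch, share one clone per
# push and cap how many clones hit GitHub at the same time.

async def clone_repo(repo: str, branch: str, limiter: asyncio.Semaphore) -> str:
    """Stand-in for the real `git clone`; returns a path to the checkout."""
    async with limiter:                      # throttle concurrent GitHub calls
        await asyncio.sleep(1)               # pretend network work
        return f"/tmp/{repo.replace('/', '_')}@{branch}"

async def get_checkout(repo: str, branch: str, limiter: asyncio.Semaphore,
                       inflight: dict) -> str:
    """Dedupe: every service watching the same repo/branch awaits one shared task."""
    key = (repo, branch)
    if key not in inflight:
        inflight[key] = asyncio.create_task(clone_repo(repo, branch, limiter))
    return await inflight[key]

async def main():
    limiter = asyncio.Semaphore(5)           # assumed safe ceiling, not a documented limit
    inflight: dict = {}
    # 26 builds across a few projects, all triggered by the same branch push
    builds = [get_checkout("acme/monorepo", "main", limiter, inflight) for _ in range(26)]
    paths = await asyncio.gather(*builds)
    print(f"{len(paths)} builds served by {len(inflight)} actual clone(s)")

asyncio.run(main())
```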

🫡


Status changed to Awaiting Railway Response railway[bot] 8 months ago


8 months ago

Our engineers aren't aware of any Railway-related rate limits here, but we will keep you updated as we investigate.


Status changed to Awaiting User Response railway[bot] 8 months ago


Status changed to In Progress brody 8 months ago


8 months ago

Update: this would be far from a simple fix on our side, so we would like to ask you to stagger the deploys.


ss-witBIZCLASS

8 months ago

Any way I can get some more info? Is it a particular rate limit I'm hitting? Is there an estimate for that rate limit so we can work around it?

My concern is that if the issue is related to too many service-builds being kicked off, we may end up with more and more services. Do we need to limit the number of services we have as well?


8 months ago

It's not a limit from our side; it would be from GitHub. Unfortunately we don't have any numbers for this, as the limit isn't tied to your specific account: it may be hit if a lot of other users are deploying on the platform at the same time.


ss-witBIZCLASS

8 months ago

It sounds like what you're saying is that if too many deploys are triggered at once on Railway globally, then the deploy system has an outage??

This is really confusing. Are you sure there aren't workarounds? I don't want to "back-seat engineer", but automatically queueing/throttling requests so they don't overwhelm the GitHub limit, or retrying failures, seems more appropriate, seeing as this is a core feature of the platform, the outage happens on a near-daily basis, and it affects everyone using the platform(?!)

Would really appreciate any more info; the fact that we can deploy easily and often is a core feature and a huge value add of Railway for us. It's ok if our deploy is delayed an extra few minutes because it gets temporarily throttled or needs to be auto-retried, but that's very different from having to redeploy manually, which is effectively an outage of the deploy system and, in our eyes, is happening daily.
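For concreteness, this is roughly the retry-with-backoff behaviour I'm describing, sketched in Python. It's only an illustration of the idea, not something we can configure on Railway, and the repo URL and retry counts are placeholders:

```python
import random
import subprocess
import time

# Illustrative only: the kind of retry-with-backoff I'd expect around the clone
# step, assuming the failure really is a transient GitHub rate limit.

def clone_with_retry(repo_url: str, dest: str, attempts: int = 5) -> None:
    for attempt in range(1, attempts + 1):
        result = subprocess.run(
            ["git", "clone", "--depth", "1", repo_url, dest],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return
        if attempt == attempts:
            raise RuntimeError(
                f"clone failed after {attempts} attempts: {result.stderr.strip()}"
            )
        # Exponential backoff with jitter: roughly 2s, 4s, 8s, ... plus noise.
        delay = 2 ** attempt + random.random()
        print(f"clone attempt {attempt} failed, retrying in {delay:.1f}s")
        time.sleep(delay)

if __name__ == "__main__":
    # Placeholder URL; a private repo would also need credentials injected here.
    clone_with_retry("https://github.com/acme/monorepo.git", "/tmp/monorepo")
```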


8 months ago

I'll continue to talk with the engineers, but this may be a bigger change than we have the cycles for at the moment.


8 months ago

Hey there Sina, unfortunately we don't have the cycles right now to build a fix, since this is the only report we have seen.

I have ticketed it and will let you know when we pick the ticket up.


Status changed to Closed brody 8 months ago