Incident 17 Feb ~01:15 UTC

I'm not seeing any discussions related to this in the Discord..

Around 1-2 hours ago, I was notified of 2 things:

  • > Hostname/IP does not match certificate's altnames: Host: unproxied-custom-domain-here. is not in the cert's altnames: DNS:*.up.railway.app

  • HTTP 520 responses for Cloudflare proxied custom domains
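The first error above means the certificate presented only listed `*.up.railway.app` in its altnames, so the custom domain was not covered. A minimal sketch of that check follows, assuming a simplified version of the wildcard rule TLS clients commonly apply (a `*` in the leftmost label covers exactly one label). The function names and the `custom.example.com` hostname are hypothetical, used only for illustration.

```python
from typing import Iterable


def _normalize(name: str) -> str:
    # Drop a trailing root dot and compare case-insensitively.
    return name.rstrip(".").lower()


def san_matches(hostname: str, pattern: str) -> bool:
    """Return True if one DNS altname pattern covers the hostname."""
    host_labels = _normalize(hostname).split(".")
    pattern_labels = _normalize(pattern).split(".")
    # A wildcard stands in for exactly one label, so label counts must agree.
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        return host_labels[0] != "" and host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def hostname_covered(hostname: str, dns_altnames: Iterable[str]) -> bool:
    """Return True if any altname in the certificate covers the hostname."""
    return any(san_matches(hostname, pattern) for pattern in dns_altnames)


if __name__ == "__main__":
    altnames = ["*.up.railway.app"]  # altnames quoted in the error message
    for host in ("my-app.up.railway.app", "custom.example.com"):  # hypothetical hosts
        print(host, hostname_covered(host, altnames))
```

A service domain under `up.railway.app` passes this check while a custom domain does not, which is consistent with the error appearing only on custom domains.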

Looking further, there was also a spike in latency

This affected all of my services, across multiple projects, and all of the deployments were healthy w/o redeploys

The status checker I'm using here is hosted in AMS

Incident started around 01:15 UTC and ended around 01:25 UTC

Screenshot_20260217-045625.png
Screenshot_20260217-045701.png
Screenshot_20260217-045719.png

Solved

40 Replies

17 days ago

What do your http logs say?


Were all green, let me check again


Scrolling through the HTTP logs on a service that got the certificate issue, it's all 200s


17 days ago

Link


noticeable gap (utc+2 tzs)

Screenshot_20260217-050507.png



linked, should be this one


To clarify, this is no longer an issue



17 days ago

I'm not seeing anything on our end here.


in case missed


17 days ago

Yes, I've looked at our monitoring for the time you gave.


another example, cloudflare proxied

Screenshot_20260217-051204.png
Screenshot_20260217-052048.png


17 days ago

Not seeing anything on our end for that either.


I mean, could it be an observability issue? I don't think Cloudflare lies about 520s


17 days ago

I am honestly not sure.


Not a new issue either I believe

Screenshot_20260217-053634.png



17 days ago

I'm also not seeing other users report this 🙁


"maybe I'm just not like other users"


think about the times where I called out an outage first (at least in the discord)!!


17 days ago

Haha I'm not sure what you want me to say to that.


didn't expect you to


I'm fine with leaving it at this, no SLA or critical services on my end


may be worth looking into (deeply) though..


17 days ago

I've looked, I don't see anything from any of our monitors.


what about now!


haayhappen
PRO

16 days ago

I'm also observing this (second time today a couple minutes ago) - first time was 8:20 UTC


mai1015
HOBBY

16 days ago

image.png



mai1015
HOBBY

16 days ago

wow so serious the server down


16 days ago

^


futsy
HOBBY

16 days ago

Also seeing this as well


duxsec
HOBBY

16 days ago

Interested to see what caused this.


futsy
HOBBY

16 days ago

I've been seeing similar behaviour just now.

UTC timeline of the recent flapping I observed (triggered by my monitoring system, which also monitors services outside of Railway; only Railway was impacted):

08:21:35 — DOWN
08:22:56 — UP
08:24:58 — DOWN
08:26:18 — UP
09:40:46 — DOWN
09:44:44 — UP
09:58:27 — DOWN
09:59:47 — UP
10:01:53 — DOWN
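For a sense of scale, the timeline above can be reduced to outage windows. This is a rough sketch that assumes all timestamps fall on the same UTC day and that states strictly alternate; the `outage_windows` helper is hypothetical and not part of any monitoring tool mentioned here.

```python
from datetime import datetime

# Events copied from the timeline above (UTC, same day assumed).
EVENTS = [
    ("08:21:35", "DOWN"), ("08:22:56", "UP"),
    ("08:24:58", "DOWN"), ("08:26:18", "UP"),
    ("09:40:46", "DOWN"), ("09:44:44", "UP"),
    ("09:58:27", "DOWN"), ("09:59:47", "UP"),
    ("10:01:53", "DOWN"),
]


def outage_windows(events):
    """Pair each DOWN with the next UP.

    Returns (windows, open_down): windows is a list of
    (down_time, duration_seconds); open_down is the timestamp of a
    trailing DOWN with no matching UP, or None.
    """
    windows = []
    down_since = None
    for stamp, state in events:
        moment = datetime.strptime(stamp, "%H:%M:%S")
        if state == "DOWN" and down_since is None:
            down_since = (stamp, moment)
        elif state == "UP" and down_since is not None:
            seconds = int((moment - down_since[1]).total_seconds())
            windows.append((down_since[0], seconds))
            down_since = None
    open_down = down_since[0] if down_since else None
    return windows, open_down


if __name__ == "__main__":
    windows, open_down = outage_windows(EVENTS)
    for start, seconds in windows:
        print(f"{start} UTC: down for {seconds}s")
    if open_down:
        print(f"{open_down} UTC: still down at the end of the log")
```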

During this window I did some diagnostics:
DNS resolution was normal and stable (single IPv4 A-record; no IPv6).

TCP connect to port 443 succeeded immediately.

The failure happened during TLS negotiation: curl sent the TLS ClientHello, then timed out waiting for the server’s ServerHello (connection established, but the TLS handshake response didn’t come back).

Shortly after, the alert cleared by itself. I re-ran the same diagnostics and TLS completed normally (TLS 1.3), ALPN negotiated HTTP/2, and the health request returned 200.

Note I am NOT behind CF proxy.
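The staged checks described above (DNS, then TCP, then TLS) can be sketched with the Python standard library. This is only an illustration of the approach, not the exact commands that were run: the `probe` function and `ProbeResult` type are hypothetical names, and the 5 second timeout is an arbitrary assumption.

```python
import socket
import ssl
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    failed_stage: Optional[str]  # "dns", "tcp", "tls", or None if all stages passed
    detail: str
    tls_version: Optional[str] = None
    alpn: Optional[str] = None


def probe(host: str, port: int = 443, timeout: float = 5.0) -> ProbeResult:
    # Stage 1: DNS resolution.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return ProbeResult("dns", str(exc))
    ip = infos[0][4][0]

    # Stage 2: plain TCP connect to the first resolved address.
    try:
        raw = socket.create_connection((ip, port), timeout=timeout)
    except OSError as exc:
        return ProbeResult("tcp", str(exc))

    # Stage 3: TLS handshake, verifying the certificate against `host`.
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    try:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return ProbeResult(None, "ok", tls.version(), tls.selected_alpn_protocol())
    except ssl.SSLCertVerificationError as exc:
        # Covers the "not in the cert's altnames" case from the first post.
        return ProbeResult("tls", f"certificate rejected: {exc.verify_message}")
    except OSError as exc:
        # Includes a handshake timeout (ClientHello sent, no ServerHello received).
        return ProbeResult("tls", f"handshake failed: {exc}")
    finally:
        raw.close()


if __name__ == "__main__":
    print(probe("example.com"))  # replace with the affected domain
```

Reporting the failing stage separately makes it easier to distinguish a handshake that never completes from a completed handshake that returns the wrong certificate.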


pepijn
PRO

16 days ago

Same here


16 days ago

"Same" and "this" meaning you all have seen a cert for a service domain instead of a custom domain?


Railway
BOT

16 days ago

Hello!

We've escalated your issue to our engineering team.

We aim to provide an update within 1 business day.

Please reply to this thread if you have any questions!

Status changed to Awaiting User Response Railway 16 days ago


16 days ago

Happy? 😆

image.png




:proud:


15 days ago

Hello,

We have provisioned more capacity for our edge network in the EU-West region. You should no longer see these blips of incorrect certificates.


haayhappen
PRO

15 days ago

Thank you.. how can this be caught automatically next time?


Status changed to Awaiting Railway Response Railway 15 days ago


15 days ago

We are going to alert on the graph I shared above. Long term, we are well underway on a complete rewrite of the network control plane that will outright solve this, along with many other potential issues.


Status changed to Awaiting User Response Railway 15 days ago


Status changed to Solved brody 14 days ago

