17 days ago
I'm not seeing any discussions related to this in the Discord..
Around 1-2 hours ago, I was notified of 2 things:
> Hostname/IP does not match certificate's altnames: Host: unproxied-custom-domain-here. is not in the cert's altnames: DNS:*.up.railway.app
HTTP 520 responses for Cloudflare proxied custom domains
Looking further, there was also a spike in latency
This affected all of my services, across multiple projects, and all of the deployments were healthy w/o redeploys
The status checker I'm using here is hosted in AMS
Incident started around 01:15 UTC and ended around 01:25 UTC
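For context on the first error: the client rejects the connection because the custom domain is not covered by the wildcard name on the certificate it was served. A minimal sketch of that check follows. It is a simplified version of standard wildcard matching, not the code of any particular TLS library; `san_matches` and `cert_covers` are hypothetical helper names, and the hostnames are placeholders.

```python
def san_matches(hostname: str, pattern: str) -> bool:
    """Simplified wildcard match: '*' may only stand in for the left-most label."""
    host_labels = hostname.rstrip(".").lower().split(".")
    pattern_labels = pattern.rstrip(".").lower().split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard covers exactly one label, so the remaining labels must be equal.
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def cert_covers(hostname: str, dns_altnames: list[str]) -> bool:
    """True if any DNS altname on the certificate matches the hostname."""
    return any(san_matches(hostname, name) for name in dns_altnames)


# Placeholder hostnames; only the altname comes from the error message above.
print(cert_covers("my-app.up.railway.app", ["*.up.railway.app"]))
print(cert_covers("status.example.com", ["*.up.railway.app"]))
```

A custom domain served the `*.up.railway.app` certificate fails this check, which is consistent with the altnames error quoted above.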
Attachments
40 Replies
17 days ago
What do your http logs say?
17 days ago
Were all green, let me check again
17 days ago
Scrolling through the HTTP logs on a service that got the certificate issue, it's all 200s
17 days ago
Link
17 days ago
noticeable gap (utc+2 tzs)
Attachments
17 days ago
linked, should be this one
17 days ago
To clarify, this is no longer an issue
17 days ago
Attachments
17 days ago
I'm not seeing anything on our end here.
17 days ago
in case missed
17 days ago
Yes, I've looked at our monitoring for the time you gave.
17 days ago
another example, cloudflare proxied
Attachments
17 days ago
Not seeing anything on our end for that either.
17 days ago
I mean, could it be an observability issue? I don't think Cloudflare lies about 520s
17 days ago
I am honestly not sure.
17 days ago
Not a new issue either I believe
Attachments
17 days ago
I'm also not seeing other users report this 🙁
17 days ago
"maybe I'm just not like other users"
17 days ago
think about the times where I called out an outage first (at least in the discord)!!
17 days ago
Haha I'm not sure what you want me to say to that.
17 days ago
didn't expect you to
17 days ago
I'm fine with leaving it at this, no SLA or critical services on my end
17 days ago
may be worth looking into (deeply) though..
17 days ago
I've looked, I don't see anything from any of our monitors.
16 days ago
what about now!
16 days ago
I'm also observing this (second time today a couple minutes ago) - first time was 8:20 UTC
Attachments
16 days ago
^
I've been seeing similar behaviour just now.
UTC timeline of the recent flapping I observed (triggered by my monitoring system, which also monitors services outside of Railway; only Railway was impacted):
08:21:35 — DOWN
08:22:56 — UP
08:24:58 — DOWN
08:26:18 — UP
09:40:46 — DOWN
09:44:44 — UP
09:58:27 — DOWN
09:59:47 — UP
10:01:53 — DOWN
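To put numbers on the flapping, the timeline above can be turned into outage durations. This is a rough sketch only: `EVENTS` and `outage_durations` are names made up for this example, the timestamps are copied from the list above, and all events are assumed to fall on the same day.

```python
from datetime import datetime

# (time, state) pairs copied from the timeline above, all in UTC.
EVENTS = [
    ("08:21:35", "DOWN"), ("08:22:56", "UP"),
    ("08:24:58", "DOWN"), ("08:26:18", "UP"),
    ("09:40:46", "DOWN"), ("09:44:44", "UP"),
    ("09:58:27", "DOWN"), ("09:59:47", "UP"),
    ("10:01:53", "DOWN"),
]


def outage_durations(events):
    """Pair each DOWN with the next UP and return the outage lengths in seconds."""
    durations = []
    down_since = None
    for stamp, state in events:
        moment = datetime.strptime(stamp, "%H:%M:%S")
        if state == "DOWN" and down_since is None:
            down_since = moment
        elif state == "UP" and down_since is not None:
            durations.append(int((moment - down_since).total_seconds()))
            down_since = None
    # A trailing DOWN with no matching UP is left out: its length is unknown.
    return durations


print(outage_durations(EVENTS))  # [81, 80, 238, 80]
```

The four closed outages add up to 479 seconds, roughly 8 minutes, with most blips lasting about 80 seconds.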
During this window I did some diagnostics:
DNS resolution was normal and stable (single IPv4 A-record; no IPv6).
TCP connect to port 443 succeeded immediately.
The failure happened during TLS negotiation: curl sent the TLS ClientHello, then timed out waiting for the server’s ServerHello (connection established, but the TLS handshake response didn’t come back).
Shortly after, the alert cleared by itself. I re-ran the same diagnostics and TLS completed normally (TLS 1.3), ALPN negotiated HTTP/2, and the health request returned 200.
Note I am NOT behind CF proxy.
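The staged diagnostics described above (DNS, then TCP, then TLS) can be scripted so that the failing stage is captured while an alert is still firing. The sketch below is a Python approximation using only the standard library; it is not the curl invocation used in this post, and the function name `probe`, the result keys, and the 5 second default timeout are assumptions made for illustration.

```python
import socket
import ssl
import time


def probe(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Run DNS, TCP and TLS in order and report the first stage that fails."""
    result = {"failed_stage": None, "error": None}
    stage = "dns"
    try:
        started = time.monotonic()
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["addresses"] = sorted({info[4][0] for info in infos})
        result["dns_s"] = time.monotonic() - started

        stage = "tcp"
        started = time.monotonic()
        sock = socket.create_connection((host, port), timeout=timeout)
        result["tcp_s"] = time.monotonic() - started

        with sock:
            stage = "tls"
            started = time.monotonic()
            context = ssl.create_default_context()
            context.set_alpn_protocols(["h2", "http/1.1"])
            # The handshake runs inside wrap_socket and inherits the socket timeout,
            # so a missing ServerHello surfaces here as a timeout.
            with context.wrap_socket(sock, server_hostname=host) as tls:
                result["tls_s"] = time.monotonic() - started
                result["tls_version"] = tls.version()
                result["alpn"] = tls.selected_alpn_protocol()
    except OSError as exc:
        # gaierror, timeouts and ssl.SSLError are all OSError subclasses.
        result["failed_stage"] = stage
        result["error"] = repr(exc)
    return result
```

Calling `probe("your-domain.example")` (placeholder hostname) from the monitoring host during an alert returns a dictionary with per-stage timings. For the symptom described above, the expected shape would be a populated `tcp_s` with `failed_stage` set to `"tls"`; a certificate mismatch would also be reported at the `"tls"` stage, with the verification error in `error`.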
16 days ago
"Same" and "this" meaning you all have seen a cert for a service domain instead of a custom domain?
16 days ago
Hello!
We've escalated your issue to our engineering team.
We aim to provide an update within 1 business day.
Please reply to this thread if you have any questions!
Status changed to Awaiting User Response Railway • 16 days ago
16 days ago
Happy? 😆
Attachments
16 days ago
I win
16 days ago
:proud:
15 days ago
Hello,
We have provisioned more capacity for our edge network in the EU-West region. You should no longer see these blips of incorrect certificates.
15 days ago
Thank you.. how can this be caught automatically next time?
Status changed to Awaiting Railway Response Railway • 15 days ago
15 days ago
We are going to alert on the graph I shared above, and long term, we are well underway on a complete rewrite of the network control plane that will outright solve for this, and solve for many other potential issues.
Status changed to Awaiting User Response Railway • 15 days ago
Status changed to Solved brody • 14 days ago

