Re: Service Instability and Lack of Support Response

maicou-andrade

PROOP

a month ago

Hello Railway Team,

I’m writing because the current situation has become unacceptable.

Multiple projects on your platform are experiencing severe instability, performance degradation, and operational issues. These are production environments and the impact is not theoretical — real work is blocked and teams are unable to move forward.

What makes this significantly worse is not the outage itself, but the complete lack of accessible support and communication channels. There is no effective way to contact someone, receive updates, get an ETA, or even get acknowledgment that the issue exists.

Infrastructure problems happen. Every platform has incidents.

What is difficult to accept is the absence of accountability and communication during those incidents.

At this point, my confidence in Railway as a production platform has been seriously affected. Once services stabilize, I will begin evaluating and migrating projects one by one to another provider — not because incidents happened, but because of how they were handled.

I expect:

Clear communication about the current incident

Estimated recovery expectations

Better support accessibility

Transparency around reliability and incident management

I would appreciate an actual response instead of silence.

$20 Bounty

8 Replies

Railway

BOT

a month ago

We experienced a [major service disruption](https://status.railway.com/incident/I23M92U0) starting at 02:25 UTC on May 20 that affected the entire platform, which was resolved by 07:57 UTC. A separate build performance issue is currently being [monitored](https://status.railway.com/incident/KVZ1Z8GY) after a fix was deployed. All of your services are currently showing successful deployments. We hear your feedback on communication during incidents, and a [post-mortem](https://blog.railway.com/p/incident-report-may-19-2026-gcp-account-outage) has been published with full details on what happened and what we are doing to prevent recurrence.

Status changed to Awaiting User Response Railway • about 1 month ago

maicou-andrade

PROOP

a month ago

No, I’m experiencing a connection issue with the MySQL database. Some services are not functioning properly.

The application is unable to communicate with MySQL, which is causing operational issues.

I’m currently experiencing problems on the following projects:

bfcc0e9a-3f0b-449c-a2a0-a7d67e65b1e0

a72f7907-c3d8-45ea-8814-da1f543005f6

Status changed to Awaiting Railway Response Railway • about 1 month ago

Status changed to Open Railway • about 1 month ago

maicou-andrade

PROOP

a month ago

MySQL shows SUCCESS, but in practice it is inaccessible, even from within Railway itself. I’m going to force a restart of it.

maicou-andrade

PROOP

a month ago

????????????????????????????????????????????????????????????????????????????????????????????????????????????????

maicou-andrade

PROOP

a month ago

?????????????????????????????????????????????????????????????????????????????????????????????????????w

maicou-andrade

PROOP

a month ago

Railway — Production Outage Report (URGENT)

Project: apiforge — a72f7907-c3d8-45ea-8814-da1f543005f6

Account: maicouandrade@msconsultoria.net.br (Pro plan)

Date: May 20, 2026

Impact: Production API down. My customer cannot access data right now.

I need someone from your team to look at this today. Multiple Railway-side problems are blocking my production system. None of these are caused by my code — they are platform issues.

1. MySQL data loss — `apiforge.queries` table is gone

Service: MySQL 5cd57c58-bb57-424c-910b-0a73e16682e4

The queries table existed and was serving 5,000+ rows at 13:30 BRT today. After a MySQL hiccup (see #2), every query now returns:

Table 'apiforge.queries' doesn't exist

I did not drop the table. No migrator ran today. Please check the persistent volume for this MySQL instance — did it detach, reset, or get replaced? If the data is gone, I need to know now so I can rebuild from schema.

2. MySQL was unreachable on internal network

At ~14:08 BRT, queries from the painel service (9845a292-f5b0-417e-a4b4-532f24796f3f) to mysql.railway.internal:3306 returned connect ETIMEDOUT after 11.5 seconds.

The MySQL deployment was reporting SUCCESS the whole time. Deployment status does not reflect actual database health. This is misleading.

A manual serviceInstanceRedeploy on the MySQL service recovered it after ~90 seconds.

3. TCP proxy dropping external MySQL connections

External clients connecting to shuttle.proxy.rlwy.net:55153 complete the TCP handshake but get the connection closed immediately during the MySQL protocol handshake. Driver error (mysql2):

Connection lost: The server closed the connection.

This caused my external Node.js service (running on a Windows VM behind a Cloudflare Tunnel) to crash repeatedly on startup until the wrapper gave up after 3 retries. The service has been down since.
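A startup crash loop like the one described above (a wrapper giving up after 3 retries) can often be softened with a capped exponential backoff that only retries transient connection errors. The following is a minimal TypeScript sketch under stated assumptions, not the wrapper actually used here: `backoffDelays`, `isTransient`, and `withRetry` are hypothetical names, and the list of transient codes is an assumption (`ETIMEDOUT` appears in this report; `PROTOCOL_CONNECTION_LOST` is believed to be the code mysql2 attaches to the "Connection lost" error).

```typescript
// Assumed set of error codes worth retrying. Adjust for the driver in use.
const TRANSIENT_CODES: string[] = [
  "ETIMEDOUT",                // connect timed out (seen in this report)
  "ECONNRESET",               // peer reset the connection
  "ECONNREFUSED",             // nothing listening yet, e.g. during a redeploy
  "PROTOCOL_CONNECTION_LOST", // assumed mysql2 code for "Connection lost"
];

// True when the error carries a string `code` that is in the transient list.
function isTransient(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && TRANSIENT_CODES.indexOf(code) !== -1;
}

// Delay before each retry: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
function backoffDelays(retries: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < retries; i++) {
    delays.push(Math.min(capMs, baseMs * Math.pow(2, i)));
  }
  return delays;
}

// Calls `connect` until it succeeds, a non-transient error occurs,
// or every delay in `delaysMs` has been used up.
async function withRetry<T>(connect: () => Promise<T>, delaysMs: number[]): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await connect();
    } catch (err) {
      if (!isTransient(err) || attempt >= delaysMs.length) throw err;
      const waitMs = delaysMs[attempt];
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

Here `connect` stands for whatever function opens the database connection. A schedule such as `backoffDelays(5, 500, 5000)` keeps the process alive through a redeploy of roughly 90 seconds only if the total wait is long enough, so the retry count and cap would need tuning.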

Is there a rate limit, IP throttle, or active incident on the TCP proxy in the gru region today?
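One way to narrow down where the connection is being closed is to look at the first bytes received after the TCP handshake. In the MySQL client/server protocol the server speaks first: each packet has a 3-byte little-endian payload length and a 1-byte sequence number, a protocol version 10 greeting starts with `0x0a` followed by a NUL-terminated server version, and an error packet starts with `0xff` followed by a 2-byte error code. The sketch below is a hypothetical helper (`describeFirstPacket` is not part of mysql2 or Railway) that classifies those first bytes. If nothing arrives before the close, the greeting never made it through the proxy.

```typescript
type FirstPacket =
  | { kind: "incomplete" }
  | { kind: "handshake"; protocolVersion: number; serverVersion: string }
  | { kind: "error"; code: number; message: string }
  | { kind: "unknown"; firstByte: number };

// Decode bytes as ASCII; enough for version strings and error text.
function ascii(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) out += String.fromCharCode(bytes[i]);
  return out;
}

// Classify the first MySQL packet seen on a freshly opened connection.
function describeFirstPacket(bytes: Uint8Array): FirstPacket {
  if (bytes.length < 5) return { kind: "incomplete" };
  // Header: 3-byte little-endian payload length, then 1-byte sequence id.
  const payloadLength = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16);
  if (payloadLength === 0 || bytes.length < 4 + payloadLength) {
    return { kind: "incomplete" };
  }
  const payload = bytes.subarray(4, 4 + payloadLength);
  const first = payload[0];

  if (first === 0xff) {
    // ERR packet sent before any handshake: 2-byte code, then the message.
    if (payload.length < 3) return { kind: "incomplete" };
    const code = payload[1] | (payload[2] << 8);
    return { kind: "error", code, message: ascii(payload.subarray(3)) };
  }

  if (first === 0x0a) {
    // Protocol version 10 greeting: NUL-terminated server version follows.
    let end = 1;
    while (end < payload.length && payload[end] !== 0) end++;
    if (end === payload.length) return { kind: "unknown", firstByte: first };
    return {
      kind: "handshake",
      protocolVersion: 10,
      serverVersion: ascii(payload.subarray(1, end)),
    };
  }

  return { kind: "unknown", firstByte: first };
}
```

A result of `incomplete` on a closed socket would suggest the close happened before or during the greeting, while an `error` result (for example code 1040, "Too many connections") would point at the database server rather than the proxy.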

4. Build queue stalled

Deployments stuck in QUEUED for 10+ minutes (normal for this Next.js project: 2-3 minutes total). Example: deployment created 14:45:20 UTC was still QUEUED 10+ minutes later. Multiple deploys today affected.

What I need

1. Check the volume of MySQL service 5cd57c58-bb57-424c-910b-0a73e16682e4: is the data intact? Was it replaced or detached today?
2. Status of TCP proxy shuttle.proxy.rlwy.net:55153: incident, throttle, or normal?
3. Status of the build queue in gru today: is it degraded?
4. Confirm whether there is an active incident affecting MySQL or proxies in South America today. If yes, please share the status page.

I will not make further changes to my code or infrastructure until I get clarity on these four points. I need to know what is on my side versus yours.

0x5b62656e5d

MODERATOR

a month ago

Have you tried redeploying your MySQL service? (Not restart)

dimi4ik44

HOBBY

a month ago

This sounds even worse because of the deploy issues. In my opinion, that is not a perfect idea.
