Frozen app
nyllarm
PRO · OP

7 months ago

Hello,

On November 30, 2025, between 13:05 and 13:19 UTC, my Telegram bot stopped responding to user messages. Messages were delivered (double checkmarks in Telegram), but the bot did not process them. I had to manually restart the service at 13:19 UTC.

This has never happened before.

Railway metrics during the incident:

- Memory: spiked to 3.2 GB before the incident

- Response Time: peaked at 20 seconds

- Request Error Rate: peaked at 6.5%

- After restart, Memory dropped to ~400 MB

PostgreSQL logs:

- Database remained stable throughout the incident (checkpoints every 5 min)

- At restart time (13:19:38 UTC), 20 DB connections dropped simultaneously ("Connection reset by peer")

- No deadlocks, lock timeouts, or slow queries in DB logs

Application logs:

- Last activity: 13:05:50 UTC

- After restart: 822 tasks marked as timed-out

- High activity before incident (many concurrent handler calls)

My questions:

1. Was the service killed by Railway due to resource limits (OOM killer, CPU throttling)?

2. Does Railway have an automatic restart mechanism for unresponsive services, and did it trigger in my case?

3. What is the memory limit for my plan, and was it exceeded (3.2 GB)?

Configuration:

- Service: Telegram Bot (Python, python-telegram-bot)

- Database: PostgreSQL (separate Railway service)

- Plan: Pro

I would appreciate your help in determining the root cause of this incident.

Thank you.

$30 Bounty

2 Replies



7 months ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open brody 7 months ago


7 months ago

Since you are on the Pro plan, your service has access to 32 GB RAM / 32 vCPU. You don't need to configure anything; it is there automatically, and the service scales as it needs more CPU or RAM.

To answer your other questions:

Railway does not kill processes. Your bot most likely has a misconfiguration somewhere that caused it to stop working. If your app ever needed more than 32 GB of memory, it could crash for that reason, but 3.2 GB is nowhere near that limit.

Railway will restart crashed or failed processes if the restart policy is enabled (it is by default): https://docs.railway.com/guides/deployments#restart-policy

For a restart to trigger, your app has to actually crash. A process that is still running but no longer responsive does not count.
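Since only a real crash triggers the restart policy, one possible workaround is to make a hung bot crash on purpose. The following is only a rough sketch, not something Railway or python-telegram-bot provides: the `Watchdog` class, the `heartbeat` coroutine, and the timeout values are all hypothetical names and numbers picked for illustration. The idea is that a coroutine on the bot's event loop records a heartbeat every second, and a separate thread exits the process with a non-zero code if the heartbeat goes stale.

```python
import asyncio
import os
import threading
import time


class Watchdog:
    """Run a stall action if beat() has not been called for `timeout` seconds."""

    def __init__(self, timeout, on_stall=None, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._last_beat = clock()
        # Assumed default: hard exit with a non-zero code so the platform
        # sees a crash. os._exit skips cleanup handlers on purpose, because
        # a hung process may never finish a graceful shutdown.
        self._on_stall = on_stall or (lambda: os._exit(1))

    def beat(self):
        """Record that the application is still making progress."""
        self._last_beat = self._clock()

    def check(self):
        """Run the stall action if the last heartbeat is too old."""
        if self._clock() - self._last_beat > self.timeout:
            self._on_stall()
            return True
        return False

    def start(self, interval=5.0):
        """Check for stalls from a daemon thread, independent of the event loop."""
        def loop():
            while True:
                time.sleep(interval)
                self.check()

        threading.Thread(target=loop, daemon=True).start()


async def heartbeat(watchdog, interval=1.0):
    """Beat from inside the event loop; stops beating if the loop is blocked."""
    while True:
        watchdog.beat()
        await asyncio.sleep(interval)
```

To use it, you would create something like `wd = Watchdog(timeout=60)`, call `wd.start()`, and schedule `asyncio.create_task(heartbeat(wd))` once the bot's event loop is running. If a handler blocks the loop or the loop deadlocks, the heartbeat stops, the thread exits the process, and the restart policy can take over. It will not tell you why the bot hung, so it is worth logging memory and in-flight handler counts alongside it.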

