7 months ago
Hello,
On November 30, 2025, between 13:05 and 13:19 UTC, my Telegram bot
stopped responding to user messages. Messages were delivered (double
checkmarks in Telegram), but the bot did not process them. I had to
manually restart the service at 13:19 UTC.
This has never happened before.
Railway metrics during the incident:
- Memory: spiked to 3.2 GB before the incident
- Response Time: peaked at 20 seconds
- Request Error Rate: peaked at 6.5%
- After restart, Memory dropped to ~400 MB
PostgreSQL logs:
- Database remained stable throughout the incident (checkpoints
every 5 min)
- At restart time (13:19:38 UTC), 20 DB connections dropped
simultaneously ("Connection reset by peer")
- No deadlocks, lock timeouts, or slow queries in DB logs
Application logs:
- Last activity: 13:05:50 UTC
- After restart: 822 tasks marked as timed-out
- High activity before incident (many concurrent handler calls)
My questions:
1. Was the service killed by Railway due to resource limits (OOM
killer, CPU throttling)?
2. Does Railway have an automatic restart mechanism for unresponsive
services, and did it trigger in my case?
3. What is the memory limit for my plan, and was it exceeded (3.2
GB)?
Configuration:
- Service: Telegram Bot (Python, python-telegram-bot)
- Database: PostgreSQL (separate Railway service)
- Plan: Pro
I would appreciate your help in determining the root cause of this
incident.
Thank you.
2 Replies
7 months ago
Since you are on the Pro plan, you have access to 32 GB RAM/vCPU. It is available automatically, and your service scales up as it needs additional CPU or RAM.
To answer your other questions:
Railway does not kill processes; your bot likely has a misconfiguration somewhere that caused it to break. If your app ever needs more than 32 GB of memory, it can crash because of that, but 3.2 GB is nowhere near that limit.
Railway will restart crashed or failed processes if the restart policy is enabled (it is by default): https://docs.railway.com/guides/deployments#restart-policy
For that to happen, your app has to actually crash. A process that is still running but no longer responsive will not be restarted.
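Since a restart only happens on a real crash, one possible workaround is to make a hung bot crash on purpose. Below is a minimal sketch, not something from Railway or python-telegram-bot: the `LoopWatchdog` class and its parameters are hypothetical names, and it assumes the hang is a blocked asyncio event loop. A coroutine records a heartbeat from inside the loop, and a background thread exits the process with a non-zero code if the heartbeat goes stale.

```python
import asyncio
import os
import threading
import time


class LoopWatchdog:
    """Hypothetical helper: exit the process if the asyncio loop stops ticking."""

    def __init__(self, max_stall_seconds=60.0, on_stall=None):
        self.max_stall_seconds = max_stall_seconds
        # Default action: hard exit with a non-zero code so the platform
        # sees a crashed process rather than a silent hang.
        self.on_stall = on_stall or (lambda: os._exit(1))
        self._last_beat = time.monotonic()
        self._stop = threading.Event()

    async def heartbeat(self, interval=1.0):
        """Run as a task in the bot's event loop; only advances while the loop runs."""
        while not self._stop.is_set():
            self._last_beat = time.monotonic()
            await asyncio.sleep(interval)

    def stalled(self):
        """True if the last heartbeat is older than the allowed stall time."""
        return time.monotonic() - self._last_beat > self.max_stall_seconds

    def _watch(self, check_interval):
        # Runs in a separate thread, so it keeps working while the loop is blocked.
        while not self._stop.wait(check_interval):
            if self.stalled():
                self.on_stall()
                return

    def start(self, check_interval=5.0):
        """Start the watcher thread; call this once the heartbeat task is scheduled."""
        self._last_beat = time.monotonic()
        thread = threading.Thread(
            target=self._watch, args=(check_interval,), daemon=True
        )
        thread.start()
        return thread

    def stop(self):
        """Stop both the watcher thread and the heartbeat coroutine."""
        self._stop.set()
```

With python-telegram-bot, the heartbeat could be scheduled with `asyncio.create_task(watchdog.heartbeat())` from a startup hook, followed by `watchdog.start()`. Note the limitation: this only detects a blocked event loop. If handlers are stuck awaiting something while the loop itself is idle, the heartbeat keeps ticking, and you would need to record the heartbeat from the update-handling path instead.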