a month ago
Hi Team,
We are experiencing critical issues with our cron jobs. One of the cron jobs configured to run every 30 minutes has not been executing properly — the service has been stuck in the “Starting container” state for the past 13 hours.
This issue is severely impacting our business operations.
Additionally, when we attempt to trigger the cron manually using the “Run Now” button, it often fails to execute. In some cases, it shows as “Running” after several minutes, while in others, it doesn’t run at all.
Could you please investigate and confirm:
The root cause of these failures, and
The expected timeline for resolution?
Your prompt assistance on this matter would be greatly appreciated.
11 Replies
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
Hey Weblegs!
I looked over your workflows, and the failed runs appear to have occurred yesterday, during a major outage of our backend system. The incident page for that outage is: https://status.railway.com/cmhawy13n00ctxzoxqbcch0no.
Our backend systems went down during that incident, which is likely the cause of your cron's issues.
I have checked our recent internal logs, and all of your schedules now appear to be running smoothly. If the cron goes down or stops firing again, please let us know and I'll promptly look into it!
a month ago
Thank you for addressing this issue. Could you please let us know how frequently this type of downtime occurs? Also, were any email notifications triggered during the outage so that we could be made aware of it in real time?
I hope the applications do not experience unexpected downtimes like this in the future.
Suggestion:
It would be great to have a feature that automatically runs any missed cron jobs once the system becomes available again. For example, if a cron is scheduled to run twice a day at 1:00 AM and 3:00 PM, and the system is down at 1:00 AM but restored at 2:00 AM, the 1:00 AM job should execute immediately upon recovery.
This feature could be configurable, allowing users to enable or disable it based on their preference.
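To make the suggestion concrete, here is a minimal Python sketch of the catch-up behaviour described above. It is not a Railway feature or API: `missed_runs`, `runs_on_recovery`, and the `catch_up` flag are hypothetical names, the calendar date is arbitrary, and the outage start of 12:30 AM is assumed for illustration.

```python
from datetime import datetime, time, timedelta


def missed_runs(daily_times, down_from, up_at):
    """Return scheduled datetimes that fell inside the outage [down_from, up_at)."""
    missed = []
    day = down_from.date()
    while day <= up_at.date():
        for t in daily_times:
            scheduled = datetime.combine(day, t)
            if down_from <= scheduled < up_at:
                missed.append(scheduled)
        day += timedelta(days=1)
    return sorted(missed)


def runs_on_recovery(daily_times, down_from, up_at, catch_up=True):
    """Decide what to run right after recovery (hypothetical policy).

    With catch_up enabled, a single run is triggered if any scheduled run
    was missed; repeated misses are coalesced into the latest missed slot.
    """
    if not catch_up:
        return []
    missed = missed_runs(daily_times, down_from, up_at)
    return missed[-1:]  # latest missed slot, or [] if nothing was missed


# Example from this post: runs at 1:00 AM and 3:00 PM,
# system down from 12:30 AM (assumed) and restored at 2:00 AM.
schedule = [time(1, 0), time(15, 0)]
down = datetime(2025, 1, 1, 0, 30)
up = datetime(2025, 1, 1, 2, 0)
print(runs_on_recovery(schedule, down, up))
```

Coalescing several missed runs into one is only one possible choice. A job that is not idempotent, or that processes a fixed time window per run, might need every missed slot replayed instead.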
Status changed to Awaiting Railway Response Railway • about 1 month ago
a month ago
During that incident, no running services were affected. All running deployments stayed up.
Crons appear to have been the only thing affected, and most of them ran without issues.
As for frequency, this type of issue is very uncommon.
Email notifications are sent out for incidents, and you can monitor the Railway status page to check for any current issues.
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!
Status changed to Solved Railway • about 1 month ago
a month ago
Hi Railway Support Team,
We’re facing the same issue again.
Service Name: ASDA_CreateOrdersOnCAandDB_Node
Environment: Production
Timeline (IST):
Stuck in “Starting container” from Nov 8, 2025, 02:50:44 AM
Recovered only after manual redeploy at Nov 9, 2025, 09:00:47 PM
During this entire period, the cron job didn’t run, and no errors or alerts were shown. As a result, our scheduled job (every 20 minutes) silently failed for ~42 hours, causing real business impact. It’s exactly the same behavior we reported earlier. It’s very concerning that this hasn’t been addressed yet.
We need an immediate investigation and explanation for:
Why the container got stuck in “Starting” again.
Why no error, timeout, or alert was raised.
What is being done to prevent recurrence and improve monitoring/alerting.
Please treat this as a critical, recurring production issue and share a clear RCA and resolution timeline. We really can’t afford these silent failures.
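For context on the monitoring request, here is a minimal Python sketch of a staleness check that would flag this kind of silent gap. It is something we could run on our own side and uses no Railway API. `is_stale`, `missed_intervals`, and the two-interval grace period are hypothetical choices; the timestamps are the ones from the timeline above.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=20)  # schedule stated in this post


def is_stale(last_success, now, interval=INTERVAL, grace=2):
    """True when no successful run has been recorded for more than `grace` intervals."""
    return now - last_success > grace * interval


def missed_intervals(stuck_from, recovered_at, interval=INTERVAL):
    """Number of whole schedule intervals that elapsed while the job was stuck."""
    return (recovered_at - stuck_from) // interval


# Timestamps from the timeline above (IST, treated as naive datetimes).
stuck_from = datetime(2025, 11, 8, 2, 50, 44)
recovered_at = datetime(2025, 11, 9, 21, 0, 47)

print(recovered_at - stuck_from)                      # length of the gap
print(missed_intervals(stuck_from, recovered_at))     # whole 20-minute slots missed
print(is_stale(stuck_from, stuck_from + timedelta(hours=1)))
```

With these timestamps the gap works out to 42 hours 10 minutes, or 126 whole 20-minute intervals. A check like this with a two-interval grace period would have raised a flag about 40 minutes after the last successful run.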
Status changed to Awaiting Railway Response Railway • 28 days ago
a month ago
Could I get a link to the cron execution that is having problems? I'd like to investigate further.
Status changed to Awaiting User Response Railway • 27 days ago
20 days ago
Hello!
We're acknowledging your issue and attaching a ticket to this thread.
We don't have an ETA for it, but our engineering team will take a look, and we will update you here as the ticket progresses.
Please reply to this thread if you have any questions!
Status changed to Solved Railway • 13 days ago
13 days ago
Hello,
An update: We have made several changes to our cron scheduling system. Our internal metrics show that these changes have greatly reduced cron execution delay and eliminated missed crons. Additionally, the Run Now button starts the job much faster.
But please let us know if this isn't the case for you!
Best,
Brody
Status changed to Awaiting User Response Railway • 13 days ago
13 days ago
🛠️ The ticket Scheduling Inaccuracy Issue has been marked as in progress.
10 days ago
✅ The ticket Scheduling Inaccuracy Issue has been marked as completed.
Status changed to Solved Railway • 3 days ago