Crons are Triggering but not starting the service

weblegs
PRO OP

13 days ago

Hi Team,

We are experiencing critical issues with our cron jobs. One of the cron jobs configured to run every 30 minutes has not been executing properly — the service has been stuck in the “Starting container” state for the past 13 hours.

This issue is severely impacting our business operations.

Additionally, when we attempt to trigger the cron manually using the “Run Now” button, it often fails to execute. In some cases, it shows as “Running” after several minutes, while in others, it doesn’t run at all.

Could you please investigate and confirm:

  1. The root cause of these failures, and

  2. The expected timeline for resolution?

Your prompt assistance on this matter would be greatly appreciated.


Awaiting User Response

7 Replies

Railway
BOT

13 days ago

Hey there! We've found the following might help you get unblocked faster:

If you find the answer from one of these, please let us know by solving the thread!


13 days ago

Can you please share link(s) to the impacted cron service(s)?


Status changed to Awaiting User Response Railway 13 days ago


12 days ago

Hey Weblegs!
I looked over your workflows, and the failed ones seem to have occurred yesterday while we were experiencing a major outage of our backend system. The incident page for that outage is: https://status.railway.com/cmhawy13n00ctxzoxqbcch0no.
Our backend systems went down during that incident, which is likely the cause of your cron issues.

I checked our recent internal logs, and all of your schedules now appear to be running smoothly. If the cron stops firing again, please let us know and I'll promptly look into it!


weblegs
PRO OP

12 days ago

Thank you for addressing this issue. Could you please let us know how frequently this type of downtime occurs? Also, were any email notifications triggered during the outage so that we could be made aware of it in real time?

I hope the applications do not experience unexpected downtimes like this in the future.

Suggestion:
It would be great to have a feature that automatically runs any missed cron jobs once the system becomes available again. For example, if a cron is scheduled to run twice a day at 1:00 AM and 3:00 PM, and the system is down at 1:00 AM but restored at 2:00 AM, the 1:00 AM job should execute immediately upon recovery.

This feature could be configurable, allowing users to enable or disable it based on their preference.
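As a rough illustration of the catch-up behaviour suggested above (not an existing Railway feature), the check could look like the sketch below. The function names `missed_runs` and `should_catch_up` are hypothetical, and the sketch assumes the platform records the time of the last successful run and that all timestamps are naive datetimes in the same time zone.

```python
from datetime import datetime, time, timedelta


def missed_runs(daily_times, last_run, now):
    """Return scheduled firings that fall in (last_run, now], oldest first.

    daily_times: times of day at which the cron should fire (hypothetical input).
    last_run:    when the job last ran successfully (assumed to be recorded).
    now:         the moment the system became available again.
    """
    missed = []
    day = last_run.date()
    while day <= now.date():
        for t in sorted(daily_times):
            fire_at = datetime.combine(day, t)
            # A firing counts as missed if it was due after the last run
            # and is not in the future.
            if last_run < fire_at <= now:
                missed.append(fire_at)
        day += timedelta(days=1)
    return missed


def should_catch_up(daily_times, last_run, now, enabled=True):
    """Decide whether to run once on recovery; `enabled` is the user toggle."""
    return enabled and bool(missed_runs(daily_times, last_run, now))


# The example from the post: runs at 1:00 AM and 3:00 PM, system back at 2:00 AM.
schedule = [time(1, 0), time(15, 0)]
last_ok = datetime(2025, 11, 7, 15, 0)
recovered = datetime(2025, 11, 8, 2, 0)
print(missed_runs(schedule, last_ok, recovered))
print(should_catch_up(schedule, last_ok, recovered))
```

Running the job once on recovery, rather than once per missed firing, is a design choice; for jobs that are not idempotent, the list returned by `missed_runs` could be replayed one by one instead.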


Status changed to Awaiting Railway Response Railway 12 days ago



11 days ago

During that incident, running services were not affected: all active deployments kept running.
Crons appear to be the only thing that experienced issues, and most of them did not.

As for frequency, this type of issue is very uncommon.
Email notifications are sent out, and you can monitor Railway's status page for any ongoing issues.


Status changed to Awaiting User Response Railway 11 days ago


Railway
BOT

4 days ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 4 days ago


weblegs
PRO OP

17 hours ago

Hi Railway Support Team,

We’re facing the same issue again.

Service Name: ASDA_CreateOrdersOnCAandDB_Node
Environment: Production

Timeline (IST):

  • Stuck in “Starting container” from Nov 8, 2025, 02:50:44 AM

  • Recovered only after manual redeploy at Nov 9, 2025, 09:00:47 PM

During this entire period, the cron job didn’t run, and no errors or alerts were shown. As a result, our scheduled job (every 20 minutes) silently failed for ~40 hours, causing real business impact. It’s exactly the same behavior we reported earlier. It’s very concerning that this hasn’t been addressed yet.

We need an immediate investigation and explanation for:

  1. Why the container got stuck in “Starting” again.

  2. Why no error, timeout, or alert was raised.

  3. What is being done to prevent recurrence and improve monitoring/alerting.

Please treat this as a critical, recurring production issue and share a clear RCA and resolution timeline. We really can’t afford these silent failures.
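Until platform-level alerting is clarified, a silent failure like this could be caught by a staleness check that runs outside the cron service. The sketch below is only an illustration: `is_overdue` and `intervals_elapsed` are hypothetical helpers, and it assumes the job writes a timestamp somewhere readable (a database row, for example) each time it finishes successfully.

```python
from datetime import datetime, timedelta


def is_overdue(last_success, now, interval, grace=timedelta(minutes=5)):
    """True if the job has been silent for longer than one interval plus grace.

    last_success: timestamp the job wrote after its last successful run (assumed).
    interval:     the cron period, e.g. 20 minutes.
    grace:        slack for slow container starts; the 5-minute default is arbitrary.
    """
    return now - last_success > interval + grace


def intervals_elapsed(last_success, now, interval):
    """Number of whole cron intervals that have passed since the last success."""
    return max(0, (now - last_success) // interval)


# Example with a 20-minute schedule: 30 minutes of silence is flagged,
# 20 minutes is still within the grace period.
every_20 = timedelta(minutes=20)
last_ok = datetime(2025, 11, 8, 2, 30)
print(is_overdue(last_ok, datetime(2025, 11, 8, 3, 0), every_20))
print(is_overdue(last_ok, datetime(2025, 11, 8, 2, 50), every_20))
```

A check like this has to run on a separate schedule or host from the cron it watches, otherwise it would go silent along with it.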



Status changed to Awaiting Railway Response Railway about 17 hours ago


3 hours ago

Could I get a link to the cron execution that had problems? I'd like to investigate further.


Status changed to Awaiting User Response Railway about 3 hours ago

