12 days ago
I have a feeling that multiple cron runs are active at the same time.
I have set my cron schedule to trigger every minute. The aim is that it checks for work and if there is none, it stops. Sometimes there is work and then it needs about 2 hours to complete.
I feel like multiple cron runs get triggered, causing RAM to pile up and crash the service.
I was under the impression that if an existing cron run is still active, new cron triggers are automatically skipped.
It is a Python script that parses a large 30 GB XML file into Parquet. When I run this with 1 sequential worker I don't get this service crash, but I would like to speed it up with multiple parallel workers. Towards the end (I believe when recombining the results) the service crashes, I believe due to running out of memory.
6 Replies
Status changed to Open Railway • 12 days ago
12 days ago
Hello louisdeconinck,
Yes, Railway does skip new cron triggers if a previous run is still active, so multiple overlapping cron runs are not your problem.
Also worth knowing: Railway has a minimum interval of 5 minutes between cron runs, so your every-minute schedule isn't working the way you think.
Your actual issue is an OOM crash happening within a single run, when your parallel workers recombine results at the end. With 1 sequential worker there is no crash; with parallel workers it crashes at the end. Your memory graph shows a steady climb, then a sudden jump past 20 GB right before the crash. That's the recombination step eating all your RAM at once.
Hope this helps :)
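If the recombination step really is what spikes memory, one common way around it is to have each worker write its own part file and then combine the parts one at a time instead of loading them all together. Here is a minimal sketch of that pattern using only the standard library. JSON Lines files stand in for the Parquet parts, and the names `iter_rows`, `combine_parts`, `demo` and the `part-N.jsonl` layout are made up for illustration, not taken from the original script. With real Parquet output the same idea would be to append one part at a time through a writer such as `pyarrow.parquet.ParquetWriter`.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def iter_rows(part_path: Path) -> Iterator[dict]:
    """Yield rows from one part file lazily, one line at a time."""
    with part_path.open("r", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def combine_parts(part_paths: Iterable[Path], out_path: Path) -> int:
    """Append every part to out_path, holding only one row in memory at a time."""
    written = 0
    with out_path.open("w", encoding="utf-8") as out:
        for part in sorted(part_paths):
            for row in iter_rows(part):
                out.write(json.dumps(row) + "\n")
                written += 1
    return written


def demo() -> int:
    """Simulate three workers that each wrote four rows, then combine them."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        for worker in range(3):
            part = tmp_dir / f"part-{worker}.jsonl"
            with part.open("w", encoding="utf-8") as fh:
                for i in range(4):
                    fh.write(json.dumps({"worker": worker, "i": i}) + "\n")
        return combine_parts(tmp_dir.glob("part-*.jsonl"), tmp_dir / "combined.jsonl")
```

Peak memory in the combine step then depends on the size of one row (or one part, if you read whole parts) rather than on the total size of all results.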
12 days ago
Thanks for the info.
In this case I had my cron schedule at every 15 minutes. In the middle of the run I get "Starting Container" when the next run was scheduled, but the old run was still running. What does this log mean? I'm expecting that the cron trigger would be skipped.
You can see in the list of cron runs that I have a run every 15 minutes, even though it's just two runs that span 30min / 1 hour. Why does it show this as multiple runs every 15 minutes if it's just 2 longer spanning runs?
Are these just UI bugs?
12 days ago
Also, does Railway provide an error message or logs explaining why a service crashed?
12 days ago
Looking at your screenshots, the multiple runs every 15 minutes are not a UI bug. Those are genuinely separate cron executions, each lasting around 14 to 15 minutes. The numbers in parentheses like (1m), (16m), (30m), (1h) are just "time since that run completed" at the moment you took the screenshot, not a second duration. The 3:15pm run with the red dot is your crashed run.
On the "Starting Container" log appearing mid-run: that deployment is marked as "removed", which means a new deployment was pushed to replace it. That "Starting Container" is Railway booting the new deployment, not a new cron trigger. So not a bug either.
On crash logs: yes, Railway shows them. The deploy logs of that removed deployment (image 1) are exactly where you find what happened before the crash. Just scroll to the very bottom of those logs on the removed deployment and you'll see the last output before it died.
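Since the last output before the crash is all you have to go on, it can help to make the script log its own memory use at each stage. This is a small sketch using the standard-library `resource` module (Unix only). The logger name `xml_to_parquet` and the helper names `peak_rss_mb` and `log_memory` are hypothetical, not something from the original script.

```python
import logging
import resource
import sys

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("xml_to_parquet")  # hypothetical logger name


def peak_rss_mb() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(stage: str) -> float:
    """Log the peak memory seen so far, tagged with the current stage."""
    mb = peak_rss_mb()
    log.info("stage=%s peak_rss_mb=%.1f", stage, mb)
    return mb
```

Calling `log_memory("before_recombine")` and `log_memory("after_recombine")` around the suspect step would show in the deploy logs how far the process got. Note that `RUSAGE_SELF` covers only the calling process, so each parallel worker would need to log its own figure.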
12 days ago
Thanks for looking into this.
I don't think that's correct. Both UI runs have the same id and contain the same logs, so I would think it's the same run even though the UI shows multiple runs.
At the end of the deploy logs I don't see any crash log from Railway itself, just my own logs.
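One way to check from the logs alone whether two entries are the same run is to stamp every log line with an identifier generated once per process. A rough sketch with the standard `logging` module follows. `RUN_ID`, `RunContext` and `make_logger` are illustrative names, and this identifier is generated by the script itself, so it is unrelated to any id Railway shows in its UI.

```python
import logging
import uuid

# Generated once when the process starts; every line from this run carries it.
RUN_ID = uuid.uuid4().hex[:8]


class RunContext(logging.Filter):
    """Attach the per-process run id to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.run_id = RUN_ID
        return True


def make_logger(name: str = "cron_job") -> logging.Logger:
    """Return a logger whose lines include the run id and the process id."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s run=%(run_id)s pid=%(process)d %(message)s")
        )
        handler.addFilter(RunContext())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger
```

If two entries in the run list show the same `run=` value throughout, they are one process. If a second value appears mid-run, a second process really did start.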
12 days ago
Okay, so the UI showing multiple runs is a display quirk on Railway's side, not actual concurrent runs, which matches what you found yourself (same id, same logs).
On the missing crash log, I think that's normal and expected. When the Linux kernel kills your process due to out of memory, it sends SIGKILL, which is an immediate hard kill. Your app physically cannot catch SIGKILL, so it never gets a chance to write a final log line. The logs just stop. That's not a Railway bug, that's just how Linux works.
Your red dot on the 3:15pm run, plus your RAM hitting the ceiling on the metrics graph, is your evidence that it was an OOM kill, not the deploy log.
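The SIGKILL behaviour is easy to confirm locally. This short sketch, assuming a POSIX system, shows that Python refuses to install a handler for SIGKILL and that a child killed this way simply reports a negative signal number as its return code. The function names are made up for the demonstration.

```python
import signal
import subprocess
import sys


def sigkill_cannot_be_handled() -> bool:
    """Return True if the OS refuses to install a handler for SIGKILL."""
    try:
        signal.signal(signal.SIGKILL, lambda signum, frame: None)
    except OSError:
        return True
    return False


def exit_code_after_sigkill() -> int:
    """Start a sleeping child, hard-kill it, and return its exit status."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    child.send_signal(signal.SIGKILL)
    # On POSIX, a child terminated by signal N reports a return code of -N.
    return child.wait()
```

The child never gets to run any cleanup or print anything, which is why the deploy logs just stop. In shells and container runtimes the same event usually shows up as exit code 137 (128 + 9).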