9 months ago
I have this cron that runs every 10 minutes to do a sync in a production system. It's very simple, it logs in via API, sends the sync request, and exits. Total of 45 lines but could probably be cut down to 10 lines if I really wanted.
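For readers following along, a job of this shape (log in, send the sync request, exit) could look roughly like the sketch below. This is not the actual function from the thread: `BASE_URL`, the `/login` and `/sync` paths, the `token` response field, and the credential parameters are all hypothetical placeholders, and Python is used here only for illustration.

```python
import json
import urllib.request

# Hypothetical base URL; the real API is not shown in the thread.
BASE_URL = "https://example.invalid/api"


def post_json(url, payload, token=None, timeout=30):
    """POST a JSON body and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    # An explicit timeout keeps a slow server from hanging the run.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_sync(username, password, post=post_json):
    """Log in, trigger the sync, and return the sync response."""
    login = post(f"{BASE_URL}/login", {"username": username, "password": password})
    token = login["token"]  # assumed field name in the login response
    return post(f"{BASE_URL}/sync", {}, token=token)
```

The `post` parameter is injectable only so the flow can be exercised without a network; a real job would call `run_sync` once with credentials read from the environment and then exit.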
It has worked for at least 3 months. It used to "crash" from time to time but 6 days ago it "crashed" and never ran again. I had originally moved the sync trigger out of the app and into a Railway function with a cron to make it more reliable so I didn't expect the cron to stop working.
I selected the project and service already so I guess you guys have the associated IDs. I can probably just make a change or redeploy to make it work (have not tried) but I won't touch it yet in case the logs or current state are useful to the Railway team to help prevent this in the future.
Any advice?
12 Replies
9 months ago
Hey Joseph, good call on keeping it in that state, but it looks like the cron restarted and is now in a running state.
I also saw that some of your workloads were recently moved to metal, but this cron wasn't one of them. I've reviewed those services and they're all looking healthy.
Unfortunately, I'm unable to check why the cron you mentioned might've crashed. We haven't had any reports on our end for crons crashing either, but I can definitely keep an eye out as I'll be on support rotation this week.
Do you have any more context on what could have led to the issue? It would be super helpful for investigating or reproducing it.
Status changed to Awaiting User Response Railway • 9 months ago
9 months ago
Hey Chandrika, yeah unfortunately I couldn't wait too long before restarting so I deployed an update (just removed a comment) to restart the service about 20 minutes ago.
I don't really have any additional context, as it's such a simple function that there aren't many moving parts to break haha. I can tell you that the "crashed" run doesn't have any logs at all, but the successful ones do.
Now that I'm looking at it again, it does seem the cron isn't firing the function. It should run every 10 minutes but the last run was 20 minutes ago so that part is still not functioning correctly even though the last run was successful.
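One way to confirm a schedule problem like this from run history is to look for gaps between consecutive runs that are longer than the cron interval. The sketch below assumes you can collect run start times yourself (for example from log timestamps); `missed_windows` and the 2-minute `slack` are made-up names and values for illustration, not anything Railway provides.

```python
from datetime import datetime, timedelta


def missed_windows(run_times,
                   interval=timedelta(minutes=10),
                   slack=timedelta(minutes=2)):
    """Return (previous_run, next_run) pairs whose gap exceeds interval + slack.

    A schedule like */10 * * * * should produce runs about 10 minutes apart,
    so a 20-minute gap means at least one expected run did not fire.
    """
    ordered = sorted(run_times)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > interval + slack
    ]
```

Fed the run times from the deployment logs, an empty result means the cron is firing on schedule; each returned pair brackets a window where a run was skipped.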
Status changed to Awaiting Railway Response Railway • 9 months ago
9 months ago
Could you create a new service (keeping this one) and configure it the same way as this one is now? Just wondering if that service has gotten stuck in some way.
Status changed to Awaiting User Response Railway • 9 months ago
9 months ago
I duplicated the function service then changed the old service cron to be once every 24hr. It seems like the cron for the new function service (73044b00-7cf7-4e28-a928-2ec6c6778b32) is working so I'll probably delete the old service tomorrow, unless you want me to keep it up to help you troubleshoot.
Status changed to Awaiting Railway Response Railway • 9 months ago
9 months ago
Hey Joseph, glad to hear you're unblocked on this. Very sorry you ran into this issue!
I appreciate your offer to keep the service up so we can take a look. Do you consent to our engineering team editing the cron schedule to attempt a reproduction? We'll remove your old service once we're done.
Thanks!
Status changed to Awaiting User Response Railway • 9 months ago
9 months ago
Yes your team can modify the cron schedule of the old service (88333b1d-f82a-49cb-be60-6cc0d662394d). It won't cause any disruption on my end.
Please let me know when to remove the old service, or if you remove it yourself. The production and uat environments are clones. Will removing the sync cron in production also remove it from uat? I duplicated the cron in uat just in case.
Status changed to Awaiting Railway Response Railway • 9 months ago
9 months ago
It won't be removed automatically - you'll be able to sync the change from the canvas once it's applied.
Status changed to Awaiting User Response Railway • 9 months ago
9 months ago
Got it! Let me know if I should resolve the thread or keep it open.
Status changed to Awaiting Railway Response Railway • 9 months ago
9 months ago
I'll mark as solved for now - feel free to reopen/reach out if you need anything :)
Status changed to Awaiting User Response Railway • 9 months ago
Status changed to Solved itsrems • 9 months ago
8 months ago
Related issue is going on now. The function keeps "crashing" when I haven't made any changes.
Project ID: 1832af6e-ee77-46f2-a94a-d649db392683
Service ID: 73044b00-7cf7-4e28-a928-2ec6c6778b32
Status changed to Awaiting Railway Response Railway • 9 months ago
8 months ago
It also runs every 20min when I have the cron set to 10min. Something is wrong with this service.
8 months ago
The "crashing" was my fault, I think: the server was taking too long to respond, and I guess that timed out the function. The cron issue was probably due to the troubles Railway had today. It seems to be back to every 10min now that the deploy outage has been resolved.
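For anyone hitting the same thing, one option is to catch the timeout inside the job and log it, so a slow server leaves a trace instead of a run with no logs. The sketch below is an assumption about how that could be structured, not the code from this thread: `run_guarded`, the `do_sync` callable, and the exit codes 1 and 2 are all illustrative choices.

```python
import socket
import urllib.error


def run_guarded(do_sync, log=print):
    """Run do_sync() and map the outcome to a process exit code.

    0 = success, 1 = other network failure, 2 = timeout.
    """
    try:
        do_sync()
    except (TimeoutError, socket.timeout) as exc:
        # Read timeouts from urllib usually surface here.
        log(f"sync timed out: {exc!r}")
        return 2
    except urllib.error.URLError as exc:
        # Connect-phase timeouts arrive wrapped in URLError.reason.
        if isinstance(exc.reason, (TimeoutError, socket.timeout)):
            log(f"sync timed out: {exc.reason!r}")
            return 2
        log(f"sync failed: {exc.reason!r}")
        return 1
    log("sync ok")
    return 0
```

A job would typically end with `sys.exit(run_guarded(...))`, passing a function that performs the login and sync request with a request timeout shorter than the 10-minute cron interval.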
Status changed to Solved brody • 9 months ago
