Application keeps stopping without errors or failures
Anonymous
PRO · OP

6 months ago

Flask app: everything deploys fine to our cron job instance, but after some random amount of time the server instance stops responding completely. There are no error messages in our logs or from Railway.

We have UptimeRobot sending periodic HTTPS checks, yet the server still stops completely.

Solved

6 Replies


Naive response:

Are you sure the recent error isn't related to your application crashing?

I see:

Traceback (most recent call last):
  File "/app/Leadoff/utils/cron.py", line 998, in background_task
    refresh_crm_data()
  File "/app/Leadoff/utils/cron.py", line 964, in refresh_crm_data
    opportunity_update["amount"] = int(deal_amount)
ValueError: invalid literal for int() with base 10: '2539.8'
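
For context, `int()` does not accept a string containing a decimal point, which is what the ValueError above is reporting. Here is a minimal sketch of a more tolerant parse. It assumes the amount should end up as a whole number and that rounding half up is acceptable. The helper name `parse_amount` is hypothetical and not taken from the app's code.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP


def parse_amount(raw):
    """Hypothetical helper: turn a value such as '2539.8' into an int, or None."""
    try:
        value = Decimal(str(raw).strip())
        if not value.is_finite():
            # Rejects 'NaN' and 'Infinity', which Decimal would otherwise accept.
            return None
        # Round to a whole number instead of failing on the fractional part.
        return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    except InvalidOperation:
        # Raised for input that is not a number at all, e.g. '' or 'n/a'.
        return None
```

If the fractional part matters (cents, for example), keeping the `Decimal` value rather than rounding may be the better choice. That depends on what the CRM field expects.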


Status changed to Awaiting User Response Railway 6 months ago


Anonymous
PRO · OP

6 months ago

Interesting. I've seen errors like this, but the server continues to run. Is there a certain number of errors that would trigger some sort of shutdown? Errors will occur, so I'm just wondering why this one would be "instance killing". Is there a better way to handle errors so that they don't spin down the server?

Also, if this was the reason, why isn't it clear in the logs? I simply see our cron job heartbeat print statements and then nothing past a certain time.

Thank you!


Status changed to Awaiting Railway Response Railway 6 months ago


I wouldn't know. The issue with OOMs/crashes is that you are never guaranteed a graceful shutdown. The only thing that we can do from the Railway side is see if:

1. We propagate the error correctly
2. We are logging

When I review our dashboards, nothing is telling me that we're dropping the ball here. (And believe me, we've dropped the ball.) Sorry I don't have much more for you here.
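
On the question of handling errors without taking the background job down, one common pattern is to catch and log exceptions per run so that a single bad record does not end the loop. The sketch below assumes the job runs in a thread inside the Flask process. The name `run_periodically` and the logger name are illustrative, not taken from the app.

```python
import logging
import sys
import threading

# Log to stdout so each line reaches the platform's log stream.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("cron")


def run_periodically(task, interval_seconds, stop_event):
    """Call task() until stop_event is set; a failing run is logged, not fatal."""
    while not stop_event.is_set():
        try:
            task()
        except Exception:
            # logger.exception records the message plus the full traceback.
            logger.exception("background task failed; will retry on next tick")
        # Event.wait() acts as a sleep that ends early once stop_event is set.
        stop_event.wait(interval_seconds)
```

It could be started with something like `threading.Thread(target=run_periodically, args=(refresh_crm_data, 300, threading.Event()), daemon=True).start()`, where the 300-second interval is an assumption. This only helps with ordinary Python exceptions. An OOM kill ends the process without running any Python code, so nothing in the app can log it.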


Status changed to Awaiting User Response Railway 6 months ago


Anonymous
PRO · OP

6 months ago

Appreciate the help - any idea why our health check would be intermittently getting a timeout? This seems really peculiar. I've looked at our usage statistics and it doesn't seem like we are anywhere near a max capacity that would cause delays.


Status changed to Awaiting Railway Response Railway 6 months ago


christian
EMPLOYEE

6 months ago

I took a look at the logs for your last deployment from yesterday, and it doesn't have any 499s. Did you change anything in your app that could have addressed this?


Status changed to Awaiting User Response Railway 6 months ago


Railway
BOT

5 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 5 months ago

