Server goes down randomly throughout the day
shuwenf
PROOP

2 years ago

Recently I noticed that my production Railway server goes down randomly throughout the day and shows a 503 error. What's going on? Can someone take a look?

My project ID is 99e122f7-a96e-42ba-95aa-325cd3e66c82

...
        <h1 class="error-404">Nothing here... yet</h1>
        <h1 class="error-503">Application failed to respond</h1>

        <a href="https://railway.app"> Go to Railway </a>
Solved

71 Replies

brody
EMPLOYEE

2 years ago

do you have any logs for when you get the error page?

what kind of app?


shuwenf
PROOP

2 years ago

no logs, I just can't reach the server so there's no logging


shuwenf
PROOP

2 years ago

it's a fastapi backend


brody
EMPLOYEE

2 years ago

right but if your application was erroring and not responding, ideally there would be logs


shuwenf
PROOP

2 years ago

all of the 0s are downtimes

[image attachment]


shuwenf
PROOP

2 years ago

It looks like it's railway that's not responding


shuwenf
PROOP

2 years ago

```

Nothing here… yet

Application failed to respond

    <a href="https://railway.app"> Go to Railway </a>
```

shuwenf
PROOP

2 years ago

No error from my app


brody
EMPLOYEE

2 years ago

that page is shown when your application doesn't respond


brody
EMPLOYEE

2 years ago

are these https requests? what am I looking at here?


shuwenf
PROOP

2 years ago

The server sometimes works and sometimes doesn't, without any changes from my side


shuwenf
PROOP

2 years ago

Yup, this is in postman


shuwenf
PROOP

2 years ago

I'm calling the backend hosted on railway


brody
EMPLOYEE

2 years ago

do you have a custom domain?


brody
EMPLOYEE

2 years ago

I understand how it sounds but that does not rule out an issue with your app


brody
EMPLOYEE

2 years ago

it also does not rule out an issue with railway, but from experience it's more often an issue with the application


shuwenf
PROOP

2 years ago

Yes I have a custom domain


shuwenf
PROOP

2 years ago

How do I debug whether it's Railway or my app? It works perfectly locally
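
One way to gather evidence for that question is to probe the endpoint on a schedule and record timestamps, so failures can be lined up against the deployment's logs and state. The following is a minimal sketch using only the Python standard library; the helper names `probe` and `watch` are made up for this example, and the URL you pass in would be your own.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server (or a proxy in front of it) answered with an error status.
        return err.code
    except OSError:
        # Covers URLError, timeouts, and refused or reset connections.
        return 0


def watch(url: str, interval: float = 30.0, rounds: int = 10) -> None:
    """Print a UTC timestamp and status code for each probe."""
    for _ in range(rounds):
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        print(stamp, probe(url))
        time.sleep(interval)
```

A 503 at a given timestamp with no matching request in the application's own logs suggests the request never reached a running process, which is worth comparing against the deployment state at that time.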


shuwenf
PROOP

2 years ago

My other friends also have uptime problems with railway and have migrated to render


brody
EMPLOYEE

2 years ago

unfortunately working locally does not rule out an issue with the application either


brody
EMPLOYEE

2 years ago

do you have the edge proxy enabled?


shuwenf
PROOP

2 years ago

What's the edge proxy? Is this on the domain side (e.g. Namecheap)?


brody
EMPLOYEE

2 years ago

it would be in the service settings


shuwenf
PROOP

2 years ago

Should I enable this?

[image attachment]


brody
EMPLOYEE

2 years ago

yes, but first, you said your domain provider was namecheap?


shuwenf
PROOP

2 years ago

yes


brody
EMPLOYEE

2 years ago

are you sure you are using the correct generated cname it gave you when you set it up?


shuwenf
PROOP

2 years ago

yes it's been assigned to this domain name for months


brody
EMPLOYEE

2 years ago

I'm sorry but that answer does not instill confidence, I would like to ask for confirmation


shuwenf
PROOP

2 years ago

yes I'm sure


brody
EMPLOYEE

2 years ago

you are using the generated cname, not the auto generated domain, correct?


shuwenf
PROOP

2 years ago

Yes


brody
EMPLOYEE

2 years ago

go ahead and enable the edge proxy


shuwenf
PROOP

2 years ago

Done, what should I do next?


brody
EMPLOYEE

2 years ago

wait and see if you continue to have issues


shuwenf
PROOP

2 years ago

When should I check back in? Just tried postman and still have the same issue


brody
EMPLOYEE

2 years ago

what's the state of your deployment


shuwenf
PROOP

2 years ago

deployed


brody
EMPLOYEE

2 years ago

I'm sorry but that's not a valid state


shuwenf
PROOP

2 years ago

[image attachment]


brody
EMPLOYEE

2 years ago

yes, please tell me its state


shuwenf
PROOP

2 years ago

What does that mean?


brody
EMPLOYEE

2 years ago

its deployment state


shuwenf
PROOP

2 years ago

[image attachment]


shuwenf
PROOP

2 years ago

[image attachment]


shuwenf
PROOP

2 years ago

Completed?


brody
EMPLOYEE

2 years ago

your app has exited


brody
EMPLOYEE

2 years ago

this would not be a platform issue


shuwenf
PROOP

2 years ago

Where do you see that the app has exited


brody
EMPLOYEE

2 years ago

completed


shuwenf
PROOP

2 years ago

How do I fix it?


brody
EMPLOYEE

2 years ago

first, let me correct myself, the edge proxy is not going to help here, I had asked you to make that change without enough information from you.

second, since this is an issue with your application I would recommend implementing error handling everywhere and verbose logging to help you narrow down the issue.

remember, railway only ever runs your code as-is, so if it's exiting that's something your app is doing, not the platform


shuwenf
PROOP

2 years ago

What does it mean for the app to have exited?


shuwenf
PROOP

2 years ago

The app bugged out and shut down?


brody
EMPLOYEE

2 years ago

the app exited with a non-error exit code for (at this time) an unknown reason


shuwenf
PROOP

2 years ago

Hmm I see, I'll look into it


shuwenf
PROOP

2 years ago

thanks


brody
EMPLOYEE

2 years ago

I wish you the best of luck in your debugging endeavour


shuwenf
PROOP

2 years ago

Is it possible it exceeded resource constraints? Is there some way to check for that?
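
Besides the platform's metrics graphs, one way to check from inside the app is to log the process's own peak memory at interesting points. This is a minimal sketch, assuming a Unix-like container (the standard-library `resource` module is not available on Windows); the function and logger names are hypothetical.

```python
import logging
import resource
import sys

log = logging.getLogger("app.memory")  # hypothetical logger name


def peak_rss_mb() -> float:
    """Peak resident set size of this process so far, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(label: str) -> None:
    """Write the current peak RSS to the log, tagged with a label."""
    log.info("%s: peak RSS %.1f MB", label, peak_rss_mb())
```

Calling `log_memory("after model load")` and `log_memory("after request")` would show whether memory keeps growing, and the last value logged before an exit gives a rough idea of how close the process was to its limit.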


brody
EMPLOYEE

2 years ago

you think your app could have exceeded 32gb of ram?


shuwenf
PROOP

2 years ago

We run a 100M-parameter LLM, so 32 GB should be enough
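
A rough back-of-the-envelope check of that claim, assuming the weights are stored as 32-bit floats (4 bytes per parameter). The precision is an assumption, since the thread doesn't state it, and this counts the weights only, not activations, tokenizer data, or framework overhead.

```python
def weight_memory_gb(n_params: int, bytes_per_param: int = 4) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


print(weight_memory_gb(100_000_000))     # fp32 weights: 0.4
print(weight_memory_gb(100_000_000, 2))  # fp16/bf16 weights: 0.2
```

Under these assumptions the weights themselves take well under 1 GB, so memory usage far above that would have to come from something other than a single copy of the model.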


brody
EMPLOYEE

2 years ago

what do your memory metrics look like?


shuwenf
PROOP

2 years ago

Hmm goes up quite high

[image attachment]


brody
EMPLOYEE

2 years ago

have you received any emails from railway that state you ran out of memory?


shuwenf
PROOP

2 years ago

Nope these are the latest emails

[image attachment]


brody
EMPLOYEE

2 years ago

then it doesn't seem like that's the issue


shuwenf
PROOP

2 years ago

What does completed mean? This deployment is "completed" instead of "active", but it is up and running and has healthy logs

[image attachment]


Completed is when your app exits with a 0 exit code. However, I'm going to raise this to the team for investigation.
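
To make the exit-code distinction concrete, here is a small illustration that is not Railway-specific (the helper `exit_code_of` is made up for this example). A Python process ends with exit code 0 when it runs to the end of its script or calls `sys.exit(0)`; an uncaught exception or an explicit non-zero exit gives a different code.

```python
import subprocess
import sys


def exit_code_of(snippet: str) -> int:
    """Run a Python snippet in a child process and return its exit code."""
    result = subprocess.run([sys.executable, "-c", snippet], capture_output=True)
    return result.returncode


print(exit_code_of("print('work done')"))          # 0: script ran to the end
print(exit_code_of("raise RuntimeError('boom')"))  # 1: uncaught exception
print(exit_code_of("import sys; sys.exit(3)"))     # 3: explicit non-zero exit
```

A web server that is meant to run indefinitely should normally never reach the first case, so a 0 exit usually means something in the app returned or shut down cleanly when it was not expected to.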



This thread has been escalated to the Railway team.

Status changed to Awaiting Railway Response angelo-railway over 1 year ago


melissa
PRO

2 years ago

It does look to me like the app is restarting if I'm reading logs correctly, but doesn't seem like it's due to OOM or CPU based on the metrics graphs.

Can you confirm if this log line prints when the app starts: "DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443". That could also explain why the status is being updated to "Completed". As Brody suggested, I would also encourage you to add some more verbose logging and error handling to help track down the issue, even starting with a clear debug line for when the app starts so it's easier to see when/if it restarts.
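
A minimal sketch of such a startup line, using only the standard library. The names `BOOT_ID`, `log_startup`, and `uptime_seconds` are hypothetical; the point is that every process start prints one unmistakable line with an id that changes on each start, so restarts are easy to spot when scrolling the deploy logs.

```python
import logging
import os
import sys
import time
import uuid

logging.basicConfig(level=logging.INFO, stream=sys.stdout)
log = logging.getLogger("app.startup")  # hypothetical logger name

BOOT_ID = uuid.uuid4().hex[:8]  # regenerated on every process start
BOOT_TIME = time.time()


def log_startup() -> str:
    """Emit one clearly marked line per process start and return the boot id."""
    log.info("=== APP START boot_id=%s pid=%d ===", BOOT_ID, os.getpid())
    return BOOT_ID


def uptime_seconds() -> float:
    """Seconds since this process was started."""
    return time.time() - BOOT_TIME
```

Calling `log_startup()` once when the app boots, and including `BOOT_ID` in later log lines where convenient, makes it possible to tell whether two log lines came from the same process or from before and after a restart.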


Status changed to Solved Railway over 1 year ago


Railway
BOT

2 years ago

This thread has been marked as solved automatically due to a lack of recent activity. Please open a new thread if you require further assistance. Thank you!

