Healthcheck Fails: {"message":"invalid host header"}

railay

HOBBYOP

a year ago

Hi, I am implementing health checks on some of my projects.

In my web app, I added an endpoint at /api/health which returns a simple dict {'status': 'ok'}. I deployed the code change to include this endpoint and I confirmed that I get a status 200 code from my app in a web browser.
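For reference, the thread does not say which framework the app uses, so here is a framework-neutral sketch of such an endpoint as a plain WSGI callable. The name `app` and the 404 fallback are assumptions for illustration only; the path and the `{'status': 'ok'}` payload are the ones described above.

```python
import json


def app(environ, start_response):
    """Minimal WSGI app exposing /api/health (illustrative, not the poster's code)."""
    if environ.get("PATH_INFO") == "/api/health":
        # Same payload as described in the post: a simple status dict.
        body = json.dumps({"status": "ok"}).encode("utf-8")
        start_response(
            "200 OK",
            [("Content-Type", "application/json"),
             ("Content-Length", str(len(body)))],
        )
        return [body]

    # Anything else is outside the scope of this sketch.
    body = b"not found"
    start_response(
        "404 Not Found",
        [("Content-Type", "text/plain"),
         ("Content-Length", str(len(body)))],
    )
    return [body]
```

Any WSGI server can serve this callable; the point is only that the endpoint does no work beyond returning a 200.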

But when I add /api/health to the project settings and redeploy, I get the following in the Build logs:

Attempt #1 failed with service unavailable. Continuing to retry for 4m49s

Attempt #2 failed with status 400: {"message":"invalid host header"}. Continuing to retry for 4m41s

Attempt #3 failed with status 400: {"message":"invalid host header"}. Continuing to retry for 4m39s

etc...

I'm not sure how to debug this. It looks like on attempt #1 the service legitimately hadn't started, but by attempt #2 it was up and running. In the Deploy logs, I can see the app is receiving the health check requests and returning a 400 error code.

Any ideas on how to work through this?

71 Replies

railay

HOBBYOP

a year ago

34108dc5-7795-469f-b853-2863bbedfa6f

railay

HOBBYOP

a year ago

The above is happening on my Python web app. But I am also implementing health checks on a Directus app, and the switch over is not seamless... it looks like the containers switch over before the health check is successful... the Directus instance becomes temporarily unreachable and I get content not found errors on the frontend (which queries Directus) until the Health check succeeds.

railay

HOBBYOP

a year ago

This is what I see in the Directus Build logs...

====================
Starting Healthcheck
====================
Path: /server/health
Retry window: 5m0s

Attempt #1 failed with service unavailable. Continuing to retry for 4m49s

Attempt #2 failed with service unavailable. Continuing to retry for 4m38s

[1/1] Healthcheck succeeded!
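To make the log above easier to read, here is a rough sketch of a retry loop that would produce this kind of output. It is not Railway's implementation. The 5-minute window comes from the log; the roughly 10-second interval is only inferred from the gap between attempts (4m49s, then 4m38s), and `probe`, `wait_until_healthy` and the other names are hypothetical.

```python
import time


def wait_until_healthy(probe, retry_window=300.0, interval=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or the retry window runs out.

    Returns the number of the attempt that succeeded, or None on timeout.
    probe is any zero-argument callable, e.g. one that requests /server/health.
    """
    deadline = clock() + retry_window
    attempt = 0
    while True:
        attempt += 1
        try:
            if probe():
                return attempt
        except OSError:
            # Connection refused and similar: the service is not up yet.
            pass
        # Give up if the next attempt would start after the deadline.
        if clock() + interval > deadline:
            return None
        sleep(interval)
```

With a probe that fails twice and then succeeds, this returns 3, which matches the three lines in the log: the check passes as soon as the app has finished starting.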

brody

EMPLOYEE

a year ago

Hello,

Perhaps this docs section will be of help? -

railay

HOBBYOP

a year ago

Hi Brody! 🙂

What you linked me to sounds like a similar symptom, but as far as I know, my Python app is not restricting any domains. This is a publicly accessible web app. And I can perform the health check from my web browser.

And the problem with my Directus project is different. The health check passes, but I’m pretty sure the containers switch before it passes.

brody

EMPLOYEE

a year ago

For the first error you showed, the healthcheck hostname section is what you want; for the service unavailable error, you likely want this section -

railay

HOBBYOP

a year ago

Thanks Brody. You were right about the first issue. I added healthcheck.railway.app to my allowed_hosts and it resolved the issue. I still need to understand why, since it's a publicly accessible project, but this was the issue... thanks!
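One likely explanation for why a publicly accessible app still rejected the check: a host allowlist looks at the Host header of each request, not at who can reach the app. Going by the fix above, the health check request presumably arrives with the Host healthcheck.railway.app rather than the app's public domain, so it is rejected until that name is on the list. A framework-neutral sketch of such a check follows; `host_allowed`, `ALLOWED_HOSTS` and `myapp.example.com` are placeholders, not taken from the app in this thread.

```python
# Placeholder allowlist: the app's public domain plus the health check hostname.
ALLOWED_HOSTS = {"myapp.example.com", "healthcheck.railway.app"}


def host_allowed(host_header, allowed=ALLOWED_HOSTS):
    """Return True if the request's Host header names an allowlisted host.

    Comparison is case-insensitive and ignores an optional ":port" suffix.
    IPv6 literals are not handled in this sketch.
    """
    if not host_header:
        return False
    host = host_header.strip().lower()
    # "example.com:8080" -> "example.com"
    host = host.split(":", 1)[0]
    return host in allowed
```

A browser request to the public domain passes this check either way, which is why the endpoint looked fine when tested by hand.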

Regarding the second issue, I'm confused. I have a PORT environment variable on the Directus project. This was present since before I created this ticket.

And eventually the health check does succeed. But the containers switch before the new container is fully started and before the health check passes.

I confirmed that the PORT value in the environment variable (3000) matches the target port in Settings -> Public Networking.

railay

HOBBYOP

a year ago

I mean, if the health check eventually succeeds, then the port setup is working, correct?

brody

EMPLOYEE

a year ago

how long are we talking before it succeeds?

railay

HOBBYOP

a year ago

Seems to vary a bit per deployment, but the last one shows about a minute. And the service actually goes down for a bit longer than that, because the containers switch before the health checks start

railay

HOBBYOP

a year ago

The last deployment took 5 attempts to succeed. The original example I shared above shows it succeeded on the 3rd attempt, so 30-ish seconds on that one

brody

EMPLOYEE

a year ago

oh that's completely normal

brody

EMPLOYEE

a year ago

just depends on how fast the application starts

railay

HOBBYOP

a year ago

Yup, agreed

railay

HOBBYOP

a year ago

Problem is Railway is serving the new container before it starts

railay

HOBBYOP

a year ago

So my frontend is broken for those 30-60 seconds

brody

EMPLOYEE

a year ago

why do you think it's serving the new container?

brody

EMPLOYEE

a year ago

The service has a volume right?

railay

HOBBYOP

a year ago

My Directus project becomes unavailable during that time. I can't access the CMS at all.

And my Python web app queries the Directus API for content, so those calls from Python to Directus fail as well.

railay

HOBBYOP

a year ago

Yes, Directus has a volume and it also connects to a Postgres DB hosted by Railway

brody

EMPLOYEE

a year ago

Yep, then there will be downtime during the deployment; we do not support zero-downtime deployments with volumes.

railay

HOBBYOP

a year ago

Hmmm ok. I'm not sure what's actually being stored on the volume. Probably images, but not sure what else.

I forked a Directus community template to get up and running. I can solve for image storage via a CDN bucket, but don't know if I need to solve for anything else... maybe Directus settings/app config/etc.

railay

HOBBYOP

a year ago

But if I can't solve for that, then Directus on Railway won't work as a solution for me

railay

HOBBYOP

a year ago

I can't afford a minute of downtime every time I do a deployment.

brody

EMPLOYEE

a year ago

why does it take so long to start?