Healthcheck Fails: {"message":"invalid host header"}
railay
HOBBYOP

10 months ago

Hi, I am implementing health checks on some of my projects.

In my web app, I added an endpoint at /api/health which returns a simple dict {'status': 'ok'}. I deployed the code change to include this endpoint and I confirmed that I get a status 200 code from my app in a web browser.
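For reference, here is a minimal sketch of an endpoint like this using only Python's standard library. The thread doesn't say which framework the app uses, so `health_response`, `HealthHandler`, `run_server`, and the fallback port 8000 are assumptions for illustration only.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_response(path):
    """Return (status_code, body_bytes) for a GET request path."""
    if path == "/api/health":
        return 200, json.dumps({"status": "ok"}).encode()
    return 404, json.dumps({"message": "not found"}).encode()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Delegate to the pure function so the logic is easy to test.
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run_server():
    # Listen on the PORT environment variable (8000 is an assumed local fallback).
    port = int(os.environ.get("PORT", "8000"))
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `run_server()` starts the server; a GET to /api/health should then return a 200 with the JSON body.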

But when I add /api/health to the project settings and redeploy, I get the following in the Build logs:

Attempt #1 failed with service unavailable. Continuing to retry for 4m49s

Attempt #2 failed with status 400: {"message":"invalid host header"}. Continuing to retry for 4m41s

Attempt #3 failed with status 400: {"message":"invalid host header"}. Continuing to retry for 4m39s

etc...

I'm not sure how to debug this. It looks like on attempt #1 the service legitimately hadn't started, but by attempt #2 it was up and running. In the Deploy logs, I can see the app is receiving the health check requests and returning a 400 error code.

Any ideas on how to work through this?

71 Replies


railay
HOBBYOP

10 months ago

The above is happening on my Python web app. But I am also implementing health checks on a Directus app, and the switch over is not seamless… it looks like the containers switch over before the health check is successful… the Directus instance becomes temporarily unreachable and I get content not found errors on the frontend (which queries Directus) until the Health check succeeds.


railay
HOBBYOP

10 months ago

This is what I see in the Directus Build logs…

====================
Starting Healthcheck
====================
Path: /server/health
Retry window: 5m0s

Attempt #1 failed with service unavailable. Continuing to retry for 4m49s

Attempt #2 failed with service unavailable. Continuing to retry for 4m38s

[1/1] Healthcheck succeeded!

brody
EMPLOYEE

10 months ago

Hello,

Perhaps this docs section will be of help? -


railay
HOBBYOP

10 months ago

Hi Brody! 🙂

What you linked me to sounds like a similar symptom, but as far as I know, my Python app is not restricting any domains. This is a publicly accessible web app. And I can perform the health check from my web browser.

And the problem with my Directus project is different. The health check passes, but I’m pretty sure the containers switch before it passes.


brody
EMPLOYEE

10 months ago

For the first error you showed, the healthcheck hostname section is what you want. For the service unavailable error, you likely want this section -


railay
HOBBYOP

10 months ago

Thanks Brody. You were right about the first issue. I added healthcheck.railway.app to my allowed_hosts and it resolved the issue. I still need to understand why, since it's a publicly accessible project, but this was the issue… thanks!
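One plausible reason the browser check passed while the health check got a 400: Railway's health check requests appear to arrive with a Host header of healthcheck.railway.app rather than the app's public domain, so a host allowlist rejects them even though the app is public. Below is a minimal sketch of that logic in plain Python. `host_allowed` and the domain myapp.example.com are hypothetical names for illustration, not the app's actual code or framework, and the port stripping is simplified (no IPv6 handling).

```python
def host_allowed(host_header, allowed_hosts):
    """Return True if the request's Host header matches an allowed host."""
    # Drop an optional port ("example.com:8080" -> "example.com") and normalise case.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return hostname in {h.lower() for h in allowed_hosts}


# Hypothetical public domain; the real one is not given in the thread.
allowed = ["myapp.example.com"]

# Browser request through the public domain: accepted.
print(host_allowed("myapp.example.com", allowed))        # True

# Health check request with its own hostname: rejected, so the app answers 400.
print(host_allowed("healthcheck.railway.app", allowed))  # False

# After adding the health check hostname to the allowlist: accepted.
allowed.append("healthcheck.railway.app")
print(host_allowed("healthcheck.railway.app", allowed))  # True
```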

Regarding the second issue, I'm confused. I have a PORT environment variable on the Directus project. This was present since before I created this ticket.

And eventually the health check does succeed. But the containers switch before the new container is fully started and before the health check passes.

I confirmed that the PORT value in the environment variable (3000) matches the target port in Settings -> Public Networking.


railay
HOBBYOP

10 months ago

I mean, if the health check eventually succeeds, then the port setup is working, correct?


brody
EMPLOYEE

10 months ago

how long are we talking before it succeeds?


railay
HOBBYOP

10 months ago

Seems to vary a bit per deployment, but the last one shows about a minute. And the service actually goes down for a bit longer than that, because the containers switch before the health checks start



railay
HOBBYOP

10 months ago

The last deployment took 5 attempts to succeed. The original example I shared above shows it succeeded on the 3rd attempt, so 30-ish seconds on that one


brody
EMPLOYEE

10 months ago

oh that's completely normal


brody
EMPLOYEE

10 months ago

just depends on how fast the application starts


railay
HOBBYOP

10 months ago

Yup, agreed


railay
HOBBYOP

10 months ago

Problem is Railway is serving the new container before it starts


railay
HOBBYOP

10 months ago

So my frontend is broken for those 30-60 seconds


brody
EMPLOYEE

10 months ago

why do you think it's serving the new container


brody
EMPLOYEE

10 months ago

The service has a volume right?


railay
HOBBYOP

10 months ago

My Directus project becomes unavailable during that time. I can't access the CMS at all.

And my python web app makes API queries to Directus for content, and API calls from Python to Directus fail


railay
HOBBYOP

10 months ago

Yes, Directus has a volume and it also connects to a Postgres DB hosted by Railway


brody
EMPLOYEE

10 months ago

Yep then there will be downtime during the deployment, we do not support zero downtime deployments with volumes


railay
HOBBYOP

10 months ago

Hmmm ok. I'm not sure what's actually being stored on the volume. Probably images, but not sure what else.

I forked a Directus community template to get up and running. I can solve for image storage via a CDN bucket, but don't know if I need to solve for anything else… maybe Directus settings/app config/etc.


railay
HOBBYOP

10 months ago

But if I can't solve for that, then Directus on Railway won't work as a solution for me


railay
HOBBYOP

10 months ago

I can't afford a minute of downtime every time I do a deployment


brody
EMPLOYEE

10 months ago

why does it take so long to start?


railay
HOBBYOP

10 months ago

Good question. Here are the logs from the last deployment.

It does look like some jobs are running twice. I see two logs that show Initializing bootstrap and Running migrations....

Three lines say Loaded extensions...

[Attachment: deploy logs]


brody
EMPLOYEE

10 months ago

are you not using medim's template?


railay
HOBBYOP

10 months ago

No, I forked from this template



medim
MODERATOR

10 months ago

are those extensions not available on the directus marketplace?


railay
HOBBYOP

10 months ago

Yes, those 3 are.

Originally, I was planning to build a custom extension, but I think what I want to do can't be done by extension


railay
HOBBYOP

10 months ago

Why do you ask?


medim
MODERATOR

10 months ago

You can use the barebones template and download those extensions through the marketplace


medim
MODERATOR

10 months ago

If you need postgis you can replace the included postgresdb


medim
MODERATOR

10 months ago

websockets and everything else can be set up through env vars


railay
HOBBYOP

10 months ago

I don't think I need postgis, right now I'm just planning to use it for housing blog content, category structure, etc


medim
MODERATOR

10 months ago

Then I think the barebones template suffices


railay
HOBBYOP

10 months ago

And the benefit of installing the extensions via the marketplace is that the extensions don't have to be built during deployment? They would persist via the volume?


medim
MODERATOR

10 months ago

yes


medim
MODERATOR

10 months ago

actually now i'm curious if it does, because the template persists directus/data and extensions are stored in directus/extensions, the last time I tested this it did though.


medim
MODERATOR

10 months ago

here's the template link
https://railway.com/template/2fy758


railay
HOBBYOP

10 months ago

Ok. If I did want to make a custom/private Directus extension later, is there a way to install it via your template? i.e. an extension that is not on the Directus marketplace.

I think that was one of the reasons I didn't use your template… the repo I found was archived, so I couldn't fork it and make changes to the code


medim
MODERATOR

10 months ago

You can use the volume browser template to upload the extension in your directus volume


medim
MODERATOR

10 months ago

you can also now ssh into it and curl/clone/wget it, it uses alpine in the base image


medim
MODERATOR

10 months ago

but feel free to @ me if you have any issues


railay
HOBBYOP

10 months ago

Ok, thanks so much! I'll set it up and run some tests this afternoon to see if I can get this working to an uptime level that works for me


medim
MODERATOR

10 months ago

amazing! please let me know how it goes afterwards


railay
HOBBYOP

10 months ago

@Medim did a very quick test. I deployed a fresh version of your template, logged in and installed 2 extensions. I then redeployed the build.

Good news: the extensions persisted 🎉

Bad news: the Directus instance became unavailable at the Deploy step and remained unavailable for 15-25 seconds (did a few deployments to test)


railay
HOBBYOP

10 months ago

So… better than the other template I was using, but that's a lot of downtime every deployment


medim
MODERATOR

10 months ago

Hmmm, I don't remember that happening for me.. can you show the deployment logs?


railay
HOBBYOP

10 months ago

Sure, here ya go

[Attachment: deployment logs]


railay
HOBBYOP

10 months ago

And here are the build logs

[Attachment: build logs]


railay
HOBBYOP

10 months ago

Health check happens about 18 seconds after the first time stamp in the build Deploy logs


railay
HOBBYOP

10 months ago

But the instance does seem to become unavailable prior to the Mounting volume log line


medim
MODERATOR

10 months ago

tbh I think that's just directus initializing, does the old deployment get removed while the new one is unavailable?


medim
MODERATOR

10 months ago

You could set a RAILWAY_DEPLOYMENT_OVERLAP_SECONDS env var to make the old deployment overlap for some extra time while the new one initializes? default value is 20


railay
HOBBYOP

10 months ago

The Railway UI shows the old container as active until the new one is fully deployed


railay
HOBBYOP

10 months ago

I set RAILWAY_DEPLOYMENT_OVERLAP_SECONDS to 30 seconds, deployed the change, and then did another deployment to test. The application fails to respond within seconds of clicking Redeploy



medim
MODERATOR

10 months ago

Try increasing the value, I wanna see what happens


railay
HOBBYOP

10 months ago

Sure, I'll set to 120 now and rerun. Here's a video from the last deployment, so you can see what I see. On the right side of the video is the Directus UI, I am hitting CMD + R every second during deployment, and you can see the UI fails to reload for at least 10 seconds due to no response
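The manual refresh test above could also be scripted, which makes the downtime window easier to compare between deployments. This is a rough sketch using only the standard library; `check_once`, `outage_windows`, and `probe` are hypothetical helper names, and the 2-second timeout and 1-second interval are assumed values, not anything Railway or Directus prescribes.

```python
import http.client
import time
import urllib.error
import urllib.request


def check_once(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers connection errors, timeouts, and non-2xx responses.
        return False


def outage_windows(samples):
    """Turn [(timestamp, ok), ...] into a list of (start, end) outage windows.

    An outage starts at the first failed sample and ends at the next
    successful one (or at the last sample if the service never recovers).
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, samples[-1][0]))
    return windows


def probe(url, duration=120.0, interval=1.0):
    """Poll the URL roughly once per interval for `duration` seconds."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), check_once(url)))
        time.sleep(interval)
    return samples
```

To use it, start `probe(...)` against the instance's /server/health URL just before clicking Redeploy, then pass the result to `outage_windows` and subtract start from end for each window to get the downtime in seconds.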

[Attachment: video of the deployment]


medim
MODERATOR

10 months ago

cc @Brody wasn't RAILWAY_DEPLOYMENT_OVERLAP_SECONDS used to prevent that?


medim
MODERATOR

10 months ago

this video is with RAILWAY_DEPLOYMENT_OVERLAP_SECONDS set to 30 seconds?


railay
HOBBYOP

10 months ago

Yes, var is set to 30 seconds in this video


railay
HOBBYOP

10 months ago

Ok, I set RAILWAY_DEPLOYMENT_OVERLAP_SECONDS to 120, and I got the exact same behavior as what's shown in the video


railay
HOBBYOP

10 months ago

And the new container went live in about the same amount of time, I definitely did not wait 120 seconds for a switchover


brody
EMPLOYEE

10 months ago

doesn't work when you have a volume


railay
HOBBYOP

10 months ago

@Medim sounds like there isn't a fix for this, since the template builds directly from the docker image, right?

Only other workaround I can think of is having two Directus instances… deploy changes/latest version to one and then switch the API URL on my frontend to point to the latest instance. But this is probably more work than I want to deal with, tbh


medim
MODERATOR

10 months ago

Yeah… that's a bit of a letdown.. I also thought of setting up S3 for file storage and removing the volume and then using the overlap env var


medim
MODERATOR

10 months ago

But it would need to be tested


railay
HOBBYOP

10 months ago

Yeah, devil is in the details on that kind of change. Ok, too bad… but thanks again for all your help with this!

