Health check failing after minor code change

AnonymousTRIAL

a year ago

I made a very minor text change, and now my deployment isn't working, whereas it was working fine yesterday. The deploy logs look normal, but my build logs show that my health check isn't working (it works locally).

AnonymousTRIAL

a year ago

My health check is /health which serves:

from django.http import HttpResponse

def health_check(request):
    return HttpResponse(status=200)

brodyEMPLOYEE

a year ago

what is the health check failing with?

AnonymousTRIAL

a year ago

Attempt #6 failed with service unavailable. Continuing to retry for 6m31s

brodyEMPLOYEE

a year ago

are you on the legacy or v2 runtime? check your service settings

AnonymousTRIAL

a year ago

legacy

AnonymousTRIAL

a year ago

Oh wait

AnonymousTRIAL

a year ago

It's V2?

brodyEMPLOYEE

a year ago

on the v2 runtime your app needs to listen on ::

AnonymousTRIAL

a year ago

Did it get auto-switched or something?

brodyEMPLOYEE

a year ago

might have

AnonymousTRIAL

a year ago

I don't love that. Would've been nice to know about a breaking change like this.

AnonymousTRIAL

a year ago

Is that safe for me to switch back to legacy?

AnonymousTRIAL

a year ago

And where would I find docs for the difference between legacy and V2?

angeloEMPLOYEE

a year ago

You can indeed switch to legacy.

brodyEMPLOYEE

a year ago

it has been mentioned in two places, what else would work best for you?

angeloEMPLOYEE

a year ago

We expect the legacy runtime to stay in place for as long as we get the expected behavior that our users need.

brodyEMPLOYEE

a year ago

if the healthcheck issue is the only issue you face, I cannot recommend switching back to the legacy runtime

brodyEMPLOYEE

a year ago

fwiw the health check issue has been reported to the team

AnonymousTRIAL

a year ago

That's fair, but I'm not always looking at the changelog unless I'm interested to see what new features are available to me. I do get emails about the changelog with some basic bullet points, but it would've been nice to have in this email, or in a separate email, a message along the lines of:
"Starting 6/x/2024, all services will be switched from Legacy to V2, and here's what you need to do before then:"

AnonymousTRIAL

a year ago

Ohhh. So it wasn't anticipated.

AnonymousTRIAL

a year ago

That's fair.

AnonymousTRIAL

a year ago

Ok. Thanks for the help! Will fix up my /health response 😄

brodyEMPLOYEE

a year ago

it's a fair assumption that the changelogs would only include new features, and they do, but they also mention migration timelines and such for new features, and new features always have the possibility to cause issues

AnonymousTRIAL

a year ago

True, but imo known breaking changes should be communicated more directly. I don't always have the time to read changelogs for all of the services I use. My project is a hobby project, so no bigs, but for the enterprise customers, that could put a snag in their work. Luckily the support here is really on top of things!

brodyEMPLOYEE

a year ago

i dont think this was known tbh, but i have no way to know for sure

brodyEMPLOYEE

a year ago

waiting to hear back from char on this issue

AnonymousTRIAL

a year ago

Yeah, in that case, it's a hiccup. And good on the Railway team for testing with Hobby accounts first so they can find these issues before they reach enterprise customers.

AnonymousTRIAL

a year ago

I'm still having issues getting this to work. I've added [::1]:$PORT to my gunicorn command. I've confirmed this works locally, but I'm still having trouble with the health check

AnonymousTRIAL

a year ago

So it was gunicorn project.wsgi and now it's gunicorn -b 127.0.0.1:$PORT -b [::]:$PORT project.wsgi

brodyEMPLOYEE

a year ago

it needs to be :: not ::1
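
For anyone unsure of the difference: ::1 is the IPv6 loopback address (reachable only from inside the container), while :: is the IPv6 wildcard, the counterpart of 0.0.0.0. A minimal sketch using Python's standard ipaddress module; the classify_bind_host helper is a hypothetical name for illustration, not part of gunicorn or Railway:

```python
import ipaddress


def classify_bind_host(host: str) -> str:
    """Classify a bind host as 'wildcard', 'loopback', or 'specific'.

    Hypothetical helper for illustration only.
    """
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:
        # "::" and "0.0.0.0": listen on every interface
        return "wildcard"
    if addr.is_loopback:
        # "::1" and "127.0.0.1": only reachable from the same host/container
        return "loopback"
    return "specific"


if __name__ == "__main__":
    for host in ("::", "::1", "0.0.0.0", "127.0.0.1"):
        print(host, "->", classify_bind_host(host))
```

A server bound to a loopback address cannot be reached by a health check arriving from outside the container, which would explain why [::1] worked locally but not on the platform.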

AnonymousTRIAL

a year ago

Ah, see, I tried that, but I get [ERROR] Connection in use: ('::', 65090)

angeloEMPLOYEE

a year ago

Dumb ask, and unsure if you did this in the past: is it confirmed that switching to Legacy fixes the issue? Want to make sure our network engineer can do a proper repro.

AnonymousTRIAL

a year ago

I figured that there's already something running there.

brodyEMPLOYEE

a year ago

yes, check <#880575219541114940>

AnonymousTRIAL

a year ago

I'll give it a try here and confirm.

AnonymousTRIAL

a year ago

I don't have access to that channel.

angeloEMPLOYEE

a year ago

He is flagging me to another case 🙂

AnonymousTRIAL

a year ago

ahhh

angeloEMPLOYEE

a year ago

We just wanna have more languages to test runtime with hence why I ask.

angeloEMPLOYEE

a year ago

The more cases the better.

AnonymousTRIAL

a year ago

Sounds good! Yeah, I'll test and report back.

angeloEMPLOYEE

a year ago

And sorry to use you as a test pig, I can comp you the month since you are doing QA work.

brodyEMPLOYEE

a year ago

gunicorn -b [::]:$PORT project.wsgi

AnonymousTRIAL

a year ago

Yup! Tried that, and got the "Connection in use" error.

AnonymousTRIAL

a year ago

Much appreciated!

brodyEMPLOYEE

a year ago

deploy logs please -

angeloEMPLOYEE

a year ago

new role added

angeloEMPLOYEE

a year ago

comped, test away, let us know when you have recovered the healthcheck

AnonymousTRIAL

a year ago

Ah, looks like that only gets the newest logs. Here's what it shows:

[2024-06-11 19:11:19 +0000] [1] [INFO] Starting gunicorn 21.2.0

[2024-06-11 19:11:19 +0000] [1] [ERROR] Connection in use: ('::', 65090)

[2024-06-11 19:11:19 +0000] [1] [ERROR] Retrying in 1 second.

[2024-06-11 19:11:20 +0000] [1] [ERROR] Connection in use: ('::', 65090)

[2024-06-11 19:11:20 +0000] [1] [ERROR] Retrying in 1 second.

[2024-06-11 19:11:21 +0000] [1] [ERROR] Connection in use: ('::', 65090)

[2024-06-11 19:11:21 +0000] [1] [ERROR] Retrying in 1 second.

[2024-06-11 19:11:22 +0000] [1] [ERROR] Connection in use: ('::', 65090)

[2024-06-11 19:11:22 +0000] [1] [ERROR] Retrying in 1 second.

[2024-06-11 19:11:23 +0000] [1] [ERROR] Connection in use: ('::', 65090)

[2024-06-11 19:11:23 +0000] [1] [ERROR] Retrying in 1 second.

[2024-06-11 19:11:24 +0000] [1] [ERROR] Can't connect to ('::', 65090)

container event container died

brodyEMPLOYEE

a year ago

ill try to reproduce

brodyEMPLOYEE

a year ago

what version of gunicorn?

AnonymousTRIAL

a year ago

I changed my gunicorn command back to what it was, and flipped the runtime to Legacy and it deployed successfully. And that's including the minor code change mentioned in the original post.

AnonymousTRIAL

a year ago

21.2.0

angeloEMPLOYEE

a year ago

Gotcha, that seems to be enough. Going to add this case to the Runtime V2 blockers in the root thread.

AnonymousTRIAL

a year ago

Great. Thanks!

brodyEMPLOYEE

a year ago

my start command is gunicorn -b [::]:$PORT main:app on the v2 runtime with the same gunicorn version you are using, so this new error doesnt look like a v2 vs legacy issue

AnonymousTRIAL

a year ago

But what would already be running on that port? 🤔

AnonymousTRIAL

a year ago

In my case.

brodyEMPLOYEE

a year ago

does your container run gunicorn and only gunicorn?

AnonymousTRIAL

a year ago

It runs a couple django commands before gunicorn. migrate and collectstatic

brodyEMPLOYEE

a year ago

can you provide the full command

AnonymousTRIAL

a year ago

python manage.py migrate && python manage.py collectstatic --noinput && gunicorn project.wsgi

brodyEMPLOYEE

a year ago

and what was the command when you got this error?

AnonymousTRIAL

a year ago

python manage.py migrate && python manage.py collectstatic --noinput && gunicorn -b [::]:$PORT grbot.wsgi

AnonymousTRIAL

a year ago

I may have had an extra -b 127.0.0.1:$PORT in there for IPv4.

AnonymousTRIAL

a year ago

Testing just [::] atm

brodyEMPLOYEE

a year ago

that would do it, gunicorn supports dual stack binding anyway so that wouldnt be needed, 127.0.0.1 would also be the incorrect address
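
A rough sketch of what dual-stack binding means, using plain Python sockets rather than gunicorn itself. It assumes an OS with IPv6 available, and it sets IPV6_V6ONLY to 0 explicitly; whether gunicorn sets this itself or relies on the OS default is not something this sketch verifies. dual_stack_listener and can_connect are made-up helper names:

```python
import socket


def dual_stack_listener(port: int = 0) -> socket.socket:
    """Listen on :: with IPV6_V6ONLY disabled so IPv4 clients are accepted too.

    port=0 asks the OS for any free port. Raises OSError if IPv6 is unavailable.
    """
    srv = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    try:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # 0 = also accept IPv4 connections (as IPv4-mapped IPv6 addresses)
        srv.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        srv.bind(("::", port))
        srv.listen()
    except OSError:
        srv.close()
        raise
    return srv


def can_connect(host: str, port: int) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=2):
            return True
    except OSError:
        return False
```

Calling dual_stack_listener() and then can_connect("127.0.0.1", port) with the port from srv.getsockname()[1] should succeed on a dual-stack host, even though the listener was only ever bound to ::. That is also why a separate IPv4 bind on the same port collides with it.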

AnonymousTRIAL

a year ago

Their documentation seems to suggest that you need to state both: https://docs.gunicorn.org/en/stable/settings.html#bind

AnonymousTRIAL

a year ago

and I'm assuming the correct address is 0.0.0.0?

angeloEMPLOYEE

a year ago

Yes, binding on 127.0.0.1 won't bind properly.

angeloEMPLOYEE

a year ago

But wondering why legacy did it.

AnonymousTRIAL

a year ago

I didn't have that for legacy. I was just adding it in because I assumed I needed it if I also needed to have IPv6. My bad.

AnonymousTRIAL

a year ago

Alright, well. It worked with python manage.py migrate && python manage.py collectstatic --noinput && gunicorn -b [::]:$PORT project.wsgi

AnonymousTRIAL

a year ago

on V2

brodyEMPLOYEE

a year ago

by default gunicorn binds to 0.0.0.0:$PORT so that would have worked for legacy

brodyEMPLOYEE

a year ago

as i suggested 🙂

AnonymousTRIAL

a year ago

Yup! For some reason I thought I tested that. Sorry about that.

brodyEMPLOYEE

a year ago

no worries

AnonymousTRIAL

a year ago

Guess this isn't a new bug then, Angelo! I apologize. New to messing with IPv6.

brodyEMPLOYEE

a year ago

it is a new bug

brodyEMPLOYEE

a year ago

you should not need to listen on ipv6 just for the health check to work

angeloEMPLOYEE

a year ago

Yea, if any behavior is different vs. old, it's a bug.

angeloEMPLOYEE

a year ago

You did us a favor.

brodyEMPLOYEE

a year ago

technically solved

brodyEMPLOYEE

10 months ago

Update: health checks can now pass if your app only listens on 0.0.0.0. If you have already changed it to ::, there's no point in changing anything back, as listening on :: has no known drawbacks.
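
If you'd rather keep the bind address out of the start command, a gunicorn.conf.py next to your project is one option, since gunicorn config files are plain Python and bind is a documented setting. A minimal sketch, assuming the platform injects PORT; the build_bind helper and the 8000 fallback are illustrative choices, not something from this thread:

```python
# gunicorn.conf.py
import os


def build_bind(port: str) -> str:
    """Return a gunicorn bind string for the IPv6 wildcard address.

    Hypothetical helper; IPv6 hosts must be wrapped in brackets.
    """
    return f"[::]:{port}"


# PORT is assumed to be provided by the platform; 8000 is only a
# local-development fallback chosen for this sketch.
bind = build_bind(os.environ.get("PORT", "8000"))
```

With this file in the working directory, the start command can stay gunicorn project.wsgi, since recent gunicorn versions look for ./gunicorn.conf.py by default. Only one bind is set here, so there is nothing for it to collide with.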