gRPC response trailers being stripped on Railway
therobotcarlson
PROOP

3 months ago

For my gRPC services to talk to each other, they need the response trailers. Currently, I am seeing the following behavior:

Locally I see what I expect in my stack:

headers { ':status': 200, 'content-type': 'application/grpc', ... }
trailers { 'grpc-status': '0' }

Running against my Railway instance, I see:

headers { ':status': 200, 'content-type': 'application/grpc', 'server': 'railway-edge', ... }

With no trailer.

If I SSH into one of my Railway instances and run the same thing with the internal URL, I see:

headers { ':status': 200, 'content-type': 'application/grpc', ... }
trailers { 'grpc-status': '5', 'grpc-message': 'Instance not found ...' }

So the problem must lie somewhere between the internal network and the external, internet-facing network.
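To make the three observations above easier to compare, here is a minimal sketch of how a client could classify a response from the `headers` and `trailers` objects that Node's `http2` client emits. The helper name `classifyGrpcResponse` and the `MISSING_TRAILERS` label are hypothetical and only for illustration; the numeric codes are the standard gRPC status codes.

```javascript
// Standard gRPC status codes, indexed by their numeric value.
const GRPC_STATUS_NAMES = [
  'OK', 'CANCELLED', 'UNKNOWN', 'INVALID_ARGUMENT', 'DEADLINE_EXCEEDED',
  'NOT_FOUND', 'ALREADY_EXISTS', 'PERMISSION_DENIED', 'RESOURCE_EXHAUSTED',
  'FAILED_PRECONDITION', 'ABORTED', 'OUT_OF_RANGE', 'UNIMPLEMENTED',
  'INTERNAL', 'UNAVAILABLE', 'DATA_LOSS', 'UNAUTHENTICATED',
];

// Hypothetical helper: decide what a gRPC client can conclude from one response.
function classifyGrpcResponse(headers, trailers) {
  // Normally grpc-status arrives in the trailers; a "Trailers-Only" response
  // may carry it in the initial headers instead, so check both.
  const source = trailers && 'grpc-status' in trailers ? trailers : headers;
  const raw = source ? source['grpc-status'] : undefined;
  if (raw === undefined) {
    // No status anywhere: the client cannot tell whether the call succeeded.
    return { ok: false, status: null, name: 'MISSING_TRAILERS', message: '' };
  }
  const code = Number(raw);
  return {
    ok: code === 0,
    status: code,
    name: GRPC_STATUS_NAMES[code] ?? 'UNRECOGNIZED',
    message: source['grpc-message'] ?? '',
  };
}

// Local stack: trailers present, call succeeded.
console.log(classifyGrpcResponse({ ':status': 200 }, { 'grpc-status': '0' }));
// Public Railway URL: headers only, no trailers at all.
console.log(classifyGrpcResponse({ ':status': 200, server: 'railway-edge' }, undefined));
// Internal URL: trailers present, carrying an application-level error.
console.log(classifyGrpcResponse({ ':status': 200 }, { 'grpc-status': '5', 'grpc-message': 'Instance not found' }));
```

The point of the sketch is that the internal response is a normal gRPC error (the trailers arrive), while the external one leaves the client with nothing to interpret.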

Solved

79 Replies

therobotcarlson
PROOP

3 months ago

This was happening before the DNS outage. Started somewhere between the .well-known/ failure and the DNS outage.


therobotcarlson
PROOP

3 months ago

Just to clarify the severity: all of my services are currently unable to perform any kind of auth (this is my auth server), so everything is effectively down, as has been the case all day.


3 months ago

So we aren't allowing the trailer to pass?


therobotcarlson
PROOP

3 months ago

Yes, that appears to be the case. It gets blocked going "out".


therobotcarlson
PROOP

3 months ago

If I hit the external url from my local machine or when ssh'd into another instance on railway, I see the error.

When hitting the internal url from inside an instance on railway, I see the trailer.


3 months ago

Do you generate a trailer header field?


therobotcarlson
PROOP

3 months ago

Yes, as far as I can tell that happens, since I get one when hitting the internal URL.


therobotcarlson
PROOP

3 months ago

For more context: This is also a recent occurrence -- I've had this service deployed for a few months and this is the first time this issue has appeared


3 months ago

What is the domain in question?


therobotcarlson
PROOP

3 months ago


3 months ago

Clear your cache and try again please.


therobotcarlson
PROOP

3 months ago

Running from my terminal -- no change


3 months ago

Gotcha. Would I be able to ask you for an MRE? I'd be more than happy to cover this outage with credits.


therobotcarlson
PROOP

3 months ago

This should return the trailer:

node -e "const http2=require('http2'); const c=http2.connect('https://login-api.truer.health'); const req=c.request({':method':'POST',':path':'/zitadel.session.v2.SessionService/ListSessions','content-type':'application/grpc','te':'trailers'}); req.on('response',h=>{console.log('headers',h);}); req.on('trailers',t=>{console.log('trailers',t); c.close();}); req.on('error',e=>{console.error(e); c.close();}); req.end(Buffer.from([0,0,0,0,0]));"
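For readers wondering about the `Buffer.from([0,0,0,0,0])` body in the command above: gRPC sends each message with a 5-byte prefix (a 1-byte compression flag followed by a 4-byte big-endian payload length), so five zero bytes are one uncompressed message with an empty payload. A minimal sketch of that framing, where `encodeGrpcFrame` is a hypothetical helper name and not part of any library:

```javascript
// Hypothetical helper: wrap a serialized protobuf payload in the
// length-prefixed framing gRPC uses on the wire.
function encodeGrpcFrame(payload, compressed = false) {
  const prefix = Buffer.alloc(5);
  prefix.writeUInt8(compressed ? 1 : 0, 0); // byte 0: compression flag
  prefix.writeUInt32BE(payload.length, 1);  // bytes 1-4: payload length, big-endian
  return Buffer.concat([prefix, payload]);
}

// An empty payload yields exactly the five zero bytes used in the one-liner.
const emptyMessage = encodeGrpcFrame(Buffer.alloc(0));
console.log(emptyMessage.equals(Buffer.from([0, 0, 0, 0, 0])));
```

An empty payload is presumably enough here because the goal is only to provoke any gRPC response with trailers, not to make a valid authenticated call.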

therobotcarlson
PROOP

3 months ago

If someone ssh's into the linked service and runs that command, it fails. If they change the domain to the internal url for the login-api, it succeeds. That particular request should return this:

trailers [Object: null prototype] {
  'grpc-message': 'auth header missing',
  'grpc-status': '16',
  Symbol(sensitiveHeaders): []
}

3 months ago

Do you have something we can deploy onto Railway ourselves?


therobotcarlson
PROOP

3 months ago

This is the image I have deployed: ghcr.io/zitadel/zitadel:v4.11.0


therobotcarlson
PROOP

3 months ago

It requires a DB, I could probably try making a quick template to make it simpler


therobotcarlson
PROOP

3 months ago

They also have a docker compose here, if that would help: https://raw.githubusercontent.com/zitadel/zitadel/main/apps/docs/content/self-hosting/deploy/docker-compose.yaml

I can spin that local version up, hit it with the same request, and get the trailers


3 months ago

A template would be wonderful if you could do that.


therobotcarlson
PROOP

3 months ago

I see lots of templates for Zitadel, but they all seem under-configured. I'll make one that duplicates my setup.


therobotcarlson
PROOP

3 months ago

hmm. Template editor is glitching out and erased my changes when I tried to save. Separate, unrelated issue.


therobotcarlson
PROOP

3 months ago


3 months ago

Yes please!


therobotcarlson
PROOP

3 months ago

I booted up that template into a new project to confirm. Seeing the same issue there, so it should hopefully help in recreating the error.


therobotcarlson
PROOP

3 months ago

Should require 0 config


3 months ago

Dumb question, what would I curl to see a failure?


therobotcarlson
PROOP

3 months ago

I gave a node version above that is a bit more filtered, but here's a curl version that is more verbose:

curl -vk --http2 \
    -H "content-type: application/grpc" \
    -H "te: trailers" \
    --data-binary $'\x00\x00\x00\x00\x00' \
    https://login-api.truer.health/zitadel.session.v2.SessionService/ListSessions

therobotcarlson
PROOP

3 months ago

Should see something like this at the end:

< grpc-message: unary request has zero messages
< grpc-status: 12

therobotcarlson
PROOP

3 months ago

as in, we should see


3 months ago

Noted.


therobotcarlson
PROOP

3 months ago

Hi @Brody, I can see everyone has been working hard. Is this something being actively tackled or do I need to find a workaround for the near-term? We were going to be launching our product this weekend and this has essentially removed that as an option.


3 months ago

It is, but I had thought you found a workaround. I am terribly sorry.


3 months ago

Could you try a Cloudflare tunnel?


3 months ago

This won't touch Fastly.


therobotcarlson
PROOP

3 months ago

Not familiar. Does this essentially route the internal traffic via Cloudflare? So I'd need to set up my domains there?


3 months ago

Correct, no domain transfer, just a nameserver transfer.


therobotcarlson
PROOP

3 months ago

Great, that seems like a possibility. I'll take a look.


3 months ago

Gave you some credits that should cover your next invoice.


therobotcarlson
PROOP

3 months ago

Thank you, it's appreciated! The hard work is also appreciated 🫡


3 months ago

Without Fastly -

curl --verbose --resolve "utilities-us-east.up.railway.app:443:66.33.22.11" "https://utilities-us-east.up.railway.app/trailers"

This response includes HTTP trailers.
< x-checksum: 55d3116a90c2014f65580db7cf5c27e1
* Connection #0 to host utilities-us-east.up.railway.app left intact

With Fastly -

curl --verbose "https://utilities-us-east.up.railway.app/trailers"

This response includes HTTP trailers.
* Connection #0 to host utilities-us-east.up.railway.app left intact

3 months ago

Can you do the same for your site?


therobotcarlson
PROOP

3 months ago

Sorry, it's getting late and my brain is a bit fried. What am I seeing here? like appending the port and IP with the domain?


3 months ago

Sorry, 66.33.22.11 is our anycast Metal edge. Setting that in the curl request, as I have, bypasses Fastly's edge.


3 months ago

Just as a test, not as a solution.


therobotcarlson
PROOP

3 months ago

got it! testing now


therobotcarlson
PROOP

3 months ago

Still seems to be an issue:

curl -vk --http2 \
    -H "content-type: application/grpc" \
    -H "te: trailers" \
    --data-binary $'\x00\x00\x00\x00\x00' \
    https://login-api.truer.health/zitadel.session.v2.SessionService/ListSessions

Let me try inside an instance


3 months ago

You aren't using --resolve?


therobotcarlson
PROOP

3 months ago

oops, copied the wrong command into discord.

curl --verbose \
  --resolve "login-api.truer.health:433:66.33.22.11"\
    -H "content-type: application/grpc" \
    -H "te: trailers" \
    --data-binary $'\x00\x00\x00\x00\x00' \
    https://login-api.truer.health/zitadel.session.v2.SessionService/ListSessions

3 months ago

Well, now I'm confused. I don't even see the body here.


therobotcarlson
PROOP

3 months ago

wait. I'm dumb. typo -> 433 -> 443


therobotcarlson
PROOP

3 months ago

chatgpt caught it


therobotcarlson
PROOP

3 months ago

that works!


therobotcarlson
PROOP

3 months ago

< grpc-message: unary request has zero messages
< grpc-status: 12

therobotcarlson
PROOP

3 months ago

Guessing this means we can blame Fastly? 😄


3 months ago

More or less haha, I will let you know when we have a fix for this.


Railway
BOT

3 months ago

Hello!

We've escalated your issue to our engineering team.

We aim to provide an update within 1 business day.

Please reply to this thread if you have any questions!

Status changed to Awaiting User Response Railway 3 months ago


3 months ago

Full transparency, we may have to wait until next week to engage the Fastly folks, so please let me know if I can do anything to help you get the Cloudflare tunnel in place.


therobotcarlson
PROOP

3 months ago

Thanks, I'll be tackling that first thing tomorrow. I tried a few other routes to confirm -- did not pan out


3 months ago

Sounds good!


therobotcarlson
PROOP

3 months ago

As soon as I lay down to sleep, I realized there was something I hadn't tried: linking one of the services up using internal routing. I tested it and it worked. I still can't do gRPC from local, which is not great and kills some dev cycles, but I can survive a weekend and the product will be usable for the soft launch 🥂


3 months ago

Oh, that's great. I was under the impression it needed to be callable publicly, so I am very happy to hear that.


therobotcarlson
PROOP

3 months ago

I had tried the internal routing before, but that test was bad and I thought it was because of a forced http -> https upgrade. I thought it was checking the cert and required the cert-backed public domain since the errors were SSL related


therobotcarlson
PROOP

3 months ago

I still will need it to be callable publicly, but the first release we're doing doesn't "require" it. Basically our whole production and staging e2e test / check suite is dead w/o it, which is not great long term


3 months ago

We won't keep trailer headers broken long term 🫡


therobotcarlson
PROOP

3 months ago

Sounds great 🫡


3 months ago

We are still engaging with Fastly engineers, and they are actively looking into what could cause the trailer headers to be dropped.


therobotcarlson
PROOP

3 months ago

Thank you for the update!


3 months ago

They have confirmed our findings and now will be looking into what can be done.


therobotcarlson
PROOP

3 months ago

Thanks for keeping me posted. Surprised this isn't more commonly an issue for them. I guess most people don't use grpc securely on the internet?


3 months ago

I mean, it's not even a question of gRPC: we never supported anything HTTP/2-specific, given that we demux HTTP/2 to HTTP/1.1. But trailer headers have been a thing since the 90s, so I am surprised that they don't support them, too.
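As background on trailers predating HTTP/2: in HTTP/1.1 they ride at the end of a chunked body, after the zero-length last chunk. The sketch below builds that wire format by hand purely for illustration. `encodeChunkedWithTrailers` is a hypothetical helper, and the `x-checksum` name is borrowed from the test endpoint output earlier in this thread.

```javascript
// Hypothetical helper: serialize string chunks plus trailer fields using
// HTTP/1.1 chunked transfer coding.
function encodeChunkedWithTrailers(chunks, trailers) {
  let wire = '';
  for (const chunk of chunks) {
    if (chunk.length === 0) continue; // a zero-size chunk would end the body early
    const size = Buffer.byteLength(chunk).toString(16); // chunk size in hex
    wire += `${size}\r\n${chunk}\r\n`;
  }
  wire += '0\r\n'; // last chunk: marks the end of the body data
  for (const [name, value] of Object.entries(trailers)) {
    wire += `${name}: ${value}\r\n`; // trailer fields follow the last chunk
  }
  wire += '\r\n'; // blank line terminates the message
  return wire;
}

console.log(JSON.stringify(
  encodeChunkedWithTrailers(['hello'], { 'x-checksum': 'abc' })
));
```

In a real exchange the client would typically also send `TE: trailers` and the server would announce the field names in a `Trailer` header. An intermediary that buffers or re-frames the body can drop that final section, which is one plausible way trailers get lost at an edge.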


therobotcarlson
PROOP

3 months ago

Ah, I'm unfamiliar with the history of them and their ubiquity. Only learned about this after this incident.

I guess SSE and other chunked streaming could and should be using them for similar reasons as grpc, makes sense.


3 months ago

I assume you are set up with a Cloudflare tunnel, right?


therobotcarlson
PROOP

3 months ago

Not yet, have largely worked around the issues. The original production outage is no longer an issue. Was just problematic for some production tests we had, but I've modified them to make it no longer a concern.

This will cause issues later once we want to do gRPC streaming to a mobile app, but that's not an immediate (next 2 months) priority. And if this were never solved, we could introduce a translation layer (grpc-web). It's just not optimal.
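For context on why a grpc-web translation layer would sidestep the problem: grpc-web does not rely on HTTP trailers. It sends the trailers as a final frame inside the response body, marked by setting the most significant bit of the frame's flag byte, with the fields encoded as an HTTP/1-style header block. A minimal sketch of that frame, with hypothetical helper names and no claim to match any particular grpc-web library:

```javascript
const TRAILER_FLAG = 0x80; // most significant bit of the flag byte marks a trailer frame

// Hypothetical helper: pack trailer fields into a grpc-web style body frame.
function encodeTrailerFrame(trailers) {
  const block = Object.entries(trailers)
    .map(([name, value]) => `${name}: ${value}\r\n`)
    .join('');
  const payload = Buffer.from(block, 'utf8');
  const prefix = Buffer.alloc(5);
  prefix.writeUInt8(TRAILER_FLAG, 0);      // byte 0: flags
  prefix.writeUInt32BE(payload.length, 1); // bytes 1-4: payload length, big-endian
  return Buffer.concat([prefix, payload]);
}

// Hypothetical helper: read the trailer fields back out of such a frame.
function decodeTrailerFrame(frame) {
  if ((frame.readUInt8(0) & TRAILER_FLAG) === 0) {
    throw new Error('not a trailer frame');
  }
  const length = frame.readUInt32BE(1);
  const block = frame.subarray(5, 5 + length).toString('utf8');
  const trailers = {};
  for (const line of block.split('\r\n')) {
    if (line === '') continue;
    const idx = line.indexOf(':');
    trailers[line.slice(0, idx).trim().toLowerCase()] = line.slice(idx + 1).trim();
  }
  return trailers;
}

const frame = encodeTrailerFrame({ 'grpc-status': '16', 'grpc-message': 'auth header missing' });
console.log(decodeTrailerFrame(frame));
```

Because the status travels in the body, a proxy that drops HTTP trailers would leave it intact. The cost is an extra proxy or server-side handler to translate between grpc-web and native gRPC.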


3 months ago

Fastly hasn't given us any timelines, but I can't imagine a solution from them would be anywhere near two months away.


therobotcarlson
PROOP

3 months ago

Thanks for the honesty 😂 I'll be sure to document that in our timelines!


3 months ago

Of course, let me know if I can help with anything else in the meantime!


glorko
PRO

2 months ago

Okay, hi guys in this cute chat. I've now hit the same issue while building my Flutter app with a gRPC Go backend. Hoping for an update once there is a proper solution.

Trying workaround with TCP forwarding


Status changed to Awaiting Railway Response Railway about 2 months ago


2 months ago

Are you relying on trailer headers?


Status changed to Awaiting User Response Railway about 2 months ago


Railway
BOT

3 days ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 3 days ago

