3 months ago
For my gRPC services to talk, they need the response trailers (the trailing headers that carry grpc-status). Currently, I am seeing the following behavior:
Locally I see what I expect in my stack:
headers { ':status': 200, 'content-type': 'application/grpc', ... }
trailers { 'grpc-status': '0' }
Running against my Railway instance I see:
headers { ':status': 200, 'content-type': 'application/grpc', 'server': 'railway-edge', ... }
With no trailer.
If I ssh into one of my railway instances and run the same thing with the internal url, I see:
headers { ':status': 200, 'content-type': 'application/grpc', ... }
trailers { 'grpc-status': '5', 'grpc-message': 'Instance not found ...' }
So the problem must exist between the internal and external / internet facing networking.
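For readers less familiar with gRPC: the outcome of a call travels in the `grpc-status` trailer (or in the headers for a trailers-only response), so a response that arrives without it cannot be interpreted by the client. Below is a minimal sketch of how the three results above could be classified. The helper name `describeGrpcOutcome` is hypothetical, and the table only covers the standard status codes that appear in this thread.

```javascript
// Standard gRPC status codes seen in this thread (not the full list).
const GRPC_STATUS_NAMES = {
  0: 'OK',
  5: 'NOT_FOUND',
  12: 'UNIMPLEMENTED',
  16: 'UNAUTHENTICATED',
};

// Hypothetical helper: summarise a response from its headers and trailers.
// Either argument may be null or undefined if that part never arrived.
function describeGrpcOutcome(headers, trailers) {
  // Prefer trailers; fall back to headers for trailers-only responses.
  const source =
    trailers && 'grpc-status' in trailers ? trailers
      : headers && 'grpc-status' in headers ? headers
        : null;
  if (!source) {
    return 'no grpc-status: trailers missing or dropped in transit';
  }
  const code = Number(source['grpc-status']);
  const name = GRPC_STATUS_NAMES[code] || `code ${code}`;
  const message = source['grpc-message'];
  return message ? `${name}: ${message}` : name;
}
```

Called with the Railway edge result above (headers only, no trailers), this returns the "no grpc-status" message, which is the failure mode being reported here.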
79 Replies
This was happening before the DNS outage. Started somewhere between the .well-known/ failure and the DNS outage.
Just to clarify the severity, all of my services are currently unable to perform any kind of auth (this is my auth server) and so everything is effectively down as has been the case the whole day.
3 months ago
So we aren't allowing the trailer to pass?
If I hit the external url from my local machine or when ssh'd into another instance on railway, I see the error.
When hitting the internal url from inside an instance on railway, I see the trailer.
3 months ago
Do you generate a trailer header field?
Yes, as far as I can tell that happens since I get one when hitting the internal url
For more context: This is also a recent occurrence -- I've had this service deployed for a few months and this is the first time this issue has appeared
3 months ago
What is the domain in question?
3 months ago
Clear your cache and try again please.
3 months ago
Gotcha. Would I be able to ask you for an MRE? I'd be more than happy to cover this outage with credits.
This should return the trailer:
node -e "const http2=require('http2'); const c=http2.connect('https://login-api.truer.health'); const req=c.request({':method':'POST',':path':'/zitadel.session.v2.SessionService/ListSessions','content-type':'application/grpc','te':'trailers'}); req.on('response',h=>{console.log('headers',h);}); req.on('trailers',t=>{console.log('trailers',t); c.close();}); req.on('error',e=>{console.error(e); c.close();}); req.end(Buffer.from([0,0,0,0,0]));"
If someone ssh's into the linked service and runs that command, it fails. If they change the domain to the internal url for the login-api, it succeeds. That particular request should return this:
trailers [Object: null prototype] {
'grpc-message': 'auth header missing',
'grpc-status': '16',
Symbol(sensitiveHeaders): []
}
3 months ago
Do you have something we can deploy onto Railway ourselves?
It requires a DB, I could probably try making a quick template to make it simpler
They also have a docker compose here, if that would help: https://raw.githubusercontent.com/zitadel/zitadel/main/apps/docs/content/self-hosting/deploy/docker-compose.yaml
I can spin that local version up, hit it with the same request, and get the trailers
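The `node -e` one-liner posted earlier in the thread can be unrolled into a more readable script. This is only a sketch of the same request: `frameGrpcMessage` and `probeTrailers` are hypothetical names, and unlike the one-liner it also resolves when no trailers arrive, so a dropped trailer shows up as `trailers: null` instead of a hang.

```javascript
const http2 = require('http2');

// gRPC frames each message as a 1-byte compressed flag plus a 4-byte
// big-endian length, followed by the payload. An empty payload yields the
// five zero bytes used in the one-liner.
function frameGrpcMessage(payload = Buffer.alloc(0)) {
  const header = Buffer.alloc(5);
  header.writeUInt8(0, 0); // 0 = not compressed
  header.writeUInt32BE(payload.length, 1);
  return Buffer.concat([header, payload]);
}

// Sends one gRPC-style POST and reports the headers and trailers received.
function probeTrailers(authority, path) {
  return new Promise((resolve, reject) => {
    const client = http2.connect(authority);
    const result = { headers: null, trailers: null };
    client.on('error', reject);

    const req = client.request({
      ':method': 'POST',
      ':path': path,
      'content-type': 'application/grpc',
      te: 'trailers',
    });
    req.on('response', (headers) => { result.headers = headers; });
    req.on('trailers', (trailers) => { result.trailers = trailers; });
    req.on('error', (err) => { client.close(); reject(err); });
    // 'close' fires whether or not trailers were seen.
    req.on('close', () => { client.close(); resolve(result); });
    req.resume(); // discard any body so the stream can finish
    req.end(frameGrpcMessage());
  });
}
```

Something like `probeTrailers('https://login-api.truer.health', '/zitadel.session.v2.SessionService/ListSessions').then(console.log)` should then mirror the one-liner, with the internal URL swapped in for comparison.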
3 months ago
A template would be wonderful if you could do that.
I see lots of templates for Zitadel, but they are all under-configured seemingly. Will make one that duplicates my setup
hmm. Template editor is glitching out and erased my changes when I tried to save. Separate, unrelated issue.
I have lots of feedback about that experience. Feedback channel?
3 months ago
Yes please!
I booted up that template into a new project to confirm. Seeing the same issue there, so it should hopefully help in recreating the error.
3 months ago
Dumb question, what would I curl to see a failure?
I gave a node version above that is a bit more filtered, but here's a curl version that is more verbose:
curl -vk --http2 \
-H "content-type: application/grpc" \
-H "te: trailers" \
--data-binary $'\x00\x00\x00\x00\x00' \
https://login-api.truer.health/zitadel.session.v2.SessionService/ListSessions
Should see something like this at the end:
< grpc-message: unary request has zero messages
< grpc-status: 12
3 months ago
Noted.
Hi @Brody, I can see everyone has been working hard. Is this something being actively tackled or do I need to find a workaround for the near-term? We were going to be launching our product this weekend and this has essentially removed that as an option.
3 months ago
It is, but I had thought you found a work around, I am terribly sorry.
3 months ago
Could you try a Cloudflare tunnel?
3 months ago
This won't touch Fastly
Not familiar, does this essentially route the internal traffic via Cloudflare? So I'd need to set up my domains there?
3 months ago
Correct, no domain transfer, just a nameserver transfer.
3 months ago
Gave you some credits that should cover your next invoice.
Thank you, it's appreciated! The hard work is also appreciated 🫡
3 months ago
Without Fastly -
curl --verbose --resolve "utilities-us-east.up.railway.app:443:66.33.22.11" "https://utilities-us-east.up.railway.app/trailers"
This response includes HTTP trailers.
< x-checksum: 55d3116a90c2014f65580db7cf5c27e1
* Connection #0 to host utilities-us-east.up.railway.app left intact
With Fastly -
curl --verbose "https://utilities-us-east.up.railway.app/trailers"
This response includes HTTP trailers.
* Connection #0 to host utilities-us-east.up.railway.app left intact
3 months ago
Can you do the same for your site?
Sorry, it's getting late and my brain is a bit fried. What am I seeing here? like appending the port and IP with the domain?
3 months ago
Sorry, 66.33.22.11 is our anycast Metal edge. Setting that in the curl request, as I have, bypasses Fastly's edge.
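For anyone who wants to run the same bypass from Node instead of curl, a rough equivalent of `--resolve` is a custom `lookup` function that always answers with the pinned IP, while the hostname is still used for the Host header (and for TLS on https URLs). This is a sketch under those assumptions; `pinnedLookup` and `fetchTrailers` are hypothetical names.

```javascript
const http = require('http');
const https = require('https');
const net = require('net');

// Returns a dns.lookup-compatible function that ignores the hostname and
// always answers with the given IP, similar in spirit to curl --resolve.
function pinnedLookup(ip) {
  const family = net.isIP(ip); // 4 or 6
  return (hostname, options, callback) => {
    if (typeof options === 'function') {
      callback = options;
      options = {};
    }
    // Newer Node versions may ask for all addresses at once.
    if (options && options.all) {
      callback(null, [{ address: ip, family }]);
    } else {
      callback(null, ip, family);
    }
  };
}

// GETs a URL, optionally pinned to an IP, and resolves with the status
// code and whatever HTTP trailers arrived after the body.
function fetchTrailers(url, ip) {
  const transport = url.startsWith('https:') ? https : http;
  const options = { agent: false }; // one-off connection, no pooling
  if (ip) options.lookup = pinnedLookup(ip);
  return new Promise((resolve, reject) => {
    transport.get(url, options, (res) => {
      res.resume(); // trailers are only populated once the body has ended
      res.on('end', () => resolve({ status: res.statusCode, trailers: res.trailers }));
    }).on('error', reject);
  });
}
```

Comparing `fetchTrailers('https://utilities-us-east.up.railway.app/trailers', '66.33.22.11')` with the same call without the IP should, if the findings above hold, show `x-checksum` only in the pinned case.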
3 months ago
Just as a test, not as a solution.
Still seems to be an issue:
curl -vk --http2 \
-H "content-type: application/grpc" \
-H "te: trailers" \
--data-binary $'\x00\x00\x00\x00\x00' \
https://login-api.truer.health/zitadel.session.v2.SessionService/ListSessions
Let me try inside an instance
3 months ago
You aren't using --resolve?
oops, copied the wrong command into discord.
curl --verbose \
--resolve "login-api.truer.health:443:66.33.22.11" \
-H "content-type: application/grpc" \
-H "te: trailers" \
--data-binary $'\x00\x00\x00\x00\x00' \
https://login-api.truer.health/zitadel.session.v2.SessionService/ListSessions
3 months ago
Well, now I'm confused. I don't even see the body here.
< grpc-message: unary request has zero messages
< grpc-status: 12
3 months ago
More or less haha, I will let you know when we have a fix for this.
3 months ago
Hello!
We've escalated your issue to our engineering team.
We aim to provide an update within 1 business day.
Please reply to this thread if you have any questions!
Status changed to Awaiting User Response Railway • 3 months ago
3 months ago
Full transparency, we may have to wait until next week to engage the Fastly folks, so please let me know if I can do anything to help you get the Cloudflare tunnel in place.
Thanks, I'll be tackling that first thing tomorrow. I tried a few other routes to confirm -- did not pan out
3 months ago
Sounds good!
As soon as I laid down to sleep I realized something I hadn't tried, linking one of the services up using internal routing, tested it and it worked. Still can't do grpc from local, which is not great and kills some dev cycles, but I can survive a weekend and the product will be usable for the soft launch 🥂
3 months ago
Oh, that's great. I was under the impression it needed to be callable publicly, so I am very happy to hear that.
I had tried the internal routing before, but that test was bad and I thought it was because of a forced http -> https upgrade. I thought it was checking the cert and required the cert-backed public domain since the errors were SSL related
I still will need it to be callable publicly, but the first release we're doing doesn't "require" it. Basically our whole production and staging e2e test / check suite is dead w/o it, which is not great long term
3 months ago
We won't keep trailer headers broken long term 🫡
3 months ago
We are still engaging with Fastly engineers, and they are actively looking into what could cause the trailer headers to be dropped.
3 months ago
They have confirmed our findings and now will be looking into what can be done.
Thanks for keeping me posted. Surprised this isn't more commonly an issue for them. I guess most people don't use grpc securely on the internet?
3 months ago
I mean, it's not even a question of gRPC, because anything HTTP/2 spec-specific, we never supported, given we demux HTTP/2 to HTTP/1.1, but trailer headers have been a thing since the 90s, so I am surprised that they don't support them, too.
Ah, I'm unfamiliar with the history of them and their ubiquity. Only learned about this after this incident.
I guess SSE and other chunked streaming could and should be using them for similar reasons as grpc, makes sense.
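To make the HTTP/1.1 side of this concrete, here is a minimal sketch of a server that streams a chunked body and then appends a trailer, in the spirit of the `/trailers` test endpoint used earlier in the thread. How that endpoint is actually implemented is not shown here; using MD5 for `x-checksum` and the name `createTrailerServer` are assumptions for illustration.

```javascript
const http = require('http');
const crypto = require('crypto');

// Hypothetical test server: sends a body with chunked transfer encoding and
// then a checksum of that body as an HTTP trailer.
function createTrailerServer() {
  return http.createServer((req, res) => {
    const body = 'This response includes HTTP trailers.\n';
    // No Content-Length is set, so Node uses chunked encoding for HTTP/1.1,
    // which is required for trailers. The Trailer header announces them.
    res.writeHead(200, { 'Content-Type': 'text/plain', Trailer: 'x-checksum' });
    res.write(body);
    res.addTrailers({
      // Assumption: MD5 chosen only because the thread's value is 32 hex chars.
      'x-checksum': crypto.createHash('md5').update(body).digest('hex'),
    });
    res.end();
  });
}
```

Running `createTrailerServer().listen(8080)` and requesting it through a proxy or CDN is one way to check whether that hop preserves the trailer.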
3 months ago
I assume you are set up with a Cloudflare tunnel, right?
Not yet, have largely worked around the issues. The original production outage is no longer an issue. Was just problematic for some production tests we had, but I've modified them to make it no longer a concern.
This will cause issues later once we want to do grpc streaming to a mobile app, but that's not an immediate (next 2 month) priority. And if this was never solved, we could introduce a translation layer (grpc-web). It's just not optimal
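For context on why grpc-web would sidestep the problem: as I understand its wire format, grpc-web does not rely on HTTP trailers at all. It sends the trailers inside the response body as a final length-prefixed frame whose flag byte has the most significant bit (0x80) set, so an edge that drops real trailers has nothing to drop. A minimal parsing sketch follows; `parseGrpcWebBody` is a hypothetical name, and it assumes the binary `application/grpc-web` framing with uncompressed frames, not the base64 `grpc-web-text` variant.

```javascript
// Splits a complete grpc-web response body into message payloads and the
// trailers carried in the final 0x80-flagged frame.
function parseGrpcWebBody(body) {
  const messages = [];
  let trailers = null;
  let offset = 0;
  while (offset + 5 <= body.length) {
    const flags = body.readUInt8(offset);
    const length = body.readUInt32BE(offset + 1);
    const payload = body.subarray(offset + 5, offset + 5 + length);
    if (payload.length < length) {
      throw new Error('truncated grpc-web frame');
    }
    if (flags & 0x80) {
      // Trailer frame: an HTTP/1-style header block, one "name: value" per line.
      trailers = {};
      for (const line of payload.toString('utf8').split('\r\n')) {
        const idx = line.indexOf(':');
        if (idx > 0) {
          trailers[line.slice(0, idx).trim().toLowerCase()] = line.slice(idx + 1).trim();
        }
      }
    } else {
      messages.push(payload);
    }
    offset += 5 + length;
  }
  return { messages, trailers };
}
```

The cost is the extra translation layer mentioned above, since the backend or a proxy in front of it has to speak grpc-web.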
3 months ago
Fastly hasn't given us any timelines, but I can't imagine a solution from them would be anywhere near two months away.
Thanks for the honesty 😂 I'll be sure to document that in our timelines!
3 months ago
Of course, let me know if I can help with anything else in the meantime!
2 months ago
Okay hi guys in this cute chat - I'm now facing the same issue while building my Flutter app with a gRPC Go backend. Hoping for an update once there is an exact solution.
Trying workaround with TCP forwarding
Status changed to Awaiting Railway Response Railway • about 2 months ago
2 months ago
Are you relying on trailer headers?
Status changed to Awaiting User Response Railway • about 2 months ago
3 days ago
This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!
Status changed to Solved Railway • 3 days ago