After metal migration, Service Environment variable mismatching
drmarshall
PROOP

6 months ago

DNSException:
  syscall: "getaddrinfo",
  errno: 4,
  code: "ENOTFOUND"

Following the automatic migration to metal, one of our services (which references a private URL that is also on metal) has begun to fail after a redeploy with the above error.

I've tried redeploying the services, but it seems there is a DNS resolution issue for services, which are all on metal.
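For anyone triaging a similar failure, here is a minimal sketch of how to tell a failed hostname lookup apart from other connection errors. It assumes the error object carries `code` and `syscall` fields shaped like the ones above; `isDnsNotFound` is a hypothetical helper name, not part of any library.

```typescript
// Hypothetical helper: returns true when an error looks like a failed
// hostname lookup (the ENOTFOUND / getaddrinfo pair shown in the error above).
function isDnsNotFound(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { code?: unknown; syscall?: unknown };
  return e.code === "ENOTFOUND" && e.syscall === "getaddrinfo";
}

// Illustrative error object mirroring the fields reported in this thread.
const sampleError = { syscall: "getaddrinfo", errno: 4, code: "ENOTFOUND" };
console.log(isDnsNotFound(sampleError));
```

A check like this only says that name resolution failed. It does not say whether the hostname itself is wrong or whether the lookup used an address family the network does not serve.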

My Redis service exposes a REDIS_PRIVATE_URL variable defined as redis://default:${{REDIS_PASSWORD}}@${{RAILWAY_PRIVATE_DOMAIN}}:6379, but when I reference this value from a second service, the evaluated value uses Redis's RAILWAY_TCP_PROXY_DOMAIN in place of RAILWAY_PRIVATE_DOMAIN. This defies my expectations, and I cannot see any way to override or fix it. I presume this is a migration shim designed to help us, but I need to override it. How can I fix this?
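One way to catch this kind of mismatch at startup is to inspect the host of the evaluated URL before connecting. The following is a minimal sketch, assuming private-network hostnames end in `.internal` (the suffix mentioned later in this thread). `classifyRedisHost` and the example URL are hypothetical.

```typescript
// Hypothetical helper: classify the host of an evaluated Redis URL.
// Assumption: private-network hostnames end in ".internal".
type HostKind = "private" | "other";

function classifyRedisHost(evaluatedUrl: string): { host: string; kind: HostKind } {
  const { hostname } = new URL(evaluatedUrl);
  const kind: HostKind = hostname.endsWith(".internal") ? "private" : "other";
  return { host: hostname, kind };
}

// Placeholder credentials and hostname, for illustration only.
const exampleHost = classifyRedisHost("redis://default:secret@redis.railway.internal:6379");
console.log(exampleHost.host, exampleHost.kind);
```

Logging only the host and its classification, rather than the full URL, keeps the password out of the deploy logs.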

Solved

14 Replies

sarahkb125
EMPLOYEE

6 months ago

Hi there - I'm seeing the latest deployment as active. Could you please link the failed deployment for us to look into?


Status changed to Awaiting User Response Railway 6 months ago


drmarshall
PROOP

6 months ago


Status changed to Awaiting Railway Response Railway 6 months ago


Hey there DrMarshall,

I have triggered a new deployment that has gone live. Could you check whether you can repro the DNSException error again? I have a gut feeling about what caused this issue, but I want to see if we can reproduce it to confirm my suspicion.

Thanks,
Angelo


Status changed to Awaiting User Response Railway 6 months ago


drmarshall
PROOP

6 months ago

Let's focus any repro'ing in our staging env to prevent customer-impacting interruptions/outages. Our jobs are long-running, so triggering redeploys impacts our end-user wait times.

Repro'ed here in staging: https://railway.com/project/26efef52-073b-4468-a124-8073e2678b0d/service/44f08ca4-5b7a-4604-8f2e-13288068518f?environmentId=de957125-5be9-4819-b8c0-ca4fd4bc6e43


Status changed to Awaiting Railway Response Railway 6 months ago


drmarshall
PROOP

6 months ago

In production, I can confirm that after updating REDIS_URL to ${{Redis-tUX2.REDIS_PRIVATE_URL}} and previewing the value (no deploy necessary), the evaluated value is still the "wrong" one (templating in RAILWAY_TCP_PROXY_DOMAIN instead of RAILWAY_PRIVATE_DOMAIN).



drmarshall
PROOP

6 months ago

Curiously, in staging it does not appear the template value is wrong, but still seeing the DNSException – trying to append ?family=1 as perhaps the resolution is using ipv6 now?


Railway
BOT

6 months ago

Hello!

We're acknowledging your issue and attaching a ticket to this thread.

We don't have an ETA for it, but, our engineering team will take a look and you will be updated as we update the ticket.

Please reply to this thread if you have any questions!


That would likely be it. That said, I still want to repro this on my project to see if I can get the variable to point to the wrong value.

Give me a moment.


Status changed to Awaiting User Response Railway 6 months ago


drmarshall
PROOP

6 months ago

So, I can confirm the DNSException issue is resolved by using ?family=6 in the Redis connection string (it seems like there is an IPv6 resolution happening under the hood?), but the mismatched variable certainly is a bug of sorts and made debugging the IPv6 resolution MUCH harder than it needed to be.
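For reference, the query parameter can be added in code rather than by editing the variable by hand. This is a minimal sketch under the assumption that the Redis client reads a `family` option from the connection string, which depends on the client in use. `withAddressFamily` and the example URL are hypothetical.

```typescript
// Hypothetical helper: add or replace the `family` query parameter on a
// connection URL. Whether the parameter is honoured depends on the client.
function withAddressFamily(connectionUrl: string, family: 4 | 6): string {
  const url = new URL(connectionUrl);
  url.searchParams.set("family", String(family));
  return url.toString();
}

// Placeholder credentials and hostname, for illustration only.
const ipv6Url = withAddressFamily("redis://default:secret@redis.railway.internal:6379", 6);
console.log(new URL(ipv6Url).searchParams.get("family"));
```

Using `searchParams.set` rather than string concatenation means an existing `family` value is replaced instead of duplicated, and any other query parameters are left untouched.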


Status changed to Awaiting Railway Response Railway 6 months ago


Previewing the value is giving me a `.internal` value. Is that the case for you now, after you re-deployed? I am just very confused about how this could happen.

(If you can get a screenshot that would help too)

---

Re: IPv6, noted, yea, I am more concerned about the domain swap if there is one.


Status changed to Awaiting User Response Railway 6 months ago


drmarshall
PROOP

6 months ago

In production, after updating the Redis.REDIS_PRIVATE_URL, it seems like the downstream values have updated to .internal (no more mismatch).


Status changed to Awaiting Railway Response Railway 6 months ago


Status changed to Solved drmarshall 6 months ago


Railway
BOT

6 months ago

✅ The internal ticket Service pulling the incorrect reference variable has been marked as completed.


jr
EMPLOYEE

6 months ago

I think the variable mismatch was a UI/caching issue. I'll continue to look for any related problems, but the connection errors seem to be because of Redis and ipv6. Sorry for all the confusion here!


Status changed to Awaiting Railway Response Railway 6 months ago


Can you confirm that the issue is resolved after appending the value at the end of the URL?


Status changed to Awaiting User Response Railway 6 months ago


Status changed to Solved drmarshall 5 months ago

