Allow custom MTU for railway0 to fix hanging VPN/Subnet Router connections
qwerty
PROOP

2 months ago

The Problem:
Currently, Railway forces a fixed MTU of 1316 on the railway0 interface. When running a VPN such as Netbird or Tailscale (WireGuard) as a subnet router, the VPN adds its own encapsulation headers (overhead) to every data packet.

Downstream services connected to this subnet router inherit the 1316 MTU by default. However, since this encapsulated traffic cannot always be fragmented in a rootless environment, packets are dropped when they exceed the physical interface limit (Original Payload + WireGuard Headers > 1316). This causes large database queries (e.g., a SELECT with many rows) to freeze or hang indefinitely, while small packets (like pings) pass through without issue.
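
To make the overhead arithmetic concrete, here is a minimal sketch of the per-packet cost of WireGuard encapsulation. The header sizes are the standard ones for WireGuard data packets over UDP. The function names are illustrative only, and the calculation ignores IP options, IPv6 extension headers, and WireGuard's payload padding.

```python
# Per-packet bytes added by WireGuard encapsulation (outer headers).
WG_HEADER = 16      # message type (4) + receiver index (4) + counter (8)
WG_AUTH_TAG = 16    # Poly1305 authentication tag
UDP_HEADER = 8
IP_HEADER = {"ipv4": 20, "ipv6": 40}  # base headers, no options/extensions


def wireguard_overhead(outer_family: str) -> int:
    """Bytes added to each inner packet when the tunnel runs over outer_family."""
    return IP_HEADER[outer_family] + UDP_HEADER + WG_HEADER + WG_AUTH_TAG


def max_inner_mtu(outer_mtu: int, outer_family: str) -> int:
    """Largest inner packet that still fits in one outer packet of outer_mtu."""
    return outer_mtu - wireguard_overhead(outer_family)


if __name__ == "__main__":
    # 1316 is the interface MTU reported in this thread.
    for family in ("ipv4", "ipv6"):
        print(family, wireguard_overhead(family), max_inner_mtu(1316, family))
```

Under these assumptions the overhead is 60 bytes over IPv4 and 80 bytes over IPv6, so an inner packet larger than 1256 or 1236 bytes respectively would not fit in a single 1316-byte outer packet.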

The Request:
Please allow users to manually configure a lower MTU (such as 1280 or 1200) for the railway0 interface within the service settings. This would provide the VPN tunnel enough "headroom" for its headers without hitting Railway’s network ceiling, ensuring stable and reliable connectivity for private subnet routing.
Under Review

6 Replies

2 months ago

We have found this to be an issue only in environments with both IPv4 and IPv6 (dual-stack) support. Dual-stack can be disabled with the RAILWAY_DISABLE_DUAL_STACK_PRIVNETS=1 variable; setting it on the Tailscale service should solve the issues you mentioned.


qwerty
PROOP

2 months ago

Hi, thank you for the response. I tried setting the variable, but I can't verify if it works correctly because the routing table changes depending on the value.

When the variable is set to 1, the route is:

fd12:6e0b:a0f1::/64 via fd12::10 dev railnet0 src fd12:6e0b:a0f1:0:8000:d5:3b7e:b266 metric 1024 pref medium

When the variable is set to 0, the route is:

fd12:6e0b:a0f1:1::/64 via fd12::10 dev railnet0 src fd12:6e0b:a0f1:1:8000:d5:1691:7765 metric 1024 pref medium

The issue is that the target IP I’m trying to reach belongs to the fd12:6e0b:a0f1:1::/64 subnet. Therefore, when the variable is 1, I get a ping: connect: Network unreachable error because the system is looking for it in the :0: subnet instead of the :1: subnet.
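
The mismatch can be checked with a short sketch using Python's standard ipaddress module. The two prefixes are copied from the routes above. The actual target IP is not shown in this post, so the source address from the second route is used as a stand-in for an address in the :1: subnet; covered_by is a hypothetical helper name.

```python
import ipaddress

# Prefixes copied from the two routing-table entries in this post.
ROUTE_WHEN_1 = ipaddress.ip_network("fd12:6e0b:a0f1::/64")    # variable = 1
ROUTE_WHEN_0 = ipaddress.ip_network("fd12:6e0b:a0f1:1::/64")  # variable = 0

# Stand-in for the real target (assumption): an address inside the :1: subnet.
TARGET = ipaddress.ip_address("fd12:6e0b:a0f1:1:8000:d5:1691:7765")


def covered_by(addr, routes):
    """Return the routes whose prefix contains addr."""
    return [route for route in routes if addr in route]


if __name__ == "__main__":
    print("variable=1:", covered_by(TARGET, [ROUTE_WHEN_1]))
    print("variable=0:", covered_by(TARGET, [ROUTE_WHEN_0]))
```

With the variable set to 1, no listed route covers the stand-in target, which is consistent with the Network unreachable error.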

Is this expected behavior for this variable?


2 months ago

Could you try setting that variable to 1 on all the services that you need to access?


qwerty
PROOP

2 months ago

Yes, setting the subnet router and the target services to RAILWAY_DISABLE_DUAL_STACK_PRIVNETS=1 works as a workaround.

However, this raises a couple of concerns:

  1. Inter-service Communication: Does this mean that to access services like a DB, we must force every service in the project to IPv6-only? Currently, services using dual-stack are on fd12:xxxx:xxxx:1::/64, while IPv6-only services are on fd12:xxxx:xxxx:0::/64, and there are no routes enabling communication between these two subnets.
  2. Side Effects: What specific considerations or limitations should we expect when disabling dual-stack? Are there any Railway features (like certain managed plugins or public edge networking) that might be impacted by moving away from the default dual-stack configuration?

2 months ago

  1. The subnet shouldn't matter, since you should be connecting to services through DNS lookups.
  2. Nope.

qwerty
PROOP

2 months ago

It seems that Dual Stack services and IPv6-only services are isolated from each other. In my testing, I found that they cannot communicate because they are placed on different subnets (:1::/64 vs :0::/64) without any internal routing between them. While names are resolved correctly using fd12::10, the issue lies in the communication with the resolved IPv6 address, not the DNS resolution itself.
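
For anyone who wants to reproduce the distinction between name resolution and reachability, here is a rough sketch that reports the two stages separately. The probe function and its return strings are hypothetical names chosen for illustration; it uses only the standard socket module and is not the method behind the attached captures.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Report whether DNS resolution or the TCP connection is the failing step."""
    try:
        # Stage 1: name resolution only.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failed"

    # Stage 2: try to connect to each resolved address in turn.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue  # unreachable, refused, or timed out; try the next address
    return "connect_failed"


# Example call (host name and port are placeholders, not from this thread):
# print(probe("my-database.railway.internal", 5432))
```

A result of "connect_failed" would match the behavior described here: the name resolves, but the resolved address cannot be reached.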

This creates a significant blocker: if we need to enable a VPN/Subnet Router using RAILWAY_DISABLE_DUAL_STACK_PRIVNETS=1, we are forced to migrate every single service in our project to IPv6-only to maintain connectivity (I have attached several captures demonstrating this behavior).

Since you mentioned that using IPv6-only in all our services shouldn't break anything, we might adopt this as our current approach. However, this limitation should be carefully considered when managing Private Network infrastructure, as it effectively creates networking silos. Managing this at the individual service level is prone to error and can lead to complex networking failures that are difficult to debug, as connectivity now depends on a per-service environment variable rather than consistent infrastructure.

I initially thought the issue was strictly the interface MTU, but seeing that switching to IPv6-only resolves the hanging connections makes me wonder about the actual root cause. Could you clarify why this change fixes the connectivity issues? Also, are there plans for a permanent fix that doesn't require us to isolate our services into different network stacks?

