4 months ago
Hi Railway Support,
We're experiencing a critical issue where Node.js native fetch() (Undici) cannot make any outbound HTTP/HTTPS requests, while curl from the same container works perfectly fine.
Timeline
Yesterday and this morning: Everything worked correctly
Now: All Node.js fetch requests timeout
No code changes: We reverted to yesterday's working commit - still broken
Diagnostic Results
We created a diagnostic endpoint that tests multiple HTTP methods:
Method                          | Result  | Latency
curl (system)                   | Works   | 129ms
Node.js http module             | Works   | 8828ms
Node.js https module (IPv4)     | Works   | 26ms
Node.js https module (IPv6)     | Works   | 26ms
Node.js native fetch() (Undici) | Timeout | 10000ms+
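For context, the fetch row can be reproduced with a timed probe along these lines (a sketch, not our exact diagnostic): a timeout on AbortSignal.timeout() surfaces as the abort error seen in the raw output. A throwaway local server stands in for the real external targets so the snippet is self-contained.

```javascript
import http from 'node:http';

// Local stand-in for the external targets (jsonplaceholder, google, etc.)
const server = http.createServer((req, res) => res.end('ok')).listen(0);
await new Promise((resolve) => server.once('listening', resolve));
const url = `http://127.0.0.1:${server.address().port}/`;

// Timed fetch probe: report success/latency, or the abort error on timeout
async function timedFetch(target, timeoutMs = 10000) {
  const t0 = performance.now();
  try {
    const res = await fetch(target, { signal: AbortSignal.timeout(timeoutMs) });
    return { success: true, statusCode: res.status, latencyMs: Math.round(performance.now() - t0) };
  } catch (err) {
    // On timeout this is an abort-style error, as in the diagnostic output
    return { success: false, error: err.message, latencyMs: Math.round(performance.now() - t0) };
  }
}

const result = await timedFetch(url);
console.log(result);
server.close();
```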
Key Findings
DNS resolution works - Both IPv4 and IPv6 addresses resolve correctly
CA certificates exist - /etc/ssl/certs/ca-certificates.crt is present
No proxy variables set - HTTP_PROXY, HTTPS_PROXY etc. are not configured
curl works - System-level HTTP works fine
Node.js https module works - Including with forced IPv4 (family: 4)
Only Undici/fetch fails - Native fetch() times out on every request
What We've Tried
Setting dns.setDefaultResultOrder('ipv4first') - didn't help
Configuring Undici global dispatcher with family: 4 - didn't help
Reverting to previous working commit - didn't help
Testing Node 20.18.0 vs 22.14.0 - both have the same issue
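For reference, the first two mitigations in the list above look roughly like this (the dispatcher part is commented out because it assumes `npm i undici`):

```javascript
import dns from 'node:dns';

// Mitigation 1: prefer IPv4 addresses in DNS lookup results
dns.setDefaultResultOrder('ipv4first');
console.log(dns.getDefaultResultOrder()); // 'ipv4first'

// Mitigation 2 (sketch; requires `npm i undici`): force IPv4 sockets for fetch()
// import { setGlobalDispatcher, Agent } from 'undici';
// setGlobalDispatcher(new Agent({ connect: { family: 4 } }));
```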
Raw Diagnostic Output
{
  "nodeVersion": "v22.14.0",
  "dnsOrder": "ipv4first",
  "dnsLookup": {
    "ipv4": "142.251.39.132",
    "ipv6": "2a00:1450:400e:804::2004"
  },
  "caCertCheck": {
    "exists": true,
    "path": "/etc/ssl/certs/ca-certificates.crt"
  },
  "tests": {
    "curl": { "success": true, "latencyMs": 129, "statusCode": 200 },
    "httpPlain": { "success": true, "latencyMs": 8828, "statusCode": 200 },
    "httpsWithVerify": { "success": true, "latencyMs": 26, "statusCode": 200 },
    "httpsNoVerify": { "success": true, "latencyMs": 26, "statusCode": 200 },
    "nativeFetch": { "success": false, "latencyMs": 10006, "error": "This operation was aborted" }
  },
  "diagnosis": "UNDICI ISSUE: Native fetch fails but https module works. Undici-specific problem."
}
Impact
This completely breaks our application's ability to communicate with external APIs (Algolia, Stripe, etc.) that use fetch() or libraries built on it.
Were there any infrastructure or networking changes on Railway today that could affect Undici/fetch specifically?
Is there a known issue or workaround for Undici not working in Railway containers?
Can you check if there are any network policies or configurations affecting our container?
Thank you for your help!
12 Replies
4 months ago
Hey there! We've found the following might help you get unblocked faster:
🧵 ETIMEDOUT connecting to graph.facebook.com from Asia Southeast region
🧵 Persistent npm install failure due to 504 Gateway Timeouts on builde
If you find the answer from one of these, please let us know by solving the thread!
4 months ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open noahd • 4 months ago
4 months ago
UPDATE: I've changed the region and set NODE_OPTIONS=--dns-result-order=ipv4first, but it didn't solve the problem. I also see that Node fetch sometimes works, with inconsistent latency (tested by requesting 4 different services: on one attempt all requests succeed, on the next they all fail). Example below:
{
  "nodeVersion": "v20.18.0",
  "timestamp": "2025-11-26T12:37:27.979Z",
  "results": [
    {
      "name": "JSONPlaceholder",
      "url": "https://jsonplaceholder.typicode.com/posts/1",
      "success": false,
      "error": "This operation was aborted",
      "latencyMs": 10002
    },
    {
      "name": "HTTPBin",
      "url": "https://httpbin.org/get",
      "success": false,
      "error": "This operation was aborted",
      "latencyMs": 10001
    },
    {
      "name": "Google",
      "url": "https://www.google.com",
      "success": false,
      "error": "This operation was aborted",
      "latencyMs": 10001
    },
    {
      "name": "Algolia",
      "url": "https://<my_algolia_id>-dsn.algolia.net",
      "success": true,
      "status": 404,
      "latencyMs": 929
    }
  ],
  "summary": {
    "total": 4,
    "success": 1,
    "failed": 3
  }
}

4 months ago
hy
4 months ago
Thanks for the additional output — the intermittent behavior (curl/https works, but fetch fails after ~10s with “This operation was aborted”) aligns perfectly with what I’d expect from Undici hitting its connect timeout, combined with transient network/egress variability or socket/pool exhaustion.
Quick Fix
1. Set a global dispatcher to increase the connect timeout
npm i undici
// at the very top of your app (before any imports that call fetch)
import { setGlobalDispatcher, Agent } from 'undici';
// raise connect timeout to 20s (20000 ms)
setGlobalDispatcher(new Agent({ connectTimeout: 20000 }));
2. If a global dispatcher isn’t possible, use a per-call dispatcher:
import { Agent, fetch } from 'undici';
const agent = new Agent({ connectTimeout: 30000 });
const res = await fetch(url, { dispatcher: agent });
These approaches give you explicit control over connect timeouts and socket pooling instead of relying on Undici’s defaults.
4 months ago
If the quick fix doesn’t immediately resolve the issue, run this test script repeatedly to compare Node https vs fetch:
// test-fetch.js
import https from 'node:https';
// fetch is global in Node 18+; use `import { fetch } from 'undici'` only if the package is installed
import { performance } from 'node:perf_hooks';
async function testFetch(url){
const t0 = performance.now();
try {
const r = await fetch(url);
console.log('fetch OK', url, r.status, Math.round(performance.now()-t0));
} catch (e) {
console.log('fetch ERR', url, e?.code, Math.round(performance.now()-t0));
console.log(' name:', e?.name, 'message:', e?.message);
}
}
function testHttps(url){
const t0 = performance.now();
https.get(url, res => {
console.log('https OK', url, res.statusCode, Math.round(performance.now()-t0));
res.resume();
}).on('error', e => {
console.log('https ERR', url, e?.code, Math.round(performance.now()-t0));
});
}
(async () => {
const urls = [
'https://jsonplaceholder.typicode.com/posts/1',
'https://httpbin.org/get',
'https://www.google.com'
];
for (const u of urls) {
testHttps(u);
await testFetch(u);
}
})();
4 months ago
Other Helpful Checks / Mitigations
You tried NODE_OPTIONS=--dns-result-order=ipv4first. Also consider reducing Node's IPv6→IPv4 fallback wait:
import { setDefaultAutoSelectFamilyAttemptTimeout } from 'node:net';
setDefaultAutoSelectFamilyAttemptTimeout(100); // ms
If running high concurrency, limit concurrency or tune the Agent pool (connections, pipelining, etc.). Pool/ephemeral-port exhaustion under load can cause intermittent failures.
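To illustrate the concurrency point, here is a minimal limiter that caps how many requests are in flight at once (createLimiter is an illustrative helper name, not an undici API):

```javascript
// Cap concurrent async tasks so the connection pool and ephemeral
// ports aren't exhausted under load.
function createLimiter(max) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    // Run the task; when it settles, free a slot and start the next one
    task().then(resolve, reject).finally(() => { active--; next(); });
  };
  // Returns a wrapper: queue the task and resolve with its result
  return (task) => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject });
    next();
  });
}

// Usage sketch: allow at most 5 concurrent fetches
const limit = createLimiter(5);
// const responses = await Promise.all(urls.map((u) => limit(() => fetch(u))));
```

Alternatively (or additionally), tune the Agent's `connections` option so undici itself bounds the pool size per origin.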
Short-term fallback: use Node https or axios/got for critical paths until the Undici Agent changes are deployed.
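That fallback could be sketched as a wrapper that tries fetch() first and drops down to the (working) node:http/https client on failure; fetchWithFallback and getViaNodeHttp are hypothetical names, and the 10s timeout mirrors the diagnostic above:

```javascript
import http from 'node:http';
import https from 'node:https';

// Plain node core GET, used when fetch()/undici misbehaves
function getViaNodeHttp(url) {
  const lib = url.startsWith('https:') ? https : http;
  return new Promise((resolve, reject) => {
    lib.get(url, (res) => {
      let body = '';
      res.on('data', (chunk) => (body += chunk));
      res.on('end', () => resolve({ status: res.statusCode, body }));
    }).on('error', reject);
  });
}

// Try fetch first; on timeout or any fetch failure, fall back to node core
async function fetchWithFallback(url, timeoutMs = 10000) {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return { status: res.status, body: await res.text() };
  } catch {
    return getViaNodeHttp(url);
  }
}
```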
4 months ago
I hope setGlobalDispatcher(new Agent(...)) resolves the issue. If it still fails intermittently, paste the test-fetch.js output and exact error stacks — I’ll prepare an escalation for Railway (conntrack/NAT/egress checks + connection limit evidence).
4 months ago
Unfortunately setGlobalDispatcher(new Agent(...)) didn't help, I'm sending the output from the test script:
fetch ERR https://jsonplaceholder.typicode.com/posts/1 undefined 20469
name: TypeError message: fetch failed
fetch ERR https://httpbin.org/get undefined 20470
name: TypeError message: fetch failed
https OK https://jsonplaceholder.typicode.com/posts/1 200 46791
fetch OK https://www.google.com 200 15939
https OK https://www.google.com 200 15964
https OK https://httpbin.org/get 200 39074
Also, yesterday I configured the app on another cloud provider and had no problems.
4 months ago
yo, quick update: I think I found a clean fix for those random fetch() timeouts on Railway.
just drop this at the VERY top of your main file (before anything else) and restart:
import { setGlobalDispatcher, Agent } from 'undici';
setGlobalDispatcher(
new Agent({
connect: { timeout: 30000, family: 4 }, // 30s + force IPv4
connections: 5,
pipelining: 1
})
);
basically this forces IPv4 (avoids the weird IPv6/happy eyeballs crap), gives fetch a longer connect window, and keeps the pool tiny so Railway doesn’t choke on ports.
after you add it:
- restart the app
- run your test script a few times
- let me know if it works
4 months ago
I'm sending the results from several attempts. I also added a per-call dispatcher to make sure the configuration you provided is correct. However, it still doesn't look good. Simple requests can take 20-30 seconds. The discrepancy between attempts is also strange - they can return a response in 1-2 seconds on one attempt, but then fail on the next.
Results:
2025-11-27T19:12:10.090893343Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 3078
2025-11-27T19:12:10.111517533Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 3093
2025-11-27T19:12:30.109810530Z [inf] https OK https://httpbin.org/get 200 19609
2025-11-27T19:12:30.109824607Z [inf] fetch OK https://httpbin.org/get 200 19761
2025-11-27T19:12:30.109836202Z [inf] fetch OK https://www.google.com 200 87
2025-11-27T19:12:30.109860799Z [inf] https OK https://www.google.com 200 100
2025-11-27T19:13:06.201954442Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 10449
2025-11-27T19:13:06.201967238Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 10450
2025-11-27T19:13:26.198842525Z [inf] https OK https://httpbin.org/get 200 10394
2025-11-27T19:13:26.198850723Z [inf] fetch OK https://httpbin.org/get 200 10440
2025-11-27T19:13:26.198856383Z [inf] fetch OK https://www.google.com 200 63
2025-11-27T19:13:26.198868695Z [inf] https OK https://www.google.com 200 150
2025-11-27T19:14:06.209562919Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 19
2025-11-27T19:14:06.209572037Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 25
2025-11-27T19:14:06.209577749Z [inf] https OK https://httpbin.org/get 200 269
2025-11-27T19:14:06.209583734Z [inf] fetch OK https://httpbin.org/get 200 377
2025-11-27T19:14:06.209589075Z [inf] fetch OK https://www.google.com 200 63
2025-11-27T19:14:06.209602061Z [inf] https OK https://www.google.com 200 72
2025-11-27T19:14:35.554277093Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 25087
2025-11-27T19:14:35.555945539Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 25093
2025-11-27T19:15:15.558893256Z [inf] fetch ERR https://httpbin.org/get undefined 30322
2025-11-27T19:15:18.441897508Z [inf] https OK https://httpbin.org/get 200 42837
2025-11-27T19:15:38.062463026Z [inf] https OK https://www.google.com 200 24510
2025-11-27T19:15:38.062472092Z [inf] fetch OK https://www.google.com 200 24514
2025-11-27T19:16:08.068400472Z [inf] https OK https://www.google.com 200 57
2025-11-27T19:16:08.068415711Z [inf] https OK https://httpbin.org/get 200 1473
2025-11-27T19:16:08.068420071Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 9
2025-11-27T19:16:08.068450046Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 12154
2025-11-27T19:16:08.068458545Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 12155
2025-11-27T19:16:08.068464294Z [inf] https OK https://httpbin.org/get 200 215
2025-11-27T19:16:08.068470086Z [inf] fetch OK https://httpbin.org/get 200 288
2025-11-27T19:16:08.068476188Z [inf] fetch OK https://www.google.com 200 65
2025-11-27T19:16:08.068488718Z [inf] https OK https://www.google.com 200 80
2025-11-27T19:16:08.068494431Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 12
2025-11-27T19:16:08.068500034Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 16
2025-11-27T19:16:08.068529883Z [inf] fetch OK https://httpbin.org/get 200 889
2025-11-27T19:16:08.068535525Z [inf] fetch OK https://www.google.com 200 47
2025-11-27T19:16:48.071042463Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 30324
2025-11-27T19:16:48.868952993Z [inf] fetch OK https://httpbin.org/get 200 10292
2025-11-27T19:16:48.868962940Z [inf] fetch OK https://www.google.com 200 67
2025-11-27T19:16:48.868978014Z [inf] https OK https://www.google.com 200 92
2025-11-27T19:16:49.298708533Z [inf] https OK https://httpbin.org/get 200 11061
4 months ago
so your logs are kinda wild - sometimes stuff works in like 50ms, sometimes it takes 30 seconds, sometimes it just dies. that's not normal at all. the fact that curl works fine but undici is freaking out tells me it's something about how undici manages connections vs how the system-level stuff does it.
basically undici tries to be smart and reuse connections, keep pools open, do pipelining and all that. but when you're running in containerized environments, that can get messy with how the network layer handles connection tracking. it's like undici is trying to juggle too many balls at once and dropping them.
try this first
put this at the very top of your main file, like before literally anything else:
import { setGlobalDispatcher, Agent } from 'undici';
setGlobalDispatcher(
new Agent({
connect: {
timeout: 60000,
family: 4
},
connections: 1, // only 1 connection per host at a time
pipelining: 0, // turn off pipelining completely
keepAliveTimeout: 1000, // don't keep connections alive so long
keepAliveMaxTimeout: 5000,
bodyTimeout: 60000,
headersTimeout: 60000
})
);
console.log('undici configured');
the key thing here is connections: 1 and pipelining: 0. basically we're telling undici to chill out and stop trying to do fancy connection reuse stuff. yeah it's slower in theory but if the alternative is timing out then who cares right
lmk what happens
4 months ago
Unfortunately, it doesn't make any difference. It seems the problem lies elsewhere, especially since I had one application untouched for almost two weeks and nothing happened until this problem appeared a few days ago (it hit three services at the same time). I'd appreciate it if the Railway team could diagnose this, as it doesn't seem to be a problem in the code itself, and the application's functionality is now very urgent.