Node.js native fetch not working - connections timeout while curl works fine
zajoncm
PROOP

4 months ago

Hi Railway Support,

We're experiencing a critical issue where Node.js native fetch() (Undici) cannot make any outbound HTTP/HTTPS requests, while curl from the same container works perfectly fine.

Timeline

  • Yesterday and this morning: Everything worked correctly

  • Now: All Node.js fetch requests time out

  • No code changes: We reverted to yesterday's working commit - still broken

Diagnostic Results

We created a diagnostic endpoint that tests multiple HTTP methods:

Method                              Result        Latency
---------------------------------------------------------
curl (system)                       ✅ Works      129ms
Node.js http module                 ✅ Works      8828ms
Node.js https module (IPv4)         ✅ Works      26ms
Node.js https module (IPv6)         ✅ Works      26ms
Node.js native fetch() (Undici)     ❌ Timeout    10000ms+

Key Findings

DNS resolution works - Both IPv4 and IPv6 addresses resolve correctly

CA certificates exist - /etc/ssl/certs/ca-certificates.crt is present

No proxy variables set - HTTP_PROXY, HTTPS_PROXY etc. are not configured

curl works - System-level HTTP works fine

Node.js https module works - Including with forced IPv4 (family: 4)

Only Undici/fetch fails - Native fetch() times out on every request

What We've Tried

✅ Setting dns.setDefaultResultOrder('ipv4first') - didn't help

✅ Configuring Undici global dispatcher with family: 4 - didn't help

✅ Reverting to the previous working commit - didn't help

✅ Testing Node 20.18.0 vs 22.14.0 - both have the same issue
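For reference, the diagnostics were gathered with a timing wrapper roughly like this (a simplified sketch, not our exact code; `timed` and the 10s cutoff are illustrative):

```javascript
// Illustrative helper: time an async request and abort it after a cutoff,
// producing the success/latencyMs/error fields shown in the raw output below.
async function timed(name, fn, timeoutMs = 10000) {
  const t0 = Date.now();
  try {
    // fn receives an AbortSignal it can pass to fetch()/https.get()
    const result = await fn(AbortSignal.timeout(timeoutMs));
    return { name, success: true, latencyMs: Date.now() - t0, result };
  } catch (e) {
    return { name, success: false, latencyMs: Date.now() - t0, error: e.message };
  }
}

// Example with a request that resolves immediately:
const ok = await timed('noop', async () => 'done');
console.log(ok.name, ok.success, ok.latencyMs);
```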

Raw Diagnostic Output

{

  "nodeVersion": "v22.14.0",

  "dnsOrder": "ipv4first",

  "dnsLookup": {

    "ipv4": "142.251.39.132",

    "ipv6": "2a00:1450:400e:804::2004"

  },

  "caCertCheck": {

    "exists": true,

    "path": "/etc/ssl/certs/ca-certificates.crt"

  },

  "tests": {

    "curl": { "success": true, "latencyMs": 129, "statusCode": 200 },

    "httpPlain": { "success": true, "latencyMs": 8828, "statusCode": 200 },

    "httpsWithVerify": { "success": true, "latencyMs": 26, "statusCode": 200 },

    "httpsNoVerify": { "success": true, "latencyMs": 26, "statusCode": 200 },

    "nativeFetch": { "success": false, "latencyMs": 10006, "error": "This operation was aborted" }

  },

  "diagnosis": "UNDICI ISSUE: Native fetch fails but https module works. Undici-specific problem."

}

Impact

This completely breaks our application's ability to communicate with external APIs (Algolia, Stripe, etc.) that use fetch() or libraries built on it.

Were there any infrastructure or networking changes on Railway today that could affect Undici/fetch specifically?

Is there a known issue or workaround for Undici not working in Railway containers?

Can you check if there are any network policies or configurations affecting our container?

Thank you for your help!

$10 Bounty

12 Replies


4 months ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open noahd 4 months ago


zajoncm
PROOP

4 months ago

UPDATE: I've changed the region and set NODE_OPTIONS=--dns-result-order=ipv4first, but it didn't solve the problem. I also see that Node fetch sometimes works, with wildly varying latency: testing against 4 different services, all requests can succeed on one attempt and all fail on the next. Example below:

{
    "nodeVersion": "v20.18.0",
    "timestamp": "2025-11-26T12:37:27.979Z",
    "results": [
        {
            "name": "JSONPlaceholder",
            "url": "https://jsonplaceholder.typicode.com/posts/1",
            "success": false,
            "error": "This operation was aborted",
            "latencyMs": 10002
        },
        {
            "name": "HTTPBin",
            "url": "https://httpbin.org/get",
            "success": false,
            "error": "This operation was aborted",
            "latencyMs": 10001
        },
        {
            "name": "Google",
            "url": "https://www.google.com",
            "success": false,
            "error": "This operation was aborted",
            "latencyMs": 10001
        },
        {
            "name": "Algolia",
            "url": "https://<my_algolia_id>-dsn.algolia.net",
            "success": true,
            "status": 404,
            "latencyMs": 929
        }
    ],
    "summary": {
        "total": 4,
        "success": 1,
        "failed": 3
    }
}

bytekeim
PRO

4 months ago

hi


bytekeim
PRO

4 months ago

Thanks for the additional output — the intermittent behavior (curl/https works, but fetch fails after ~10s with “This operation was aborted”) aligns perfectly with what I’d expect from Undici hitting its connect timeout, combined with transient network/egress variability or socket/pool exhaustion.

Quick Fix

1. Set a global dispatcher to increase the connect timeout

npm i undici

// at the very top of your app (before any imports that call fetch)
import { setGlobalDispatcher, Agent } from 'undici';

// raise the connect timeout to 20s (20000 ms), via the Agent's connect options
setGlobalDispatcher(new Agent({ connect: { timeout: 20000 } }));

2. If a global dispatcher isn’t possible, use a per-call dispatcher:

import { Agent, fetch } from 'undici';
const agent = new Agent({ connectTimeout: 30000 });

const res = await fetch(url, { dispatcher: agent });

These approaches give you explicit control over connect timeouts and socket pooling instead of relying on Undici’s defaults.


bytekeim
PRO

4 months ago

If the quick fix doesn’t immediately resolve the issue, run this test script repeatedly to compare Node https vs fetch:

// test-fetch.js
import https from 'node:https';
import { performance } from 'node:perf_hooks';
// uses the global fetch() (Node >= 18); to test a pinned undici version
// instead, install it and use: import { fetch } from 'undici';

async function testFetch(url){
  const t0 = performance.now();
  try {
    const r = await fetch(url);
    console.log('fetch OK', url, r.status, Math.round(performance.now()-t0));
  } catch (e) {
    console.log('fetch ERR', url, e?.code, Math.round(performance.now()-t0));
    console.log('  name:', e?.name, 'message:', e?.message);
  }
}

function testHttps(url){
  const t0 = performance.now();
  https.get(url, res => {
    console.log('https OK', url, res.statusCode, Math.round(performance.now()-t0));
    res.resume();
  }).on('error', e => {
    console.log('https ERR', url, e?.code, Math.round(performance.now()-t0));
  });
}

(async () => {
  const urls = [
    'https://jsonplaceholder.typicode.com/posts/1',
    'https://httpbin.org/get',
    'https://www.google.com'
  ];
  for (const u of urls) {
    testHttps(u);
    await testFetch(u);
  }
})();

bytekeim
PRO

4 months ago

Other Helpful Checks / Mitigations

  • You tried NODE_OPTIONS=--dns-result-order=ipv4first. Also consider reducing Node’s IPv6→IPv4 fallback wait:

import { setDefaultAutoSelectFamilyAttemptTimeout } from 'node:net';
setDefaultAutoSelectFamilyAttemptTimeout(100); // ms
  • If running high concurrency, limit concurrency or tune the Agent pool (connections, pipelining, etc.). Pool/ephemeral-port exhaustion under load can cause intermittent failures.

  • Short-term fallback: use Node https or axios/got for critical paths until the Undici Agent changes are deployed.
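For that short-term fallback, a minimal fetch-like helper over node:http/https could look like this (my own sketch, not a drop-in fetch replacement; `simpleGet` and the 10s default are illustrative):

```javascript
import http from 'node:http';
import https from 'node:https';

// Minimal GET helper built on node:http/https as a stopgap while
// undici-based fetch is unreliable. Returns { status, body }.
function simpleGet(url, { timeoutMs = 10000 } = {}) {
  const mod = url.startsWith('https:') ? https : http;
  return new Promise((resolve, reject) => {
    const req = mod.get(url, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => (body += chunk));
      res.on('end', () => resolve({ status: res.statusCode, body }));
    });
    // destroy the socket (and reject) if nothing arrives within the cutoff
    req.setTimeout(timeoutMs, () => req.destroy(new Error('timeout')));
    req.on('error', reject);
  });
}
```

Usage: `const { status, body } = await simpleGet('https://httpbin.org/get');`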


bytekeim
PRO

4 months ago

I hope setGlobalDispatcher(new Agent(...)) resolves the issue. If it still fails intermittently, paste the test-fetch.js output and exact error stacks — I’ll prepare an escalation for Railway (conntrack/NAT/egress checks + connection limit evidence).


zajoncm
PROOP

4 months ago

Unfortunately setGlobalDispatcher(new Agent(...)) didn't help, I'm sending the output from the test script:

fetch ERR https://jsonplaceholder.typicode.com/posts/1 undefined 20469

name: TypeError message: fetch failed

fetch ERR https://httpbin.org/get undefined 20470

name: TypeError message: fetch failed

https OK https://jsonplaceholder.typicode.com/posts/1 200 46791

fetch OK https://www.google.com 200 15939

https OK https://www.google.com 200 15964

https OK https://httpbin.org/get 200 39074

Also, yesterday I set up the app with another cloud provider and had no problems.


bytekeim
PRO

4 months ago

yo, quick update: I think I found a clean fix for those random fetch() timeouts on Railway.
just drop this at the VERY top of your main file (before anything else) and restart:

import { setGlobalDispatcher, Agent } from 'undici';

setGlobalDispatcher(
  new Agent({
    connect: { timeout: 30000, family: 4 }, // 30s + force IPv4
    connections: 5,
    pipelining: 1
  })
);

basically this forces IPv4 (avoids the weird IPv6/happy eyeballs crap), gives fetch a longer connect window, and keeps the pool tiny so Railway doesn’t choke on ports.

after you add it:

- restart the app

- run your test script a few times

let me know if it works


zajoncm
PROOP

4 months ago

I'm sending the results from several attempts. I also added a per-call dispatcher to make sure the configuration you provided is correct. However, it still doesn't look good. Simple requests can take 20-30 seconds. The discrepancy between attempts is also strange - they can return a response in 1-2 seconds on one attempt, but then fail on the next.

Results:

2025-11-27T19:12:10.090893343Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 3078

2025-11-27T19:12:10.111517533Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 3093

2025-11-27T19:12:30.109810530Z [inf] https OK https://httpbin.org/get 200 19609

2025-11-27T19:12:30.109824607Z [inf] fetch OK https://httpbin.org/get 200 19761

2025-11-27T19:12:30.109836202Z [inf] fetch OK https://www.google.com 200 87

2025-11-27T19:12:30.109860799Z [inf] https OK https://www.google.com 200 100

2025-11-27T19:13:06.201954442Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 10449

2025-11-27T19:13:06.201967238Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 10450

2025-11-27T19:13:26.198842525Z [inf] https OK https://httpbin.org/get 200 10394

2025-11-27T19:13:26.198850723Z [inf] fetch OK https://httpbin.org/get 200 10440

2025-11-27T19:13:26.198856383Z [inf] fetch OK https://www.google.com 200 63

2025-11-27T19:13:26.198868695Z [inf] https OK https://www.google.com 200 150

2025-11-27T19:14:06.209562919Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 19

2025-11-27T19:14:06.209572037Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 25

2025-11-27T19:14:06.209577749Z [inf] https OK https://httpbin.org/get 200 269

2025-11-27T19:14:06.209583734Z [inf] fetch OK https://httpbin.org/get 200 377

2025-11-27T19:14:06.209589075Z [inf] fetch OK https://www.google.com 200 63

2025-11-27T19:14:06.209602061Z [inf] https OK https://www.google.com 200 72

2025-11-27T19:14:35.554277093Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 25087

2025-11-27T19:14:35.555945539Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 25093

2025-11-27T19:15:15.558893256Z [inf] fetch ERR https://httpbin.org/get undefined 30322

2025-11-27T19:15:18.441897508Z [inf] https OK https://httpbin.org/get 200 42837

2025-11-27T19:15:38.062463026Z [inf] https OK https://www.google.com 200 24510

2025-11-27T19:15:38.062472092Z [inf] fetch OK https://www.google.com 200 24514

2025-11-27T19:16:08.068400472Z [inf] https OK https://www.google.com 200 57

2025-11-27T19:16:08.068415711Z [inf] https OK https://httpbin.org/get 200 1473

2025-11-27T19:16:08.068420071Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 9

2025-11-27T19:16:08.068450046Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 12154

2025-11-27T19:16:08.068458545Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 12155

2025-11-27T19:16:08.068464294Z [inf] https OK https://httpbin.org/get 200 215

2025-11-27T19:16:08.068470086Z [inf] fetch OK https://httpbin.org/get 200 288

2025-11-27T19:16:08.068476188Z [inf] fetch OK https://www.google.com 200 65

2025-11-27T19:16:08.068488718Z [inf] https OK https://www.google.com 200 80

2025-11-27T19:16:08.068494431Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 12

2025-11-27T19:16:08.068500034Z [inf] https OK https://jsonplaceholder.typicode.com/posts/1 200 16

2025-11-27T19:16:08.068529883Z [inf] fetch OK https://httpbin.org/get 200 889

2025-11-27T19:16:08.068535525Z [inf] fetch OK https://www.google.com 200 47

2025-11-27T19:16:48.071042463Z [inf] fetch OK https://jsonplaceholder.typicode.com/posts/1 200 30324

2025-11-27T19:16:48.868952993Z [inf] fetch OK https://httpbin.org/get 200 10292

2025-11-27T19:16:48.868962940Z [inf] fetch OK https://www.google.com 200 67

2025-11-27T19:16:48.868978014Z [inf] https OK https://www.google.com 200 92

2025-11-27T19:16:49.298708533Z [inf] https OK https://httpbin.org/get 200 11061


bytekeim
PRO

4 months ago

so your logs are kinda wild - sometimes stuff works in like 50ms, sometimes it takes 30 seconds, sometimes it just dies. that's not normal at all. the fact that curl works fine but undici is freaking out tells me it's something with how undici manages connections vs how the system-level stuff does it.

basically undici tries to be smart and reuse connections, keep pools open, do pipelining and all that. but when you're running in containerized environments, sometimes that gets messy with how the network layer handles connection tracking. it's like undici is trying to juggle too many balls at once and dropping them.

try this first

put this at the very top of your main file, like before literally anything else:

import { setGlobalDispatcher, Agent } from 'undici';

setGlobalDispatcher(
  new Agent({
    connect: { 
      timeout: 60000, 
      family: 4       
    },
    
    
    connections: 1,          // only 1 connection per host at a time
    pipelining: 0,           // turn off pipelining completely
    keepAliveTimeout: 1000,  // don't keep connections alive so long
    keepAliveMaxTimeout: 5000,
    
    bodyTimeout: 60000,
    headersTimeout: 60000
  })
);

console.log('undici configured');

the key thing here is connections: 1 and pipelining: 0. basically we're telling undici to chill out and stop trying to do fancy connection reuse stuff. yeah it's slower in theory, but if the alternative is timing out then who cares, right?
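if your app also fires a lot of parallel fetches, you can cap concurrency at the app level too - a tiny limiter like this keeps in-flight requests bounded (my own sketch, `pLimit` is illustrative, not a library you have installed):

```javascript
// Minimal concurrency limiter: at most `max` tasks run at once,
// the rest wait in a FIFO queue.
function pLimit(max) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const { fn, resolve, reject } = queue.shift();
    fn().then(resolve, reject).finally(() => { active--; next(); });
  };
  return (fn) => new Promise((resolve, reject) => {
    queue.push({ fn, resolve, reject });
    next();
  });
}

// e.g.:
// const limit = pLimit(4);
// await Promise.all(urls.map((u) => limit(() => fetch(u))));
```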

lmk what happens


zajoncm
PROOP

4 months ago

Unfortunately, it doesn't make any difference. The problem seems to lie elsewhere, especially since one of my applications sat untouched for almost two weeks and nothing happened until this problem appeared a few days ago (it hit three services at the same time). I'd appreciate it if the Railway team could diagnose this, as it doesn't seem to be a problem in the code itself, and the application's functionality is now very urgent.

