ETIMEDOUT connecting to graph.facebook.com from Asia Southeast region

tonytej

PRO · OP

5 months ago

Issue Description

Our Node.js application deployed on Railway (Asia Southeast region: asia-southeast1-eqsg3a) is experiencing constant ETIMEDOUT errors when attempting to connect to Meta's WhatsApp Cloud API at graph.facebook.com. The same code works perfectly in local development and was working in production until recently.

Impact

Severity: High - Core application functionality broken
Frequency: Constant failures (100% error rate currently)
Affected Service: Backend API service
Region: railway/asia-southeast1-eqsg3a (visible in response headers: X-Railway-Edge: railway/asia-southeast1-eqsg3a)

Environment Details

Platform: Railway
Region: Asia Southeast 1
Runtime: Node.js 22.21.0
Deployment: Successful build, fails at runtime
Target Host: graph.facebook.com (Meta/Facebook Graph API)
Target URL: https://graph.facebook.com/v21.0/{business-account-id}/message_templates

Error Details

Error Code: ETIMEDOUT

Error Type: TypeError: fetch failed with AggregateError cause

Description: TCP connection timeout when attempting to reach Meta's API servers

Complete Error Log

{
  "timestamp": "2025-10-25 14:27:06",
  "error": "Error fetching templates fetch failed",
  "name": "TypeError",
  "cause": {
    "code": "ETIMEDOUT"
  },
  "causeName": "AggregateError",
  "causeCode": "ETIMEDOUT",
  "url": "https://graph.facebook.com/v21.0/123123123/message_templates",
  "businessAccountId": "123123123",
  "hasAccessToken": true,
  "stack": "TypeError: fetch failed\n    at node:internal/deps/undici/undici:14900:13\n    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n    at async Object.listTemplates (file:///app/server/dist/lib/message-provider.js:157:32)\n    at async MessageService.listTemplates (file:///app/server/dist/services/message.service.js:508:27)\n    at async file:///app/server/dist/routes/messages.js:146:27"
}
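The log entry above records only the top-level `ETIMEDOUT` code. When the cause is an `AggregateError`, each attempted address usually has its own sub-error, and logging those can show whether IPv4, IPv6, or both are timing out. A minimal sketch, assuming the sub-errors carry `code`, `address`, and `port` fields when Node provides them; `describeFetchFailure` is a hypothetical helper name, not part of the application in this thread:

```javascript
// Hypothetical helper: flatten a failed fetch() error into one entry per
// underlying connection attempt so each address can be logged separately.
function describeFetchFailure(err) {
  const attempts = [];
  const visit = (e) => {
    if (!e || typeof e !== "object") return;
    if (Array.isArray(e.errors) && e.errors.length > 0) {
      // AggregateError: one sub-error per address that was tried.
      e.errors.forEach(visit);
    } else if (e.cause) {
      // "fetch failed" TypeError: the real reason is in `cause`.
      visit(e.cause);
    } else {
      attempts.push({
        code: e.code ?? null,
        syscall: e.syscall ?? null,
        address: e.address ?? null, // may be absent on some errors
        port: e.port ?? null,
        message: e.message,
      });
    }
  };
  visit(err);
  return attempts;
}
```

Calling `describeFetchFailure(error)` inside the existing `catch` block and adding the result to the log payload would be one way to use it.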

Deployment Logs (Successful Build)

2025-10-25T10:35:47.349825076Z [inf]  
> @event-management/server@1.0.0 build /app/server
> tsc --build --force && tsc-alias

2025-10-25T10:35:54.396484170Z [inf]  copy /app/node_modules
2025-10-25T10:35:59.291071711Z [inf]  Build time: 134.90 seconds

Build completes successfully. The application starts without errors.

Runtime Logs (Connection Failures)

2025-10-25 14:06:18:618 [error]: Error occurred:
{
  "error": "fetch failed",
  "stack": "Error: fetch failed\n    at Object.listTemplates (file:///app/server/dist/lib/message-provider.js:232:23)\n    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n    at async MessageService.listTemplates (file:///app/server/dist/services/message.service.js:508:27)\n    at async file:///app/server/dist/routes/messages.js:146:27",
  "path": "/messages/templates",
  "method": "GET",
  "user": "1321321323"
}

2025-10-25 14:27:06:276 [error]: Error fetching templates fetch failed
{
  "name": "TypeError",
  "cause": {
    "code": "ETIMEDOUT"
  },
  "causeName": "AggregateError",
  "causeCode": "ETIMEDOUT",
  "url": "https://graph.facebook.com/v21.0/123123123/message_templates",
  "businessAccountId": "123123123",
  "hasAccessToken": true
}

What We've Ruled Out

Environment variables: Confirmed all credentials are correctly configured

Code issues: Same code works in local development (returns 200 OK)

Authentication: hasAccessToken: true - credentials are present

Rate limiting: No 429 errors, well within Meta's API limits

Application timeout: Using 30-second AbortController timeout, but TCP connection times out before that
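To make the last point visible in logs, the application-level deadline and the connection-level failure can be told apart by the shape of the rejection. The sketch below is an illustration under assumptions, not the application's actual code: `fetchWithDeadline` and the injectable `fetchImpl` parameter are hypothetical names, and the 30-second default mirrors the timeout mentioned above.

```javascript
// Hypothetical wrapper: run fetch with an overall deadline and report
// whether the app-level timer fired or the connection itself failed.
async function fetchWithDeadline(url, { timeoutMs = 30_000, fetchImpl = fetch } = {}) {
  try {
    const response = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    return { kind: "response", status: response.status };
  } catch (err) {
    // AbortSignal.timeout() rejects with a DOMException named "TimeoutError";
    // a manual AbortController.abort() yields "AbortError".
    if (err?.name === "TimeoutError" || err?.name === "AbortError") {
      return { kind: "app-timeout", code: null };
    }
    // Connection-level failures surface as TypeError("fetch failed")
    // with the underlying code (for example ETIMEDOUT) on `cause`.
    const code = err?.cause?.code ?? null;
    if (code !== null) {
      return { kind: "connect-failure", code };
    }
    throw err; // anything else is unexpected; let the caller handle it
  }
}
```

A `connect-failure` result with `ETIMEDOUT` arriving well before `timeoutMs` would match the behavior described in this post.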

Network Connectivity Test Results

Local Development (macOS):
- fetch('https://graph.facebook.com/...') → 200 OK
- Response time: ~300ms
- Works consistently

Railway Production (asia-southeast1):
- fetch('https://graph.facebook.com/...') → ETIMEDOUT
- Connection times out before establishing TCP connection
- Fails consistently (was intermittent, now constant)

Timeline

Previously: Application worked correctly in production
Recently: Started experiencing intermittent failures
Currently: 100% failure rate - all requests time out

Expected Behavior

Railway containers should be able to establish TCP connections to graph.facebook.com (Meta's public API) within reasonable timeout periods (~5-10 seconds).

Request for Support

Could you please investigate:

Network routing from asia-southeast1 region to Meta's API servers
DNS resolution for graph.facebook.com from Railway containers
Firewall rules that might be blocking connections to Meta/Facebook IP ranges
Regional network issues specific to asia-southeast1-eqsg3a

Could this be a Railway infrastructure/network issue as the same application code works in other environments?

$10 Bounty

7 Replies

yeeet

PRO

5 months ago

Hey, looking at these logs, this is probably a Railway network issue if I'm not mistaken (someone else should look at this, though). Your logs show the connection timing out (ETIMEDOUT) before the TCP handshake completes, not a DNS or application error, so the packets apparently aren't reaching Meta's IPs.

One thing you can try on the code side: on Node 22, start your app with NODE_OPTIONS=--dns-result-order=ipv4first, which makes Node prefer IPv4 addresses when resolving hostnames, and see if that fixes it. You can also try a custom Undici Agent as the HTTP/1.1 client.
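The IPv4-first suggestion can also be applied from code instead of through `NODE_OPTIONS`. This is a minimal sketch using Node's built-in `dns.setDefaultResultOrder`; it must run before the first outbound request. The second call is an extra assumption that is not from this thread: since the error is an `AggregateError`, Node appears to be trying several addresses in turn, and raising the per-attempt connect timeout gives a slow route more time. The 2000 ms value and the `applyNetworkDefaults` name are illustrative only.

```javascript
const dns = require("node:dns");
const net = require("node:net");

// Hypothetical startup hook: call once, before any fetch() is issued.
function applyNetworkDefaults() {
  // Same effect as --dns-result-order=ipv4first: dns.lookup() returns
  // IPv4 addresses ahead of IPv6 ones.
  dns.setDefaultResultOrder("ipv4first");

  // Assumption: allow each address attempt more time before Node moves on
  // to the next one. 2000 ms is an arbitrary illustrative value.
  net.setDefaultAutoSelectFamilyAttemptTimeout(2000);
}

applyNetworkDefaults();
```

In an ES module build, the same calls work with `import dns from "node:dns"` and `import net from "node:net"`.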

fra

HOBBY · Top 10% Contributor

5 months ago

Can you try connecting to the container over SSH and running curl against graph.facebook.com? In theory, if you can't reach the endpoint from inside the container, the issue should be on the Railway network (I suppose).

Also, can you double-check that your IP is whitelisted in the FB API?
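If curl is not available in the container, the connectivity check suggested above can be approximated from Node itself by resolving the host and attempting a plain TCP connection to every returned address. This is a sketch under assumptions: `probeAddress` and `probeHost` are hypothetical helper names, and the 5-second timeout is arbitrary.

```javascript
const dns = require("node:dns/promises");
const net = require("node:net");

// Attempt a plain TCP connect to one address and report how it ended.
function probeAddress(address, port, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const started = Date.now();
    const socket = net.connect({ host: address, port });
    const finish = (ok, code) => {
      socket.destroy();
      resolve({ address, ok, code, ms: Date.now() - started });
    };
    socket.setTimeout(timeoutMs, () => finish(false, "TIMEOUT"));
    socket.once("connect", () => finish(true, null));
    socket.once("error", (err) => finish(false, err.code ?? "ERROR"));
  });
}

// Resolve every A/AAAA record for the host and probe each address separately,
// so an unreachable IPv6 or IPv4 route shows up on its own line.
async function probeHost(hostname, port = 443) {
  const records = await dns.lookup(hostname, { all: true });
  return Promise.all(records.map((r) => probeAddress(r.address, port)));
}
```

Running `probeHost("graph.facebook.com").then(console.table)` from the deployed service, for example in a temporary debug route, would show which resolved addresses accept connections from that region.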

tonytej

PRO · OP

4 months ago

I'm still experiencing this issue intermittently. I can confirm that the same application (same repo, same branch) deployed on other platforms is not running into this issue. I've added the IP address to the Meta App IP whitelist, but requests still fail.

passos

MODERATOR

4 months ago

I've seen several reports online regarding specific IP address issues. I would assume that Facebook implements some type of undocumented rate limit per IP. Are you able to switch your application's region to another location to confirm this issue?

tonytej

PRO · OP

4 months ago

I switched the application's region to another location and have been running it there, making the Graph API requests above, for about an hour without seeing the 500 error. When I changed it back to the Singapore region, I would sometimes get the 500 errors again. However, curling graph.facebook.com in railway shell always succeeds. I am now trying `NODE_OPTIONS=--dns-result-order=ipv4first` to see if it helps.

passos

MODERATOR

4 months ago

Remember that the railway shell command runs locally on your computer and not on Railway, so a successful curl there does not test the Singapore region's network path. Facebook is implementing rate limits by IP address, and unfortunately, Railway cannot do much about this situation.

If you wish to address this issue, you could potentially route that traffic through a proxy by using a service like Svix or an HTTP proxy.