Consistent 12.4–12.6s TTFB on all origin-fetched routes; some 502s — project bc176ae4

joshuadeacon2005-code

PRO • OP

2 months ago

Subject: Constant 12.5s TTFB on all origin fetches, intermittent 502s

Every origin fetch through your edge takes a near-constant 12.4-12.6s TTFB, regardless of method, HTTP version, IPv4/IPv6, or which domain (custom or *.up.railway.app).

Server logs show the application processes each request in 0-5ms, so the latency is entirely between the Fastly edge and the Railway upstream. The cached apex HTML serves in 0.4s, so the edge works; only origin fetches are broken.

The client service URL also intermittently returns 502 with the same 12.6s TTFB.

Latency, consistent across many runs:

bloomleave.com/ GET 0.44s cached

api.bloomleave.com/api/health GET 12.56s

bloom-lmsclient-production.up.railway.app/ GET 12.62s 502

bloom-lmsserver-production.up.railway.app/api/health GET 12.51s

api.bloomleave.com/api/auth/login POST 12.46s

curl breakdown for api.bloomleave.com: DNS 0.002s, TCP 0.012s, TLS 0.044s, TTFB 12.56s.
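A breakdown like the one above can be gathered with curl's `-w` timing variables. The sketch below is a hedged illustration, not the poster's exact script: the function names `curl_phases` and `phase_deltas` are made up for this example, and it assumes the four figures are cumulative times from the start of the request, which is how curl reports `time_namelookup`, `time_connect`, `time_appconnect`, and `time_starttransfer`.

```shell
# Print curl's cumulative timings for one request:
# DNS done, TCP connected, TLS done, first byte received (seconds).
curl_phases() {
  curl -s -o /dev/null \
    -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer}\n' "$1"
}

# Convert the four cumulative timings into per-phase durations.
# "wait" is the time between the end of the TLS handshake and the first byte.
phase_deltas() {
  LC_ALL=C awk -v d="$1" -v c="$2" -v t="$3" -v f="$4" 'BEGIN {
    printf "dns=%.3f tcp=%.3f tls=%.3f wait=%.3f\n", d, c - d, t - c, f - t
  }'
}
```

For example, `phase_deltas $(curl_phases https://api.bloomleave.com/api/health)` prints the per-phase split for one request. With the figures reported above, almost all of the 12.56s falls in the `wait` phase, after the connection is already established.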

Already ruled out:

- Application code (server log shows 0-5ms processing)

- Express bind (explicit 0.0.0.0)

- Stale custom-domain routing (deleted + recreated via GraphQL, fresh CNAME, no change)

- Missing service domain (added via serviceDomainCreate, same 12.5s)

- HTTP version, method, cache-busting query strings -- no difference
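The last check in the list above could be scripted roughly as follows. This is a sketch under assumptions: the helper names and the `cb` query parameter are hypothetical, and `--http2` only works if the local curl build supports HTTP/2.

```shell
# Append a cache-busting parameter to a URL.
# "cb" is an arbitrary, hypothetical parameter name.
cache_bust_url() {
  case $1 in
    *\?*) printf '%s&cb=%s\n' "$1" "$2" ;;
    *)    printf '%s?cb=%s\n' "$1" "$2" ;;
  esac
}

# Fetch the same URL over HTTP/1.1, HTTP/2, IPv4 and IPv6,
# printing the status code and TTFB for each variant.
probe_variants() {
  url=$(cache_bust_url "$1" "$(date +%s)")
  for flag in --http1.1 --http2 -4 -6; do
    printf '%-10s ' "$flag"
    curl -s -o /dev/null "$flag" -w '%{http_code} %{time_starttransfer}s\n' "$url"
  done
}
```

Running `probe_variants https://api.bloomleave.com/api/health` prints one line per variant, which makes it easy to see whether any of them escapes the delay.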

Please investigate:

1. Is the Fastly-to-Railway origin path degraded for this project or us-west2 today?

2. Why does the client bare service URL return 502 with 12.6s wait? Looks like connection retry.

3. Anything I can configure (healthcheck path, replicas, region) to improve routing? healthcheckPath is null on both services.

Reproduce:

curl -w 'TTFB %{time_starttransfer}s\n' -o /dev/null -s https://api.bloomleave.com/api/health

Solved

5 Replies

Status changed to Open Railway • 2 months ago

0x5b62656e5d

MODERATOR

2 months ago

Is serverless enabled?


joshuadeacon2005-code

PRO • OP

2 months ago

Confirmed via the GraphQL API on all three services in this project: sleepApplication is false on @bloom-lms/client, @bloom-lms/server, and Postgres. Serverless / sleep mode is not enabled.

The 12.4s delay is observable on every request including immediately back-to-back ones (5 requests in 60 seconds, all 12.4-12.6s), so it's not consistent with cold-start-from-sleep behaviour anyway, which would only affect the first request after idle.
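A back-to-back check of this kind could be scripted as below. This is a minimal sketch, not the exact commands used here: `sample_ttfb` and `ttfb_summary` are hypothetical helper names. A cold start from sleep would show one slow sample followed by fast ones (a large spread), whereas a constant delay shows a small spread.

```shell
# Fetch a URL N times in a row (default 5), printing one TTFB per line.
sample_ttfb() {
  url=$1
  n=${2:-5}
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null -w '%{time_starttransfer}\n' "$url"
    i=$((i + 1))
  done
}

# Summarize TTFB samples read from stdin (seconds, one per line).
ttfb_summary() {
  LC_ALL=C awk '
    NR == 1 { min = $1; max = $1 }
    { if ($1 < min) min = $1; if ($1 > max) max = $1 }
    END { if (NR > 0) printf "n=%d min=%.2f max=%.2f spread=%.2f\n", NR, min, max, max - min }
  '
}
```

Usage: `sample_ttfb https://api.bloomleave.com/api/health 5 | ttfb_summary`.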

Server-side request logs confirm each request is processed by my application in 0-5ms. The latency lives entirely between the edge (Fastly) and the upstream container.

0x5b62656e5d

MODERATOR

2 months ago

The team has been made aware of this and has resolved the issue. Response times should go back to normal very soon.

Status changed to Solved 0x5b62656e5d • 2 months ago

Status changed to Open 0x5b62656e5d • 2 months ago

0x5b62656e5d

MODERATOR

2 months ago

There is an active incident that the team is investigating: https://status.railway.com/incident/L9HP750V

Status changed to Awaiting Railway Response brody • 2 months ago

brody

EMPLOYEE

2 months ago

This has now been resolved.

The latency was caused by an issue with our CDN provider Fastly, specifically affecting their KV store in the Asia region. Their incident report: https://www.fastlystatus.com/incident/378503

Apologies for the disruption.

Status changed to Awaiting User Response Railway • 2 months ago

Status changed to Solved 0x5b62656e5d • 2 months ago
