3 months ago
Body
Project: humorous-possibility
Service: Adros-SaaS
Environment/Region: production, US East (Virginia)
Stack: Node.js 20 on Railway, Express, undici fetch (no custom dispatcher), offline JWT verification (no Auth round‑trips)
External dependency: Supabase REST (prod) jnprhvhkxggvqowmwppf.supabase.co
Issue
After a fresh deploy, everything works for ~10–20 minutes.
Then outbound requests to Supabase REST (and only those) begin to hang and end in AbortError timeouts (around 20–25 s). Our own /api/health continues to respond immediately.
A redeploy “resets” the behavior for another 10–20 minutes.
Staging environment (pcqklbwsbnvcynprxboi.supabase.co) with the same code/region does NOT reproduce.
Representative logs (UTC)
2025‑09‑10 02:53:25: GET …/rest/v1/appointments?... ms=21844 code=20 error=“This operation was aborted”
2025‑09‑10 02:53:25: GET …/rest/v1/clinics?select=… ms=23691 code=20 error=“This operation was aborted”
Our lightweight ping to Supabase also fails during the incident: “[supabase] ping_error AbortError: This operation was aborted”
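For context, this is roughly what the ping looks like (a simplified sketch: the env var names and the 5 s budget here are illustrative, not our exact code):

```js
// Minimal sketch of the lightweight Supabase REST ping (illustrative only).
const SUPABASE_URL = process.env.SUPABASE_URL;         // e.g. https://jnprhvhkxggvqowmwppf.supabase.co
const SUPABASE_ANON_KEY = process.env.SUPABASE_ANON_KEY;

async function pingSupabase(timeoutMs = 5000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  try {
    const res = await fetch(`${SUPABASE_URL}/rest/v1/`, {
      method: 'HEAD',
      headers: { apikey: SUPABASE_ANON_KEY },
      signal: controller.signal,
    });
    console.log('[supabase] ping_ok', res.status, `${Date.now() - started}ms`);
  } catch (err) {
    // During the incident this is where we log:
    // [supabase] ping_error AbortError: This operation was aborted
    console.error('[supabase] ping_error', err, `${Date.now() - started}ms`);
  } finally {
    clearTimeout(timer);
  }
}
```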
What we’ve ruled out
DB slowness on our side: we added the appropriate indexes (appointments (clinic_id, starts_at), scheduled_messages partial (clinic_id, send_at) WHERE sent=false AND paused=false, patients (clinic_id, updated_at), clinic_professionals (clinic_id), tag_assignments indexes). EXPLAIN ANALYZE shows sub-millisecond execution (e.g., appointments ~0.07 ms, scheduled_messages ~0.04 ms).
Client settings: removed custom undici dispatcher and aggressive keep‑alive; now using default undici fetch with per‑attempt timeout and short retry/backoff; added a lightweight circuit breaker to avoid cascades.
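For reference, a condensed sketch of that client behavior (the helper name, attempt counts, and breaker thresholds below are illustrative, not our production values):

```js
// Default undici fetch + per-attempt timeout + short retry/backoff + small circuit breaker.
const breaker = { failures: 0, lastFailureAt: 0 };
const BREAKER_THRESHOLD = 5;        // consecutive failures before we stop trying
const BREAKER_COOLDOWN_MS = 30_000; // how long we back off once open

async function fetchWithRetry(url, init = {}, { attempts = 3, timeoutMs = 8_000 } = {}) {
  const breakerOpen =
    breaker.failures >= BREAKER_THRESHOLD &&
    Date.now() - breaker.lastFailureAt < BREAKER_COOLDOWN_MS;
  if (breakerOpen) throw new Error('circuit_open: skipping Supabase call');

  let lastErr;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      // Per-attempt timeout; aborted attempts surface as Abort/Timeout errors.
      const res = await fetch(url, { ...init, signal: AbortSignal.timeout(timeoutMs) });
      breaker.failures = 0; // success closes the breaker
      return res;
    } catch (err) {
      lastErr = err;
      breaker.failures += 1;
      breaker.lastFailureAt = Date.now();
      // Short backoff between attempts: 250 ms, 500 ms, ...
      await new Promise((r) => setTimeout(r, 250 * attempt));
    }
  }
  throw lastErr;
}
```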
Suspicion
An infra/network path issue between our Railway egress and Supabase (edge/CDN/PostgREST pool/rate‑limit/NAT). The problem appears only in production and correlates with container uptime, not query plans.
Asks for Railway
1) Share the current egress IP(s) for this service and check if those IPs are hitting any outbound throttling/connection limits/timeouts to jnprhvhkxggvqowmwppf.supabase.co:443.
2) Look for signs of NAT/egress gateway idle-socket recycling or per-destination concurrency limits after ~10 minutes of uptime.
3) Check DNS resolution/route health from our node to that hostname around the provided timestamps; any spikes in TLS handshake failures or SYN timeouts?
4) Confirm if there’s any shared egress policy that could intermittently impact long‑lived services, and whether a static egress IP or different egress pool/region would help.
5) Provide recommended best practices for undici/Node networking on Railway in this scenario (keep‑alive expectations, retry patterns).
Notes
The same code and region works fine against our staging Supabase project.
Repo (if helpful): https://github.com/AndreBarros2/Adros-SaaS
Happy to provide additional logs (full lines with URLs/timestamps) or run targeted probes you suggest.
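For example, a probe along these lines could be run from inside the container on request (plain Node, hostname as above; the timeout value is arbitrary):

```js
// Illustrative probe: resolve the Supabase hostname, then time a TLS handshake to :443.
import { resolve4 } from 'node:dns/promises';
import tls from 'node:tls';

const HOST = 'jnprhvhkxggvqowmwppf.supabase.co';

async function probe() {
  const t0 = Date.now();
  const addrs = await resolve4(HOST);
  console.log('dns_ms', Date.now() - t0, 'addrs', addrs);

  const t1 = Date.now();
  await new Promise((resolve, reject) => {
    const socket = tls.connect({ host: HOST, port: 443, servername: HOST, timeout: 10_000 }, () => {
      console.log('tls_handshake_ms', Date.now() - t1);
      socket.end();
      resolve();
    });
    socket.on('timeout', () => { socket.destroy(); reject(new Error('tls_connect_timeout')); });
    socket.on('error', reject);
  });
}

probe().catch((err) => console.error('probe_error', err.message));
```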
2 Replies
3 months ago
Hey there! We've found the following might help you get unblocked faster:
🧵 Subject: Connection Timeout with smtp.gmail.com on Port 465
🧵 Node.js Express backend where the application doesn't seem to detect Supabase credentials
If you find the answer from one of these, please let us know by solving the thread!
3 months ago
Apologies, but this looks like an issue with application-level code. Due to volume, we can only answer platform-level issues.
I've made this thread public so that the community might be able to help with your query.
Status changed to Awaiting User Response Railway • 3 months ago
3 months ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open jake • 3 months ago
Status changed to Solved andrebarros2 • 3 months ago