Failed to load resource: net::ERR_FAILED

keithmacinnis

HOBBYOP

4 months ago

My login/registration endpoints are often failing. This morning they failed at 7:00am and at 7:55am, then worked at 8:00am.

7:00am error: baby-manager-production.up.railway.app/api/admin/auth/verify-password:1 Failed to load resource: net::ERR_CONNECTION_TIMED_OUT

7:55am error: ui-vendor-BSqyCeUj.js:1 POST https://baby-manager-production.up.railway.app/api/admin/auth/verify-password net::ERR_NAME_NOT_RESOLVED

The server logs don't show anything for these requests.

$10 Bounty

8 Replies

darseen

HOBBYTop 1% Contributor

4 months ago

Maybe your service is sleeping when inactive, which would cause the connection timeout error. Can you check in your service settings that the Enable Serverless option is turned off?

keithmacinnis

HOBBYOP

4 months ago

😕

Attachments

image.png

keithmacinnis

HOBBYOP

4 months ago

Here's what opus thinks:

I've written a comprehensive plan to address the intermittent connection issues. The core problems are:

1. No retry logic - Your frontend fails immediately on any network hiccup

2. Railway cold starts - Server sleeps, first request times out while waking

3. No health checks - Railway can't verify server readiness

Before I finalize the plan, I have a question about your Railway setup:

How do you want to keep the Railway server warm to prevent cold starts?

❯ 1. External ping service (Recommended)

Free services like UptimeRobot ping /health every 5 min to prevent sleep

2. Railway always-on

Requires paid Railway plan, guarantees no cold starts

3. Just add retries

Accept cold starts, rely on client retry logic to handle them
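A minimal sketch of the "add retries" idea from the plan above, assuming the frontend calls the API with `fetch`. The names `withRetry`, `backoffDelay`, and `RetryOptions` and the default delays are hypothetical and not taken from the app. Note that `fetch` rejects on network-level failures (DNS or connection errors) but resolves on HTTP error statuses, so only the former are retried here.

```typescript
// Hypothetical helper: names and defaults are illustrative, not from the thread.
type RetryOptions = {
  retries: number;     // extra attempts after the first one
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound for any single delay
};

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** attempt, opts.maxDelayMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation` until it resolves or the retry budget is spent,
// then rethrows the last error.
async function withRetry<T>(
  operation: () => Promise<T>,
  opts: RetryOptions = { retries: 3, baseDelayMs: 500, maxDelayMs: 4000 },
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < opts.retries) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

It could wrap the failing call as `withRetry(() => fetch(url, { method: "POST", body }))`. Retrying a POST is only safe when repeating the request has no side effects, so this is better suited to a password check than to account creation.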

darseen

HOBBYTop 1% Contributor

4 months ago

Can you provide more info about your tech stack, relevant logs, and network config?

keithmacinnis

HOBBYOP

4 months ago

Google Chrome logs:

7:00am: baby-manager-production.up.railway.app/api/admin/auth/verify-password:1 Failed to load resource: net::ERR_CONNECTION_TIMED_OUT

7:55am: baby-manager-production.up.railway.app/api/admin/auth/verify-password:1 Failed to load resource: net::ERR_CONNECTION_TIMED_OUT

8:00am: baby-manager-production.up.railway.app/api/admin/auth/verify-password:1 Failed to load resource: net::ERR_NAME_NOT_RESOLVED

8:05am: success.

There are no server logs that go along with these requests.

Backend: baby-manager-production.up.railway.app (Express + Prisma + PostgreSQL on Railway).

Frontend:

Host: Vercel
App: React 19 + Vite 7
Site: https://www.birdnestfamilies.com (also https://birdnestfamilies.com)
API calls: Direct from browser to https://baby-manager-production.up.railway.app (no Vercel proxy)

darseen

HOBBYTop 1% Contributor

4 months ago

Have you confirmed whether the issue happens to other users or just you? net::ERR_NAME_NOT_RESOLVED can be caused by local DNS/network issues on your machine, especially since the error showed up in Chrome. To rule that out, try opening the website in an incognito tab or flushing your DNS cache from CMD. If that's not the cause, please share your Metrics tab from the time of the incident so we can check CPU/memory usage and see whether the event loop was blocked by some operation, or whether it was an OOM issue.
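One way to test the local-DNS hypothesis above is to resolve the hostname twice from the affected machine: once through the operating system resolver and once through public resolvers. This is a sketch under assumptions: it runs in Node.js, `checkHost`, `attempt`, and `interpret` are hypothetical names, and 1.1.1.1 / 8.8.8.8 are just commonly used public DNS servers. Chrome can use its own DNS settings, so the result is only an approximation of what the browser saw.

```typescript
import { promises as dns } from "node:dns";

type LookupResult =
  | { ok: true; addresses: string[] }
  | { ok: false; code: string };

// Wraps a lookup so failures become data instead of exceptions.
async function attempt(run: () => Promise<string[]>): Promise<LookupResult> {
  try {
    return { ok: true, addresses: await run() };
  } catch (err) {
    const code = (err as { code?: string }).code ?? "UNKNOWN";
    return { ok: false, code };
  }
}

// Interprets the two results. Pure function, so it can be checked offline.
function interpret(system: LookupResult, publicDns: LookupResult): string {
  if (system.ok && publicDns.ok) {
    return "both resolvers answered; DNS looks fine right now";
  }
  if (!system.ok && publicDns.ok) {
    return "only the local resolver failed; suspect the local network or router";
  }
  if (system.ok && !publicDns.ok) {
    return "only the public resolvers failed; they may be blocked or unreachable";
  }
  return "both lookups failed; the machine may be offline or the record missing";
}

async function checkHost(host: string): Promise<string> {
  // dns.lookup goes through the operating system resolver.
  const system = await attempt(async () =>
    (await dns.lookup(host, { all: true })).map((entry) => entry.address),
  );
  // A Resolver with explicit servers bypasses the OS DNS configuration.
  const resolver = new dns.Resolver();
  resolver.setServers(["1.1.1.1", "8.8.8.8"]);
  const publicDns = await attempt(() => resolver.resolve4(host));
  return interpret(system, publicDns);
}
```

Running `checkHost("baby-manager-production.up.railway.app").then(console.log)` while the error is occurring is the useful case; a run after things recover only shows that DNS works at that moment.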

I find it interesting that you split your website and backend across two different providers! You could've used Railway to host your React app along with the backend, benefiting from the private network between services for lower latency and lower egress costs. Or you could host the entire thing on Vercel and build the backend with React Server Components/Server Actions (if your app does not use WebSockets and only relies on RESTful API requests).

keithmacinnis

HOBBYOP

4 months ago

I use a Nokia Home 5G hotspot for my home internet, and that might be the root cause. My concern began after I added tracking to the landing page and noticed a 100% drop-off at the sign-up modal in the registration section, where the first API call is made. I had a successful test using my iPhone's hotspot, and will try that a few more times to make sure.

Vercel has always been my default place to start a front-end app. I am using WebSockets, and I'm also building out an iOS version. It might be a good time to move the React app over here 🙂

darseen

HOBBYTop 1% Contributor

4 months ago

Yeah, well, at this point that's all I can tell about this issue from the provided info. If anything new arises from your testing, I'll definitely look into it. As for your website hosted on Vercel, it's definitely better to move it here: since you're using WebSockets, serverless is not an option for the backend. This way you'll reduce latency and cut egress costs. Good luck!
