App crashes without any valid reason
audacity-london
PROOP

9 days ago

I have been running my application (Next.js + Payload CMS + Postgres) on Railway with 6 vCPU and 12 GB RAM for a long time. Since the last outage I have started to see this issue, though I am not sure the outage is the cause. The app ran for 5 days without any problems, but today I was notified twice within 2 hours that it had crashed.

<--- Last few GCs --->

[20:0x3f76f000] 645930 ms: Scavenge (interleaved) 2035.3 (2075.8) -> 2034.0 (2079.4) MB, pooled: 0 MB, 16.33 / 0.00 ms (average mu = 0.266, current mu = 0.242) allocation failure;
[20:0x3f76f000] 648726 ms: Mark-Compact 2039.1 (2081.6) -> 2034.0 (2081.7) MB, pooled: 0 MB, 2781.70 / 0.01 ms (average mu = 0.081, current mu = 0.028) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

----- Native stack trace -----

1: 0xe46bbe node::OOMErrorHandler(char const*, v8::OOMDetails const&) [next-server (v16.2.6)]
2: 0x1243640 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [next-server (v16.2.6)]
3: 0x1243917 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [next-server (v16.2.6)]
4: 0x1472825 [next-server (v16.2.6)]
5: 0x1472853 [next-server (v16.2.6)]
6: 0x148b92a [next-server (v16.2.6)]
7: 0x148eaf8 [next-server (v16.2.6)]
8: 0x1cf7681 [next-server (v16.2.6)]

Aborted

It stayed crashed until now; I just restarted the app and it is working again. But this has happened twice today.

Why could this be happening, and how do we fix it? As far as I can see, resource usage was not hitting the limits.

image.png

image.png

$20 Bounty

51 Replies

Railway
BOT

9 days ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open Railway 9 days ago


0x5b62656e5d

Try adding `NODE_OPTIONS=--max-old-space-size=8192` into your service variables and redeploy. If you're using a Dockerfile, make sure to add ARG/ENV statements.


mansoorahmad653
FREETop 5% Contributor

9 days ago

Check whether your Node heap limit is too low.


audacity-london
PROOP

9 days ago

I set it to 4096 to begin with; it was probably 2048 before. If that doesn't help, I will increase it to 8192. I am using Railpack for the build, which is why I only added it to my service variables and deployed.

How much would this increase my cost? And how can I see the exact current limit for Node?


audacity-london
PROOP

9 days ago

Thank you, that looks like the reason. Is there a way to check the current heap limit of my Node process?


mansoorahmad653
FREETop 5% Contributor

9 days ago

Something like this:

console.log(require('v8').getHeapStatistics().heap_size_limit / 1024 / 1024) // heap limit in MB

Or set it explicitly:

NODE_OPTIONS=--max-old-space-size=8192
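A slightly fuller version of that check, as a minimal sketch: it reports the heap limit V8 is actually using, how much of it is in use, and the value requested through `NODE_OPTIONS`, if any. The helper names `parseMaxOldSpace` and `heapReport` are made up for illustration. The reported limit may differ somewhat from the flag value, since the flag only sizes the old generation.

```javascript
// heap-check.js: report the V8 heap limit and current usage for this process.
const v8 = require('node:v8');

const MB = 1024 * 1024;

// Extract the value of --max-old-space-size from a NODE_OPTIONS string, if present.
function parseMaxOldSpace(nodeOptions = '') {
  const match = /--max[-_]old[-_]space[-_]size=(\d+)/.exec(nodeOptions);
  return match ? Number(match[1]) : null;
}

// Collect the limit, current usage, and the requested flag value in one object.
function heapReport() {
  const stats = v8.getHeapStatistics();
  return {
    limitMB: Math.round(stats.heap_size_limit / MB),
    usedMB: Math.round(stats.used_heap_size / MB),
    usedRatio: stats.used_heap_size / stats.heap_size_limit,
    requestedMB: parseMaxOldSpace(process.env.NODE_OPTIONS),
  };
}

console.log(heapReport());
```

Logging this once at startup should show whether the service variable was picked up. A `usedRatio` that keeps climbing toward 1 under normal traffic would point to a leak rather than a limit that is simply too low.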


mansoorahmad653
FREETop 5% Contributor

9 days ago

It is mostly set through environment variables.


mansoorahmad653
FREETop 5% Contributor

9 days ago

check my msg


audacity-london
PROOP

9 days ago

Even after adding this, it returns 499 or 502 for the pages that I don't generate before I deploy. SSR/dynamic rendering is getting stuck. This was not happening before; it started suddenly.


mansoorahmad653
FREETop 5% Contributor

9 days ago

A 502 error is mostly a port misconfiguration.


mansoorahmad653
FREETop 5% Contributor

9 days ago

And I think it is a common error on Railway.


mansoorahmad653
FREETop 5% Contributor

9 days ago

I also get this 502 error page when deploying my project.


audacity-london
PROOP

9 days ago

Maybe we should redeploy the db instead of restarting it?


You can try.


mansoorahmad653
FREETop 5% Contributor

9 days ago

Yes, try to redeploy.


0x5b62656e5d

As mentioned above, make sure the port your URL is mapped to is the same port your application listens on.


mansoorahmad653
FREETop 5% Contributor

9 days ago

Sometimes the port is the common issue.

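To illustrate the port advice above, here is a minimal sketch of a plain Node HTTP server that takes its port from the `PORT` environment variable, so the port the platform routes to and the port the process listens on cannot drift apart. `resolvePort` and `startServer` are hypothetical helpers, and the fallback of 3000 is an assumption for local development. A Next.js app started with `next start` reads `PORT` on its own, so this only shows the principle.

```javascript
const http = require('node:http');

// Pick the port from the environment; fall back to 3000 for local development.
function resolvePort(env = process.env) {
  const port = Number.parseInt(env.PORT ?? '3000', 10);
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    throw new Error(`Invalid PORT value: ${env.PORT}`);
  }
  return port;
}

// Start a trivial server on the given port.
function startServer(port) {
  const server = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('ok');
  });
  // No host argument: Node then listens on all interfaces.
  server.listen(port, () => console.log(`listening on ${port}`));
  return server;
}

// Usage: startServer(resolvePort());
```

If the service's public domain targets a different port than the one logged here, the proxy has nothing to connect to and responds with a 502.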

audacity-london
PROOP

9 days ago

The issue is that my system was working without any problems for months, and right now any page that needs a DB connection is failing.

But during redeployment it works as expected and is able to generate pages using the db connection.


Is the issue solved then?


mansoorahmad653
FREETop 5% Contributor

9 days ago

Is it solved or not?


audacity-london
PROOP

9 days ago

Still not solved. The app is able to render and start in the build phase, but from that moment on any request I make through the private db network fails with a 499 error.

I am trying to deploy with the external db URL; if that works, I will see what I can do.

Even the local development server (localhost:3000) is not able to connect to the external db right now, though. But while creating a build it is able to connect, fetch and deploy.


0x5b62656e5d

Keep in mind that private networking is not available during the build phase. I'd recommend moving anything that needs the private network to the pre-deploy phase.


mansoorahmad653
FREETop 5% Contributor

9 days ago

Which db are you using?


audacity-london
PROOP

9 days ago

I have a build command that uses the external network during the build phase. Currently, even on localhost, I get this:

Error: cannot connect to Postgres. Details: Connection terminated unexpectedly


audacity-london
PROOP

9 days ago

PostgreSQL. The error is: Error: cannot connect to Postgres. Details: Connection terminated unexpectedly


0x5b62656e5d

Are there any errors in your Postgres logs?


audacity-london
PROOP

9 days ago

I am not sharing everything, but I see something like this:

2026-05-31 08:09:32.604 UTC [56] FATAL: connection to client lost

2026-05-31 08:09:32.604 UTC [56] STATEMENT: select "pages"."id", "pages"."full_path", "pages"."slug", "pages"."parent_id", "pages"."meta_image_id", "pages"."meta_canonical_url", "pages"."meta_structured_data", "pages"."meta_twitter_card", "pages"."meta_og_type", "pages"."updated_at", "pages"."created_at", "pages"."_status", "pages__blocks_content"."data" as "_blocks_content", "pages__blocks_fundingSteps"."data" as "_blocks_fundingSteps", "pages__blocks_pageTitle"."data" as "_blocks_pageTitle", "pages__blocks_fourGridImage"."data" as "_blocks_fourGridImage", "pages__blocks_successfulTraders"."data" as "_blocks_successfulTraders", "pages__blocks_pricingTable"."data" as "_blocks_pricingTable", "pages__blocks_globalBlockRef"."data" as "_blocks_globalBlockRef", "pages__blocks_textImageSection"."data" as "_blocks_textImageSection", "pages__blocks_textImageBullet"."data" as "_blocks_textImageBullet", "pages__blocks_faq"."data" as "_blocks_faq", "pages__blocks_reviewRatings"."data" as "_blocks_reviewRatings", "pages__blocks_awardWinningProp"."data" as "_blocks_awardWinningProp", "pages__blocks_imageDivider"."data" as "_blocks_imageDivider", "pages__blocks_doubleCTA"."data" as "_blocks_doubleCTA", "pages__blocks_welcomeBanner"."data" as "_blocks_welcomeBanner", "pages__blocks_competitionBanner"."data" as "_blocks_competitionBanner", "pages__blocks_trustedReviews"."data" as "_blocks_trustedReviews", "pages__blocks_exclusiveBenefits"."data" as "_blocks_exclusiveBenefits", "pages__blocks_payoutCertificates"."data" as "_blocks_payoutCertificates", "pages__blocks_howItWorks"."data" as "_blocks_howItWorks", "pages__blocks_slidingStories"."data" as "_blocks_slidingStories", "pages__blocks_freeTrialBanner"."data" as "_blocks_freeTrialBanner", "pages__blocks_successCountUp"."data" as "_blocks_successCountUp", "pages__blocks_successStoryGrid"."data" as "_blocks_successStoryGrid", "pages__blocks_livePrices"."data" as "_blocks_livePrices", "pages__blocks_competitionHero"."data" as 
"_blocks_competitionHero", "pages__blocks_competitionPrizes"."data" as "_blocks_competitionPrizes", "pages__blocks_competitionTimeline"."data" as "_blocks_competitionTimeline", "pages__blocks_contentSideNav"."data" as "_blocks_contentSideNav", "pages__blocks_contactSupport"."data" as "_blocks_contactSupport", "pages__blocks_reachOutSupport"."data" as "_blocks_reachOutSupport", "pages__blocks_explorePrograms"."data" as "_blocks_explorePrograms", "pages__blocks_aboutUsNumbers"."data" as "_blocks_aboutUsNumbers", "pages__blocks_introduceCompany"."data" as "_blocks_introduceCompany", "pages__blocks_slidingLogos"."data" as "_blocks_slidingLogos", "pages__blocks_ourValues"."data" as "_blocks_ourValues", "pages__blocks_benefitCards"."data" as "_blocks_benefitCards", "pages__blocks_shakingBoxes"."data" as "_blocks_shakingBoxes", "pages__blocks_keyEvents"."data" as "_blocks_keyEvents", "pages__blocks_commissionCards"."data" as "_blocks_commissionCards", "pages__blocks_commisionBanners"."data" as "_blocks_commisionBanners", "pages__blocks_dxTradeHero"."data" as "_blocks_dxTradeHero", "pages__blocks_tradingPlatformsHero"."data" as "_blocks_tradingPlatformsHero", "pages__blocks_tradingPlatformsWhy"."data" as "_blocks_tradingPlatformsWhy", "pages__blocks_tradingPlatformsShowcase"."data" as "_blocks_tradingPlatformsShowcase", "pages__blocks_dxTradeWhyChoose"."data" as "_blocks_dxTradeWhyChoose", "pages__blocks_dxTradeWhyUse"."data" as "_blocks_dxTradeWhyUse", "pages__blocks_comparisonTable"."data" as "_blocks_comparisonTable", "pages__blocks_privacyRequest"."data" as "_blocks_privacyRequest", "pages__blocks_tradingGuidelines"."data" as "_blocks_tradingGuidelines", "pages__blocks_knowledgeCenter"."data" as "_blocks_knowledgeCenter", "pages__blocks_tocCompetition"."data" as "_blocks_tocCompetition", "pages__blocks_scalingPlans"."data" as "_blocks_scalingPlans", "pages__blocks_scalingUpToDate"."data" as

I believe "connection to client lost" is the most critical part?


0x5b62656e5d

Did your service run into OOM during this query? Or is the OOM solved now?


mansoorahmad653
FREETop 5% Contributor

9 days ago

The problem is happening because your app is losing the database connection during runtime, not during build. Postgres shows “connection to client lost,” which means your Node.js app is disconnecting while a query is still running. This usually happens when the server crashes, restarts, or the request takes too long and gets killed (which also explains the 499 errors). Your Payload CMS query is very heavy with many joins and data blocks, so it may be slowing down and triggering timeouts. After the Railway outage, there is likely also some instability with database connections or pooling. Overall, it’s not just memory, but a mix of slow heavy queries, connection pool issues, and requests being cut off before they finish.


mansoorahmad653
FREETop 5% Contributor

9 days ago

The best fix is to make your database connection stable and reduce how heavy your queries are. Right now your app is likely opening too many database connections and running very large Payload CMS queries that take too long to finish. This causes requests to time out and get cut off, which is why you see 499 errors and “connection to client lost” in Postgres. You should use a single shared database pool instead of creating new connections each time, reduce query depth and unnecessary data fetching in Payload, and make sure slow queries are limited or optimized. Once the connections are stable and the queries are lighter, the crashes and disconnections should stop.

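The "single shared database pool" suggestion above can be sketched roughly as follows. This is an illustration under assumptions, not how Payload wires its connections internally: `getSharedPool` and the `app.sharedDbPool` key are invented names, and the `pg` usage is left in comments so the snippet runs without extra packages. With Payload, the Postgres adapter's `pool` option is where equivalent `pg` settings would normally go.

```javascript
// Cache one pool per process on globalThis so repeated module loads reuse it.
const POOL_KEY = Symbol.for('app.sharedDbPool');

// Create the pool on first use and return the cached instance afterwards.
function getSharedPool(createPool) {
  if (!globalThis[POOL_KEY]) {
    globalThis[POOL_KEY] = createPool();
  }
  return globalThis[POOL_KEY];
}

// Hypothetical usage with the `pg` package (not imported here):
//   const { Pool } = require('pg');
//   const pool = getSharedPool(() => new Pool({
//     connectionString: process.env.DATABASE_URI,
//     max: 10,                        // cap concurrent connections
//     idleTimeoutMillis: 30000,       // close idle clients
//     connectionTimeoutMillis: 5000,  // fail fast instead of hanging
//   }));
```

The cap on `max` matters more than the exact number: many workers each opening their own connections can exhaust the server's connection slots even when every individual query is fine.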

audacity-london
PROOP

9 days ago

I can't test, because the app is not deploying now that it can't connect to the db. I believe I really need some professional help from the Railway team. This system has been running through many deployments for the last 5 months, and today it suddenly stopped working. I am trying to use an old db backup from 2 days ago, but it gets stuck at the db connection step.

![image.png](https://station-server.railway.com/attachments/att_01ksyhz13bexzty313p8mw9qpc)


mansoorahmad653
FREETop 5% Contributor

9 days ago

A couple of days ago my deployment got stuck, so I deleted that deployment and redeployed my whole project from scratch, and it worked.


audacity-london
PROOP

9 days ago

It's a company website; unfortunately, I cannot do that.


mansoorahmad653
FREETop 5% Contributor

9 days ago

If you want, try setting everything up again from scratch.


mansoorahmad653
FREETop 5% Contributor

9 days ago

ohhh


0x5b62656e5d

I'd try running `pg_dump` on your Postgres service via the console (Enable priority boarding in https://railway.com/account/feature-flags), and download the dump. Then, create a new environment and deploy your services there and see if it works. You can then use the console again on the newly created Postgres service to restore the database with `pg_restore`. Keep in mind that this will cause a spike in usage costs as you are deploying a new copy of everything.


mansoorahmad653
FREETop 5% Contributor

9 days ago

One of the issues is that your app is trying to pre-build a lot of pages from the database, but the database queries are too slow or too heavy. Because of that, Next.js gives up after 60 seconds per page and retries, which eventually breaks the build process.

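For anyone who prefers to script the `pg_dump` / `pg_restore` round trip mentioned above instead of typing it into the console, a rough Node sketch follows. It assumes both binaries are on `PATH` and that the source and target connection strings are supplied by the caller. The helper names and the `backup.dump` file name are illustrative only.

```javascript
const { execFileSync } = require('node:child_process');

// Arguments for a custom-format dump that omits ownership statements.
function dumpArgs(sourceUrl, file) {
  return ['--format=custom', '--no-owner', `--file=${file}`, `--dbname=${sourceUrl}`];
}

// Arguments for restoring that archive, dropping existing objects first.
function restoreArgs(targetUrl, file) {
  return ['--clean', '--if-exists', '--no-owner', `--dbname=${targetUrl}`, file];
}

// Run the dump, then the restore; either call throws on a non-zero exit code.
function copyDatabase(sourceUrl, targetUrl, file = 'backup.dump') {
  execFileSync('pg_dump', dumpArgs(sourceUrl, file), { stdio: 'inherit' });
  execFileSync('pg_restore', restoreArgs(targetUrl, file), { stdio: 'inherit' });
}

// Usage: copyDatabase(process.env.SOURCE_DATABASE_URL, process.env.TARGET_DATABASE_URL);
```

A logical dump like this is independent of the on-disk format, so it can be restored into a server of a newer major version, unlike a copied data directory.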

audacity-london
PROOP

9 days ago

I tried to use an old copy, and it throws this. What can I do about it?

May 31, 2026, 11:34 AM

Mounting volume on: /var/lib/containers/railwayapp/bind-mounts/1b3210fb-5b06-44cb-867d-ed80048a997d/vol_c0h6dsv6zdq9rblo

Starting Container

wrapper: removing stale /var/lib/postgresql/data/pgdata/postmaster.pid (no postgres running at container start)

Certificate will not expire

pgbackrest: volume 878 MiB; sized wal-drop=87 MiB queue-max=439 MiB

pgbackrest: restore-gate WAL_RECOVER_FROM_BUCKET= POSTGRES_RECOVERY_TARGET_TIME= PG_VERSION=present PG_CONTROL=present RESTORED_MARKER=missing PGDATA=/var/lib/postgresql/data/pgdata

PostgreSQL Database directory appears to contain a database; Skipping initialization

2026-05-31 08:35:41.057 UTC [16] FATAL: database files are incompatible with server

2026-05-31 08:35:41.057 UTC [16] DETAIL: The data directory was initialized by PostgreSQL version 17, which is not compatible with this version 18.4 (Debian 18.4-1.pgdg13+1).

Mounting volume on: /var/lib/containers/railwayapp/bind-mounts/1b3210fb-5b06-44cb-867d-ed80048a997d/vol_c0h6dsv6zdq9rblo

2026-05-31 08:35:42.887 UTC [15] FATAL: database files are incompatible with server

2026-05-31 08:35:42.887 UTC [15] DETAIL: The data directory was initialized by PostgreSQL version 17, which is not compatible with this version 18.4 (Debian 18.4-1.pgdg13+1).

Certificate will not expire

pgbackrest: volume 878 MiB; sized wal-drop=87 MiB queue-max=439 MiB

pgbackrest: restore-gate WAL_RECOVER_FROM_BUCKET= POSTGRES_RECOVERY_TARGET_TIME= PG_VERSION=present PG_CONTROL=present RESTORED_MARKER=missing PGDATA=/var/lib/postgresql/data/pgdata

PostgreSQL Database directory appears to contain a database; Skipping initialization

Mounting volume on: /var/lib/containers/railwayapp/bind-mounts/1b3210fb-5b06-44cb-867d-ed80048a997d/vol_c0h6dsv6zdq9rblo

Certificate will not expire

pgbackrest: volume 878 MiB; sized wal-drop=87 MiB queue-max=439 MiB

pgbackrest: restore-gate WAL_RECOVER_FROM_BUCKET= POSTGRES_RECOVERY_TARGET_TIME= PG_VERSION=present PG_CONTROL=present RESTORED_MARKER=missing PGDATA=/var/lib/postgresql/data/pgdata

PostgreSQL Database directory appears to contain a database; Skipping initialization


audacity-london
PROOP

9 days ago

When I remove SSR rendering, the website does not open at all, since it depends on the database and there is no SSR page.

My db is not responding, please help!


Click on your database, go to settings, edit the source image, and change the number to 17. Make sure to disable auto updates as well.
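The "database files are incompatible with server" error in the log above comes from the major version recorded in the data directory's `PG_VERSION` file not matching the server binary. A small sketch of that comparison, with hypothetical helper names, could look like this:

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Read the major version recorded in the data directory's PG_VERSION file.
function dataDirMajorVersion(pgdata) {
  return fs.readFileSync(path.join(pgdata, 'PG_VERSION'), 'utf8').trim();
}

// Compare it with the major part of the server version, e.g. "18.4" -> "18".
function isCompatible(pgdata, serverVersion) {
  const serverMajor = String(serverVersion).split('.')[0];
  return dataDirMajorVersion(pgdata) === serverMajor;
}

// Usage: isCompatible('/var/lib/postgresql/data/pgdata', '18.4')
```

In the case above the file holds 17 while the image runs 18.4, which is why pinning the image to 17 lets the existing volume start again.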


audacity-london
PROOP

9 days ago

I'm stuck here, so I'm going to redeploy. When I make calls from my terminal I can read the db, but my app can't. How is this possible? There is literally no code change.

image.png


0x5b62656e5d

Did you change the image version to 17?


audacity-london
PROOP

9 days ago

Yes, and I redeployed. The db is up now; should I try to connect?


Yes...


audacity-london
PROOP

9 days ago

I started using the new db in both the build phase and the runtime phase.

ERROR: Error: cannot connect to Postgres. Details: password authentication failed for user "postgres"

In build phase it worked as expected:

DATABASE_URI=postgresql://postgres:[HIDINGONPURPOSE]@zephyr.proxy.rlwy.net:10644/railway pnpm run build

30s

payload-test@0.1.0 build /app

next build

▲ Next.js 16.2.6 (Turbopack)

  • Experiments (use with caution):

    · serverActions

⚠ The "middleware" file convention is deprecated. Please use "proxy" instead. Learn more: https://nextjs.org/docs/messages/middleware-to-proxy

Creating an optimized production build ...

✓ Compiled successfully in 16.3s

Running TypeScript ...

Finished TypeScript in 9.6s ...

Collecting page data using 19 workers ...

Generating static pages using 19 workers (0/4) ...

Generating static pages using 19 workers (1/4)

Generating static pages using 19 workers (2/4)

Generating static pages using 19 workers (3/4)

✓ Generating static pages using 19 workers (4/4) in 368ms

Finalizing page optimization ...

But at runtime it throws Error: cannot connect to Postgres. Details: password authentication failed for user "postgres"

Why could this be happening? I am using the new db we created together.

image.png


audacity-london
PROOP

9 days ago

I used the external URL for both, just to make it work for now. I was planning to switch to the private networking db URL afterwards, but it works during the build and not at runtime.
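When the build connects but the runtime gets "password authentication failed", one thing worth ruling out is that the two phases see different connection strings (the build above passes `DATABASE_URI` inline on the command line). A minimal sketch for logging what the running process actually received, without printing the password, follows. `describeDbUrl` is a made-up helper, and reading `DATABASE_URI` at runtime is an assumption based on the build command shown earlier.

```javascript
// Summarise a Postgres connection string without exposing the password.
function describeDbUrl(raw) {
  const url = new URL(raw);
  return {
    user: decodeURIComponent(url.username),
    host: url.hostname,
    port: url.port || '5432',
    database: url.pathname.replace(/^\//, ''),
    passwordLength: decodeURIComponent(url.password).length,
  };
}

// Log the summary at startup if the variable is set.
if (process.env.DATABASE_URI) {
  console.log('runtime DATABASE_URI:', describeDbUrl(process.env.DATABASE_URI));
}
```

Comparing the host, user, and password length printed at runtime with the values used in the build command would show quickly whether a stale service variable is involved.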


0x5b62656e5d

Is your Postgres throwing the authentication error in its logs?


audacity-london
PROOP

9 days ago

Yes, can see this:

2026-05-31 08:58:19.580 UTC [2681] FATAL: password authentication failed for user "postgres"

2026-05-31 08:58:19.580 UTC [2681] DETAIL: Connection matched file "/var/lib/postgresql/data/pgdata/pg_hba.conf" line 128: "host all all all scram-sha-256"

2026-05-31 08:58:19.608 UTC [2682] FATAL: password authentication failed for user "postgres"

2026-05-31 08:58:19.608 UTC [2682] DETAIL: Connection matched file "/var/lib/postgresql/data/pgdata/pg_hba.conf" line 128: "host all all all scram-sha-256"

2026-05-31 08:58:20.469 UTC [2684] FATAL: password authentication failed for user "postgres"

2026-05-31 08:58:20.469 UTC [2684] DETAIL: Connection matched file "/var/lib/postgresql/data/pgdata/pg_hba.conf" line 128: "host all all all scram-sha-256"

2026-05-31 08:58:20.504 UTC [2685] DETAIL: Connection matched file "/var/lib/postgresql/data/pgdata/pg_hba.conf" line 128: "host all all all scram-sha-256"


Try this:

  1. Disable all public networking on the database if you have any, as the following steps will disable user authentication
  2. SSH into your database service (right click your service and select Copy SSH Command)
  3. Run this command: `sed -i 's/host all all all scram-sha-256/host all all ::\/0 trust/' /var/lib/postgresql/data/pgdata/pg_hba.conf` (This will bypass user authentication)
  4. Redeploy your database
  5. SSH again, and run the command `psql`
  6. Run `ALTER USER postgres with password '<PASSWORD>';` where `<PASSWORD>` is the value of the variable `PGPASSWORD` in your Railway dashboard
  7. Type `exit`
  8. Run `sed -i 's/host all all ::\/0 trust/host all all all scram-sha-256/' /var/lib/postgresql/data/pgdata/pg_hba.conf` (This will re-enable user authentication)
  9. Redeploy your database
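To make clear what the two `sed` substitutions in the steps above do to `pg_hba.conf`, here is the same text transformation as a small pure function. It works on a string rather than the real file, and the names `swapRule`, `disableAuth` and `enableAuth` are illustrative only.

```javascript
const SCRAM_RULE = 'host all all all scram-sha-256';
const TRUST_RULE = 'host all all ::/0 trust';

// Replace the first occurrence of `from` on each line with `to`, like sed 's/from/to/'.
function swapRule(confText, from, to) {
  return confText
    .split('\n')
    .map((line) => line.replace(from, to))
    .join('\n');
}

// Step 3: accept connections without a password (temporary).
const disableAuth = (conf) => swapRule(conf, SCRAM_RULE, TRUST_RULE);

// Step 8: restore password authentication.
const enableAuth = (conf) => swapRule(conf, TRUST_RULE, SCRAM_RULE);
```

Applying `enableAuth` after `disableAuth` returns the original text, which is the property that makes the temporary bypass safe to undo.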

audacity-london
PROOP

8 days ago

I did that, and it is still not connecting to the db after the build phase. Runtime is failing even though there is no change in the code. I will try rolling back to the commit from 4 days ago with the db from 4 days ago and update you.


audacity-london
PROOP

8 days ago

It is back to normal with the new db service we created; I just mounted our db to that db service. What could have caused that?


Status changed to Open passos 5 days ago

