I am trying not to run to support for every error, but it seems like I have exhausted everything I can think of on this one. I had the deployment working prior to the Custom Domain being added (DNS through Cloudflare). Here are the steps I took already:

* PORT is respected and configured to use the ENV variable
* HOST is configured to use 0.0.0.0
* I have disabled CORS on my machine to allow all sources for the time being
* I have disabled all Proxy on the Cloudflare setup
* I have followed all guidance I can find on related errors

Can't for the life of me understand what I am missing. I am sure it is dumb and obvious, but it is not to me at this point.

All 502 Errors after adding Custom Domain

Status changed to Open Railway • about 2 months ago

ilyass012

FREE

2 months ago

hey,

one thing that's easy to miss: the port tied to your custom domain in railway might not match the one your app is actually running on. go to your service settings, remove the custom domain completely, then re-add it and make sure you select the correct port when doing so. also double check that the port under your custom domain matches the one under your railway-generated domain

toddcornett

PROOP

2 months ago

Ok, attempting to reduce the complexity of the situation. I first tried your suggestion but same result and issue.

So I removed the Custom Domain just to eliminate any potential issues with that or Cloudflare. Still same issue and results.

ilyass012

FREE

2 months ago

okay, can you share your deploy logs? i think there's usually a clear error in there

toddcornett

PROOP

2 months ago

Not much there:

Attachments

logs.177429...

ilyass012

FREE

2 months ago

so the container starts and kicks off `node dist/src/main` but there's no "listening on port" message after that, which means your app is either crashing silently on startup or never actually binding to the port. can you check your main.ts file and confirm that when you call `app.listen()` you are passing `process.env.PORT` and `0.0.0.0` explicitly? something like `app.listen(process.env.PORT, '0.0.0.0')`. if the port is hardcoded or the host is missing that would explain everything

darseen

HOBBY

Top 1% Contributor

2 months ago

One thing that could possibly be causing this issue is if you mapped a root domain (e.g., yourdomain.com) instead of a subdomain (e.g. www.yourdomain.com or api.yourdomain.com), Cloudflare automatically flattens the CNAME into an A record. If you are trying to use the root domain, switch your setup to a subdomain (like www or api), then add the subdomain in Railway, and create the corresponding CNAME in Cloudflare.

toddcornett

PROOP

2 months ago

I have dropped all of Custom Domain stuff from the configuration and I have confirmed that locally I am seeing logs like this:

15:10 $ node dist/src/main

2026-03-23T22:10:41.607Z info: [NestFactory] Starting Nest application...

2026-03-23T22:10:41.607Z info: [InstanceLoader] DatabaseModule dependencies initialized

2026-03-23T22:10:41.607Z info: [InstanceLoader] QueueModule dependencies initialized

2026-03-23T22:10:41.608Z info: [InstanceLoader] BullBoardModule dependencies initialized

But I am still getting 502 errors from Railway. Completely baffled now...

toddcornett

PROOP

2 months ago

Also, this is in my main.ts:

    const port = process.env.PORT ? parseInt(process.env.PORT) : 4000;
    const host = `0.0.0.0`;
    await app.listen(port, host);
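
The snippet above falls back to 4000 when PORT is unset. A slightly stricter variant is sketched below; it is an illustration rather than anything taken from this thread, and the `resolvePort` name is hypothetical. It rejects malformed values so that a bad PORT fails loudly at startup instead of silently binding somewhere unexpected.

```typescript
// Hypothetical helper: parse a PORT value strictly, falling back only when it is unset.
function resolvePort(raw: string | undefined, fallback: number = 4000): number {
  if (raw === undefined || raw.trim() === "") {
    return fallback; // PORT not provided, use the local default
  }
  const trimmed = raw.trim();
  const port = Number.parseInt(trimmed, 10);
  // Reject values such as "80abc" (parseInt would accept the "80" prefix) and out-of-range ports.
  if (String(port) !== trimmed || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${JSON.stringify(raw)}`);
  }
  return port;
}
```

In main.ts this could be used as `await app.listen(resolvePort(process.env.PORT), '0.0.0.0')`.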

ilyass012

FREE

2 months ago

your main.ts is fine, but look at the difference: locally you see all the nestjs module initialization logs (DatabaseModule, QueueModule etc.), but on railway the logs stop completely right after `node dist/src/main` with none of that. your app is crashing on railway before it even reaches the initialization phase. can you check your railway service variables? are all the env vars your app needs actually set there (db connection, queue config, etc.)? that silent crash pattern is usually a missing or misconfigured env var on railway
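
One way to make a missing variable visible in the deploy logs is to check for it before anything else starts. The sketch below is only an illustration: the `missingEnvVars` name is hypothetical, and the list of required names has to come from whatever the app actually reads.

```typescript
// Hypothetical helper: return the names of required variables that are unset or blank.
function missingEnvVars(
  required: string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

Called early in startup with `process.env` and a list such as `["REDIS_HOST", "REDIS_PORT", "REDIS_PASSWORD"]`, a non-empty result can be logged with `console.error` before exiting, so the deploy logs name the variable rather than staying silent. It only detects absent values, not wrong ones.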

toddcornett

PROOP

2 months ago

BEFORE locally, it ran but the logs were shortened as you pointed out. Working through some configuration changes that were hijacking the logs and committed them.

NOW the same command works great and dumps the logs locally (both in the Console and in a rotating log file). I have now removed the rotating log file and only maintain the console output yet it still will not show anything of value in the Deploy Logs.

I am so confused because the ENV Vars are all there, the values are populated correctly, and locally it is very chatty so I am stuck on the next steps to debug.

toddcornett

PROOP

2 months ago

I have completely stripped out the Custom Domain at this point to just get it working but I will come back to this when I am ready to add it back in. Appreciate the thought

ilyass012

FREE

2 months ago

quick question: after you committed those changes, did you trigger a new deploy on railway? railway won't pick up new code automatically unless it's connected to your repo and you pushed, or you manually triggered a redeploy. if railway is still running the old build it would explain why the logs are still silent there even though everything works locally now

toddcornett

PROOP

2 months ago

Good thought, I am connected to my GitHub repo so I see the commit message for the changes in the Railway so I know it is pulling the right hash. Though considering the depths of the search, I am going to delete all of the deployments and have it just repull and deploy fresh right now.

toddcornett

PROOP

2 months ago

Unfortunately same thing with this reference ID:

Request ID:

xwxuxhnGSIaY-prmnpoFkQ

toddcornett

PROOP

2 months ago

What I can't figure out is that it was running without an issue last week. I locally updated some packages, including the NestJS framework, then committed them, and this happened (I also tried the Custom Domain around the same time). Very little was done other than that. I had another issue around the watcher package within Railway after the updates as well, so I am thinking it's the updates, but I am not seeing errors showing that.

ilyass012

FREE

2 months ago

okay, that's a key detail: it broke right after the nestjs package updates. since your deploy logs are still silent on railway (no initialization logs) but work fine locally, the update likely introduced something that fails specifically in railway's environment.

two things to check now:

first, go to your http logs tab on railway, find a 502 entry and share what's in the responseDetails and upstreamAddress fields. that will show us exactly why railway's proxy can't reach your app

second, check if your package-lock.json was committed alongside the update. if not, railway might be resolving different package versions than what you have locally, and that could explain why it works locally but not on railway

toddcornett

PROOP

2 months ago

Ok, attached are the logs of one of the 502 errors:

{
    "requestId": "JdZoG7EpR4mlHgvbLPU1MQ",
    "timestamp": "2026-03-23T23:23:12.096621969Z",
    "method": "GET",
    "path": "/docs",
    "host": "mentha-backend-staging.up.railway.app",
    "httpStatus": 502,
    "upstreamProto": "",
    "downstreamProto": "HTTP/2.0",
    "responseDetails": "Retried single replica",
    "totalDuration": 8,
    "upstreamAddress": "",
    "clientUa": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36",
    "upstreamRqDuration": 6,
    "txBytes": 4682,
    "rxBytes": 782,
    "srcIp": "45.26.36.200",
    "edgeRegion": "us-west2",
    "upstreamErrors": "[{\"deploymentInstanceID\":\"8e39ff99-2e7e-434f-b05c-1dc8099b695d\",\"error\":\"connection refused\",\"duration\":1},{\"deploymentInstanceID\":\"8e39ff99-2e7e-434f-b05c-1dc8099b695d\",\"error\":\"connection refused\",\"duration\":3},{\"deploymentInstanceID\":\"8e39ff99-2e7e-434f-b05c-1dc8099b695d\",\"error\":\"connection refused\",\"duration\":2}]"
}

toddcornett

PROOP

2 months ago

I am using pnpm, and I have deleted the pnpm-lock.yaml file and had it regenerated as part of all of this debugging. I just deleted and regenerated it again, and it is exactly the same as what's in the repo.

ilyass012

FREE

2 months ago

upstreamAddress is empty and you're getting connection refused on every retry, so your app is not binding to any port on railway at all, meaning it's crashing before it even reaches `app.listen()`. this matches your deploy logs, which show zero nestjs initialization
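
For anyone reading a similar HTTP log entry: the `upstreamErrors` field is a JSON array encoded as a string, so it needs a second parse. The sketch below shows one way to condense such an entry. The field names are the ones in the log posted earlier in this thread; the function and interface names are hypothetical.

```typescript
// Shape of one retry attempt, as seen in the posted log entry.
interface UpstreamAttempt {
  deploymentInstanceID: string;
  error: string;
  duration: number;
}

interface UpstreamSummary {
  attempts: number;            // how many times the proxy tried the replica
  errors: string[];            // distinct error strings across attempts
  upstreamAddressSet: boolean; // false when upstreamAddress is empty
}

// Hypothetical helper: condense the fields that matter for diagnosing a 502.
function summarizeUpstreamErrors(entry: {
  upstreamAddress: string;
  upstreamErrors: string;
}): UpstreamSummary {
  // upstreamErrors is itself a JSON document stored as a string.
  const attempts: UpstreamAttempt[] =
    entry.upstreamErrors === "" ? [] : JSON.parse(entry.upstreamErrors);
  const errors = Array.from(new Set(attempts.map((a) => a.error)));
  return {
    attempts: attempts.length,
    errors,
    upstreamAddressSet: entry.upstreamAddress !== "",
  };
}
```

For the entry above this yields three attempts, a single distinct error ("connection refused"), and no upstream address, which is the pattern described in this reply.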

ilyass012

FREE

2 months ago

since this started after the nestjs update and works fine locally, something in the update is causing a silent crash specifically on railway. can you check your railway deploy logs and scroll all the way down? is there absolutely nothing after `node dist/src/main`, not even an error?

ilyass012

FREE

2 months ago

second thing to do is to run `node dist/src/main` locally with your railway env vars set (not your local ones). does it still start up fine? this test will likely reproduce the crash and show the actual error

toddcornett

PROOP

2 months ago

Screenshot of the Deploy Logs

Attachments

Screenshot%...

ilyass012

FREE

2 months ago

okay so the app crashes immediately and silently right after node dist/src/main with zero output. no error, nothing

ilyass012

FREE

2 months ago

given you updated nestjs packages: your dist/ folder might not have been rebuilt after the updates. if you committed the old compiled dist/ to your repo, railway is running stale compiled code that doesn't match your updated packages. make sure your railway build step is actually running `pnpm build` before `pnpm start:prod`. check your `package.json` start:prod script and your railway build command. if railway is skipping the build step it would explain this

toddcornett

PROOP

2 months ago

I followed all of the commands and copied over config for the ENV file to my local machine and the system builds and runs without issue. I am able to connect to the Railway database and the Redis instance. Nothing is making sense at this point. There are no errors on startup and I am getting the log outputs without issue.

toddcornett

PROOP

2 months ago

Also, the package.json is simply running "node dist/src/main" (without anything else), so I can't imagine it is stale builds

ilyass012

FREE

2 months ago

if your start command is just node dist/src/main with no build step, railway is running your committed dist/ folder as-is. is your dist/ folder committed to your repo? if yes, was it rebuilt locally after the nestjs package updates before you committed? if the compiled dist/ was built with the old packages and never rebuilt after the update, railway would be running mismatched compiled code which could cause exactly this silent crash

ilyass012

FREE

2 months ago

is your `dist/` folder committed to your github repo (i'm just asking whether it is in your `.gitignore` or not)?

toddcornett

PROOP

2 months ago

the dist/ is not committed to the repo

ilyass012

FREE

2 months ago

also what does your railway build command look like in your service settings (under settings > deploy)?

toddcornett

PROOP

2 months ago

Attached is the build logs

Attachments

logs.177431...

ilyass012

FREE

2 months ago

since dist/ is not in your repo, railway has to build it before starting. can you go to your railway service settings under the deploy section and tell us what your build command is set to? if there's no build command configured, railway would try to run node dist/src/main but dist/ wouldn't exist yet, which would cause exactly this instant silent crash with no error output

toddcornett

PROOP

2 months ago

Attached is the screenshot of the Build Settings

Attachments

Screenshot%...

ilyass012

FREE

2 months ago

okay, do this, and if it works i will explain what happened hahaha: add `NO_CACHE=1` as an environment variable in your service variables tab, then redeploy. that tells railway to skip all cached layers, including the pnpm install step, and build everything from scratch. once it deploys successfully you can remove that variable again

toddcornett

PROOP

2 months ago

Trying it now

ilyass012

FREE

2 months ago

okay , can you return to me faster bc i need to go to sleep :)

toddcornett

PROOP

2 months ago

Absolutely... it is deploying now ... will respond as soon as the system does

toddcornett

PROOP

2 months ago

Definitely taking longer on the build portion of the deploy (2x)

toddcornett

PROOP

2 months ago

Same thing on the result. The build logs look the same, all calls are getting 502s, and no more logs than before

ilyass012

FREE

2 months ago

ok, so the cache was not the issue. the one thing we haven't seen yet that could explain everything is the railway.json. can you share the contents of that file?

toddcornett

PROOP

2 months ago

JSON below:

{
    "$schema": "https://railway.com/railway.schema.json",
    "build": {
        "builder": "RAILPACK",
        "buildCommand": "pnpm build"
    },
    "deploy": {
        "runtime": "V2",
        "numReplicas": 1,
        "startCommand": "pnpm start:prod",
        "sleepApplication": false,
        "useLegacyStacker": false,
        "ipv6EgressEnabled": false,
        "multiRegionConfig": {
            "us-west2": {
                "numReplicas": 1
            }
        },
        "restartPolicyType": "ON_FAILURE",
        "restartPolicyMaxRetries": 10
    }
}

ilyass012

FREE

2 months ago

did anything change in this railway.json file at the same time? worth checking your git history on that file specifically

toddcornett

PROOP

2 months ago

It has only been added at the beginning. No other changes to the file.

ilyass012

FREE

2 months ago

okay, so we've exhausted everything external: the build is fine, env vars are fine, railway.json is fine. the only thing we haven't been able to see is the actual runtime crash reason, because the process dies silently with zero output

ilyass012

FREE

2 months ago

at this point the crash is happening so early and so silently that we need to force node to surface the error. can you temporarily wrap your bootstrap in main.ts like this:

async function bootstrap() {
  try {
    // your existing code
  } catch (err) {
    console.error('BOOTSTRAP ERROR:', err);
    process.exit(1);
  }
}
bootstrap();

commit and redeploy, if something is throwing during startup this will force it to print the actual error in your deploy logs

toddcornett

PROOP

2 months ago

Trying that now

toddcornett

PROOP

2 months ago

Since you last saw the logs, I added the DB migrations to ensure I wasn't failing at the DB connection on deploy, so those are the additional logs you see here, but as you can see, nothing else:

Attachments

Screenshot%...

ilyass012

FREE

2 months ago

?

toddcornett

PROOP

2 months ago

I added the try/catch block that you requested and just sent the Deploy Logs in another message

ilyass012

FREE

2 months ago

this is actually really useful, your migrations run fine on the first replica, which means your database connection is working. but the second replica that runs the actual app still crashes silently after "Starting Container".

did this deploy include the try/catch change in main.ts? because there's still no error output at all from that replica, which means either the try/catch wasn't in this build yet, or the crash is happening before node even executes your code , which would point to a missing or corrupted file in the built dist/

toddcornett

PROOP

2 months ago

Yes, it included the Try/Catch

ilyass012

FREE

2 months ago

great, the try/catch was there but printed nothing, which means node is crashing before it even reaches your bootstrap function. the crash is happening at the import stage: one of your imports at the top of main.ts, or a module it loads, is throwing or failing to resolve on railway specifically

ilyass012

FREE

2 months ago

can you add this at the very top of main.ts, above all your imports:

process.on('uncaughtException', (err) => {
  console.error('UNCAUGHT EXCEPTION:', err);
  process.exit(1);
});

commit and redeploy. this catches errors that happen before bootstrap runs and will finally show us the actual error message

toddcornett

PROOP

2 months ago

Doing the change and deploying now

toddcornett

PROOP

2 months ago

Unfortunately, same thing ...

Attachments

Screenshot%...

toddcornett

PROOP

2 months ago

And the code was in there:

Attachments

Screenshot%...

ilyass012

FREE

2 months ago

omg, your first import is `./instrument`. in the compiled js, that runs before your uncaughtException handler even gets registered, so if it throws, nothing prints. that's why we're getting zero output

ilyass012

FREE

2 months ago

can you check what's in src/instrument.ts? if it's sentry or opentelemetry, a package version change could be causing it to throw silently on railway. try temporarily commenting out `import './instrument'` and redeploying

ilyass012

FREE

2 months ago

if the app starts up, that's the culprit

toddcornett

PROOP

2 months ago

Got it. It is Sentry ... trying the comment now

ilyass012

FREE

2 months ago

nice , let me know what happens

toddcornett

PROOP

2 months ago

Same error but more logs in the Deploy Logs

Attachments

Screenshot%...

ilyass012

FREE

2 months ago

okay, so commenting out sentry made the app get much further, and we can finally see logs. the punycode warning is harmless. can you scroll down and share what comes after those last lines? the app might have fully started, or crashed further down with an actual error message this time

toddcornett

PROOP

2 months ago

There is nothing more. That is the full log

ilyass012

FREE

2 months ago

okay, sentry was definitely part of the problem. the punycode warning is harmless, but the app is still not reaching nestjs initialization. since commenting out the sentry import helped but didn't fully fix it, is it possible to share the full src/instrument.ts file and the full src/main.ts file? there might be another import, or something sentry-related being imported elsewhere, that's still failing

toddcornett

PROOP

2 months ago

Nothing in the files of any real relevance since it is still in the development configuration (hardened for prod coming later) so here they are:

Attachments

main.txt

instrument.txt

toddcornett

PROOP

2 months ago

Sorry had to change the extensions because it would not allow me to upload JS and TS files

ilyass012

FREE

2 months ago

since both your uncaughtException and bootstrap try/catch produce zero output, the crash is likely an unhandled promise rejection which is a different event entirely. can you add this right below your uncaughtException handler:

process.on('unhandledRejection', (reason) => {
    console.error('UNHANDLED REJECTION:', reason);
    process.exit(1);
});

commit and redeploy. i hope this finally catches and prints whatever is killing your app silently

toddcornett

PROOP

2 months ago

Doing so now.

toddcornett

PROOP

2 months ago

Sidenote: have you seen this level of a problem before?

toddcornett

PROOP

2 months ago

Nothing new in the logs:

Attachments

Screenshot%...

ilyass012

FREE

2 months ago

hahaha yes, i've been a software engineer since 2019 and i'm getting used to this kind of debugging

toddcornett

PROOP

2 months ago

I meant within Railway. Very strange that none of this is happening locally for me.

ilyass012

FREE

2 months ago

is there anything below that? can you scroll down or not?

toddcornett

PROOP

2 months ago

Nothing below that

ilyass012

FREE

2 months ago

i can say confidently that your app isn't crashing, it's hanging. that's why none of your error handlers print anything: there's no error to catch. the process just freezes after the punycode warning and is killed externally after a timeout

ilyass012

FREE

2 months ago

something during module initialization is waiting forever and never resolving. this is usually a network connection that hangs: redis or the database trying to connect eagerly during module load rather than lazily

can you check your redis and queue module configs, specifically whether they try to establish a connection at startup? also can you confirm your redis env vars (REDIS_HOST, REDIS_PORT, REDIS_PASSWORD) are all set correctly in railway? a hanging redis connection during bullmq/queue module initialization would cause this
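
A hang like this produces no exception, so the error handlers added earlier never fire. One way to turn a hang into a visible error is to race the startup step against a timer. The helper below is a minimal sketch under that assumption; the `withTimeout` name and the timeout values are hypothetical and not something from this thread.

```typescript
// Hypothetical helper: reject if `work` has not settled within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} did not finish within ${ms} ms`)),
      ms,
    );
  });
  try {
    // Whichever settles first wins; a hung `work` loses to the timer.
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // avoid keeping the process alive once we are done
  }
}
```

Wrapping the application creation call in main.ts, for example `await withTimeout(NestFactory.create(AppModule), 30000, 'Nest bootstrap')`, would make the existing try/catch print a message when initialization stalls. It does not say which dependency is stuck, only that something is.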

toddcornett

PROOP

2 months ago

That was my suspicion too. I am using the same config locally to connect to the Redis instance and Postgres instance on Railway. I will experiment with what other tooling I have that may be a problem. I appreciate the help and want to let you go to bed. I will update the issue with what I find.

ilyass012

FREE

2 months ago

bullmq uses ioredis under the hood, and by default ioredis only does ipv4 lookups. railway's private network uses ipv6, so bullmq silently hangs trying to connect and never resolves, which is exactly what you're seeing

the fix is to add family: 0 to your bullmq connection config to enable dual stack lookup. in your nestjs bullmq module config it should look like this:

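BullModule.forRoot({
  connection: {
    host: process.env.REDIS_HOST,
    // fall back to the default redis port if REDIS_PORT is unset
    port: parseInt(process.env.REDIS_PORT ?? '6379', 10),
    username: process.env.REDIS_USERNAME,
    password: process.env.REDIS_PASSWORD,
    family: 0, // this is the fix: allow both ipv4 and ipv6 lookups
  },
})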
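
If the service exposes a single connection string instead of separate host, port, and password variables, the same options can be derived from it. The sketch below assumes a `redis://user:password@host:port` URL; the `redisOptionsFromUrl` name and the `REDIS_URL` variable are assumptions for illustration, and only `family: 0` comes from the advice above.

```typescript
// Connection fields matching the config shown above.
interface RedisConnectionOptions {
  host: string;
  port: number;
  username: string | undefined;
  password: string | undefined;
  family: number;
}

// Hypothetical helper: split a redis:// URL into connection options.
function redisOptionsFromUrl(redisUrl: string): RedisConnectionOptions {
  const url = new URL(redisUrl);
  return {
    host: url.hostname,
    // 6379 is the conventional Redis port when the URL omits one.
    port: url.port === "" ? 6379 : Number.parseInt(url.port, 10),
    // Credentials are percent-encoded inside URLs, so decode them.
    username: url.username === "" ? undefined : decodeURIComponent(url.username),
    password: url.password === "" ? undefined : decodeURIComponent(url.password),
    family: 0, // dual-stack lookup, as recommended above
  };
}
```

The result could be passed as the `connection` value in the module config. A malformed URL makes `new URL` throw, which at least surfaces a bad value at startup.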

ilyass012

FREE

2 months ago

and also go to your redis service on railway, check the "connect" tab and make sure you're using the private networking variables (the ones ending in railway.internal) in your backend service's env vars

toddcornett

PROOP

2 months ago

I will give it a shot and check the Redis vars

ilyass012

FREE

2 months ago

okay see you tomorrow, good night

toddcornett

PROOP

2 months ago

UNBELIEVABLE! The entire issue: REDIS_PASSWORD="${{Redis.REDISPASSWORD}}}}", which rendered the password with a trailing '}}', and nothing threw an error or notice at all when it tried to connect. I only noticed it when I deliberately looked at the value filled in on the Variables tab.

I am so sorry for basically wasting your time.
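
A typo like this is easy to miss by eye and could be caught by a small startup check. The sketch below is illustrative only: it flags any variable whose resolved value still contains `${{` or `}}`, which suggests a reference that was not written or expanded cleanly. The `findSuspiciousEnvValues` name is hypothetical.

```typescript
// Hypothetical helper: list variables whose values contain template-reference debris.
function findSuspiciousEnvValues(env: Record<string, string | undefined>): string[] {
  // Matches an unexpanded "${{" opener or a leftover "}}" closer.
  const leftover = /\$\{\{|\}\}/;
  return Object.keys(env)
    .filter((name) => leftover.test(env[name] ?? ""))
    .sort();
}
```

Running it against `process.env` at the top of main.ts and logging only the returned names (never the values, since they may be secrets) would have pointed at REDIS_PASSWORD here. Values that legitimately contain `}}`, such as nested JSON, would be flagged too, so the result is best treated as a warning rather than a hard failure.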