Recurring Deployment Crashes - Chatwoot-K4EO Production Environment
axisor
PRO OP

4 months ago

Hi Railway Support Team,

My deployment has crashed for the second time in the production environment without any changes made to the application.

Error details:

- "multirun: one or more of the provided commands ended abnormally"

- Server was already running (pid: 29, file: /app/tmp/pids/server.pid)

- Rails 7.1.5.2 application

- Deployment: Chatwoot-K4EO in Hermes Project — Clinimed

Concerns:

1. This is happening without any code changes or configuration updates

2. The crashes appear to be environment-related rather than application-related

3. I'm paying for a managed service specifically to avoid having to constantly monitor and restart servers

Could you please investigate what's causing these recurring crashes and provide a solution to ensure deployment stability?

I've attached screenshots of the deploy logs and crash notification.

Thank you,

Carlos

CTO, Axisor Technologies Brasil

Attachments

$10 Bounty

6 Replies

4 months ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open by brody, 4 months ago


bytekeim
PRO

4 months ago

could you please share your current Railway deployment 'Start Command' for the Chatwoot service?


bytekeim
PRO

4 months ago

to fix this, go to ur Chatwoot-K4EO service in Railway. then go to Settings → scroll down to "Custom Start Command"

and set the "Custom Start Command" exactly to this:

multirun "rm -f tmp/pids/server.pid && bin/rails server -b 0.0.0.0 -p $PORT" "bundle exec sidekiq"

let me know if that fixes it for you



axisor
PRO OP

3 months ago

Hi bytekeim,

Thank you for your continued support on this issue.

We've implemented the Custom Start Command exactly as suggested:

multirun "rm -f tmp/pids/server.pid && bin/rails server -b 0.0.0.0 -p $PORT" "bundle exec sidekiq"

Unfortunately, the issue persists. We've also tried the following variations:

1. Using absolute path: rm -f /app/tmp/pids/server.pid

2. Removing and recreating the entire pids directory: rm -rf /app/tmp/pids && mkdir -p /app/tmp/pids

The deployment logs still show:

A server is already running (pid: 29, file: /app/tmp/pids/server.pid).

Exiting

multirun: one or more of the provided commands ended abnormally

Followed by healthcheck failures after 14 retry attempts.

Our current hypothesis:

The PID file seems to be persisting across deployments, possibly due to volume mounting or container restart behavior. The rm -f command may be executing, but the file reappears before Rails starts, or the volume is being mounted after the cleanup command runs.
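To test this, we're considering prepending a small diagnostic to the start command; this is only a sketch, and the function name and output format are our own, not anything Railway provides:

```shell
# debug_pidfile: log whether the pidfile survived into the new container,
# and whether anything is mounted under /app. Our own diagnostic sketch;
# the name and output format are assumptions, not a Railway feature.
debug_pidfile() {
  pidfile="$1"
  if [ -f "$pidfile" ]; then
    echo "pidfile-present pid=$(cat "$pidfile")"
  else
    echo "pidfile-absent"
  fi
  # any volume mounted over /app or /app/tmp would show up here
  grep ' /app' /proc/mounts 2>/dev/null || true
}
```

We'd run `debug_pidfile /app/tmp/pids/server.pid` ahead of the `rm -f` in the start command and read the result off the deploy logs to confirm whether the file is really there at boot.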

We're actively investigating:

- Volume mount timing and persistence behavior

- Alternative start command strategies

- Potential Railway-specific configuration issues

Do you have any insights on:

1. Whether the chatwoot-k4eo-volume might be causing PID file persistence?

2. If there's a pre-deploy cleanup step we should configure?

3. Any Railway-specific Chatwoot deployment best practices?

We appreciate your help and are committed to resolving this to ensure service stability.

Best regards,

Thiago

Axisor Developer Team


bytekeim
PRO

3 months ago

Hey Thiago,

thx for the update. sucks that the custom command didn't fully nail it yet, but I think we're close. from what I've seen in other Railway setups with Chatwoot, yeah, that chatwoot-k4eo-volume is prob the culprit for the PID file sticking around. If it's mounted too broadly like at /app or /app/tmp, it keeps temp files alive across restarts, which messes with the rm command timing. Railway's restarts don't wipe the ephemeral stuff like a full redeploy does, so stale pids hang out.

first off, check ur service settings and tweak the volume mount to just /app/storage – that's where Chatwoot dumps attachments if ur using local storage (make sure ACTIVE_STORAGE_SERVICE=local in env vars).

If ur on cloud like S3, ditch the volume altogether to stop the persistence bs.

for the start command, try this tweak to run the rm before multirun kicks in:

rm -f /app/tmp/pids/server.pid && multirun "bin/rails server -b 0.0.0.0 -p $PORT" "bundle exec sidekiq"

that should clean it sequentially. If u got a Dockerfile, toss the rm in the CMD for the web part too.
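if u go the Dockerfile route, something like this entrypoint sketch should do it (the filename and Dockerfile lines are just placeholders, adapt to ur image):

```shell
# rough sketch: generate a docker-entrypoint.sh (placeholder name) that
# clears the stale pidfile, then hands off to whatever CMD is given
cat > docker-entrypoint.sh <<'EOF'
#!/bin/sh
set -e
rm -f "${PIDFILE:-/app/tmp/pids/server.pid}"  # clear stale pid before boot
exec "$@"                                     # run the real start command as PID 1
EOF
chmod +x docker-entrypoint.sh

# then in the Dockerfile, roughly (placeholder lines, untested):
#   COPY docker-entrypoint.sh /docker-entrypoint.sh
#   ENTRYPOINT ["/docker-entrypoint.sh"]
#   CMD ["multirun", "bin/rails server -b 0.0.0.0 -p $PORT", "bundle exec sidekiq"]
```

the nice part is the cleanup runs inside the same container right before boot, so no timing games with the volume mount.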

long-term, I'd split into separate services: one for web (just bin/rails server -b 0.0.0.0 -p $PORT) and one for worker (bundle exec sidekiq).

share the Postgres/Redis between 'em – cuts down on multirun flakiness and makes restarts smoother.

on pre-deploy cleanup, Railway doesn't have built-in hooks, but if perms are wonky, add a chown in ur entrypoint or something. For Chatwoot best practices on Railway: stick to the official template, set RAILS_ENV=production, use cloud storage to avoid volume headaches, and monitor memory in the dashboard cuz stuff like Gmail IMAP can spike and cause OOM crashes leading to this loop.

lmk if that fixes it or if logs show somethin else


axisor
PRO OP

2 months ago

Hey @bytekeim,

Thanks again for the detailed guidance. Unfortunately, the situation has deteriorated significantly since our last exchange.

Current Status: Critical

The deployment is now crashing constantly - we can't keep it stable even for short periods. Every redeploy attempt results in the same PID file error followed by healthcheck failures. This has moved from "occasional crashes" to "complete service unavailability."

What We've Implemented:

  1. ✅ Changed volume mount to /app/storage only (not /app or /app/tmp)

  2. ✅ Verified ACTIVE_STORAGE_SERVICE=local in environment variables

  3. ✅ Updated start command to:

```bash
rm -f /app/tmp/pids/server.pid && multirun "bin/rails server -b 0.0.0.0 -p $PORT" "bundle exec sidekiq"
```

The Problem Persists:

Despite these changes, the PID file issue continues. Latest crash logs (Jan 8, 2026 06:52:24) show:

```
A server is already running (pid: 41, file: /app/tmp/pids/server.pid).
Exiting
multirun: one or more of the provided commands ended abnormally
```

This leads to healthcheck timeout and deployment failure every single time.

Our Hypothesis:

Even with the volume mount restricted to /app/storage, something in Railway's container lifecycle is preserving or recreating the PID file before Rails can start cleanly. The rm -f command appears to execute, but the stale PID persists or gets regenerated.
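One thing we plan to rule out is PID reuse: if the number in the stale file happens to match some live process in the new container, an existence check alone could still trip. A liveness-aware cleanup sketch we may try (the helper name and return strings are our own, for illustration, not anything from Rails or Railway):

```shell
# cleanup_stale_pid: remove server.pid only when no live process owns that
# PID. Sketch only; helper name and output strings are our own invention.
cleanup_stale_pid() {
  pidfile="$1"
  [ -f "$pidfile" ] || { echo "absent"; return 0; }
  pid=$(cat "$pidfile")
  if kill -0 "$pid" 2>/dev/null; then
    echo "live"           # a real process owns this PID; don't delete blindly
  else
    rm -f "$pidfile"
    echo "removed-stale"  # nothing running under that PID; safe to clear
  fi
}
```

We'd call it as `cleanup_stale_pid /app/tmp/pids/server.pid` at the front of the start command and log the result, so the deploy logs tell us whether the file was absent, stale, or backed by a live process.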

What We Need Help With:

  1. Is there a Railway-specific configuration we're missing that could be causing PID file persistence outside the volume mount?

  2. Should we try the service split approach you mentioned (separate web and worker services) as an emergency measure? We're concerned this might cause additional downtime during migration.

  3. Could this be related to Railway's restart behavior vs full redeploys? Should we be forcing full redeploys instead of restarts?

Business Impact:

Our production customers (Clinimed - occupational health clinic) are actively complaining about service unavailability. Every hour of downtime affects their patient care operations. We need to resolve this urgently or consider migrating away from Railway entirely.

Would you be available for a more in-depth troubleshooting session? We can provide:

  • Full service configuration screenshots

  • Complete deploy logs

  • Environment variables (sanitized)

  • Current Dockerfile (if relevant)

We're committed to getting this stable, but at this point we may need Railway engineering team involvement if this is a platform-level issue.

Thanks for your continued support,

Carlos & Thiago
Axisor Technologies Brasil


Anonymous
HOBBY

2 months ago

Hello Team. I've been having the same problem with my Chatwoot instance for a few weeks now. It crashes unexpectedly without any changes to the deployment variables or the application. I've attached the logs of the issue. Have you found a solution to this problem? I've been following this thread and it looks like no fix has been confirmed yet.

Attachments
