Cannot deploy voice agent - Request to Raise pids.max
ramon
PROOP

9 months ago

Hi Railway team,

TL;DR

Our containerized LiveKit voice agent fails on Railway because the per-container PID ceiling is set to ≈ 1 000. On a 32-vCPU Metal host the agent legitimately needs slightly more than that at boot (32 idle worker processes x 32 Tokio runtime threads = 1 024 PIDs). We’re asking to raise pids.max to a higher value (e.g. 8 192 or unlimited, matching Render/GCP) so the workload can run without panicking.

Issue description

Our voice agent warms N = ceil(CPU cores) subprocesses (using `multiprocessing.forkserver`) to slash cold-start latency. Each subprocess then spins up a Rust/Tokio multi-thread runtime (via livekit-rtc; the default is 1 worker thread per host core). In our case (32 vCPUs), that's 32 forks x 32 threads = 1024 PIDs before the first room is handled.
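To make the arithmetic above concrete, here is a minimal Python sketch of the estimate. The function name `estimated_tasks` and the constant `PID_LIMIT` are made up for illustration, and the count deliberately ignores each process's main thread and any other helper threads, so the real number would be somewhat higher.

```python
import math
from typing import Optional

PID_LIMIT = 1000  # the per-container limit quoted in this thread


def estimated_tasks(cpu_cores: float, workers_per_process: Optional[int] = None) -> int:
    """Rough count of OS threads created by the warm pool alone.

    Assumes one warm subprocess per core and, unless overridden,
    one Tokio worker thread per core inside each subprocess.
    """
    processes = math.ceil(cpu_cores)
    workers = processes if workers_per_process is None else workers_per_process
    return processes * workers


total = estimated_tasks(32)
print(f"estimated threads: {total}, over the limit: {total > PID_LIMIT}")
```

Passing a smaller `workers_per_process` (for example 8) shows how quickly the total drops once the per-process thread count is capped.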

Because Railway caps the number of PIDs at 1000 (as per this thread https://station.railway.com/questions/how-do-i-use-all-cp-us-bdad33f4, which says "No restrictions. We do limit to 1000 PIDs, but that shouldn't be an issue."), our agent crashes immediately after starting.
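For anyone who wants to confirm the limit from inside a container, the following is a sketch that reads the cgroup `pids.max` file. The two paths are the usual cgroup v2 and v1 locations on Linux and are an assumption here; a given platform may mount them elsewhere or hide them entirely. The helper names are hypothetical.

```python
import math
from pathlib import Path
from typing import Optional, Sequence

# Typical locations; assumed, not confirmed for any specific host.
CANDIDATE_PATHS = (
    "/sys/fs/cgroup/pids.max",       # cgroup v2
    "/sys/fs/cgroup/pids/pids.max",  # cgroup v1
)


def parse_pids_max(raw: str) -> float:
    """Parse the file content: an integer, or 'max' meaning unlimited."""
    value = raw.strip()
    return math.inf if value == "max" else int(value)


def read_pids_max(paths: Sequence[str] = CANDIDATE_PATHS) -> Optional[float]:
    """Return the limit from the first readable file, or None if none is found."""
    for path in paths:
        candidate = Path(path)
        if candidate.is_file():
            return parse_pids_max(candidate.read_text())
    return None


print("pids.max:", read_pids_max())
```

A result of `None` only means neither file was found at those paths, not that the container is unlimited.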

Requested intervention

We would like to ask the Railway team to raise the per-container pids.max above 8000 or make it unlimited. Other platforms such as Render or Heroku already allow this (in fact, we tested deploying our agent container there and it worked out of the box).

Alternatively, allow us to set the PID limit per deployment.

In our case this should be safe because each Tokio thread is lightweight (stack = 2 MiB virtual, ~16 KiB RSS until busy) and the agent enforces its own memory/CPU guards (WorkerOptions.job_memory_limit_mb, load shedding), so it won’t oversubscribe the node. Again, other PaaS platforms already allow this. We could migrate, but we don't want to because we really like Railway.

Raising the cap will let us keep low-latency warm pools without rewriting core LiveKit orchestration logic, and keeps Railway at feature parity with other hosts.

Happy to provide any additional metrics, flamegraphs, or run a test container if that helps.

Thanks a ton!

Ramón

Edit: we are currently able to deploy a capped version of our agent because we manually set TOKIO_WORKER_THREADS to stay under pids.max, but it's a hacky solution for something that should work out of the box.
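A rough sketch of that workaround, for reference. `TOKIO_WORKER_THREADS` is the environment variable Tokio reads for its default multi-thread runtime; everything else (the helper name, the 1000 budget, the reserve of 200 PIDs for main threads and other helpers) is an illustrative assumption, not what our deployment literally runs. The variable has to be set before the worker subprocesses start so that they inherit it.

```python
import math
import os


def tokio_workers_for_budget(pid_budget: int, processes: int, reserve: int = 200) -> int:
    """Pick a per-process Tokio worker count that fits under a PID budget.

    `reserve` keeps headroom for each process's main thread and for
    anything else in the container; the value is a guess.
    """
    usable = max(pid_budget - reserve, processes)
    return max(1, usable // processes)


# One warm subprocess per core, as described in the post.
processes = math.ceil(os.cpu_count() or 1)

# Only set a default; an explicit value from the deployment config wins.
os.environ.setdefault(
    "TOKIO_WORKER_THREADS",
    str(tokio_workers_for_budget(1000, processes)),
)
print("TOKIO_WORKER_THREADS =", os.environ["TOKIO_WORKER_THREADS"])
```

With 32 processes and the assumed reserve this yields 25 worker threads per process, which keeps the total well under 1000 at the cost of less parallelism per worker.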

Solved

6 Replies

Railway
BOT

9 months ago

Hello!

We're acknowledging your issue and attaching a ticket to this thread.

We don't have an ETA for it, but our engineering team will take a look and you will be updated as we update the ticket.

Please reply to this thread if you have any questions!


Status changed to Awaiting User Response chandrika 9 months ago


Railway
BOT

9 months ago

🛠️ The internal ticket User requesting to raise max PIDs has been marked as todo.


chandrika
EMPLOYEE

9 months ago

Hi Ramón!

Heard back and the platform team is actually working on Runtime v3 which will not have this limit. We could have you on the beta (this quarter) if you can wait 1.5-2 months max.

They did mention a few notes though:
The PID limit applies to OS threads, so if you're using green threads (Tokio tasks), you shouldn't need as many OS threads. That many OS threads will expose you to a lot of context-switch overhead compared with a pool of 32 worker threads with thousands of green threads (tasks) split between them.

Even with Runtime v3, which we could have you on the beta for, we suggest running with fewer OS threads if you're using an async framework, since at most 32 OS threads can be scheduled on the CPUs at any given time.
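As an illustration of the tasks-versus-threads point, here is a small analogy using Python's asyncio rather than Tokio (so it is not LiveKit's code, and asyncio runs on a single thread instead of a worker pool). It starts 2000 concurrent tasks and compares the OS thread count before and during the run; `handle_job` and `run_jobs` are hypothetical names.

```python
import asyncio
import threading
from typing import Tuple


async def handle_job(job_id: int) -> int:
    """Stand-in for I/O-bound work; yields to the event loop while waiting."""
    await asyncio.sleep(0.01)
    return job_id


async def run_jobs(n: int) -> Tuple[int, int]:
    """Run n tasks concurrently and sample the OS thread count meanwhile."""
    tasks = [asyncio.create_task(handle_job(i)) for i in range(n)]
    threads_while_running = threading.active_count()
    results = await asyncio.gather(*tasks)
    return len(results), threads_while_running


threads_before = threading.active_count()
completed, threads_during = asyncio.run(run_jobs(2000))
print(f"completed {completed} tasks with {threads_during - threads_before} extra OS threads")
```

The tasks are multiplexed by the event loop rather than each getting an OS thread, which is why they do not count against a PID limit the way worker threads do.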


ramon
PROOP

8 months ago

Hi!

Thank you for the quick reply (and apologies for my delay 😅)

I will look into implementing your team's suggestions, but unfortunately I don't think there's much I can do: LiveKit is an external project, and forking it just to tweak the threading is not worth it.

I would love to be on the Runtime V3 beta (would it be possible to upgrade just one environment so I don't expose my production traffic to beta infrastructure?).

Best Regards!


Status changed to Awaiting Railway Response Railway 9 months ago


Hey Ramon,

Giving you an update here. We have the new runtime live for builds on the platform. We're still a ways out from getting this into your hands, but we're planning to do so, hopefully by the end of the month. I will send it your way as soon as we're able to.


Status changed to Awaiting User Response Railway 9 months ago


Railway
BOT

6 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 6 months ago


Railway
BOT

a month ago

❌ The ticket Increased maximum process limits has been marked as canceled.

