audio generation
gusolive54
PRO · OP

7 days ago

Hi Railway team,

I have a web app hosted on Railway (brainsharpener.co.uk) which generates audio dynamically.

The developer has noticed that locally the first audio generation request takes around 3–5 seconds, but on the live Railway deployment the first request is often taking around 7–10+ seconds.

Subsequent audio requests are much faster.

We suspect this may be related to cold starts, instance wake-up time, or current server resource allocation.

Could you please advise:

  • whether the service is likely sleeping between requests
  • whether there are settings to reduce cold-start latency
  • whether changing deployment settings would help
  • and whether there are best practices for reducing first-request response times for audio generation workloads on Railway

Thank you.

$20 Bounty

5 Replies



Is serverless enabled? (You can check through service settings > scroll down)


7 days ago

I think the local vs Railway timing difference might actually be a bit of a red herring, because the behaviour seems tied to how the audio is being generated rather than just server startup time. I can consistently reproduce the issue you've mentioned even without any long cooldown between requests, which makes me think the containers probably aren’t spinning down between plays. But that's only an assumption because I've only seen the frontend!

I did some quick testing here:

https://www.brainsharpener.co.uk/search?q=Sacrifice

What I noticed:

  1. Click play on any audio
  2. It takes a while to “prepare audio”
  3. Once the first audio starts playing, the next couple load almost instantly

There’s also an option to skip to the next audio while one is already playing, and those load immediately too... but after a few tracks it buffers again. And then the next couple load almost instantly again.

That makes me think the app is pre-generating the next few audio files in advance so playlist navigation feels instant?

The downside is that the very first click becomes slow because it’s generating multiple audio files that the user hasn’t actually requested yet.

Might be worth changing the behaviour to something like:

  1. User clicks audio --> cancel any ongoing generations and generate only that audio first
  2. Once playback starts, begin generating the next audio in the background
  3. If the user keeps listening/skipping through the playlist, then pre-generate the next 2–3 items

That would probably make the first interaction feel much faster while still keeping the smooth playlist behaviour.
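The three steps above could be sketched roughly as follows. This is only an illustration under assumptions, since the app's actual code isn't visible from the frontend: `PlaylistPrefetcher` is a made-up name, and `generate` stands in for whatever coroutine the app uses to produce one audio file.

```python
import asyncio


class PlaylistPrefetcher:
    """Generate the clicked track first, then prefetch upcoming ones."""

    def __init__(self, generate, lookahead=2):
        self._generate = generate      # hypothetical coroutine: track_id -> audio
        self._lookahead = lookahead    # how many upcoming tracks to pre-generate
        self._cache = {}               # track_id -> finished audio
        self._prefetch_task = None

    async def play(self, playlist, index):
        # Step 1: cancel any background generation so it cannot compete
        # with the track the user actually asked for.
        if self._prefetch_task is not None and not self._prefetch_task.done():
            self._prefetch_task.cancel()
        track = playlist[index]
        if track not in self._cache:
            self._cache[track] = await self._generate(track)
        audio = self._cache[track]
        # Steps 2-3: only now start generating the next few tracks.
        upcoming = playlist[index + 1 : index + 1 + self._lookahead]
        self._prefetch_task = asyncio.create_task(self._prefetch(upcoming))
        return audio

    async def _prefetch(self, tracks):
        for track in tracks:
            if track not in self._cache:
                self._cache[track] = await self._generate(track)
```

With this shape, `await prefetcher.play(playlist, 0)` returns as soon as the first file is ready, and the lookahead work only starts afterwards.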


dantor22
PRO · Top 10% Contributor

7 days ago

▎ The 4-5s gap between local and Railway on the first request, with later requests back to normal, is a pretty textbook signature of cold initialization happening lazily on the first request instead of at boot. Two layers contribute and you can fix both:

▎ Layer 1 — Container sleep (only if applicable)

▎ If your service has App Sleeping enabled (Settings → "App Sleeping" / serverless toggle on Hobby plans), the container is fully suspended after idle time, so the first request has to wake it up. Disable it if you're on a plan that allows always-on. That's what the moderator was hinting at, so check that toggle first.

▎ But sleep alone doesn't explain a consistent 4-5s extra even on warm services, which brings us to:

▎ Layer 2 — Model / dependency lazy-loading on first request (the real culprit imo)

▎ Locally you usually have the model in the OS page cache from a previous run, the venv warm, and sometimes the dev server already running. On Railway every fresh container starts cold:

▎ - Model weights are read from disk for the first time (Whisper/Bark/XTTS etc. are hundreds of MB to multi-GB)

▎ - torch / transformers / librosa imports happen on the first call if you're importing inside the request handler

▎ - First inference compiles CUDA/MPS kernels where a GPU is used; you're on CPU on Railway, so this matters mainly if you're doing any JIT (torch.compile, ONNX session init)

▎ - First HTTPS call to any external API (ElevenLabs / OpenAI / etc.) eats DNS + TLS handshake

▎ Fix: do a real warmup at startup, not at first request

▎ # at module import time, not inside the route handler
▎ from your_audio_lib import model  # placeholder module name
▎ _ = model.generate("warmup", duration=0.1)  # dummy inference

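▎ If using FastAPI, put it in the app startup event (Flask has no on_event hook, so there you'd run the same calls at module level before the app starts serving):

▎ from fastapi import FastAPI
▎
▎ app = FastAPI()
▎
▎ # on_event still works but is deprecated in recent FastAPI releases in favour of lifespan handlers
▎ @app.on_event("startup")
▎ async def warmup():
▎     load_model()             # placeholder: load weights into memory
▎     run_dummy_inference()    # placeholder: one tiny generation
▎     prewarm_http_clients()   # placeholder: one hit to ElevenLabs/OpenAI if applicable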

▎ Then in Railway add a healthcheck path (Settings → Deploy → Healthcheck Path) that only returns 200 after warmup is done. That way the edge won't route real traffic to a new deployment until the container is actually ready, and your users never see the cold path.
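A framework-agnostic sketch of such a readiness-gated healthcheck might look like this. The function names are illustrative, `load_model` and `run_dummy_inference` are the placeholders from the reply above, and the handler would still need to be wired to whatever route you configure as the healthcheck path.

```python
import threading

_ready = threading.Event()  # set once warmup has finished


def warm_up(load_model, run_dummy_inference):
    """Run the expensive one-time work, then mark the service ready."""
    model = load_model()            # placeholder callable supplied by the app
    run_dummy_inference(model)      # placeholder callable supplied by the app
    _ready.set()
    return model


def healthcheck():
    """Return (status_code, body) for the healthcheck route."""
    if _ready.is_set():
        return 200, "ok"
    return 503, "warming up"
```

Returning 503 until `warm_up` completes is one common convention; any non-200 status should keep the healthcheck from passing.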

▎ Quick way to confirm the diagnosis before you change code:

▎ Hit a /healthz (or any cheap endpoint) right after deploy, then call audio generation. If the gap is still there, it's model/lib loading. If you hit the audio endpoint twice back-to-back right after deploy and only the first is slow, same conclusion. If you wait 15 min idle and the first call is slow again, App Sleeping is also in play.
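One way to run that back-to-back comparison from a script, as a rough sketch: `fetch` and `time_calls` are hypothetical helper names, and the URL you pass in would be your own audio endpoint, which isn't shown in this thread.

```python
import time
import urllib.request


def fetch(url, timeout=60):
    """Download the URL fully and return the number of bytes read."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def time_calls(call, repeats=2):
    """Call `call()` back-to-back and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations
```

For example, `time_calls(lambda: fetch(audio_url))` run right after a deploy, and again after 15 minutes idle, gives the first-versus-second timings described above.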

▎ Re the "pre-generating multiple files" reply above: that's a real pattern, but it would make all requests slow, not just the first. Doesn't match your symptoms.

▎ Move the cold work to startup + add a proper healthcheck and the user-facing latency should drop to local-equivalent.


emms


7 days ago

I came back to this a few hours later and tested again. Looking at Developer Tools, it seems my assumption is correct. I loaded the website and clicked "Play" on the first audio whilst watching Developer Tools (Network), and it won't play until it has successfully generated 3 audio files. So it's always a slow start: instead of generating only the 10 seconds (example) of audio I've requested, it could be generating 60+ seconds (example) of audio before playing those first 10 seconds. This then continues as it pre-generates the next couple of audio files. It's a great concept, but the logic needs to be changed for the first audio generation.

Respond with the first requested passage before pre-generating the next two.
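As a back-of-the-envelope illustration of why the batch matters, here is a tiny model. It assumes the passages are generated one after another, and the 3-second figures are made-up examples, not measurements from the site.

```python
def time_to_first_playback(generation_seconds, batch_size):
    """Seconds until playback can start when the app waits for the first
    `batch_size` passages to finish, generated one after another."""
    return sum(generation_seconds[:batch_size])


# Illustrative numbers only: three passages that each take 3 s to generate.
example = [3.0, 3.0, 3.0]
wait_for_three = time_to_first_playback(example, batch_size=3)  # wait for the whole batch
wait_for_one = time_to_first_playback(example, batch_size=1)    # play after the first passage
```

Under those assumptions the wait before playback scales with the batch size, so returning the first passage alone cuts it to a single generation.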

Screenshot 2026-05-15 202146.png

The reason this likely feels less noticeable locally is that the local development environment probably has significantly more available CPU/RAM resources than the Railway instance being used, so the same generation workload completes faster overall.

Based on the observed behaviour, I don't think much can be done within Railway to significantly improve this specific issue at an infrastructure level. It looks primarily application-side.



dantor22
PRO · Top 10% Contributor

7 days ago

Nice debugging @dhemms, the DevTools check was exactly the right move here. My cold-init theory clearly didn't match the actual symptom once you showed the 3-audio prefetch in the network tab. The CPU/RAM point on top of it ties it together well. Hope OP gets the playback logic refactored; the fix you outlined (play the first passage before pre-generating the next 2) is the right call!

