Highly variable cold start time on serverless mode (1s vs 7s after "Starting Container")
heru
PRO · OP

a month ago

Hi team,

I'm running a Laravel app (FrankenPHP, built from a custom Dockerfile) on serverless mode and noticing significant variability in cold start time when the container wakes up from sleep.

Looking at the deploy logs, I measure the gap between Railway's Starting Container log line and the first stdout line from my container. Across cold starts, this gap ranges from ~1 second to ~7 seconds, on the same deployment with no code changes between observations.

Example (slow case):

```
Apr 17 2026 13:21:04 web Starting Container
Apr 17 2026 13:21:11 web
```
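For anyone comparing numbers across several cold starts, the gap can be computed straight from the two timestamps. This is a minimal sketch that assumes both log lines fall on the same day; the function name `gap_seconds` is made up for illustration.

```shell
# gap_seconds START END
# Difference in seconds between two HH:MM:SS times on the same day.
gap_seconds() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, ":"); split(b, y, ":")
    print (y[1] * 3600 + y[2] * 60 + y[3]) - (x[1] * 3600 + x[2] * 60 + x[3])
  }'
}

gap_seconds 13:21:04 13:21:11   # prints 7
```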

A few things I've checked:

- App-side startup is consistent — same CMD, same image digest, same env.

- The CMD is simple (php artisan config:cache && exec frankenphp run ...); nothing in there should vary by 6 seconds.

- No external network calls happen before the first stdout line.

- Image size is 533 MB, so I don't think image pull alone explains the worst case.

My initial guess was image caching at the host level, but given the modest image size, I suspect the variability might come from somewhere else in the platform (host scheduling, container runtime init, registry latency on cold hosts, etc.).

A few questions:

1. What typically dominates the Starting Container → first-app-log window in serverless mode? Is it image pull, runtime init, network/proxy attach, or something else?

2. Does Railway pin serverless containers to recently-used hosts, or is scheduling effectively random across the fleet? That would explain the bimodal 1s vs 7s pattern.

3. Are there any best practices for cold start on serverless beyond shrinking image size? (e.g., base image choices, layer ordering, registry colocation, keep-alive tricks.)

4. Is there a way to get a more detailed breakdown of what happens during that window, so I can know whether to optimize my container or accept the variability as platform-side?

Thanks!

Solved · $20 Bounty

Pinned Solution

silasmuyembi0-cyber
HOBBY

a month ago

Hey heru,

Ran into almost the exact same bimodal pattern a few weeks back on a Symfony app (also FrankenPHP, custom Dockerfile, serverless). That 1s vs 7s gap drove me nuts for a while so hopefully some of what I found helps.

From what I've seen, the Starting Container → first stdout window on Railway serverless is basically a composite of a few things, and only one of them is really your image:

1. Host scheduling / cold placement: when your container gets scheduled onto a host that hasn't recently pulled your image, you pay the full pull + unpack cost. When it lands on a warm host (image already in the local cache), you skip most of it. That alone explains a big chunk of the bimodal behavior. 533MB isn't huge but it's not tiny either, and the unpack step (not just the pull) is what usually dominates, especially if you have a few fat layers near the top of the image.

2. Runtime init / sandbox setup: there's some platform-side overhead before your ENTRYPOINT even fires. Usually sub-second, but I've seen it spike occasionally. Not much you can do about this one.

3. FrankenPHP warmup itself: php artisan config:cache && exec frankenphp run looks innocent, but config:cache does a full bootstrap of the framework's service container (service providers, config merge, etc.). On a cold filesystem with cold opcache, that can easily be 1–3s on its own depending on how many providers you have. The variance here is lower than you'd think, but it's not zero.
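To see whether a few fat layers are actually the problem, the image can be inspected locally before touching the Dockerfile. A rough sketch, assuming Docker is installed and the image has been built or pulled on your machine; the image name `my-app:latest` and the helper `bytes_to_mb` are placeholders. The size reported here is the unpacked size on disk, so it will not match the compressed size that goes over the wire during a pull.

```shell
# Convert a byte count to whole megabytes (1 MB = 1,000,000 bytes), rounding down.
bytes_to_mb() {
  echo $(( $1 / 1000000 ))
}

# Placeholder image name; override with IMAGE=... when running.
IMAGE="${IMAGE:-my-app:latest}"

if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
  size_bytes=$(docker image inspect --format '{{.Size}}' "$IMAGE")
  echo "total: $(bytes_to_mb "$size_bytes") MB"
  # One line per layer, newest first: size, then the instruction that created it.
  docker history --format '{{.Size}}: {{.CreatedBy}}' "$IMAGE"
else
  echo "image $IMAGE not available locally; build or pull it first" >&2
fi
```

Layers that are large and sit late in the build are the first candidates to move earlier, split, or drop.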

A few things that actually moved the needle for me:

1. Move config:cache (and route:cache, event:cache, view:cache) into the Dockerfile build step, not runtime. There's almost no reason to run them on every cold start; bake them into the image. That alone cut ~1.5s off my worst cases. Just make sure you're not caching anything that depends on runtime env vars that differ between deploys (and if you are, fix that first; it's a footgun anyway).

2. Layer ordering matters more than people say on serverless. Put your vendor/composer install and your app code in separate layers, with vendor earlier. When Railway pulls, layers that haven't changed can theoretically be reused from the host cache across deploys. I also squashed a couple of intermediate layers that were just noise.

3. Slim the image. Yours is 533MB; I got mine from ~480MB down to ~180MB by switching to the frankenphp:-alpine base and being aggressive with multi-stage builds (build stage has composer + dev deps, final stage only copies vendor/ and the built app). Smaller image = faster pull on cold hosts = less variance. This is the single biggest lever for the worst-case tail.

4. Check your .dockerignore. I had node_modules and a .git folder sneaking in on one project, which was adding ~80MB for literally no reason.

5. Healthcheck / first-request readiness: if your "first stdout" is actually FrankenPHP logging that it's listening, then what you're measuring includes FrankenPHP's own boot. You can add an early echo or a log line at the very top of your entrypoint script to separate "container started" from "app ready"; that'll tell you whether the variance is in platform pull/schedule or in your app boot.
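The early log line from the last point could look something like the wrapper below. It is only a sketch: the `[boot]` prefix and the `stamp` and `timed` helpers are invented for illustration, and timestamps are whole seconds to match the resolution of the deploy log lines.

```shell
#!/bin/sh
# Sketch of an entrypoint wrapper; the "[boot]" prefix and helper names are illustrative.
set -eu

# Print a labelled UTC timestamp that can be lined up against "Starting Container".
stamp() {
  printf '[boot] %s %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S')" "$1"
}

# Run a command and report how many whole seconds it took.
timed() {
  label=$1
  shift
  start=$(date +%s)
  "$@"
  end=$(date +%s)
  printf '[boot] %s took %ss\n' "$label" "$((end - start))"
}

# First thing the container does: everything before this line is platform time.
stamp "entrypoint reached"

# Hand off to the real server command, if one was passed as arguments.
if [ "$#" -gt 0 ]; then
  stamp "handing off to: $*"
  exec "$@"
fi
```

If config:cache stays in the runtime path, a line such as `timed config-cache php artisan config:cache` before the hand-off shows how much of the gap it accounts for. A gap that stays large before "entrypoint reached" points at pull/schedule rather than app boot.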

To your specific questions:

1. In my experience, on the 7s tail it's mostly image pull/unpack on a cold host. On the 1s path it's warm host + cached layers and you're basically just paying app boot. The platform init itself is pretty consistent.

2. Yes, I'm pretty sure Railway pins serverless containers to recently-used hosts when possible; that's why you get the bimodal distribution instead of a smooth curve. Cold host = full pull, warm host = near-instant. Not random, it just depends on whether you got lucky.

3. Covered above: the smallest image you can get away with, all caches baked at build time, multi-stage builds, an alpine base, and keep-alive / min instances if Railway offers it on your plan (worth checking; I think they added something like that).

4. For the platform-side variance specifically, I don't think you'll get a line-by-line breakdown from Railway; that part's kind of a black box. But you can infer it: if you time your own entrypoint from line 1 and still see the variance outside your script, then it's pull/schedule. If the variance is inside your script, it's app boot and you can fix it.

Honestly, if you get your image under ~200MB and move caching to build time, I'd bet the 7s tail drops to 2–3s worst case. The remaining variance is just the reality of serverless cold starts on shared infra.

Good luck, curious what you find. Drop an update if you try any of this.

2 Replies

Status changed to Awaiting Railway Response Railway about 1 month ago


Status changed to Open Railway about 1 month ago


heru
PRO · OP

a month ago

Hey! Most of this worked out really well 🙌

Switched to the frankenphp alpine base with proper multi-stage — image went from 533MB all the way down to 175MB. Also did a quick .dockerignore pass and found a few things worth pruning while I was at it.

Cold start is sitting at 1–2s consistently now. Still get the occasional slower one (probably just the cold-host pull like you mentioned), but honestly at this point I don't really care — it's totally fine for what I need.

Really appreciate you taking the time to write all that out, it helped a ton. Marking resolved!


Status changed to Solved brody about 1 month ago

