Postgres crash loop - volume resize not taking effect, production down

alinamichelle

PRO • OP

4 months ago

Postgres is stuck in a crash loop with "FATAL: could not write to file pg_wal/xlogtemp.30: No space left on device" even after resizing volume to 20GB. Volume metrics show only 4GB used of 20GB so the resize worked but new containers are still crashing with the same error. Redeployed 5+ times, same result every time. The volume resize is not being picked up.

Volume ID: vol_5yyrdxpxdpuhc3qa

Project: thriving-communication

Environment: production

Service: Postgres (fbc1c5c7-48c6-48a2-b...)

Need the volume remounted or WAL cleared manually.

Production is completely down.
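
For readers debugging the same "No space left on device" error, here is a minimal sketch of how one might compare the size of the WAL directory with what the filesystem itself reports, run from inside the database container. The data directory path in the usage note is an assumption (it depends on the image and mount configuration), and `wal_report` is a hypothetical helper name, not part of Postgres or Railway tooling. The script only reads sizes; it does not delete or modify WAL files.

```python
import os
import shutil


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except FileNotFoundError:
                # WAL segments can be recycled or removed while we walk.
                pass
    return total


def wal_report(data_dir: str) -> dict:
    """Report pg_wal size next to the filesystem's own total/free figures.

    data_dir is the Postgres data directory (assumed to contain pg_wal).
    """
    wal_dir = os.path.join(data_dir, "pg_wal")
    usage = shutil.disk_usage(data_dir)
    return {
        "wal_bytes": directory_size_bytes(wal_dir),
        "fs_total_bytes": usage.total,
        "fs_free_bytes": usage.free,
    }
```

Usage would look something like `wal_report("/var/lib/postgresql/data")`, where that path is an assumed mount point. If `fs_total_bytes` is well below the size shown in the dashboard, the filesystem has not picked up the resize.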

Solved

14 Replies

alinamichelle

PRO • OP

4 months ago

Still down, production impacted. Volume ID: vol_5yyrdxpxdpuhc3qa

alinamichelle

PRO • OP

3 months ago

Still down. 10+ hours. No response.

chandrika

EMPLOYEE

3 months ago

We can confirm your Postgres service is in a crash state and your application is unable to connect to it. This is a known issue where the volume resize completes at the storage level but the filesystem is not properly expanded, so Postgres still sees the old disk size. This requires manual intervention from our infrastructure team to extend the filesystem. We're escalating this now to get it resolved as quickly as possible.
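
As an illustration of the mismatch described above (storage resized, filesystem not expanded), the following is a rough sketch of a check that could be run inside the container to see what capacity the filesystem actually reports. The function names, the 10% tolerance, and the example mount path are assumptions for illustration only; this is not Railway's internal tooling or their fix.

```python
import shutil

GIB = 1024 ** 3


def filesystem_sees_resize(mount_path: str, expected_gib: float,
                           tolerance: float = 0.10) -> bool:
    """Return True if the filesystem at mount_path reports roughly expected_gib.

    A filesystem usually reports slightly less than the raw volume size
    because of its own metadata, so a tolerance is allowed (assumed 10%).
    """
    total_bytes = shutil.disk_usage(mount_path).total
    return total_bytes >= expected_gib * GIB * (1 - tolerance)


def describe(mount_path: str) -> str:
    """Format total/used/free capacity as seen by the filesystem."""
    usage = shutil.disk_usage(mount_path)
    return (f"{mount_path}: total={usage.total / GIB:.2f} GiB, "
            f"used={usage.used / GIB:.2f} GiB, "
            f"free={usage.free / GIB:.2f} GiB")
```

For the case in this thread, something like `filesystem_sees_resize("/var/lib/postgresql/data", 20)` (path assumed) returning `False` would be consistent with the filesystem still seeing the old disk size even though the dashboard shows 20GB.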

Status changed to Awaiting User Response Railway • 4 months ago

alinamichelle

PRO • OP

3 months ago

thanks for the update!!

Status changed to Awaiting Railway Response Railway • 4 months ago

chandrika

EMPLOYEE

3 months ago

Anytime! And we're still looking into this.

Status changed to Awaiting User Response Railway • 4 months ago

alinamichelle

PRO • OP

3 months ago

Thank you

Status changed to Awaiting Railway Response Railway • 4 months ago

brody

EMPLOYEE

3 months ago

Your volume resize to 20 GB didn't fully apply to the underlying disk, which is why Postgres kept seeing the old size and running out of space. We've corrected this on our end and redeployed your Postgres service. You should now have the full 20 GB available.

We've also shipped a fix to prevent this from happening on future volume resizes.

Status changed to Awaiting User Response Railway • 4 months ago

alinamichelle

PRO • OP

3 months ago

Thank you so much!

Status changed to Awaiting Railway Response Railway • 4 months ago

Status changed to Solved brody • 4 months ago

alinamichelle

PRO • OP

3 months ago

Thanks for the fix, guys. We're running a big data import, and I didn't realize we'd need a bigger volume for it. Just want to confirm it's OK for me to do a live resize to 100GB without it crashing again?

Status changed to Awaiting Railway Response Railway • 4 months ago

chandrika

EMPLOYEE

3 months ago

Yes, the fix we shipped addresses the filesystem extension issue you experienced, so live resizing going forward should work without any crashes. One thing to note: Pro plan volumes can be self-serve resized up to 50GB. For 100GB, we can increase that limit for you on our end - just confirm and we'll get it bumped up.

Status changed to Awaiting User Response Railway • 4 months ago

alinamichelle

PRO • OP

3 months ago

yes please resize to 100gb - thanks again!

Status changed to Awaiting Railway Response Railway • 4 months ago

chandrika

EMPLOYEE

3 months ago

Hey, actually, we recently made some changes to the flow and you may be able to self-serve this already - could you please try "Live Resize" your volume from the dashboard?

Status changed to Awaiting User Response Railway • 4 months ago

alinamichelle

PRO • OP

3 months ago

all set thank you!!

Status changed to Awaiting Railway Response Railway • 4 months ago

chandrika

EMPLOYEE

3 months ago

Anytime, and happy building 🚅 !!

Status changed to Awaiting User Response Railway • 4 months ago

Status changed to Solved chandrika • 4 months ago
