Postgres database crashing due to disk space - Need volume increase immediately
shamanop
PRO · OP

a month ago

Hi Railway team,

My production Postgres database is in a crash loop and my application is completely down. The logs show:

FATAL: could not write to file "pg_wal/xlogtemp.28": No space left on device

Details:

  • Project: VP-CRM

  • Service: Postgres (currently showing "Crashed")

  • Plan: Pro (just upgraded)

  • Account email: [YOUR EMAIL]

The database keeps trying to recover but fails due to no disk space for WAL files. I cannot find volume resize options in the dashboard for managed Postgres.

Request: Please increase my Postgres volume size to 10GB (or whatever is needed) immediately so the database can complete recovery.

This is blocking my production application and affecting customers right now.

Thank you!

Solved · $10 Bounty

Pinned Solution

darseen
HOBBY · Top 1% Contributor

a month ago

Here is a workaround that lets you run the command pg_resetwal -f /var/lib/postgresql/data/pgdata inside your crashing service. Because the Postgres service crashes before you can SSH in, set the Custom Start Command in your service settings to sleep infinity (see the attached image). The container then stays up without starting Postgres.
Now you can right-click your service, copy the SSH command, and connect to it.

I tested this in my workspace: I crashed the db service, set the Start Command, and was able to SSH in this way and run the command.
Keep in mind that you may also need to increase max_wal_size if this doesn't work.

Hope this helps.

10 Replies

brody
EMPLOYEE

a month ago

This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.

Status changed to Open brody about 1 month ago


darseen
HOBBY · Top 1% Contributor

a month ago

Click on the volume and go to its settings; the option to grow the volume is there.
Check the images I attached to see exactly where to find it.


shamanop
PRO · OP

a month ago

Update - Volume resize didn't fix it, need Railway support intervention

Thank you @darseen - I found the volume resize option and successfully increased the volume to 250GB. However, the database is still crashing with the same error:

redo done at 0/26FFFE08 system usage: CPU: user: 0.09 s, system: 0.13 s, elapsed: 2.91 s
FATAL: could not write to file "pg_wal/xlogtemp.29": No space left on device

Critical observation: The recovery actually COMPLETES successfully (redo done) - meaning my data is intact. It only crashes when trying to write the checkpoint afterward.

What I've tried:

  • Resized volume to 250GB (shows 491MB/250GB used - plenty of space)

  • Created a backup (491MB) and attempted restore

  • Tried restoring to multiple different Postgres services

  • All attempts fail with the same WAL space error

SSH into a working Postgres shows:

/dev/zd3392  46G  47M  46G  1% /var/lib/postgresql/data

The mounted volume has space, but pg_wal/xlogtemp.29 appears to be writing to container ephemeral storage, not the mounted volume.

What I need from Railway support:

  1. Run pg_resetwal -f /var/lib/postgresql/data/pgdata on my crashed Postgres volume to clear the corrupted WAL state, OR

  2. Provide shell access during the brief recovery window so I can run it myself, OR

  3. Allow me to download my backup file so I can recover locally

The data IS recoverable - the redo completes. We just need to clear the WAL checkpoint that's failing.

Project: VP-CRM
Service: Postgres
Volume: postgres-2026-01-12 -2026-01-12 17:39 UTC (491MB data)

This is production data for a live business. Customers cannot access quotes. Any help would be greatly appreciated.


brody
EMPLOYEE

a month ago

I'm sorry, but these are unmanaged databases. We cannot provide support for them. I will step back now and let the community continue to assist you.


shamanop
PRO · OP

a month ago

@brody With respect, this is a platform infrastructure issue, not a database administration question.

The problem is that pg_wal is writing to container ephemeral storage instead of the mounted volume - that's a Railway platform configuration issue. I resized my volume to 250GB and it shows plenty of space (491MB/250GB), yet PostgreSQL still fails with "No space left on device."

I'm not asking for help with SQL queries or database optimization. I'm asking for:

  1. Access to my own data that's stored on Railway's infrastructure

  2. Shell access to run a single recovery command (pg_resetwal)

  3. Or simply download my backup file that I created through Railway's backup feature

I'm a paying Pro customer. My production application is down. Customers cannot access their quotes. The data IS recoverable - your own logs show recovery completes successfully before the WAL write fails.

If Railway cannot provide any assistance with accessing data stored on your platform, please escalate this to someone who can help, or refund my Pro subscription so I can migrate to a provider that supports its customers.


shamanop
PRO · OP

a month ago

Is there any way to download the database? The data is recoverable... I just do not have an option to do it.

I am happy to pay whatever bounty to have this issue fixed or have someone help me.


darseen
HOBBY · Top 1% Contributor

a month ago

I have something cooking up to help you fix this. I'm just testing it in my workspace to ensure it works before commenting.


shamanop
PRO · OP

a month ago

I would be so grateful. Thank you so much.


darseen
HOBBY · Top 1% Contributor

a month ago

Here is a workaround that lets you run the command pg_resetwal -f /var/lib/postgresql/data/pgdata inside your crashing service. Because the Postgres service crashes before you can SSH in, set the Custom Start Command in your service settings to sleep infinity (see the attached image). The container then stays up without starting Postgres.
Now you can right-click your service, copy the SSH command, and connect to it.

I tested this in my workspace: I crashed the db service, set the Start Command, and was able to SSH in this way and run the command.
Keep in mind that you may also need to increase max_wal_size if this doesn't work.
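
For anyone following along, here is a minimal sketch of raising max_wal_size while the server is down. With Postgres not running, the setting has to be written to a config file instead of being changed through SQL. The helper name `set_max_wal_size`, the config file location, and the 2GB value are assumptions for illustration and do not come from this thread.

```bash
# Hypothetical helper: append a max_wal_size entry to a Postgres config file.
# When a parameter appears more than once in postgresql.conf, the last entry
# is the one that takes effect, so appending is enough for a quick override.
set_max_wal_size() {
  local conf="$1" size="$2"
  printf "max_wal_size = '%s'\n" "$size" >> "$conf"
}

# Example call inside the SSH session (path and value are assumptions):
#   set_max_wal_size /var/lib/postgresql/data/pgdata/postgresql.conf 2GB
```

Note that a value set earlier with ALTER SYSTEM lives in postgresql.auto.conf and would override this entry, so check that file as well if the change appears to have no effect.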

Hope this helps.


shamanop
PRO · OP

a month ago

Thank you so much for your help.

For reference, after about 6 hours of working on this, this was the final solution.

Side note, Railway support was completely unhelpful and basically told me to pound sand.

Thank you @darseen.

## Quick Reference

**Production Database:** Postgres (postgres.railway.internal)
**Public URL:** caboose.proxy.rlwy.net:19483
**Password:** jHhMPrcJTLGGwXnlnomPhpTseUrgggwF

---

## If PostgreSQL Crashes with "No space left on device"

This happens when WAL (Write-Ahead Log) files fill up the disk.
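
Before resetting anything, it can help to confirm where the space actually went. The sketch below reports free space on the data directory, the size of `pg_wal`, and whether `pg_wal` sits on the same filesystem as the data directory (it can be a symlink to somewhere else). The function names are hypothetical, and the default `PGDATA` path is taken from the `pg_resetwal` command used in this document.

```bash
# Sketch only: default path assumed from the pg_resetwal command in this document.
PGDATA="${PGDATA:-/var/lib/postgresql/data/pgdata}"

# Print the mount point of the filesystem that holds a path.
# df -P uses the stable POSIX layout; the last field of line 2 is the mount point.
mount_of() {
  df -P "$1" | awk 'NR==2 {print $NF}'
}

# Succeeds only if pg_wal (after resolving symlinks) is on the same
# filesystem as the data directory itself.
wal_on_data_volume() {
  local pgdata="$1" wal_dir
  wal_dir="$(readlink -f "$pgdata/pg_wal")" || return 1
  [ "$(mount_of "$wal_dir")" = "$(mount_of "$pgdata")" ]
}

# Print free space, WAL size, and where pg_wal lives.
wal_report() {
  local pgdata="$1"
  df -h "$pgdata"
  du -sh "$pgdata/pg_wal"
  if wal_on_data_volume "$pgdata"; then
    echo "pg_wal is on the data volume"
  else
    echo "pg_wal is on a different filesystem"
  fi
}
```

Run `wal_report "$PGDATA"` inside the SSH session described in the steps that follow. If it reports a different filesystem, growing the volume alone may not free space for WAL.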

### Solution: Reset WAL Files

1. **Go to Railway Dashboard** → Click on the crashed Postgres service
2. **Settings → Custom Start Command** → Enter: `sleep infinity`
3. **Click Deploy** - The service will start but just sleep
4. **Right-click the service → Copy SSH Command**
5. **Run the SSH command** in your terminal
6. **Once connected, run:**
   ```bash
   su postgres -c "pg_resetwal -f /var/lib/postgresql/data/pgdata"
   ```
7. **Exit SSH**, remove the custom start command, and redeploy
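
`pg_resetwal` is documented as a last-resort tool and can leave the database with inconsistent data, so it is worth copying the data directory first and doing a dry run. The sketch below is one way to do that under stated assumptions: `backup_pgdata` is a hypothetical helper, and the destination directory is an assumption that must have enough free space (for example, the resized volume).

```bash
# Hypothetical helper: archive the data directory into a timestamped tarball
# and print the archive path. dest_dir must be outside the data directory.
backup_pgdata() {
  local pgdata="$1" dest_dir="$2" archive
  archive="$dest_dir/pgdata-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" -C "$(dirname "$pgdata")" "$(basename "$pgdata")" || return 1
  echo "$archive"
}

# Suggested order inside the SSH session (not executed by this block):
#   backup_pgdata /var/lib/postgresql/data/pgdata /var/lib/postgresql/data
#   su postgres -c "pg_resetwal -n /var/lib/postgresql/data/pgdata"   # dry run: prints values, changes nothing
#   su postgres -c "pg_resetwal -f /var/lib/postgresql/data/pgdata"   # the actual reset from step 6
```

After the service is back up, taking a fresh dump of the database is a reasonable precaution, since the reset discards WAL that may not have been applied.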

---

## If VP-CRM Crashes on Startup (Prisma Error)

### Problem
VP-CRM runs `prisma db push` on startup, which can fail if the Prisma schema doesn't match the database.

### Solution
1. Edit `package.json` - change the start script to:
   ```json
   "start": "node scripts/seed-admin.js && next start"
   ```
2. **IMPORTANT:** Push to the **main** branch (not master):
   ```bash
   git push origin master:main
   ```
3. Railway will auto-deploy from main branch

---

## Regular Backup Process

Run this weekly (or set up a cron job):

```bash
cd "c:\Users\ctkul\Desktop\VoterPing\VoterPing CRM"
node backup-database.js
```

Backups are saved to the `backups/` folder with timestamps.

---

## Important Branch Note

Railway is configured to deploy from the **main** branch.
Local development uses **master** branch.

To deploy changes:
```bash
git push origin master:main
```

---

## Database Connection Details

| Service | Internal URL | Public URL |
|---------|-------------|------------|
| Postgres (Production) | postgres.railway.internal:5432 | caboose.proxy.rlwy.net:19483 |

**Connection String:**
```
postgresql://postgres:jHhMPrcJTLGGwXnlnomPhpTseUrgggwF@caboose.proxy.rlwy.net:19483/railway
```

---

## Services to Keep

- **VP-CRM** - The main application
- **Postgres** - Production database (250GB volume)

## Services to Delete (Cleanup)

- Postgres-GtWr
- Postgres-JpOZ
- Postgres-qNt9
- Postgres-84Ge
- Any orphaned volumes

---

## Contact

If you need help, the recovery was performed on 2026-01-12 using Claude Code.

Status changed to Solved brody about 1 month ago

