2 months ago
Hi Railway team,
My production Postgres database is in a crash loop and my application is completely down. The logs show:
FATAL: could not write to file "pg_wal/xlogtemp.28": No space left on device
Details:
Project: VP-CRM
Service: Postgres (currently showing "Crashed")
Plan: Pro (just upgraded)
Account email: [YOUR EMAIL]
The database keeps trying to recover but fails due to no disk space for WAL files. I cannot find volume resize options in the dashboard for managed Postgres.
Request: Please increase my Postgres volume size to 10GB (or whatever is needed) immediately so the database can complete recovery.
This is blocking my production application and affecting customers right now.
Thank you!
Pinned Solution
2 months ago
Here is a workaround to run the command pg_resetwal -f /var/lib/postgresql/data/pgdata in your crashing service. Since the Postgres service crashes before you can SSH into it, you can set the Custom Start Command in your service settings to sleep infinity (see the attached image).
Now you can right-click on your service, copy the SSH command, and connect to it.
I tested this in my workspace: I crashed the db service, set the Start Command, and was able to SSH in this way to run the command.
If that doesn't work, keep in mind that you might need to increase max_wal_size as well.
Hope this helps.
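For anyone wondering how to raise max_wal_size while the server cannot start, one option is to append the setting to postgresql.conf from the SSH session. This is only a sketch: the helper name `set_pg_conf`, the config path, and the 2GB value are assumptions, not something confirmed in this thread. PostgreSQL uses the last occurrence of a parameter in the file, so appending overrides an earlier value.

```bash
# Hypothetical helper: append a setting to a postgresql.conf-style file.
# If a parameter appears more than once, PostgreSQL uses the last occurrence.
set_pg_conf() {
    conf_file="$1"
    key="$2"
    value="$3"
    if [ ! -f "$conf_file" ]; then
        echo "missing: $conf_file" >&2
        return 1
    fi
    printf "%s = '%s'\n" "$key" "$value" >> "$conf_file"
}

# Assumed path and value; adjust both for your own service.
CONF="${CONF:-/var/lib/postgresql/data/pgdata/postgresql.conf}"
set_pg_conf "$CONF" max_wal_size 2GB || echo "Set CONF to your postgresql.conf and re-run."
```

The change takes effect the next time Postgres starts, that is, after the custom start command is removed and the service is redeployed.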
10 Replies
2 months ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open brody • about 2 months ago
2 months ago
To grow your volume, click on it and go to Settings. The option to grow the volume is there.
Check the images I attached to see exactly where to find it.
2 months ago
Update - Volume resize didn't fix it, need Railway support intervention
Thank you @darseen - I found the volume resize option and successfully increased the volume to 250GB. However, the database is still crashing with the same error:
redo done at 0/26FFFE08 system usage: CPU: user: 0.09 s, system: 0.13 s, elapsed: 2.91 s
FATAL: could not write to file "pg_wal/xlogtemp.29": No space left on device
Critical observation: The recovery actually COMPLETES successfully (redo done) - meaning my data is intact. It only crashes when trying to write the checkpoint afterward.
What I've tried:
Resized volume to 250GB (shows 491MB/250GB used - plenty of space)
Created a backup (491MB) and attempted restore
Tried restoring to multiple different Postgres services
All attempts fail with the same WAL space error
SSH into a working Postgres shows:
/dev/zd3392 46G 47M 46G 1% /var/lib/postgresql/data
The mounted volume has space, but pg_wal/xlogtemp.29 appears to be writing to container ephemeral storage, not the mounted volume.
What I need from Railway support:
Run pg_resetwal -f /var/lib/postgresql/data/pgdata on my crashed Postgres volume to clear the corrupted WAL state, OR
Provide shell access during the brief recovery window so I can run it myself, OR
Allow me to download my backup file so I can recover locally
The data IS recoverable - the redo completes. We just need to clear the WAL checkpoint that's failing.
Project: VP-CRM
Service: Postgres
Volume: postgres-2026-01-12 -2026-01-12 17:39 UTC (491MB data)
This is production data for a live business. Customers cannot access quotes. Any help would be greatly appreciated.
2 months ago
I'm sorry, but these are unmanaged databases. We cannot provide support for them. I will step back now and let the community continue to assist you.
2 months ago
@brody With respect, this is a platform infrastructure issue, not a database administration question.
The problem is that pg_wal is writing to container ephemeral storage instead of the mounted volume - that's a Railway platform configuration issue. I resized my volume to 250GB and it shows plenty of space (491MB/250GB), yet PostgreSQL still fails with "No space left on device."
I'm not asking for help with SQL queries or database optimization. I'm asking for:
Access to my own data that's stored on Railway's infrastructure
Shell access to run a single recovery command (pg_resetwal)
Or simply download my backup file that I created through Railway's backup feature
I'm a paying Pro customer. My production application is down. Customers cannot access their quotes. The data IS recoverable - your own logs show recovery completes successfully before the WAL write fails.
If Railway cannot provide any assistance with accessing data stored on your platform, please escalate this to someone who can help, or refund my Pro subscription so I can migrate to a provider that supports its customers.
2 months ago
Is there any way to download the database? The data is recoverable... I just do not have an option to do it.
I am happy to pay whatever bounty to have this issue fixed or have someone help me.
2 months ago
I have something cooking up to help you fix this. I'm just testing it in my workspace to ensure it works before commenting.
2 months ago
I would be so grateful. Thank you so much.
2 months ago
Here is a workaround to run the command pg_resetwal -f /var/lib/postgresql/data/pgdata in your crashing service. Since the Postgres service crashes before you can SSH into it, you can set the Custom Start Command in your service settings to sleep infinity (see the attached image).
Now you can right-click on your service, copy the SSH command, and connect to it.
I tested this in my workspace: I crashed the db service, set the Start Command, and was able to SSH in this way to run the command.
If that doesn't work, keep in mind that you might need to increase max_wal_size as well.
Hope this helps.
2 months ago
Thank you so much for your help.
For reference, after about 6 hours of working on this, this was the final solution.
Side note, Railway support was completely unhelpful and basically told me to pound sand.
Thank you @darseen.
## Quick Reference
**Production Database:** Postgres (postgres.railway.internal)
**Public URL:** caboose.proxy.rlwy.net:19483
**Password:** jHhMPrcJTLGGwXnlnomPhpTseUrgggwF
---
## If PostgreSQL Crashes with "No space left on device"
This happens when WAL (Write-Ahead Log) files fill up the disk.
### Solution: Reset WAL Files
1. **Go to Railway Dashboard** → Click on the crashed Postgres service
2. **Settings → Custom Start Command** → Enter: `sleep infinity`
3. **Click Deploy** - The service will start but just sleep
4. **Right-click the service → Copy SSH Command**
5. **Run the SSH command** in your terminal
6. **Once connected, run:**
```bash
# Last resort: pg_resetwal discards the WAL, so recently committed transactions may be lost.
su postgres -c "pg_resetwal -f /var/lib/postgresql/data/pgdata"
```
7. **Exit SSH**, remove the custom start command, and redeploy
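
Before running the reset in step 6, it may help to confirm where the space actually went. The sketch below reports free space for the data directory and the size and real location of `pg_wal`. The function name `wal_report` and the `PGDATA_DIR` variable are hypothetical; the default path is the one used in the command above.

```bash
# Hypothetical helper: report free space and WAL usage for a Postgres data directory.
PGDATA_DIR="${PGDATA_DIR:-/var/lib/postgresql/data/pgdata}"

wal_report() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # Free space on the filesystem that holds the data directory.
    df -h "$dir"
    if [ -e "$dir/pg_wal" ]; then
        # pg_wal can be a symlink; ls -ld shows whether it points elsewhere.
        ls -ld "$dir/pg_wal"
        # Trailing slash makes du and df follow a symlink to the real directory.
        du -sh "$dir/pg_wal/"
        df -h "$dir/pg_wal/"
    fi
}

wal_report "$PGDATA_DIR" || echo "Set PGDATA_DIR to your data directory and re-run."
```

If the two `df` lines show different filesystems, `pg_wal` is not on the mounted volume, and growing the volume alone will not free space for WAL.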
---
## If VP-CRM Crashes on Startup (Prisma Error)
### Problem
VP-CRM runs `prisma db push` on startup, which can fail if the Prisma schema doesn't match the database.
### Solution
1. Edit `package.json` - change the start script to:
```json
"start": "node scripts/seed-admin.js && next start"
```
2. **IMPORTANT:** Push your local **master** branch to the remote **main** branch (Railway deploys from main, not master):
```bash
git push origin master:main
```
3. Railway will auto-deploy from main branch
---
## Regular Backup Process
Run this weekly (or set up a cron job):
```bash
cd "c:\Users\ctkul\Desktop\VoterPing\VoterPing CRM"
node backup-database.js
```
Backups are saved to the `backups/` folder with timestamps.
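
To check that the weekly backup actually ran, a small script can look for a recent, non-empty file in that folder. This is a sketch under assumptions: the function name `has_recent_backup` and the `BACKUP_DIR` variable are hypothetical, and the 7-day window simply mirrors the weekly schedule above.

```bash
# Hypothetical check: succeed only if the directory holds a non-empty file
# modified within the last 7 days.
has_recent_backup() {
    dir="$1"
    [ -d "$dir" ] || return 1
    recent=$(find "$dir" -type f -size +0 -mtime -7 | head -n 1)
    [ -n "$recent" ]
}

# BACKUP_DIR is an assumed variable; it defaults to the backups/ folder.
if has_recent_backup "${BACKUP_DIR:-backups}"; then
    echo "recent backup found"
else
    echo "no backup from the last 7 days"
fi
```

The script only checks age and size. It does not verify that the dump can be restored, so an occasional test restore is still worthwhile.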
---
## Important Branch Note
Railway is configured to deploy from the **main** branch.
Local development uses **master** branch.
To deploy changes:
```bash
git push origin master:main
```
---
## Database Connection Details
| Service | Internal URL | Public URL |
|---------|-------------|------------|
| Postgres (Production) | postgres.railway.internal:5432 | caboose.proxy.rlwy.net:19483 |
**Connection String:**
```
postgresql://postgres:jHhMPrcJTLGGwXnlnomPhpTseUrgggwF@caboose.proxy.rlwy.net:19483/railway
```
---
## Services to Keep
- **VP-CRM** - The main application
- **Postgres** - Production database (250GB volume)
## Services to Delete (Cleanup)
- Postgres-GtWr
- Postgres-JpOZ
- Postgres-qNt9
- Postgres-84Ge
- Any orphaned volumes
---
## Contact
If you need help, the recovery was performed on 2026-01-12 using Claude Code.
Status changed to Solved brody • about 2 months ago