22 days ago
Hello,
I'm trying to diagnose a performance issue on a self-hosted Forest Admin instance running on Railway.
Architecture:
- Forest Admin self-hosted on Railway
- PostgreSQL database hosted on Supabase
- Forest Admin UI accessed through Forest Admin Cloud
- Both Railway and Supabase are hosted in the EU West (Europe) region
- No deployments or code changes have been made in approximately 3 weeks
Symptoms:
- Since yesterday, several users have reported significant slowness when using Forest Admin.
- On Railway, I can see response times occasionally spiking to around 30 seconds.
- The service remains online and does not appear to be restarting or crashing.
- Users report slow page loads and slow interactions within the admin panel.
- No significant increase in traffic or data volume has been identified so far.
My questions:
- What metrics would you investigate first to determine whether the issue comes from Railway, Forest Admin, Supabase, or the network between them?
- Is there a recommended way to identify which Forest Admin routes or actions are slow?
- How do you usually correlate Forest Admin response times with PostgreSQL query execution times?
- Have you experienced similar performance issues on Forest Admin without any recent deployment or code changes?
- Are there any Railway or Supabase metrics that are particularly useful for diagnosing intermittent latency spikes?
Any advice, troubleshooting methodology, or similar experiences would be greatly appreciated.
Thank you!
1 Reply
19 days ago
- First Metrics to Investigate
Railway first:
- CPU and memory utilization on the service container (a slow, steady climb in memory suggests a leak)
- Response time P95/P99 in Railway's metrics tab. If all routes are slow, the bottleneck is likely at the app or infrastructure level rather than in one specific query.
- Container restart count. Even if the service is not crashing, a container under memory pressure can slow down dramatically.
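If you have raw response times from your own logs instead of a dashboard, the P95/P99 idea can be sketched as below. This is a minimal nearest-rank percentile, and the sample durations are made up for illustration. The point is that intermittent 30s spikes barely move the median but dominate the upper percentiles.

```javascript
// Nearest-rank percentile over response times in milliseconds.
function percentile(durationsMs, p) {
  if (durationsMs.length === 0) return null;
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical sample: mostly fast requests with two slow outliers.
const sampleDurations = [120, 95, 110, 130, 105, 98, 115, 30000, 102, 28000];

console.log('P50:', percentile(sampleDurations, 50)); // median stays low
console.log('P95:', percentile(sampleDurations, 95)); // dominated by the spikes
```

A healthy median next to a very high P95 matches the "occasional spike" pattern described in the question, as opposed to a service that is uniformly slow.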
Supabase second:
- Connection pool usage (Supabase Dashboard → Database → Connection pooling). This is a common silent killer for self-hosted Forest Admin: if connections are exhausted, queries queue up, which produces exactly this symptom of intermittent 30s spikes with no crashes.
- Active queries and locks in Supabase's Query Performance dashboard
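As a cross-check on the dashboard, connection usage can also be read from PostgreSQL itself. The sketch below assumes you ran the two queries shown in the comments in the SQL editor and pasted the results in. The function name, the sample counts, and the limit of 80 are all hypothetical. Note that this counts direct Postgres connections, so if the agent connects through Supabase's pooler the numbers will differ from the pooler's client count.

```javascript
// Queries to run first (standard PostgreSQL view and setting):
//   SELECT state, count(*) AS count FROM pg_stat_activity GROUP BY state;
//   SHOW max_connections;

// Summarize the per-state connection counts against the connection limit.
function summarizeConnections(rows, maxConnections) {
  const byState = {};
  let total = 0;
  for (const row of rows) {
    // Background processes report a NULL state in pg_stat_activity.
    byState[row.state ?? 'background'] = Number(row.count);
    total += Number(row.count);
  }
  const utilizationPct = Math.round((total / maxConnections) * 1000) / 10;
  return { total, byState, utilizationPct };
}

// Hypothetical result set; many drivers return count(*) as a string.
const connectionRows = [
  { state: 'active', count: '12' },
  { state: 'idle', count: '40' },
  { state: 'idle in transaction', count: '5' },
  { state: null, count: '3' },
];

console.log(summarizeConnections(connectionRows, 80));
```

A large "idle in transaction" count is worth a closer look, since those sessions hold a connection without doing any work.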
Network third:
Even though both are EU West, Railway and Supabase don't share a private network. All traffic goes over the public internet. Check if Railway's egress latency has changed.
- Identifying Slow Forest Admin Routes
Forest Admin's agent logs every request. The most practical approaches:
- Enable more verbose logging on your agent, either by setting NODE_ENV=development or through the agent's logger options, so that request durations are captured. Which of these applies depends on your agent version.
- Railway log filtering: search the logs for "ms" or response-time tokens. Forest Admin's Express layer typically logs each request with its duration; if yours does not, use the middleware below.
- Add a simple middleware to your Forest Admin app to capture slow requests:
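    // Log any request that takes longer than 3 seconds.
    // Register this before the Forest Admin routes so it wraps them.
    app.use((req, res, next) => {
      const start = Date.now();
      // 'finish' fires once the response has been handed off to the client.
      res.on('finish', () => {
        const duration = Date.now() - start;
        if (duration > 3000) {
          console.warn(`SLOW REQUEST: ${req.method} ${req.path} - ${duration}ms`);
        }
      });
      next();
    });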
Focus on routes under /forest/, and compare collection list routes (which usually hit the database hardest) against action routes.
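To see which routes are slow rather than just individual requests, the durations captured by the middleware above can be grouped per route. The following is only a sketch: the helper names, collection names, and timings are hypothetical, and the ID normalization assumes record IDs are numeric or UUIDs.

```javascript
// Replace numeric IDs and UUIDs in a path with ":id" so requests group by route.
function normalizeRoute(path) {
  const uuid = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
  return path
    .split('/')
    .map((segment) => (/^\d+$/.test(segment) || uuid.test(segment) ? ':id' : segment))
    .join('/');
}

// Accumulate count, total and max duration per "METHOD route" key.
function recordRequest(stats, method, path, durationMs) {
  const key = `${method} ${normalizeRoute(path)}`;
  const entry = stats.get(key) ?? { count: 0, totalMs: 0, maxMs: 0 };
  entry.count += 1;
  entry.totalMs += durationMs;
  entry.maxMs = Math.max(entry.maxMs, durationMs);
  stats.set(key, entry);
}

// Return routes sorted by worst-case duration, slowest first.
function slowestRoutes(stats, limit = 5) {
  return [...stats.entries()]
    .map(([route, s]) => ({ route, count: s.count, meanMs: s.totalMs / s.count, maxMs: s.maxMs }))
    .sort((a, b) => b.maxMs - a.maxMs)
    .slice(0, limit);
}

// Hypothetical traffic, for illustration only.
const routeStats = new Map();
recordRequest(routeStats, 'GET', '/forest/orders', 28000);
recordRequest(routeStats, 'GET', '/forest/orders', 400);
recordRequest(routeStats, 'GET', '/forest/users/42', 120);
recordRequest(routeStats, 'GET', '/forest/users/57', 180);

console.log(slowestRoutes(routeStats));
```

In the real service, recordRequest would be called from the 'finish' handler, and slowestRoutes could be logged every few minutes. If one list route stands out, the problem is likely a specific query; if every route is slow at the same moments, look at the pool or the container instead.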
- Correlating Forest Admin Response Times with PostgreSQL Query Times
The most reliable method is log timestamp correlation:
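1. In Railway logs, note the timestamp of a slow request and the route path.
2. In Supabase, go to Logs → Postgres logs and filter for the same time window.
3. Look for queries with a duration above 1000 ms in the Postgres logs. Statement durations only appear there if slow-query logging (for example log_min_duration_statement) is enabled.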
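If you export both sets of logs, the matching in the steps above can be automated. The sketch below assumes each entry has already been parsed into an object with an epoch-millisecond timestamp taken when the line was logged (that is, at the end of the request or statement) plus a duration. The field names and sample entries are hypothetical, and the tolerance is there to absorb clock differences between Railway and Supabase.

```javascript
// Find Postgres log entries that were executing while a slow request was in flight.
function overlappingQueries(request, queries, toleranceMs = 250) {
  const reqStart = request.finishedAt - request.durationMs - toleranceMs;
  const reqEnd = request.finishedAt + toleranceMs;
  return queries
    .filter((q) => {
      const queryStart = q.loggedAt - q.durationMs;
      // Two intervals overlap when each one starts before the other ends.
      return queryStart <= reqEnd && q.loggedAt >= reqStart;
    })
    .sort((a, b) => b.durationMs - a.durationMs);
}

// Hypothetical parsed log entries.
const slowRequest = {
  route: 'GET /forest/orders',
  finishedAt: Date.parse('2024-05-01T10:00:30.000Z'),
  durationMs: 30000,
};
const pgLogEntries = [
  { loggedAt: Date.parse('2024-05-01T10:00:29.500Z'), durationMs: 29000, query: 'SELECT ... FROM orders ...' },
  { loggedAt: Date.parse('2024-05-01T09:59:00.000Z'), durationMs: 50, query: 'SELECT ... FROM users ...' },
  { loggedAt: Date.parse('2024-05-01T10:00:10.000Z'), durationMs: 20, query: 'SELECT count(*) FROM orders' },
];

const matchedQueries = overlappingQueries(slowRequest, pgLogEntries);
console.log(matchedQueries.map((q) => `${q.durationMs}ms ${q.query}`));
```

If a 30s request has no overlapping query anywhere near 30s, the time was probably spent waiting for a connection or inside the Node process, not executing SQL.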
For a more systematic approach, enable pg_stat_statements on Supabase and query it:
SELECT query, calls, mean_exec_time, max_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
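This surfaces the queries with the highest average execution time since the statistics were last reset. Also check for long-running or blocked queries:
    SELECT pid, now() - query_start AS duration, state, wait_event_type, wait_event, query
    FROM pg_stat_activity
    WHERE state <> 'idle'
      AND now() - query_start > interval '2 seconds'
    ORDER BY duration DESC;
Rows with wait_event_type = 'Lock' are waiting on a lock held by another session.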
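One caveat with pg_stat_statements is that its averages are cumulative, so a query that only became slow yesterday can hide behind weeks of fast executions. A possible workaround, sketched here with hypothetical numbers, is to take two snapshots a few minutes apart (including the queryid, calls, and total_exec_time columns) and compute the mean for just that interval.

```javascript
// Compare two pg_stat_statements snapshots and compute per-query stats
// for the interval between them. total_exec_time is in milliseconds.
function intervalStats(before, after) {
  const previous = new Map(before.map((row) => [String(row.queryid), row]));
  return after
    .map((row) => {
      const prev = previous.get(String(row.queryid)) ?? { calls: 0, total_exec_time: 0 };
      const calls = Number(row.calls) - Number(prev.calls);
      const totalMs = Number(row.total_exec_time) - Number(prev.total_exec_time);
      return { queryid: String(row.queryid), query: row.query, calls, meanMs: calls > 0 ? totalMs / calls : 0 };
    })
    // Drop queries that did not run in the interval (or whose stats were reset).
    .filter((row) => row.calls > 0)
    .sort((a, b) => b.meanMs - a.meanMs);
}

// Hypothetical snapshots taken ten minutes apart.
const snapshotBefore = [
  { queryid: '1', query: 'SELECT ... FROM orders ...', calls: 1000, total_exec_time: 50000 },
  { queryid: '2', query: 'SELECT ... FROM users ...', calls: 500, total_exec_time: 5000 },
];
const snapshotAfter = [
  { queryid: '1', query: 'SELECT ... FROM orders ...', calls: 1010, total_exec_time: 80000 },
  { queryid: '2', query: 'SELECT ... FROM users ...', calls: 600, total_exec_time: 6000 },
];

console.log(intervalStats(snapshotBefore, snapshotAfter));
```

In this made-up example the orders query averages about 79 ms over its lifetime but 3000 ms over the last interval, which is the kind of recent degradation the cumulative view would not show.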
- Performance Issues Without Deployments — Common Causes
Yes, this is very common. The usual suspects after weeks of stability:
1. Table bloat / missing VACUUM: PostgreSQL accumulates dead tuples over time, and autovacuum may not keep up.
2. Index bloat: indexes degrade with high write volume.
3. Connection pool exhaustion: slow leaks in connection handling eventually hit the ceiling.
4. Supabase free/pro tier limits: on a lower tier, compute/storage thresholds can throttle after sustained use.
5. Railway container memory creep: the Node.js heap grows gradually, and GC pressure increases response times.
6. pg_stat_statements / bloat in system tables: Supabase's own monitoring tables can become large.
The "no changes for 3 weeks" pattern points toward table/index bloat or connection pool saturation, since both build up gradually over time rather than being triggered by a single event.
- Most Useful Railway & Supabase Metrics for Intermittent Spikes
Railway:
- Memory Usage over time (look for a slow upward trend — memory leak signature)
- CPU Usage — spikes correlating with slow responses suggest compute-bound queries
- Log volume — a sudden increase can indicate retry loops
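For the memory trend specifically, a rough way to tell a leak from normal fluctuation is to fit a line through periodic heap samples. This sketch uses a plain least-squares slope. The sampling interval and the example values are assumptions, and process.memoryUsage().heapUsed only covers the V8 heap, not everything Railway counts as container memory.

```javascript
// Least-squares slope of heap usage over time, in MB per minute.
function memorySlopeMbPerMinute(samples) {
  if (samples.length < 2) return 0;
  const n = samples.length;
  const xs = samples.map((s) => (s.timeMs - samples[0].timeMs) / 60000);
  const ys = samples.map((s) => s.heapUsedBytes / (1024 * 1024));
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let numerator = 0;
  let denominator = 0;
  for (let i = 0; i < n; i++) {
    numerator += (xs[i] - meanX) * (ys[i] - meanY);
    denominator += (xs[i] - meanX) ** 2;
  }
  return denominator === 0 ? 0 : numerator / denominator;
}

// In the running service, samples could be collected once a minute like this:
//   setInterval(() => samples.push({
//     timeMs: Date.now(),
//     heapUsedBytes: process.memoryUsage().heapUsed,
//   }), 60000);

// Hypothetical samples: the heap grows by 2 MB every minute.
const MB = 1024 * 1024;
const memorySamples = [0, 1, 2, 3, 4].map((minute) => ({
  timeMs: minute * 60000,
  heapUsedBytes: (200 + 2 * minute) * MB,
}));

console.log(memorySlopeMbPerMinute(memorySamples), 'MB/min');
```

A slope that stays positive across several garbage-collection cycles is the leak signature mentioned above; a sawtooth that keeps returning to the same baseline is normal.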
Supabase:
- Database → Connection pooling → Active connections: the single most important metric here
- Reports → Query Performance, sorted by slowest average time
- Logs → Postgres logs, filtered for ERROR and duration keywords
- pg_stat_user_tables, checking for tables with a high n_dead_tup (dead tuple count):
    SELECT relname, n_live_tup, n_dead_tup,
           round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
           last_autovacuum, last_autoanalyze
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;
A dead_pct above 10-20% on frequently-queried tables is a strong indicator.
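If you would rather not eyeball the result, the threshold check can be scripted. The sketch below takes rows shaped like the pg_stat_user_tables output and prints VACUUM statements for the tables over the threshold. The table names and counts are invented, and the generated statements use the bare relname, so they assume the tables are on your search_path (add the schema name otherwise).

```javascript
// Flag tables whose dead-tuple share exceeds a threshold and build VACUUM statements.
function vacuumCandidates(tables, thresholdPct = 10) {
  return tables
    .map((t) => {
      const live = Number(t.n_live_tup);
      const dead = Number(t.n_dead_tup);
      const total = live + dead;
      const deadPct = total === 0 ? 0 : Math.round((dead / total) * 1000) / 10;
      return { relname: t.relname, deadPct };
    })
    .filter((t) => t.deadPct >= thresholdPct)
    .sort((a, b) => b.deadPct - a.deadPct)
    .map((t) => ({
      ...t,
      // Double any embedded quotes so the identifier is safely quoted.
      sql: `VACUUM (ANALYZE) "${t.relname.replace(/"/g, '""')}";`,
    }));
}

// Hypothetical rows from pg_stat_user_tables.
const tableStats = [
  { relname: 'orders', n_live_tup: 80000, n_dead_tup: 20000 },
  { relname: 'users', n_live_tup: 9900, n_dead_tup: 100 },
  { relname: 'events', n_live_tup: 50000, n_dead_tup: 50000 },
];

for (const candidate of vacuumCandidates(tableStats)) {
  console.log(`${candidate.deadPct}% dead -> ${candidate.sql}`);
}
```

Review the statements before running them; a manual VACUUM on a large table adds I/O load while it runs.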
Given your symptoms, prioritize in this order:
1. Check the Supabase connection pool. If it's near the limit, that's likely your culprit.
2. Run the pg_stat_statements query to find degraded queries.
3. Check dead tuple counts and manually trigger VACUUM ANALYZE on the worst tables.
4. Monitor Railway memory over the next hour, looking for a trend rather than just the current value.
5. If nothing stands out, redeploy the Railway service (no code change needed, just a restart) to rule out memory or connection state issues in the running process.