Very slow requests
x-steeve
HOBBY · OP

22 days ago

Hello,

I'm trying to diagnose a performance issue on a self-hosted Forest Admin instance running on Railway.

Architecture:

  • Forest Admin self-hosted on Railway
  • PostgreSQL database hosted on Supabase
  • Forest Admin UI accessed through Forest Admin Cloud
  • Both Railway and Supabase are hosted in the EU West (Europe) region
  • No deployments or code changes have been made in approximately 3 weeks

Symptoms:

  • Since yesterday, several users have reported significant slowness when using Forest Admin.

  • On Railway, I can see response times occasionally spiking to around 30 seconds.

  • The service remains online and does not appear to be restarting or crashing.

  • Users report slow page loads and slow interactions within the admin panel.

  • No significant increase in traffic or data volume has been identified so far.

My questions:

  1. What metrics would you investigate first to determine whether the issue comes from Railway, Forest Admin, Supabase, or the network between them?
  2. Is there a recommended way to identify which Forest Admin routes or actions are slow?
  3. How do you usually correlate Forest Admin response times with PostgreSQL query execution times?
  4. Have you experienced similar performance issues on Forest Admin without any recent deployment or code changes?
  5. Are there any Railway or Supabase metrics that are particularly useful for diagnosing intermittent latency spikes?

Any advice, troubleshooting methodology, or similar experiences would be greatly appreciated.

Thank you!

$10 Bounty


Railway
BOT

22 days ago

This thread has been opened as a bounty so the community can help solve it.

Status changed to Open Railway 22 days ago


spatrickpaul
HOBBY · Top 10% Contributor

19 days ago

  1. First Metrics to Investigate

Railway first:

  • CPU and memory utilization of the service container (a slow upward memory trend suggests a leak)

  • Response time P95/P99 in Railway's metrics tab: if all routes are slow, the bottleneck is likely at the app/infra level rather than in one specific query

  • Restart and out-of-memory history: even if the service is not crashing, a container under memory pressure can slow down dramatically before it is ever restarted

Supabase second:

  • Connection pool usage (Supabase Dashboard → Database → Connection pooling). Exhausted connections are one of the most common silent causes of slowness with self-hosted Forest Admin: queries queue while waiting for a free connection, which produces exactly this symptom of intermittent ~30 s spikes with no crashes

  • Slowest statements in Supabase's Query Performance report; for live activity and lock waits, query pg_stat_activity directly
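A minimal sketch for turning a one-off pg_stat_activity reading into a utilization figure, assuming you run `SELECT state, count(*) AS n FROM pg_stat_activity WHERE backend_type = 'client backend' GROUP BY state;` and `SHOW max_connections;` through node-postgres, which returns bigint counts as strings. The function name and the 80% warning ratio are illustrative choices. It only sees server-side connections to Postgres; if you connect through Supabase's pooler, the pooler's own client limits are separate.

```javascript
// Summarise Postgres connection usage from grouped pg_stat_activity rows.
// rows: [{ state: 'active', n: '12' }, ...]  (n may be a string from node-postgres)
// maxConnections: the value returned by SHOW max_connections (also a string)
function connectionHeadroom(rows, maxConnections, warnRatio = 0.8) {
  const byState = {};
  let total = 0;
  for (const row of rows) {
    const n = Number(row.n);
    byState[row.state] = (byState[row.state] ?? 0) + n;
    total += n;
  }
  const utilization = total / Number(maxConnections);
  return {
    total,
    utilization,
    // Sessions idle inside an open transaction hold a connection (and any locks)
    // without doing work; a growing count often points at a connection leak.
    idleInTransaction: byState['idle in transaction'] ?? 0,
    nearLimit: utilization >= warnRatio,
  };
}
```

Postgres also holds back `superuser_reserved_connections` slots, so ordinary roles hit the ceiling slightly before `max_connections`.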

Network third:

Even though both are in EU West, Railway and Supabase are separate providers and don't share a private network, so all traffic between them goes over the public internet. Check whether the latency from the Railway service to your Supabase host has changed.
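One way to check this is to time TCP handshakes from inside the Railway service to the Supabase host. This sketch uses only Node's built-in `net` module; `connectTime` and `probe` are hypothetical helper names, and each timing includes DNS resolution. It measures network round-trip only, not query time, so compare its numbers from a normal period with those from a slow one.

```javascript
const net = require('node:net');

// Time one TCP handshake (including DNS lookup) to host:port, in milliseconds.
function connectTime(host, port, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    const socket = net.connect({ host, port, timeout: timeoutMs });
    socket.once('connect', () => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      socket.destroy();
      resolve(ms);
    });
    socket.once('timeout', () => {
      socket.destroy();
      reject(new Error(`timed out connecting to ${host}:${port}`));
    });
    socket.once('error', (err) => {
      socket.destroy();
      reject(err);
    });
  });
}

// Run `count` sequential probes and return min / median / max in milliseconds.
async function probe(host, port, count = 10) {
  const times = [];
  for (let i = 0; i < count; i++) times.push(await connectTime(host, port));
  times.sort((a, b) => a - b);
  return {
    min: times[0],
    median: times[Math.floor(times.length / 2)], // upper median for even counts
    max: times[times.length - 1],
  };
}
```

For example, `probe(process.env.PGHOST, 5432, 20).then(console.log)` from a temporary script running next to the agent. `PGHOST` is an assumption; use whichever variable holds your Supabase host, and the pooler port if that is what the agent connects to.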

  2. Identifying Slow Forest Admin Routes

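Forest Admin's agent can log every request. The most practical approaches:

  • Increase the agent's log verbosity, either through the agent's logger option (check the documentation for your agent version) or by setting NODE_ENV=development. The latter also changes other behavior (Express, for example, includes stack traces in error responses), so keep it to a short diagnostic window

  • Railway log filtering: filter the logs for "ms" or other response-time tokens; depending on the agent version and log level, each request is logged with its duration

If the agent is mounted on an Express app, add a simple middleware ahead of it to capture slow requests: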

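// Express middleware: log any request slower than 3 seconds.
// Register it before the Forest Admin agent so it also wraps the /forest routes.
app.use((req, res, next) => {
  const start = Date.now();
  // 'finish' fires once the response has been fully handed off to the OS
  res.on('finish', () => {
    const duration = Date.now() - start;
    if (duration > 3000) {
      // originalUrl keeps the full path even if the app is mounted on a sub-router
      console.warn(`SLOW REQUEST: ${req.method} ${req.originalUrl} - ${duration}ms`);
    }
  });
  next();
});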

Focus on routes under /forest/ — specifically collection list routes (these hit the DB hardest) vs. action routes
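A minimal sketch for ranking routes from the SLOW REQUEST warnings printed by the middleware above, assuming you have copied the matching lines out of Railway's log view into an array of strings. `summarizeSlowRequests` and the `:id` normalization are illustrative choices: numeric and UUID path segments are collapsed so that /forest/orders/42 and /forest/orders/43 count as one route, and query strings are dropped.

```javascript
// Matches the "SLOW REQUEST: METHOD path - Nms" line format.
const SLOW_RE = /SLOW REQUEST: (\w+) (\S+) - (\d+)ms/;
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Drop the query string and collapse record ids so similar requests group together.
function normalizePath(path) {
  return path
    .split('?')[0]
    .split('/')
    .map((seg) => (/^\d+$/.test(seg) || UUID_RE.test(seg) ? ':id' : seg))
    .join('/');
}

// Group slow-request log lines by "METHOD route": count, worst and average duration.
function summarizeSlowRequests(lines) {
  const byRoute = new Map();
  for (const line of lines) {
    const m = SLOW_RE.exec(line);
    if (!m) continue; // ignore unrelated log lines
    const route = `${m[1]} ${normalizePath(m[2])}`;
    const ms = Number(m[3]);
    const e = byRoute.get(route) ?? { route, count: 0, maxMs: 0, totalMs: 0 };
    e.count += 1;
    e.maxMs = Math.max(e.maxMs, ms);
    e.totalMs += ms;
    byRoute.set(route, e);
  }
  return [...byRoute.values()]
    .map(({ route, count, maxMs, totalMs }) => ({ route, count, maxMs, avgMs: Math.round(totalMs / count) }))
    .sort((a, b) => b.count - a.count || b.maxMs - a.maxMs);
}
```

Sorting by frequency first surfaces the routes users hit most often; sort by `maxMs` instead if you care about the single worst offender.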

  3. Correlating Forest Admin Response Times with PostgreSQL Query Times

The most reliable method is log timestamp correlation:

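1- In Railway logs, note the timestamp and route path of a slow request

2- In Supabase, go to Logs → Postgres logs and filter for the same time window

3- Look for statements with a duration above 1000 (milliseconds) in the Postgres logs. Postgres only logs statement durations when log_min_duration_statement (or log_duration) is enabled, so check that setting if none appear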
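Once you have both sets of entries, the matching can be done mechanically. A minimal sketch, assuming slow requests and slow statements have been copied into plain objects with hypothetical `timestamp` (ISO string) and `durationMs` fields. Both a request log written on response finish and a Postgres duration log are emitted when the work ends, so each entry is treated as the interval [timestamp - duration, timestamp], and a statement is matched to a request when the two intervals overlap. `skewMs` is an arbitrary allowance for clock differences between Railway and Supabase.

```javascript
// Turn an "ended at timestamp, took durationMs" record into a [start, end] interval.
function toInterval(rec) {
  const end = new Date(rec.timestamp).getTime();
  return { start: end - rec.durationMs, end };
}

// For each slow request, list the statements whose execution overlapped it.
function correlate(requests, queries, skewMs = 500) {
  const q = queries.map((rec) => ({ rec, ...toInterval(rec) }));
  return requests.map((req) => {
    const r = toInterval(req);
    const overlapping = q
      // Standard interval-overlap test, widened by skewMs on both sides.
      .filter((x) => x.start <= r.end + skewMs && r.start - skewMs <= x.end)
      .map((x) => x.rec);
    return { request: req, queries: overlapping };
  });
}
```

A request with no overlapping statements is itself a clue: the time went somewhere other than query execution (waiting for a connection, the network, or the Node process).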

For a more systematic approach, use the pg_stat_statements extension (enable it under Database → Extensions if it isn't already) and query it:

SELECT query, calls, mean_exec_time, max_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

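This lists the statements with the highest average execution time since the statistics were last reset (pg_stat_statements_reset()). It has no baseline of its own, so a slow statement here is not necessarily one that has degraded recently. Also check for long-running queries and lock waits: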

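-- wait_event_type = 'Lock' means the query is waiting on a lock; blocked_by lists the PIDs holding it
SELECT pid, now() - query_start AS duration, state, wait_event_type, wait_event,
       pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '2 seconds'
ORDER BY duration DESC;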
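Because pg_stat_statements accumulates since its last reset, a statement's mean_exec_time can hide a recent slowdown. A minimal sketch of a snapshot comparison: save the rows of `SELECT userid, dbid, queryid, query, calls, total_exec_time FROM pg_stat_statements;` once, save them again some minutes later (ideally during a slow period), and compute each statement's mean over just that window. The function name and row shape are assumptions; values are coerced with `Number()` because node-postgres returns bigint columns such as `calls` as strings.

```javascript
// pg_stat_statements identifies a statement by (userid, dbid, queryid).
const keyOf = (r) => `${r.userid}:${r.dbid}:${r.queryid}`;

// Mean execution time per statement between two snapshots, sorted by the
// total time spent in the window (the statements worth looking at first).
function diffSnapshots(before, after, minCalls = 1) {
  const prev = new Map(before.map((r) => [keyOf(r), r]));
  const result = [];
  for (const row of after) {
    const old = prev.get(keyOf(row)) ?? { calls: 0, total_exec_time: 0 };
    const calls = Number(row.calls) - Number(old.calls);
    // Skips statements not run in the window, and negative deltas caused by a
    // pg_stat_statements_reset() between the two snapshots.
    if (calls < minCalls) continue;
    const totalMs = Number(row.total_exec_time) - Number(old.total_exec_time);
    result.push({ query: row.query, calls, totalMs, meanMs: totalMs / calls });
  }
  return result.sort((a, b) => b.totalMs - a.totalMs);
}
```

Comparing `meanMs` from a window during the slowdown with the lifetime `mean_exec_time` is what actually shows whether a statement has degraded.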

  4. Performance Issues Without Deployments: Common Causes

Yes, this is very common. The usual suspects after weeks of stability:

1- Table bloat / missing VACUUM (PostgreSQL accumulates dead tuples over time; autovacuum may not keep up)

2- Index bloat (indexes degrade under high write volume)

3- Connection pool exhaustion (slow leaks in connection handling eventually hit the ceiling)

4- Supabase compute limits (on smaller compute sizes disk I/O is burstable; once the burst budget is spent, I/O drops to the baseline and previously fast queries slow down)

5- Railway container memory creep (the Node.js heap grows gradually; GC pressure increases response times)

6- Bloat in system or extension tables (catalog tables accumulate dead rows like any other table, e.g. under heavy temporary-table churn, and tables written by monitoring extensions can grow large)

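The "no changes for 3 weeks" pattern is consistent with table/index bloat or connection pool saturation, both of which build up over time rather than being triggered by an event. A sudden onset ("since yesterday") does not rule them out: accumulated growth can cross a threshold, for example when the planner switches to a slower plan or an I/O budget runs out.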

  5. Most Useful Railway & Supabase Metrics for Intermittent Spikes

Railway:

  • Memory Usage over time (look for a slow upward trend — memory leak signature)
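  • CPU Usage: spikes that line up with slow responses point at work inside the Node.js process itself (e.g. serializing large result sets), since the SQL runs on Supabase, not on Railway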
  • Log volume — a sudden increase can indicate retry loops
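To tell a slow upward memory trend from normal fluctuation, fit a line through the samples. A minimal sketch: `trendPerHour` is a plain least-squares slope, and `startHeapSampler` is a hypothetical helper that logs the agent process's own `process.memoryUsage().heapUsed`, so a JavaScript heap leak can be told apart from other memory growth in Railway's chart. The one-minute interval and 120-sample window are arbitrary.

```javascript
// Least-squares slope of { t: epochMs, v: value } samples, in value units per hour.
function trendPerHour(samples) {
  const n = samples.length;
  if (n < 2) return 0;
  const xs = samples.map((s) => s.t / 3_600_000); // milliseconds -> hours
  const ys = samples.map((s) => s.v);
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  return den === 0 ? 0 : num / den;
}

// Hypothetical helper: sample this process's V8 heap once per interval, keep the
// last `keep` samples, and log the current value with its hourly trend.
function startHeapSampler(intervalMs = 60_000, keep = 120) {
  const samples = [];
  const timer = setInterval(() => {
    const heapMb = process.memoryUsage().heapUsed / 1024 / 1024;
    samples.push({ t: Date.now(), v: heapMb });
    if (samples.length > keep) samples.shift();
    console.log(`heapUsed=${heapMb.toFixed(1)}MB trend=${trendPerHour(samples).toFixed(1)}MB/h`);
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for sampling
  return { samples, stop: () => clearInterval(timer) };
}
```

A heap trend that stays positive over an hour or more, while Railway's memory chart climbs in step, is the leak signature; a flat heap with rising container memory points elsewhere (native buffers, for example).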

Supabase:

a- Database → Connection pooling → Active connections — the single most important metric here

b- Reports → Query Performance — sorted by slowest average time

c- Logs → Postgres logs — filter for ERROR and duration keywords

d- Check pg_stat_user_tables for tables with high n_dead_tup (dead tuple count):

SELECT relname, n_live_tup, n_dead_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

As a rule of thumb, a dead_pct above 10-20% on frequently queried tables, especially with an old or empty last_autovacuum, is a strong sign that autovacuum is not keeping up.
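PostgreSQL's autovacuum processes a table once n_dead_tup exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples (stock defaults: 50 and 0.2). A minimal sketch that applies that formula to the rows returned by the pg_stat_user_tables query above, assuming the stock settings and using n_live_tup as a stand-in for the planner's reltuples estimate; per-table storage parameters or non-default Supabase settings would change the result. The function name is hypothetical.

```javascript
// Flag tables whose dead tuples exceed the default autovacuum trigger:
//   n_dead_tup > autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
// n_live_tup stands in for reltuples here. Values are coerced because
// node-postgres returns bigint columns (n_live_tup, n_dead_tup) as strings.
function vacuumBacklog(rows, { threshold = 50, scaleFactor = 0.2 } = {}) {
  return rows
    .map((r) => {
      const dead = Number(r.n_dead_tup);
      const trigger = threshold + scaleFactor * Number(r.n_live_tup);
      // ratio > 1 means autovacuum should already have processed this table
      return { relname: r.relname, deadTuples: dead, trigger: Math.round(trigger), ratio: dead / trigger };
    })
    .filter((r) => r.ratio > 1)
    .sort((a, b) => b.ratio - a.ratio);
}
```

Tables at the top of this list, especially with an old last_autovacuum, are the first candidates for a manual VACUUM ANALYZE.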

Given your symptoms, prioritize in this order:

1- Check the Supabase connection pool; if it's near the limit, that's likely your culprit

2- Run the pg_stat_statements query to find the slowest statements

3- Check dead tuple counts and manually run VACUUM ANALYZE on the worst tables

4- Monitor Railway memory over the next hour, looking for a trend rather than a single current value

5- If nothing stands out, redeploy or restart the Railway service (no code change needed) to rule out memory or connection state built up in the running process

