2 months ago
Hi Railway team,
I'm running a MySQL 8 Docker service with the following start command:
docker-entrypoint.sh mysqld --innodb-use-native-aio=0 --disable-log-bin --performance_schema=1 --innodb-buffer-pool-size=20G --slow-query-log=1 --long-query-time=3
I'm seeing a steady ~15GB memory consumption in the observability dashboard, which I'd expect given my 20GB buffer pool setting (with some overhead). However, I have a few questions about what the metrics capture:
What exactly does the memory metric measure? Is this container RSS, the cgroup memory limit usage, or something else? Does it include all memory allocated by the MySQL process including per-connection buffers?
Are there memory limits on my plan/service that might cap the effective memory before my 20GB buffer pool setting? I want to ensure MySQL isn't being OOM-killed or swapping silently.
I'm experiencing intermittent p99 latency spikes (up to 30 seconds) on my Django service connecting to this MySQL instance. To help diagnose:
Is there visibility into connection counts over time for the MySQL service?
Can I see if the container is experiencing memory pressure, CPU throttling, or I/O wait?
Is there any metric for network latency between services in my project?
What additional observability would you recommend? I have performance_schema=1 enabled. Are there specific queries or dashboards you'd suggest for understanding resource bottlenecks on Railway specifically?
My Django connection settings use CONN_MAX_AGE=60 for persistent connections across [X] Gunicorn workers, so I'd expect a baseline of [X] persistent connections plus some churn.
Thanks for any guidance on both the memory accounting and any Railway-specific diagnostics I should be looking at.
Pinned Solution
2 months ago
railway's memory metric shows container memory usage from cgroups (similar to working set in kubernetes). it includes your mysql process memory: buffer pool, per-connection buffers, and any page cache. the 15GB you see for a 20GB buffer pool setting is normal, since innodb doesn't allocate the full buffer pool immediately and the metric doesn't double-count shared memory.
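to cross-check the dashboard number against mysql's own accounting, performance_schema's memory instrumentation (on by default in MySQL 8) breaks usage down by allocator. a sketch:

```sql
-- top memory consumers as mysql itself accounts for them
SELECT EVENT_NAME, CURRENT_NUMBER_OF_BYTES_USED
FROM performance_schema.memory_summary_global_by_event_name
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC
LIMIT 10;
```

if mysql's own total is far below what railway reports, the gap is likely page cache counted by the cgroup metric rather than process memory.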
for memory limits: check your service Settings > Resource Limits to see the actual cap. railway won't swap; the container gets OOM-killed silently if it exceeds the limit. with a 20GB buffer pool plus connection overhead you probably need 24GB+ of headroom. if your plan limit is lower than that, mysql might be getting killed during load spikes. the metrics tab should show memory hitting the ceiling right before restarts.
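you can also check from inside mysql whether the buffer pool is fully allocated, and whether the server has been silently restarted. a sketch (run in any mysql client):

```sql
-- configured vs actually allocated buffer pool
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS configured_gb;
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_bytes_data';
-- a low Uptime after a load spike suggests an OOM restart
SHOW GLOBAL STATUS LIKE 'Uptime';
```

an unexpectedly low Uptime is the clearest sign of OOM kills, since railway restarts the container rather than surfacing the kill in mysql's own logs.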
for the p99 latency spikes, railway has acknowledged known issues with private networking causing intermittent 10-30s latency spikes. they're working on fixes. in the meantime, useful performance_schema queries:
```sql
-- current connection count (Threads_connected is a global status variable)
SELECT * FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Threads_connected';
-- or simpler
SHOW STATUS LIKE 'Threads_%';
-- buffer pool efficiency (want >99% hit rate, i.e. reads much lower than read_requests)
SHOW STATUS LIKE 'Innodb_buffer_pool_read%';
-- top wait events
SELECT * FROM performance_schema.events_waits_summary_global_by_event_name ORDER BY SUM_TIMER_WAIT DESC LIMIT 20;
```
railway doesn't expose per-service network latency metrics or I/O wait visibility directly. your best bet is adding application-level timing (log query duration from django) and comparing internal vs public mysql url latency to isolate whether it's network-side or db-side.
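to narrow down db-side culprits behind the p99 spikes, the statement digest summary is usually the fastest route. a sketch, assuming statement instrumentation is enabled (it is by default with performance_schema on):

```sql
-- top statements by total latency (TIMER columns are in picoseconds)
SELECT DIGEST_TEXT,
       COUNT_STAR,
       SUM_TIMER_WAIT / 1e12 AS total_s,
       AVG_TIMER_WAIT / 1e12 AS avg_s
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```

if the top digests have low avg_s even during spikes, that points at the network path rather than query execution.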
2 Replies
Status changed to Solved brody • 2 months ago
