13 days ago
Hey guys,
I'm having some issues with my Laravel applications on Railway: they're seeing higher-than-expected latency connecting to the MySQL pods inside their respective projects.
I have the MySQL pods capped at 2 vCPU and 1GB RAM, which should be fine for applications that see at most 2-3 concurrent users - I cannot see anything in the load profile (https://cleanshot.com/share/6VB6pxcG) that causes me any concern there.
I have Laravel Nightwatch monitoring the traffic, and for this example app, which has a single user, I'm getting 32ms on the first query and 15+ms on every subsequent one: https://cleanshot.com/share/lncSM9vf - these are fast queries against a tiny database which, when run locally on my dev machine, come back in single-digit milliseconds.
Ping between services comes to 5-7ms, which is a bit slow for things physically colocated, but that doesn't account for all of the performance drain.
I can replicate the issue on other projects, though this one is the one named HIMC. On my other Orbiter project there was enough of a latency issue to make Bookstack uncomfortably slow, again with a single user.
I have another stack running in DigitalOcean (London) - a Kubernetes cluster and a modest (though bigger than this, admittedly) MySQL managed server - and from that I'm getting ~2ms query performance over a similar database pattern.
I'll do a test with an external database server tonight, perhaps try Neon, but Neon's EU region doesn't line up terribly well for me (NL to DE).
It's not a huge deal, nor is it actively stopping me from using the product. It's just odd and I can't get my head around what's causing it.
7 Replies
13 days ago
This thread has been marked as public for community involvement, as it does not contain any sensitive or personal information. Any further activity in this thread will be visible to everyone.
Status changed to Open Railway • 13 days ago
13 days ago
There are multiple factors that can contribute to increased latency. In your case: your Laravel app might be spinning up a new database connection for every incoming HTTP request, so you need to use persistent connections. Additionally, you can use the --skip-name-resolve flag in your MySQL container; it forces MySQL to use only IP addresses for client authentication, bypassing the DNS lookup and thus reducing latency.
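For reference, persistent connections in Laravel are typically enabled via PDO options on the connection entry in config/database.php. A minimal sketch, assuming a standard Laravel setup (the host value is a placeholder, and the surrounding keys are abbreviated):

```php
// config/database.php - sketch, not a drop-in config
'mysql' => [
    'driver' => 'mysql',
    'host' => env('DB_HOST', 'mysql.railway.internal'), // placeholder
    // ... other connection keys elided ...
    'options' => [
        // Reuse the underlying connection across requests served by the
        // same PHP-FPM worker instead of reconnecting every time.
        PDO::ATTR_PERSISTENT => true,
    ],
],
```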
13 days ago
Thanks for the reply. My Laravel apps are set to persist connections, but these are subsequent queries within a single request, when the connection is already established (see the second screenshot). I can well imagine that the first connection for each FPM worker has a connection cost - this was circa 100ms until I turned on persistent connections. Once I did that it dropped to ~30ms, but I still get ~15ms round trips on subsequent queries.
With a direct ping from the frontend service to the database at 5-7ms, where I'd expect to see 1ms, I'm thinking I've either misconfigured something or there's something awry with the networking. I thought I'd try a different region later this evening.
Some other gotchas I've checked for already:
1 - All in the same region, EU West
2 - All services are alive (not sleeping) when the tests start
3 - Replicated behaviour over the course of several weeks, on both Laravel and non-Laravel applications but always PHP and always Alpine as the underlying OS.
4 - No difference in performance between using 'mysql' and 'mysql.railway.internal' hostnames.
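For anyone wanting to reproduce the measurement, this is roughly the kind of check involved: a minimal PHP sketch (DSN and credentials are placeholders) that times repeated trivial `SELECT 1` queries over one established connection, so the cost is dominated by network round-trip time rather than query execution:

```php
<?php
// Hypothetical measurement sketch - host and credentials are placeholders.
$pdo = new PDO(
    'mysql:host=mysql.railway.internal;dbname=app',
    'user',
    'password',
    [PDO::ATTR_PERSISTENT => true]
);

$samples = [];
for ($i = 0; $i < 20; $i++) {
    $start = hrtime(true);
    $pdo->query('SELECT 1')->fetchColumn(); // trivial query: ~pure round trip
    $samples[] = (hrtime(true) - $start) / 1e6; // nanoseconds -> milliseconds
}
sort($samples);
printf("median round trip: %.2f ms\n", $samples[(int) (count($samples) / 2)]);
```

On a healthy same-datacenter network this median would be expected to land around 1ms; the thread's figures suggest 5-7ms instead.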
13 days ago
I don't think it's a misconfiguration on your side. My best guess is that it's related to how your app is communicating with your MySQL database.
As I mentioned before, you can try using --skip-name-resolve to use IP addresses only and reduce latency by bypassing DNS lookups. I also did some research and found that prepared statements can cause your Laravel app to make two distinct network round trips (prepare + execute). You can use emulated prepares (PDO::ATTR_EMULATE_PREPARES => true) to prevent this.
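A sketch of what enabling emulated prepares might look like in Laravel's config/database.php - PDO::ATTR_EMULATE_PREPARES is a standard PDO attribute, but verify the behaviour against your driver version:

```php
// config/database.php, inside the 'mysql' connection entry - sketch
'options' => [
    // Interpolate parameters client-side instead of sending separate
    // prepare + execute packets, collapsing two network round trips
    // into one for each statement.
    PDO::ATTR_EMULATE_PREPARES => true,
],
```

The trade-off is that emulation changes type handling and bypasses server-side prepared-statement caching, so it's worth testing rather than assuming a net win.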
13 days ago
Thanks for the reply.
I don't really understand --skip-name-resolve - that would require that I know the IP address, which will shift around over time, right? Perhaps I'm not understanding. I could hardcode the alias into the DNS cache, but again, when the IP address of the service changes (the app is in 10.226.xx.xx and the DB in 10.160.xx.xx, so I assume there's quite a lot of room for them to move it around) it'll start to drop again. I did a bit of looking around, but all I can find are instructions on how to activate it.
Prepared statements would potentially create two round trips - but even then, two round trips on the same interface and same network should be sub-2ms, not 5-7ms per trip. Two round trips would explain the difference between the ping and the total query time, but it still doesn't explain why that round-trip time is so long - inside the same DC I would expect to see sub-ms, even across routers.
I originally opened this as a private ticket to Railway because I assumed I had something set up wrong - I cannot see how this makes sense out of the box.
13 days ago
Yeah, Railway only has static outbound IPs; your internal URL's IP will change over time. You could try using the resolved IP address temporarily to rule out latency added by DNS lookups. That's all I know regarding this issue. Hopefully someone else steps in and comes up with a solution to your problem.
12 days ago
Some further testing (in the hope that someone from Railway can get in touch):
1 - The issue isn't universal across all my projects in the region. Some projects have sustained sub-ms ping to their MySQL pods, others are 5-7ms. It appears consistent on a per-project basis across over a week of testing and recording from Nightwatch.
2 - A newly spun-up project has sub-ms ping to MySQL with the same services deployed and (as near as I can tell in a quick test) the same configuration. The only difference between it and a slow project is the number of services on the network.
3 - Directly pinging internal 10.x IP addresses has near-identical latency to the hostname lookups, suggesting DNS is not the issue (but thanks for the input @darseen).
I haven't yet tried a different region, nor a roll-my-own MySQL as opposed to the Railway provided DB instance.
15 hours ago
Is there any way for me to flag this ticket for a response from Railway? I opened it as a private ticket because I genuinely believe it is an infrastructure problem.
The new project I spun up with sub-ms ping has now degraded to 5+ms ping:
0eabcfa27f07:/var/www/html# ping mysql
PING mysql (10.176.92.69): 56 data bytes
64 bytes from 10.176.92.69: seq=0 ttl=42 time=5.892 ms
64 bytes from 10.176.92.69: seq=1 ttl=42 time=5.741 ms
64 bytes from 10.176.92.69: seq=2 ttl=42 time=5.724 ms
64 bytes from 10.176.92.69: seq=3 ttl=42 time=5.755 ms
64 bytes from 10.176.92.69: seq=4 ttl=42 time=9.213 ms
64 bytes from 10.176.92.69: seq=5 ttl=42 time=5.673 ms
64 bytes from 10.176.92.69: seq=6 ttl=42 time=8.785 ms
64 bytes from 10.176.92.69: seq=7 ttl=42 time=6.071 ms
64 bytes from 10.176.92.69: seq=8 ttl=42 time=5.948 ms
64 bytes from 10.176.92.69: seq=9 ttl=42 time=5.761 ms
64 bytes from 10.176.92.69: seq=10 ttl=42 time=5.710 ms
64 bytes from 10.176.92.69: seq=11 ttl=42 time=5.636 ms
64 bytes from 10.176.92.69: seq=12 ttl=42 time=6.000 ms
And a 5-7ms response time for MySQL queries.
This is in comparison to a response time of 0.9-1.3ms for the same queries on a local box running Docker (another image with MySQL) under similar (though not identical) hardware constraints.
Inside a single availability zone (Europe), in a single datacenter, I would expect ping response times to be sub-ms, 1.5ms at the absolute most.
This is having a meaningful impact on my performance, and it's a real shame that after 11 days I've not received a response from Railway.