Ever since moving to Railway Metal, MySQL instance takes a lot of time to execute simple SELECT query
patrikhorvatic
PROOP

8 months ago

Hello, ever since I moved all my instances and drives to Railway Metal in Amsterdam, my MySQL instance has been behaving oddly. The instance is deployed from the Railway template.

For a simple query like SELECT * FROM Biljeske WHERE id_korisnik=? ORDER BY kreiran DESC, it sometimes takes up to 10 seconds to execute the query and return the data to the client. It often returns data very quickly, but this 10-second anomaly is very annoying because it ruins the user experience. Each row it fetches does not even hold much data.

I'm using the Rust Axum framework with SQLx to communicate with the database. Here is the code I use to connect to the pool:

use serde::{Deserialize, Serialize};
use sqlx::{mysql::MySqlPoolOptions, MySql, Pool};
use std::time::Duration;

pub async fn prepare_database_connection() -> Pool<MySql> {
    let database_url =
        dotenvy::var("DATABASE_URL_PRIVATE").expect("DATABASE_URL_PRIVATE must be set");

    let pool = MySqlPoolOptions::new()
        .min_connections(1)
        .max_connections(15)
        .max_lifetime(Duration::new(1800, 0))
        .test_before_acquire(false)
        .connect(&database_url)
        .await;

    match pool {
        Ok(p) => p,
        Err(er) => {
            println!("{:?}", er);
            std::process::exit(1);
        }
    }
}

I do not believe I am doing anything wrong here, as it was working perfectly well 2 months ago. Never a delay.

Here is an example of a simple CRUD endpoint:

pub async fn dohvati_biljeske(
    State(state): State<Arc<AppState>>,
    Extension(current_user): Extension<ExtendRequestKorisnikId>,
) -> ApiResult<Json<Vec<Biljeska>>> {
    let mut biljeske = sqlx::query_as::<MySql, Biljeska>(
        "SELECT * FROM Biljeske WHERE id_korisnik=? ORDER BY kreiran DESC",
    )
    .bind(current_user.id_korisnik)
    .persistent(false)
    .fetch_all(&state.db)
    .await
    .map_err(SqlCommands::map_and_return_get_query)?;

    Sanitizer::sanitize_list::<Biljeska>(&mut biljeske);

    Ok(Json(biljeske))
}

Does anyone else experience issues with MySQL instance? How do I fix the instance?

Solved

39 Replies

8 months ago

Hi Patrick. Acknowledged! We're seeing this sometimes with MySQL. I've applied some config to that instance; can you let me know if it resolves?


Status changed to Awaiting User Response Railway 8 months ago


patrikhorvatic
PROOP

8 months ago

Hello, and thank you for the reply. There is definitely an improvement in fetching the data, but committing the data is sometimes not so fast. I would like to know if there is something I need to do after redeploying my MySQL instance. As it can sometimes have high RAM usage, I often restart the instance to lower the cost.

When I register a user, I need to check the verification code they received by email. I started logging the timing for each operation. After the code is verified as valid, the user can do other things. Here is the code for the verification:

let start = std::time::Instant::now();
let mut transakcije = SqlCommands::begin_transaction(&state.db).await;
println!("begin transaction took: {:?}", start.elapsed());

let _ = sqlx::query(
    "UPDATE Korisnik SET verified=1, verification_code=NULL
    WHERE id_korisnik=?
    AND uuid=? AND verification_code=?",
)
.bind(korisnik)
.bind(&body.uuid)
.bind(&body.kod)
.persistent(false)
.execute(&mut *transakcije)
.await
.inspect_err(SqlCommands::print_database_error)
.map_err(SqlCommands::map_and_return_post_query)?;
println!("query took: {:?}", start.elapsed());

let comm = SqlCommands::commit_transaction(transakcije).await;
println!("commit transaction took: {:?}", start.elapsed());

Railway logs show this:

begin transaction took: 1.547621ms
query took: 3.108886ms
commit transaction took: 2.203179379s

I do not know why committing a transaction takes up to 2 seconds. Traffic to the API is not intense; there are a few requests every couple of minutes, as the user base is not yet big. id_korisnik is indexed in the table, so updating should be fast, if I am not mistaken?

Looking at other simple update or delete operations, the commit timing varies: sometimes it is 17ms, sometimes 779ms or 286ms. I consider everything under 1s acceptable.
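
A note on reading the timings above: every println! reads start.elapsed() from the same Instant, so each figure is the time since the first line of the snippet, not the duration of that single step. The commit itself therefore took roughly 2.2s minus the 3.1ms already spent, which does not change the conclusion. A minimal sketch of a per-step timer, using only the standard library, might look like the following. LapTimer and its methods are hypothetical names introduced for illustration; they are not part of the original code or of SQLx.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not from the original post): records the time spent
/// in each step separately instead of the time since the first step.
pub struct LapTimer {
    last: Instant,
    laps: Vec<(String, Duration)>,
}

impl LapTimer {
    pub fn new() -> Self {
        LapTimer {
            last: Instant::now(),
            laps: Vec::new(),
        }
    }

    /// Closes the current step, prints its own duration, and starts the next one.
    pub fn lap(&mut self, label: &str) -> Duration {
        let now = Instant::now();
        let spent = now.duration_since(self.last);
        self.last = now;
        println!("{} took: {:?}", label, spent);
        self.laps.push((label.to_string(), spent));
        spent
    }

    /// Sum of all recorded steps.
    pub fn total(&self) -> Duration {
        self.laps.iter().map(|(_, d)| *d).sum()
    }

    /// All recorded steps in the order they were closed.
    pub fn laps(&self) -> &[(String, Duration)] {
        &self.laps
    }
}
```

In a handler like the one above, this would mean creating the timer once and calling lap("begin transaction"), lap("query"), and lap("commit transaction") after the corresponding await, so a slow commit shows up as a large value on its own line.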


Status changed to Awaiting Railway Response Railway 8 months ago


chandrika
EMPLOYEE

8 months ago

Thanks for sharing that feedback - good to know the reads are in a better spot. We’re continuing to make these changes and polling the small subset of users that are affected. Like you mentioned, we’re hearing back from folks that it is better. Please hold for more updates on this.


Status changed to Awaiting User Response Railway 8 months ago


patrikhorvatic
PROOP

8 months ago

Hello, I am just wondering if there is any progress in solving this issue. Commits still take about 2 to 3 seconds 😕


Status changed to Awaiting Railway Response Railway 8 months ago


Railway
BOT

8 months ago

Hello!

We've escalated your issue to our engineering team.

We aim to provide an update within 1 business day.

Please reply to this thread if you have any questions!

Status changed to Awaiting User Response Railway 8 months ago


8 months ago

I've gone ahead and applied a mitigation. Would you mind letting me know if it's resolved?


patrikhorvatic
PROOP

8 months ago

Hello, these are the logs from my API:

begin transaction took: 2.308705ms
query took: 4.889181ms
commit transaction took: 50.203621ms
select user start: 50.215214ms
select user took: 51.750527ms
begin transaction took: 505.307µs
QUERY LOKACIJA INSERT took: 2.227243ms
COMMIT transaction took: 5.484733385s
UPDATE KORISNIK WITH STATE took: 6.980267819s

Unfortunately, the issue is not resolved :/


Status changed to Awaiting Railway Response Railway 8 months ago


8 months ago

Gotchya. We'll keep looking into this urgently. Apologies on this!


Status changed to Awaiting User Response Railway 8 months ago


patrikhorvatic
PROOP

8 months ago

I forgot to mention, my container custom start command is:

docker-entrypoint.sh mysqld --innodb-use-native-aio=0 --disable-log-bin --performance_schema=0

From what I found on the web, those options should not present a problem?


Status changed to Awaiting Railway Response Railway 8 months ago


8 months ago

I've attempted to roll out one more config. Would you mind checking to see if it recovers over the next 5 minutes and letting us know?


Status changed to Awaiting User Response Railway 8 months ago


patrikhorvatic
PROOP

8 months ago

These are the logs after your config:

Database query took: 1.856719ms
begin transaction took: 2.438314ms
query took: 5.978853ms
commit transaction took: 22.201148ms
select user start: 22.221187ms
select user took: 31.484517ms
begin transaction took: 447.231µs
QUERY LOKACIJA INSERT took: 2.166138ms
COMMIT transaction took: 9.422911ms
UPDATE KORISNIK WITH STATE took: 13.305753ms

After redeploying my MySQL instance, here are the logs:

begin transaction took: 2.286205ms
query took: 3.674839ms
commit transaction took: 7.912205ms
select user start: 7.946757ms
select user took: 9.921501ms
begin transaction took: 482.734µs
QUERY LOKACIJA INSERT took: 2.141403ms
COMMIT transaction took: 5.484676ms
UPDATE KORISNIK WITH STATE took: 8.885056ms

There is definitely an improvement. I will keep monitoring my logs and inform you if the issue comes back. Thank you for your time.


Status changed to Awaiting Railway Response Railway 8 months ago


8 months ago

Perfect. I'll close this out, but please do re-open it if the issue comes back. This one's kinda annoying; we're working a LOT on disk this quarter to resolve this permanently (and also increase performance generally).

Thanks again and sorry for the issue


Status changed to Awaiting User Response Railway 8 months ago


Status changed to Solved jake 8 months ago


patrikhorvatic
PROOP

8 months ago

Hello again,

Unfortunately, I had a random spike in response time. I filtered the responses by IP, and a newly created user was making some API calls after registration.

The endpoints in the screenshot are very simple read operations that return a few rows from the table. I do not believe the client's slow internet connection could contribute to a 30-second timeout. After a timeout, the user tries to execute the call again.

New requests came in 3 hours later, and all their response times were below 500ms.

Is there a way for you to investigate if it truly was a database issue?

Attachments


Status changed to Awaiting Railway Response Railway 8 months ago


8 months ago

Are you able to provide the "after" logs from your API (same as you've done above)?


Status changed to Awaiting User Response Railway 8 months ago


Railway
BOT

8 months ago

Hello!

We've escalated your issue to our engineering team.

We aim to provide an update within 1 business day.

Please reply to this thread if you have any questions!


ray-chen

Are you able to provide the "after" logs from your API (same as you've done above)?

patrikhorvatic
PROOP

8 months ago

Hello, here are the logs. I will send another attachment as this random delay spike happened again :/

Attachments


Status changed to Awaiting Railway Response Railway 8 months ago


patrikhorvatic
PROOP

8 months ago

Attachment here

Attachments


8 months ago

Thanks. These show your end-to-end API response time. We need something similar to what you posted above:

Database query took: 1.856719ms
begin transaction took: 2.438314ms
query took: 5.978853ms
commit transaction took: 22.201148ms
select user start: 22.221187ms
select user took: 31.484517ms
begin transaction took: 447.231µs
QUERY LOKACIJA INSERT took: 2.166138ms
COMMIT transaction took: 9.422911ms
UPDATE KORISNIK WITH STATE took: 13.305753ms

In the meantime, we've also applied some changes that may help with performance.


Status changed to Awaiting User Response Railway 8 months ago


patrikhorvatic
PROOP

8 months ago

Hello, I only monitor the few endpoints where I noticed the bottleneck. I do not print logs for every endpoint.

I have now implemented tracing for every SQL execution, logging the leading substring of each executed query. For each action it prints the elapsed time and emits a warning for slow queries, which I will then find easily. No slow query has occurred yet. The tracing was implemented after those slow end-to-end API response times occurred, so I currently do not have a screenshot to send you.

I will definitely let you know if it happens again.
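
The tracing described above (a label built from the start of each query, the elapsed time, and a warning for slow queries) could be sketched roughly as follows. This is an illustration under assumptions, not the poster's actual implementation: query_label, is_slow, timed, and SLOW_QUERY_THRESHOLD are hypothetical names, the 1s threshold is taken from the earlier remark that anything under 1s is acceptable, and the wrapper times a plain closure so the example stays standard-library only. Real SQLx calls are async and would be awaited instead.

```rust
use std::time::{Duration, Instant};

/// Assumed threshold: the thread only says that under 1s is acceptable.
pub const SLOW_QUERY_THRESHOLD: Duration = Duration::from_secs(1);

/// Returns at most `max_chars` characters from the start of the query,
/// with whitespace runs collapsed so multi-line SQL fits on one log line.
pub fn query_label(sql: &str, max_chars: usize) -> String {
    let flat = sql.split_whitespace().collect::<Vec<_>>().join(" ");
    flat.chars().take(max_chars).collect()
}

/// A query counts as slow when it reaches or exceeds the threshold.
pub fn is_slow(elapsed: Duration, threshold: Duration) -> bool {
    elapsed >= threshold
}

/// Runs `op`, measures it, and logs a warning line if it was slow.
/// Returns the result of `op` together with the elapsed time.
pub fn timed<T>(sql: &str, threshold: Duration, op: impl FnOnce() -> T) -> (T, Duration) {
    let label = query_label(sql, 40);
    let start = Instant::now();
    let out = op();
    let elapsed = start.elapsed();
    if is_slow(elapsed, threshold) {
        eprintln!("WARN slow query [{}] took {:?}", label, elapsed);
    } else {
        println!("[{}] took {:?}", label, elapsed);
    }
    (out, elapsed)
}
```

Logging the warning on a separate stream or level makes the rare multi-second commits easy to filter out of otherwise fast traffic, which is what the support staff asked for when they requested per-operation timings rather than end-to-end response times.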


Status changed to Awaiting Railway Response Railway 8 months ago


8 months ago

Ok thanks, please keep us posted.


Status changed to Awaiting User Response Railway 8 months ago


patrikhorvatic
PROOP

8 months ago

Hello, I have attached a screenshot. Reading is fast, but the commit takes some time to execute :/
UNESI_SPRICANJE is a procedure that creates a maximum of 100 rows, since I check and validate the data and arrays passed as arguments before calling the procedure. In this specific case, the user inserted about 3 rows.

The queries are executed through the private URL to the database.

Attachments


Status changed to Awaiting Railway Response Railway 8 months ago


8 months ago

Hi,

Unfortunately, we're seeing this in a small but non-zero number of cases under specific host conditions. I'm happy to, in the interim, move you back to the cloud machines (which are dead cold), which should help alleviate this.

Please let us know what you'd like to do here and we will assist


Status changed to Awaiting User Response Railway 8 months ago


patrikhorvatic
PROOP

8 months ago

Hello, I did some stress testing.

I wrote a script that sent 16,000 POST requests over 10 minutes. While the inserts were executing, I was performing reads at the same time. The data collected shows:

  • Reads are always fast.

  • 93% of POST request commits were executed in under 100ms

  • 6.5% of POST request commits were executed between 101ms and 1s

  • The remaining 0.5% of POST requests were anomalies, taking between 1s and 5s

Based on this, I’ve decided to stay on Railway Metal. My app doesn’t require extreme performance — it’s a CRUD app, after all. Plus, Railway is moving away from cloud machines, which aligns with this choice.

Overall, progress has been made, and the chances of performance issues have been significantly reduced.

Thank you and I appreciate all of your effort.
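
The latency breakdown from the stress test above can be reproduced from raw commit timings with a small bucketing routine. The sketch below is an assumption about how such a summary might be computed, not the script that was actually used: CommitBuckets and its methods are hypothetical names, and the bucket edges (up to 100ms, up to 1s, above 1s) are one reading of the ranges quoted in the post.

```rust
use std::time::Duration;

/// Hypothetical summary of commit latencies in the three ranges from the post.
#[derive(Debug, Default, PartialEq)]
pub struct CommitBuckets {
    pub under_100ms: usize,
    pub up_to_1s: usize,
    pub over_1s: usize,
}

impl CommitBuckets {
    /// Sorts each sample into exactly one bucket.
    pub fn from_samples(samples: &[Duration]) -> Self {
        let mut b = CommitBuckets::default();
        for d in samples {
            if *d <= Duration::from_millis(100) {
                b.under_100ms += 1;
            } else if *d <= Duration::from_secs(1) {
                b.up_to_1s += 1;
            } else {
                b.over_1s += 1;
            }
        }
        b
    }

    /// Number of samples across all buckets.
    pub fn total(&self) -> usize {
        self.under_100ms + self.up_to_1s + self.over_1s
    }

    /// Share of `count` in the total, as a percentage (0.0 for an empty set).
    pub fn percent(&self, count: usize) -> f64 {
        let total = self.total();
        if total == 0 {
            0.0
        } else {
            count as f64 * 100.0 / total as f64
        }
    }
}
```

Feeding it the per-request commit durations collected during a load test gives the same kind of split as reported above, and tracking the over_1s share across redeploys shows whether a platform-side change actually reduced the anomalies.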


Status changed to Awaiting Railway Response Railway 8 months ago


Hey there Patrik,

I am not Jake or Ray, so apologies for the new face in your thread. I am working on aggregating and responding to all customers dealing with the intermittent slow disk on the platform. If staying on Metal is okay for you, then we're happy to leave you there.

With that said, I am going to leave this thread open while I tie your customer report to a bunch of others. When we do ship the core fix, I would love to get your feedback to see if we have truly resolved the matter for you.


Status changed to Awaiting User Response Railway 8 months ago


Railway
BOT

8 months ago

🛠️ The internal ticket [Disk Latency] High read/write on metal has been marked as triage.


Railway
BOT

7 months ago

🛠️ The ticket Performance issue on metal disk has been marked as todo.


patrikhorvatic
PROOP

7 months ago

Hello,
can you please give an update on the issue? My traffic to the API has increased significantly and I would like to know if the issue will be resolved in the near future.
Thanks!


Status changed to Awaiting Railway Response Railway 7 months ago


7 months ago

Hi Patrick. We've rolled out fixes here. Are you not seeing improvements?


Status changed to Awaiting User Response Railway 7 months ago


patrikhorvatic
PROOP

7 months ago

Hello, there is no improvement.

I attached a screenshot of the delays from the database in the range of 1 to 16 seconds.

Attachments


Status changed to Awaiting Railway Response Railway 7 months ago


7 months ago

I've gone ahead and rolled out an additional change. Apologies for making you check so many times, but could you validate the change here?


Status changed to Awaiting User Response Railway 7 months ago


patrikhorvatic
PROOP

7 months ago

Hello.

I decided to accept a few hours of downtime and create a new instance of the database from the template. I cloned the data and released it to production.

I will monitor the database performance and let you know if the issue is not resolved. Thank you for your time.


Status changed to Awaiting Railway Response Railway 7 months ago


7 months ago

Gotchya. Well, we think we've identified the underlying issue (after much trial and error)

So if it comes back, we are here!


Status changed to Awaiting User Response Railway 7 months ago


Railway
BOT

6 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 6 months ago


patrikhorvatic
PROOP

6 months ago

Hello, here are the logs for the latest traffic and commit transactions.

Commit transactions still take more than 1 second to execute. The server is not busy, but transactions are still slow.

Attachments


Status changed to Awaiting Railway Response Railway 6 months ago


We're looking better but not perfect. I've flagged it. I assume it was normal for 24 days until it popped up again just now?


Status changed to Awaiting User Response Railway 6 months ago


Railway
BOT

5 months ago

This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!

Status changed to Solved Railway 5 months ago


Railway
BOT

5 months ago

✅ The ticket Performance issue with disk operations on metal has been marked as completed.


Railway
BOT

5 months ago

🛠️ The ticket Performance issue with disk operations on metal has been marked as in progress.


Railway
BOT

5 months ago

✅ The ticket Performance issue with disk operations on metal has been marked as completed.


Railway
BOT

5 months ago

🛠️ The ticket Performance issue with disk operations on metal has been marked as in progress.


Railway
BOT

5 months ago

🛠️ The ticket Performance issue with disk operations on metal has been marked as in progress.


Railway
BOT

5 months ago

✅ The ticket Performance issue with disk operations on metal has been marked as completed.

