Asynchronous AOF fsync is taking too long (disk is busy?).
cerefre
PRO OP

6 months ago

Since deploying to Railway metal last week, I've had many of these errors on Redis.

Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis. Just noticed now as the service is really slow. Any ideas?
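For context, Redis logs this warning when the background fsync of the append-only file (under the default `appendfsync everysec` policy) has been pending for more than a couple of seconds, which usually points at a saturated or slow disk. The knobs involved live in redis.conf; a hedged sketch of the relevant settings (values are illustrative, not a recommendation):

```shell
# redis.conf fragment (illustrative, not a tuning recommendation)
appendonly yes                 # AOF persistence enabled
appendfsync everysec           # fsync from a background thread once per second;
                               # the "fsync is taking too long" warning fires
                               # when that background fsync lags (busy disk)
no-appendfsync-on-rewrite yes  # don't fsync while an AOF rewrite/BGSAVE is
                               # already hammering the disk (risks losing up
                               # to ~30s of writes on crash)
```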

Solved

18 Replies

cerefre
PRO OP

6 months ago

The slowness resolved on its own, but I still added these variables: REDIS_APPENDONLY=no, REDIS_SAVE=
Still curious if it's a known thing with the metal transition.
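Assuming Railway's Redis template passes those environment variables straight through to the server config, they would be equivalent to these redis-server flags (a sketch, the 1:1 mapping is an assumption):

```shell
# Equivalent redis-server flags (assuming REDIS_* variables map 1:1 to config)
redis-server --appendonly no --save ""
# --appendonly no : disable the AOF entirely, so no more fsync stalls,
#                   but writes since the last RDB snapshot are lost on restart
# --save ""       : disable RDB snapshots too, making Redis a purely
#                   in-memory cache with no persistence at all
```

This trades durability for latency, which is fine for cache-style workloads but not for data you can't rebuild.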


Hey there cerefre,

We've been playing whack-a-mole, rescheduling workloads to make sure no one is hit by a noisy neighbor, so your issue has likely been addressed by that action. That said, those options will help make Redis a bit less disk-heavy.

We're tracking this and are working with the infra team on a fix.


Status changed to Awaiting User Response Railway 6 months ago


cerefre
PRO OP

6 months ago

Hi - any updates? This is causing an extreme slowdown in my app, and users have been reaching out to ask if the app is down because it's loading so slowly.


Status changed to Awaiting Railway Response Railway 6 months ago


Noted - we can move your instances back to GCP, if that's okay with you, so that your business isn't affected in the short term. Is that doable?


Status changed to Awaiting User Response Railway 6 months ago


cerefre
PRO OP

6 months ago

What would be the long-term fix there? I thought all regions needed to be moved to metal by today.


Status changed to Awaiting Railway Response Railway 6 months ago


There is a core fix planned: we intend to ship a fix for the core fsync-wait behavior across the fleet, since it's affecting a few workloads on those machines. The timeline is within a week, and we're looking to delay the final migration call for impacted customers like you.

We would move you back as soon as we've confirmed the core fix is out for Metal.


Status changed to Awaiting User Response Railway 6 months ago


cerefre
PRO OP

6 months ago

I have migrated back to a non-metal region - it is performing better already. Appreciate your fast responses, and I would also appreciate being notified before any automatic migrations happen for this service.


Status changed to Awaiting Railway Response Railway 6 months ago


Good to hear. We'll try our best to keep you in the know.


Status changed to Awaiting User Response Railway 6 months ago


cerefre
PRO OP

6 months ago

We had a slight reprieve from the slowness, but even off metal (for that one Redis service) we're still seeing much slower loading times. Are there other issues with metal that might also improve after the week?


Status changed to Awaiting Railway Response Railway 6 months ago


cerefre
PRO OP

6 months ago

As an update: my app is completely unreachable now.


jake
EMPLOYEE

6 months ago

I've seen some success with a config I'm attempting to roll out for metal here. Let me try it on the cloud machine you're on for now.


Status changed to Awaiting User Response Railway 6 months ago


cerefre
PRO OP

6 months ago

Hi Jake. Sounds great. Would be interested in any details or updates you can provide! The service is performing better now (still not as fast as before the switch to metal, but loading at all is much better than yesterday).


Status changed to Awaiting Railway Response Railway 6 months ago


chandrika
EMPLOYEE

5 months ago

Awesome to hear. We improved our cluster configuration to better balance the load across the fleet.


Status changed to Awaiting User Response Railway 6 months ago


rendercoder
PRO

5 months ago

I encountered a similar-looking issue. It seems this Redis I/O problem exists on metal. Please refer to my case and provide a solution. Thank you.
https://station.railway.com/questions/unstable-metal-disk-io-happed-2-times-i-1a7ec197


Status changed to Awaiting Railway Response Railway 6 months ago


Status changed to Solved itsrems 6 months ago


cerefre
PRO OP

5 months ago

Hm, I tried to leave a reply earlier, but maybe you can't reply after a thread has been marked as solved? I think this was marked as solved too early, though! Would still like an update on the auto-migration back to metal and whether all the issues have been resolved.


Status changed to Awaiting Railway Response Railway 5 months ago


itsrems
EMPLOYEE

5 months ago

Heya, sorry about that. As Chandrika mentioned, we've rolled these changes out across the fleet. Sounds like you're all set performance-wise?

As for moving you back to metal, I can do that with your approval in ~12h, so our platform team is around in case anything goes wrong.


Status changed to Awaiting User Response Railway 5 months ago


cerefre
PRO OP

5 months ago

Works for me!


Status changed to Awaiting Railway Response Railway 5 months ago


itsrems
EMPLOYEE

5 months ago

Migration ran successfully - your Redis now lives in the same region as your other services.

Let me know if you need anything else!


Status changed to Awaiting User Response Railway 5 months ago


Status changed to Solved cerefre 5 months ago

