Redis cluster memory overcommit

alphalinkPRO

7 months ago

Hi!

I'm using the high-availability Redis cluster. When I load large amounts of data into the master, it crashes without any informative error.

When the Redis cluster starts, I noticed that it emits the following log message:

1:C 03 Oct 2024 13:48:10.983 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.

It turns out that the Redis team recommends enabling memory overcommit in Linux environments, but it seems Bitnami doesn't do it by default.

I can't find a way to modify the /etc/sysctl.conf file inside my Redis container. Could you please tell me how I can enable memory overcommit in my Redis instances?
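For reference, this is what the warning points at; on a Linux host where you can change kernel parameters (which doesn't seem possible inside this container), it would be something like:

# apply immediately, until the next reboot
sysctl vm.overcommit_memory=1

# persist across reboots
echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf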

Awaiting User Response

12 Replies

7 months ago

Hi Mohit,

Thank you for raising this issue. We're investigating and will follow up soon.

Regards,
Christian


Status changed to Awaiting User Response railway[bot] 7 months ago


7 months ago

Hi Mohit,

Thanks for the report, I will be digging into this. Would you be able to provide me with steps to reproduce the issue? Is it simply issuing many writes to the Redis cluster, or one write of a large object? You say it crashes; does it write an error, or does the cluster simply become inaccessible?

Thanks!

Melissa


7 months ago

Actually, it looks like the container is OOMing. I see that we had increased the memory limit on your old Redis instance to 64 GB, so I applied the same override to the new Redis cluster nodes. Let me know if you continue to experience the issue.

Thanks,
Melissa


alphalinkPRO

7 months ago

Thanks for the report, I will be digging into this. Would you be able to provide me with steps to reproduce the issue? Is it simply issuing many writes to the Redis cluster, or one write of a large object? You say it crashes; does it write an error, or does the cluster simply become inaccessible?

Hi Melissa!

It was many writes to Redis sorted sets. Each write could contain up to 2 or 3 MB of data.
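As a rough sketch of the write pattern (the key name and payload here are just placeholders), each call looked something like:

redis-cli -h <redis-host> ZADD rankings:daily 1696337290 "<serialized payload of roughly 2-3 MB>"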


Status changed to Awaiting Railway Response railway[bot] 7 months ago


alphalinkPRO

7 months ago

Actually, it looks like the container is OOMing. I see that we had increased the memory limit on your old Redis instance to 64 GB, so I applied the same override to the new Redis cluster nodes. Let me know if you continue to experience the issue.

Thank you. Testing with the new configuration.


Status changed to Awaiting User Response christian 7 months ago


alphalinkPRO

7 months ago

The 64 GB memory limit worked, thank you!

But what about my initial question? The Redis team recommends setting vm.overcommit_memory = 1 for Redis instances. Is it possible to do that for our Redis container?


Status changed to Awaiting Railway Response railway[bot] 7 months ago


7 months ago

It's not currently possible to set vm.overcommit_memory = 1 at Railway. We've created an internal note to explore adding it in the future.


Status changed to Awaiting User Response railway[bot] 7 months ago


Status changed to Solved christian 7 months ago


Status changed to Awaiting Railway Response alphalink 6 months ago


alphalinkPRO

6 months ago

Hi again.

Previously you increased the memory limit for our Redis instances to 64 GB, but yesterday at 9:36 pm one of our instances (service 383d00e7-baab-4014-a951-34d9f24627ed) died, and its memory limit is now 32 GB, so it can't start.

Could you please increase the memory limit for this instance to 64 GB again?

How can we make sure this won't happen in the future?

Is it possible to increase the memory limit beyond 64 GB in the future?


6 months ago

Hey there,

I'm glad you wrote in here; I attempted to reach you via email yesterday!

We found that the Redis nodes in your project with the higher cap were causing some issues with our workload scheduling, so we had to remove the limit override and reassess the solution. I found that there is a much better and more cost-efficient way to address the issues you are running into.

Specifically, there is a maxmemory configuration that is not set by default in the Redis image. Without this, Redis is free to consume as much memory as it has access to, which is what we've been seeing. (Read more about that in the Redis docs.)

To address this, you can simply update your Redis nodes to set a maxmemory. You can either connect to Redis and use CONFIG SET, or you can add a Start Command to the service like so (example):

/opt/bitnami/scripts/redis/entrypoint.sh /opt/bitnami/scripts/redis/run.sh --maxmemory 16gb

We suggest implementing the maxmemory configuration above, and setting maxmemory to something that makes sense for your use case.
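If you prefer the CONFIG SET route instead of the Start Command, a minimal sketch would be the following (host, password, and size are placeholders for your setup; note that a value set this way doesn't persist across a restart unless it also ends up in the config):

redis-cli -h <redis-host> -a <password> CONFIG SET maxmemory 16gb
redis-cli -h <redis-host> -a <password> CONFIG GET maxmemory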

Apologies for the reversal; we had to act fast on it yesterday to avoid issues in the fleet. Let me know if you have any questions or need assistance implementing the above.


Status changed to Awaiting User Response railway[bot] 6 months ago


alphalinkPRO

6 months ago

Hi Melissa.

Thank you for getting back to me. I've tried setting maxmemory to 31 GB first, then 16 GB, and now 10 GB, but it doesn't seem to help. Redis cannot start with these settings. It tries to load data, but eventually I see the container event "container died" in the logs and Redis restarts. Can you help me solve this issue, please?

service id: 383d00e7-baab-4014-a951-34d9f24627ed


Status changed to Awaiting Railway Response railway[bot] 6 months ago


6 months ago

Looks like the RDB snapshot that is being loaded into memory is 51 GB. Is the data in this snapshot critical, or would it be acceptable to start with a fresh data set?

I see you are already trying different maxmemory-policy settings; that's great and will help going forward, but we need to address the large RDB file so that Redis can start up.

If your app can tolerate having Redis start empty, we can disable RDB for now and allow it to start up. Then we can re-enable it to give us a clean slate going forward.

Per the docs (https://github.com/bitnami/containers/blob/main/bitnami/redis/README.md#configuration), it looks like you can set `REDIS_RDB_POLICY_DISABLED` to `yes` to achieve this.
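As a sketch, that would just be an environment variable on the service; my assumption is that the Bitnami image then drops the save directives, so after a restart you could confirm RDB is off:

REDIS_RDB_POLICY_DISABLED=yes

redis-cli -h <redis-host> CONFIG GET save   # should come back empty when RDB snapshotting is disabled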


Status changed to Awaiting User Response railway[bot] 6 months ago


6 months ago

As for the maxmemory setting, I think keeping it low is best for cost-efficiency, but it really depends on your needs. Starting low and then adjusting up is probably the best approach.
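To decide where to set it, you could watch actual usage and raise maxmemory only when you get close to the cap, for example (host is a placeholder):

redis-cli -h <redis-host> INFO memory | grep -E 'used_memory_human|maxmemory_human'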

