a year ago
Suddenly, starting in the last 48 hours, I've noticed that old deployments are not being removed automatically and must be removed manually. This caused our service to go down: our pg connection count 10x'd because we had 10 active deployments when I'd expect at most 2 (given that we configured RAILWAY_DEPLOYMENT_DRAINING_SECONDS to 600, i.e. 10 minutes). Was there a behavior change? Is this a new bug? Project id b34282ae-5797-4e55-bd45-347f7a9fc694
11 Replies
a year ago
Hello!
Can you link to the specific service in question?
It seems to have fixed itself, but I want to understand: was there downtime on Railway's end, or a misconfiguration on ours? How do we prevent this from causing downtime in the future?
a year ago
you do have your overlap time set to I think 600 seconds, could that be why?
a year ago
I was slightly mistaken: you have the draining seconds set to 600, which means the older deployment can keep running for 600 seconds after a new deployment rolls out, whereas the default is 3 seconds.
I mean, you can see in the screenshot above that it was much, much more than 600 seconds.
a year ago
fair point, is this still happening?
a year ago
It's happening again. This is twice in two days our app has gone down because we max out db connections because these old containers stay up.
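To put rough numbers on why lingering deployments exhaust the database, here is a minimal sketch of the connection budget. The pool size of 10 is a hypothetical value, not something taken from this project, and the limit of 100 is PostgreSQL's default `max_connections`, which may differ from the actual database setting. The helper names are made up for illustration.

```python
def total_connections(live_deployments: int, pool_size: int) -> int:
    """Worst-case connections held when every live deployment fills its pool."""
    return live_deployments * pool_size


def max_safe_pool_size(max_connections: int, live_deployments: int) -> int:
    """Largest per-deployment pool that keeps the total within the database limit."""
    return max_connections // live_deployments


POOL_SIZE = 10         # hypothetical per-deployment pool size (assumption)
MAX_CONNECTIONS = 100  # PostgreSQL default max_connections (assumption for this database)

print(total_connections(2, POOL_SIZE))           # old + new deployment during a normal rollout
print(total_connections(10, POOL_SIZE))          # ten deployments that were never removed
print(max_safe_pool_size(MAX_CONNECTIONS, 10))   # pool size that would tolerate ten at once
```

Under these assumed numbers, two overlapping deployments stay well below the limit, while ten reach it exactly, which would match the symptom of maxing out connections until the old containers are removed.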