a year ago
As Railway moves users to metal for hobby deployments, some database queries are seeing higher latency, since metal doesn't support volumes (and therefore databases) yet. I can't exactly say the forced change was well executed (I've seen a lot of people complain), but I like looking at it with a positive mindset. After all, I'm glad Railway is going to get rid of GCP.
Now, is there a way to force Railway to keep services other than your database out of metal? I don't really get affected by the higher latency because I only use the platform for personal stuff, but I'd still like to keep my db query times as speedy as they were before.
40 Replies
a year ago
we have indeed made a large oversight here, we should have never migrated any single service to metal.
what we should have done is check a project, and if there was any service in the project that couldn't be migrated to metal (databases) -- do not migrate anything in the project.
instead we only looked at individual services, and if they didn't have a volume we migrated them.
we will be looking into allowing users to migrate back.
a year ago
It's understandable. Wanting to move away from GCP as quickly as possible is justified considering you'd rather have things run on your own hardware/datacenter, and I'm glad metal is finally a thing since I was waiting for it to come to railway… even though it did ruin things a little bit.
But yeah, allowing us to migrate back would be great, or adding the check for additional services to see if they're unavailable in metal.
Lastly, is there any ETA for full metal support for all types of services? I've seen people say it's right around the corner, but I couldn't find a firm date, and I'd like to know if one has been published/provided by the team.
a year ago
just got word that hobby users are supposed to be allowed to move back to gcp, rubber banding the region settings back to metal was not at all intended
a year ago
so we will fix that, and then you can move your services back to gcp
a year ago
Sounds great 👍
a year ago
as for metal with volumes ETA, i don't have one, and it's really hard to give one, it was supposed to be before EOY but hardware shipments pushed that back
a year ago
sorry again for the inconvenience here, we could have done better, and will do better for any similar future migrations, in fact i'd be very surprised if we kicked off another migration before metal supports volumes
a year ago
Totally alright by me. I understand these sorts of changes & like I said I'm not someone that uses railway as their primary provider for a public service, I just use it to host the random crap I make haha
a year ago
Anyway, thanks for the answers (appreciated especially since it's the end of year and people are supposed to be taking breaks)
a year ago
thank you for being so understanding, ill let you know when we roll out the fix to allow you to choose gcp again
a year ago
Thank you too!
a year ago
@dan - you can switch back to GCP now
a year ago
Damn, that was quick
@Brody pushing such a change on a Friday would already be a bad idea … Even if everything goes smoothly … But pushing such a massive change over the holidays … Dude … You guys know better … I'm actually very disappointed, not that it stopped working, but by how long it took to stop "blaming the users" and provide a solution.
This is something that should have gone through a "tiered deployment", maybe with dedicated support to smooth out the kinks. But this was just a disaster in strategy, timing, and execution … I really hope you guys do better in the future…
And I mean it in a good way… I really like Railway, it would pain me to move to a different service provider because I can't trust the execution …
a year ago
you are right, we could have done better, but for some context on why we did it over a holiday: we needed to lessen the burden on the GCP infrastructure as we were running out of compute overhead. if we hadn't, there would have been a much bigger fire to fight. not an excuse at all, just hopefully you can see both sides of the story.
in the future we will be sure to do any migrations ahead of holidays, put out docs, put out warnings, and perhaps most importantly, when we do start to migrate hobby to metal again, we will not migrate anything in a project if a service depends on a database, and of course next time we will make sure you can always revert.
and i'm deeply sorry if i came off as blaming the users; i had helped many people reduce the latency by connecting to the database privately, so i was only sharing what worked for many others.
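For anyone landing here later, the "connect privately" fix usually just means preferring the private-network connection string over the public one when your app builds its DB connection. A minimal sketch of that fallback logic; the variable names `DATABASE_PRIVATE_URL` / `DATABASE_PUBLIC_URL` are illustrative assumptions here, so check which variables Railway actually injects into your service:

```python
import os


def pick_database_url(env=os.environ):
    """Prefer the private-network database URL when it's available.

    Traffic over the private network stays inside Railway's network
    instead of going out over the public internet, which is what cut
    query latency for people in this thread. The env var names below
    are assumptions for illustration, not Railway's guaranteed names.
    """
    private = env.get("DATABASE_PRIVATE_URL")
    public = env.get("DATABASE_PUBLIC_URL")
    return private or public


# Example: when both are set, the private (.railway.internal) URL wins.
urls = {
    "DATABASE_PRIVATE_URL": "postgresql://postgres:pw@db.railway.internal:5432/railway",
    "DATABASE_PUBLIC_URL": "postgresql://postgres:pw@proxy.rlwy.net:12345/railway",
}
```

You'd then hand the chosen URL to your DB client as usual; the point is only that the private hostname should be the default and the public one the fallback.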
Thank you for sticking with us through this learning experience, we will do better!
It's all good mate 😉
Next time you do something. Here are a few ideas:
Take a small number of customers and a support team, and "duplicate" the service instead of migrating it the first time. Have your support team ready to jump in, monitoring metrics to assess problems. Once it's good and stable, delete the old instances and you're done; if it's a disaster, well, shift the DNS/load balancer back and you're done.
Once the first X people have migrated and you're happy with the overall experience of your pilot users, make the next batch 2-5x the size of the first. Hopefully that goes smoothly. Work out the kinks and proceed to the next group. Etc.
You guys have a great product & platform. I'm very happy to work with you 😉
Let's say that the timing and the "deal with it, sorry, it is how it is" (I don't think it was from you actually) could have been better
a year ago
we actually did roll out this change in steps, 25% -> 50% -> 75% -> 100% (percent of hobby workloads) somehow we only started seeing reports a while after the 100% rollout 🤔
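A percentage rollout like that is commonly implemented by hashing each workload into a stable bucket, so the 25% cohort is a strict subset of the 50% cohort and cohorts only ever grow between waves. A minimal sketch of the general technique, not Railway's actual mechanism:

```python
import hashlib


def in_rollout(workload_id: str, percent: int) -> bool:
    """Stable percentage gate: hash the id into a bucket from 0 to 99.

    Because the bucket is deterministic for a given id, every workload
    selected at 25% is still selected at 50%, 75%, and 100% -- so each
    wave only adds new workloads rather than reshuffling them.
    """
    digest = hashlib.sha256(workload_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent
```

With a gate like this, stepping `percent` through 25 → 50 → 75 → 100 produces exactly the kind of waves described above, and any single workload migrates at most once.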
Oh interesting. How long between each wave?
I mean I'm not surprised … It makes so much sense (to use waves)
For what it's worth … I'm impressed you guys ONLY had network latency issues for such a big migration … I would have screwed up 100x more stuff
a year ago
Let's say that the timing and the "deal with it sorry it is how it is" (I dont think it was from you actually) could have been better
that was from me, that was the attitude i took because i assumed our infra team intended hobby to not be able to revert, in hindsight i don't know why i thought that, and i should have asked and had it corrected, so my apologies there too
a year ago
i'm not infra and i wasn't paying too much attention to what infra was doing during that rollout (my mistake), so i'm not sure, but i can say with utmost certainty that it was not long enough between each step
a year ago
i didn't, will reply shortly
Oh, and I might have missed it … Was that quite dramatic change announced by email (before the change)?
a year ago
it wasn't, only because we didn't foresee any impact. were we ever wrong
I guess I don't need to mention that you don't push such a huge migration (things always go wrong) without alerting your users ahaha
a year ago
the massive latency spike on the public network was due to outdated geoip info on our IP block, so requests were not being routed to the closest endpoint
[To be fair, I was guilty of that, and it did cut my query time in half - but the 20ms overhead per query was still painful using the private URL]
a year ago
we had too much confidence after doing past hobby to metal migrations, we now know to always communicate no matter how confident we are
Btw, do you know if the "new builder" (the only one that works with private URL networking) will get faster?
a year ago
it's currently in an unmaintained state, and indeed slower than the default builder; the plan was to continue work on it once we've made the full move to metal
Thanks for allowing us to move back to GCP for the time being. I can confirm moving back is working. Also, in my case latency dropped by nearly half, confirming that an instance on Metal connecting to a database on GCP is slower.
That said, I'm looking forward to when we can deploy volumes on Metal so that we can move everything there together 👍