2 years ago
ofc that doesn't happen locally. I'm test-scaling a new compute-heavy feature of my app before shipping, and this is a blocker for me. It's a problem because I don't want to be billed for 30 GB for something I discard a few seconds later. I'm just looking at whether this might be a bug and/or misuse before I start looking at an alternative host (for this particular workload; I'm still happy with the rest).
95 Replies
2 years ago
are you loading the file into memory?
2 years ago
and, reading a 30gb file from the disk to where?
my Rust backend needs to read all 180k files I have on the volume attached to the service. They're about 170 KB each
2 years ago
right but you are reading these files off of disk? where are they going beside into ram?
I'm literally reading at /cache/something/somefile.mspack, deserializing it, doing the compute, and dropping it
2 years ago
so these files are loaded into memory then, I'm not sure why you are surprised that memory has increased?
once I have read and made use of the information, how can I discard it and let the unused RAM go back?
locally it is discarded right after each read, so my RAM never goes up. I can run this program with 1 GB of RAM or less
2 years ago
that wouldn't be a platform specific question
Well if that doesn't sound like a bug/misuse to you, then I got my answer I guess 😄
2 years ago
yes unfortunately this would be a code issue
2 years ago
Yep, you will need to deallocate that in Rust
2 years ago
sounds like you aren't resolving your lifetimes afaik
I tried everything I could think of, really 😕 even with an explicit drop(variable_with_deserialized_data) at the end of each loop. Even after my job is done and cleaned up, it doesn't drop. I still want to point out that locally the same job never goes above a few MB of RAM, on the same dataset.
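A minimal sketch of the loop being described (names are hypothetical and the real deserialize-and-compute step is replaced by a dummy checksum; this is not the author's actual code). The point it illustrates: each iteration's buffer is freed by the allocator, so the process's own RSS stays flat, and the growth being billed is the kernel's page cache from the reads, which `drop` cannot release.

```rust
use std::fs;

// Read every file in `dir` into memory, "process" it, and drop the buffer.
// Returns how many files were processed.
fn process_all(dir: &str) -> std::io::Result<usize> {
    let mut processed = 0;
    for entry in fs::read_dir(dir)? {
        // Read one whole ~170 KB file into a heap buffer.
        let bytes = fs::read(entry?.path())?;
        // ... deserialization and compute would happen here;
        // a dummy checksum stands in for it.
        let _checksum: u64 = bytes.iter().map(|b| *b as u64).sum();
        // Explicit, though the buffer would be freed at the end of the
        // iteration anyway. This frees heap memory, not the page cache.
        drop(bytes);
        processed += 1;
    }
    Ok(processed)
}
```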
2 years ago
nixpacks or dockerfile?
2 years ago
legacy or v2 runtime?
[build]
builder = "NIXPACKS"
[deploy]
startCommand = "./api"
healthcheckPath = "/health"
healthcheckTimeout = 100
[phases.setup]
nixpkgsArchive = 'a459b363de387c078080b719b30c54d8a79b4a3e'
nixPkgs = ["...", "ffmpeg"]
nixLibs = ["...", "dav1d"]
not sure that matters tho. also I do my builds on GitHub Actions because they're not trivial, and I do a final railway up -e {env} -s {service_id} to upload
2 years ago
that's going to cause railway to run another build
are you sure you did not mix up 2 problems? 😄 I was on something unrelated to deployment; I don't have a deployment problem
2 years ago
i know, im getting side tracked, just wanted to point it out
2 years ago
but when in doubt write a dockerfile that uses alpine instead of nixpacks
2 years ago
have seen alpine based dockerfiles help with strange memory issues before plenty of times
2 years ago
cant hurt
unfortunately I've tried almost everything, but for Alpine I need a special compilation target and I can't get it to work easily with my Rust code. got too many deps
I could try a Docker image on Ubuntu or something, but at that point we're back to what nixpacks does
2 years ago
ah yes the joys of rust, compiling
so I did a small experiment; the last peak is me running that heavy task (limited to 20k items so it doesn't hit 30 GB again). I changed the start command to while true; do free -h; sleep 5; done & ./api and here are the results. We can clearly see that the used column stays at 251 GB, before and after the tasks (why it shows me the host machine's specs instead of my Docker container's, no clue). But the buff/cache column is billed to me, and it indeed grew by a few GB.


2 years ago
buffer / cache is indeed included in the results of docker stats
I had some hope for a moment with O_DIRECT; it works locally and doesn't bump the buff/cache, but on railway I get an error. I guess the filesystem doesn't accept this custom flag <:sossa:756985575243776191>
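The O_DIRECT attempt might have looked roughly like this (a sketch, not the author's actual code; the flag value is hardcoded for x86-64 Linux, where real code should take it from the `libc` crate since it differs per architecture):

```rust
use std::fs::OpenOptions;
use std::os::unix::fs::OpenOptionsExt;

// O_DIRECT's numeric value on x86-64 Linux (0x4000); in real code use
// libc::O_DIRECT instead of hardcoding.
const O_DIRECT: i32 = 0x4000;

// Try to open a file with O_DIRECT so reads bypass the kernel page cache.
// Filesystems that don't support it (e.g. tmpfs) reject the open with
// EINVAL, which matches the error seen on the platform.
fn open_direct(path: &str) -> std::io::Result<std::fs::File> {
    OpenOptions::new()
        .read(true)
        .custom_flags(O_DIRECT)
        .open(path)
}
```

Note that even where the open succeeds, O_DIRECT reads must use buffers aligned to the filesystem's block size, which is extra work on the application side.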
2 years ago
I wonder if the v2 runtime would also gather metrics that include the buffer / cache, but you would need to find a way to run your tests without a volume since any service with a volume defaults back to the legacy runtime despite the selector saying v2
2 years ago
Have you tried copying the volume’s contents into the container’s disk on startup?
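The copy-on-startup suggestion could be sketched like this (hypothetical paths; a plain recursive copy from the mount point onto the container's own disk before the workload starts):

```rust
use std::fs;
use std::path::Path;

// Recursively copy `src` (e.g. the mounted volume) into `dst`
// (e.g. a directory on the container's local disk).
fn copy_dir(src: &Path, dst: &Path) -> std::io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let to = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copy_dir(&entry.path(), &to)?;
        } else {
            fs::copy(entry.path(), &to)?;
        }
    }
    Ok(())
}
```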
2 years ago
In that case you may be able to do local file system file streaming in rust and it shouldn’t increase the memory
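Streaming in Rust as suggested here might look like the sketch below (illustrative; a dummy checksum stands in for the real processing). Peak heap usage stays at one chunk regardless of file size, though note this caps the application's own allocations and does not by itself stop the kernel from caching the pages it reads.

```rust
use std::fs::File;
use std::io::{BufReader, Read};

// Process a file in fixed-size chunks instead of loading it whole;
// returns a byte-sum checksum as a stand-in for real work.
fn stream_checksum(path: &str) -> std::io::Result<u64> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut buf = [0u8; 64 * 1024]; // 64 KiB chunk, reused every iteration
    let mut sum = 0u64;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        sum += buf[..n].iter().map(|b| *b as u64).sum::<u64>();
    }
    Ok(sum)
}
```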
2 years ago
Local file system is a little different than docker mounted volumes i think
2 years ago
volumes are ext4 on legacy, and zfs on v2
2 years ago
I forgot railway doesn’t actually use docker
2 years ago
legacy runtime does
2 years ago
Man I don’t even use railway
2 years ago
🗣️
2 years ago
v2 uses podman
2 years ago
Man portable air defense?
2 years ago
Oh wait that’s manpad
Since I could not find a solution with Railway for this particular thing, I bought a server somewhere else, although the disk IO isn't as good as Railway's and this adds some complexity for deployment and monitoring :( But yeah, I can't afford adding $200+ to my monthly bill just to read a few files sometimes. I still use Railway for a lot of other stuff and I'm happy with it. I just figured that I should use Railway where it's helping me instead of trying to fight it. No hate, I can understand why buff/cache is counted. Just wanted to give an update for future people searching this thread.
2 years ago
i wrote a benchmark test (in Go) to write 30000 1MiB files (for a total of 30GiB) to disk at 250 files concurrently.
running this locally of course there was no wild increase in memory.
running this program on railway with the legacy runtime had my memory reach ~23GB.
i then switched to the v2 runtime and ran the program again and the memory never increased above ~45MB, it also wrote the data a bit faster.
i then re-ran the test on the v2 runtime, but this time configured it to write 50 GiB worth of files, and still saw no wild increase in memory.
and just to be sure one last time, i switched back to the legacy runtime and ran the test to write the 50 GiB worth of files (same concurrency), and the memory this time peaked at 32GB.
tl;dr this issue is fixed in the v2 runtime but only the legacy runtime has support for volumes (even if you select the v2 runtime and you have a volume it will run with the legacy runtime)
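The benchmark described above was written in Go; a scaled-down Rust sketch of the same idea (function name and parameters are illustrative) would be: write `count` files of `size` bytes each, spread across `workers` threads, the real run being 30,000 × 1 MiB files with 250 concurrent writers.

```rust
use std::fs;
use std::thread;

// Write `count` zero-filled files of `size` bytes into `dir`,
// striped across `workers` threads.
fn write_files(dir: &str, count: usize, size: usize, workers: usize) -> std::io::Result<()> {
    fs::create_dir_all(dir)?;
    let mut handles = Vec::new();
    for w in 0..workers {
        let dir = dir.to_string();
        handles.push(thread::spawn(move || -> std::io::Result<()> {
            // Each worker takes every `workers`-th index starting at `w`.
            let mut i = w;
            while i < count {
                fs::write(format!("{}/file_{}.bin", dir, i), vec![0u8; size])?;
                i += workers;
            }
            Ok(())
        }));
    }
    for h in handles {
        h.join().expect("worker panicked")?;
    }
    Ok(())
}
```

On the legacy runtime the page cache generated by these writes was counted against the service's memory; on the v2 runtime it was not.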
2 years ago
heres a memory graph that backs up these statements

2 years ago
yes I know I'm writing files instead of reading them, but the same issue is being surfaced
2 years ago
interesting
2 years ago
I need a sticker "V2 fixes it"
2 years ago
v2 builder / runtime / proxy has simply fixed a lot of issues thus far
2 years ago
when is v2 going GA?
2 years ago
asap once the bugs are fixed
2 years ago
nice
2 years ago
from the sounds of it the legacy runtime will not be running on bare metal
2 years ago
it sounds like bare metal will be v2 runtime exclusive
2 years ago
oh nice
2 years ago
We are taking our time btw
2 years ago
We aren’t going to V2 cutover until everything is polished
2 years ago
we aren't going to remove the legacy option*
2 years ago
Ofc we are moving fast- but migrating running workloads over will take time
2 years ago
It's likely that we'll have a one-way migration and will remove the option, like we did for Buildpacks to Nixpacks
2 years ago
thankfully I wasn't around for that 😆
2 years ago
It was very painful
2 years ago
But we did it
2 years ago
And we’ll accomplish this as well
2 years ago
I just wanna know if we will see the V2 runtime support volumes before bare metal; it's what's stopping alaanor from running this project on railway
2 years ago
Ofc, this will be added before metal
2 years ago
Completely stateless workloads aren’t very useful
2 years ago
char said otherwise, unless I was misunderstanding him
2 years ago
Metal workloads will mostly be Trial and experimental workloads until we flight Railway Metal that requires volumes
2 years ago
If the servers are plugged in, why not serve from them?
2 years ago
But we won’t stop the metal rollout until we have vol. support
2 years ago
right but people could benefit from volume support on the v2 runtime with the current gcp hosts, like OP, or everyone running uptime kuma that are getting EHOSTUNREACH
2 years ago
Yea, heard, it’s a, we are speedrunning all short coming fixes when we can
2 years ago
Never enough hands on boards
2 years ago
But it's not an OR, it's an AND
2 years ago
Metal is happening on a different timeframe than V2 cutover
2 years ago
It just so happens that Metal will be V2 only, no sense in extending the lifetime of Legacy
2 years ago
ay at least i was right in that regard
2 years ago
either way, would you say it would be safe to mark this thread as solved, and would i be correct in saying this won't be getting fixed on the legacy runtime?
2 years ago
Yep
This is really cool, I appreciate the finding a lot. Thanks 👍 I'll be checking the railway changelog frequently for v2 with volumes, and hopefully one day I can be fully back on railway :)
2 years ago
hopefully!
@Brody @angelo I saw v2 volumes are now a thing, so I spent the day setting everything up and trying on railway again, but unfortunately it's still stuck this way. Not to complain or anything, just wanted to share that it did not magically fix it, as we thought it perhaps would.
2 years ago
you might not be on the v2 runtime
at least the UI said I was on it. I remember you said it might be a lie from the UI because it would not work with volumes, but now we've got volumes on v2, so I thought I could maybe trust the UI
2 years ago
not all of railway's hosts support volumes on the v2 runtime, a surefire way to be sure you are on the v2 runtime would be to check for container event logs like "starting container"
Just wanted to say that I have finally moved this particular service back to railway and it's working great now 👍 thanks again for everything
a year ago
that's awesome!
