Reading ~30gb sequentially from the volume makes the RAM go up to ~30gb
alaanor
HOBBYOP

2 years ago

ofc that doesn't happen locally. I'm test-scaling a new compute-heavy feature of my app before shipping and this is a blocker for me. It's a problem because I don't want to be billed for 30gb of RAM for something I discard a few seconds later. I'm just trying to figure out whether this might be a bug and/or misuse on my part before I start looking at an alternative host (for this particular workload; I'm still happy with the rest).

95 Replies

alaanor
HOBBYOP

2 years ago

[screenshot attachment]


alaanor
HOBBYOP

2 years ago

[screenshot attachment]


alaanor
HOBBYOP

2 years ago

I should note that it does get stuck there and will not go down again


2 years ago

are you loading the file into memory?


2 years ago

and, reading a 30gb file from the disk to where?


alaanor
HOBBYOP

2 years ago

my rust backend needs to read all 180k files I have on the volume attached to the service. They're about 170kb each


alaanor
HOBBYOP

2 years ago

locally it goes fine and doesn't fill up my ram


2 years ago

right but you are reading these files off of disk? where are they going besides into ram?


alaanor
HOBBYOP

2 years ago

I'm not sure I understand the question


alaanor
HOBBYOP

2 years ago

I'm reading literally at /cache/something/somefile.mspack, deserializing it, doing the compute and dropping it
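roughly this shape, as a minimal sketch (not my exact code; deserialize / do_compute stand in for the real msgpack decode and compute, and /cache is the mounted volume as above):

use std::{fs, io, path::Path};

// placeholder for the real msgpack deserialization
fn deserialize(bytes: &[u8]) -> usize { bytes.len() }
// placeholder for the heavy compute step
fn do_compute(item: usize) -> usize { item * 2 }

fn process_file(path: &Path) -> io::Result<usize> {
    let bytes = fs::read(path)?; // whole ~170kb file into a Vec<u8>
    let item = deserialize(&bytes);
    Ok(do_compute(item))
    // `bytes` and `item` are dropped here, so the process heap stays small;
    // it's the kernel's cached copy of the file data (buff/cache) that keeps growing
}

fn main() -> io::Result<()> {
    for entry in fs::read_dir("/cache")? {
        let result = process_file(&entry?.path())?;
        println!("{result}");
    }
    Ok(())
}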


alaanor
HOBBYOP

2 years ago

/cache is where my volume is mounted at


2 years ago

so these files are loaded into memory then, I'm not sure why you are surprised that memory has increased?


alaanor
HOBBYOP

2 years ago

once I have read and made use of the information, how can I discard it and let the unused ram go back down?


alaanor
HOBBYOP

2 years ago

locally it is discarded right after each read, so my ram never goes up. I can run this program with 1gb of ram or less


2 years ago

that wouldn't be a platform specific question


alaanor
HOBBYOP

2 years ago

Well if that doesn't sound like a bug/misuse to you, then I got my answer I guess 😄


2 years ago

yes unfortunately this would be a code issue


Yep, you will need to deallocate that in Rust


sounds like you aren't resolving your lifetimes afaik


alaanor
HOBBYOP

2 years ago

I tried everything I could think of really 😕 even with an explicit drop(variable_with_deserialized_data) at the end of each loop iteration. Even after my job is done and cleaned up, the memory doesn't drop. I should also point out that locally the same job never goes above a few mb of ram, on the same dataset.


2 years ago

nixpacks or dockerfile?


2 years ago

legacy or v2 runtime?


alaanor
HOBBYOP

2 years ago

[build]
builder = "NIXPACKS"

[deploy]
startCommand = "./api"
healthcheckPath = "/health"
healthcheckTimeout = 100

[phases.setup]
nixpkgsArchive = 'a459b363de387c078080b719b30c54d8a79b4a3e'
nixPkgs = ["...", "ffmpeg"]
nixLibs = ["...", "dav1d"]

not sure that matters tho, also I do my builds on github actions because they're not trivial, and I do a final railway up -e {env} -s {service_id} to upload


2 years ago

that's going to cause railway to run another build


alaanor
HOBBYOP

2 years ago

are you sure you did not mix up 2 problems 😄 I was asking about something unrelated to deployment, I don't have a deployment problem


2 years ago

i know, i'm getting sidetracked, just wanted to point it out


2 years ago

but when in doubt write a dockerfile that uses alpine instead of nixpacks


2 years ago

have seen alpine-based dockerfiles help with strange memory issues plenty of times before


alaanor
HOBBYOP

2 years ago

I could give it a try


2 years ago

cant hurt


alaanor
HOBBYOP

2 years ago

unfortunately I tried almost everything, but for alpine I need a special compilation target and I can't get it to work easily with my rust code. got too many deps


alaanor
HOBBYOP

2 years ago

I could try a docker image on ubuntu or something but at this point we're back to what nixpacks does


2 years ago

ah yes the joys of rust, compiling


alaanor
HOBBYOP

2 years ago

so I did a small experiment; the last peak is me running that heavy task (limited to 20k items so it doesn't go to 30gb again). I changed the start command to while true; do free -h; sleep 5; done & ./api and here are the results. We can clearly see that the used column stays at 251gb before and after the task (why it shows me the host machine's specs instead of my container's, no clue). But the buff/cache column is billed to me and indeed grew by a few gb.

[screenshot attachments: free -h output before and after the task]


2 years ago

buffer / cache is indeed included in the results of docker stats


alaanor
HOBBYOP

2 years ago

I had some hope for a moment with O_DIRECT: it works locally and doesn't bump the buff/cache, but on railway I get an error. I guess the filesystem doesn't accept this custom flag
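for reference, setting the flag looks roughly like this (a sketch assuming the libc crate; O_DIRECT also needs block-aligned buffers and offsets for the actual reads, which is omitted here):

use std::fs::{File, OpenOptions};
use std::os::unix::fs::OpenOptionsExt;

// Open a file with O_DIRECT so reads bypass the kernel page cache.
// Needs filesystem support; this is the custom flag the volume seems to reject.
fn open_direct(path: &str) -> std::io::Result<File> {
    OpenOptions::new()
        .read(true)
        .custom_flags(libc::O_DIRECT) // from the libc crate
        .open(path)
}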


alaanor
HOBBYOP

2 years ago

this is pain


2 years ago

I wonder if the v2 runtime would also gather metrics that include the buffer / cache, but you would need to find a way to run your tests without a volume since any service with a volume defaults back to the legacy runtime despite the selector saying v2


2 years ago

Have you tried copying the volume’s contents into the container’s disk on startup?


2 years ago

In that case you may be able to do local file system file streaming in rust and it shouldn’t increase the memory
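Something along these lines, as a rough illustration of the streaming idea (hypothetical code, not OP's actual loop):

use std::fs::File;
use std::io::{self, BufRead, BufReader};

// Read a file in fixed-size chunks; the process only ever holds
// one buffer's worth of data at a time.
fn stream_file(path: &str) -> io::Result<u64> {
    let file = File::open(path)?;
    let mut reader = BufReader::with_capacity(64 * 1024, file);
    let mut total = 0u64;
    loop {
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            break;
        }
        total += chunk.len() as u64; // placeholder for real per-chunk processing
        let len = chunk.len();
        reader.consume(len);
    }
    Ok(total)
}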


2 years ago

Local file system is a little different than docker mounted volumes i think


2 years ago

volumes are ext4 on legacy, and zfs on v2


2 years ago

I forgot railway doesn’t actually use docker


2 years ago

legacy runtime does


2 years ago

Man I don’t even use railway


2 years ago

🗣️


2 years ago

v2 uses podman


2 years ago

Man portable air defense?


2 years ago

Oh wait that’s manpad


alaanor
HOBBYOP

2 years ago

Since I could not find a solution with railway for this particular thing, I bought a server somewhere else, although the disk io isn't as good as railway's and this adds some complexity for deployment and monitoring :( But yeah, I can't afford adding $200+ to my monthly bill just to read a few files sometimes. I still use railway for a lot of other stuff and I'm happy with it. I just figured out that I should use railway where it is helping me instead of trying to fight it. No hate, I can understand why buff/cache is counted. Just wanted to give an update for future people searching this thread.


2 years ago

i wrote a benchmark test (in Go) to write 30000 1MiB files (for a total of 30GiB) to disk at 250 files concurrently.

running this locally of course there was no wild increase in memory.

running this program on railway with the legacy runtime had my memory reach ~23GB.

i then switched to the v2 runtime and ran the program again and the memory never increased above ~45MB, it also wrote the data a bit faster.

i then re-ran the test on the v2 runtime but this time configured it to write 50GiB worth of files and still saw no wild increase in memory.

and just to be sure one last time, i switched back to the legacy runtime and ran the test to write the 50GiB worth of files (same concurrency), and the memory this time peaked at 32GB.

tl;dr this issue is fixed in the v2 runtime but only the legacy runtime has support for volumes (even if you select the v2 runtime and you have a volume it will run with the legacy runtime)
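
for anyone curious about the shape of the test, it was roughly equivalent to this (the original was Go; this is a rough Rust rendering with an illustrative output path):

use std::fs;
use std::io::Write;
use std::sync::{Arc, Mutex};
use std::thread;

// Write 30_000 files of 1 MiB each (30 GiB total) with 250 concurrent writers.
fn main() -> std::io::Result<()> {
    fs::create_dir_all("/data/bench")?; // illustrative output directory
    let next = Arc::new(Mutex::new(0u32));
    let mut handles = Vec::new();
    for _ in 0..250 {
        let next = Arc::clone(&next);
        handles.push(thread::spawn(move || -> std::io::Result<()> {
            let payload = vec![0u8; 1024 * 1024]; // 1 MiB of zeros
            loop {
                // grab the next file index, or stop once all 30_000 are claimed
                let i = {
                    let mut n = next.lock().unwrap();
                    if *n >= 30_000 {
                        break;
                    }
                    let i = *n;
                    *n += 1;
                    i
                };
                let mut file = fs::File::create(format!("/data/bench/{i}.bin"))?;
                file.write_all(&payload)?;
            }
            Ok(())
        }));
    }
    for handle in handles {
        handle.join().unwrap()?;
    }
    Ok(())
}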


2 years ago

here's a memory graph that backs up these statements

[memory graph screenshot]


2 years ago

yes I know I'm writing files instead of reading them, but the same issue is being surfaced


2 years ago

interesting


2 years ago

I need a sticker "V2 fixes it"


2 years ago

the v2 builder / runtime / proxy has simply fixed a lot of issues thus far


2 years ago

when is v2 going GA?


2 years ago

asap once the bugs are fixed


2 years ago

nice


2 years ago

from the sounds of it the legacy runtime will not be running on bare metal


2 years ago

it sounds like bare metal will be v2 runtime exclusive


2 years ago

oh nice


We are taking our time btw


We aren’t going to V2 cutover until everything is polished


2 years ago

we aren't going to remove the legacy option*


Ofc we are moving fast, but migrating running workloads over will take time


It's likely that we'll have a one-way migration and will remove the option, like we did for BuildPacks to NixPacks


2 years ago

thankfully I wasn't around for that 😆


It was very painful


But we did it


And we’ll accomplish this as well


2 years ago

I just wanna know if we will see the V2 runtime support volumes before bare metal, it's what's stopping alaanor from running this project on railway


Ofc, this will be added before metal


Completely stateless workloads aren’t very useful


2 years ago

char said otherwise, unless I was misunderstanding him


Metal workloads will mostly be trial and experimental workloads until we flight Railway Metal that requires volumes


If the servers are plugged in, why not serve from them


But we won’t stop the metal rollout until we have vol. support


2 years ago

right but people could benefit from volume support on the v2 runtime with the current gcp hosts, like OP, or everyone running uptime kuma who is getting EHOSTUNREACH


Yea, heard. We are speedrunning fixes for all the shortcomings when we can


Never enough hands on boards


But it's not an OR, it's an AND


Metal is happening on a different timeframe than V2 cutover


It just so happens that Metal will be V2 only, no sense in extending the lifetime of Legacy


2 years ago

ay at least i was right in that regard


2 years ago

either way, would you say it would be safe to mark this thread as solved, and would I be correct in saying this won't be getting fixed on the legacy runtime?



alaanor
HOBBYOP

2 years ago

This is really cool, I appreciate the findings a lot. Thanks 👍 I'll be checking the railway changelog frequently for v2 with volumes, and hopefully one day I can be fully back on railway :)


2 years ago

hopefully!


alaanor
HOBBYOP

2 years ago

@Brody @angelo I saw v2 volumes are now a thing, so I spent the day setting things up and trying on railway again, but unfortunately it's still stuck the same way. Not to complain or anything, just wanted to share that it did not magically fix it, as we thought it perhaps would.


2 years ago

you might not be on the v2 runtime


alaanor
HOBBYOP

2 years ago

at least the UI said I was on it. I remember you said the UI might be lying because v2 would not work with volumes, but now we have volumes on v2 so I thought I could maybe trust the UI


2 years ago

not all of railway's hosts support volumes on the v2 runtime, a surefire way to be sure you are on the v2 runtime would be to check for container event logs like "starting container"


alaanor
HOBBYOP

a year ago

Just wanted to say that I have finally moved this particular service back to railway and it's working great now 👍 thanks again for everything


a year ago

that's awesome!

