Should we expect an increase in bandwidth usage with V2 runtime?
timbo-tj
PROOP

2 years ago

Recently (amongst many other changes..!) we swapped our gateway over to the V2 runtime, and we saw a large increase in bandwidth. Is this expected? Swapping back to legacy "fixed" it. I can investigate further and try to produce a "clean" repro (turn on V2, redeploy, run for a few hours, and revert) if this is not expected behaviour. My other services showed similar bandwidth increases, however the data there is much noisier due to all the other changes I was making at the time.

Project ID 4c3b4b0e-006a-407e-90c7-9c3031cd622f

101 Replies

2 years ago

!team


2 years ago

I would very much appreciate if you could come up with a clean way to reproduce this


timbo-tj
PROOP

2 years ago

im running a test today, will get back with result. i've just swapped our gateway over to v2 again. ill run it for a bit and show a comparison


2 years ago

im not too sure if an in-use app is a very good reproducible example


timbo-tj
PROOP

2 years ago

i dont have time to put in any more effort, unfortunately. I am the only programmer at my studio and i am spread too thin. thats why we use platforms like railway!


timbo-tj
PROOP

2 years ago

so ALL I changed was the runtime to V2 on our app. nothing else was changed. I clicked V2, got prompted to deploy the changes, I hit ok and thats all


timbo-tj
PROOP

2 years ago

s

[image attachment]


timbo-tj
PROOP

2 years ago

our bandwidth use just doubles when we enable v2 runtime, along with the estimated bill for the month etc


timbo-tj
PROOP

2 years ago

i am now reverting the change ($$$$) and can report back if and when the bandwidth drops back in line


timbo-tj
PROOP

2 years ago

project id is 4c3b4b0e-006a-407e-90c7-9c3031cd622f
and the service in question is 3545427b-d98c-42ec-b5ac-f9cc4326e3c4
if any railway dev wants to poke around and investigate


timbo-tj
PROOP

2 years ago

i guess its more than double..! almost triple 🙂


2 years ago

i created an example project with 3 services: 2 services that download a file on a loop with a fixed download size and download speed, and a third service that serves the file. one of the download services used the legacy runtime, and the other used the v2 runtime.

i am unable to reproduce, in fact the v2 runtime uses a tiny bit less network


timbo-tj
PROOP

2 years ago

thanks for trying to reproduce it!


2 years ago

ignore the large bumps, i was dialing in the settings so as not to rack up a massive bill

[image attachment]


timbo-tj
PROOP

2 years ago

i dont know what it might be. but from what i understand legacy will eventually be disabled and we will be pushed onto v2. and v2 is supposed to 'just work' with no changes right? It's not something we need to concern ourselves with?


timbo-tj
PROOP

2 years ago

nice yeah, do you have any idea what it might be?


timbo-tj
PROOP

2 years ago

perhaps regions are involved? we host on US East


2 years ago

i dont think its just the v2 runtime, with your app there are many other factors at play


timbo-tj
PROOP

2 years ago

maybe private traffic is being counted incorrectly in v2 runtime when your region is not the default


timbo-tj
PROOP

2 years ago

though again, with this one change, it will more than triple our bandwidth bill, and if its something thats just supposed to work then i think its something that railway may want to investigate before pushing it to their users?


2 years ago

private traffic shouldnt be counted at all


timbo-tj
PROOP

2 years ago

i know


2 years ago

but good idea


2 years ago

well v2 is the default for all new services


timbo-tj
PROOP

2 years ago

yeah I noticed - i recently split the responsibilities of a DIFFERENT service (in the same project) in 2. the service was doing 2 jobs at once, essentially: a rest API and a socketio/realtime comms service (chat, etc). I basically added a switch to make the service act as one or the other, because i wanted to get a good idea how much of our bandwidth bill was coming from the socketio/realtime stuff vs rest api external database queries


timbo-tj
PROOP

2 years ago

so anyway, that service was using LEGACY (its been around for a while)


timbo-tj
PROOP

2 years ago

i split it into two, made the existing service into Rest API only, and the NEW service I made into the socketio/realtime service..


timbo-tj
PROOP

2 years ago

the NEW service was automatically v2 runtime


timbo-tj
PROOP

2 years ago

bandwidth usage was HUGE


timbo-tj
PROOP

2 years ago

again, like a 3x jump in normal usage


timbo-tj
PROOP

2 years ago

i eventually figured out the v2 switch was '''''to blame'''''


timbo-tj
PROOP

2 years ago

set it to Legacy


timbo-tj
PROOP

2 years ago

and now old service + new service bandwidth = old combined service bandwidth, as expected


2 years ago

didnt you say that websocket connections failed on the v2 runtime, or was that the edge proxy?


timbo-tj
PROOP

2 years ago

no the websocket connections failed in 'edge proxy'


timbo-tj
PROOP

2 years ago

(Though I may have messed up my words, sorry - i was knee deep in a bunch of problems when i was debugging all that, as you can tell)


2 years ago

i ran my test with the edge proxy on, im going to disable that and try again


timbo-tj
PROOP

2 years ago

the gateway has edge proxy enabled! (the one from the test today)


timbo-tj
PROOP

2 years ago

my current config is:

Gateway: Edge Proxy ON, Runtime: Legacy
Rest Api: Edge Proxy ON, Runtime: Legacy
SocketIO: Edge Proxy OFF, Runtime: Legacy


2 years ago

what service are these graphs from?


timbo-tj
PROOP

2 years ago

the graphs are from API Gateway


timbo-tj
PROOP

2 years ago

here is the moments before i SPLIT my restapi+socket io service into TWO the other day


timbo-tj
PROOP

2 years ago

[image attachment]


timbo-tj
PROOP

2 years ago

the purple lines are the Socketio/rest api services (you can see where I split it into two (two purple lines) and enabled v2 runtime)


2 years ago

did you ever get any errors from the socketio service when you said the edge proxy wasnt working for you?


timbo-tj
PROOP

2 years ago

and the YELLOW is my api gateway where i ALSO enabled v2 runtime at the same time


timbo-tj
PROOP

2 years ago

then swiftly reverted v2 -> legacy, and you can see my traffic back to expected levels -- where API gateway (yellow) looks usual, and the two purple lines 'add up' to approx what the traffic was before the split


timbo-tj
PROOP

2 years ago

i have not investigated that yet, i am not sure when i will have a chance to at the moment, i will open a separate help thread for that if i can confirm that Edge Proxy -> ON just breaks my Socket IO functionality


timbo-tj
PROOP

2 years ago

just trying to think of what else is 'strange' about my setup, but, mm, the region being different is the only thing i can think of that isn't "stock standard". the nodejs app is just a nestjs app. especially the api gateway one is VERY straightforward and simple. it just proxies requests to one of 3 (dev/stg/prd) servers (using internal url) based on some headers in the request. it exposes a health check endpoint. it also exposes an endpoint to query info about the 3 servers. finally, it has a redis client connection to receive updates about changes to those 3 servers (rare occurrence. once a week or two when I push out an update to the game)


timbo-tj
PROOP

2 years ago

API gateway, just now, 20 mins after reverting back to Legacy:

[image attachment]


2 years ago

and just to be clear, the service works just fine on the v2 runtime right?


timbo-tj
PROOP

2 years ago

yeah!


timbo-tj
PROOP

2 years ago

OH


timbo-tj
PROOP

2 years ago

also


timbo-tj
PROOP

2 years ago

just remembered we also have an OTEL collector that my server is reporting its data to (again, internal)


timbo-tj
PROOP

2 years ago

so the start command for the api gateway is actually node --require '@opentelemetry/auto-instrumentations-node/register' dist/apps/ssr-api-gateway/main to do all the opentel auto instrument stuff


timbo-tj
PROOP

2 years ago

i can test with it disabled + v2 maybe


2 years ago

are you sure there arent errors anywhere, and something is going into a retry loop and bloating the bandwidth?


timbo-tj
PROOP

2 years ago

as a side note, for the window of time we were running on V2 runtime, the API gateway's response time was great and so stable 😛

[image attachment]


timbo-tj
PROOP

2 years ago

possible? I can never discount it i guess? but it would have to be in response to a client request. no one else pokes this server. just requests from clients in the game. and the request is then proxied and the response sent back


timbo-tj
PROOP

2 years ago

but logs are clean, and I DO get errors when proxies fail in other cases


2 years ago

then that rules that out


timbo-tj
PROOP

2 years ago

just double checking logs


timbo-tj
PROOP

2 years ago

yeah nothing suspicious


timbo-tj
PROOP

2 years ago

no errors in the last 5 hours, except for when i reverted back to legacy and redeployed 😛


timbo-tj
PROOP

2 years ago

ill try v2 runtime w/o otel instrumentation, just in case


timbo-tj
PROOP

2 years ago

🤷


2 years ago

hypothetically, what would happen if railway isnt able to determine the cause of your increased network?


2 years ago

(its still the weekend, i cant bring anyone in yet anyway)


timbo-tj
PROOP

2 years ago

i would stick to legacy, and if legacy is going to be removed assuming the bandwidth costs dont come down by that time then we will have to leave! I know bare metal is around the corner though and I am in talks with some lovely folk at RW about trying it out for some of our bandwidth heavy services. (they've been lovely to deal with) we also have some major bandwidth optimisations coming soon so that will help bring the cost down too!


timbo-tj
PROOP

2 years ago

but to give you an idea, if our bandwidth use just tripled then we would be paying about 1500 USD for bandwidth which is, ugh, a LOT for us. It would be worth investing effort to port somewhere else with cheaper costs at that point.


2 years ago

1500usd is nothing

[image attachment]


timbo-tj
PROOP

2 years ago

haha what are YOU doing!


timbo-tj
PROOP

2 years ago

😆


2 years ago

tests to try and reproduce your issue


timbo-tj
PROOP

2 years ago

ah , network test 😉


timbo-tj
PROOP

2 years ago

oh boy, i hope railway slashes that bill for you for helping people out


2 years ago

conductors get a 100% off coupon


timbo-tj
PROOP

2 years ago

oh well choo choo


2 years ago

choo choo indeed


2 years ago

i will be bringing in char (who i think has the most to do with the v2 runtime) as soon as i feel he's available


timbo-tj
PROOP

2 years ago

no dice with disabling otel, still high bandwidth


timbo-tj
PROOP

2 years ago

sweet thanks, but no rush since we can just stick to legacy for now!


timbo-tj
PROOP

2 years ago

how curious hey


timbo-tj
PROOP

2 years ago

by the way, if i wanted to put together as minimal of a repo as i could, whats an easy way to load test?


timbo-tj
PROOP

2 years ago

spin up two services and get them to talk to each other via public url. my repro service will be a stripped down nestjs rest api that """proxies""" messages


2 years ago

i just have this

[image attachment]


2 years ago

the service in the middle serves an infinite file (in the sense that the response is null bytes) and the downloader services request a 1gb file from it on a loop and download it at a fixed 5MB/s


2 years ago

super controlled environment with no variables other than v2 or legacy


timbo-tj
PROOP

2 years ago

did you just whip up the code for that downloader service yourself?


2 years ago

yeah


timbo-tj
PROOP

2 years ago

sweet, cool


2 years ago

all go services


timbo-tj
PROOP

2 years ago

nice, i want to learn go. it seems nice, light, powerful


2 years ago

indeed it is


2 years ago

ill try to talk to char about this when hes in tmr


timbo-tj
PROOP

2 years ago

Which one? A through to Z. You've got a lot to pick from?


timbo-tj
PROOP

2 years ago

I'll see myself out


2 years ago

"this" being the topic of the title for this thread


timbo-tj
PROOP

2 years ago

I just meant because you were going to speak to "char"… You know what never mind. It was a terrible joke haha


2 years ago

ohhhhh I see what you mean

