Migration to Metal - Service became incredibly slow
speleodb
HOBBYOP

10 months ago

Back in December, my services were force-migrated to Metal. That led to catastrophic performance because only the web server was migrated, not the DB.

Now it's April, and I just opted in to "priority boarding" to test whether the issues were fixed. And yes, I can now migrate the DB and all services. However, overall performance is still catastrophic. Thanks for allowing us to roll back that change. I very much appreciate being able to take my time to debug this situation.

Any idea what is happening here?
Unless CPU performance is catastrophically worse, it's probably a database issue again. I am now 100% sure they are in the same datacenter.

[If it helps, I was migrating from US West (Oregon) to US East (Virginia) Metal.]

Thanks a lot

47 Replies

speleodb
HOBBYOP

10 months ago

247e5ad5-c8c4-41c3-9205-3014cc65df86


speleodb
HOBBYOP

10 months ago

Based in Florida, so we're good: https://rlwy-edge-info-production.up.railway.app/

edge=railway/us-east4-eqdc4a
zone=us-east4-eqdc4a
ip=::ffff:100.64.0.4
forwarded=129.222.201.212
hs=5cf4fc755882
req=W7udzeZKSuu_-klkpKirug_28081791

brody
EMPLOYEE

10 months ago

Hello!

So yes, all your staging services are on Metal, but your domain still uses the GCP edge. Would you please enable the Metal edge for your staging domain and let me know if that helps?


speleodb
HOBBYOP

10 months ago

How do you do that?


brody
EMPLOYEE

10 months ago

The little thunderbolt



speleodb
HOBBYOP

10 months ago

Oh gosh, you guys need to put a warning or something. This is not obvious.


brody
EMPLOYEE

10 months ago

It's a DNS update, so please wait several minutes before giving your final verdict.


brody
EMPLOYEE

10 months ago

It's more like a beta feature right now.


brody
EMPLOYEE

10 months ago

I live on the eastern side of the continent too. Would you be able to provide two endpoints, one from staging and one from prod, that I could test as well?


speleodb
HOBBYOP

10 months ago

Alright, I migrated the production service (again) to Metal:

nslookup www.speleodb.org
Server:        1.0.0.1
Address:    1.0.0.1#53

Non-authoritative answer:
www.speleodb.org    canonical name = ukdjqx9p.up.railway.app.
ukdjqx9p.up.railway.app    canonical name = edge.railway.app.
Name:    edge.railway.app
Address: 66.33.22.4
Name:    edge.railway.app
Address: 66.33.22.3
Name:    edge.railway.app
Address: 66.33.22.1
Name:    edge.railway.app
Address: 66.33.22.2

It is still considerably slower, and I'm nearly certain it's the DB requests. Pages that are heavy on DB requests are now much slower to load.

> I live on the eastern side of the continent too, would you be able to provide two endpoints, one from staging and one from prod that I could test as well?
I don't think you will see much. You need to authenticate and query some data to get any database interaction.

Can you check whether anything is wrong between the "web frontend" and the PGSQL server?


speleodb
HOBBYOP

10 months ago

I'm almost certain it's database related. A page that took < 1 sec to load now takes 5+ seconds to load.


brody
EMPLOYEE

10 months ago

Everything looks fine to me. You are sure you are connecting to the DB via the private network, right?


speleodb
HOBBYOP

10 months ago

postgresql://postgres:@postgres.railway.internal:5432/railway


brody
EMPLOYEE

10 months ago

Forgive me for asking, but are you positive your code is using that environment variable set in the service variables?


speleodb
HOBBYOP

10 months ago

Absolutely certain. There's no other way for my database to "guess the credentials".


brody
EMPLOYEE

10 months ago

Not guess, but hardcoded in some way. I've seen it lots.
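
One way to rule this out is to look at what the running process actually sees. Below is a minimal sketch in Python (the app's language), assuming the connection string is exposed as `DATABASE_URL`. That variable name and the helper names are illustrative and not taken from this project.

```python
import os
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Return the non-secret parts of a database URL (no user or password)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "host": host,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        # Private-network hostnames in this thread end in .railway.internal.
        "is_private": host.endswith(".railway.internal"),
    }


def describe_from_env(var_name: str = "DATABASE_URL") -> dict:
    """Assumption: the app reads its connection string from this variable."""
    return describe_database_url(os.environ.get(var_name, ""))
```

Logging `describe_from_env()` once at startup, and comparing it with the settings the framework ends up using, would show whether a hardcoded value is overriding the service variable.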


speleodb
HOBBYOP

10 months ago

Nope nope.
Could it be that the "internal URL" goes first to the old datacenter and then back to US East?


brody
EMPLOYEE

10 months ago

I can't see how that's possible; the IPv6 address has the target container encoded directly into it.
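
To see which address the private hostname actually resolves to from inside the container, something like the following sketch could be run there. It uses only the standard library; `classify` and `resolve` are hypothetical helper names, and the default port is simply the one from the connection string above.

```python
import ipaddress
import socket


def classify(address: str) -> dict:
    """Report the IP version and whether the address is in a private range."""
    ip = ipaddress.ip_address(address)
    return {"address": str(ip), "version": ip.version, "is_private": ip.is_private}


def resolve(host: str, port: int = 5432) -> list[dict]:
    """Resolve a hostname and classify each distinct address it maps to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    seen = []
    for info in infos:
        address = info[4][0]  # sockaddr tuple starts with the address string
        if address not in seen:
            seen.append(address)
    return [classify(address) for address in seen]
```

Calling `resolve("postgres.railway.internal")` from the web service should return a private IPv6 address if traffic stays on the private network; a public address would point to the app using a different hostname.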


brody
EMPLOYEE

10 months ago

Can you move the release command into a pre-deploy command and then disable the V2 builder?


speleodb
HOBBYOP

10 months ago

I'm not sure if that's normal, but the traffic pattern of the DB service is very different. It used to never go flat, and there used to be no yellow.
Does "outbound" mean outside the datacenter, or outside the VM/container/server?



speleodb
HOBBYOP

10 months ago

Because if it meant that the traffic is going "outside the datacenter" or outside some perimeter, that would explain the problem.


brody
EMPLOYEE

10 months ago

Outbound is any traffic that leaves the private network.


brody
EMPLOYEE

10 months ago

But let's eliminate all variables first. Can you make the changes I suggested?


speleodb
HOBBYOP

10 months ago

I can't. No builds are functioning at all.


speleodb
HOBBYOP

10 months ago

Starting Container
panic: bad response code: 403

goroutine 1 [running]:
github.com/railwayapp/mono/packages/build-image/gateways/snapshot.NewFromURL({0x12ab820?, 0xc00053bb20}, {0x7fffb5f57438, 0x377}, {0x129e3c4, 0x1})
    /mono/packages/build-image/gateways/snapshot/main.go:85 +0x285
github.com/railwayapp/mono/packages/build-image/controller.RunBuild({0x12ab820, 0xc00053bb20}, {{0x7fffb5f57423, 0x6}, {0x129e3c4, 0x1}, {0x0, 0x0}, {0x7fffb5f57438, 0x377}, ...})
    /mono/packages/build-image/controller/main.go:28 +0x4f
main.main.func1(0xc000434080)
    /mono/packages/build-image/main.go:34 +0x368
github.com/urfave/cli/v2.(*Command).Run(0xc000193080, 0xc000434080, {0xc00065f880, 0x8, 0x8})
    /go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/command.go:279 +0x9dd
github.com/urfave/cli/v2.(*Command).Run(0xc000193340, 0xc00066ffc0, {0xc00012e000, 0x9, 0x9})
    /go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/command.go:272 +0xc2e
github.com/urfave/cli/v2.(*App).RunContext(0xc00021ea00, {0x12ab740?, 0x198ad80}, {0xc00012e000, 0x9, 0x9})
    /go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/app.go:337 +0x5db
github.com/urfave/cli/v2.(*App).Run(...)
    /go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/app.go:311
main.main()
    /mono/packages/build-image/main.go:49 +0x696

speleodb
HOBBYOP

10 months ago

Seems like the issue was limited to my staging environment. Let's see


brody
EMPLOYEE

10 months ago

Yep, that exact error is why I want you (everyone) off the V2 builder.


speleodb
HOBBYOP

10 months ago

Done. And still very slow


brody
EMPLOYEE

10 months ago

Should I be looking at the prod service? It's still using the V2 builder.


speleodb
HOBBYOP

10 months ago

Well, I can't deactivate it.


speleodb
HOBBYOP

10 months ago

On any service. I click on it and it reappears


brody
EMPLOYEE

10 months ago

Mind if I try?


speleodb
HOBBYOP

10 months ago

Sure

Maybe it's my railway.toml:

# Documentation: https://docs.railway.com/reference/config-as-code#nixpacks-version
[build]
nixpacksPlan = { "providers" = ["python"] }
builder = "NIXPACKS"
nixpacksVersion = "1.30.0"                  # https://github.com/railwayapp/nixpacks/releases/
buildEnvironment = "V2"
nixpacksConfigPath = "./nixpacks.toml"

[deploy]
runtime = "V2"
numReplicas = 1
sleepApplication = false
restartPolicyType = "ON_FAILURE"
restartPolicyMaxRetries = 10

[variables]
NIXPACKS_PYTHON_VERSION = "3.13" # https://nixpacks.com/docs/providers/python

brody
EMPLOYEE

10 months ago

Yeah, that would do it.


speleodb
HOBBYOP

10 months ago

What parameters need to go?


brody
EMPLOYEE

10 months ago

You are also running post_compile twice?


brody
EMPLOYEE

10 months ago

Just remove the line.


speleodb
HOBBYOP

10 months ago

Doesn't really matter, but I removed the unnecessary one.

Which line?


brody
EMPLOYEE

10 months ago

buildEnvironment = "V2"


speleodb
HOBBYOP

10 months ago

Alright. Building


speleodb
HOBBYOP

10 months ago

Looks like it broke the release… Now Django panics because it complains that some artifacts are missing.


speleodb
HOBBYOP

10 months ago

Alright, I found a workaround for tonight.
Looks like disabling the V2 builder did indeed reduce the latency. I think it's still a little slower, but not 5x anymore.


brody
EMPLOYEE

10 months ago

Do you have any tracing in your app? Otherwise we are just guessing here.
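
Short of full tracing, even coarse timings around the database calls would show whether the extra seconds are spent in queries or elsewhere. Here is a minimal stopgap sketch in plain Python; `timed` and `summary` are hypothetical helpers, not part of any framework, and this is no substitute for a real tracing tool.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Wall-clock durations in seconds, grouped by a caller-chosen label.
_samples = defaultdict(list)


@contextmanager
def timed(label: str):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        _samples[label].append(time.perf_counter() - start)


def summary() -> dict:
    """Return count, total, and worst-case time in milliseconds per label."""
    return {
        label: {
            "count": len(values),
            "total_ms": round(sum(values) * 1000, 1),
            "max_ms": round(max(values) * 1000, 1),
        }
        for label, values in _samples.items()
    }
```

Wrapping a view's queries in `with timed("db"):` and the rest of the handler in `with timed("render"):`, then logging `summary()` on both the old and new regions, would turn "it feels 5x slower" into numbers that can be compared.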


jer-tan
HOBBY

10 months ago

Came here because my application is also running slow. Turned out it was because of this issue.


brody
EMPLOYEE

10 months ago

May I ask which issue? I think we have covered a few topics in this thread.


jer-tan
HOBBY

10 months ago

The Metal edge issue. I have enabled it, but it didn't seem to help.


brody
EMPLOYEE

10 months ago

Alright, let's go back to your thread

