Migration to Metal - Service became incredibly slow - Railway Central Station

speleodb

HOBBYOP

a year ago

Back in December there was a forced migration of my services to Metal, which led to catastrophic performance because only the web server was migrated, not the DB.

Now it's April. I just opted into "priority boarding" to test whether the issues were fixed. And yes, I can now migrate the DB and all services. However, overall performance is still catastrophic. Thanks for allowing us to roll back that change; I very much appreciate being able to take my time to debug this.

Any idea what is happening here?

Unless CPU performance is catastrophically worse, it's probably a database issue again. Any ideas? I am now 100% sure the services are in the same datacenter.

[If it helps, I was migrating from US West (Oregon) to US East (Virginia) Metal.]

Thanks a lot

47 Replies

speleodb

HOBBYOP

a year ago

247e5ad5-c8c4-41c3-9205-3014cc65df86

speleodb

HOBBYOP

a year ago

Based in Florida, so we're good: https://rlwy-edge-info-production.up.railway.app/

edge=railway/us-east4-eqdc4a
zone=us-east4-eqdc4a
ip=::ffff:100.64.0.4
forwarded=129.222.201.212
hs=5cf4fc755882
req=W7udzeZKSuu_-klkpKirug_28081791
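
For anyone comparing results before and after toggling the edge, the key=value output above is easy to diff programmatically. A minimal Python sketch; `parse_edge_info` is a hypothetical helper, and since the meaning of each field is not documented in this thread, the code only splits the lines and does not interpret them.

```python
def parse_edge_info(text: str) -> dict:
    """Split 'key=value' lines into a dict; lines without '=' are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            info[key] = value
    return info


# Sample taken from the output posted above (truncated to three fields).
SAMPLE = """edge=railway/us-east4-eqdc4a
zone=us-east4-eqdc4a
ip=::ffff:100.64.0.4"""

if __name__ == "__main__":
    info = parse_edge_info(SAMPLE)
    print(info["edge"], info["zone"])
```

Saving one parsed result per attempt makes it straightforward to see whether `edge` or `zone` actually changed after the DNS update.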

a year ago

Hello!

So yes, all your staging services are on Metal, but your domain still uses the GCP edge. Would you please enable the Metal edge for your staging domain and let me know if that helps?

speleodb

HOBBYOP

a year ago

How do you do that?

a year ago

The little thunderbolt

speleodb

HOBBYOP

a year ago

Oh gosh, you guys need to put a warning or something. This is not obvious.

a year ago

It's a DNS update, so please wait several minutes before giving your final verdict.

a year ago

It's more like a beta feature right now.

a year ago

I live on the eastern side of the continent too. Would you be able to provide two endpoints, one from staging and one from prod, that I could test as well?

speleodb

HOBBYOP

a year ago

Alright, I migrated the production service (again) to Metal:

nslookup www.speleodb.org
Server:        1.0.0.1
Address:    1.0.0.1#53

Non-authoritative answer:
www.speleodb.org    canonical name = ukdjqx9p.up.railway.app.
ukdjqx9p.up.railway.app    canonical name = edge.railway.app.
Name:    edge.railway.app
Address: 66.33.22.4
Name:    edge.railway.app
Address: 66.33.22.3
Name:    edge.railway.app
Address: 66.33.22.1
Name:    edge.railway.app
Address: 66.33.22.2

It is still considerably slower, and I'm nearly certain it's the DB requests. Pages that are heavy on DB requests now load much more slowly.

Re: "would you be able to provide two endpoints, one from staging and one from prod that I could test as well?"

I don't think you will see much. You need to authenticate and query some data to trigger any database interaction.

Can you check whether anything is misconfigured between the web frontend and the PostgreSQL server?

speleodb

HOBBYOP

a year ago

I'm almost certain it's database related. A page that took < 1 sec to load now takes 5+ seconds to load.

a year ago

Everything looks fine to me. Are you sure you are connecting to the DB via the private network?

speleodb

HOBBYOP

a year ago

postgresql://postgres:@postgres.railway.internal:5432/railway

a year ago

Forgive me for asking, but are you positive your code is using that environment variable set in the service variables?

speleodb

HOBBYOP

a year ago

Absolutely certain. There's no other way for my app to "guess" the database credentials.

a year ago

Not guessed, but hardcoded in some way. I've seen it lots.
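
One way to rule out a hardcoded connection string is to print the host that the running process actually sees. A minimal sketch, assuming the app reads its connection string from an environment variable named `DATABASE_URL` (an assumption; substitute whatever name your service uses) and that private hostnames end in `.railway.internal`, as in the URL posted above. `uses_private_network` is a hypothetical helper.

```python
import os
from urllib.parse import urlsplit

# Suffix taken from the URL posted in this thread.
PRIVATE_SUFFIX = ".railway.internal"


def uses_private_network(database_url: str) -> bool:
    """Return True if the URL's host is a private-network hostname."""
    host = urlsplit(database_url).hostname or ""
    return host.endswith(PRIVATE_SUFFIX)


if __name__ == "__main__":
    # DATABASE_URL is an assumed variable name, not confirmed in the thread.
    url = os.environ.get("DATABASE_URL", "")
    host = urlsplit(url).hostname
    # Print only the host so credentials never end up in the logs.
    print(f"host={host!r} private={uses_private_network(url)}")
```

Running this at application startup, inside the deployed container, shows what the process really uses rather than what the dashboard says it should use.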

speleodb

HOBBYOP

a year ago

Nope nope.

Could it be that the internal URL first goes to the old datacenter and then back to US East?

a year ago

I can't see how that's possible; the IPv6 address has the target container encoded directly into it.

a year ago

Can you move the release command into a pre-deploy command and then disable the V2 builder?

speleodb

HOBBYOP

a year ago

I'm not sure if that's normal, but the traffic pattern of the DB service is very different. It never used to go flat, and there never used to be any yellow.

Does "outbound" mean outside the datacenter, or outside the VM/container/server?

speleodb

HOBBYOP

a year ago

Because if it means that the traffic is going outside the datacenter, or outside some perimeter, that would explain the problem.

a year ago

Outbound is any traffic that leaves the private network.

a year ago

But let's eliminate all variables first. Can you make the changes I suggested?

speleodb

HOBBYOP

a year ago

I can't. No builds are working at all.

speleodb

HOBBYOP

a year ago

Starting Container
panic: bad response code: 403

goroutine 1 [running]:
github.com/railwayapp/mono/packages/build-image/gateways/snapshot.NewFromURL({0x12ab820?, 0xc00053bb20}, {0x7fffb5f57438, 0x377}, {0x129e3c4, 0x1})
    /mono/packages/build-image/gateways/snapshot/main.go:85 +0x285
github.com/railwayapp/mono/packages/build-image/controller.RunBuild({0x12ab820, 0xc00053bb20}, {{0x7fffb5f57423, 0x6}, {0x129e3c4, 0x1}, {0x0, 0x0}, {0x7fffb5f57438, 0x377}, ...})
    /mono/packages/build-image/controller/main.go:28 +0x4f
main.main.func1(0xc000434080)
    /mono/packages/build-image/main.go:34 +0x368
github.com/urfave/cli/v2.(*Command).Run(0xc000193080, 0xc000434080, {0xc00065f880, 0x8, 0x8})
    /go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/command.go:279 +0x9dd
github.com/urfave/cli/v2.(*Command).Run(0xc000193340, 0xc00066ffc0, {0xc00012e000, 0x9, 0x9})
    /go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/command.go:272 +0xc2e
github.com/urfave/cli/v2.(*App).RunContext(0xc00021ea00, {0x12ab740?, 0x198ad80}, {0xc00012e000, 0x9, 0x9})
    /go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/app.go:337 +0x5db
github.com/urfave/cli/v2.(*App).Run(...)
    /go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/app.go:311
main.main()
    /mono/packages/build-image/main.go:49 +0x696

speleodb

HOBBYOP

a year ago

Seems like the issue was limited to my staging environment. Let's see

a year ago

Yep, that exact error is why I want you (everyone) off the V2 builder.

speleodb

HOBBYOP

a year ago

Done, and it's still very slow.

a year ago

Should I be looking at the prod service? It's still using the V2 builder.

speleodb

HOBBYOP

a year ago

Well, I can't deactivate it.

speleodb

HOBBYOP

a year ago

On any service. I click on it and it reappears

a year ago

Mind if I try?

speleodb

HOBBYOP

a year ago

Sure

Maybe it's my railway.toml:

# Documentation: https://docs.railway.com/reference/config-as-code#nixpacks-version
[build]
nixpacksPlan = { "providers" = ["python"] }
builder = "NIXPACKS"
nixpacksVersion = "1.30.0"                  # https://github.com/railwayapp/nixpacks/releases/
buildEnvironment = "V2"
nixpacksConfigPath = "./nixpacks.toml"

[deploy]
runtime = "V2"
numReplicas = 1
sleepApplication = false
restartPolicyType = "ON_FAILURE"
restartPolicyMaxRetries = 10

[variables]
NIXPACKS_PYTHON_VERSION = "3.13" # https://nixpacks.com/docs/providers/python

a year ago

Yeah, that would do it.

speleodb

HOBBYOP

a year ago

Which parameters need to go?

a year ago

You are also running post_compile twice?

a year ago

Just remove the line.

speleodb

HOBBYOP

a year ago

Doesn't really matter, but I removed the unnecessary one.

Which line?

a year ago

buildEnvironment = "V2"

speleodb

HOBBYOP

a year ago

Alright. Building

speleodb

HOBBYOP

a year ago

Looks like it broke the release... Now Django panics, complaining that some artifacts are missing.

speleodb

HOBBYOP

a year ago

Alright, I found a workaround for tonight.

Looks like disabling the V2 builder did indeed reduce the latency. I think it's still a little slower, but not 5x anymore.

a year ago

Do you have any tracing in your app? Otherwise we are just guessing here.
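
Short of full tracing, coarse timers around suspect calls can at least show whether the time goes to database work or elsewhere. A minimal, framework-agnostic sketch using only the standard library; `timed`, `report`, and the labels are hypothetical names, and the `time.sleep` call is a stand-in for a real query.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# label -> list of measured durations in seconds
timings = defaultdict(list)


@contextmanager
def timed(label: str):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label].append(time.perf_counter() - start)


def report():
    """Return (label, call_count, total_seconds) rows, slowest total first."""
    rows = [(label, len(d), sum(d)) for label, d in timings.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    with timed("db: example query"):
        time.sleep(0.01)  # stand-in for a real database call
    for label, count, total in report():
        print(f"{label}: {count} call(s), {total * 1000:.1f} ms total")
```

If the app is Django, `django.db.connection.execute_wrapper` is a documented hook for wrapping every query and may be a better place for this than hand-placed timers.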

jer-tan

HOBBY

a year ago

I came here because my application is also running slow. It turned out to be because of this issue.

a year ago

May I ask which issue? I think we have covered a few topics in this thread.

jer-tan

HOBBY

a year ago

The Metal edge issue. I have enabled it, but it didn't seem to help.

a year ago

Alright, let's go back to your thread