10 months ago
Back in December there was a forced migration of my services to METAL, which led to catastrophic performance because only the webserver was migrated, not the DB.
Now it's April. I just opted in to "priority boarding" to test whether the issues were fixed, and yes, I can now migrate the DB and all services. However, overall performance is catastrophic. Thanks for allowing us to roll back that change; it's very much appreciated that I can take my time to debug this situation.
Any idea what is happening here?
Unless the CPU performance is dramatically worse, it's probably a database issue again. Any ideas? I am now 100% sure the webserver and DB are in the same datacenter.
[If it helps I was doing a migration from US West Oregon to US East Virginia METAL]
Thanks a lot
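A cross-region web-to-DB hop usually shows up first as raw TCP round-trip time, so one quick check from the web container is to time a bare connect to the database. A minimal stdlib sketch; the hostname and port below are placeholders, not the actual service names:

```python
import socket
import time


def tcp_connect_ms(host: str, port: int, timeout: float = 3.0):
    """Measure TCP connect time in milliseconds; None if unreachable."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # DNS failure, refused connection, or timeout.
        return None


if __name__ == "__main__":
    # Placeholder host/port: substitute the DB's private-network address.
    print(tcp_connect_ms("postgres.railway.internal", 5432))
```

Sub-millisecond numbers suggest the two services really are in the same datacenter; tens of milliseconds suggest a cross-region hop.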
Based in Florida, so we're good: https://rlwy-edge-info-production.up.railway.app/
edge=railway/us-east4-eqdc4a
zone=us-east4-eqdc4a
ip=::ffff:100.64.0.4
forwarded=129.222.201.212
hs=5cf4fc755882
req=W7udzeZKSuu_-klkpKirug_28081791
10 months ago
Hello!
So yes all your staging services are on Metal, but your domain still uses the GCP edge, would you please enable the Metal edge for your staging domain and let me know if that helps any?
10 months ago
The little thunderbolt

Oh gosh you guys need to put a warning or something. This is not obvious
10 months ago
It's a DNS update, so please wait several minutes before giving your final verdict.
10 months ago
It's more like a beta feature right now.
10 months ago
I live on the eastern side of the continent too, would you be able to provide two endpoints, one from staging and one from prod that I could test as well?
Alright I migrated the production service (again) to METAL:
nslookup www.speleodb.org
Server: 1.0.0.1
Address: 1.0.0.1#53
Non-authoritative answer:
www.speleodb.org canonical name = ukdjqx9p.up.railway.app.
ukdjqx9p.up.railway.app canonical name = edge.railway.app.
Name: edge.railway.app
Address: 66.33.22.4
Name: edge.railway.app
Address: 66.33.22.3
Name: edge.railway.app
Address: 66.33.22.1
Name: edge.railway.app
Address: 66.33.22.2
It still is considerably slower, and I'm near certain it's the DB requests. Pages that are heavy on DB requests are now much slower to load.
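The same resolution check can be scripted with the stdlib; `getaddrinfo` doesn't expose the CNAME chain that nslookup shows, but comparing address sets is enough to confirm the domain points at the new edge:

```python
import socket


def resolve_all(host: str) -> set:
    """Return every A/AAAA address a hostname currently resolves to."""
    infos = socket.getaddrinfo(host, None)
    return {info[4][0] for info in infos}


if __name__ == "__main__":
    try:
        # True if every address for the custom domain belongs to the edge.
        print(resolve_all("www.speleodb.org") <= resolve_all("edge.railway.app"))
    except OSError:
        print("DNS lookup failed (no network?)")
```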
I live on the eastern side of the continent too, would you be able to provide two endpoints, one from staging and one from prod that I could test as well?
I don't think you will see much. You need to authenticate and query some data to get any database interaction.
Can you check if there's anything misconfigured between the "web frontend" and the PGSQL server?
I'm almost certain it's database related. A page that took < 1 sec to load now takes 5+ seconds to load.
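One way to back the "< 1 sec vs 5+ seconds" observation with numbers is a small timing harness that can wrap either a full page fetch or a single DB call, so the two can be compared on prod and staging. A stdlib sketch; the URL below is a placeholder for a DB-heavy page:

```python
import time
from urllib.request import urlopen


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fetch(url: str) -> int:
    """Fetch a URL and return the HTTP status; the body is read fully
    so the elapsed time covers the whole response, not just headers."""
    with urlopen(url, timeout=10) as resp:
        resp.read()
        return resp.status


if __name__ == "__main__":
    try:
        # Placeholder URL: point this at a DB-heavy page on prod and staging.
        status, secs = timed(fetch, "https://www.speleodb.org/")
        print(status, f"{secs:.2f}s")
    except OSError:
        print("request failed (no network?)")
```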
10 months ago
Everything looks fine to me. You are sure you are connecting to the DB via the private network, right?
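A quick way to rule this out is to log, at startup, which host the app actually dials. This sketch assumes Railway's usual `DATABASE_URL` variable and `*.railway.internal` private-network hostnames; adjust the suffix if your setup differs:

```python
import os
from urllib.parse import urlparse


def is_private_db_url(url: str) -> bool:
    """True if the DB URL points at a Railway private-network hostname."""
    host = urlparse(url).hostname or ""
    return host.endswith(".railway.internal")


if __name__ == "__main__":
    url = os.environ.get("DATABASE_URL", "")
    print("private network:", is_private_db_url(url))
```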
10 months ago
Forgive me for asking, but are you positive your code is using that environment variable set in the service variables?
Absolutely certain; there's no other way for my app to "guess" the database credentials.
10 months ago
Not guessed, but hardcoded in some way. I've seen it lots of times.
Nope nope.
Could it be that the "internal URL" goes first to the old datacenter and then back to US East?
10 months ago
I can't see how that's possible; the IPv6 address has the target container encoded directly into it.
10 months ago
Can you move the release command into a pre-deploy command and then disable the V2 builder?
I'm not sure if that's normal, but the traffic pattern of the DB service is very different: it never used to go flat, and there never used to be any yellow.
Is "outbound" outside of the datacenter, or outside of the VM/container/server?

Because if it means the traffic is going "outside the datacenter", or outside some perimeter, that would explain the problem.
10 months ago
outbound is any traffic that leaves the private network
10 months ago
But let's eliminate all variables first. Can you make the changes I suggested?
Starting Container
panic: bad response code: 403
goroutine 1 [running]:
github.com/railwayapp/mono/packages/build-image/gateways/snapshot.NewFromURL({0x12ab820?, 0xc00053bb20}, {0x7fffb5f57438, 0x377}, {0x129e3c4, 0x1})
/mono/packages/build-image/gateways/snapshot/main.go:85 +0x285
github.com/railwayapp/mono/packages/build-image/controller.RunBuild({0x12ab820, 0xc00053bb20}, {{0x7fffb5f57423, 0x6}, {0x129e3c4, 0x1}, {0x0, 0x0}, {0x7fffb5f57438, 0x377}, ...})
/mono/packages/build-image/controller/main.go:28 +0x4f
main.main.func1(0xc000434080)
/mono/packages/build-image/main.go:34 +0x368
github.com/urfave/cli/v2.(*Command).Run(0xc000193080, 0xc000434080, {0xc00065f880, 0x8, 0x8})
/go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/command.go:279 +0x9dd
github.com/urfave/cli/v2.(*Command).Run(0xc000193340, 0xc00066ffc0, {0xc00012e000, 0x9, 0x9})
/go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/command.go:272 +0xc2e
github.com/urfave/cli/v2.(*App).RunContext(0xc00021ea00, {0x12ab740?, 0x198ad80}, {0xc00012e000, 0x9, 0x9})
/go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/app.go:337 +0x5db
github.com/urfave/cli/v2.(*App).Run(...)
/go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/app.go:311
main.main()
/mono/packages/build-image/main.go:49 +0x696
10 months ago
Yep that exact error is why I want you (everyone) off the v2 builder
10 months ago
Should I be looking at the prod service? It's still using the v2 builder.
10 months ago
Mind if I try?
Sure
Maybe it's my railway.toml:
# Documentation: https://docs.railway.com/reference/config-as-code#nixpacks-version
[build]
nixpacksPlan = { "providers" = ["python"] }
builder = "NIXPACKS"
nixpacksVersion = "1.30.0" # https://github.com/railwayapp/nixpacks/releases/
buildEnvironment = "V2"
nixpacksConfigPath = "./nixpacks.toml"
[deploy]
runtime = "V2"
numReplicas = 1
sleepApplication = false
restartPolicyType = "ON_FAILURE"
restartPolicyMaxRetries = 10
[variables]
NIXPACKS_PYTHON_VERSION = "3.13" # https://nixpacks.com/docs/providers/python
10 months ago
yeah that would do it
10 months ago
you are also running post_compile twice?
10 months ago
just remove the line
10 months ago
buildEnvironment = "V2"
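For reference, the [build] section from the railway.toml above with that single line removed, everything else unchanged:

```toml
[build]
nixpacksPlan = { "providers" = ["python"] }
builder = "NIXPACKS"
nixpacksVersion = "1.30.0" # https://github.com/railwayapp/nixpacks/releases/
nixpacksConfigPath = "./nixpacks.toml"
```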
Looks like it broke the release … Now Django panics, complaining that some artifacts are missing.
Alright I found a workaround for tonight.
Looks like disabling the V2 builder did indeed reduce the latency. I think it's still a little slower but not 5x anymore
10 months ago
Do you have any tracing in your app? Otherwise we are just guessing here.
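Absent real tracing, even a crude per-block timing log narrows things down. A framework-agnostic, stdlib-only sketch of a context manager that flags slow spans; the 100 ms threshold and the span name in the usage comment are arbitrary examples:

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("trace")


@contextmanager
def span(name: str, slow_ms: float = 100.0):
    """Log the elapsed time of a block; warn if it exceeds slow_ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = (time.perf_counter() - start) * 1000.0
        if elapsed >= slow_ms:
            log.warning("%s took %.1f ms", name, elapsed)
        else:
            log.info("%s took %.1f ms", name, elapsed)


# Usage: wrap the suspect DB calls inside a view, e.g.
# with span("load_project_list"):
#     projects = Project.objects.select_related("owner").all()
```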
Came here because my application was also running slow; it turned out to be because of this issue.
10 months ago
May I ask which issue? I think we have covered a few topics in this thread.
10 months ago
Alright, let's go back to your thread