Prometheus scrape jobs failing unexpectedly

bowtiedpickle

PRO OP

2 years ago

I have a project with Prometheus and Grafana deployed, which scrapes metrics from several of my other Railway projects. About 12 hours ago, the Prometheus instance stopped being able to scrape these endpoints. The target projects are still serving them correctly, and I can visit them in a browser.

Has anything changed on the Railway backend that could be causing this issue?

Solved

12 Replies

nico

EMPLOYEE

2 years ago

Nothing should've changed your ability to query other services.

Could you share links to the services you were trying to query so we can take a look at them? Thanks!

Status changed to Awaiting User Response Railway • almost 2 years ago

bowtiedpickle

PRO OP

2 years ago

Ponder instances in environments able/baker/charlie in project ecb65c72-931f-428c-8070-5f123a0a60d2

Status changed to Awaiting Railway Response Railway • almost 2 years ago

nico

EMPLOYEE

2 years ago

Heya, we actually resolved a small issue with Railway-to-Railway communication over the public network earlier today. Could you confirm whether you're still running into this issue?

Status changed to Awaiting User Response Railway • almost 2 years ago

bowtiedpickle

PRO OP

2 years ago

Yes, I am still having this issue.

Status changed to Awaiting Railway Response Railway • almost 2 years ago

nico

EMPLOYEE

2 years ago

Could you collect data from the requests made by your service, and get me an example of the error that is returned?

Status changed to Awaiting User Response Railway • almost 2 years ago

bowtiedpickle

PRO OP

2 years ago

Prometheus scrape target shows

Get "https://onlyapes-ponder-able.up.railway.app:80/metrics": http: server gave HTTP response to HTTPS client

Status changed to Awaiting Railway Response Railway • almost 2 years ago

nico

EMPLOYEE

2 years ago

This is purely a configuration error: you're making an HTTPS request to port 80 (the HTTP port). You should remove the :80 from the request URL, or change it to :443 if required.

Status changed to Awaiting User Response Railway • almost 2 years ago

bowtiedpickle

PRO OP

2 years ago

To clarify, this had been working without any issue until the other day, with this same configuration.

Status changed to Awaiting Railway Response Railway • almost 2 years ago

nico

EMPLOYEE

2 years ago

Were you on the new edge proxy prior to this? This is the only change that might've affected it (old proxy -> new proxy).

I'm honestly unsure how https:// + port 80 ever worked 😅

Status changed to Awaiting User Response Railway • almost 2 years ago

bowtiedpickle

PRO OP

2 years ago

I never explicitly set any proxy settings. I believe I was on the old proxy, since I deployed this service several months ago, before (I think) you had rolled out the new proxy.

It's not deliberately set to use those ports/protocols; it's whatever Prometheus defaults to. It was working before, so your guess is as good as mine haha.

Status changed to Awaiting Railway Response Railway • almost 2 years ago

nico

EMPLOYEE

2 years ago

We have been cutting users over to the new proxy, so I think you just got hit by the change.

Sorry for any impact!

Status changed to Awaiting User Response Railway • almost 2 years ago

bowtiedpickle

PRO OP

2 years ago

Explicitly setting the HTTPS scheme in the Prometheus config seems to have fixed it. Thanks for your help.
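
For anyone landing here later, this is a rough sketch of what an explicit HTTPS scrape job could look like in prometheus.yml. The exact config was never posted in this thread, so the job name is a made-up placeholder and the layout is an assumption. Only the hostname and the /metrics path come from the error message quoted earlier.

```yaml
# prometheus.yml (sketch only; job_name is a hypothetical placeholder)
scrape_configs:
  - job_name: "ponder-able"
    scheme: https            # Prometheus uses http when scheme is omitted
    metrics_path: /metrics   # this is also the default path
    static_configs:
      - targets:
          # host:port only, no scheme prefix; 443 matches the https scheme
          - "onlyapes-ponder-able.up.railway.app:443"
```

Targets take only host:port, so the scheme has to come from the `scheme` field rather than from the target string. Pairing `scheme: https` with port 443 avoids the HTTPS-to-port-80 mismatch shown in the error.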

Status changed to Awaiting Railway Response Railway • almost 2 years ago

Status changed to Solved nico • almost 2 years ago
