Prometheus scrape jobs failing unexpectedly
bowtiedpickle
PROOP

a year ago

I have a project with Prometheus and Grafana deployed, which scrapes metrics from several of my other Railway projects. About 12 hours ago, the Prometheus instance stopped being able to scrape these endpoints. The target projects are still serving them correctly, and I can open them in a browser.

Has anything changed on the railway backend which could be causing this issue?

Solved

12 Replies

itsrems
EMPLOYEE

a year ago

Nothing should've changed your ability to query other services.

Could you share links to the services you were trying to query so we can take a look at them? Thanks!


Status changed to Awaiting User Response Railway over 1 year ago


bowtiedpickle
PROOP

a year ago

Ponder instances in environments able/baker/charlie in project ecb65c72-931f-428c-8070-5f123a0a60d2


Status changed to Awaiting Railway Response Railway over 1 year ago


itsrems
EMPLOYEE

a year ago

Heya, we actually resolved a small issue with Railway-to-Railway communication over the public network earlier today. Could you confirm if you're still running into this issue?


Status changed to Awaiting User Response Railway over 1 year ago


bowtiedpickle
PROOP

a year ago

Yes, I am still having this issue.


Status changed to Awaiting Railway Response Railway over 1 year ago


itsrems
EMPLOYEE

a year ago

Could you collect data from the requests made by your service, and get me an example of the error that is returned?


Status changed to Awaiting User Response Railway over 1 year ago


bowtiedpickle
PROOP

a year ago

Prometheus scrape target shows

Get "https://onlyapes-ponder-able.up.railway.app:80/metrics": http: server gave HTTP response to HTTPS client


Status changed to Awaiting Railway Response Railway over 1 year ago


itsrems
EMPLOYEE

a year ago

This is purely a configuration error: you're making a call to port 80 (the HTTP port) on an HTTPS service. You should remove the :80 from the request URL, or change it to 443 if a port is required.


Status changed to Awaiting User Response Railway over 1 year ago


bowtiedpickle
PROOP

a year ago

It should be clarified that this has been working without any issue prior to the other day, with this same configuration.


Status changed to Awaiting Railway Response Railway over 1 year ago


itsrems
EMPLOYEE

a year ago

Were you on the new edge proxy prior to this? This is the only change that might've affected it (old proxy -> new proxy).

I'm honestly unsure how https:// + port 80 ever worked 😅


Status changed to Awaiting User Response Railway over 1 year ago


bowtiedpickle
PROOP

a year ago

I never explicitly set any proxy settings. I believe I was on the old proxy, since I deployed this service several months ago, before I think you had rolled out the new proxy.

It's not deliberately set to use those ports/protocols, it's whatever prometheus defaults to. It was working before so your guess is as good as mine haha.


Status changed to Awaiting Railway Response Railway over 1 year ago


itsrems
EMPLOYEE

a year ago

We have been cutting users over to the new proxy so I think you just got hit by the change.

Sorry for any impact!


Status changed to Awaiting User Response Railway over 1 year ago


bowtiedpickle
PROOP

a year ago

Explicitly using HTTPS scheme in the Prometheus config seems to have fixed it. Thanks for your help.
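
For anyone landing here later, a minimal sketch of what such a scrape job might look like. This is an illustration, not my exact config: the job name is made up, and I'm assuming a static target with the default /metrics path. The hostname is the one from the error above.

```yaml
scrape_configs:
  # Hypothetical job name, for illustration only
  - job_name: "ponder-able"
    # Set the scheme explicitly; Prometheus uses http when this is omitted
    scheme: https
    metrics_path: /metrics
    static_configs:
      - targets:
          # Port 443 to match the https scheme (not :80)
          - "onlyapes-ponder-able.up.railway.app:443"
```

The point is that the scheme and the port agree: https with 443, rather than https with 80 as in the failing request.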


Status changed to Awaiting Railway Response Railway over 1 year ago


Status changed to Solved itsrems over 1 year ago

