10 days ago
Hey,
NOTE: this is referring to crontab with my application, not the Cron schedule feature in Railway.
With my application, deployed by Docker image, I have a cron.php file which, when run, connects to our MySQL container (using the ENV variables set in the Railway config) and then runs a whole bunch of stuff.
And this is triggered by a crontab.
This is set up during the docker compose build.
Dockerfile snippet:
# ENTRYPOINTS to enable cron and apache
ADD entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT /entrypoint.sh
Entrypoint:
Cannot post exact contents as Cloudflare seems to block it - likely due to the code it contains (filepath to PHP perhaps?) - so a slightly edited version:
* * * * * pathToPHP pathToCron.php >> pathToLog.log 2>&1
service cron start
This successfully adds the cron (running crontab -l after deploying confirms it exists).
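(For context, a classic cause of exactly this symptom - works from a shell, fails under cron - is that cron launches jobs with an almost empty environment, so the container's ENV variables never reach the job unless they are persisted somewhere cron can read them. A minimal sketch of that fix, assuming a Debian-based image where cron loads /etc/environment via PAM; the /tmp path and the echoed summary below are demo choices, not from this thread:)

```shell
#!/bin/sh
# Sketch: cron starts jobs with a near-empty environment, so ENV
# variables set in Railway are invisible to crontab entries unless
# they are persisted where cron can read them. On Debian, cron reads
# /etc/environment via PAM; in a real entrypoint ENV_FILE would be
# /etc/environment (using /tmp here so the sketch runs standalone).
ENV_FILE=/tmp/container-env
printenv > "$ENV_FILE"
# ...then install the crontab and start cron as in the original snippet:
#   * * * * * pathToPHP pathToCron.php >> pathToLog.log 2>&1
#   service cron start
echo "persisted $(wc -l < "$ENV_FILE") environment entries to $ENV_FILE"
```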
HOWEVER, recently (since yesterday-ish, I think), upon deployment, whilst the cron is added to the crontab, when it runs we get MySQL connection errors from cron.php, suggesting it is NOT getting the ENV variables we have set in Railway.
Each subsequent run from crontab has the same issue.
BUT, within the CLI, if we manually run it with the same command:
pathToPHP pathToCron.php
not only does it work (it successfully connects to MySQL, so we can presume it has the environment variables), but any subsequent automatic (cron) run of the same command also works.
The fact that running the same command manually then fixes the crontab runs is a complete mystery to me.
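(One way to confirm whether the env vars are the culprit - a hypothetical debugging step, not something from this thread - is to capture the environment cron actually hands the job and diff it against an interactive shell's:)

```shell
# Hypothetical debugging sketch: compare cron's environment with the
# interactive shell's. The crontab entry (installed however the
# entrypoint installs it) would be:
#   * * * * * env | sort > /tmp/cron-env.log 2>&1
# Then, from an interactive shell in the same container:
env | sort > /tmp/shell-env.log
# Once both files exist, variables missing under cron appear as '<' lines
# ('|| true' keeps the script going if the cron log isn't there yet):
diff /tmp/shell-env.log /tmp/cron-env.log || true
```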
If we run this command within our entrypoint.sh upon deployment, it also resolves the issue, and the automatic runs subsequently work. For example:
* * * * * pathToPHP pathToCron.php >> pathToLog.log 2>&1
service cron start
echo "Running warm-up cron.php"
pathToPHP pathToCron.php
I know it might be application-specific, and hard to debug without that knowledge, but this is only impacting one of our services (all others have the same set-up, in the same project, with the same ENV vars and Dockerfile format), and it happens to be the service that we've recently redeployed.
AND it has not been an issue with previous deployments of this same application prior to a few days ago.
This is what ChatGPT thinks may have caused it:
Railway may have changed something recently — likely one of:
1. Container startup order / timing
Previously, your cron job ran after DNS and internal networking were fully ready.
Now, Railway’s image or scheduler might be launching cron immediately after container start, before DNS is usable.
2. Base image change (e.g., Alpine, Debian)
If Railway upgraded the base image (e.g., from Alpine 3.18 to 3.19), the musl libc DNS resolver may behave differently. musl is known for having strict/limited DNS resolution: sometimes it doesn't retry or cache like glibc does.
3. DNS infrastructure / nameserver change
Internal name resolution for *.railway.internal may now behave differently (e.g., a longer delay before it's available in /etc/resolv.conf).
4. Different container cold-start behavior
Your container may now start with fewer pre-initialized network services or preloaded caches (i.e., “colder” cold starts).
This means your first cron job hits a race condition that it didn’t previously.
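(If the race described in points 1-4 were real, it could be ruled out with a startup guard that waits for DNS before starting cron. A sketch under assumptions: MYSQLHOST is an illustrative variable name, defaulted to localhost so the snippet runs standalone; a real entrypoint would use the actual Railway-provided host variable:)

```shell
#!/bin/sh
# Illustrative: MYSQLHOST stands in for whatever env var holds the DB
# host; default it to localhost so this sketch is runnable on its own.
: "${MYSQLHOST:=localhost}"

# Retry DNS resolution for up to ~30s before starting cron, which would
# rule out (or confirm) the cold-start race described above.
tries=0
until getent hosts "$MYSQLHOST" >/dev/null 2>&1; do
  tries=$((tries + 1))
  if [ "$tries" -ge 30 ]; then
    echo "DNS for $MYSQLHOST still not resolving after 30s" >&2
    break
  fi
  sleep 1
done
echo "DNS check finished after $tries retries"
# ...then the original entrypoint steps (crontab install, service cron start).
```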
Any thoughts, or ideas on whether anything has changed at Railway's end, would be appreciated. Whilst I have a fix with my updated Dockerfile, I would like to understand why.
10 days ago
Hey there! We've found the following might help you get unblocked faster:
If you find the answer from one of these, please let us know by solving the thread!
10 days ago
Hello,
1. DNS is usable immediately at startup; the network interface railnet0 is attached to the container before the container is started. This hasn't changed for quite some time.
2. You are using a Dockerfile, so this change is something you would have had to make.
3. We have also not made any changes to the internal DNS resolver recently.
4. No changes have been made to the container initialization process either.
To give me a better idea of what is happening, could you please link to the service in question?
Best,
Brody
Status changed to Awaiting User Response Railway • 10 days ago
10 days ago
Hey
Service is: 3d869c6b-bcf5-41f2-a908-19f614381b6b
The deploy (6da068ae-e0dd-40c5-84a6-62270f9dcd55) includes the warm-up in the entrypoint, making the cron work.
The two prior to that, deploys 57607a3d-59d2-44bd-9700-e1df2fccfc73 and b21fa47a-3309-445b-b87b-45dfef5207bb, did not have the warm-up, and they exhibited the weird behaviour as described.
Everything earlier than that, such as deploy f0a15223-823b-4d97-91e0-69f36e81b4fe, also did not have the warm-up but did not exhibit this strange behaviour.
Not sure if you can see anything extra with this info?
Note there are a few newer deploys (since 6da068ae), which is me testing a few different things around this.
Thanks
Rob
Status changed to Awaiting Railway Response Railway • 10 days ago
9 days ago
As mentioned earlier, nothing changed on our end.
Have you been able to revert to the previously working commit to test for a regression made on your end?
Best,
Nico
Status changed to Awaiting User Response Railway • 9 days ago
9 days ago
Hey
Really really strange, as if I deploy a previously working commit/image, the issue is still there.
Gremlins.
I'll chalk it down to that, and be happy with my warmup added to the Dockerfile!
Cheers
Status changed to Awaiting Railway Response Railway • 9 days ago
Status changed to Solved rjbathgate • 9 days ago