RQ Worker Connects to Redis but Fails to Dequeue from Specific Queue (summarize_tasks)
bruumel
FREEOP

10 months ago

Hi Railway Community & Team,

I'm encountering a persistent issue with an RQ worker service not dequeuing jobs from a specific queue, even though the backend confirms enqueuing and the worker appears to connect and listen correctly.

Architecture:

* Backend Service: FastAPI (Python 3.11), deployed via Nixpacks. Enqueues jobs using rq library onto a shared Redis instance.

* Crawler Worker Service: Python RQ worker, deployed via Dockerfile from the main repo. Successfully processes jobs from the default queue on the shared Redis instance.

* Summarizer Worker Service (Problem Service): Python RQ worker, deployed via Dockerfile from a separate repository. Intended to process jobs from the summarize_tasks queue on the same shared Redis instance.

* Redis Service: Standard Railway Redis service add-on.

Problem Details:

1. The Backend service successfully enqueues jobs to the summarize_tasks queue. Backend logs confirm this:

Summarize Job X ... successfully enqueued with RQ ID: ...

2. The Summarizer Worker service deploys successfully using its Dockerfile. Logs confirm successful connection to the identical REDIS_URL and that it starts listening on the correct queue:

```
Successfully connected to Redis on redis://...
Worker will listen to queue: 'summarize_tasks'
Summarizer worker starting with name: summarizer-[UUID]...
INFO:rq.worker:*** Listening on summarize_tasks...
DEBUG:rq.worker:Dequeueing jobs on queues summarize_tasks and timeout 405
DEBUG:rq.queue:Starting BLMOVE operation for rq:queue:summarize_tasks with timeout of 405
```

3. However, the worker never dequeues any jobs. It repeatedly logs the BLMOVE timeout:

DEBUG:rq.queue:BLMOVE timeout, no jobs found on rq:queue:summarize_tasks

4. Jobs remain queued in the Supabase summarize_jobs database table.

5. Using the Railway Redis Data browser, the Redis list key `rq:queue:summarize_tasks` does not appear or is empty, even immediately after the backend logs successful enqueueing. Other related RQ keys (`rq:workers:summarize_tasks`, `rq:finished:summarize_tasks`) do exist for this queue.
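To rule out an empty-but-present list or a type clash on the key (RQ expects a plain list there), the key can also be checked directly with redis-py instead of the Data browser. This is a diagnostic sketch; the `inspect_queue` helper is our own, and `r` is any redis-py-style client:

```python
def inspect_queue(r, queue_name: str) -> dict:
    """Report the state of an RQ queue's backing list key.

    `r` is any redis-py-style client (e.g. redis.from_url(REDIS_URL)).
    """
    key = f"rq:queue:{queue_name}"
    key_type = r.type(key)  # b"none" if the key does not exist
    if isinstance(key_type, bytes):
        key_type = key_type.decode()
    return {
        "key": key,
        "exists": bool(r.exists(key)),
        "type": key_type,  # RQ expects "list" here
        "length": r.llen(key) if key_type == "list" else None,
    }

# Usage (assuming REDIS_URL is set in the environment):
#   import os, redis
#   r = redis.from_url(os.environ["REDIS_URL"])
#   print(inspect_queue(r, "summarize_tasks"))
```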

Troubleshooting Done:

* Verified identical REDIS_URL for backend and worker.

* Verified compatible rq/`redis-py` versions.

* Moved summarizer worker to a separate repository.

* Confirmed correct Dockerfile build for the summarizer worker.

* Confirmed the other worker (`crawler-worker`) can dequeue from the default queue on the same Redis instance.

* Implemented unique UUID-based worker names.

* Refactored worker code to be fully synchronous.

* Added RQ debug logging, confirming the BLMOVE attempt on the correct key.

* Attempted explicitly checking/creating the queue list from the backend before enqueuing (didn't help, key still didn't appear reliably in Redis browser).
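For reference, the RQ debug logging mentioned above can be enabled from the worker script itself; the logger names match the `rq.worker` / `rq.queue` prefixes visible in the output (a sketch, assuming the worker is started from Python rather than the `rq` CLI):

```python
import logging

# Root config so DEBUG records are actually emitted somewhere
logging.basicConfig(
    level=logging.DEBUG,
    format="%(levelname)s:%(name)s:%(message)s",
)

# These are the loggers behind the "DEBUG:rq.worker:..." and
# "DEBUG:rq.queue:..." lines shown in the output above
logging.getLogger("rq.worker").setLevel(logging.DEBUG)
logging.getLogger("rq.queue").setLevel(logging.DEBUG)
```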

Request:

Could anyone provide insight into why jobs successfully enqueued by the backend to the summarize_tasks queue might not be appearing in the corresponding Redis list key (`rq:queue:summarize_tasks`) or why the worker, despite listening correctly, cannot dequeue from it? The default queue works fine between the backend and the crawler-worker.

Is there a potential Railway-specific Redis configuration, network policy, or known issue with RQ list operations that could cause this behaviour for only one specific queue?

Thanks!

Solved

5 Replies

bruumel
FREEOP

10 months ago

Update on this issue:

We've added further debugging and confirmed the following:

1. Backend Enqueue Confirmation: The backend service (`byodb`) successfully calls summarize_queue.enqueue(...) for the summarize_tasks queue. Logs explicitly show the job ID and RQ ID being generated, indicating RQ believes the enqueue was successful. Backend logs:

```
2025-05-05 18:52:54 - [Backend Job 25] DEBUG: Checking queue 'summarize_tasks' existence before enqueue...
2025-05-05 18:52:54 - [Backend Job 25] DEBUG: Queue key 'rq:queue:summarize_tasks' exists: 0
2025-05-05 18:52:54 - [Backend Job 25] DEBUG: Queue key didn't exist, trying dummy push to potentially create it...
2025-05-05 18:52:54 - [Backend Job 25] DEBUG: Dummy push/pop executed.
Summarize Job 25 for domain 'andlight.dk' successfully enqueued with RQ ID: ce2d6153-13e4-4fd5-9d56-aa30606a25ef
```

2. Worker Still Sees No Jobs: The byodb-summarizer-worker (deployed from a separate repo, connecting to the same Redis instance with the same REDIS_URL) starts correctly, connects to Redis, and actively listens on summarize_tasks. However, it consistently times out waiting for jobs:

```
INFO:rq.worker:*** Listening on summarize_tasks...
DEBUG:rq.worker:Dequeueing jobs on queues summarize_tasks and timeout 405
DEBUG:rq.queue:Starting BLMOVE operation for rq:queue:summarize_tasks with timeout of 405
DEBUG:rq.queue:BLMOVE timeout, no jobs found on rq:queue:summarize_tasks
```

3. Redis State: Inspecting Redis directly via the Railway Data browser confirms that the Redis list key rq:queue:summarize_tasks is either missing or empty immediately after the backend reports successful enqueueing. Other rq: keys related to this queue (like rq:workers:summarize_tasks, rq:finished:summarize_tasks) do exist.

Conclusion:

The evidence strongly suggests that jobs enqueued by the backend to the summarize_tasks queue are not persisting in the corresponding Redis list, or the list itself is not being created/made visible correctly within the Railway Redis service. Since the backend enqueue operation seems to succeed from RQ's perspective, and the worker connects but finds nothing, this points towards an issue at the Redis/platform level specific to this queue key, rather than an application code error in the enqueue/dequeue logic itself (especially since the default queue works fine for another worker).
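One way to narrow this down further from the backend side is to compare the job's own status with the queue list immediately after enqueueing: in RQ, `job.get_status()` reads the `rq:job:<id>` hash while `queue.count` reads the `rq:queue:<name>` list, so a "queued" status paired with an empty list means the list entry specifically is vanishing. A hypothetical helper to interpret that combination (the function name and wording are ours):

```python
def interpret_enqueue_state(job_status: str, queue_length: int) -> str:
    """Interpret the combination of an RQ job's status and its queue length.

    Inputs would come from e.g.:
        job = queue.enqueue(task, arg)
        interpret_enqueue_state(job.get_status(), queue.count)
    """
    if job_status == "queued" and queue_length == 0:
        # The job hash was written but the list entry is gone:
        # something removed or consumed it (competing consumer,
        # key clash, or a different logical database).
        return "job hash persisted, list entry missing"
    if job_status == "queued" and queue_length > 0:
        return "enqueue OK; problem is on the worker side"
    if job_status in ("started", "finished", "failed"):
        return "a worker picked the job up"
    return f"unexpected state: {job_status!r} / length {queue_length}"
```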

Could there be an issue with Redis list creation/persistence or visibility for specific keys within the Railway Redis service? Any further diagnostic steps we can take?

Thanks!


Brody

10 months ago

Hey!

This seems like an issue with your project/application. Unfortunately, we're unable to offer first-party support for issues unrelated to the Railway product or platform.

Other communities such as Stackoverflow might be able to help you out further.

Best,
Brody


Status changed to Awaiting User Response Railway 10 months ago


bruumel
FREEOP

10 months ago

Hi Brody / Railway Team,

Thanks for the previous response suggesting it might be an application issue. We've continued debugging based on that assumption, and have a significant update which strongly points back towards a potential platform/Redis interaction issue specific to the queue name.

Workaround Test & Results:

As a diagnostic step, we modified both the backend service and the dedicated byodb-summarizer-worker service to use the `default` queue instead of summarize_tasks.

* Backend (`byodb` service): We changed Queue("summarize_tasks", ...) to Queue("default", ...) in routers/summarize.py. The backend logs confirmed it successfully enqueued the job to the default queue:

```
Backend/Summarize now uses Redis Queue 'default'.
Summarize Job 27 ... successfully enqueued with RQ ID: ...
```

* Worker (`byodb-summarizer-worker` service): We changed QUEUE_NAME = "summarize_tasks" to QUEUE_NAME = "default" in worker.py. The worker logs confirmed it started listening on the correct queue:

```
Worker will listen to queue: 'default'
INFO:rq.worker:*** Listening on default...
```

* Outcome: With this setup, the byodb-summarizer-worker successfully dequeued and started processing the job from the default queue almost immediately. The logs showed the expected progression:

```
2025-05-06 06:47:07 - [SummarizeJob 27] Step 1: Fetching job details...
INFO:httpx:HTTP Request: GET https://nnvhicilwncgqnidqtbt.supabase.co/rest/v1/summarize_jobs?select=user_id%2Cdomain%2Cdomain_pipeline_status_id%2Coptions&id=eq.27 "HTTP/2 200 OK"
# ... followed by further processing steps ...
```

(Note: The job eventually failed later due to an application error related to a missing DB column (`summary_text`), which is a separate issue we can fix in our code. The crucial point is that the worker *did* receive and start the job from the default queue.)

Analysis:

This test demonstrates that:

1. The byodb-summarizer-worker can connect to the shared Redis instance correctly.

2. The byodb-summarizer-worker can successfully dequeue and process jobs when listening to the default queue.

3. The backend can successfully enqueue jobs that a worker can pick up (when using the default queue).

4. The only variable causing the "job not dequeued" issue is the specific queue name `summarize_tasks`.

Given that the rq:queue:summarize_tasks list key consistently fails to appear or persist in the Redis Data browser (as noted previously), while the default queue works reliably for both the crawler and (during this test) the summarizer worker on the same Redis instance, this strongly suggests an issue specific to the handling or visibility of the rq:queue:summarize_tasks list key within the Railway Redis service, rather than an application logic error in dequeuing.

Context & Importance:

We understand you have limited resources for application-specific support, especially for free-tier projects. However, this project (`byodb`, Project ID: e2c40f16-01be-4686-8992-e4a8297af74c) involves significant complexity with multiple interconnected services (FastAPI backend, React frontend, multiple Python workers, Supabase integration) processing potentially large amounts of data. We are currently evaluating Railway's capabilities for this workload. While currently on the free/trial tier during development, the architecture is designed for scaling, and a successful outcome here is critical for our decision to upgrade to a paid plan to support expected user growth. This Redis queue issue is currently the primary blocker for the entire summarization feature, which is core to the application.

Renewed Request:

Based on this new evidence that the worker can process jobs from the default queue but not from summarize_tasks on the same Redis instance, could you please reconsider if there might be a platform-level anomaly affecting the summarize_tasks Redis list key specifically?

Any insights or further diagnostic steps from your side regarding the Redis service behaviour would be greatly appreciated.

Thanks again for your time and consideration.

Regards,

Michiel


Status changed to Awaiting Railway Response Railway 10 months ago


bruumel
FREEOP

10 months ago

Hi Railway Team,

Following up on the issue where our worker fails to pick up jobs from the summarize_tasks queue. As previously noted, the same worker can process jobs successfully when using the default queue on the same Redis instance.

To further diagnose, we added a test endpoint in our backend service that bypasses RQ and interacts directly with Redis using redis-py. This endpoint attempts to:

  1. Connect to Redis.

  2. LPUSH a test item directly to the problematic key: rq:queue:summarize_tasks.

  3. Immediately check if the key EXISTS.

  4. Immediately get the content using LRANGE.
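
That probe, reconstructed as a sketch (the `lpush_roundtrip` helper name is ours; `r` is a redis-py client connected via REDIS_URL):

```python
def lpush_roundtrip(r, key: str, payload: str = "probe") -> dict:
    """LPUSH a test item, then immediately check EXISTS and LRANGE."""
    result = {
        "queue_key_tested": key,
        "lpush_value": r.lpush(key, payload),  # list length after push
        "exists_result": r.exists(key),        # 0 here is the anomaly
        "lrange_result": r.lrange(key, 0, -1),
    }
    # Clean up so the probe itself never leaves test data behind
    r.lrem(key, 0, payload)
    return result

# Usage: lpush_roundtrip(r, "rq:queue:summarize_tasks")
```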

Results:
The test endpoint successfully connects to Redis. The LPUSH command reports success to the client (returns 1, indicating the list supposedly has one item). However, the subsequent EXISTS command returns 0 (key does not exist), and LRANGE returns an empty list ([]).

Here is the relevant JSON output from the test endpoint:

```
{
  "redis_url_found": true,
  "queue_key_tested": "rq:queue:summarize_tasks",
  "connection_status": "Success",
  "lpush_result_type": "int",
  "lpush_value_if_int": 1,
  "lpush_status": "Success (check value)",
  "exists_result": 0,
  "exists_status": "Success",
  "lrange_result": [],
  "lrange_status": "Success"
}
```

Backend logs confirm this sequence: LPUSH seems to succeed according to the client library, but the data doesn't actually persist under the key rq:queue:summarize_tasks immediately afterwards.
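For what it's worth, one classic way to get exactly this symptom (LPUSH returns 1, yet EXISTS returns 0 a moment later) is another client sitting in a blocking pop/move on that same key: Redis hands the pushed item straight to the blocked client, so the list never becomes visible. redis-py's `client_list()` can reveal such hidden consumers; the filter helper below is our own sketch:

```python
BLOCKING_CMDS = {"blpop", "brpop", "blmove", "brpoplpush", "blmpop"}

def blocked_consumers(clients: list[dict]) -> list[dict]:
    """Filter redis-py client_list() output down to clients currently
    executing a blocking list command (possible hidden consumers)."""
    return [
        c for c in clients
        if c.get("cmd", "").split("|", 1)[0] in BLOCKING_CMDS
    ]

# Usage: blocked_consumers(r.client_list())
```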

Conclusion:
This strongly suggests the issue is not with our application's ability to connect or send commands, but rather with how the Railway Redis service handles persistence or visibility specifically for the key rq:queue:summarize_tasks.

Could you please investigate if there are any platform-level configurations, restrictions, or known issues that might be affecting this specific Redis key name or pattern within your Redis service?

Thanks for your help!


bruumel
FREEOP

10 months ago

Hi all,

Just a final update on the strange issue where our worker couldn't read from the rq:queue:summarize_tasks queue, even though the default queue worked fine and direct Redis writes seemed okay but didn't persist for that specific key.

After exhausting debugging options in our code, we simply tried changing the queue name used by both the backend and worker from summarize_tasks to summarize_queue.

That immediately fixed the problem. The worker now picks up jobs from summarize_queue without any issues.

It's bizarre that the specific name "summarize_tasks" was apparently the sole cause of the problem within the Redis service environment. While we're happy to have a working solution now by using the new name, it strongly suggests a platform-level quirk related to that particular key name.
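If anyone hits something similar, parameterising the queue name in one place makes this kind of rename a one-line change. A minimal sketch (the `RQ_QUEUE_NAME` variable is our own convention, not a Railway or RQ setting):

```python
import os

def get_queue_name(default: str = "summarize_queue") -> str:
    """Single source of truth for the queue name, shared by backend and worker."""
    return os.environ.get("RQ_QUEUE_NAME", default)

# Backend:  queue = Queue(get_queue_name(), connection=redis_conn)
# Worker:   QUEUE_NAME = get_queue_name()
```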

Thanks for the earlier suggestions. We'll proceed with the new queue name.


Status changed to Solved bruumel 10 months ago

