Communication issues across multiple chatbot services deployed on Railway
hubility
HOBBYOP

2 years ago

Dear Railway Support Team,

I am writing to report a significant issue I've encountered with multiple chatbot services deployed on the Railway platform. Recently, all of these services have started experiencing communication problems, which are severely impacting their functionality and reliability.

Issue description:

All deployed chatbot services are facing intermittent communication failures.
The issue manifests as timeouts and failed requests when the chatbots attempt to communicate with external services or APIs.
This problem has been consistent across different times of the day and does not seem to be related to a specific deployment or configuration.

Troubleshooting steps taken:

  • Verified network configurations and ensured that there are no changes on our end that could cause these issues.

  • Restarted the services to check if the problem persists after a fresh deployment.

  • Checked Railway platform status and recent updates to identify any known issues or maintenance activities that might be related.

Impact:

  • The communication issues are severely affecting the user experience, leading to increased response times and service unavailability.

  • This has also hindered our ability to reliably update and maintain our chatbot services.

We would greatly appreciate your prompt attention to this matter. Could you please investigate the root cause of these communication issues and provide guidance on how to resolve them? Additionally, if there are any recommended best practices or configurations we should implement to prevent such issues in the future, we would be grateful for your advice.

Lack of error messages and diagnostic information:

Notably, there are no error messages appearing in the service logs that could provide insights into the root cause of the communication failures.
The absence of error logs or any other diagnostic information makes it challenging to identify and troubleshoot the issue on our end.
We have attempted to enable more verbose logging where possible, but this has not yielded any additional information that could help pinpoint the problem.
I'm particularly reliant on your support to understand the nature of these communication issues. Any insights or tools you could provide to help us monitor and diagnose these problems more effectively would be highly appreciated.
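When the service logs show no errors, one option is to make every outbound call report its own outcome. The following is a minimal sketch, assuming the chatbots are written in Python and call external APIs over HTTP; `timed_fetch` and the `outbound` logger name are hypothetical and not taken from the services described here. It sets an explicit timeout and logs the duration and result of each call, so a timeout appears in the logs instead of passing silently.

```python
import logging
import time
import urllib.error
import urllib.request

logger = logging.getLogger("outbound")  # hypothetical logger name


def timed_fetch(url: str, timeout_s: float = 5.0) -> dict:
    """Call url with an explicit timeout and log how the call ended."""
    start = time.monotonic()
    outcome = {"url": url, "ok": False, "status": None, "error": None}
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            outcome["status"] = resp.status
            outcome["ok"] = 200 <= resp.status < 300
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status code.
        outcome["status"] = exc.code
        outcome["error"] = f"HTTPError: {exc.reason}"
    except OSError as exc:
        # Covers URLError, connection failures, and socket timeouts.
        outcome["error"] = f"{type(exc).__name__}: {exc}"
    outcome["elapsed_s"] = round(time.monotonic() - start, 3)
    level = logging.INFO if outcome["ok"] else logging.WARNING
    logger.log(level, "outbound call %s", outcome)
    return outcome
```

Comparing `elapsed_s` across environments (local, Railway, another provider) gives a rough indication of whether slow calls are tied to one platform or to the external API itself.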

Critical migration of services with zero downtime requirement:

I would also like to bring to your attention that we are in the process of migrating some of our critical services to Railway. These services require strict uptime and cannot afford any interruptions or communication failures.
The current communication issues present a significant risk to this migration process, potentially impacting our operational continuity and service reliability to our users.
Ensuring a smooth migration with zero downtime is a top priority for us, and any assistance in mitigating these communication issues would be invaluable during this transition.
Given the critical nature of these services and the need for uninterrupted operation, your prompt and effective support in resolving these communication issues is crucial. We are committed to working closely with your team to address any challenges and ensure a successful migration.

Thank you for your support and assistance. We look forward to your response.

Note: I've been using Railway services for a few months, and this is the first time we have experienced this kind of issue.

Best regards!

5 Replies

ray-chen
EMPLOYEE

2 years ago

Please elaborate on the issues you're facing clearly and succinctly, and ideally without using ChatGPT to generate a wall-of-text that makes it hard for us to read.


hubility
HOBBYOP

2 years ago

I have deployed some chatbots for automated customer support. Last Friday they started taking longer than usual to respond, and in some cases left messages unanswered. When checking the logs, I noticed that some messages were not reaching the service at all, and no error messages were logged. I deployed the same services on another provider, and they worked normally there. I also ran tests locally and did not experience any delays.

The chatbot services receive messages through webhooks.
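To tell "the webhook never arrived" apart from "the webhook arrived but was handled slowly", it can help to log each request the moment it reaches the service. This is a minimal sketch using only the Python standard library; `WebhookHandler`, `start_server`, and the in-memory `received` list are hypothetical names for illustration, and the server binds to localhost here rather than to a production address.

```python
import json
import logging
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

logger = logging.getLogger("webhook")  # hypothetical logger name
received = []  # in-memory record of arrivals, for illustration only


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        arrived = time.time()
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Record the arrival before doing any processing.
        received.append({"path": self.path, "bytes": len(body), "arrived": arrived})
        logger.info("webhook arrived path=%s bytes=%d", self.path, len(body))
        # Acknowledge right away; slow work would go to a background worker.
        payload = json.dumps({"received": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        # Send the default access log through the logging module instead of stderr.
        logger.debug(format, *args)


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Start the receiver in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

If the sender's delivery log shows a message that has no matching arrival entry, the message was lost before it reached the application code.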

I apologize for sending such a lengthy text, but I wanted to explain some details and tests conducted because I do not have error information.

Today, the service appears to be working normally; I don't understand what could have happened.


brody
EMPLOYEE

2 years ago

when did your bots first start having communication failures?


hubility
HOBBYOP

2 years ago

I apologize for sending such a lengthy text, but I wanted to explain some details and tests conducted because I do not have error information.

I deployed the services in different environments and managed to identify a timeout issue that is NOT related to Railway services.

Thanks! Have a good day!


hubility
HOBBYOP

2 years ago

when did your bots first start having communication failures?

Last Friday (02/02/2024), but I can confirm that it's not related to the Railway service.

Many thanks!

