a month ago
My Grist service is down after a scheduled restart. Neither restart nor redeploy brings it back up. This is affecting production users. Please check whether there is a state issue on your (Railway) end.
20 Replies
a month ago
Can you please quickly check whether this is the same issue I recently faced with my other service (a MinIO bucket)?
https://discord.com/channels/713503345364697088/1459523759441449091/1460688951257076068
a month ago
No, abruptly went down, no error log
a month ago
This happened with my other service a couple of weeks back, and Brody from the Railway team had to update some status to fix it!
a month ago
I have shared the thread reference above
a month ago
I am seeing frequent out-of-memory notifications for the service, and I have also received emails about them.
a month ago
But I have already set the limit to 32 GB, and the average usage is around 4-10 GB.
a month ago
trying to bump this thread up!
a month ago
This happens only when the current volume is mounted. If I remove the volume mount or use a fresh volume, it works.
a month ago
My document files and metadata files are in the volume mount - I cannot consider it alive or operational unless it works with the volume mount!
a month ago
Hey there, just for testing purposes, can you try adding this env var: NODE_OPTIONS=--max-old-space-size=28672
This would allow the Grist Node.js service to use up to 28 GB of RAM, since usage seems to spike to a max of 16 GB.
If you can also share the logs from the crashed deployments, it would help us debug this further!
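For reference, the value in that env var is just 28 GiB expressed in MiB, which is the unit `--max-old-space-size` takes. A minimal sketch of the arithmetic follows; the helper names are hypothetical, and the 32 GB figure is the service limit mentioned earlier in this thread.

```python
# Hypothetical helpers showing where 28672 comes from.
# --max-old-space-size is given in MiB, so 28 GiB = 28 * 1024 MiB.

MIB_PER_GIB = 1024


def node_options(heap_gib: int) -> str:
    """Build the NODE_OPTIONS value for a heap limit given in GiB."""
    return f"--max-old-space-size={heap_gib * MIB_PER_GIB}"


def headroom_mib(limit_gib: int, heap_gib: int) -> int:
    """Memory left under the service limit for everything outside the heap."""
    return (limit_gib - heap_gib) * MIB_PER_GIB


if __name__ == "__main__":
    print("NODE_OPTIONS=" + node_options(28))
    print("headroom (MiB):", headroom_mib(32, 28))
```

Leaving some headroom below the service limit matters because the Node.js process also uses memory outside the old-space heap.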
a month ago
I don't have any build logs (it is a public container) - no deploy logs either.
It must be something related to the disk - can you check with Brody what the state issue was with the disk/service for the MinIO bucket service in the same project?
Also, don't you have the visibility into my service logs/history/deployments?
https://railway.com/project/d8a9cdda-ca19-44a5-814f-1ecaff088212/service/a72e891f-cb1e-460a-8e37-b8a31006f225?environmentId=82212507-616a-4d85-a946-87879ae84c13
a month ago
What do you mean by public container?
There’s not much that anyone besides maybe Railway staff can help with if there aren’t any logs to reference.
Is there anything in any sort of log that indicates why your service isn’t working with this existing volume? Does it have malformed data in it?
a month ago
I meant that it uses a public image, so no build step is involved.
- I tried importing the same document into a different Grist service mounted on a different volume, and it works absolutely fine there!
- I tried unmounting the volume and running the service, and it comes up without an issue!
- I tried mounting the same volume on a new Grist service, and it hangs at the same place and fails with an "Out of memory" error.
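The three trials above can be written down as a small elimination table. This is only a sketch of the reasoning, not a diagnostic tool: the labels ("grist", "original-volume", "other-volume") are hypothetical names for the image and volumes described in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trial:
    image: str             # container image used in the trial
    volume: Optional[str]  # mounted volume, or None if nothing was mounted
    ok: bool               # did the service come up?


# Hypothetical labels for the three trials listed above.
TRIALS = [
    Trial("grist", "other-volume", True),      # same document, different volume
    Trial("grist", None, True),                # original service, volume unmounted
    Trial("grist", "original-volume", False),  # new service, original volume
]


def suspects(trials, attr):
    """Values of `attr` seen in failing trials but never in a passing one."""
    failed = {getattr(t, attr) for t in trials if not t.ok}
    passed = {getattr(t, attr) for t in trials if t.ok}
    return failed - passed


if __name__ == "__main__":
    print("suspect images: ", suspects(TRIALS, "image"))
    print("suspect volumes:", suspects(TRIALS, "volume"))
```

With these inputs the image is cleared, because it also appears in passing trials, and the original volume is the only factor that appears exclusively in the failing trial.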
a month ago
I also tried mounting the old (non-working) volume on a simple FileBrowser service, but that also hangs at the same "Creating containers" step.
a month ago
Hey Railway Team,
I suspect something is wrong with the disk mounting logic (which I assume also gets triggered during a restart/redeploy).
Can you please check and update?
a month ago
I just tried creating a volume backup and restoring it to a new volume. That seems to have done the trick, but I am not sure whether any data corruption is to be expected here!
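One way to sanity-check the restored data, sketched under assumptions: Grist documents (`.grist` files) are SQLite databases, so SQLite's built-in `PRAGMA integrity_check` can be run against each one. The function name and the idea of scanning a directory for `*.grist` files are illustrative; the directory you pass in depends on where the volume is mounted. This checks structural integrity only and cannot tell whether recent edits are missing.

```python
import sqlite3
from pathlib import Path


def check_grist_docs(root):
    """Run SQLite's integrity check on every *.grist file under `root`.

    Returns {path: [messages]}; a healthy file yields ["ok"].
    """
    results = {}
    for path in sorted(Path(root).rglob("*.grist")):
        # Open read-only so the check cannot modify the document.
        uri = path.resolve().as_uri() + "?mode=ro"
        try:
            conn = sqlite3.connect(uri, uri=True)
            try:
                rows = conn.execute("PRAGMA integrity_check").fetchall()
            finally:
                conn.close()
            results[str(path)] = [row[0] for row in rows]
        except sqlite3.DatabaseError as exc:
            # Raised, for example, when the file is not a valid SQLite database.
            results[str(path)] = [f"error: {exc}"]
    return results


if __name__ == "__main__":
    import sys

    # Usage: python check_docs.py <directory containing .grist files>
    for doc, messages in check_grist_docs(sys.argv[1]).items():
        print(doc, "->", "; ".join(messages))
```

Run it while the Grist service is stopped, or against a copy of the files, so the check does not race with live writes.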
a month ago
Unfortunately, I'm not seeing much that indicates an issue on our end. I poked through the logs and didn't find anything notable.
If you run into this live again, can you please let us know? Super sorry you're running into it; we would love to walk through it and make sure it's not a reproducible issue.
Status changed to Awaiting User Response Railway • about 1 month ago
a month ago
This thread has been marked as solved automatically due to a lack of recent activity. Please re-open this thread or create a new one if you require further assistance. Thank you!
Status changed to Solved Railway • about 1 month ago