Need help using Flowise + GPT-4o for image/audio input on Railway

badexltd

PROOP

10 months ago

Hi, I'm using FlowiseAI hosted on Railway with GPT-4o (latest) and I want to handle user inputs like images and audio messages coming from WhatsApp.

Text-based flows work fine using the Conversational Retrieval QA Chain. However, I want to extend this to:

- Read and analyze image URLs (e.g., food photo, product label)

- Accept audio URLs and get transcription or direct response

So far, I’m sending structured content like this from my backend:

```json

[

{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}

]

$20 Bounty

3 Replies

angelo-railway

EMPLOYEE

10 months ago

Hey Badex,

I can't assist with this matter but I am wondering if there are other Flowise enthusiasts who can help. I have set a bounty of $20 to see if others can assist with getting you going. Best of success!

Thanks,
Angelo

angobello

PRO

10 months ago

Hello Badex,

So you want to accept the images and audio from whatsapp and run it through your backend before returning a response?

testuser123

PRO

9 months ago

For images, you’re close. Just make sure Flowise is actually sending the image URL as an image input to GPT-4o, not just as a string. Sometimes you need to tweak the node or use a custom function so the OpenAI API gets the image in the right format (some Flowise templates only handle text).

For audio, GPT-4o doesn’t transcribe audio by itself. You’ll need to send the audio file to OpenAI’s Whisper API (or another speech-to-text service such as Assembly which is a lot better tbh), then get the text back, and then pass that text into your Flowise/GPT-4o flow.

If you have few stuff in Flowise, I recommend you try n8n.