Need help using Flowise + GPT-4o for image/audio input on Railway
badexltd
PROOP

6 months ago

Hi, I'm using FlowiseAI hosted on Railway with GPT-4o (latest) and I want to handle user inputs like images and audio messages coming from WhatsApp.

Text-based flows work fine using the Conversational Retrieval QA Chain. However, I want to extend this to:

- Read and analyze image URLs (e.g., food photo, product label)

- Accept audio URLs and get transcription or direct response

So far, I’m sending structured content like this from my backend:

```json

[

{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}

]

$20 Bounty

3 Replies

Hey Badex,

I can't assist with this matter but I am wondering if there are other Flowise enthusiasts who can help. I have set a bounty of $20 to see if others can assist with getting you going. Best of success!

Thanks,
Angelo


angobello
PRO

6 months ago

Hello Badex,

So you want to accept the images and audio from whatsapp and run it through your backend before returning a response?


testuser123
PRO

6 months ago

For images, you’re close. Just make sure Flowise is actually sending the image URL as an image input to GPT-4o, not just as a string. Sometimes you need to tweak the node or use a custom function so the OpenAI API gets the image in the right format (some Flowise templates only handle text).

For audio, GPT-4o doesn’t transcribe audio by itself. You’ll need to send the audio file to OpenAI’s Whisper API (or another speech-to-text service such as Assembly which is a lot better tbh), then get the text back, and then pass that text into your Flowise/GPT-4o flow.

If you have few stuff in Flowise, I recommend you try n8n.


Loading...