6 months ago
Hi, I'm using FlowiseAI hosted on Railway with GPT-4o (latest) and I want to handle user inputs like images and audio messages coming from WhatsApp.
Text-based flows work fine using the Conversational Retrieval QA Chain. However, I want to extend this to:
- Read and analyze image URLs (e.g., food photo, product label)
- Accept audio URLs and get transcription or direct response
So far, I’m sending structured content like this from my backend:
```json
[
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
]
3 Replies
6 months ago
Hey Badex,
I can't assist with this matter but I am wondering if there are other Flowise enthusiasts who can help. I have set a bounty of $20 to see if others can assist with getting you going. Best of success!
Thanks,
Angelo
6 months ago
Hello Badex,
So you want to accept the images and audio from whatsapp and run it through your backend before returning a response?
6 months ago
For images, you’re close. Just make sure Flowise is actually sending the image URL as an image input to GPT-4o, not just as a string. Sometimes you need to tweak the node or use a custom function so the OpenAI API gets the image in the right format (some Flowise templates only handle text).
For audio, GPT-4o doesn’t transcribe audio by itself. You’ll need to send the audio file to OpenAI’s Whisper API (or another speech-to-text service such as Assembly which is a lot better tbh), then get the text back, and then pass that text into your Flowise/GPT-4o flow.
If you have few stuff in Flowise, I recommend you try n8n.