10 months ago
Hi, I'm using FlowiseAI hosted on Railway with GPT-4o (latest) and I want to handle user inputs like images and audio messages coming from WhatsApp.
Text-based flows work fine using the Conversational Retrieval QA Chain. However, I want to extend this to:
- Read and analyze image URLs (e.g., food photo, product label)
- Accept audio URLs and get transcription or direct response
So far, I’m sending structured content like this from my backend:
```json
[
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
]
3 Replies
10 months ago
Hey Badex,
I can't assist with this matter but I am wondering if there are other Flowise enthusiasts who can help. I have set a bounty of $20 to see if others can assist with getting you going. Best of success!
Thanks,
Angelo
10 months ago
Hello Badex,
So you want to accept the images and audio from whatsapp and run it through your backend before returning a response?
9 months ago
For images, you’re close. Just make sure Flowise is actually sending the image URL as an image input to GPT-4o, not just as a string. Sometimes you need to tweak the node or use a custom function so the OpenAI API gets the image in the right format (some Flowise templates only handle text).
For audio, GPT-4o doesn’t transcribe audio by itself. You’ll need to send the audio file to OpenAI’s Whisper API (or another speech-to-text service such as Assembly which is a lot better tbh), then get the text back, and then pass that text into your Flowise/GPT-4o flow.
If you have few stuff in Flowise, I recommend you try n8n.