Analyze images to detect objects, points, keypoints, or text
Interact with a multimodal chatbot using text and images