Transcribe audio files or YouTube videos into text
Interact with a multimodal chatbot that analyzes images and text