A Reddit user is looking for an AI model that can transcribe video and identify speakers from visual information. The poster has a screen recording of a Zoom meeting in which the participants are visually distinct, and wants a tool that both transcribes the audio accurately and attributes each passage to the correct speaker based on who appears on screen. The user also asks whether such models limit the length of video they can process. The original discussion is on Reddit: https://old.reddit.com/r/artificial/comments/1mysqgy/best_model_for_transcribing_videos/
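To make the attribution step concrete, here is a minimal sketch in Python. It is not a method proposed in the thread. It assumes two inputs that some other tools would have to produce: timestamped transcript segments from any speech-to-text model, and time spans saying who was visibly speaking, derived for example from the highlighted tile in the recording. The names `Segment`, `SpeakerSpan`, and `attribute_speakers` are hypothetical and exist only for this illustration. Each segment is given to the speaker whose on-screen spans overlap it the longest.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed passage with start/end times in seconds (hypothetical type)."""
    start: float
    end: float
    text: str


@dataclass
class SpeakerSpan:
    """A time span during which one person is visibly speaking (hypothetical type)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Return the length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute_speakers(segments, spans, unknown="unknown"):
    """Label each segment with the speaker whose spans overlap it the most."""
    labeled = []
    for seg in segments:
        totals = {}
        for span in spans:
            shared = overlap(seg.start, seg.end, span.start, span.end)
            if shared > 0:
                totals[span.speaker] = totals.get(span.speaker, 0.0) + shared
        # Fall back to a placeholder when nobody is visibly speaking.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg.text))
    return labeled


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    segments = [Segment(0.0, 4.0, "Hi everyone."), Segment(4.0, 9.0, "Thanks, let's start.")]
    spans = [SpeakerSpan(0.0, 4.5, "Alice"), SpeakerSpan(4.5, 10.0, "Bob")]
    for speaker, text in attribute_speakers(segments, spans):
        print(f"{speaker}: {text}")
```

Matching on overlap keeps the transcription and the visual speaker detection independent, so either can be swapped out. A long recording could also be processed in chunks and the labeled segments joined afterwards, as long as timestamps are kept relative to the full video.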