If you use something like yt-dlp you can download the audio from the meetings, and you could try things out in Mistral's AI Studio.

You could use their API (they have this snippet):

```
curl -X POST "https://api.mistral.ai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -F model="voxtral-mini-latest" \
  -F file=@"your-file.m4a" \
  -F diarize=true \
  -F timestamp_granularities="segment"
```

Through the API it took 18s to transcribe a 20-minute audio file I had lying around where someone is reviewing a product.

There will, I'm sure, be ways of running this locally soon (if they aren't already on Hugging Face), but the API is $0.003/min. For something like 120 meetings (10 years of monthly ones) at 1hr each, that's 7,200 minutes, or roughly $20. Whether they're 1 or 10 hours long (or weekly rather than monthly, or 10 parallel sessions or something) changes the total, but this might be a price you're willing to pay if you get the results back in an afternoon.
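To make the back-of-envelope numbers easy to redo for other meeting counts, here is a rough Python sketch. The function names are made up for illustration, and the time estimate just extrapolates linearly from the single 18s-per-20-minutes data point above, which may not hold for longer files or under rate limits.

```python
PRICE_PER_MIN = 0.003      # USD per audio minute, as quoted above
SECONDS_PER_AUDIO_MIN = 18 / 20  # one observation: 18s for a 20-minute file


def transcription_cost(meetings, hours_each, price_per_min=PRICE_PER_MIN):
    """Total API cost in USD for `meetings` recordings of `hours_each` hours."""
    return meetings * hours_each * 60 * price_per_min


def processing_hours(meetings, hours_each):
    """Very rough sequential processing time, assuming linear scaling."""
    return meetings * hours_each * 60 * SECONDS_PER_AUDIO_MIN / 3600


print(f"${transcription_cost(120, 1):.2f}")    # $21.60
print(f"${transcription_cost(120, 10):.2f}")   # $216.00
print(f"{processing_hours(120, 1):.1f} h")     # 1.8 h
```

So 120 one-hour meetings come out a bit over $20 and, if the speed holds, under two hours of processing.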

edit - their realtime model can be run with vLLM; the batch model is not open

- get an API key for this service

- make sure you have a list of all these YouTube meeting URLs somewhere

- ask your preferred coding assistant to write you up a script that downloads the audio for these videos with yt-dlp & calls Mistral's API

- ????

- profit
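For what it's worth, the script in question could look something like the sketch below. It assumes `yt-dlp` (plus ffmpeg for audio extraction) and `curl` are on your PATH, that `MISTRAL_API_KEY` is set, and that you have a text file with one URL per line. The function names, the numbered output filenames and saving the raw response as `.json` are my own choices; the curl arguments are the ones from the snippet upthread.

```python
import os
import subprocess
import sys
from pathlib import Path

API_URL = "https://api.mistral.ai/v1/audio/transcriptions"


def download_cmd(url, stem):
    # -x extracts audio only; the final file ends up as "<stem>.m4a"
    return ["yt-dlp", "-x", "--audio-format", "m4a",
            "-o", f"{stem}.%(ext)s", url]


def transcribe_cmd(audio_path, api_key):
    # Same form fields as the curl snippet quoted earlier in the thread
    return ["curl", "-sS", "-X", "POST", API_URL,
            "-H", f"Authorization: Bearer {api_key}",
            "-F", "model=voxtral-mini-latest",
            "-F", f"file=@{audio_path}",
            "-F", "diarize=true",
            "-F", "timestamp_granularities=segment"]


def main(url_file, out_dir="audio"):
    api_key = os.environ["MISTRAL_API_KEY"]
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    lines = Path(url_file).read_text().splitlines()
    urls = [line.strip() for line in lines if line.strip()]
    for i, url in enumerate(urls):
        stem = out / f"{i:03d}"  # hypothetical naming: 000, 001, ...
        subprocess.run(download_cmd(url, stem), check=True)
        result = subprocess.run(transcribe_cmd(f"{stem}.m4a", api_key),
                                check=True, capture_output=True, text=True)
        # Save whatever the API returned; the response format is not checked
        Path(f"{stem}.json").write_text(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python transcribe.py urls.txt` (the script name is arbitrary). It processes files one at a time and stops at the first failure, so for a few hundred videos you'd probably want retries and a skip-if-already-done check.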

If they are on YouTube, try Gemini 3 Flash first. Use AI Studio; it lets you insert YouTube videos into context.