Having built with and tried every voice model over the last three years, real time and non-real time... this is off the charts compared to anything I've seen before.

And open weight too! So grateful for this.

by Oras, 8 hours ago

Thank you for the link! Mistral's own playground does not have a microphone input. It only accepts file uploads, which does not demonstrate the speed and accuracy, but the link you shared does.

I tried speaking in 2 languages at once, and it picked it up correctly. Truly impressive for real-time.

by druskacik, 6 hours ago

According to the announcement blog, Le Chat is powered by the new model as well: https://chat.mistral.ai/chat

by TacticalCoder, 1 hour ago

> Truly impressive for real-time.

Impressive indeed. Works way better than the speech recognition I first got demo'ed in... 1998? I remember you had to "click" on the mic every time you wanted to speak and, well, not only was the transcription bad, it was so bad that it'd try to interpret the sound of the click as a word.

It was so bad I told several people not to invest in what was back then a national tech darling:

https://en.wikipedia.org/wiki/Lernout_%26_Hauspie

That turned out to be a massive fraud.

But ...

> I tried speaking in 2 languages at once, and it picked it up correctly.

I'm a native French speaker and I tried with a very simple sentence mixing French and English:

"Pour un pistolet je prefere un red dot mais pour une carabine je prefere un ACOG" (aka "For a pistol I prefer a red dot but for a carbine I prefer an ACOG")

And instead I got this:

"Je prépare un redote, mais pour une carabine, je préfère un ACOG."

"Je prépare un redote ..." doesn't mean anything and it's not at all what I said.

I like it, it's impressive, but on literally the first sentence I tried, it got the first half entirely wrong.

by skykooler, 5 hours ago

Doesn't seem to work for me - tried in both Firefox and Chromium and I can see the waveform when I talk but the transcription just shows "Awaiting audio input".

by starkgoose, 4 hours ago

Try disabling CSP for the page

by codethief, 5 hours ago

Same here. In Chromium I don't even see the waveform.

by fragmede, 4 hours ago

I had to turn off ad-block to get it to work.

by daemonologist, 7 hours ago

404 on https://mistralai-voxtral-mini-realtime.hf.space/gradio_api/... for me (which shows up in the UI as a little red error in the top right).

by jaggederest, 7 hours ago

It can transcribe Eminem's Rap God fast sequence, really, really impressive.

by rafram, 7 hours ago

That's almost certainly in the training data, to be fair.

by keeganpoppen, 5 hours ago

what a great test hahah

by GolDDranks, 36 minutes ago

I can't get that demo to work. Tried with both Firefox and Chrome.

by pyprism, 7 hours ago

Wow, that’s weird. I tried Bengali, but the text was transcribed into Hindi! I know there are some similar words in these languages, but I used pure Bengali that is not similar to Hindi.

by derefr, 6 hours ago

Well, on the linked page, it mentions "strong transcription performance in 13 languages, including [...] Hindi" but with no mention of Bengali. It probably doesn't know a lick of Bengali, and is just trying to snap your words into the closest language it does know.

by keeganpoppen, 5 hours ago

It must have some exposure to Bengali, just not enough for them to advertise it. Otherwise it would have a damn hard time.

by carbocation, 6 hours ago

This model was able to transcribe Bad Bunny lyrics over the sound of the background music, played casually from my speakers. Impressive, to me.

by sheepscreek, 6 hours ago

I’ve been using AquaVoice for real-time transcription for a while now, and it has become a core part of my workflow. It gets everything: jargon, capitalization, everything. Now I’m looking forward to doing that with 100% local inference!

by darkwater, 3 hours ago

It's really nice, although I got one sentence in French while I was speaking Italian; to be fair, I had corrected myself in the middle of a word.

But I'm definitely going to keep an eye on this for local-only STT (speech-to-text) for Home Assistant.

by mentalgear, 3 hours ago

Here European Multilingual-Intelligence truly shines!

by Barbing, 3 hours ago

Doesn’t seem to work in Safari on iOS 26.2 (iPhone 17 Pro), with just about everything extra disabled.

by rafram, 7 hours ago

Not terrible. It missed or mixed up a lot of words when I was speaking quickly (and not enunciating very well), but it does well with normal-paced speech.

by timhh, 2 hours ago

Yeah, it messed up a bit for me too when I didn't enunciate well. If I speak clearly it seems to work very well, even with background noise. Remember Dragon NaturallySpeaking? Imagine having this back then!

by colordrops, 2 hours ago

is this demo running fully in the browser?

by simonw, 2 hours ago

No, it's server-side.

The model is around 7.5 GB. Once models get above 4 GB, running them in a browser gets quite difficult, I believe.
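
One possible source of a ceiling near 4 GB, stated here as an assumption rather than something the thread or the demo confirms, is 32-bit WebAssembly: its linear memory is addressed in 64 KiB pages with at most 65,536 pages, so a single memory tops out at 4 GiB. A rough sketch of that arithmetic follows; the helper names `fits_in_wasm32` and `gb` are hypothetical, and the check ignores activations, runtime overhead, and any browser-specific limits.

```python
# Sketch: does a model of a given size fit in one 32-bit WebAssembly memory?
# Assumption: the in-browser runtime targets wasm32, whose linear memory is
# made of 64 KiB pages, with at most 65,536 pages (4 GiB in total).

WASM_PAGE_BYTES = 64 * 1024
WASM32_MAX_PAGES = 65_536
WASM32_MAX_BYTES = WASM_PAGE_BYTES * WASM32_MAX_PAGES  # 4 GiB


def gb(size: float) -> int:
    """Convert decimal gigabytes to bytes."""
    return int(size * 1000**3)


def fits_in_wasm32(model_bytes: int) -> bool:
    """Return True if the weights alone fit in a single wasm32 memory."""
    return model_bytes <= WASM32_MAX_BYTES


if __name__ == "__main__":
    # 7.5 GB is the approximate model size quoted in the comment above.
    for size in (3.0, 7.5):
        print(f"{size} GB fits in wasm32: {fits_in_wasm32(gb(size))}")
```

Under these assumptions a 7.5 GB model would not fit in a single wasm32 memory without splitting or streaming the weights, which is consistent with the demo being served from the server side.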

by th0ma5, 7 hours ago