So a couple of hours later I'd written a script that does transcription-based editing: on the first pass it grabs a timestamped transcript and a plain-text transcript for editing; you edit the words into any order you like, and a second pass reassembles the video (it's just a couple of hundred lines of Python wrapping whisper and ffmpeg). It also speeds up by 4x any detected silences that sit within retained sequences in the video.
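To make the silence handling concrete, here is a minimal sketch of how one retained span could be split into normal-speed and sped-up pieces. It is not the script's actual code: the function names, the (start, end) tuples in seconds, and the idea that the silence intervals come from something like ffmpeg's silencedetect filter are all assumptions for illustration.

```python
def split_by_silence(keep, silences, speedup=4.0):
    """Split one kept (start, end) span into (start, end, speed) pieces.

    keep:     (start, end) of a retained span, in seconds (hypothetical shape)
    silences: iterable of (start, end) silence intervals, in seconds
    """
    start, end = keep
    pieces = []
    cursor = start
    for s_start, s_end in sorted(silences):
        # Clip the silence to the part of the kept span not yet covered.
        s_start, s_end = max(s_start, cursor), min(s_end, end)
        if s_start >= s_end:
            continue  # silence lies outside the kept span
        if s_start > cursor:
            pieces.append((cursor, s_start, 1.0))  # speech at normal speed
        pieces.append((s_start, s_end, speedup))   # silence, sped up
        cursor = s_end
    if cursor < end:
        pieces.append((cursor, end, 1.0))
    return pieces


def output_duration(pieces):
    """Length of the span after the speed changes are applied."""
    return sum((e - s) / speed for s, e, speed in pieces)


pieces = split_by_silence((10, 20), [(5, 11), (14, 16), (19, 25)])
print(pieces)
print(output_duration(pieces))
```

Silences that only partly overlap the retained span are clipped to it, so only the portion inside the kept video is sped up.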
Matching up transcripts turns out to be not that hard: I normalise the text, split it, and then compare it to the sequence of normalised words from the timestamped transcript. I find the longest common sequence, keep that, then recurse on the before/after sections (there's a little more detail, but not much). I also sent the transcription to ffmpeg to burn in as captions, because the editing sometimes makes the audio choppy and the captions make it easier to follow.
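A rough sketch of that matching step, under some assumptions of mine rather than taken from the actual script: "longest common sequence" is read as the longest contiguous run of matching words, the recursion happens on the edited text either side of the match while searching the whole transcript each time (so reordered passages can still be found), and the timestamped words arrive as hypothetical (word, start, end) tuples. It leans on `difflib.SequenceMatcher.find_longest_match` from the standard library.

```python
import re
from difflib import SequenceMatcher


def normalise(word):
    """Lowercase and drop everything except letters, digits and apostrophes."""
    return re.sub(r"[^a-z0-9']", "", word.lower())


def match_runs(edited_words, source_words):
    """Return [(source_start, source_end), ...] index runs in edited order."""
    sm = SequenceMatcher(None, edited_words, source_words, autojunk=False)

    def recurse(lo, hi):
        if lo >= hi:
            return []
        # Longest contiguous run shared by edited[lo:hi] and the whole source.
        m = sm.find_longest_match(lo, hi, 0, len(source_words))
        if m.size == 0:
            return []  # this stretch of edited text matches nothing
        return (recurse(lo, m.a)
                + [(m.b, m.b + m.size)]
                + recurse(m.a + m.size, hi))

    return recurse(0, len(edited_words))


def runs_to_cuts(runs, timed_words):
    """Turn word-index runs into (start_time, end_time) cuts."""
    return [(timed_words[s][1], timed_words[e - 1][2]) for s, e in runs]


# Hypothetical timestamped transcript: (word, start, end) in seconds.
timed = [("Hello", 0.0, 0.4), ("world,", 0.5, 0.9), ("this", 1.0, 1.2),
         ("is", 1.3, 1.4), ("a", 1.5, 1.6), ("test.", 1.7, 2.1)]
source = [normalise(w) for w, _, _ in timed]
# Drop edited tokens that normalise to nothing (stray punctuation).
edited = [n for n in map(normalise, "This is a test. Hello world".split()) if n]

runs = match_runs(edited, source)
print(runs_to_cuts(runs, timed))
```

Each cut can then be handed to ffmpeg in the order returned. Words typed into the edited text that never appear in the transcript simply produce no cut.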
I know, tools have been doing this for years now. I just didn't have one to hand, and now I do, and I couldn't have done this without whisper.
Honestly, the capabilities of whisper are insane, and the fact that it's free and open source is really a gift. Some of the things it can do feel almost sci-fi.
If you ever decide to release it publicly please let me know, sounds like a very useful tool.
https://gist.github.com/bazzargh/e1d2e2718af575a03206114a291...
So I am installing it through the link you provided, which directed me to an "install success" page saying "your purchase is successful" even though your app is free. Another obstacle to adoption :-)
Last, the page did not tell me the app's size. Seeing what it does and the time it takes to download, I am afraid it could be huge? Third obstacle :-)
As for discoverability / the "your purchase is successful" message, I'm not sure what else I can do, I've set it to free, no ads etc in Google Play. Maybe I need to hit a few more keywords for transcription so it surfaces it more.
App info shows 218MB size, which I suppose is about what I'd expect for a model+app code :shrug:
I love the "free forever, no ads..." part, but it obscures what the app is for. Maybe start with "Speech to text transcription" to make it clearer.
Either way, that's just semantics. Great job
This way one can listen to the recording again, and correct such issues.
Do you have an idea about supporting languages other than English?
The average model and upwards should support all languages from the whisper models by default.
I haven't tested them all so I'm unsure of the quality, however it should in theory support the following:
---
Albanian
Amharic
Arabic
Armenian
Assamese
Azerbaijani
Bashkir
Basque
Belarusian
Bengali
Bosnian
Breton
Bulgarian
Cantonese
Catalan
Chinese
Croatian
Czech
Danish
Dutch
English
Estonian
Faroese
Finnish
French
Galician
Georgian
German
Greek
Gujarati
Haitian creole
Hausa
Hawaiian
Hebrew
Hindi
Hungarian
Icelandic
Indonesian
Italian
Japanese
Javanese
Kannada
Kazakh
Khmer
Korean
Lao
Latin
Latvian
Lingala
Lithuanian
Luxembourgish
Macedonian
Malagasy
Malay
Malayalam
Maltese
Maori
Marathi
Mongolian
Myanmar
Nepali
Norwegian
Nynorsk
Occitan
Pashto
Persian
Polish
Portuguese
Punjabi
Romanian
Russian
Sanskrit
Serbian
Shona
Sindhi
Sinhala
Slovak
Slovenian
Somali
Spanish
Sundanese
Swahili
Swedish
Tagalog
Tajik
Tamil
Tatar
Telugu
Thai
Tibetan
Turkish
Turkmen
Ukrainian
Urdu
Uzbek
Vietnamese
Welsh
Yiddish
Yoruba
---
Apologies for the formatting, not sure how to make it look nice in the comment.
A new bugfix update for the "Translate to English" toggle (which was functionally always set to on) should be available soon, it's just awaiting Play Store approval.
I have been using the iOS built-in speechTranscriber and it is... not great. I was gonna use a whisper API, but running it on device would be amazing if it isn't too heavy.
Yes, everything is done locally, stick your phone in airplane mode if you want to be sure!
I'm not an Apple fan, but I have to say I've been testing it on an iPhone 15 and my god, the performance is insanely good, I was seriously blown away. I haven't dug into how much it impacts battery, but the transcription literally takes seconds for a minute of audio so it's not holding up your device.
The iOS version is built and ready to go, there's just some bug with my Apple account and it won't let me pay the £80 fee to sign up (support ticket raised and waiting). As soon as that's sorted it'll be out on the App Store for free as well.
Very surprised to hear the built-in transcription is not great, anything specifically bad about it? The hardware is there.
You can download the desktop version from here (https://blazingbanana.com/apps/whistle/) if you want, still very much a WIP.
I have added the auto-copy to clipboard functionality that will come with the next Android release and be included in all others. Adding a hotkey / quickbar button is on the roadmap for the desktop versions.
If you want to give the Linux version a shot, you can download it from here - https://downloads.formait.app/whistle/linux/WhistleDesktop-l... - I've just stuck it in the same R2 bucket as another app, as I've not sorted the proper pipeline out yet.
I've been focused on getting functional parity across all OSes since the Android release. This is very close to being done, and I just need to reach the milestone of it being available on all platforms before I move forward.
Hopefully you will take another look when the next update is out.
I did a complete overhaul of the pipeline so that it splits and processes at the end, and this seems to have sorted it. I'm thinking about doing each transcription segment as it's coming in (with a bit of a buffer / overlap to keep context), much like the live transcription does, but for now performance is ok. Something I'll keep in mind once I've crossed some other things off the list.
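For anyone curious what "a bit of a buffer / overlap" could look like, here is a minimal sketch of cutting audio into overlapping windows. It is not the app's code: the function name and the numbers in the usage line are illustrative assumptions (16 kHz audio and 30-second windows are what whisper itself works with, but the 2-second overlap is arbitrary).

```python
def overlapping_windows(total_samples, window, overlap):
    """Yield (start, end) sample ranges that together cover the recording.

    Consecutive ranges share `overlap` samples so that words cut at a
    boundary are heard whole in at least one window.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap
    start = 0
    while start < total_samples:
        end = min(start + window, total_samples)
        yield start, end
        if end == total_samples:
            break  # the final window reached the end of the audio
        start += step


# Illustrative numbers only: 70 s of 16 kHz audio, 30 s windows, 2 s overlap.
RATE = 16_000
for start, end in overlapping_windows(70 * RATE, 30 * RATE, 2 * RATE):
    print(start / RATE, end / RATE)
```

The text produced for the overlapping region would still need de-duplicating when the segments are stitched back together, which this sketch leaves out.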
We have a similar product in the construction space. Would love to talk to you about some of our challenges and possibly work together. Interested?
I believe you have to make the source code public (please correct me if I'm wrong). I'm more than happy to do so, I've used a whole bunch of open source stuff to build the app so it only seems fair, I just need to make it a bit less messy and something I don't mind being public.
If I talk in German, the text gets translated to English. Didn't expect that.
There was a bug causing the "Translate to English" toggle to always be enabled. The fix should make this work correctly and transcribe in your native language.
Will be in the next update (in a day or two).