by bogtog 56 days ago

by Marsymars 56 days ago

> I'm not a particularly slow typer. I can go 70-90 WPM on a typing test. However, this speed drops quickly when I need to also think about what I'm saying. Typing that fast is also kinda tiring, whereas talking/thinking at 100-120 WPM feels comfortable.

This doesn't feel relatable at all to me. If my writing speed is bottlenecked by thinking about what I'm writing, and my talking speed is significantly faster, that just means I've removed the bottleneck by not thinking about what I'm saying.

by eucyclos 55 days ago

It's often better to segregate creative and inhibitive systems even if you need the inhibitive systems to produce a finished work. There's a (probably apocryphal) conversation between George RR Martin and Stephen King that goes something like:

GRRM: How do you write so many books?... Don't you ever spend hours staring at the page, agonizing over which of two words to use, and asking 'am I actually any good at this?'

SK: Of course! But not when I'm writing.

by theshrike79 53 days ago

It's not fully apocryphal, there's video of it: https://www.youtube.com/watch?v=xR7XMkjDGw0 - not those exact words, but the gist of it is there.

(Full video here: https://www.youtube.com/watch?v=v_PBqSPNTfg )

by bogtog 56 days ago

That's fair. I sometimes find myself pausing or just talking in circles as I'm deciding what I want. I think when I'm speaking, I feel freer to use less precise/formal descriptions, but the model can still correctly interpret the technical meaning.

In either case, different strokes for different folks, and what ultimately matters is whether you get good results. I think the upside is high, so I broadly suggest people try it out.

by hexaga 56 days ago

Alternatively: some people are just better at / more comfortable thinking in auditory mode than visual mode & vice versa.

In principle I don't see why they should have different amounts of thought. That'd be bounded by how much time it takes to produce the message, I think. Typing permits backtracking via editing, but speaking permits 'semantic backtracking' which isn't equivalent but definitely can do similar things. Language is powerful.

And importantly, to backtrack in visual media I tend to need to re-saccade through the text with physical eye motions, whereas with audio my brain just has an internal buffer I can access at the speed of thought.

Typed messages might have higher _density_ of thought per token, though how valuable is that really, in LLM contexts? There are diminishing returns on how perfect you can get a prompt.

Also, audio permits a higher bandwidth mode: one can scan and speak at the same time.

by mattmanser 54 days ago

It's kind of the point. If you start writing it, you'll start correcting it and moving things around and adding context and fiddling and more and more.

And your 5 minute prompt just turned into 1/2 hour of typing.

With voice you get on with it, and then start iterating, getting Claude to plan with you.

Not been impressed with agentic coding myself so far, but I did notice that using voice works a lot better imo, keeping me focused on getting on with letting the agent do the work.

I've also found it good for stopping me doing the same thing in Slack messages. I ramble my general essay to ChatGPT/Claude and get them to summarize and rewrite it as a few lines in my own voice. Stops me spending an hour crafting a Slack message and tends to soften it.

by buu700 56 days ago

I prefer writing myself, but I could see the appeal of producing a first draft of a prompt by dumping a verbal stream of consciousness into ChatGPT. That might actually be kind of fun to try while going on a walk or something.

by theshrike79 53 days ago

You can feed all that into Claude and have a prototype ready by the time you get home.

The Claude app version works from your phone and has a virtual environment it can use to write code and push it to a GitHub repo :)

by buu700 53 days ago

That's definitely cool too. I was just suggesting an intermediary text prompt step as a compromise between 100% writing and 100% voice. So instead of getting home to actual code, you'd get home to a draft of relatively detailed requirements to review and revise before incurring the cost of throwing a coding agent at it.

by dyauspitr 56 days ago

I don’t feel restricted by my typing speed; speaking is just so much easier and more convenient. The vast majority of my ChatGPT usage is on my phone, and that makes s2t a no-brainer.

by cjflog 55 days ago

100% this, I built laboratory.love almost entirely with my voice and (now-outdated) Claude models

My go-to prompt finisher, which I have mapped to a hotkey due to frequent use, is "Before writing any code, first analyze the problem and requirements and identify any ambiguities, contradictions, or issues. Ask me to clarify any questions you have, and then we'll proceed to writing the code"

by Applejinx 56 days ago

It's an AI. You might do better by phrasing it, 'Make a plan, and have questions'. There's nobody there, but if it's specifically directed to 'have questions' you might find they are good questions! Why are you asking, if you figure it'd be better to get questions? Just say to have questions, and it will.

It's like a reasoning model. Don't ask, prompt 'and here is where you come up with apropos questions' and you shall have them, possibly even in a useful way.

by dominotw 56 days ago

Surprised AI companies are not making this workflow possible instead of leaving it up to users to figure out how to get voice text into a prompt.

by alwillis 56 days ago

> Surprised AI companies are not making this workflow possible instead of leaving it up to users to figure out how to get voice text into a prompt.

Claude on macOS and iOS has native voice-to-text transcription. I haven't tried it, but since you can access Claude Code from the apps now, I wonder if you can use the Claude app's transcription for input into Claude Code.

by bogtog 56 days ago

> Claude on macOS and iOS has native voice-to-text transcription

Yeah, Claude/ChatGPT/Gemini all offer this, although Gemini's is basically unusable because it will immediately send the message if you stop talking for a few seconds.

I imagine you totally could use the app transcript and paste it in, but keeping the friction to an absolute minimum (e.g., just needing to press one hotkey) feels nice.

by dyauspitr 56 days ago

All the mobile apps make this very easy.

by johnfn 56 days ago

That's a fun idea. How do you get the transcript into Claude Code (or whatever you use)? What transcription service do you use?

by hn_throw2025 56 days ago

I'm not the person you're replying to, but I use Whispering connected to the whisper-large-v3-turbo model on Groq.

It's incredibly cheap and works reliably for me.

I have got it to paste my voice transcriptions into Chrome (Gemini, Claude, ChatGPT) as well as Cursor.

https://github.com/EpicenterHQ/epicenter

by rgbrgb 56 days ago

I use Handy with Claude Code. Nice to just have a key combo to transcribe into whatever has focus.

https://github.com/cjpais/Handy

by bonniesimon 55 days ago

Love Handy. I use it too when dealing with LLMs. The other day I asked ChatGPT to generate interview questions based on a job description, and then I answered using Handy. So cool!

by quinncom 56 days ago

I use Spokenly with local Parakeet 0.6B v3 model + Cerebras gpt-oss-120b for post-processing (cleaning up transcription errors and fixing technical mondegreens, e.g., `no JS` → `Node.js`). Almost imperceptible transcription and processing delay. Trigger transcription with right ⌥ key.
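
To make the post-processing step concrete, here is a minimal sketch of fixing technical mondegreens. Note that it swaps the LLM cleanup described above for a much simpler fixed glossary lookup, so it only catches phrases you have listed in advance. The glossary entries and the function name are hypothetical, chosen for illustration.

```python
import re

# Hypothetical glossary: misheard phrase -> intended technical term.
GLOSSARY = {
    "no js": "Node.js",
    "pie torch": "PyTorch",
    "jason": "JSON",
}


def fix_mondegreens(text: str, glossary: dict[str, str] = GLOSSARY) -> str:
    """Replace whole-phrase, case-insensitive matches with the intended term."""
    # Try longer phrases first so they win over shorter overlapping ones.
    for heard in sorted(glossary, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(heard) + r"\b", re.IGNORECASE)
        # A function replacement keeps the term literal (no backslash escapes).
        text = pattern.sub(lambda match, term=glossary[heard]: term, text)
    return text


print(fix_mondegreens("Set up a no JS server that returns Jason"))
# Set up a Node.js server that returns JSON
```

A lookup like this is fast and predictable but cannot use context, which is presumably why an LLM pass is attractive for the general case.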

by ctoth 56 days ago

According to Google this is the first time the phrase "technical mondegreens" was ever used. I really like it.

by hurturue 56 days ago

Your OS might have a built-in dictation thing. Google for that and try it before online services.

by thehours 54 days ago

I use Raycast + Whisper Dictation. I don't think there is anything novel about it, but it integrates nicely into my workflow.

My main gripe is when the recording window loses focus, I haven't found a way to bring it back and continue the recorded session. So occasionally I have to start from scratch, which is particularly annoying if it happens during a long-winded brain dump.

by primaprashant 55 days ago

I built my own open-source tool to do exactly this, so that I can run something like `claude $(hns)` in my terminal and start speaking. After I'm done, Claude receives the transcript and starts working. See this workflow here: https://hns-cli.dev/docs/drive-coding-agents/
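
The `claude $(hns)` pattern relies on the transcriber printing its transcript to stdout so the shell can substitute it into the agent's arguments. A rough Python equivalent, as a sketch only: the helper names are hypothetical, and it assumes the transcriber writes plain text to stdout. A stand-in command replaces `hns` so the example runs without a microphone.

```python
import subprocess
import sys


def capture_transcript(transcriber_cmd: list[str]) -> str:
    """Run a command that prints a transcript to stdout and return the text."""
    result = subprocess.run(
        transcriber_cmd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def agent_command(agent: str, transcript: str) -> list[str]:
    """Build an argv list; the transcript stays one argument, spaces and all."""
    return [agent, transcript]


# Stand-in transcriber so the sketch runs anywhere; swap in ["hns"] for real use.
fake_transcriber = [sys.executable, "-c", "print('fix the login bug')"]
command = agent_command("claude", capture_transcript(fake_transcriber))
print(command)
# To actually launch the agent: subprocess.run(command, check=True)
```

Passing the transcript as a single list element has the same effect as quoting the substitution in the shell (`claude "$(hns)"`), which keeps the spoken text from being split into separate arguments.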

by bogtog 56 days ago

There are a few apps nowadays for voice transcription. I've used Wispr Flow and Superwhisper, and both seem good. You can map some hotkey (e.g., ctrl + windows) to start recording, then when you press it again to stop, the transcript gets pasted into whatever text box you have open.

Superwhisper offers some AI post-processing of the text (e.g., making nice bullets or fixing grammar), but this doesn't seem necessary and just makes things a bit slower.

by rpwverheij 54 days ago

+1 for Superwhisper. It has an offline model for transcription. And it transcribes with very high accuracy for me and great speed.

by elvin_d 56 days ago

Made this tool: double-press Ctrl to start and press Ctrl again to stop, which copies the transcript to the clipboard.

https://github.com/elv1n/para-speak/

by erichi 54 days ago

So cool, man! Had to add a couple of fixes to be able to use it on Mac. Love it!

by victorbjorklund 55 days ago

I do the same. On Mac I use MacWhisper. The transcription does not have to be correct. Lots of times it writes the wrong word when I'm talking about technical stuff, but Claude understands which word I mean from context.

by singhrac 55 days ago

I use VoiceInk (needed some patches to get it to compile but Claude figured it out) and the Parakeet V3 model. It’s really good!

by d4rkp4ttern 55 days ago

> if you talk in a winding way …

My regular workflow is to talk (I use VoiceInk for transcription) and then say “tell me what you understood” — this puts your words into a well structured format, and you can also make sure the cli-agent got it, and expressing it explicitly likely also helps it stay on track.

by listic 56 days ago

Thanks for the advice! Could you please share how you enabled voice transcription for your setup and what it actually is?

by binocarlos 56 days ago

I use https://github.com/braden-w/whispering with an OpenAI api key.

I use a keyboard shortcut to start and stop recording and it will put the transcription into the clipboard so I can paste into any app.

It's a huge productivity boost - OP is correct about not overthinking trying to be that coherent - the models are very good at knowing what you mean (Opus 4.5 with Claude Code in my case)

by abdullahkhalids 56 days ago

I just installed this app and it is very nice. The UX is very clean and whatever I say it transcribes it correctly. In fact I'm transcribing this comment with this app just now.

I am using Whisper Medium. The only problem I see is that at the end of the message it sometimes puts a bye or a thank you which is kind of annoying.
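
A stray "bye" or "thank you" at the end of a transcript can be trimmed before pasting. The following is a minimal sketch under assumptions: the phrase list and function name are hypothetical, and the rule is purely textual, so it would also trim a message that legitimately ends with one of these phrases.

```python
import re

# Assumed sign-offs to drop; extend with whatever your model tends to append.
TRAILING_PHRASES = ("thank you", "thanks for watching", "bye")


def strip_trailing_signoff(transcript: str, phrases=TRAILING_PHRASES) -> str:
    """Remove known sign-off phrases from the very end of a transcript."""
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = "|".join(re.escape(phrase) for phrase in ordered)
    # One or more sign-offs, each with optional punctuation, anchored at the end.
    pattern = re.compile(
        rf"(?:\s*\b(?:{alternatives})\b[\s.!,]*)+$", re.IGNORECASE
    )
    return pattern.sub("", transcript).rstrip()


print(strip_trailing_signoff("Refactor the parser to use a visitor. Thank you."))
# Refactor the parser to use a visitor.
```

The word-boundary check means "goodbye" inside a real sentence is left alone; only the standalone phrases at the end are removed.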

by listic 56 days ago

I am all ready to believe that with LLMs it's not worth it trying to be too coherent: I did successfully use LLMs to make sense of what incoherent-sounding people say. (in text)

by mattmanser 54 days ago

Aquavoice, YC company, really good. Got it after doing a bit of research on here, there's something for Mac that's supposed to be good too.

If you want local transcription, locally running models aren't quite good enough yet.

They use right-ctrl as their trigger. I've set mine to double tap and then I can talk with long pauses/thinking and it just keeps listening till I tap to finish.

by bogtog 56 days ago

I'm using Wispr Flow, but I've also tried Superwhisper. Both are fine. I have a convenient hotkey to start/end recording with one hand. Having it need just one hand is nice. I'm using this with the Claude Code VS Code extension in Cursor. If you go down this route, the Claude Code instance should be moved into a separate window outside your main editor, or else it'll flicker a lot.

by pzo 55 days ago

Another option is MacWhisper if you're on macOS and don't want to pay for a subscription (it's a one-time payment). Pretty much all of those apps these days use Parakeet from NVIDIA, which is the fastest and best open-source model that can run on edge devices.

Also, I haven't tried it, but in the latest macOS 26 Apple updated their STT models, so their built-in voice dictation may be good enough.

by kapnap 56 days ago

For me, on Mac, VoiceInk has been top notch. Got tired of Superwhisper.

by lukax 56 days ago

Spokenly on macOS with Soniox model.

by j45 56 days ago

Speech also uses a different part of the brain, and maybe less finger coordination.

by journal 56 days ago

Voice transcription feels silly when someone can hear you talking to something that isn't exactly human; imagine explaining that you were talking to an AI. But when it's more than one sentence, I use voice too.
