My go-to example of this is a talk about Visual Studio by Saqib Shaikh (a blind software engineer at Microsoft). Link is timestamped
I wish more people would watch videos like this just because having a realistic idea of how blind people do certain tasks can help you move from pity or even compassion to a more productive kind of understanding. I think sometimes when you haven't seen it, you can't really even imagine how it can be done.
What really frustrates me is watching/listening to discussion of music, because I am forced to listen to the talking at 1x because the music sounds wrong (and is wrong) at anything other than 1x.
Ideally it should be done while encoding.
Likewise, YouTube’s “premium” feature of not displaying ads is laughable when displaying content is literally an internal browser function.
I pay anyway, because I was going to pay for an on-demand streaming music service anyway.
Maybe it’s just a matter of practice.
It's not rare among the blind in general.
Unless you're completely technologically illiterate, the kind of person who has no idea how to install an app or sign up for an online account, you're probably doing something of the sort.
I'm not even sure what to say, but discoveries like this are why I use Hacker News. I'd never have known this otherwise.
I can easily understand Eloquence (the speech synthesizer he's using) at that speed, but I struggled a bit with this one.
Whenever I'm watching lectures / talks / podcasts, I tend to watch/listen to them at 2x to 2.5x speed.
I only need to lower it if someone flubs an important word in a definition; then I'll replay that part at 1x speed.
If the person is talking particularly slowly (usually for international audiences) I put the speed up to 3x to 4x speed so it sounds like normal 2x to 2.5x speed.
---
My youtube muscle memory:
(standard video controls used by every video editor ever)
J = back 10s
K = play/pause
L = forwards 10s
(youtube specific controls)
Shift F = toggle fullscreen
Speed controls (this part is muscle memorised as fast as a password input):
1. Ctrl Shift K (Firefox; Cmd Option K on Mac): opens console
2. Up arrow: loads previous command, typically: document.querySelector('video').playbackRate = 2.5
3. Enter: runs command
You have to type in the command the first time. After that, to change the speed, change "2.5" to whatever number you want; console history will remember the change, so you can step through the different values with the up/down arrows before pressing Enter.
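For anyone who would rather not retype the line, the console trick above could be wrapped into a couple of helpers. This is only a sketch: `setSpeed`, `bumpSpeed`, and the 0.25 to 4 bounds are names and numbers made up for illustration (browsers differ in which rates they accept). The only real API involved is the standard `playbackRate` property on the video element.

```javascript
// Assumed, conservative bounds; not taken from any browser's documentation.
const MIN_RATE = 0.25;
const MAX_RATE = 4;

// Keep a requested rate inside the assumed bounds.
function clampRate(rate) {
  return Math.min(MAX_RATE, Math.max(MIN_RATE, rate));
}

// Set the speed on anything that exposes `playbackRate`
// (in a browser, an HTMLMediaElement such as a <video>).
function setSpeed(video, rate) {
  video.playbackRate = clampRate(rate);
  return video.playbackRate;
}

// Nudge the current speed up or down by `delta`.
// Rounding to two decimals avoids float drift after many nudges.
function bumpSpeed(video, delta) {
  const next = Math.round((video.playbackRate + delta) * 100) / 100;
  return setSpeed(video, next);
}

// In the browser console you would call, for example:
//   setSpeed(document.querySelector('video'), 2.5);
//   bumpSpeed(document.querySelector('video'), 0.25);
```

Pasting the definitions once per page load and then calling `setSpeed(...)` from console history keeps the same up-arrow, Enter rhythm.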
You have two modes: "focus mode", where you can edit text in text fields and keys are passed straight to the browser, and "browse mode", where keys move a virtual cursor around the page.
In browse mode, navigating with just arrow keys all the time would be just as slow as you might imagine, so you use single-key keyboard shortcuts to move by role, E.G. to the next heading, button, table or unvisited link.
The keyboard layout is optimized for memorizability rather than efficiency (you use the actual arrow keys instead of hjkl, for example), but the concepts are eerily similar.
There are a couple of other approaches to solving this problem (macOS's VoiceOver is much more Emacs-like, for example), and each approach has its own pros and cons, but that's definitely one way to do it.
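To make the browse-mode idea concrete, here is a toy sketch of "jump to the next element with a given role". Everything in it is hypothetical: the flat `nodes` list stands in for a real accessibility tree, and the key letters in `QUICK_NAV` are illustrative, not a claim about any particular screen reader's bindings.

```javascript
// Illustrative key-to-role table (assumed letters, not documented bindings).
const QUICK_NAV = { h: 'heading', b: 'button', t: 'table', u: 'unvisited-link' };

// `nodes` is a flat, document-order list of { role, name } objects.
// Returns the index of the next node with `role` after `cursor`, or -1 if none.
function nextByRole(nodes, cursor, role) {
  for (let i = cursor + 1; i < nodes.length; i++) {
    if (nodes[i].role === role) return i;
  }
  return -1;
}

// One keypress in browse mode: move the virtual cursor, or leave it where it is
// when the key is unmapped or nothing further down the page matches.
function handleKey(nodes, cursor, key) {
  const role = QUICK_NAV[key];
  if (!role) return cursor;
  const hit = nextByRole(nodes, cursor, role);
  return hit === -1 ? cursor : hit;
}
```

With a page of a title heading, a link, a button, and a section heading, one keypress for "heading" from the top skips the link and button entirely, which is the whole point compared with arrowing through every element.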
RIP kid https://youtu.be/fnH7AIwhpik
If he’d like your humor I like it too :dolphin:
We all do that, I mean unless you’re hearing impaired.
Everyone’s familiar with dropping a coin or such and knowing exactly where it landed without looking.
That’s more passive sonar though.
Do I recall seeing videos of guys mountain biking and making a hissing sound for an active sonar style echo location?
Or am I making that memory up?
Even better, fire up Orca (or whatever screen reader your OS comes with) yourself and try to use your computer with your eyes shut. It's kind of eye-opening (no pun intended) to see what kind of experience these users typically get. You also quickly start to understand why they set the speech rate of their voice synthesizer so fast: navigating applications (and particularly lists) is almost unbearable otherwise.
Unfortunately it seems impossible to get all that much funding for accessibility work :/ I wonder what ever happened to the Newton accessibility bus intended to supplement Wayland...
Hm, never heard about it, but now I'm wondering too. I just finished implementing proper accessibility support for my native app toolkit for Linux, macOS and Windows, but I've only done it for X11 so far; I was just about to get started with Wayland. What is the accessibility story on Wayland? Couldn't people rely on the same protocols as with X11? That was my impression, but I haven't really dug into it yet.
Thanks for considering a11y for your toolkit - it really makes a difference to those of us who are disabled. Are you implementing a11y separately for each platform? If you use AccessKit[1] you only have to implement it once for all platforms. I recently vibe coded accessibility for the Swell toolkit[2] used by Reaper. I have a branch using AccessKit and a branch implementing AT-SPI. AccessKit made things a lot easier and more performant.
Let me know if you would like a screen reader user to help with testing your toolkit.
[0] https://lwn.net/Articles/1025127/
[1] https://github.com/AccessKit/accesskit
[2] https://github.com/RDMurray/WDL/tree/accesskit
and my fork of accesskit with some features and fixes for unix: https://github.com/RDMurray/accesskit/tree/swell-fixes
There are apps I use semi-regularly that less-experienced screen reader users thought were inaccessible, and I couldn't even explain what they were doing wrong from memory. The ways of working around accessibility issues are just so ingrained in me that all I can usually remember is "yeah I did this somehow, but it was six months ago and I have absolutely no idea which specific tricks I needed for this one."
I imagine that for coding it also helps deal with the fundamental problem of an ephemeral stream rather than a persistent document that you can navigate visually in multiple dimensions. Working memory is limited, and getting more text in in a short period of time probably helps you work within that better. I also imagine that working with text via audio all the time gradually stretches and improves memory.
You can show a lot more info on a screen than you can transmit through speech in a short period of time. That doesn't mean you read faster than you listen, just that sighted people essentially use their eyeballs as an "input device" to decide what information to look at.
If there's an object on the screen that you want to examine but that you don't need to click, you can just "navigate to it" with your eyeballs, without ever touching a mouse or keyboard. We don't have that luxury.
This means we need a much more efficient system for navigating what's on the screen, but that only gets you so far. Eventually, the easiest way to deal with this problem is just to increase the bandwidth of your channel, and you do that by increasing the speech rate.
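A back-of-the-envelope version of the bandwidth argument, as a sketch: the 150 words-per-minute baseline is an assumption picked purely for illustration, and `secondsToSpeak` is a made-up helper name.

```javascript
// Assumed baseline speech rate for 1x; not a figure from this thread.
const BASE_WPM = 150;

// Seconds needed to speak `words` words at `multiplier` times the baseline rate.
function secondsToSpeak(words, multiplier, baseWpm = BASE_WPM) {
  return (words * 60) / (baseWpm * multiplier);
}

// A 300-word stretch of screen content under the assumed baseline:
//   secondsToSpeak(300, 1) -> 120 seconds
//   secondsToSpeak(300, 3) -> 40 seconds
//   secondsToSpeak(300, 6) -> 20 seconds
```

Whatever the real baseline is, the time spent waiting on speech shrinks in direct proportion to the rate, which is why the speech-rate setting is the knob that gets turned.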
Wouldn’t the opposite mean you listen at sub-1x speed?
Whereas your definition seems to be ”I’m the same, but less so.”
Sure, you can "learn" something from a Sports Podcast or a Comedy Podcast, but you could also say you are "learning" from a podcast which just reads out random numbers. You could "learn" that at 33 minutes, 11 seconds, the number 6 is read out, then 8, then 1, but I wouldn't call that learning, or at least it's pointless learning.
I'm not getting my hopes up though given apple's history with Siri, which is truly awful.
I don’t think Google's tech has anything to do with these features.
This would have had to be in the works long before the Google announcement. Also, these are enhancements of existing iOS and macOS features. They don’t require an LLM anyway; these features use Apple’s Machine Learning models.
For example, creating subtitles for videos? iOS 16 introduced Live Captions for FaceTime calls in 2022 [1].
[1]: https://www.apple.com/newsroom/2022/05/apple-previews-innova...
This has been the typical pattern for Apple for the last few years. The flashy features are announced at WWDC, accessibility has a dedicated, earlier press release. Before this practice, accessibility announcements would usually be tucked in some WWDC slide that most people wouldn't even notice.
IIRC, it's timed to land around Global Accessibility Awareness Day (May 21).
I just would not wanna promise anything. Except “available for download this Friday” once the gold master is passing tests.
"Coming later this year" means it's part of a publicly committed release — iOS 27, macOS 27, etc. — not vaporware.
The annual pre-WWDC accessibility announcement is a tradition, and with the conference less than a month away, expect more detail then. New a11y features have a good chance of appearing in the 10am PT keynote or the Platforms State of the Union, the developer-focused follow-up at 1pm PT.
That said, things are still fluid with three weeks to go — features can be added or pulled at any time. If something gets bumped from the main presentations, there will almost certainly be a dedicated video session covering it.
As for availability: some of these features will land in the iOS 27 and macOS 27 betas, which drop during WWDC for Apple Developer Program members. The public beta follows in July, and there's a free tier of the developer program if you want early access.
Don't expect everything at once, though. Some features won't arrive until the September release candidates — and even then, a few may ship labeled "beta" or "experimental," or hold for a future dot release.
Twenty years and text input & manipulation on iPhone still sucks a big fat hairy pair of dog's balls.
The last time I daily drove Android was over two years ago and it was immeasurably less God-damn-I-wanna-dig-Jobs-corpse-up-n-give-the-guy-a-piece-of-my-mind, only problem is his grave is unmarked. Arsehole!
After a few more years of Thanksgivings and Christmases and Mothers' Days, we'll finally train her up to a reasonable speed lmao.
Whether that control you see visually is actually accessible to a blind user is a different matter entirely. Further, it maxes out at 2x, but a blind person would typically screen read at the equivalent of 3-6x.
Related, it seems like YouTube recently paywalled speed increase beyond 2x. Another way in which it's not cheap to lose sight, I guess.
True.
We can frame it even more strongly: "default societal practices actively discriminate against people with disabilities; they intentionally, consciously choose to make life harder for people who're disadvantaged".
Seems like it would be a win-win to have a user setting to opt out of video in exchange for ungating that feature.
Pretty sure there are enough blind people who don't listen to voice at insane speeds, because they listen in their non-native second language or for whatever other reason. What's wrong with using a lowest common denominator that's 100% accessible to those people as well as to people who want faster speeds? Unlike "too fast", "too slow" doesn't become entirely inaccessible, it's just boring.
Such a random thing to criticize.
Some blind people listen to things at superhuman speeds, but not all blind people. Using a normal reading speed is a sensible choice for an ad trying to appeal to blind people since you don't want to intimidate those who don't use superhuman speeds.
Going from that to "heh a sighted person made this because it's normal speed" is simply incorrect.
It was the sort of statement an HNer might make to showcase some trivia they have about some other group, but they oversold it.
Yes, for lots of reasons. It takes practice to get up to a high speed with a given TTS. People who go blind later in life are just beginning, and it can take a long time for them to get up to really high speeds. You may also need to reset somewhat when you change from one TTS to another. And blind people's ears are subject to problems just like anyone else's; if your hearing isn't great you may need slower speeds or higher volumes or both. That's why even though most people use screenreaders at much higher speeds, the defaults when you turn on a new device are painfully slow. You have to set a conservative default so people with less experience/worse ears/whatever can get by.
Anyway I don't think it's a criticism. It's just noting that it doesn't depict how most people will end up using it, and if you're curious about what typical usage sounds like, you should look for another example.
It's like how in videos that teach people a foreign language, everyone speaks slowly and uses simple words, even though native speakers don't talk like that at all. The GP is simply saying that an actual blind person would be way more efficient at it, but they made the video with inefficient settings so sighted people could understand what was going on.
What does this mean?