Apple unveils new accessibility features

(www.apple.com)

625 points

by interpol_p 16 hours ago

by JeremyHerrman 12 hours ago

Apple loves to stealth test new tech in full public view by sneaking it into relatively mundane places, so debuting agentic AI via accessibility is very on brand.

A few other examples:

- The Touch Bar was much more than an OLED strip, it was Apple’s first move in the transition to Apple Silicon on Macs. The Apple T1 chip in the 2016 Touch Bar MacBooks was the first solely Apple-designed processor to appear in a Mac and took over several responsibilities from Intel chipsets, like power management, fans, sleep/wake, access to the camera & mic, and the secure enclave powering Touch ID. Then the T2 added encryption of the SSD, audio management, image processing for the camera, and prevented tampering with the boot process.

- The iPhone 3G shipped with a Liquidmetal SIM eject tool, which is made from a strong custom metal alloy which is "practically unbendable by hand unless you want to hurt or cut your fingers." Although Apple hasn’t released anything with the alloy since then, now nearly 20 years later Apple is rumored to be using liquid metal in their upcoming foldable iPhone.

- RealityKit had 3D scanning and a lot of other cool AR capabilities for years which didn’t make sense until the Apple Vision Pro was released.

by jorvi 6 hours ago

You're reading way too much into it. These are just failed attempts at commercializing something.

- People hated the touchbar. Only years later did it become liked, and only among tech enthusiasts who hacked and tweaked it to have much deeper functionality.

- Making the ejector out of an expensive alloy made no sense.

- RealityKit (and the Vision, which is also crashing and burning) is a solution looking for a problem.

- 3D touch had both discoverability and usability problems.

- etc etc.

by JeremyHerrman 6 hours ago

You're underestimating Apple's meticulous planning, which has only become more intense in the Cook era. Bad feature/UX or not, each one of those decisions was calculated.

Read this ars quote from 2010 [0]:

>Apple used the small part—one that is not integral to the device’s functionality—to see if the company was capable of producing a custom design to Apple’s specifications. Typically, manufacturers prefer to have at least two sources for parts, so that a supply problem from one supplier won’t halt manufacturing. Since Liquidmetal is only available from one source, Apple needed to make sure the company could deliver.

For Apple Silicon, there was no way they'd make the switch in one go, so they had to figure out a way to hedge that bet. That's what the TouchBar really was, with all its warts and solutions for problems nobody had.

And as someone else in this thread pointed out, the first custom cellular chip wasn't released with a flagship model - they exclusively paired it with the budget iPhone 16e.

Apple is always calculating and hedging.

[0]: https://arstechnica.com/gadgets/2010/08/apple-tested-liquidm...

by wetpaws 4 hours ago

[dead]

by bdamm 3 hours ago

You're misunderstanding how difficult it is to make major architectural changes to products the way Apple can. One of the ways to do it is to hide the architectural change as something else, something niche, and only when it has survived the fire of deployment there try to scale it up to the full market. It's actually quite genius, and you can expect more of it now that Apple's hardware guru is the chief.

I can't help but wonder if this agentic-via-accessibility angle is the result of this new leadership. If it is, it's a very good sign for Apple, because software, and especially the AI gap, is Apple's Achilles' heel right now.

by MBCook 5 hours ago

I liked the TouchBar. There were two problems with it:

1. It replaced the F keys. I suspect pros wouldn’t have complained so loudly if it didn’t. And it was too expensive for the cheaper computers where it may have been more popular.

2. They never changed it. Ok the first version wasn’t a big hit. Other than bringing back the escape key they never did anything. They sent it out to be a hit or to die and gave up there.

by wolvoleo 59 minutes ago

#1 bothered me the most. A lot.

And the stupid thing was that there was plenty of space for a row of function keys and the touch bar.

by sysworld 4 hours ago

no2 is what annoyed me the most. I liked it for the most part, but it was never updated, even on the software side we had very few changes. It could've been great.

by shagie 3 hours ago

Even though it's showing its age (and support is being phased out in recent builds), I still liked IntelliJ with the Touch Bar laptop.

https://www.jetbrains.com/help/idea/touch-bar-support.html

Having the Touch Bar screen show up the relevant buttons for the context I was in was really nice compared to trying to remember which F key was which debug option.

A "I wish..." would have been a $200 usb bar and hub that could sit right behind my keyboard for a desktop.

by MBCook 2 hours ago

That was one of my two primary uses!

The other was I made some Shortcuts that were very handy for me and set them up as buttons. It’s been over a year, I still miss them.

One would pop up a dialog I could type a Jira ticket number into and it would open it. I tried to do that with Salesforce but they’re insane so you can’t.

My favorite would open my next meeting. Know I have a meeting in 5 minutes? Hit the button and my browser would open the right Google Meet or Zoom and come to the front.

So useful.

The desktop problem is a real one too. It was great… as long as you only use your laptop as a laptop or your keyboard. Use anything else and you lose it.

iMac? No. Mac Pro? No. Mac Mini? Don’t be stupid. No.

MBP only.
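
For anyone wanting to recreate the Jira button without Shortcuts, here is a minimal Python sketch. It assumes tickets live under the usual `/browse/<KEY>` path. The `example.atlassian.net` host, the key pattern, and the function names are all placeholders, not anything from the setup described above.

```python
import re
import webbrowser

# Hypothetical Jira site; replace with your own instance's base URL.
JIRA_BASE_URL = "https://example.atlassian.net"

# Assumed key shape: project prefix, a dash, then digits (e.g. "PROJ-123").
TICKET_KEY = re.compile(r"^[A-Za-z][A-Za-z0-9]*-\d+$")


def ticket_url(key: str, base_url: str = JIRA_BASE_URL) -> str:
    """Build the browse URL for a ticket key, rejecting malformed input."""
    key = key.strip().upper()
    if not TICKET_KEY.match(key):
        raise ValueError(f"not a ticket key: {key!r}")
    return f"{base_url}/browse/{key}"


def open_ticket(key: str) -> bool:
    """Open the ticket in the default browser."""
    return webbrowser.open(ticket_url(key))


if __name__ == "__main__":
    # Stand-in for the pop-up dialog: prompt on the terminal instead.
    open_ticket(input("Ticket: "))
```

Bind the script to a hotkey or launcher entry and it behaves roughly like the Touch Bar button did, minus the dialog.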

by Fr0styMatt88 5 hours ago

Even at the time I remember it was widely cited that the SIM eject tool was a test for their new manufacturing process.

by taeric 6 hours ago

Vision is hilarious as it is more than just a solution looking for a problem. It was also desperately avoiding the current market that exists for it. Anything but games, it seemed.

by socalgal2 1 hours ago

Even more strange given 60-70% of all app store revenue for Apple is games - see Epic vs Apple trial for data

by JV0011 hours ago|

Also their first own modem, shipped in their cheapest tier starting with the iPhone 16e.

by greggsy 3 minutes ago

Interesting - I knew they’d been trying to get off Qualcomm for years, but didn’t realise that they actually managed to do it.

by gnatolf 9 hours ago

'liquid metal' sounds cool. It's probably a metallic glass. I super dislike that it seemingly will be synonymous with the brand name by Apple even though that stuff has been around for decades.

Not that there are particularly many places where this is used - mostly because it really is just very expensive. In the awesome position that Apple is in, economic feasibility is so much easier to achieve, with like tens of millions of guaranteed parts to be produced.

by bayindirh 8 hours ago

It's not metallic glass. It's an injectable, super strong alloy. You can manufacture things like you're using injection molded plastic.

To be honest, British also has an injectable stainless steel, but its application domain is much more different.

by s0rce 8 hours ago

Are you sure? Liquid metal was the name of a bulk metallic glass. There were usb flash drives using it as a case https://en.wikipedia.org/wiki/Liquidmetal. Wikipedia lists apple licensing this technology.

Metal injection molding is also a thing but I haven't heard it called liquid metal. Usually it's MIM.

by shawn_w 2 hours ago

Shoot, the article even outright says

>Liquidmetal has also notably been used for making the SIM ejector tool of some iPhone 3Gs made by Apple Inc., shipped in the US.

by bayindirh 8 hours ago

Honestly, I didn't know that "amorphous metal alloy" is also called metallic glass. I computed it to something else entirely. So you're right on that front.

MIM is something else, that's right, but the properties of Liquid Metal allow it to be injection molded AFAIK.

MIM process is completely different from casting Liquid Metal. MIM generally starts as a powder and heated and molded, Liquid Metal can be just "melted and molded".

I have a stainless steel razor built with MIM. It has no resemblance to the SanDisk Titanium's feel (which I also have).

by s0rce 8 hours ago

glass is the general materials science term for an amorphous non-crystalline solid

by bayindirh 8 hours ago

TIL.

I did my Ph.D. by developing BEM evaluators for working on metals, but glasses (as in class of materials) were not in my domain, so I'm thick as a brick on that part of the materials science.

Edit: BEM methods is as fun as USB buses and PSU units.

by paul7986 11 hours ago

And their upcoming smart glasses are the best UX for almost everything they showed the user holding up her phone for.

I read that when their glasses take video or pics, the lens will light up and/or flash more prominently than Meta's. Maybe that will help the whole privacy issue, and also it's not Meta (I do love my Metas, or smart glasses as a whole, but will ditch Metas for Apple quickly, as both pairs of Metas broke and there's no store for support).

by jareds 11 hours ago

I also am looking forward to Apple smart glasses. I use Meta glasses because they're useful since I'm totally blind. I'd much rather have on-device recognition though if possible. I also trust Apple more than Meta. At least I'm technically inclined enough to realize I shouldn't wear them in the bathroom or bedroom. At least when humans look at my AI prompts they're mainly seeing food boxes or computer BIOS screens.

by bookofjoe 11 hours ago

Same for me! Both pairs of Metas are now inoperative because I can't get them back to factory reset. Once you pull that little blue tab out of them when they're new, it's GAME OVER.

by diseasedyak 10 hours ago

Dang, really? Why would that be? I mean, I believe you, I'm just shocked. I'm glad I read this, as I was thinking of getting some but not now.

by RobMurray 10 hours ago

There is a factory reset procedure, I think you hold the shutter button while switching them off and on again.

by paul7986 1 hours ago

Well, I do want to add more detail. I bought the first pair in Oct 2023 and used them up until March 2025, when a software update hosed them. Then in April 2025 I bought a new pair that lasted til the end of June 2025 because of water splashes. So I had two pairs of dumb sunglasses until March 2026, when my pair with the water damage came back alive.

Overall tho Meta doesn't make durable smart glasses and they only have two flagship stores for support, while Apple has tons of stores for tech support.

by hawaiianbrah 9 hours ago

I’ve had excellent luck with their warranty process.

by SmirkingRevenge 11 hours ago

Man I miss the touch bar. Never got why people hated it so much

by mjamesaustin 11 hours ago

The hate was because it replaced function keys many people use by tactile touch, without looking. Doing the same on a touch screen is very difficult.

If the bar had been added on top of those, I don't think there would've been the same kind of hate for it.

by landr0id 11 hours ago

I didn't really mind the fn keys being there. I rarely use function keys unless I'm RDP'd to a Windows machine.

What drove me crazy though was the escape key. They later added the physical escape key back but I think at that point it was a bit too late.

by SoftTalker 27 minutes ago

I haven't used function keys since I used mainframe applications on a 3270 terminal.

by phatskat 11 hours ago

I’ve always been a “remap capslock to escape” kind of guy (vim), so I didn’t mind much. Access to the brightness (screen and keyboard) and volume slider was neat but superfluous with the OG fn keys. Context-driven controls were probably the best thing about the touch bar, and I don’t think it got enough love to make that stick.

by tokenscoper 10 hours ago

Adding to the list of grievances, the functionality and the options it presented differed from app to app. I understand that the function keys also change their function app to app, but the visual noise the Touch Bar (wow - the word even gets autocorrected to have the right capitalization!) added as you switched between apps was too distracting.

by SmirkingRevenge 11 hours ago

Ah yea, I've only owned one with the physical escape key. That would be annoying.

by prepend 8 hours ago

Even without tactile elements it was two keys to use function keys.

I would have been fine with the touchbar if it just default displayed function keys. Hitting fn+f5 to quicksave is annoying.

by plufz 2 hours ago

But wasn’t that just a setting changing it to default to fn? It was some time since I last used them…

by 0x45711 hours ago|

1) First generation made ESC button a touch button. Aside from ergonomics (or lack of them), at least for me, on a psychological level "abort" button needs to be something you can smash. Also, macOS already had the worst input handling under load, making it virtual button made it worse.

2) While "happy path" on macOS pretty much never requires you to use Fkeys, but my workflow does. Blindly using touch buttons is harder than real buttons.

3) I'm not a huge media keys user, but I bet #2 applies here as well.

I liked the touchbar in every other sense. If it was just an addition to an existing keyboard, people wouldn't have hated it[];

[]: At that time it was hard to not be frustrated using mac (butterfly keyboard etc), so touchbar might have gotten more hate than it deserved because of overall frustration.

by Tagbert 11 hours ago

Also, the Touch Bar seemed to be abandoned as soon as it launched. It only ever launched on the Pro line. There were never any feature updates. They never made it flexible enough for people to customize it.

by phatskat 11 hours ago

> They never made it flexible enough for people to customize it.

I feel like it was fairly customizable - the Mac system settings let you do a lot of drag and drop of controls, and I recall iTerm having a similar interface for customizing the bar in its own settings.

I do think it should’ve been given a lot more love, but that’s Apple for ya I guess

by dd8601fn 10 hours ago

I think a bigger issue was that so few applications used it in cool, interesting ways. It has the same appeal as the oled button boxes some people have, except it’s right there on the deck… but nobody did anything with it.

I sure did prefer the media controls on it, though. I still have a 16” here and am reminded of what could have been.

by Fr0styMatt88 4 hours ago

I actually think it would have done well if it was just like those button boxes / Stream Deck / etc. Like a row of transparent function keys with screens, but then that would have been a flexibility tradeoff.

by Barbing 9 hours ago

Touchbar users, check BetterTouchTool for tons of options

by sneak 10 hours ago

It was extremely flexible in customization options, and there were SDKs to make it do additional cool stuff in apps, but nobody really cared for the most part.

I honestly think it was mostly a "we have a custom secure coprocessor now, what can we do with it?" sort of thing, which also worked out for Touch ID and disk encryption.

by donkyrf 9 hours ago

I still have a personal Touch Bar MBP, and I find it annoying.

My problem is that I lightly rest my hands on the keyboard (including the f keys), and this habit is harmless on most Macs, but inadvertently activates the Touch Bar functions.

I actually like the idea a lot, and would probably love it if it required a little more pressure to activate.

by cloverich 11 hours ago

One more on top of others. Many people felt it was a solution in search of a problem. As in, there was no problem i had that it solved. And it was forced on us, in place of something useful. From the start i read that as: This wont be here in a couple years. Which then made it annoying to deal with in the meantime (the hate).

Things that stick around, are generally value adding across a large or complete subset of their users. Touch bar was always niche, and thus always doomed. I think a good counter comparison is Apple VR headsets. For me, i have no use and little interest. But i can see them as a hedge at the very least, or as an enthusiast entrant into an emerging market, where future products in that segment may become interesting. And on top, it doesnt impact me - i can ignore their existence until it becomes useful.

If touch bar were launched like VR, i suspect it would have gotten a similar level of dismissals, but less hate.

by rjrjrjrj 9 hours ago

I didn't like it, and was happy when they got rid of it. But I didn't hate it.

I did hate the butterfly keyboard that was introduced at the same time. Probably Apple's biggest hardware mistake of the past 15 years or so.

by codebje 2 hours ago

I am fortunate enough to have both a butterfly keyboard MBP and a touch bar MBP. Obviously the butterfly keyboard has the known issues, but the touch bar MBP also has the very common issues with that hardware.

I can replace the butterfly keycaps myself. It's something like $10 from aliexpress for a full set of keycaps and clips and a minute's work to pop the busted one and replace it. Annoying, but not fatal.

The touch bar needs a full battery, keyboard, track pad, and upper case replacement to fix. I just have to live with that thing flickering brightly at me every day, or spend AU$500+ to get it fixed.

IMO the touch bar is the bigger mistake.

by Melatonic 8 hours ago

Was "LiquidMetal" anything more than a good aluminum alloy ?

by s0rce 8 hours ago

Yes, it was an amorphous metal alloy. I knew people from grad school that worked for the company.

They have interesting properties: https://www.youtube.com/watch?v=_51frrQzCYM

by bayindirh 8 hours ago

Yes.

I have a Sandisk Titanium flash drive which is the first practical application of the alloy, shortly before Apple snapped it.

It feels solid, isn't wearing down, and is pretty robust for what it is. It doesn't get scratches like aluminum alloys.

It's entirely something else.

by Melatonic 8 hours ago

So it's Titanium? That's cool - what's the name of the flash drive and or do you have more info ? Always loved titanium stuff

by bayindirh 8 hours ago

Nope, it's called/branded "Titanium". The thing was built from Liquid Metal.

Image of the thing: https://www.bhphotovideo.com/cdn-cgi/image/fit=scale-down,wi...

by s0rce 8 hours ago

I think most of the commercial liquid metal like the sandisk drive were zirconium based.

by lern_too_spel 2 hours ago

There's nothing agentic about these features. Live captions have existed on other platforms for more than 6 years now, before "agentic" had even been coined.

by ibero 11 hours ago

[dead]

by brightbeige 14 hours ago

A while ago I signed up as a sighted person on Be My Eyes. I didn't get as many calls as I had hoped, but I was glad to help out on the few that I could. One call was to read envelopes of incoming mail, another was to read pill bottles, and then there were the two funny guys on big cozy chairs with shopping bags of cereal boxes who wanted to know what was what. I remember one guy really didn't like one type. The app had a unique feature for the sighted person to turn on the camera of the vision impaired person.

https://www.bemyeyes.com

by 3s 10 hours ago

I still have the Be My Eyes app installed but haven’t had a call in over a year - I think it’s a testament to how powerful AI vision models have become. I find it cool that AI works well enough for vision impaired people to solve all their problems.

However there was something very human and nice about helping out a stranger with a small random task from time to time. I fondly remember one older lady who spilled a box of blueberries on the kitchen floor and I helped her hunt them all down by guiding her around. It was 10 minutes of connection with a random person doing something fun, which I still remember fondly 4 years later.

by jareds 11 hours ago

Ever since Be My Eyes introduced their AI features it's my understanding that there's been a lot less need for volunteers. I'm totally blind, and started using the app once they added in AI. It works great for things like reading food labels after my kids have moved things around, determining if the tv has been left on, etc. I think I would use the volunteer feature if I still lived alone, but I don't.

by Angostura 11 hours ago

I haven’t had a single alert from it since the AI stuff rolled out

by rigrassm 2 hours ago

Sadly, judging by the info I read on the website now, it seems to be focused on using AI assistance.

Not going to lie, it deeply worries me how quickly society is just accepting offloading what used to be meaningful human interactions to AI. I hate to imagine what a society looks like once the people in it solely rely on AI for navigating life.

Sorry for the bleak reply. I was genuinely excited to read about the service as you described it and would love to hear that the human assistance side of it still works even if the website only showcases the AI.

by kdheiwns 58 minutes ago

I see it less bleakly and in a more hopeful manner. For people who signed up to help, I guess it might've felt like a social thing. But I imagine most of the people who relied on the services were looking less for the social aspect and wanted more independence in life. There are probably loads of blind people who are very thankful that they're increasingly able to go outside and live fully functional lives thanks to having a phone and some AI stuff to let them do whatever they want without feeling like they're depending on someone for assistance.

I'm someone who absolutely hates generative AI, but AI that assists people in living happy lives outside is something I'm completely in favor of.

by latexr 11 hours ago

> I didn't get as many calls as I had hoped

They always had significantly more people willing to help than in need of help. Which is good, not going to knock that. I signed up for it many years ago but didn’t get a single call, so eventually I just uninstalled it.

by nonethewiser 12 hours ago

Who called you? Blind people?

by arijun 12 hours ago

That’s the app. Or vision-impaired, at least.

by nonethewiser 12 hours ago

But in practice it could just be anyone who's outsourcing vision, right? It's basically a free manual vision service.

by arijun 11 hours ago

There would have to be a person pretending to be blind on the other side of the phone call, recording your answers. At that point is it not just easier to have that person do the labeling themselves?

by Faaak 12 hours ago

In practice, it's also to help vision impaired people

by agmater 9 hours ago

> Thank you for joining the Be My Eyes community! Be My Eyes connects blind and low-vision users with volunteers who can provide visual interpretation of smartphone images, to companies that provide them with services or employ them, and to artificial intelligence (“AI”) tools that analyze and describe images submitted by our users.

I thought you were being cynical, but yeah the ToS basically says that as well.

by foobiekr 11 hours ago

Are you implying that it might be exploited by unethical tech bros looking to build data sets?

by Tagbert 11 hours ago

It is a very manual process and would probably be an inefficient way to collect that data.

by postalcoder 14 hours ago

One thing Apple really needs to get right is speech to text transcription. They've nailed accessibility in so many ways and yet it feels like they're a decade behind on properly transcribing voices. At least half a decade.

Input on the iPhone is so dreadful nowadays. Their palm rejection is definitely worse than before, so mistyping is more frequent. Their text-correction algorithm for typing is worse than before, and it frequently makes incorrect corrections to words that I don't notice, because they change words a few words back from where I typed. And STT hasn't improved. On top of that, my fingers are tired of the phone form factor. Please make the iphone not a chore to use, apple.

by terabytest 14 hours ago

Wispr Flow is a masterclass in STT. Apple's solution feels like it's from the last century in comparison. Same applies with Apple's TTS when you have ElevenLabs and OpenAI running laps around it. All I need is for my iPhone to do those things natively at the same quality level (because in Apple's walled garden that's the only way to get them usable everywhere).

by jjice 13 hours ago

But Apple's uses so few system resources and runs fully on device on newer iPhone models (16+ I believe). It's so efficient. I really enjoy using Handy with Parakeet as the model, but the system resource usage is a monster compared to Apple's (although very good).

Looks like Wispr Flow uses a cloud model [0]:

> Cloud based speech processing infrastructure for 1B users

It gets to be a messy comparison because my iPhone can do STT with no latency pretty well fully on device, but Wispr Flow requires a cloud model, but to be fair, older Apple devices do as well. It's not an apples and oranges comparison, but I think those technical details make this a non direct comparison in a few ways.

For on-device with low system resource usage, Apple's is pretty damn good.

[0] https://wisprflow.ai/post/technical-challenges

by RobMurray 11 hours ago

Apple's STT has been on-device for a long time now, long before iPhone 16. I haven't noticed any improvements since my first ever iPhone 5S. I'm pretty sure Wispr Flow can use on-device models. I use Voiceink[0] which can use Parakeet models on-device and can optionally use cloud models. It's like night and day comparing Apple's to Voiceink. The only advantage I find to Apple's STT is less friction. 3rd party apps just can't integrate as smoothly with the system. There's a gesture to activate Apple dictation when VoiceOver is on.

by georgel 9 hours ago

It's been around and available as an API to devs since at least 2021 in iOS. The problem was even on the best iPhone at that time, I could never get it past ~0.8x speed and after 15-20 minutes the device would heat up so much the display dimmed.

For context, I was working on a podcast app with on-device transcription, had to park that idea for years before it got to today's performance.
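
To put the ~0.8x figure in perspective: anything below 1x means transcription takes longer than the audio itself. A quick back-of-the-envelope sketch, assuming the speed stays constant for the whole file (the function name is made up for illustration):

```python
def transcription_minutes(audio_minutes: float, speed: float) -> float:
    """Wall-clock minutes needed to transcribe audio at a given speed.

    speed is the multiple of real time the recognizer sustains:
    1.0 keeps pace with playback, 0.8 falls behind it.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return audio_minutes / speed


if __name__ == "__main__":
    # A 60-minute episode at 0.8x takes about 75 minutes to transcribe.
    print(round(transcription_minutes(60, 0.8), 1))
```

So a one-hour podcast would tie up the phone for well over an hour, which lines up with the heat and display dimming kicking in after 15-20 minutes.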

by arijun 12 hours ago

Apple runs on-device on older models, too, just wimpier models.

by Invictus0 11 hours ago

human resources (my voice and time) are far more valuable than the system resources. going to the cloud is absolutely worth it to prevent a typo

by rhdunn 11 hours ago

That doesn't work if you have limited or no connectivity (e.g. on a mountain range). There are also privacy concerns, e.g. a doctor using it to transcribe medical information.

by adamcharnock 13 hours ago

FWIW - I also really like Wispr Flow, but I moved to running the 'Whisper Large' model locally using Handy (https://github.com/cjpais/Handy), which has been essentially as good, while also having lower latency.

by dceddia 11 hours ago

Handy is great. It exposes a bunch of open models beyond Whisper too, and though I haven’t tried too many of them, I’ll throw in a rec for the Parakeet model which feels pretty much on par with Whisper for accuracy and is way way faster.

by primaprashant 9 hours ago

I’d say STT is pretty much a solved problem. Every day there is a new product, and one can be one-shotted by any current top-of-the-line LLM. Take a look at this [1]. Apple is just stuck in the past.

[1]: https://github.com/primaprashant/awesome-voice-typing

by hedora 11 hours ago

Until siri can reliably handle "Navigate to <business that is a decade old>", offline and using pre-downloaded maps, I'm going to assume all the other, harder speech to text and conversational stuff is just vaporware.

I found another dreadful iPhone input "feature" yesterday. If you are browsing around in third party carplay apps, and ready to tap your selection, but instead press the accelerator first, it truncates the list to only a few items, and scrolls to the top.

Way to reduce driving distractions guys! What's next? If the car is moving, maps changes destinations?

I really wish human computer interaction research were more broadly applied, and if you do dumb stuff like all of the automotive / carplay world, then you'd be liable in court.

I once had a car that hid the backup cam behind a legal disclaimer every time you turned it on. I'm sure at least one pedestrian was hit by a car in reverse while that screen was on. The manufacturer should be 100% liable for the poor UI decision.

by skygazer 9 hours ago

I think their intent is actually safety. They employ two touch interaction models: Flexible while not moving and simplified while driving. For instance, keyboard input becomes unavailable while moving and you must rely on Siri. I personally find it irritating, particularly when I am a passenger, but I get it.

by gabeio 8 hours ago

> Until siri can reliably handle "Navigate to <business that is a decade old>", offline and using pre-downloaded maps

Yeah, that's unfortunate considering you can have it do nearly all of that (download maps, navigate to business all while offline), except asking siri to do it for you.

> I once had a car that hid the backup cam behind a legal disclaimer every time you turned it on.

My car pops up a dialog telling me (in a paragraph+) to pay attention while in semi-autopilot which I have to click "ok" on to get back to the map. It's very ironic, and extremely dangerous.

by wolvoleo 58 minutes ago

Yeah they could just take the open whisper model which really is great especially if you use the larger parameter versions. I love it.

by twoWhlsGud 11 hours ago

I don't think things have improved much on that front since Colin Hughes gave a run down on Voice Control's problems several years ago

https://www.theregister.com/on-prem/2023/08/16/those-who-rel...

Would be great if they could at least fix two major bugs:

* input simply fails (seemingly) randomly where it is supported and many apps from major vendors don't support dictation input at all (e.g. OneNote) (there should at least be a fallback (a la Dragon Dictate from decades ago) for those cases)

* capitalization is still random leaving you with many errors to correct

but Apple mostly seems to see accessibility as something to use to enable performative press releases not actual functionality...

reply

upvote

by phillco5 minutes ago|

[-]

I think the random capitalization problem has gotten much better in iOS 26, or one of its minor releases (I recall in the 26.0 beta it was still there). I would have sworn before they would never fix it…

The streaming dictation they also added in that release is also much appreciated although occasionally buggy.

reply

upvote

by leokennis7 hours ago|

[-]

All day every day my iPhone makes me feel like an idiot. I need to correct every other word I type (or at least what my iPhone thinks I typed). While correcting, autocorrect introduces new and even more baffling misspellings.

Sometimes it gets to “fever dream where you’re suddenly unable to successfully perform everyday tasks” levels of insanity.

And the worst part is: it used to be fine. I’d type more or less on full keyboard levels of speed and accuracy on my iPhone 4S.

reply

upvote

by jedberg4 hours ago|

[-]

One thing that helped me a lot to fix the iPhone keyboard was turning off slide to type. I learned that tip here on HN actually!

Open your Settings app. Tap on General. Scroll down and select Keyboard. Toggle off Slide to Type

reply

upvote

by divbzero11 hours ago|

[-]

It’d be amazing if speech-to-text could take into account context as well: Greek if I’m speaking Greek, Korean if I’m speaking Korean, or for (int i = 0; i < count; ++i) if I’m dictating code.

reply

upvote

by titzer11 hours ago|

[-]

Apple dictation on MacOS is actually pretty dang good. I've got it bound to a double-tap on fn and I use it pretty regularly.

reply

upvote

by Invictus011 hours ago|

[-]

try wisprflow and then tell us it's good

reply

upvote

by prepend8 hours ago|

[-]

Wisprflow is not $12/month better than ios.

I’d much rather have “cheap, dependable, and good enough” over oligarch pricing for what used to be a one time software purchase any day.

reply

upvote

by titzer11 hours ago|

[-]

I just installed this and already despise its pricing model. I trust this product approximately zero.

reply

upvote

by primaprashant9 hours ago|

[-]

Open-source STT apps are plenty and just as good. Pick one from this list:

https://github.com/primaprashant/awesome-voice-typing

reply

upvote

by wahnfrieden10 hours ago|

[-]

there are plenty of free alternatives using the same models

reply

upvote

by WorldPeas9 hours ago|

[-]

speaking of touch though they mustn't have touched the swipe-typing feature in a while because somehow it works even better than the keyboard for me most of the time! No nonsense words like "oul" instead of "oil" constantly.

reply

upvote

by throw0317201914 hours ago|

[-]

I use Aqua Voice because Apple STT is so frustrating.

reply

upvote

by prepend8 hours ago|

[-]

I turned off my iphone’s autocorrect because it made too many stealth errors. Now I notice all my mistypes.

I have a friend named Zi in my contacts. For some reason ios kept autocorrecting “I” to “Zi” and would do it too far back for me to notice.

What’s weird is how this is such a dumb bug that Apple usually irons out.

reply

upvote

by OrvalWintermute7 hours ago|

[-]

I want to echo the comments that you just made.

One of my primary methods of interacting with an iPhone is through speech and the state of Apple speech transcription is pretty horrible. It bothers me greatly.

I know some of the workarounds and things but it does feel like it’s in the Stone Age.

I don’t think it’s a microphone issue since iPhone microphones are fairly decent and I don’t think it’s a CPU issue either because Apple Silicon seems to be some of the best on the market. Which leaves us with the software…

Maybe they should put that cash hoard to good use and buy up some of these transcription companies or license their IP so we get truly high-quality transcription.

reply

upvote

by port1112 hours ago|

[-]

There’s so much complaining about their keyboard issues, and it’s really an infuriating part of the iOS experience. The phone being hard to grip/slippery doesn't help, no…

reply

upvote

by mohsen115 hours ago|

[-]

Fun fact: This video was made accessible to sighted people because no blind person would ever listen to voice at that speed. Honestly if you ever observe a blind person using computers you'd be impressed how they can listen to audio at unimaginable speeds.

reply

upvote

by asimovDev15 hours ago|

[-]

https://youtu.be/wKISPePFrIs?si=ahGfFp0U7-pTU9w6&t=43

my go-to example of this is this talk about Visual Studio by Saqib Shaikh (a blind software engineer at Microsoft). Link is timestamped

reply

upvote

by isityettime14 hours ago|

[-]

I think it takes quite a lot of practice to reach this speed. It's not rare among blind developers, but I think it still takes a lot of work to get there. Pretty impressive!

I wish more people would watch videos like this just because having a realistic idea of how blind people do certain tasks can help you move from pity or even compassion to a more productive kind of understanding. I think sometimes when you haven't seen it, you can't really even imagine how it can be done.

reply

upvote

by Aboutplants14 hours ago|

[-]

I listen to a lot of podcasts and listen at 1.5-2.0 speed and it’s to the point that I literally cannot stand listening to 1.0 speed anymore as they go too slowly (depending on the content of course).

reply

upvote

by simondotau13 hours ago|

[-]

Same. Returning to 1x speed makes people sound (to my 2x-abused ears) drunk and slurring their words. If I want to listen to something slowly and carefully, I will just about tolerate 1.25x.

What really frustrates me is watching/listening to discussion of music, because I am forced to listen to the talking at 1x because the music sounds wrong (and is wrong) at anything other than 1x.

reply

upvote

by kevin_thibedeau13 hours ago|

[-]

The funny thing is that slow talkers sound normal at 2x speed. It's jarring when you hear their actual speech.

reply

upvote

by bonoboTP3 hours ago|

[-]

And then there is Dwarkesh who sounds like 2x on 1x.

reply

upvote

by ghaff12 hours ago|

[-]

I listen to podcasts at 1x. But there are a few people I've done podcasts with that I do various audio tricks to speed up.

reply

upvote

by BurningFrog13 hours ago|

[-]

Playing music at 1x should be a pretty simple feature to add to those apps.

Ideally it should be done while encoding.

reply

upvote

by ebiester13 hours ago|

[-]

I'm so glad YouTube and other podcast players have moved to support 3.0 speed. As I get comfortable with one, I move it up some. For things like sports and "did you know" content, I can go 2.5 if I'm not multitasking. For technical content, sometimes I'm stuck at 1.0.

reply

upvote

by kevin_thibedeau13 hours ago|

[-]

You can get browser extensions to do it for all media controls on any site. YouTube's "Premium" for 3x is laughable when it's an internal browser function.

reply

upvote

by webstrand12 hours ago|

[-]

Another fun thing is if you use an extension you can fast-forward through the advertisements too. For some channels I use around 3.5x playback speed.

reply

upvote

by gregoryl10 hours ago|

[-]

Ublock origin blocks the ads entirely on Firefox.

reply

upvote

by satvikpendem9 hours ago|

[-]

They're talking about in video sponsor ads, and those can be skipped using SponsorBlock or similar.

reply

upvote

by thrownthatway12 hours ago|

[-]

That’s an amusing observation.

Likewise, YouTube’s “premium” feature of not displaying ads is laughable when displaying content is literally an internal browser function.

I pay anyway, because I was going to pay for an on-demand streaming music service anyway.

reply

upvote

by LoganDark11 hours ago|

[-]

Premium is for up to 4x, not just 3x

reply

upvote

by bonoboTP3 hours ago|

[-]

Me too, but often it's more out of FOMO and a feeling of there being so much other stuff to listen to. But in truth it's either way just a fraction you can listen to from immense amounts of valuable stuff. On more reflection, I find that listening on 1x to something allows more thinking from my end, questioning the truth of what the speaker says, thinking about tangents or similar things I've heard elsewhere, pondering stuff etc. Just like reading a book fast isn't the best strategy. Sometimes you want to look up and just think about what you just read, etc.

reply

upvote

by michaelbuckbee13 hours ago|

[-]

Something that the Overcast podcast player does (and probably others) is silence removal, which in some ways is even better than the raw speedup.
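For anyone curious what silence removal means in practice, here is a minimal sketch of the general idea. It is not Overcast's actual algorithm, which isn't described here. `trimSilence`, `threshold` and `maxQuietRun` are made-up names and tuning knobs, and the input is assumed to be mono PCM samples in the range -1 to 1.

```javascript
// Sketch only: shortens long quiet stretches in mono PCM samples in [-1, 1].
// `threshold` and `maxQuietRun` are hypothetical tuning knobs, not real settings
// from any podcast player.
function trimSilence(samples, threshold = 0.01, maxQuietRun = 4) {
  const out = [];
  let quietRun = 0; // length of the current run of quiet samples
  for (const s of samples) {
    quietRun = Math.abs(s) < threshold ? quietRun + 1 : 0;
    // Keep every loud sample, and only the first `maxQuietRun` samples of a pause.
    if (quietRun <= maxQuietRun) out.push(s);
  }
  return out;
}
```

Unlike a raw speedup, this leaves the speech itself at its original pace and only shortens the gaps, which is probably why it can feel less tiring than 2x.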

reply

upvote

by runjake12 hours ago|

[-]

I am jealous. I can't listen and retain most podcasts at more than 1.0x. I even disable the podcast player functionality that eliminates pauses and silent sections.

reply

upvote

by bilater11 hours ago|

[-]

Same haha. But for me 1.5x is the sweet spot. Anything more and I find myself rewinding a lot. I want to feel comfortable absorbing info and not on constant alert.

reply

upvote

by lowercased12 hours ago|

[-]

I do the opposite in a few. There's some I follow weekly and it's only an hour or so each. I drop it to .7 or .8 because I want to get a bit more time with the hosts. Possibly stupid but I've sort of got used to some of these folks at that speed, and the normal speed is 'weird'. One is a political podcast, and when they play clips of Trump, he does always sound very drunk, but the hosts themselves (to me) don't sound drunk, just... measured. Some of it may be audio quality - I'm getting their microphone directly, often the audio clips are from field recordings.

reply

upvote

by thrownthatway12 hours ago|

[-]

Except Marc Andreessen, I can’t decode his speech at 2x

Maybe it’s just a matter of practice.

reply

upvote

by miki12321113 hours ago|

[-]

> It's not rare among blind developers

It's not rare among the blind in general.

Unless you're completely technologically illiterate, the kind of person who has no idea how to install an app or sign up for an online account, you're probably doing something of the sort.

reply

upvote

by gostsamo14 hours ago|

[-]

If you are dedicated, a few weeks to a few months of usage with regular ramp up. You should be careful with adjusting which symbols are read though, and sometimes the programming language matters because different symbols have different significance for understanding the code.

reply

upvote

by dijit14 hours ago|

[-]

Ho-ly cow. That is very impressive.

I'm not even sure what to say, but discoveries like this are why I use hackernews, I'd never have known this otherwise.

reply

upvote

by miki12321113 hours ago|

[-]

To be fair, the acoustics of the room that talk was given in are... not too great, to put it mildly.

I can easily understand Eloquence (the speech synthesizer he's using) at that speed, but I struggled a bit with this one.

reply

upvote

by spartanatreyu3 hours ago|

[-]

1x is too slow for me.

Whenever I'm watching lectures / talks / podcasts, I tend to watch/listen to them at 2x to 2.5x speed.

I only need to lower it if someone flubs an important word in a definition, I'll replay that part at 1x speed.

If the person is talking particularly slowly (usually for international audiences) I put the speed up to 3x to 4x speed so it sounds like normal 2x to 2.5x speed.

---

My youtube muscle memory:

(standard video controls used by every video editor ever)

J = back 5s

K = play/pause

L = forwards 5s

(youtube specific controls)

Shift F = toggle fullscreen

Speed controls (this part is muscle memorised as fast as a password input):

1. Cmd/Ctrl Shift K: opens console

2. Up arrow: loads previous command, typically: document.querySelector('video').playbackRate = 2.5

3. Enter: runs command

You have to type in the command for the first time, after that to change the speed, change "2.5" to whatever number you want and console history will remember the change so you can go through the different values with up/down arrows before pressing enter.
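For anyone who would rather not retype the one-liner, here is a minimal sketch of the same idea as a reusable function. `setPlaybackRate` is a hypothetical helper name, not something YouTube or the browser provides. The only real API it relies on is the standard `playbackRate` property on media elements, and it assumes the page's media lives in plain `<video>`/`<audio>` elements reachable through `querySelectorAll`.

```javascript
// Hypothetical helper (not part of YouTube or any browser API): sets the
// playback rate on every <video>/<audio> element found in `doc`.
function setPlaybackRate(doc, rate) {
  if (!Number.isFinite(rate) || rate <= 0) {
    throw new RangeError("rate must be a positive, finite number");
  }
  const media = Array.from(doc.querySelectorAll("video, audio"));
  for (const el of media) {
    el.playbackRate = rate; // standard HTMLMediaElement property
  }
  return media.length; // how many elements were updated
}

// In a browser console: setPlaybackRate(document, 2.5)
```

Paste it into the console once, then call `setPlaybackRate(document, 2.5)` with whatever number you want. Browsers may impose their own limits on extreme rates, so very high values might be rejected.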

reply

upvote

by peab12 hours ago|

[-]

Woah, this is really cool to see

reply

upvote

by throwatdem1231114 hours ago|

[-]

I did IT for a community center way back in the day and the director was blind. I was blown away by how fast his screen reader read things out to him - completely incomprehensible to me - and his efficiency with keyboard shortcuts would put even vim/emacs elitists to shame.

reply

upvote

by miki12321113 hours ago|

[-]

The way (Windows) screen readers handle web navigation is basically Vim in disguise.

You have two modes: "focus mode", where you can edit text in text fields and keys are passed straight to the browser, and "browse mode", where keys move a virtual cursor around the page.

In browse mode, navigating with just arrow keys all the time would be just as slow as you might imagine, so you use single-key keyboard shortcuts to move by role, E.G. to the next heading, button, table or unvisited link.

The keyboard layout is optimized for memorizability and not efficiency, you use the actual arrow keys instead of hjkl for example, but the concepts are eerily similar.

There are a couple of other approaches to solve this problem, Mac OS's Voice Over is much more Emacs-like for example, and each approach has its own pros and cons, but that's definitely one way to do it.
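To make the two-mode idea concrete, here is a toy model of it. This is only a sketch: `VirtualCursor`, the key table and the role names are illustrative assumptions, not the actual bindings or internals of any real screen reader.

```javascript
// Illustrative model of browse mode vs. focus mode. The key bindings and role
// names are assumptions for this sketch, not a real screen reader's key table.
const QUICK_NAV = new Map([
  ["h", "heading"],
  ["b", "button"],
  ["t", "table"],
  ["u", "unvisited-link"],
]);

class VirtualCursor {
  constructor(elements) {
    this.elements = elements; // page elements in reading order: { role, text }
    this.index = -1;          // -1 means "before the first element"
    this.mode = "browse";
  }

  // Returns the text to speak, or null when the key is passed through to the page.
  handleKey(key) {
    if (key === "Escape") { this.mode = "browse"; return "browse mode"; }
    if (this.mode === "focus") return null; // keys go straight to the text field
    if (key === "Enter" && this.current()?.role === "textbox") {
      this.mode = "focus";
      return "focus mode";
    }
    if (key === "ArrowDown") return this.moveTo(this.index + 1);
    const role = QUICK_NAV.get(key);
    if (role) {
      // Jump to the next element with the requested role, skipping everything else.
      const next = this.elements.findIndex((el, i) => i > this.index && el.role === role);
      return next === -1 ? `no next ${role}` : this.moveTo(next);
    }
    return null;
  }

  current() { return this.elements[this.index]; }

  moveTo(i) {
    if (i < 0 || i >= this.elements.length) return "end of page";
    this.index = i;
    return `${this.elements[i].role}: ${this.elements[i].text}`;
  }
}
```

The Vim parallel shows up in `handleKey`: the same `h` keystroke either jumps to the next heading or gets typed into a text field, depending on which mode you are in.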

reply

upvote

by isityettime15 hours ago|

[-]

Probably because it's an advertisement, and super fast robot voices can feel extremely harsh and annoying. Even blind people who rely on them find them overstimulating sometimes, lol.

reply

upvote

by freedomben14 hours ago|

[-]

Indeed, and not just fast, but often heavily robotic (which many sighted people struggle to understand even at 1.5x). I remember reading about a blind person who learned how to do echo-location using sound, and it seemed like such a cool superpower, that one of these days I'm going to take the plunge and unplug my monitor and start learning how to really use the tools. I worked with a blind person a few years back who got almost double the battery life from his laptop as the rest of us by having the screen off all the time, so that alone would be a nice feature. I may never get to the epic level of echo-location, but if I get even half-way there it would be awesome. With a bonus of being able to actually QA a11y changes.

reply

upvote

by Barbing13 hours ago|

[-]

> blind person who learned how to do echo-location using sound

RIP kid https://youtu.be/fnH7AIwhpik

reply

upvote

by thrownthatway12 hours ago|

[-]

I’m not gonna watch that as I’d rather stick to my head-canon that he had an altercation with a dolphin.

reply

upvote

by Barbing12 hours ago|

[-]

:) IIRC that video would have been fully produced/published during his lifetime (but 100% would have to avoid the comments!)

If he’d like your humor I like it too :dolphin:

reply

upvote

by thrownthatway12 hours ago|

[-]

> echo-location

We all do that, I mean unless you’re hearing impaired.

Everyone’s familiar with dropping a coin or such and knowing exactly where it landed without looking.

That’s more passive sonar though.

Do I recall seeing videos of guys mountain biking and making a hissing sound for an active sonar style echo location?

Or am I making that memory up.

reply

upvote

by thrownthatway12 hours ago|

[-]

Twenty years ago I took a level 1 tech support call from a visually impaired guy and it took about 3.2 seconds to realise his condition was no impediment for using a computer because of the screen reader tech he was using.

reply

upvote

by embedding-shape15 hours ago|

[-]

> Honestly if you ever observe a blind person using computers you'd be impressed how they can listen to audio at unimaginable speeds.

Even better, fire up Orca (or whatever screenreader application your OS comes with) yourself and try to use your computer while shutting your eyes, kind of eye-opening (no pun intended) what kind of experience these sort of users typically get. And also, you quickly start to understand why they set the speech rate for their voice synthesizer to be so fast, it's almost unbearable navigating applications (and particularly lists) otherwise.

reply

upvote

by jchw14 hours ago|

[-]

When I was at Google, I'd periodically test our (internal-only) app with Chromevox with the display off. It's not that it sounded like it would be easy, but it really is a challenge, and I can only imagine the muscle memory built up over time of trying to work around accessibility bugs and strange behaviors.

Unfortunately it seems impossible to get all that much funding for accessibility work :/ I wonder what ever happened to the Newton accessibility bus intended to supplement Wayland...

reply

upvote

by kridsdale113 hours ago|

[-]

I’ve worked at Apple, Facebook and Google. Apple was the only one that made a11y bugs, and a face-to-face consultation with a blind developer to show you how your app sucked, mandatory before you could launch.

reply

upvote

by embedding-shape13 hours ago|

[-]

> I wonder what ever happened to the Newton accessibility bus intended to supplement Wayland...

Hm, never heard about it, but now I'm wondering too. I just finished implementing proper accessibility support for my native app toolkit for Linux, macOS and Windows, but I've only done it for X11 so far; I was just gonna get started with Wayland. What is the accessibility story on Wayland, couldn't people rely on the same protocols as with X11? That was my impression, but I haven't really dug into it yet.

reply

upvote

by RobMurray9 hours ago|

[-]

It's still AT-SPI for wayland, the main difference is how screen readers grab keyboard input events.[0] I don't think there is a big difference from a toolkit point of view. I don't personally have experience with Wayland because most blind people recommend Mate as being the most accessible desktop still.

Thanks for considering a11y for your toolkit - it really makes a difference to those of us who are disabled. Are you implementing a11y separately for each platform? If you use accesskit[1] you only have to implement it once for all platforms. I recently vibe coded accessibility for the Swell toolkit[2] used by Reaper. I have a branch using accesskit and a branch implementing at-spi. Accesskit made things a lot easier and more performant.

Let me know if you would like a screen reader user to help with testing your toolkit.

[0] https://lwn.net/Articles/1025127/

[1] https://github.com/AccessKit/accesskit

[2] https://github.com/RDMurray/WDL/tree/accesskit

and my fork of accesskit with some features and fixes for unix: https://github.com/RDMurray/accesskit/tree/swell-fixes

reply

upvote

by miki12321113 hours ago|

[-]

The muscle memory build-up is definitely real.

There are apps I use semi-regularly that less-experienced screen reader users thought were inaccessible, and I couldn't even explain what they were doing wrong from memory. The ways of working around accessibility issues are just so ingrained in me that all I can usually remember is "yeah I did this somehow, but it was six months ago and I have absolutely no idea which specific tricks I needed for this one."

reply

upvote

by seviu15 hours ago|

[-]

That time my Mac display broke and I had to log in taught me much about how important learning accessibility is even for non blind people.

reply

upvote

by isityettime14 hours ago|

[-]

> you quickly start to understand why they set the speech rate for their voice synthesizer to be so fast, it's almost unbearable navigating applications (and particularly lists) otherwise.

I imagine that for coding it also helps deal with the fundamental problem of an ephemeral stream rather than a persistent document that you can navigate visually in multiple dimensions. Working memory is limited, and getting more text in in a short period of time probably helps you work within that better. I also imagine that working with text via audio all the time gradually stretches and improves memory.

reply

upvote

by miki12321113 hours ago|

[-]

It's not the ephemeral stream that's the problem, it's the limited bandwidth.

You can show a lot more info on a screen than you can transmit through speech in a short period of time. That doesn't mean you read faster than you listen, just that sighted people essentially use their eyeballs as an "input device" to decide what information to look at.

If there's an object on the screen that you want to examine but that you don't need to click, you can just "navigate to it" with your eyeballs, without ever touching a mouse or keyboard. We don't have that luxury.

This means we need a much more efficient system for navigating what's on the screen, but that only gets you so far. Eventually, the easiest way to deal with this problem is just to increase the bandwidth of your channel, and you do that by increasing the speech rate.

reply

upvote

by satvikpendem13 hours ago|

[-]

I listen to a lot of podcasts and YouTube videos at 3x or 4x speed now, having slowly built up the skill over a few years. It's pretty nice now and saves time, and it's remarkable how well the human brain can adapt to such input.

reply

upvote

by bonoboTP3 hours ago|

[-]

I took a course in speed reading, learning to read straight down the middle of the page, and I was able to go through War and Peace in 20 minutes. It’s about Russia.

reply

upvote

by Rendello9 hours ago|

[-]

I watch most talks at 2x speed or 1.5x if it's a really technical topic. Bryan Cantrill excepted!

reply

upvote

by a01212 hours ago|

[-]

I’m the opposite, I can’t stand the fast speaking videos. But I also speed up 1.2x to 1.5x if the videos were too slow.

reply

upvote

by thrownthatway12 hours ago|

[-]

I’m struggling to understand your definition of opposite here.

Wouldn’t opposite mean you listen at sub 1x speed.

Whereas your definition seems to be “I’m the same, but less so.”

reply

upvote

by brador12 hours ago|

[-]

You recall nothing and you know it. You're just wasting time you could use for something useful or meaningful in your life. Kids call it "Anxiety cope" but I don't agree.

reply

upvote

by ragazzina12 hours ago|

[-]

Can you recall 3 lines of dialogue from the latest movie you watched?

reply

upvote

by brador11 hours ago|

[-]

If you're listening to a podcast at 3x you're trying to learn something. No one is trying to learn watching a movie.

reply

upvote

by HDBaseT2 hours ago|

[-]

Podcasts aren't entirely learning, a lot of podcasts are pure entertainment, such as comedy podcasts.

Sure you can "learn" something from a Sports Podcast or a Comedy Podcast, but you could also say you are "learning" from a podcast which just reads out random numbers. You could "learn" that at 33 minutes, 11 seconds, the number 6 is read out, then 8, then 1, but I wouldn't call that learning, or at least it's pointless learning.

reply

upvote

by satvikpendem10 hours ago|

[-]

Maybe you can't but I can recall whatever I need to.

reply

upvote

by thrownthatway12 hours ago|

[-]

[flagged]

reply

upvote

by dempedempe12 hours ago|

[-]

The difference is that the voice in the video is a natural, human voice. It's the robotic sounding voices that always pronounce the same letters the exact same way (mostly the Eloquence family of voices) that enable blind people to listen at superhuman speeds. You can't listen to a real voice that fast.

reply

upvote

by RobMurray14 hours ago|

[-]

I know plenty of blind people who have their voice speed unbearably slow and barely scratch the surface of what technology can do for them. I think an interface where you can tell your phone what to do in natural language will really help a lot of less technical people.

I'm not getting my hopes up though given apple's history with Siri, which is truly awful.

reply

upvote

by chipotle_coyote14 hours ago|

[-]

Apple's history with accessibility is, on the whole, pretty good. I strongly suspect that the "coming soon" part of this means "after we integrate Google Gemini models into the system," so I don't think you should use the current state of Siri as a yardstick. (I actually have decent luck with the current Siri, but I don't push it very much and have sort of adapted myself to its limitations; on the flip side, I have a lot of skepticism around LLMs, but they're really a quantum leap in natural language processing capability over what came before, and the use cases they're showing here seem to be right in the LLM wheelhouse -- with the asterisk of "you're still always going to have to check its work.")

reply

upvote

by alwillis10 hours ago|

[-]

> I strongly suspect that the "coming soon" part of this means "after we integrate Google Gemini models into the system…"

I don’t think Google's tech has anything to do with these features.

This would have had to be in the works long before the Google announcement. Also, these are enhancements of existing iOS and macOS features. They don’t require an LLM anyway; these features use Apple’s Machine Learning models.

For example, creating subtitles for videos? iOS 16 introduced Live Captions for FaceTime calls in 2022 [1].

[1]: https://www.apple.com/newsroom/2022/05/apple-previews-innova...

reply

upvote

by miki12321113 hours ago|

[-]

Coming soon very likely means iOS 27.

This has been the typical pattern for Apple for the last few years. The flashy features are announced at WWDC, accessibility has a dedicated, earlier press release. Before this practice, accessibility announcements would usually be tucked in some WWDC slide that most people wouldn't even notice.

reply

upvote

by duskwuff10 hours ago|

[-]

> accessibility has a dedicated, earlier press release

IIRC, it's timed to land around Global Accessibility Awareness Day (May 21).

https://accessibility.day/

reply

upvote

by Barbing13 hours ago|

[-]

The thing that disappointed me about this amazing announcement was “coming later this year“. They should probably give us dates for a little while at least until we get the (<)$95 checks.

I just would not wanna promise anything. Except “available for download this Friday“ once the gold master is passing tests.

reply

upvote

by alwillis11 hours ago|

[-]

The "coming later this year" language is disappointing to some people, but that's just Apple propriety. Allow me to explain.

"Coming later this year" means it's part of a publicly committed release — iOS 27, macOS 27, etc. — not vaporware.

The annual pre-WWDC accessibility announcement is a tradition, and with the conference less than a month away, expect more detail then. New a11y features have a good chance of appearing in the 10am PT keynote or the Platforms State of the Union, the developer-focused follow-up at 1pm PT.

That said, things are still fluid with three weeks to go — features can be added or pulled at any time. If something gets bumped from the main presentations, there will almost certainly be a dedicated video session covering it.

As for availability: some of these features will land in the iOS 27 and macOS 27 betas, which drop during WWDC for Apple Developer Program members. The public beta follows in July, and there's a free tier of the developer program if you want early access.

Don't expect everything at once, though. Some features won't arrive until the September release candidates — and even then, a few may ship labeled "beta" or "experimental," or hold for a future dot release.

reply

upvote

by thrownthatway12 hours ago|

[-]

Being able-bodied and sighted is probably the biggest disadvantage for using iOS.

Twenty years and text input & manipulation on iPhone sucks a big fat hair pair of dogs balls still.

The last time I daily drove Android was over two years ago and it was immeasurably less God-damn-I-wanna-dig-Jobs-corpse-up-n-give-the-guy-a-piece-of-my-mind, only problem is his grave is unmarked. Arsehole!

reply

upvote

by isityettime14 hours ago|

[-]

Whenever my sister (blind) and I (visually impaired) visit my mom (blind) we secretly turn up the reading speed on her TV just a little because we can't stand how unbearably slow she keeps it, but if we turn it up quickly, she'll freak out.

After a few more years of Thanksgivings and Christmases and Mothers' Days, we'll finally train her up to a reasonable speed lmao.

reply

upvote

by kridsdale113 hours ago|

[-]

This is heartwarming. The audio equivalent to the practice of sighted people fixing the bad default settings on boomers’ televisions each Thanksgiving.

reply

upvote

by ShinyLeftPad15 hours ago|

[-]

Blind people can't change video speed? The control is available right there.

reply

upvote

by kochb14 hours ago|

[-]

Yes, the audio speed can be adjusted.

Whether that control you see visually is actually accessible to a blind user is a different matter entirely. Further, it maxes out at 2x, but a blind person would typically screen read at the equivalent of 3-6x.

reply

upvote

by ShinyLeftPad14 hours ago|

[-]

Huh, 2x is low even sometimes for sighted people.

Related, it seems like YouTube recently paywalled speed increase beyond 2x. Another way in which it's not cheap to lose sight, I guess.

reply

upvote

by the_other14 hours ago|

[-]

> Another way in which it's not cheap to lose sight, I guess.

True.

We can frame it even more strongly: "default societal practices actively discriminate against people with disabilities; they intentionally, consciously choose to make life harder for people who're disadvantaged".

reply

upvote

by thrownthatway12 hours ago|

[-]

[flagged]

reply

upvote

by entrope13 hours ago|

[-]

> Another way in which it's not cheap to lose sight, I guess.

Seems like it would be a win-win to have a user setting to opt out of video in exchange for ungating that feature.

reply

upvote

by jofzar15 hours ago|

[-]

No they are saying that the audio playing for tts would be at like 2.4x what's in the commercial.

reply

upvote

by ShinyLeftPad14 hours ago|

[-]

I don't get it. The speed of TTS can be adjusted, right?

Pretty sure there's enough blind people who don't listen to voice at insane speeds, because they listen in their non-native second language or for whatever other reason. What's wrong in using lowest common denominator that's 100% accessible to those people as well as people who want faster speeds? Unlike "too fast", "too slow" doesn't get entirely inaccessible, it's just boring.

Such a random reason to criticize for.

reply

upvote

by superchink14 hours ago|

[-]

I don’t think it’s meant to be criticism. It’s an interesting piece of information that gives a peek into how those with vision impairment consume content. There’s nothing wrong with it; but it was enlightening to consider the experience for those of us who have not been forced to.

reply

upvote

by ShinyLeftPad14 hours ago|

[-]

Seems like I brought my own negativity into this...

reply

upvote

by hombre_fatal13 hours ago|

[-]

I don't think you did.

Some blind people listen to things at superhuman speeds, but not all blind people. Using a normal reading speed is a sensible choice for an ad trying to appeal to blind people since you don't want to intimidate those who don't use superhuman speeds.

Going from that to "heh a sighted person made this because it's normal speed" is simply incorrect.

It was the sort of statement an HNer might make to showcase some trivia they have about some other group, but they oversold it.

reply

upvote

by isityettime14 hours ago|

[-]

> Pretty sure there's enough blind people who don't listen to voice at insane speeds, because they listen in their non-native second language or for whatever other reason.

Yes, for lots of reasons. It takes practice to get up to a high speed with a given TTS. People who go blind later in life are just beginning, and it can take a long time for them to get up to really high speeds. You may also need to reset somewhat when you change from one TTS to another. And blind people's ears are subject to problems just like anyone else's; if your hearing isn't great you may need slower speeds or higher volumes or both. That's why even though most people use screenreaders at much higher speeds, the defaults when you turn on a new device are painfully slow. You have to set a conservative default so people with less experience/worse ears/whatever can get by.

Anyway I don't think it's a criticism. It's just noting that it doesn't depict how most people will end up using it, and if you're curious about what typical usage sounds like, you should look for another example.

reply

upvote

by stavros14 hours ago|

[-]

No. It's not criticism. What they're saying is that the video was shot with a default that a sighted person could understand, because any blind person would naturally have their speed set to much higher than that.

It's like how in videos that teach people a foreign language, everyone speaks slowly and uses simple words, even though native speakers don't talk like that at all. The GP is simply saying that an actual blind person would be way more efficient at it, but they made the video with inefficient settings so sighted people could understand what was going on.

reply

upvote

by UltraSane12 hours ago|

[-]

I briefly worked at a call center and I would hear supervisors listening to recorded calls at warp speed.

reply

upvote

by thrownthatway12 hours ago|

[-]

> boiler call center

What does this mean?

reply

upvote

by js28 hours ago|

[-]

https://en.wikipedia.org/wiki/Boiler_room_(business)

reply

upvote

by bitwize14 hours ago|

[-]

I've heard textual description tracks on television programs before. They come fast, but not screen-reader fast. To the untrained ear a blind person's screen reader sounds like when you somehow get the TI-99/4A's speech synthesizer to read from invalid memory.

reply

upvote

by isityettime14 hours ago|

[-]

The audio description tracks are a different genre from what screenreaders perform. They're acting, by actors, carefully written and performed to fit into the gaps in the dialogue while preserving the mood and flow of the show. I think speeding them up or making them robotic would ruin them, while both of those traits are actually desirable for screenreaders.

reply

upvote

by RobMurray8 hours ago|

[-]

Ideally that is what AD should be like. Too often you set the volume right for a movie so the characters can be heard, then the AD is like an insanely boomy voice that shakes the room. Plus, for some reason they also turn the movie audio down, as if that would be necessary.

reply

upvote

by Barbing12 hours ago|

[-]

How did you come across those tracks? Never have myself.

reply

upvote

by bitwize8 hours ago|

[-]

My in-laws once misconfigured their television and it came blaring through.

reply

upvote

by Sweepi15 hours ago|

[-]

Don't you worry, as a sighted person I am also infuriated by Apple's slooow reading speed, e.g. for "Announce Notifications".

reply

upvote

by hightrix13 hours ago|

[-]

Also as a sighted person, this is why I hate the modern trend of using the video format to show 3-4 bullet points. Just give me the text.

reply

upvote

by nechuchelo15 hours ago|

[-]

This looks like a genuinely useful application of LLMs.

I wish more companies focused on how they can help humans instead of replacing us or squeezing us as hard as possible in the name of productivity.

reply

upvote

by c0wb0yc0d3r15 hours ago|

[-]

I think we should reserve judgment until this lands in the hands of the people it helps.

My experience is limited to my elderly parents who have trouble seeing. With the text size Apple allows them to set it to, their phones are unreadable. Text runs off the screen in every app, 1st and 3rd party.

In their bill example, the user is told to confirm with the provider. Why not offer to call the number on the bill? Instead of telling them to use text detection, do it for them? Presumably Apple Intelligence would already have that capability. I’m afraid this will be a gimmick at best.

EDIT: Forgot to mention, the grip is good to see. Hopefully they don’t charge the apple tax on it.

reply

upvote

by kps12 hours ago|

[-]

Yeah, I used to use iOS with text one step above the default size, and text was often cut off.

I have a problem with astigmatic halation that makes ‘dark mode’ difficult to read. Since iOS 26, multiple aspects of the system have been made dark only, contrary to the system setting. Writing text correctly should be the lowest of low-hanging fruit.

I suspect this is more of a flashy ‘AI’ promotion rather than reflective of any real commitment.

reply

upvote

by skydhash12 hours ago|

[-]

I had to set macOS to high contrast to be able to differentiate UI elements at a glance. But most Electron-based apps do not get the hint or even provide a high-contrast theme.

reply

upvote

by skydhash9 hours ago|

[-]

I checked Teams and it has one, but it’s a dark theme, which is a no-go given my astigmatism.

reply

upvote

by kakugawa7 hours ago|

[-]

It's prob why they chose a11y features. They have more pain, so they're willing to tolerate more growing pains. (And prob more motivated to provide feedback.)

reply

upvote

by tiffanyh13 hours ago|

[-]

This is what Apple does best.

They treat new industry advancements as technology, not as products in themselves.

AI will be a feature to improve the customer experience, not the product itself.

reply

upvote

by lern_too_spel13 hours ago|

[-]

These features have existed on Android devices for years. What Apple does best is marketing.

https://blog.google/products-and-platforms/platforms/android...

https://android-developers.googleblog.com/2024/09/talkback-u...

reply

upvote

by kube-system11 hours ago|

[-]

I think the above person was making a commentary about the things Apple chooses not to do. Apple strategy is often to be intentionally last to market, after the dust settles.

reply

upvote

by RobMurray8 hours ago|

[-]

Apple was first to market with Voiceover. Google took a very long time to come close to catching up.

reply

upvote

by lern_too_spel9 hours ago|

[-]

The dust settled on these accessibility features years ago. Why would Apple choose not to do these things? Live captions in particular is useful even for those who are not hard of hearing because it lets people watch uncaptioned videos in environments that are too noisy or that need to be quiet.

reply

upvote

by kube-system8 hours ago|

[-]

By "dust settled" I don't mean that the technology "exists" -- but rather that feature development has slowed down and most products have stabilized as feature complete and mature.

The on-device ML models that are being used by Google and Apple are both quite new and in active development.

Many of Apple's most successful products have shipped years or even a decade after their competitors. They have tried using first-mover advantage in the past but typically fail when using that strategy.

reply

upvote

by lern_too_spel2 hours ago|

[-]

They're in active development, but they already worked well at launch in English in 2019, serving enough customers to be very useful. I was using it myself.

reply

upvote

by kube-system36 minutes ago|

[-]

Yeah, Apple releases most of their products/features long after competitors have useful products/features of the same type. This really isn't any different.

reply

upvote

by bsanders34315 hours ago|

[-]

I agree. There seems to be a lot of potential in this space (from my outsider view). I really hope that this issue from an earlier article (https://news.ycombinator.com/item?id=48178378) doesn't become common enough to make useful functionality like this a danger. Seems unlikely in the short term but as use cases grow, so might the bad actors.

reply

upvote

by koolala15 hours ago|

[-]

It's with their servers, right? Do they trust an iPhone with their life? Or are they trusting their data center?

reply

upvote

by nechuchelo15 hours ago|

[-]

Looks like some of the features might use on-device models. They mention subtitle generation works on-device.

reply

upvote

by bilbo0s15 hours ago|

[-]

Let's be honest, compare the amount of money a corporation can make helping visually impaired people to the amount of money they can make replacing software developers and financial analysts.

Don't get me wrong, Apple using these technologies to help humans who are in need of help is laudable. But let's not pretend we don't know why most corporations don't look into this kind of thing. I think if we're being honest, we all very much know why they leave this sort of thing to the always nebulous "others".

reply

upvote

by JimDabell14 hours ago|

[-]

Tim Cook has been pretty clear where he stands:

> “When we work on making our devices accessible by the blind,” he said, “I don’t consider the bloody ROI.” It was the same thing for environmental issues, worker safety, and other areas that don’t have an immediate profit. The company does “a lot of things for reasons besides profit motive. We want to leave the world better than we found it.”

— https://www.forbes.com/sites/stevedenning/2014/03/07/why-tim...

reply

upvote

by bilbo0s14 hours ago|

[-]

Again, it's absolutely great that Apple does these things!

I was just answering the question of why other corporations don't.

Money.

There's relatively little money in helping the visually impaired. You have to do it because you want to do it. Not because you're going to get rich.

reply

upvote

by lern_too_spel13 hours ago|

[-]

Apple's competitors have had these features for years (Android for 7, Windows for 1), so it's really an indictment of Apple. They give lip service to helping the visually impaired, and this press release is good marketing for the non-visually impaired people who don't know this.

reply

upvote

by RobMurray8 hours ago|

[-]

Really? I haven't used Android recently, but I very much doubt 7-year-old Talkback was anywhere near as good as Voiceover. I also haven't seen a single accessibility improvement in Windows recently. The most accessible Windows apps are usually based on older toolkits like win32. Edge is very accessible, but 99% of that comes from Chrome.

reply

upvote

by lern_too_spel58 minutes ago|

[-]

https://blog.google/products-and-platforms/platforms/android... worked very well in English when it launched. I'm sure it supports more languages now.

It turns out Windows introduced this feature in 2022, not last year. https://www.elevenforum.com/t/turn-on-or-off-live-captions-i...

I see, you're interested in the screen reader improvement. Android added that in 2024. https://android-developers.googleblog.com/2024/09/talkback-u...

Windows added it in 2025. https://www.accessibility.org.au/narrator-update-brings-ai-d...

reply

upvote

by lotsofpulp14 hours ago|

[-]

>But let's not pretend we don't know why most corporations don't look into this kind of thing.

I assume almost everyone looks into spending less money than more money for equivalent goods and services.

reply

upvote

by nickpp11 hours ago|

[-]

> help humans instead of replacing us or squeezing us as hard as possible in the name of productivity

Increasing their productivity is helping humans.

reply

upvote

by jeffbee14 hours ago|

[-]

Aren't the LLM-based features of this announcement catch-up features? Describing the contents of the screen is something Gemini has been doing on Pixel phones for a while. It's a fairly obvious use case for a multimodal AI.

My one hope is that this eventually becomes widespread enough to stop alt text scolds.

reply

upvote

by micromacrofoot14 hours ago|

[-]

"looks like" there are a lot of automated accessibility systems that fall woefully short in practical use

this sort of thing really needs input from someone that uses it before we can judge it

reply

upvote

by Brajeshwar1 hours ago|

[-]

All of the brilliant video and voice-over was expected, but I love the final "The Apple Logo"; that is taking care of the back of the fence. With AI-this-&-AI-that, the human intuition to think of the unnoticeable, subtle differentiation will be the thing that makes you stand out from your cohort.

reply

upvote

by everforward12 hours ago|

[-]

Seems like everyone skipped over this part, but optical controls for motorized wheelchairs is a cool idea (at least to me, maybe that's an old idea).

Full VR hasn't done well, but it does continue to make me wonder if there's a market for a stripped and slimmed device. I'd maybe be interested in a device that does optical controls if it fit in regular-sized glasses. I'd be super interested if it had a HUD system (even a super basic one that can only show a handful of symbols). Better still if it had some basic audio, but maintaining the "regular glasses" form factor is more important to me than the HUD or audio.

reply

upvote

by willwade11 hours ago|

[-]

It's been done for a while - follow the links to who they reference, e.g. https://www.tolt.tech - but the integration they've done into the OS is what's interesting.

reply

upvote

by jaybeavers5 hours ago|

[-]

Hi Will, thanks for the representation. And an actual link!

reply

upvote

by hannahstrawbrry11 hours ago|

[-]

Seems like a pretty strong indicator that AR glasses are still being worked on, this definitely feels like one of those features Apple ships to refine before the proper hardware is ready.

reply

upvote

by everforward9 hours ago|

[-]

I lack that confidence here, because it doesn't appear to really involve AR/VR. Eye tracking is the only feature this really uses afaict. The AR seems like a net negative here, it's just the only device Apple has that has a consistent and remotely convenient view of your eyes.

The device is large and makes the user look weird and non-present, which are net negatives.

The only benefit of the AR is showing the directional arrows, but they could get the same thing with much less weird looking non-prescription glasses with arrows sharpied on them. More realistically, anyone really using for mobility probably develops muscle memory for which direction to look to go where and then they don't even need that. At that point it's just a really expensive, really clunky camera.

reply

upvote

by runeks15 hours ago|

[-]

> The total amount due on the bill is $83.89. Please verify this amount with your utility provider or by using Text Detection before making a payment.

1. Use AI to determine how much a bill is for

2. Call up the people who billed you and ask them how much they billed you

3. Pay billed amount

reply

upvote

by tramc15 hours ago|

[-]

It’s still useful to get the information instantly and verify it later. Arguably asking someone you trust to read the number for you might be a better idea than calling the company. Not everybody has that option though.

reply

upvote

by Someone15 hours ago|

[-]

And not everybody wants to use that option all the time. Asking a human makes you feel dependent more than using a tool does.

reply

upvote

by kotaKat14 hours ago|

[-]

Aaaand the logistics of making that call to the company to confirm the amount on the bill can get awkward. IVR and hold-time hell just to get a human to have to explain your predicament as to why you're asking for such a mundane piece of information that was in fifty other touchpoints that you couldn't access as quickly or easily.

(I'm also picturing the poor CSR at the other end of the phone wading through hundreds and hundreds of call logs over the years for simple requests and managers up above screaming 'why is this guy calling us all the damn time costing us money'...)

reply

upvote

by dewey13 hours ago|

[-]

Once you've paid the same bill for a few months you'll know roughly how much your phone bill will be and you won't have to do that. They obviously have to put that line in there, just like ChatGPT saying "Please verify everything we tell you" in the footer.

reply

upvote

by kube-system11 hours ago|

[-]

I presume that calling customer support is at least as frustrating for people with disabilities as it is for anyone.

reply

upvote

by martinflack12 hours ago|

[-]

It might be useful if it remembered a bill for, say, 60 days, and could also comment on percent difference since the last one. "The total amount due on the bill is $83.89 which is 4% higher than last month's bill from the same company."
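
The comparison suggested above comes down to a percent-change calculation. A minimal sketch, assuming a made-up previous total of $80.66 (only the $83.89 figure comes from the thread); `percentChange` and `describeBill` are hypothetical helper names, and the 60-day retention part is not modeled:

```typescript
// Hypothetical helper: percent change from the previous bill total to the current one.
function percentChange(previous: number, current: number): number {
  if (previous <= 0) {
    throw new RangeError("previous total must be positive");
  }
  return ((current - previous) / previous) * 100;
}

// Hypothetical helper: phrase the result roughly like the sentence suggested above.
function describeBill(previous: number, current: number): string {
  const change = percentChange(previous, current);
  const rounded = Math.round(Math.abs(change));
  const total = `The total amount due on the bill is $${current.toFixed(2)}`;
  if (rounded === 0) {
    return `${total}, about the same as last month's bill.`;
  }
  const direction = change > 0 ? "higher" : "lower";
  return `${total}, which is ${rounded}% ${direction} than last month's bill.`;
}

// 80.66 is an invented previous total, used only for illustration.
console.log(describeBill(80.66, 83.89));
```

Rounding to a whole percent keeps the spoken sentence short, which probably matters more for a screen reader user than decimal precision.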

reply

upvote

by stellamariesays14 hours ago|

[-]

[flagged]

reply

upvote

by jonnyasmar3 hours ago|

[-]

Different angle from the developer side: Apple's a11y API at the OS level is genuinely good. It's the WebKit-embedded-in-native gap that breaks. Shipped a Tauri app where Monaco editor lived inside WKWebView and found out the hard way that VoiceOver's `accessibilitySupport: auto` mode silently breaks backward text selections in Monaco — only setting it to "off" gave us correct selections. Which meant choosing between functional text selection or VoiceOver support, and the answer was selection.

Rock-solid in AppKit/UIKit. Falls over at the embedded-WebView seam where most modern desktop apps actually live.
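
For anyone wondering what that trade-off looks like in configuration, here is a rough sketch. `accessibilitySupport` is a Monaco editor option that takes "auto", "on" or "off"; the local types and the `editorOptionsFor` helper are hypothetical stand-ins so the snippet runs without the `monaco-editor` package or a DOM, and this is not the commenter's actual code:

```typescript
// Local stand-in for the one Monaco editor option discussed above.
type AccessibilitySupport = "auto" | "on" | "off";

interface EditorOptionsSketch {
  accessibilitySupport: AccessibilitySupport;
}

// Hypothetical helper: pick the setting depending on which behavior matters more.
// "off" trades screen reader optimizations for the selection behavior described above.
function editorOptionsFor(preferCorrectSelection: boolean): EditorOptionsSketch {
  return { accessibilitySupport: preferCorrectSelection ? "off" : "auto" };
}

const options = editorOptionsFor(true);
console.log(options.accessibilitySupport);

// In a real app the object would be passed to the editor (assumption, not run here):
// monaco.editor.create(container, { ...options });
```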

reply

upvote

by commandersaki1 hours ago|

[-]

Apple accessibility is the #1 reason why I'm in the Apple ecosystem. It does have its shortcomings, but it seems miles ahead of everything else in tech after having tried other ecosystems. It is hard to place a price on these features, but I imagine it is worth significantly more than the base cost of hardware.

reply

upvote

by zersiax14 hours ago|

[-]

Honestly as a blind person and blind developer myself, most of these features get a shrug at best. For one, there's already a bunch of third-party apps that do most if not all of this (Seeing AI, Envision AI, BeMyEyes, Aira, etc.). So at best, this does what all those apps are doing but faster and on-device, which may or may not mean it is also more inaccurate, we'll have to see. In the meantime, Mac OS's screen reader, VoiceOver, has been left to essentially exist in maintenance mode for years, where users have had to build, arguably impressive, third-party solutions to add features to the thing that comparable screen readers on Windows have had for a really long time.

Through that lens, this all looks a bit performative to me, but again, maybe I'll be pleasantly surprised.

The one thing I'm mildly excited to see is the improvement to Voice Control, as guessing what the programmatic name of a button is or having to constantly use a numbers grid to target elements doesn't sound fun.

To respond to what I see in some of the comments:

- On speech rate: It does take quite a bit of practice to crank up the speech rate and there's a degree of retraining you need to do when you switch voices. A lot of more "human" sounding voices are harder to follow at super high speeds which is why a lot of people prefer more robotic but consistent speech and generally aren't convinced by AI-powered TTS yet; they often fall apart if you raise the speech rate past a certain point.

- Re: actually waiting for the target audience's verdict: This is so important. I see more and more companies, individuals etc. talk about accessibility, build accessibility solutions and evangelize AI for accessibility without EVER talking to the people they claim to help. This will almost certainly mean mistakes will be made, up to and including doing more harm than good. If you want to do accessibility right, that includes AI products of any kind, hire people with lived experience or you'll get the equivalent of machine-translated text, hackerproof security in one click or an AI-powered coffee bar that orders thousands of rubber gloves.

Coincidental note: I have time for new projects right now :P

reply

upvote

by monkeywithdarts5 hours ago|

[-]

+1. Unless things have changed in the past hour since I first read this, this is the first blind/low vision individual with a top-level comment here.

And it was valuable to me as someone going from "bad but correctable" vision to low vision. I didn't know all those apps existed. I've been looking for exactly that sort of assistive technology.

reply

upvote

by Tox466 hours ago|

[-]

It's so validating to hear the same conclusion we reached from someone I've never met. It seems that they create these products without ever speaking with someone who has that problem.

Funnily enough, we're creating a competitor to the third-party apps you mention, drawing on the huge experience of my colleague, who is the son of blind parents.

We have an MVP online but it's not much yet, and I really don't want to be the "do you know I have an app?" guy.

reply

upvote

by lurking_swe6 hours ago|

[-]

nice to hear an opinion from a primary source!

One thing confused me though - you felt like the on-device processing is likely a gimmick. I naively assumed this is a big deal because it means it always works, regardless of your cell service. On the subway, on an airplane, in the middle of nowhere, etc.

Unrelated, what app makes the biggest difference to you in your day to day life?

reply

upvote

by yreg14 hours ago|

[-]

It's a shame Apple removed the screen reader announcements ("the Apple logo") from the youtube version of the commercial.

https://www.youtube.com/watch?v=B3SmsSCvoss

Those made the ad stand out in my opinion.

reply

upvote

by Washuu12 hours ago|

[-]

Change the audio language to "English descriptive".

reply

upvote

by Darwins_Toffees15 hours ago|

[-]

"Vehicle Motion Cues come to visionOS, which can help reduce motion sickness for people who use Apple Vision Pro as a passenger in a moving vehicle. Vision Pro will also support face gestures for performing taps and system actions, plus a new way to select elements with one’s eyes while using Dwell Control."

Maybe just don't wear them in a car?

reply

upvote

by dmix14 hours ago|

[-]

Wearing a headset in the back of an Uber doesn't sound that crazy.

I use those motion cues on my iPhone even though I don't struggle with motion sickness https://www.youtube.com/shorts/OxbjggMcKrk

reply

upvote

by nozzlegear14 hours ago|

[-]

I use them as well. I'm usually the driver so I don't typically look at my phone while the car is moving, but I recently rode along with a family member to an event. They handed me their iPhone to look at something and I felt totally disoriented trying to look at a moving screen in a moving car. I had to resist the urge to turn on the motion cues.

reply

upvote

by caiusdurling13 hours ago|

[-]

It's really useful for having a decent screen up in front of me when I'm a passenger trying to do something on the laptop. Saves staring down at my lap, and removes any motion on my screen from the peripheral view of the driver.

Still somewhat odd when a bus drives out from behind your Terminal mind.

reply

upvote

by matthew-wegner12 hours ago|

[-]

From the article: "new features for controlling power wheelchairs with Apple Vision Pro"

Someone using this feature will want motion cues as well.

And in your quote: Dwell Control is a feature set to interact with an Apple Vision Pro using only your eyes. Lingering your gaze on a button will press it. An AVP is now more comfortable to use in more situations because of motion cues.

Maybe just rethink your "maybe just" comment...?

reply

upvote

by kridsdale113 hours ago|

[-]

Trains. Airplanes. TFA said vehicle, not car.

reply

upvote

by brookst14 hours ago|

[-]

Trains are a thing.

reply

upvote

by jclardy13 hours ago|

[-]

Planes? Trains? If you haven't used these motion dots, they actually do work wonders. My wife gets motion sickness and could barely ever look at her phone when riding as a passenger in the car, even just to type in directions. With the motion dots she does just fine.

reply

upvote

by yreg14 hours ago|

[-]

>Maybe just don't wear them in a car?

Why not?

reply

upvote

by HDBaseT2 hours ago|

[-]

We probably do need laws to prevent people watching movies in an Apple Vision Pro whilst driving.

We have harsh laws on using phones whilst driving, a Vision Pro (if configured in a specific way) could entirely block your vision with a Movie or Show and this is dangerous.

reply

upvote

by throwaway13244813 hours ago|

[-]

Because the more we reject our shared reality and substitute it with each our own, the less humane we become.

reply

upvote

by nozzlegear11 hours ago|

[-]

The AVP is primarily an AR device, not VR.

reply

upvote

by jkman13 hours ago|

[-]

God forbid a person rejects the shared reality of a boring 12 hour flight and substitutes it with their own. Some real deep thoughts here

reply

upvote

by throwaway13244813 hours ago|

[-]

I’ve met some very interesting people on flights. I’ve done some great work. I’ve had some great ideas.

Don’t be so scared of variety. You just keep subjecting yourself to more of the same. The unending familiarity makes you dull.

reply

upvote

by MYEUHD3 hours ago|

[-]

This feels like cart before the horse.

As of macOS 15 (and I don't think they fixed it in 26), you can only increase the font size of first-party apps on macOS.

The global font size setting doesn't apply to third-party apps, even those built using Apple's frameworks.

reply

upvote

by abhikul015 hours ago|

[-]

On-device video subtitles generation is exciting, should help with watching videos on mute. This seems like a low hanging fruit that should've already been grabbed by an app but I can't find any.

reply

upvote

by happyPersonR13 hours ago|

[-]

A lot of us forget it, but things like text to speech, subtitles etc are there for the differently abled

Without that, there wouldn’t really be great vlm and conversational models.

The AI companies might have paid for the dictation of some videos on their own but voice assistants etc wouldn’t have existed and our ability to have AI that eventually understands the world would be much much harder.

reply

upvote

by nonethewiser12 hours ago|

[-]

So we're blaming disabled people now.

reply

upvote

by happyPersonR12 hours ago|

[-]

lol I’m saying working on accessibility features has helped more than those of us that are sighted. Often times for a lot of us, it’s a drag and comes lower on the priority list, but without it AI, llms etc wouldn’t have the ability to programmatically understand the world.

You however…. Maybe need to switch to decaf?

reply

upvote

by latexr11 hours ago|

[-]

> A lot of us forget it, but things like text to speech, subtitles etc are there for the differently abled

They are there for everyone. You don’t need to have a permanent disability to benefit from accessibility features. A device designed to work one-handed is useful to someone without an arm or a person with two arms who is holding a baby. Subtitles are useful to someone who can’t hear, someone lying next to a sleeping spouse, or someone in a noisy place.

“Accessibility needs can be permanent, temporary or situational.”

https://www.coursearc.com/accessibility-content-fundamentals...

reply

upvote

by randusername15 hours ago|

[-]

Accessibility features are such a great way to keep technology focused on real-world problems and real-world experiences.

I think the trap in creating anything is doing it for a crowd. Art, software, anything... it turns out better when it is made with a specific, named individual in-mind.

Accessibility features are almost always championed and field-tested with one specific loved one in mind and I think that's what keeps the technical solutions personable and grounded.

reply

upvote

by an_d_rew6 hours ago|

[-]

As someone slowly and idiopathically losing their hearing, and as someone just... getting older and losing visual acuity...

Thank you, Apple, for taking accessibility seriously and dedicating resources towards it.

I very much appreciate it, and the work of the entire accessibility team.

reply

upvote

by aucisson_masque6 hours ago|

[-]

Have you ever tried the accessibility features on Android? Would you recommend one or the other for your use case?

And what about Windows (if you use it)?

I think that we should all be concerned by the accessibility feature, we never know what is going to happen in life.

reply

upvote

by an_d_rew6 hours ago|

[-]

I haven’t used Android or Windows in any meaningful way for years, so I cannot comment on them.

I can tell you that the hearing accommodations on the AirPod Pro 2/3 headphones brought literal tears to my eyes because of how fabulous it makes music sound for me.

This is a LOT more work than just adding an equalizer because you have to do multiband real time compression and expansion, in relation to other frequency bands and respecting band-specific sound energy limits.
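
To give a feel for why this goes beyond an equalizer, here is a very rough sketch of the static, per-band part of such processing: compress levels above a threshold, add makeup gain, then clamp to a band-specific ceiling. Every name and number below is an assumption for illustration, not Apple's algorithm, and it leaves out expansion, cross-band interaction, and attack/release timing entirely:

```typescript
// Hypothetical per-band settings; none of these values come from Apple.
interface Band {
  name: string;
  thresholdDb: number; // level above which compression starts
  ratio: number;       // 2 means 2 dB of input above threshold yields 1 dB of output
  makeupDb: number;    // extra gain for a band the listener hears poorly
  maxOutputDb: number; // band-specific output ceiling
}

// Static gain computer for one band: maps an input level (dB) to an output level (dB).
function processBandLevel(inputDb: number, band: Band): number {
  const over = Math.max(0, inputDb - band.thresholdDb);
  const compressed = inputDb - over * (1 - 1 / band.ratio);
  return Math.min(compressed + band.makeupDb, band.maxOutputDb);
}

const highBand: Band = {
  name: "4-8 kHz (illustrative)",
  thresholdDb: -30,
  ratio: 2,
  makeupDb: 15,
  maxOutputDb: -10,
};

// Quiet input gets the full boost; loud input is compressed and then capped.
for (const level of [-50, -20, 0]) {
  console.log(`${level} dB in, ${processBandLevel(level, highBand)} dB out`);
}
```

A plain equalizer would apply the same gain at every input level; here the gain depends on how loud the band already is, which is the part that needs continuous level tracking in real time.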

I know I might sound like I’m gushing, and I kinda’ am. They didn’t have to put in the time or energy to do that or maintain it and they did ... and for that, like I said, I am extraordinarily grateful.

reply

upvote

by nrmitchi12 hours ago|

[-]

These are great improvements, it's good to see Apple investing in improvements like this (especially with the Vision Pro) but I can't help but feel that their utility will remain very low until they make the Vision Pro look significantly less dystopian than it does.

The form-factor is a significant issue for real-world usage, and it's kind of unclear if there is a plan for a future product line given its (pretty abysmal) initial reception.

reply

upvote

by brokencode11 hours ago|

[-]

I don’t think abysmal is the right word. The hardware was widely praised except for being dorky looking and a few other complaints.

The price and lack of content and developer interest have been the main problems.

And ultimately, people just don’t seem that interested in this product category. Meta ran into the same issue, though at least they targeted gaming where there is a decent niche.

VR/AR tech seems cool and futuristic, but hasn’t quite found its killer app yet.

reply

upvote

by bigyabai11 hours ago|

[-]

Meta did sell over 20 million headsets. The Quest is definitely lower-margin hardware than Vision Pro, but in terms of install base that's an order of magnitude larger audience.

Apple really screwed themselves by only supporting WebXR for cross-platform VR experiences. Soon Valve will ship the Steam Frame, which will likely cost a fraction of the Vision Pro and support bog-standard PC games like H3VR, flight simulators and flatscreen PC titles. Meanwhile, AVP owners will have paid $3,500 for a more powerful chip/headset with a fraction of the content library and featureset that Valve and Meta offer. Vision Pro's lack of audience is entirely a self-imposed failure, it seems.

reply

upvote

by brokencode7 hours ago|

[-]

Yeah, the gaming market is a decent sized market. It’s not huge, though, and is not growing very fast.

It was a strategic mistake for Apple to not focus on gaming. But realistically, the AVP was always going to be way too expensive for basically anything.

Maybe if you could pick one up for like $800 and there was a lot of great 3D immersive content, it could take off. But even then, I feel like it’s just not a product category the average person is that excited about.

reply

upvote

by tempodox10 hours ago|

[-]

Generated subtitles for video does sound useful. Sometimes actors mumble so horribly, I don’t understand a word they’re saying.

reply

upvote

by dzhiurgis6 hours ago|

[-]

Wonder if it can translate it too.

My biggest gripe with Netflix is that they only have like 3 languages and no auto translation. An even bigger gripe is that it's because of the union racket. They apparently need to pay hundreds of thousands for something computers do for free. Insanity.

reply

upvote

by percentcer11 hours ago|

[-]

Unfortunate thumbnail on that embedded video

reply

upvote

by fckgw9 hours ago|

[-]

It's a picture of a blind person. The type of person these features are for.

reply

upvote

by sscaryterry11 hours ago|

[-]

Luckily the European Accessibility Act has pretty much made PDF/UA a requirement.

This should really be the last resort.

reply

upvote

by RobMurray11 hours ago|

[-]

Accessible PDFs are quite rare in reality, especially if there are tables, graphics, maths, forms or anything more than plain text.

reply

upvote

by sscaryterry11 hours ago|

[-]

Agreed, it is a problem, but it is being legislated in as required for many businesses in the EU going forward.

reply

upvote

by dawnerd12 hours ago|

[-]

Didn’t they already have subtitle generation for uncaptioned video?

Edit: was thinking about this feature https://support.apple.com/guide/iphone/get-live-captions-of-...

reply

upvote

by gobdovan14 hours ago|

[-]

I'm not blind, but I sometimes can't process where things are, even if they're in front of me. Would be cool to just point at a messy table and see where the keys are. If they offer this as some Vision/Core ML feature, I'd implement the messy table app as soon as these features land. Probably already possible, but simpler if they release this.

reply

upvote

by abhinav-t12 hours ago|

[-]

These are pretty helpful features for differently abled people. I think it would be really cool if Apple made AI glasses that could communicate with the iPhone, thus eliminating the need to point your phone at everything (especially if you are moving outdoors or in a crowd).

reply

upvote

by asadotzler9 hours ago|

[-]

We use "disabled people" these days. Or, "people with disabilities." There's debate around person first or not, but I'll leave that to you all to read up on. Regardless of where you come down on DP vs PWD, "differently abled" is a thing of the past.

reply

upvote

by mistersquid14 hours ago|

[-]

> A new power wheelchair control feature leverages the precision eye-tracking system on Apple Vision Pro to offer a responsive input method for compatible alternative drive systems. [0]

The above caption for Apple Vision Pro is for a video that to me, as an Apple Vision Pro user, is discomforting.

More questions are raised than are answered by the short video: Is the user able to fit the Apple Vision Pro by him/herself? What happens when dwelling on a directional control misregisters? Can the user recalibrate the "Eyes and Hands" setting? Dwelling on a control displaces focus and there may be impeding objects in the path of the power wheelchair. Is this really a good idea?

To my sensibility, the video is unsettling (at best), especially given how cumbersome Apple Vision Pro is.

[0] https://www.apple.com/newsroom/2026/05/apple-unveils-new-acc...

reply

upvote

by jkman13 hours ago|

[-]

Your concerns are completely nonsensical. It's clearly being marketed as a healthcare tool for people with debilitating injuries that preclude the use of hand-powered wheelchair controls, severe situations where there's no neck-down control and users would be limited to controls like head-tilt or mouth actuated systems. These people obviously require daily care to simply get them out of bed and into the chair and back again every single day - their nurse could just put on their Vision Pro for them! This seems like an incredible leap forward for people in this situation, if they iterate on this and it gets better then this could be a very viable wheelchair control system in the future.

reply

upvote

by mistersquid6 hours ago|

[-]

> Your concerns are completely nonsensical.

With all due respect, my concerns are not nonsensical but born of my daily use of Apple Vision Pro and my awareness of the limitations of dwell control.

Iterating on this idea with a device lighter than Apple Vision Pro and improvements to dwell control would likely be required before this could ship to larger populations of disabled users, but that is not what is depicted in the video.

My sense is that the possibility of an accessibility affordance with people who are severely disabled is driving opinions in this case more than the reality of what’s available.

To my mind, many of these AX announcements are reminiscent of the circumstance that led John Gruber to author “Something Is Rotten in the State of Cupertino”, which is that these are not shipping features but ones slated for “some time later this year”.

I’m a huge AX fan and work directly in the domain space, but something about that video in particular coupled with my near-daily use of Apple Vision Pro doesn’t feel right.

reply

upvote

by jaybeavers38 minutes ago|

[-]

You are correct, the driving controls in AVP don’t use dwell, that is the wrong (and dangerous) approach. They use something more akin to hover activation.

It’s the hardware I designed coupling the power wheelchair to the AVP, and I’ve driven it myself.

reply

upvote

by jkman1 hours ago|

[-]

That still doesn't make sense. "improvements... would likely be required before this could ship to larger populations" so what? Are they claiming that everyone everywhere should use this immediately?

"possibility of an accessibility affordance" what do you mean possibility, that is literally the case. Even if it's not perfect (which nothing truly is, obviously), it is undeniably a novel control system for its target audience.

"doesn’t feel right" So your point is simply that your subjective opinion is that it 'doesn't feel right'? What does that even mean? I'm not saying, and the announcement is not saying, that this is some platonic ideal of accessibility controls. Not sure what you are getting at at all.

reply

upvote

by halapro11 hours ago|

[-]

Are these already available? I regularly read these announcements, and years later I still don't know where to find the features, or they are not actually functional.

reply

upvote

by dgllghr14 hours ago|

[-]

Putting aside the fact that no company should have direct access to anyone's brain, how cool would it be to be building toward VISOR (from TNG) instead of this. If we could translate sensor signals to the neural circuitry of the brain directly, we wouldn't even need an LLM in the mix. But to have it as an overlay, as supplementary data! With the ability to turn it off of course. (Would a person even be able to turn it off? In the same sense as whether someone can "turn off" social media?) If only we had meaningful human rights and institutions that really protected them... I still can't fully give up the techno-optimism that made me love tech in the first place (and TNG for that matter).

reply

upvote

by dagmx13 hours ago|

[-]

Brain Control Interface support was already announced last year and afaik is part of iOS already.

https://developer.apple.com/documentation/accessibility/brai...

reply

upvote

by 14 hours ago|

[-]

deleted

reply

upvote

by exitb15 hours ago|

[-]

As Apple shifts towards services and fancy software features, I wonder how they expect to stay competitive by only releasing them for a subset of languages.

reply

upvote

by layer812 hours ago|

[-]

They roughly know how many of their users use a particular language.

reply

upvote

by 14 hours ago|

[-]

deleted

reply

upvote

by diogenescynic2 hours ago|

[-]

Cool, but why is Apple making the new iPhone 17e chips in Israel when Israel just coordinated a mass terrorist attack using pagers? I personally don't want my iPhone being used as a bomb if I say something to criticize Israel. Israel is the last place I want to have anything to do with my phone.

reply

upvote

by dzonga11 hours ago|

[-]

actual useful & impactful A.I features not the snake oil being sold daily.

reply

upvote

by celsoazevedo10 hours ago|

[-]

Nice. It would be good if Apple could find the time to improve the readability of the white text on green bubbles too.

edit: it seems that asking Apple to follow their own accessibility guidelines isn't popular on HN :-(
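
For anyone who wants to check a contrast complaint like this with numbers, here is a minimal sketch of the WCAG 2.x contrast-ratio formula. The green RGB value below is a placeholder assumption for illustration, not a measured Messages bubble color, and the function names are made up for this example. WCAG AA asks for at least 4.5:1 for normal-size text.

```python
# Sketch of the WCAG 2.x contrast-ratio calculation.
# All names here are illustrative; the green value is a placeholder,
# not a color sampled from any real app.

def _linearize(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per WCAG 2.x."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB color (0.0 = black, 1.0 = white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Contrast ratio between two colors, from 1.0 up to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

WHITE = (255, 255, 255)
PLACEHOLDER_GREEN = (52, 199, 89)  # assumption: stand-in for a bright green bubble
AA_NORMAL_TEXT = 4.5               # WCAG AA threshold for normal-size text

if __name__ == "__main__":
    ratio = contrast_ratio(WHITE, PLACEHOLDER_GREEN)
    verdict = "passes" if ratio >= AA_NORMAL_TEXT else "fails"
    print(f"white on placeholder green: {ratio:.2f}:1 ({verdict} AA for normal text)")
```

Swap in a color sampled from an actual screenshot to test the real bubbles; the result is only as good as the RGB value you feed it.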

reply

upvote

by skiing_crawling9 hours ago|

[-]

They won't, it's literally part of their sales funnel. They've specifically engineered a bad experience for anyone outside the ecosystem by making it all of their friends' problem too. It's very important for their stock price that text messages sent by non-Apple products are just slightly more difficult to read.

reply

upvote

by kps9 hours ago|

[-]

That wouldn't let them say “AI”.

reply

upvote

by OhMeadhbh8 hours ago|

[-]

I miss Apple during its heyday. There was a time when Apple was the sine qua non for #a11y and #hci. Then Steve came back.

reply

upvote

by cybercatgurrl2 hours ago|

[-]

this is exactly how you fight people’s notion that AI is bad. you change their lives with it. make it something indispensable for a subset of users so that being anti-AI is indefensible. for instance, “how dare you threaten people’s ability to navigate the world independently”

reply

upvote

by Almondsetat15 hours ago|

[-]

I have difficulty trusting this. There are plenty of videos online of LLMs making up stuff like "I just ate a hot dog, is there mustard around my mouth?" "No, everything is clean" while there is a big yellow stain on the guy's face.

reply

upvote

by WarmWash14 hours ago|

[-]

The problem is using a language model to assess images.

Probably 80% of "LLMs are below expectation" complaints (from the general population) involve some form of image analysis.

Image tokenization is hard because unlike language tokenization, where every token is extremely dense with meaning, image tokens tend to be meaningless or irrelevant but are processed all the same.

Give an SOTA LLM a picture of toothpicks and ask it to move one to make a square, and it will probably struggle and fumble it. But give a mid-size LLM from 2 years ago the same problem in verbal form, and it will nail it almost every time.

The takeaway is, do everything you can to avoid having the LLM need to rely on images for the answer.

reply

upvote

by gruez13 hours ago|

[-]

I thought all the recent models are "multimodal"? Is the image part just sticking an image recognizer in front of the text model?

reply

upvote

by RobMurray11 hours ago|

[-]

Most of those videos are ChatGPT voice mode, which still used GPT-4o last time I checked. It is far from SOTA.

reply

upvote

by postalrat14 hours ago|

[-]

Like coding, creating images or text, maybe the alternative of doing it yourself is too easy or enjoyable for you. Don't expect that will be true for everyone.

reply

upvote

by Almondsetat12 hours ago|

[-]

Did you reply to the wrong person? What are you even trying to say here?

reply

upvote

by postalrat1 hours ago|

[-]

You say you don't trust it, but what's your alternative, assuming you lost your vision?

reply

upvote

by seeeeebt14 hours ago|

[-]

Surely a blind person relies a lot on audio input?

reply

upvote

by isityettime14 hours ago|

[-]

Maybe on a smartphone, but usually not on a computer. Keyboards are pretty good.

The other thing is that if you're around others, voice input means you have no privacy. Even if you're not doing anything particularly private, it's a bit awkward and potentially embarrassing. If you use touch input in conjunction with a screen reader, you can be more like a "normal" user in that what you're doing is just between you and your phone.

reply

upvote

by asadotzler9 hours ago|

[-]

Audio input is far more commonly used by people with mobility difficulties. Imagine that your hands shake a lot or that you don't have limbs. That makes using a keyboard and mouse difficult or impossible and voice input can help. Blind users generally use keyboards for input, the typical ones you find on a PC but also sometimes the keys on their Braille display.

reply

upvote

by 11 hours ago|

[-]

deleted

reply

upvote

by 11 hours ago|

[-]

deleted

reply

upvote

by devinprater14 hours ago|

[-]

There's my dopamine hit for the year.

reply

upvote

by jansan15 hours ago|

[-]

Since Apple uses Gemini to power its AI, are those features actually powered by Google Gemini?

reply

upvote

by jjice15 hours ago|

[-]

They don't yet, but they will be using Gemini-derived models with iOS 27. For now it's all their own models.

reply

upvote

by k4rnaj1k15 hours ago|

[-]

[dead]

reply

upvote

by nikhilpareek1314 hours ago|

[-]

Most apps have terrible accessibility labels because developers don't bother, which breaks every screen reader pipeline downstream. The Voice Control "say what you see" feature routes around that by letting users describe a button in plain language. That's a real fix for a problem caused by humans being lazy about a11y.

reply

upvote

by jrm-veris14 hours ago|

[-]

this is such a great use case for the technology

reply

upvote

by jmyeet9 hours ago|

[-]

Can Apple “unveil” Touch ID as a “new” accessibility feature because Face ID is an accessibility nightmare?

reply

upvote

by TZubiri11 hours ago|

[-]

I feel that if we build the UI for blind users first, we would get much more powerful systems, rather than building UI for the seeing users and then slapping a CV-to-text model on top of what's shown on the screen.

Did not test it yet, but blind users may be more prone to dominate Command Line Interfaces, which are becoming increasingly popular due to their easy integration with LLMs.

reply

upvote

by brador11 hours ago|

[-]

Her phone in the thumbnail has oval camera bumps. It is also extra long. Mine has round camera bumps. Is that a new iPhone?

reply

upvote

by tonymet11 hours ago|

[-]

Kudos to Apple for providing some of the best accessibility features across their devices. I’ve always appreciated the consistency of reduced transparency, increased contrast, reduced motion, reduced white point, touch areas, and color blindness support. And they work well across third party apps. That demands a lot of effort on the API and UI framework to have broad support for something that is mostly a non-sellable feature.

reply

upvote

by MagicMoonlight14 hours ago|

[-]

And this is why androidlets will never win. They’re too busy selling your data to ever think of disabled people or usability.

iOS is just painfully good. I can pause a video, put my finger on text inside the video, and copy it. Until they added it, I didn’t even know how much I needed that.

reply

upvote

by lern_too_spel2 hours ago|

[-]

Live captions have existed for more than 6 years on Android. https://ai.googleblog.com/2019/10/on-device-captioning-with-...

Selecting text from anything displayed on the screen for more than 7 years. https://www.thurrott.com/mobile/android/165834/android-9-pie...

I guess if you knew people using Android, you would have known you needed it 7 years ago?

reply

upvote

by f33d517314 hours ago|

[-]

Until they added it, you didn't need it, then suddenly a phone was unusable without it.

reply

upvote

by NicuCalcea13 hours ago|

[-]

I can do that on my Pixel 6.

reply

upvote

by LocalPCGuy12 hours ago|

[-]

Features don't exist until Apple "invents" them /sarcasm

reply

upvote

by baxuz14 hours ago|

[-]

Now we know why the new AirPods will have cameras!

reply

upvote

by tekacs15 hours ago|

[-]

I'm super glad that they're doing this, but once again unexcited for another decade of Apple self-privileging on this stuff so they're the only ones allowed to touch or improve any of this surface, or UX outside an app's tiny box.

People talk a lot about how macOS has gone downhill, but I feel like it would have been a good start if developers could continue to patch over Apple's shortcomings like they used to be able to.

I imagine that we would be a few years into a spectrum of tools like this if they didn't lock it down like they do.

Totally aware that plenty of HN commenters are very glad that Apple keeps this locked down. I'm just the other opinion, that's all.

reply

upvote

by testfrequency14 hours ago|

[-]

I don’t want to discredit more advancements in accessibility, but this feels like accessibility porn.

I have fond memories of an old coworker 10 years ago who is blind. He would use his phone no problem, texting, going about his day, he was even on Tinder (credit to Tinder for making their app so accessible long ago). He would commute on his own, walk to the train station, even transfer to another train during peak rush hour. I’m not saying it was all easy for him, but nothing in this video really stood out to me more than what shirt was on the bed. I know other services/apps have long existed to be the “eyes” for people who need support, but this video feels….uneventful?

I may be cynical about this though, as I often hate how Apple’s marketing makes these emotional bids about how life-critical they are to society - which is fair to a degree..but it just feels cheap to be glamorising “look! we saved this person from pending doom, cool right??”

reply

upvote

by RobMurray11 hours ago|

[-]

For every person like your coworker, there are probably several who have a much harder time with technology and who would benefit from a simpler interface.

If this includes improvements to the screen recognition feature in VoiceOver, it could provide accessibility for apps where the developer doesn't care about accessibility, which is extremely common.

The vision capabilities could be useful if they are done well, but I suspect that will always be covered better by 3rd party apps.

reply

upvote

by lwkl14 hours ago|

[-]

I mean even if it is marketing for them they still did the work and developed these features. I had some vision issues recently and was glad there were options to make text more legible to me.

Additionally, I don't believe this is just marketing. This is adaptation to a changing market. Apple's customer base is aging, and having these kinds of features will allow them to keep using Apple products for longer.

reply

upvote

by testfrequency14 hours ago|

[-]

They have done the work, but I don’t see much here that goes beyond what was previously possible without Apple Intelligence. The marketing of Apple Intelligence is weak here, not the foundational abilities.

reply