I experience this mostly when asking for music. Before Gemini, mistakes were common but deterministic. It was easy to understand where the query had gone wrong and so how to fix it. Example:
"Hey google, play Blackstar"
(Plays the album blackstar by David Bowie, not what I wanted)
"Hey google, play Blackstar by Radiohead"
(Plays the right thing).
Now:
"Hey Google, play Blackstar by Radiohead" can result in playing... something vaguely semantically related with no way to course correct. In this exact instance (happened yesterday!) it played an album by the hip hop duo Black Star.
I will admit that there are some superpowers hidden in Gemini that were not present in the previous AI assistant. I recently discovered that Gemini can manipulate the navigation app, and a prompt like "Mute alerts" works, which is kind of cool. However, like OP said, it's incredibly verbose, which is super annoying.
"Hey Google, call my wife"
------> Immediately calls wife
With Gemini: "Hey Google, call my wife"
--50%-> "Is that XXXXXX?"
--25%-> Immediately calls wife
--20%-> "I'm sorry, I couldn't find a contact saved as 'wife'"
--05%-> Immediately calls someone different from my wife
That said, I feel like no matter the music service it's a 50/50 shot each time whether it plays the song I want. No matter how many times before I've played it or asked for it.
The latest Gemini update for Android Auto also absolutely ruined the voice control for Waze. It was already bad, but now Android Auto is basically unusable by voice.
PMs keep trying to make them "smarter," and it just makes the core user journeys worse.
Surely they think they're inventing cars when we're griping about buggy whips. But it really feels like voice assistants peaked ~10y ago for the things people actually want them for.
My wife and I both would love for the voice assistants to do more. They just won't. Even with weather, anything more than "what's the weather like today" will usually not get a good result. "When will it rain today?" gets OK responses.
As soon as I figure out how to put a decent GPU into my old rack server(s), I'll see how far I can get with HomeAssistant. I suspect it'll be some effort, but it'll be better at the end of the day.
I use a timer every day to brew my coffee. With a voice assistant I can set a timer, but with the lack of a screen I can't see how much time is left. One day I thought, "I'm going to finally get around to digging into this voice interface and see what the options are," hoping for something like, "Hey Dingus, set a five minute timer and notify me when there's 10 seconds left."
Or better yet, "Hey Dingus, five minute timer, with notify at 10 seconds."
Notice that this almost maps 1:1 onto a shell command with option flags, just verbally interfaced: "$ timer 5m --with-notify 10s"
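To make the mapping concrete, here is a rough sketch of what that command could look like as an actual CLI parser. The `timer` program, the `--with-notify` flag, and the `parse_timer` helper are all hypothetical, taken from the example above rather than from any real assistant, and the duration format (`5m`, `10s`) is an assumption.

```python
import argparse
import re


def parse_duration(text: str) -> int:
    """Convert strings like '5m', '10s', or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise argparse.ArgumentTypeError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return int(value) * {"s": 1, "m": 60, "h": 3600}[unit]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI mirroring "$ timer 5m --with-notify 10s".
    parser = argparse.ArgumentParser(prog="timer")
    parser.add_argument("duration", type=parse_duration,
                        help="total timer length, e.g. 5m")
    parser.add_argument("--with-notify", type=parse_duration, default=None,
                        metavar="REMAINING",
                        help="notify when this much time is left, e.g. 10s")
    return parser


def parse_timer(argv: list[str]) -> dict:
    """Parse the arguments and return a plain description of the timer."""
    args = build_parser().parse_args(argv)
    if args.with_notify is not None and args.with_notify >= args.duration:
        raise ValueError("notify point must be shorter than the timer itself")
    return {"duration_s": args.duration,
            "notify_at_remaining_s": args.with_notify}


if __name__ == "__main__":
    print(parse_timer(["5m", "--with-notify", "10s"]))
```

A nice side effect of this shape is that `build_parser().print_help()` gives you the "$ timer --help" equivalent for free, which is exactly the discoverability that is missing.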
Notice also a complete willingness on my part to learn how the thing works and change how I'm using it accordingly. This is the opposite of end user laziness. I'm willing to invest effort in becoming a "Power User" of my voice assistant.
So I looked for documentation, ready to read and use my brain to understand it and do what it tells me in order to start and stop my timers with greater proficiency.
...I found none.
Literally, there isn't any. They don't have documentation. Nowhere is there, even for someone motivated and willing to learn, the ability to do so.
Ok, well that is understandable if these things are changing rapidly. Maybe there's the equivalent to a "$ timer --help" baked into the assistant itself. Maybe it can tell me what options exist so I can use them. I ask the assistant to explain itself. "What are my options when setting a timer?"
It can't parse my question. Literally, it doesn't know what I'm asking for. Because nobody ever considered that a user might ask that question or even want to know that information.
At that point, on the spot, I gave up. Clearly this thing is not designed or intended to be a thing that one could gain skill with. It's an utterly unserious product.
I would very happily learn an entire verbal DSL, a whole pidgin dialect of English, purely for interacting with my voice assistant. "Hey Dingus, five minute staged timer: thirty seconds, two minutes, one minute, one minute, remainder, with countdown from five seconds" is not "natural language" anymore. But you can bet you'd hear me saying it, if that's all it took to make the voice assistant run my coffee brewing recipe with nothing but my voice. And then, hey bonus, let me bind that to a personal shortcut so all I need to say is "Hey Dingus, coffee timer" and I don't even need to reach for my phone.
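For what it's worth, a grammar like that is not hard to parse once the speech is transcribed. Below is a minimal sketch, assuming the transcript arrives as plain text in exactly the phrasing above. The function names, the small number-word table, and the rule that "remainder" means "whatever is left of the total" are my own assumptions, not anything an existing assistant supports.

```python
import re

# Small, deliberately incomplete table of spoken numbers (assumption).
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "ten": 10, "fifteen": 15, "twenty": 20, "thirty": 30,
}
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def parse_spoken_duration(text: str) -> int:
    """Turn 'thirty seconds' or 'five minute' into a number of seconds."""
    match = re.fullmatch(r"([a-z]+|\d+)\s+(second|minute|hour)s?",
                         text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    word, unit = match.groups()
    if word.isdigit():
        count = int(word)
    elif word in NUMBER_WORDS:
        count = NUMBER_WORDS[word]
    else:
        raise ValueError(f"unrecognized number word: {word!r}")
    return count * UNIT_SECONDS[unit]


def parse_staged_timer(utterance: str) -> dict:
    """Parse '<total> staged timer: <stage>, ... [, with countdown from <d>]'."""
    match = re.fullmatch(
        r"(?:hey dingus,\s*)?(?P<total>.+?) staged timer: (?P<stages>.+?)"
        r"(?:, with countdown from (?P<countdown>.+))?",
        utterance.strip().lower(),
    )
    if match is None:
        raise ValueError("not a staged timer command")
    total = parse_spoken_duration(match["total"])
    raw_stages = [s.strip() for s in match["stages"].split(",")]
    if raw_stages.count("remainder") > 1:
        raise ValueError("only one 'remainder' stage is allowed")
    # 'remainder' is a placeholder filled in after the fixed stages are summed.
    stages = [None if s == "remainder" else parse_spoken_duration(s)
              for s in raw_stages]
    fixed = sum(s for s in stages if s is not None)
    if fixed > total:
        raise ValueError("stages add up to more than the total")
    stages = [total - fixed if s is None else s for s in stages]
    countdown = match["countdown"]
    return {
        "total_s": total,
        "stages_s": stages,
        "countdown_s": parse_spoken_duration(countdown) if countdown else None,
    }


if __name__ == "__main__":
    print(parse_staged_timer(
        "Hey Dingus, five minute staged timer: thirty seconds, two minutes, "
        "one minute, one minute, remainder, with countdown from five seconds"
    ))
```

The hard part was never the parsing. It's that nobody wrote the grammar down anywhere a user could find it.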
But you can't do that. It literally does not support being taken seriously. Or, if it can, the design is utterly hostile to me discovering how. So I've never even tried to do anything, since that very day, other than turning smart lights on and off.
My point is: I didn't fail the technology. I came to the table with willingness to tinker and experiment, willingness to change my expectations to suit the design as I discovered it, willingness to work to make it a part of my routine. The technology failed me.
Pre-Gemini, you knew what you would get: basically the structured snippets that would appear at the top of the search results.
Now it's much more verbose.
My biggest gripe is that it basically stopped listening to me, since "upgrading" to Gemini, which is frustrating because I've used it to control the Hue lights for the past decade.
It listens to my partner though, so after it fails to listen to me, I have to ask her to ask Google to adjust the fing lights.
Welp
It also now seems to trigger its own barge-in about 50% of the time. It'll start reading the first syllable of a message, apparently confuse itself talking for me saying something, then just follow that with silence "listening" for my response until I physically have to hit the back button on the car.
Maybe you are saying "the whole internet is like that now, it's impossible to find good sites without obnoxious ads", but I don't think it's that bad yet (Hacker News is a good counterexample). But if everyone keeps visiting user-hostile sites, the site operators will see no incentive to change.
Of course there are ways around this, but at that point I don’t bother with such blogs anymore. It is a bit ironic given the subject of this blog post.
I used Firefox on Android for probably 11 years before switching to Dolphin (RIP) and then stock Chrome when Firefox made it a huge pain to install extensions. I keep waiting for someone to fully enable extensions on Chrome for Android.
Lucky me :)
> Not because it was smart.
> Because it was useful.
I was half expecting "and that's bold" after that.
Dear lord.
I do not care about whatever stupid feature you want to build engagement around. Do what I asked you to do and then shut up.
And don't get me started on the Alexa Show that had the audacity to display ads.
When your options are a few competing BigCos and you don’t have incentive to try to build it for yourself because it straddles the annoying in-between space of “frustrating enough to do something” and “not frustrating or valuable enough to actually solve the problem.”
Do you have a better recommendation of a smart speaker that can play spotify, youtube music, or tidal?
> The closest I could get to punching Google in the ear and ripping out its nose hairs.
As if giving more money to another shitty megacorp for another frustrating device is going to make any difference to the first shitty megacorp.
Companies don't care when nerds complain. We always complain. But when their normal user base starts jumping ship, then they could very well start listening.
I agree. It wasn't smart but it was useful in certain cases. Now it's just lobotomized.
A huge pet peeve of mine is when I’m in the car and want to know what song is playing on the radio. I run Shazam and my phone mutes the stereo to activate a microphone. I have to disconnect from CarPlay then run Shazam, then reconnect — it’s a passenger only operation.
Song recognition is built into both iOS and Android, the device should always use the internal mic instead of a CarPlay/Android Auto microphone over Bluetooth.
Side note: is there a good “dumb smart speaker” I can have run with a wake word connected to my own API? Speech to Text and Speech to Speech are fairly well supported for local AI workflows now, it would be great to have my own Home device without worrying about where the audio goes.
I’m sure it’s a very niche audience today, but I imagine giving this thing MCP for Wikipedia, a music app, and my recipes would be perfect.
https://futurism.com/artificial-intelligence/google-ceo-sund...
I know nobody did, but seeing as I was too young (and maybe not even alive?) I have always wanted to try it. I'm a Coca Cola enthusiast after all. I wish they'd release a "Throwback Experimental Coke" batch. I assume it was their attempt to flavor Coke without the coca leaves?
No idea how I managed to gaslight myself into liking Google products this long. I guess one just gets used to the overall brokenness, daily feature flag changes and the feeling of every interaction with their stuff being stored and analyzed forever.
I was amazed at my own level of anger at that. It was just a voice in my ears but I reacted to it viscerally like it was an assault. It didn't help that it was in the middle of a sequence of it telling one lie after another, like "yes, I can disconnect this conversation." Maybe what I had is a natural reaction to having a lying clueless asshole refuse to go away or shut up, which I haven't otherwise had to deal with lately.
The interface into the LLM is tokens in and out (text, images, audio). And the harness generally doesn't understand what you're passing in. The LLM has nothing to do other than respond with tokens, and empty responses (e.g. just a stop token) have been aggressively trained out of it.
From the examples given I haven't seen any meaningful life improvement with them.
I'm just waiting for Google to send the email with a title like "An update about Assistant's future" to make the clock e-waste. On the phone I tried Gemini but that's slower because it needs a brain for simple text parsing.
Perhaps, but people did ask for cheaper cars.
But more importantly... screens were put in more expensive cars first, and slowly trickled down to budget cars. It's a very weak argument that it was done for cost reasons. Screens are flashy and impress people during their 5 minute test drive. "Wow! Think of all the things I can do in my car that I couldn't do with a knob for changing fan speed." Sure, living with those screens tends to be a bit less enjoyable than those first impressions lead you to believe, but bright colorful animated screens helped to sell cars. If they're actually less expensive than knobs and buttons, that's just a bonus for the manufacturers.
Also keep in mind, when screens first appeared in (expensive) cars, they weren't actually cheaper than the knobs and buttons they replaced. Technology is, sorry, was getting cheaper per unit of performance over time. Screens became commonplace and inexpensive to put in cars, but I suspect they were ten times more expensive than all the knobs and levers they replaced when they started appearing in luxury cars.
Thank you!
No…, you are.
(Although, based on the tone, I think it's Grok.)
Having used Google home assistant since it came out for all the things that it's good for, and watched its quality fluctuate, I find I really have to be more careful nowadays when I ask it questions because it can go overboard more easily.
Is there a term in AI research where it underappreciates the specificity that is being asked for?
Perhaps the AI could be default prompted, "you are a kitchen AI assistant and should tend to answer with facts and details that are relevant to the current moment."
So then I have to stop her mid-sentence and say, "Alexa, stop." Only for her to misunderstand what I said and say, "Now playing 'On the Rocks' by [artist name]."
I'm in the process of going as analog as I can.
A DSLR camera is better than a phone's camera; a voice assistant device seems replaceable, however.
a) because some of them have semi-decent speakers for music
b) because it's nice not to have to pull out your phone (especially while cooking)
P.S. I hope that dehydration/headaches question was a poorly chosen example and not something someone over the age of 5 seriously needs an answer to.
Related from the last month:
Google changes its search box
https://news.ycombinator.com/item?id=48197370
Google Declaring War on the Web
https://news.ycombinator.com/item?id=48214449
Search engines alternatives now that Google isn't Google anymore
https://news.ycombinator.com/item?id=48266051
Google Hates You
https://news.ycombinator.com/item?id=48313538
You can no longer Google the word 'disregard'
https://news.ycombinator.com/item?id=48238351
The IBM-ification of Google?
https://www.thedrive.com/news/bmw-commits-to-subscriptions-e...
https://www.theverge.com/2022/7/12/23204950/bmw-subscription...
They rolled it back though, afaik, because the whole thing was a comically evil idea.
Karen, you mean you don't want those things. Stop confusing want and need.
- The Manager.