I asked it about designing a 12 V solar system for a garden shed and it got everything but the broadest strokes wrong. It figured out there should be a solar panel, a solar charge controller, a battery and some loads, but the wiring was nonsensical, and when I drilled into the solar charge controller settings and the like, it completely fell apart. Absolute non-starter for any information you plan on depending on, but good entertainment value and impressive execution.

I have an old door in the back yard and have been planning to make a bike shelter this week, so I asked it to make me a plan. It drew a regular shed with an "upcycled door", but with no sign of where a bike would fit into it. No bike would ever fit in that thing, and the only structure it showed how to construct didn't resemble the actual finished thing.

Like every other AI demo I've ever tried, it's impressive on the surface, but the system fundamentally doesn't understand what it is doing.

This is great, AI freeing us from bikeshedding.

I decided to test it out myself.

I went to the website and typed in "Jeep Wrangler JK engine bay with components labeled" (since I'm intimately familiar with JK engine bays). Seems like a pretty analogous test to what you did, if anything an even easier one.

Let's see what we get... a very nice-looking diagram of a Wrangler engine bay with components labeled. Looks good.

But wait...

- The brake fluid reservoir is on the wrong side of the engine bay

- Where the brake fluid reservoir is, it's labeled as the coolant overflow tank, and while the actual coolant overflow tank does exist in the diagram, it has no label.

- The battery is on the wrong side of the engine bay.

- The top of the front grille is labeled as the "oil filter cap".

- The oil fill cap is in the wrong place.

- Half of the battery is labeled as the fuse box, while the actual fuse box is correctly shown, but unlabeled, on the other side of the engine bay.

- It shows two different windshield washer reservoirs next to each other.

I could keep going on ...

Then I tried clicking on the incorrectly labeled coolant overflow reservoir, and it switched to a new page showing a completely different-looking coolant overflow tank, though at least this one is located in the correct place in the engine bay.

But of course it doesn't look remotely like the actual coolant overflow container. It also shows the radiator cap as being on top of the coolant reservoir, when in reality it is very much on top of the radiator itself.

Like... I can find fault with every aspect of it. But of course, if you didn't actually know much about the topic, it'd all look fairly believable. The story of LLMs, basically.

It does poorly on creative concepts as well.

I attempted to explore the works of Kinoko Nasu/TYPE-MOON through their characters and the relationships across works, and it was mostly nonsense. Sure, it got some broad relations correct, but it presented only a tiny set of meaningful characters and only touched on Fate/stay night and Tsukihime.

Even more damning was that it produced garbled text in a few of the textual representations, and often, even when the lettering was clean, the grammar was off.

To be fair, disentangling even just the Fate series is nearly impossible, even for humans.

Now that you mention it, I didn't try "Metal Gear". Now that would be a ride.

Do we ever simply accept that LLMs weren't made for this kind of detail-oriented work? I can't imagine something like this ever being anything other than a toy which can't be trusted.

Will Silicon Valley executives ever accept this reality? If we acquiesce and admit that LLMs are a good tool for prototyping and boilerplate reduction, but not for finished products, is that when the bubble finally bursts?

I think the unfortunate fact is that most jobs in the world do not require accuracy, so an inaccurate result has a negligible impact over an accurate one.

I used to feel job security in the knowledge that AI labs weren't likely to solve the hallucination problem. Then it dawned on me that they don't need to: they just need to lower our collective expectations.

I had a tab on nuclear reactors open, so I typed in "Pressurized Water Reactor", and the result, while very visually appealing, is completely nonsensical (it connected the high- and low-pressure coolant loops together) and would definitely explode.

https://imgur.com/a/DEb3oD4

I also replied because I asked it about a Mac Pro case I had right in front of me. Mostly right words, totally wrong visuals. And while I see what you mean by "the story of LLMs", I often ask LLMs about things I know, and for the last 12 months they've been pretty dang accurate. This AI visual example is the strongest "it's just guessing" case I've seen in years. For a demo, it's still pretty cool, though. Not sure why OP exaggerated, or maybe he simply doesn't know his car as well as he thinks he does.

Does it make sense that maybe it has a model of the vehicle it can pull from its corpus wholesale, but then the “guess the next letter” portion takes over for labeling and just guesses poorly?

I have a Mac Pro 5,1 taken apart on my desk right in front of me. I asked it for a diagram of the 5,1 internals. While it was Mac Pro-ish looking, it was wrong about every visual element. The text fields were right at first glance. Every click I made was basically all wrong too. Visually it looked cool, but it's actually the first time I've seen AI be constantly wrong since maybe 2023.

I queried "your mom" and it created a historical social timeline of motherhood superimposed with a placenta. I approve.

Since Ecco the Dolphin just had two remasters and a new game announced, I decided to ask it to show me a map of the first stage of Tides of Time. Should be easy: it just has to search for it and then generate something from that. The stage is mostly empty, too: just an open area, then a large opening with an upward current that leads to a separate bay with a warp ring. Three spaces, some dolphins and a circle.

It made a diagram that has absolutely nothing to do with the actual stage, not even close, and told me a whole slew of completely wrong information. It shows a pod of dolphins that teach you to dash attack (you know it by default). It shows a power sonar crystal (sonar is a default ability; there is a "power" sonar, I guess, but it is not obtained from crystals, and while the game features crystals, there are none until level 3, and they look nothing like the diagram's). It shows air pockets... which are just bubbles (in the game there are actually air-refilling bubbles, but "air pockets" would refer to a small bit of open air in an underwater tunnel, like, the actual, you know, real-life geological feature). There are some medusas far off in the background of the image (they're yellow; the ones in the game are clear, and they also don't appear until later levels). An exit cave leads to the Sea of Silence (an actual stage, but from the wrong game). A random cave says "Health source" (???? You do heal by eating fish, but???). There is no warp ring.

So basically, the ONLY correct elements in the diagram are the presence of dolphins and the fact that the diagram is labeled "Home Bay". Every other element in it is wrong, and would be wrong for every iteration of Home Bay.

For a visual search tool, this sucks at visuals.

Interesting! To join the cavalcade of others sharing their experiences:

I first asked it "how big are geckos". It gave me a cool comparison diagram between three gecko extremes (leachianus, Jaragua dwarf gecko, and leopard gecko, if you're curious). The info all looked correct. Drilling into the Jaragua brought me to a less impressive page with utter gibberish text and duplicated info boxes. So it goes. I drilled further, but those were more esoteric topics I'm less versed in (lamellar setae), so I can't evaluate the accuracy without further research.

I also gave it something broader: "tokay gecko". More duplicate info boxes, and for some reason it "drew" two geckos on top of each other. Kind of cute, but tokays are extremely territorial, so happy cohabitation isn't their default (though it's not unheard of).

Still, despite the issues, I thought it was very neat.
