It does poorly on creative concepts as well.

I attempted to explore the works of Kinoko Nasu/TYPE-MOON through its characters and the relationships across works, and it was mostly nonsense. Sure, it had some broad relations correct, but it presented a tiny set of meaningful characters and only attempted to touch Fate/stay night and Tsukihime.

Even more damning, it produced garbled text for a few of the textual representations, and often, even when the lettering was clean, the grammar was off.

reply
To be fair, disentangling even just the Fate series is nearly impossible even for humans
reply
Now that you mention it, I didn't try "Metal Gear". Now that would be a ride.
reply
Do we ever simply accept that LLMs weren't made for this kind of detail-oriented work? I can't imagine something like this ever being anything other than a toy which can't be trusted.

Will Silicon Valley executives ever accept this reality? If we acquiesce and admit that LLMs are a good tool for prototyping and boilerplate reduction, but not for finished products, is that when the bubble finally bursts?

reply
I think the unfortunate fact is that most jobs in the world do not require accuracy, so an inaccurate result makes a negligible difference compared with an accurate one.

I used to feel job safety in the knowledge that AI labs weren't likely to solve the hallucination problem. Then it dawned on me that they don't need to — they just need to reduce our collective expectations.

reply
I had a tab on nuclear reactors open, so I typed in "Pressurized Water Reactor". The result, while very visually appealing, is completely nonsensical (it connected the high- and low-pressure coolant loops together) and would definitely explode.

https://imgur.com/a/DEb3oD4

reply
I also replied because I asked it about a Mac Pro case I had right in front of me. Mostly right words, totally wrong visuals. And while I see what you mean by 'story of LLMs', I often ask LLMs about things I know, and for the last 12 months they've been pretty dang accurate. This AI visual example is the strongest 'it's just guessing' I've seen in years. For a demo, pretty cool still though. Not sure why OP exaggerated, or simply doesn't know his car as well as he thinks he does.
reply
Does it make sense that maybe it has a model of the vehicle it can pull from its corpus wholesale but then the “guess the next letter” portion takes over for labeling and just guesses poorly?
reply