The second question sounds like a useless and artificial metric to judge by. The average person might miss such a “gotcha” logic quiz too, for the same reason: they expect to be asked “is it walking distance.”
No one has ever relied on anyone else’s judgment, nor an AI’s, to answer “should I bring my car to the carwash.” Same for the ol’ “how many rocks shall I eat?” that people used to trick the AI Overview.
I’m not saying anything categorically “is AGI,” but by relying on jokes like this you’re lying to yourself about what’s relevant.
In my experience, they contain more information than any human, but they are actually quite stupid. Reasoning is not something they do well at all. But even setting that aside, they cannot learn. Inference is separate from training, so they cannot pick up anything new beyond working with whatever is in their context window, and even then they can only mimic rather than extrapolate anything new.
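To make that split concrete, here's a minimal PyTorch sketch (toy model and data, purely illustrative, not how any real LLM is wired): inference under no_grad leaves the weights untouched, and learning only happens through an explicit optimizer step that deployed models never run per request.

    import torch
    import torch.nn as nn

    # Toy stand-in for a model: its weights only change if an
    # optimizer explicitly updates them.
    model = nn.Linear(8, 8)
    x = torch.randn(1, 8)
    before = model.weight.clone()

    # "Inference": all a deployed model does per request.
    # No gradients, no weight updates -- nothing is learned.
    with torch.no_grad():
        _ = model(x)
    assert torch.equal(model.weight, before)  # weights unchanged

    # "Training": learning is a separate, explicit update step.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()
    assert not torch.equal(model.weight, before)  # weights changed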
It's not the lack of perfection; it's the lack of reasoning and learning.
I've seen a lot of reasoning in the latest models while doing agentic coding. They are often decent at debugging and experimentation, but around 30% of the time they go down wrong paths and just add unnecessary complexity through misdiagnosis.