by bigstrat2003 24 minutes ago

Yeah, this is exactly my view. We've had several years of work on the tech, and LLMs are just as prone to randomly spitting out garbage as they were on the first day. They are not a tool fit for any serious work, because you need to be able to rely on your tools. A tool which is sometimes good and sometimes bad is worse than having no tool at all.

by dweinus 2 hours ago

I think there is a second reason people still type, and it's relevant to LLMs. Typing forces you to slow down and choose your words. When you want to edit, you are already typing, so it doesn't break the flow. In short, it fits the work in a way that speech-to-text doesn't.

LLMs create a new workflow wherever they are employed. Even if the LLM is capable, that new workflow is not always a more desirable or efficient experience.

by sadeshmukh 1 hour ago

I type faster than I think, and being able to edit gives typing the edge over speech-to-text. I don't believe the two are fundamentally comparable.

by johnfn 2 hours ago

I'm curious about the statement that hallucinations are "fundamentally unsolvable". I don't think an AI agent has left a hallucination in my code - by which I mean a reference to something which doesn't exist at all - in many months. I have had great luck driving hallucinations to effectively 0% by using a language with static typechecking, telling LLMs to iterate on type errors until there are none left, and of course having a robust unit and e2e test suite. I mean, sure, I run into other problems -- it does make logic errors at some rate, but those I would hardly categorize the same as hallucinations.

by bojan 1 hour ago

Maybe you're lucky. I had Opus 4.6 hallucinate a nonexistent configuration key in a well-known framework literally a few hours ago.

Granted, it fixed the problem in the very next prompt.

by johnfn 57 minutes ago

Couldn’t that problem be solved with static typechecking?
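
To make the question concrete, here is a minimal sketch of what I have in mind. It assumes the framework exposes its configuration as a typed object, which many don't; `ServerConfig`, its keys, and `unknownKeys` are hypothetical names for illustration, not taken from any real framework.

```typescript
// Hypothetical config shape; these key names are illustrative only.
interface ServerConfig {
  port: number;
  logLevel: "debug" | "info" | "error";
}

// A literal that uses only known keys typechecks.
const ok: ServerConfig = { port: 8080, logLevel: "info" };

// Uncommenting the next line should fail to compile: the object literal
// contains a key ("enableTurbo") that ServerConfig does not declare.
// const bad: ServerConfig = { port: 8080, logLevel: "info", enableTurbo: true };

// Config loaded from JSON or YAML never passes through the typechecker,
// so an invented key there needs a runtime check instead.
const KNOWN_KEYS: ReadonlySet<string> = new Set(["port", "logLevel"]);

function unknownKeys(raw: Record<string, unknown>): string[] {
  // Return every key that the (hypothetical) schema does not recognize.
  return Object.keys(raw).filter((key) => !KNOWN_KEYS.has(key));
}
```

The compile-time check only helps when the config is written as a typed literal. For a plain config file, something like `unknownKeys` (or a schema validator) would have to do the same job at load time.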

by bogzz 1 hour ago

ChatGPT 5.2 kept gaslighting me yesterday, telling me that LLMs were explainable with Shapley values. It kept referencing papers which mention both LLMs and SHAP, but which are actually about using LLMs to explain the SHAP values of other ML models.

I encounter stuff like this every week; I don't know how you don't. I suppose a well-structured codebase in a statically typed language might not provide as much of a surface for hallucinations to present themselves? But like you say, logical problems of course still occur.

by johnfn 56 minutes ago

I meant that code generation, specifically, is where I never see hallucinations. I suppose that was unclear.

by gambiting 1 hour ago

>> I don't think an AI agent has left a hallucination in my code

I literally just went on Gemini, latest and best model, and asked it "hey can you give me the best prices for 12TB hard drives available with the British retailer CeX?" and it went "sure, I just checked their live stock and here they are:". Every single one was made up. I pointed it out, and it said sorry, I just checked again, here they are, definitely 100% correct now. Again, all of them were made up. This repeated a few times until I accused it of lying. Then it went "you're right, I don't actually have the ability to check, so I just used products and values closest to what they should have in stock".

So yeah, hallucinations are still very much there and still very much feeding people garbage.

Not to mention I'm part of multiple FB groups for car enthusiasts, and the amount of AI misinformation that we have to correct daily is just staggering. I'm not talking about political stuff, just people copy-pasting responses from AI which confidently state that feature X exists or works in a certain way, when in reality it has never existed at all.

by johnfn 4 minutes ago

My comment was about code, not fact checking. That's why I said hallucinations were effectively a solved problem, provided you use static typechecking and tests.