I've been thinking lately about a self-verification method built on this principle. Any specific-enough claim, e.g. "the Marathon crater was discovered by …", can be reformulated as a Jeopardy-style prompt, "This crater was discovered by …", and you can then see whether the answer fails to match. You need some raw intelligence to break the claim down, though.
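A rough sketch of that round trip, in Python. Everything here is an assumption for illustration: `mask_claim`, `round_trip_matches`, and the `ask` callable (standing in for whatever model call you use) are hypothetical names, and the substring match is a deliberately crude stand-in for the "raw intelligence" step.

```python
from typing import Callable


def mask_claim(claim: str, entity: str, placeholder: str = "This crater") -> str:
    """Turn a specific claim into a Jeopardy-style clue by hiding the entity."""
    if entity not in claim:
        raise ValueError("entity must appear verbatim in the claim")
    # Replace only the first occurrence so the rest of the claim stays intact.
    return claim.replace(entity, placeholder, 1)


def round_trip_matches(claim: str, entity: str, ask: Callable[[str], str]) -> bool:
    """Ask the model to name the hidden entity and check it against the original.

    `ask` is a hypothetical callable: prompt in, answer text out.
    """
    clue = mask_claim(claim, entity)
    guess = ask(f"{clue} Which one is it?")
    # Crude comparison (assumption): case-insensitive substring match.
    return entity.lower() in guess.lower()
```

If `round_trip_matches` returns False, the original claim and the model's own answer to the clue disagree, which is a signal to go check, not proof of which side is wrong.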

by Night_Thastus 3 days ago

Without checking every answer it gives back to make sure it's factual, you may be ingesting tons of bullshit answers.

For this particular question, model A may get it wrong and model B may get it right, but that can be reversed for another question.

What do you do at that point? Pay to use all of them and find what's common in the answers? That won't work if most of them are wrong, like for this example.
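For what it's worth, the "find what's common" idea can be sketched as a simple majority vote. This is a minimal illustration, assuming the answers are short strings you can normalize; `consensus` and `min_share` are hypothetical names. It also shows the objection: the vote measures agreement, not correctness, so a wrong majority still wins.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def consensus(answers: Sequence[str], min_share: float = 0.5) -> Tuple[Optional[str], float]:
    """Return (majority answer or None, share of models that gave the top answer)."""
    # Normalize lightly so "Foo" and " foo " count as the same answer.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        return None, 0.0
    top, count = Counter(normalized).most_common(1)[0]
    share = count / len(normalized)
    # Only report a winner when it holds a strict majority.
    return (top if share > min_share else None), share
```

A high share only tells you the models agree with each other. If most of them learned the same wrong fact, the function will confidently return it.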

If you're going to have to fact check everything anyways...why bother using them in the first place?

by CamperBob2 3 days ago

> If you're going to have to fact check everything anyways...why bother using them in the first place?

"If you're going to have to put gas in the tank, change the oil, and deal with gloves and hearing protection, why bother using a chain saw in the first place?"

Tool use is something humans are good at, but it's rarely trivial to master, and not all humans are equally good at it. There's nothing new under that particular sun.

by Night_Thastus 3 days ago

The difference is consistency. You can read a manual and know exactly how to oil and refill the tank on a chainsaw. You can inspect the blades to see if they are worn. You can listen to it and hear how it runs. If a part goes bad, you can easily replace it. If it's having trouble, it will be obvious: it will cut wood more slowly or stop working altogether.

The situation with an LLM is completely different. There's no way to tell that it has given a wrong answer, aside from looking for the answer elsewhere, which defeats its purpose. It'd be like using a chainsaw all day and not knowing how much wood you cut, or whether it just stopped working in the middle of the day.

And even if you KNOW it has a wrong answer (in which case, why are you using it?), there's no clear way to 'fix' it. You can jiggle the prompt around, but that's not consistent or reliable. It may work for that prompt, but that won't help you with any subsequent ones.

by CamperBob2 3 days ago

The thing is, nothing you've said is untrue of any search engine or user-driven web site. Only a reckless moron would paste code they find on Stack Overflow or GitHub into their project without at least looking it over. Same with code written by LLMs. The difference is, just as the LLM can write unit tests to help you deal with uncertainty, it can also cross-check the output of other LLMs.
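As a minimal sketch of the unit-test side of that, assuming the generated code is a plain function and you already know a few input/output pairs: `failing_cases` is a hypothetical helper, not part of any testing library.

```python
from typing import Any, Callable, Iterable, List, Tuple


def failing_cases(
    candidate: Callable[..., Any],
    cases: Iterable[Tuple[tuple, Any]],
) -> List[Tuple[tuple, Any, Any]]:
    """Run a candidate function against known cases and collect the failures.

    Each case is (args, expected). Each failure is (args, expected, actual),
    where actual is the raised exception if the call blew up.
    """
    failures = []
    for args, expected in cases:
        try:
            actual = candidate(*args)
        except Exception as exc:  # a crash counts as a failure, not a stop
            failures.append((args, expected, exc))
            continue
        if actual != expected:
            failures.append((args, expected, actual))
    return failures
```

An empty list means the candidate survived the cases you thought of, nothing more. The cases themselves still have to come from somewhere you trust.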

You have to be careful when working with powerful tools. These tools are powerful enough to wreck your career as quickly as a chain saw can send you to the ER, so... have fun and be careful.

by skydhash 3 days ago

The nice thing about SO and GitHub is that there's little to no reason for things there not to work, at least in the context where you found the code. The steps are: get the context, assume it's true based on various indicators (mostly reputation), and then move on to understanding the snippet.

But with LLMs, every word is a draw from a probability distribution. Assuming the first paragraph is true tells you nothing about the rest.