undefined

points

[-]

How do you handle the problem of AI misleading by design? For example, Claude already lies on a regular basis specifically (and quite convincingly) in this case, in attempts to convince that what is actually broken isn't such a big deal after all or similar.

How can this product possibly improve the status quo of AI constantly, without end, trying to 'squeak things by' during any and all human and automated review processes? That is, you are giving the AI which already cheats like hell a massive finger on the scale to cheat harder. How does this not immediately make all related problems worse?

The bulk of difficulty in reviewing AI outputs is escaping the framing they never stop trying to apply. It's never just some code. It's always some code that is 'supposed to look like something', alongside a ton of convincing prose promising that it _really_ does do that thing and a bunch of reasons why checking the specific things that would tell you it doesn't isn't something you should do (hiding evidence, etc).

99% of the problem is that the AI already has too much control over presentation when it is motivated about the result of eval. How does giving AI more tools to frame things in a narrative form of its choice and telling you what to look at help? I'm at a loss.

The quantity of code has never been a problem. Or prose. It's that all of it is engineered to mislead / hide things in ways that require a ton of effort to detect. You can't trust it and there's no equivalent of a social cost of 'being caught bullshitting' like you have with real human coworkers. This product seems like it takes that problem and turns the dial to 11.

by cpan222 hours ago|

parent|

[-]

Thanks for sharing this, I do agree with a lot of what you said especially around trust around what its actually telling you

For me, I only run into problems of an agent misleading/lying to me when working on a large feature, where the agent has strong incentive to lie and pretend like the work is done. However, there doesn't seem to be this same incentive for a completely separate agent that is just generating a narrative of a pull request. Would love to hear what you think