undefined

points

[-]

yesterday I had to tell a frontier model to translate my code to tla+ to find a tricky cache invalidation bug which nothing could find - gpt 5.4, gemini 3.1, opus 4.6 all failed. translation took maybe 5 mins, the bug was found in seconds, total time to fix from idea to commit - about 15 mins.

if you can get a model to quickly translate a relevant subset of your code to lean to find tricky bugs and map lean fixes back to your codebase space, you've got yourself a huge unlock. (spoiler alert: you basically can, today)

by refulgentis3 hours ago|

parent|

[-]

Thanks for following up on this: I was really surprised by how much air this paeon to, idk, TDD, took out of the comments by getting off-topic.

Before you commented, I started poking at what you described for 15 minutes, then forget about it and fell asleep. Now I remembered, and I know it's viable and IIUC it's almost certainly going to make a big difference in my work practice moving forward. Cheers.

by rigorclaw7 hours ago|

parent|

prev|

[-]

[flagged]

by Paracompact18 hours ago|

prev|

[-]

I don't think he's referring to Lean specifically, but any sort of executable testing methodology. It removes the human in the loop in the confidence assurance story, or at least greatly reduces their labor. You cannot ever get such assurance just by saying, "Well this model seems really smart to me!" At best, you would wind up with AI-Jim.

(One way Lean or Rocq could help you directly, though, would be if you coded your program in it and then compiled it to C via their built-in support for it. Such is very difficult at the moment, however, and in the industry is mostly reserved for low-level, high-consequence systems.)

by trenchgun15 hours ago|

parent|

[-]

>Such is very difficult at the moment

What do you mean? It's a nice and simple language. Way easier to get started than OCaml or Haskell for example. And LLMs write programs in Lean4 with ease as well. Only issue is that there are not as many libraries (for software, for math proofs there is plenty).

But for example I worked with Claude Code and implemented a shell + most of unix coreutils in like a couple of hours. Claude did some simple proofs as well, but that part is obvs harder. But when the program is already in Lean4, you can start moving up the verification ladder up piece by piece.

by cjfd13 hours ago|

parent|

[-]

Well, if you do not need to care about performance everything can be extremely simple indeed. Let me show you some data structure in coq/rocq while switching off notations and diplaying low level content.

Require Import String.

Definition hello: string := "Hello world!".

Print hello.

hello = String (Ascii.Ascii false false false true false false true false) (String (Ascii.Ascii true false true false false true true false) (String (Ascii.Ascii false false true true false true true false) (String (Ascii.Ascii false false true true false true true false) (String (Ascii.Ascii true true true true false true true false) (String (Ascii.Ascii false false false false false true false false) (String (Ascii.Ascii true true true false true true true false) (String (Ascii.Ascii true true true true false true true false) (String (Ascii.Ascii false true false false true true true false) (String (Ascii.Ascii false false true true false true true false) (String (Ascii.Ascii false false true false false true true false) (String (Ascii.Ascii true false false false false true false false) EmptyString))))))))))) : string

by strongly-typed8 hours ago|

parent|

[-]

You know you could just define the verified specs in lean and if performance is a problem, use the lean spec to extract an interface and tests for a more performant language like rust. You could at least in theory use Lean as an orchestrator of verified interfaces.

by refulgentis18 hours ago|

parent|

prev|

[-]

But isn't that tantamount with "his comment is a complete non-sequitor"?

by Paracompact16 hours ago|

parent|

[-]

I don't think so? Lean is formal methods, so it makes sense to discuss the boons of formal and semiformal methods more generally.

I used to think that the only way we would be able to trust AI output would be by leaning heavily into proof-carrying code, but I've come to appreciate the other approaches as well.

by refulgentis5 hours ago|

parent|

[-]

But that's exactly my point. "It's natural to discuss the broader category" is doing a lot of heavy lifting here. The blog post is making a very specific claim: that formal proof, checked by Lean's kernel, is qualitatively different from testing, it lets you skip the human review loop entirely. cadamsdotcom's comment rounds that down to "executable specs good, markdown specs bad," which... sure, but that's been the TDD elevator pitch for 20 years.

If someone posted a breakthrough in cryptographic verification and the top comment was "yeah, unit tests are great," we'd all recognize that as missing the point. I don't think it's unrelated, I think it's almost related, which is worse, because it pattern-matches onto agreement while losing the actual insight.