1) Deterministic
- Using tree-sitter or a similar AST-based approach, I could extract types, functions, and possibly comments into an index map.
- Cons:
  - The tricky part is that the extracted content, comments in particular, can be quite large per file.
  - So I would probably still need an agentic synthesis step to condense those comments anyway.
2) Agentic
- Since Flash is dirt cheap, I wanted to experiment: skip #1 and go directly to #2.
- Because my tool is built for concurrency, it is very fast with concurrency set to 32.
- The cost is relatively low: perhaps $1 or $2 for 50k LOC, in 60 to 90 seconds of wall-clock time, which amounts to about 30 to 45 minutes of sequential AI work.
- The output is fairly consistent in size from file to file, and it takes just one round trip per file.
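The per-file fan-out can be sketched with `asyncio`, assuming one model call per file with at most 32 calls in flight, enforced by a semaphore. `summarize_file` is a hypothetical placeholder that sleeps instead of calling a real model API, and `build_index` is likewise an illustrative name, not the tool's actual code.

```python
import asyncio

async def summarize_file(path: str, source: str) -> str:
    # Hypothetical stand-in for a single LLM call per file; a real version
    # would send `source` to the model and return its summary.
    await asyncio.sleep(0.01)
    return f"{path}: {len(source.splitlines())} lines"

async def build_index(files: dict[str, str], concurrency: int = 32) -> dict[str, str]:
    sem = asyncio.Semaphore(concurrency)

    async def worker(path: str, source: str) -> tuple[str, str]:
        async with sem:  # at most `concurrency` calls in flight at once
            return path, await summarize_file(path, source)

    # One round trip per file, all scheduled together and throttled by the semaphore.
    results = await asyncio.gather(*(worker(p, s) for p, s in files.items()))
    return dict(results)
```

With 32 calls running at once, 60 to 90 seconds of wall-clock time corresponds to roughly 32 to 48 minutes of sequential calls, the same range as the 30 to 45 minute estimate.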
So this is why I started with #2. And the results in real coding scenarios have been astonishing, way above what I expected.
Combining those indexes with the user prompt picks the right files 95% of the time, with surprisingly high quality.
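The section does not say exactly how the indexes are combined with the user prompt. One plausible shape, shown purely as a hypothetical sketch, is to list every file's summary next to the request, ask the model for the relevant paths, and keep only paths that really exist in the index. `build_selection_prompt`, `parse_selected_files`, and the prompt wording are all assumptions, and the model call itself is omitted.

```python
def build_selection_prompt(index: dict[str, str], user_prompt: str) -> str:
    # Hypothetical prompt format: one "- path: summary" line per indexed file.
    lines = [f"- {path}: {summary}" for path, summary in sorted(index.items())]
    return (
        "Here is an index of the repository, one file per line:\n"
        + "\n".join(lines)
        + "\n\nTask:\n"
        + user_prompt
        + "\n\nList the file paths needed for this task, one per line."
    )

def parse_selected_files(reply: str, index: dict[str, str]) -> list[str]:
    # Keep only paths present in the index, dropping duplicates and invented paths.
    selected: list[str] = []
    for line in reply.splitlines():
        path = line.strip().lstrip("-* ").strip()  # tolerate bullet markers
        if path in index and path not in selected:
            selected.append(path)
    return selected
```

Filtering the reply against the index is a cheap guard: a path the model invents simply never reaches the coding step.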
So I might add deterministic aspects to the pipeline later, but since I think I will need the agentic step anyway, I have deprioritized them.