Hey, Boris from the CC team here. I agree, we're working on consolidating these. Going forward it will just be the built-in /code-review skill.

Here's how to use the skill on the latest version:

/code-review # do a balanced code review: checks for bugs and inconsistencies, poor code quality, duplication, band-aids, etc.

/code-review --fix # same as above, but also fix the issues

# choose an explicit effort level (defaults to your current effort level). all of these also accept --fix:

/code-review low

/code-review medium

/code-review high

/code-review xhigh

/code-review max

# do an expensive and extremely thorough review (reliably catches >99% of bugs, costs $3-20 per review depending on complexity):

/code-review ultra
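
To make the shape of these invocations concrete, here is a rough Python sketch of how such a line decomposes into an optional effort level, the --fix flag, and any remaining text. The `ReviewRequest` type and `parse_invocation` function are hypothetical names made up for illustration. This is not how Claude Code parses the command, and passing trailing text through as free-form input is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Effort levels as listed above; "ultra" is the expensive, thorough mode.
LEVELS = {"low", "medium", "high", "xhigh", "max", "ultra"}


@dataclass
class ReviewRequest:
    level: Optional[str]  # None means "use the current effort level"
    fix: bool             # True when --fix was given
    extra: str            # remaining free-form text (assumed pass-through)


def parse_invocation(line: str) -> ReviewRequest:
    """Hypothetical parser: split a '/code-review ...' line into its parts."""
    body = line.split("#", 1)[0]  # drop a trailing '# ...' annotation
    tokens = body.split()
    if not tokens or tokens[0] != "/code-review":
        raise ValueError("not a /code-review invocation")
    level, fix, rest = None, False, []
    for tok in tokens[1:]:
        if tok == "--fix":
            fix = True
        elif level is None and not rest and tok in LEVELS:
            level = tok  # only the first leading word can be a level
        else:
            rest.append(tok)
    return ReviewRequest(level, fix, " ".join(rest))


if __name__ == "__main__":
    print(parse_invocation("/code-review xhigh --fix"))
    print(parse_invocation("/code-review # do a balanced code review"))
```

Read this way, the effort level is the only positional argument and --fix is the only flag shown here.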

Open to feedback or ideas for how to make these even nicer to use.

reply
Hi Boris, what is the advantage of using /code-review vs just asking Opus to “code review”?

As a casual user working on hobby projects, I struggle to keep up with the pace of changes and to know what to use when. My default now is to use Opus for all coding (Sonnet is fine but seems dumber) and to prompt it for everything I need. I’ve had great success with this, but clearly I’m missing power-user functions like the slash commands.

reply
The advantage is that /code-review supplies a structured idea of how to review and what that process should look like and then launches independent subagents to approach the issue from multiple angles.

It's analogous to how in the early days you could see benefits by telling the models to "think step by step". /code-review is something like "review angle by angle". "Consider removed behavior" and also "Look at language gotchas" and also "Look at test changes"...etc. Yes these are all somewhat implicitly already part of what "code review" means, but the models perform best with explicitness.
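
A minimal Python sketch of the "review angle by angle" idea, assuming nothing about the real skill beyond the three angles quoted above. `build_prompt`, `fake_reviewer`, and `review_by_angle` are hypothetical names, and `fake_reviewer` is a stand-in for an independent subagent call rather than a real API.

```python
from concurrent.futures import ThreadPoolExecutor

# Angles quoted in the comment above; the real skill's full list is not shown here.
ANGLES = [
    "Consider removed behavior",
    "Look at language gotchas",
    "Look at test changes",
]


def build_prompt(angle: str, diff: str) -> str:
    """Build one narrowly scoped review prompt for a single angle."""
    return f"Review this diff. {angle}. Report only concrete issues.\n\n{diff}"


def fake_reviewer(prompt: str) -> str:
    """Hypothetical stand-in for an independent subagent (no real API call)."""
    first_line = prompt.splitlines()[0]
    return f"[no issues found] {first_line}"


def review_by_angle(diff: str, reviewer=fake_reviewer) -> dict:
    """Run one independent review per angle and collect the findings."""
    prompts = [build_prompt(angle, diff) for angle in ANGLES]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(reviewer, prompts))  # map preserves order
    return dict(zip(ANGLES, results))


if __name__ == "__main__":
    for angle, finding in review_by_angle("- old_call()\n+ new_call()").items():
        print(f"{angle}: {finding}")
```

Each reviewer sees only its own prompt and the diff, so no angle inherits assumptions from another.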

If you want my 2c as a power user: just don't think about it and use /code-review xhigh --fix. This will cover like 98% of what you want out of code review. It's a good skill.

reply
We've all spent time -fixing someone's bright idea of a -fix. I'm sceptical of the time saving of applying a -fix before I understand the problem(s).

Outsourcing comprehension to a machine is probably gonna cost you more time in the long run.

reply
I don't even bother looking at the code until I've run a code review pass on it. Why waste my time with trivial bug fixes? I find the best way to spend time right now is like:

- Defining the issue/ticket, what "success" looks like (if I have a good idea of this), and high-level approach guidance: 50%

- Dispatching an agent to work on it: 5%

- Occasionally returning to nudge the agent and send /simplify or /code-review: 5%

- Looking at the code/session summary and divergences from the plan, and asking follow-up questions: 40%

Occasionally, yes, the AI chooses a suboptimal solution that I would prefer fixed in a different way. Mostly, though, it's straightforward.

reply
Thank you, I will try this!

Is there something equivalent when coding in the first place? E.g. /code high “prompt”

reply
Are you thinking of the /effort level in Claude Code? I would just go with xhigh as a reasonable default. The most important thing in prompting is specifying what "done" and "success" look like to you. Ask Claude to help you come up with a well-formed request and spend most of your time on that, then paste that into a brand new session.
reply
As a general rule, I'd give the Markdown a read for any skills/commands you might find useful; it'll give you a good idea of the specifics they add.

https://github.com/anthropics/claude-code/blob/main/plugins/...

reply
/code-review has a specific prompt that we've found is a good balance of precision, recall, and cost. You could totally roll your own prompt also.
reply
And why would someone use the various levels? Is a low code review even worth running? And how do I know what level to use in the first place?

This stuff all seems so nebulous to me and I’ve yet to see anything that says use x in y situation. So I default to higher effort levels than I likely need.

reply
> # do an expensive and extremely thorough review (reliably catches >99% of bugs, costs $3-20 per review depending on complexity):

/code-review ultra

main suggestion would be to sound a lot less optimistic about the claim that it finds 99% of bugs or that it's at all thorough, and instead state that it is time-capped and will only find bugs that you explicitly tell it to look for.

i used my three runs of ultrareview.

the first run, with no other prompting, found a couple of typos in markdown only.

the second one i prompted with several themes of known open bugs in the code, and it found 6 items.

the third one i ran after doing an actual long audit through gemini to make a much more detailed prompt about issues in the code. instead of doing an exhaustive run, it just never started, so i have no idea if it would have worked.

the experience bore no relation at all to the reliability or thoroughness claims.

reply
Hey Boris, thanks for the great product and for listening!

I find the mix between slash commands that are programmatic harness configuration and control commands (/config, /model, /feedback, /fork, /usage, etc.) and ones that are little more than prompt template insertion (/code-review, /<skill>, etc.) to be a little confusing and unnecessary. A slash command should be one thing, and one thing only: a command for the harness, not the agent.

When I invoke a slash command like /code-review, I should be invoking some additional harness functionality, something above and beyond the agent's sphere of influence - not just pasting some hidden text into the next turn. Otherwise, why wouldn't I just say "Claude, review this code"?

Yet most of these "added value" commands bloating the slash command list are just shortcuts for copy and paste. I don't want to have to learn the syntax of a special /code-review command (which options are positional args, which are --flags, etc.), and I'm much less likely to use or even be aware of a command like this when I can just ask "Do a balanced code review and fix the issues", or use the GUI to set the effort level to xhigh before asking "Review my code." That way I can also be more specific about exactly what I need, rather than relying on what's in the canned prompt - a prompt which I'll probably never read and vet myself anyway. The value added by the slash command needs to be really high compared to just typing a prompt for it to justify the friction of discovery and learning the syntax.

So I suppose I'm advocating for a different system. Keep slash commands for meta-level harness control and configuration, and add a new mechanism for canned prompt insertion, one which is tailor made for that purpose rather than overloading the slash command system. Let the user see what's in the canned prompts, and even make adjustments or edits as needed before sending them, one-time or persisted. Provide a GUI in the app with the user's favorite prompts, where the user can add, delete, and edit them, making it easy to invoke and insert them as needed. Or let the agent automatically discover and use them as needed, rather than requiring the user to remember and recall their magic shortcuts and their arguments. That's just one idea.

Skills, plugins, commands, and so on need to be consolidated, not just for code review of course, but across the full architecture of how prompt templates are managed.

reply
Hey Boris, some feedback. I like the new /code-review skill but was disappointed you guys removed /simplify because I quite liked the focus on finding code reuse/efficiency opportunities.

I see now in 2.1.152 you added those focus areas back to /code-review, but still bundled with the correctness findings. It would be great to have more fine-grained control over the /code-review angles beyond just effort level. Or would you recommend that I just specify that as free-form input after the effort level?

reply
Yep, you can add free-form input. Will update /simplify to only check for code quality and not bugs (the way it used to work), that's a good suggestion.
reply
> reliably catches >99% of bugs

In what scope?

reply
Thanks, Boris, for reading and reviewing :)
reply
> They are all just variations of "insert a canned prompt", varying only along the dimensions of (a) how and where the prompt is installed and from where it is sourced, and (b) which context or contexts the prompt runs in. There's not much advice here about which option is best, and no clear best practices seem to have emerged yet either. Personally, I find just asking Claude to review the code works well enough.

The subagent approach is structurally different from the others because it runs with clean context. That has three major effects:

1. All other things being equal, it will result in a lower cost-to-solution because of the quadratic cost scaling of an LLM session (input token or cached-input cost being paid with each new round).

2. The review model will not be able to 'cheat' by retaining assumptions from the main session, such as "x must be done like y." For people, this is why having a separate person perform code review (or, if not possible, reviewing code after a mind-clearing break) is handy; the applicability of this analogy to LLMs is vague but reasonable.

3. The main model will only see the results of the review, not the detailed reasoning that leads up to it. On one hand this avoids more context pollution, but on the other hand it might lead to duplicative logic to re-discover the mechanics behind bugs found.
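
To put rough numbers on the first point, here is a back-of-the-envelope Python sketch. It assumes a deliberately simplified cost model where every round resends the whole history at full input price and adds a fixed number of tokens. Real sessions have cache discounts, output-token costs, and a subagent that must re-read some code, so treat the figures as illustrative only; `cumulative_input_tokens` is a hypothetical helper.

```python
def cumulative_input_tokens(rounds: int, tokens_per_round: int) -> int:
    """Total input tokens paid when every round resends the whole history."""
    total = 0
    context = 0
    for _ in range(rounds):
        context += tokens_per_round  # the history grows linearly...
        total += context             # ...but all of it is paid for again each round
    return total


# One 40-round session versus a 30-round main session plus a
# 10-round review that starts from clean context.
one_long = cumulative_input_tokens(rounds=40, tokens_per_round=1000)
split = (cumulative_input_tokens(rounds=30, tokens_per_round=1000)
         + cumulative_input_tokens(rounds=10, tokens_per_round=1000))

print(one_long)  # 820000
print(split)     # 520000
```

Under this toy model the total grows with roughly the square of the round count, which is why moving the review rounds into a fresh context comes out cheaper.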

> I checked the session logs to see how often the agents were actually invoking the LSP tools. The answer was they had invoked them literally once the entire time.

I think the intent behind 'install a language server plugin' is that these tools should lint automatically after every edit, without waiting for an explicit call from the LLM.

reply
> The subagent approach is structurally different from the others because it runs with clean context.

Yes, and this is what I mean by "which context the prompt runs in". The subagent approach is different and has pros and cons, and it may in some situations be better (but perhaps not in others). On the other hand, I can also just create a new conversation and paste my own review prompt into it; then take the last turn's summary output and feed it back into my main conversation thread in the unusual event I would need to do so. Spawning a subagent is a convenient shortcut for this, but ultimately, it's the same thing.

> I think the intent behind 'install a language server plugin' is that these tools should lint automatically after every edit, without waiting for an explicit call from the LLM.

This is a great point and I had only checked my session logs for explicit tool calls. I went back and looked for diagnostics injected automatically by the harness after every edit, and whether the agent made use of them.

Claude: neither the Rust nor the Dart LSP ever inserted any diagnostic events, but Ty did. Across 627 sessions, ty-lsp injected diagnostics blocks in 186 sessions, with a total of 33 findings. Out of those 33, 32 were dismissed as unrelated (13) or pre-existing (19). Only 1 finding was acted upon. The model is in the habit of running the batch analysis tools (ruff, ty, cargo clippy, etc.) and prek anyway, so it would have caught that diagnostic regardless.

Codex: no diagnostic events were inserted by any of the LSPs.

So I won't be reinstalling those LSPs.
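
For scale, the Claude figures above work out as follows. This is plain arithmetic on the numbers already quoted, with nothing new measured, and the variable names are just labels for this sketch.

```python
sessions_total = 627
sessions_with_diagnostics = 186
findings = 33
dismissed_unrelated = 13
dismissed_preexisting = 19
acted_upon = 1

# The three outcome buckets should account for every finding.
assert dismissed_unrelated + dismissed_preexisting + acted_upon == findings

share_sessions = sessions_with_diagnostics / sessions_total
share_acted = acted_upon / findings

print(f"{share_sessions:.1%} of sessions had diagnostics injected")  # 29.7%
print(f"{share_acted:.1%} of findings were acted upon")              # 3.0%
```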

reply
I just consider this a temporary phase, because models are dumb and harnesses are not there yet.

When I need a code review I should just say “review it”. The model should figure out what plugins, skills, etc. to use.

reply
Why does it need plugins/skills for a code review? Claude will just "review it" if you ask it to, and if you have particular preferences, they can go in CLAUDE.md.
reply
Skills are effectively the same thing as asking it, just with more depth. So the skill is just a framework for a very precisely asked question. It often includes how you want Claude to respond, etc.

I’m not aware of anything fundamentally unique about skills or commands; they’re just more tokens to shape the LLM.

reply
Totally. You can do that now, and Claude will know to use /code-review.
reply
> They are all just variations of "insert a canned prompt", varying only along the dimensions of (a) how and where the prompt is installed and from where it is sourced, and (b) which context or contexts the prompt runs in.

Yes, yes, thank you, sometimes I feel like I'm taking crazy pills.

The industry and overall developer ecosystem have become absolutely mesmerized by the act of creating and popularizing little bits of protocol and machinery to dress up the act of inserting text into the machine. Yes, they're useful and provide some consistency, but I'm convinced that the main reason people like them so much is that they put a thin "I'm still a programmer wielding complicated tools that laypeople don't understand" coating over the fact that we're all just asking the AI nicely to do a thing.

reply
I imagine that the companies that earn money from input and output tokens really, really like excessive skills because of the sheer amount of potentially pointless constraints and instructions being sent back and forth ("don't store passwords as plaintext", "always check for syntax errors" and other obvious guidelines).
reply
My personal experience is the opposite. Lack of skills uses more tokens.
reply