For example, applying diffs to files. Since the LLM reads and writes everything through tokenization, the diffs it produces to modify a file aren't always quite right: it may slightly mangle the context lines before/after the change, or introduce a small typo in the text being removed, so the edit may or may not apply cleanly. There are a variety of ways to deal with this, and most of the agentic coding tools have it mostly solved now (I guess you could just copy their implementation?).
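One common trick (just a sketch of the idea, not any particular tool's algorithm) is to try an exact string match first and fall back to a whitespace-tolerant match before giving up and asking the model to retry:

```python
def apply_edit(source: str, old: str, new: str) -> str:
    """Replace `old` with `new` in `source`, tolerating minor whitespace drift.

    Hypothetical sketch of the fallback logic, not any specific tool's
    diff-application code.
    """
    # 1. Happy path: the model reproduced the original text exactly.
    if old in source:
        return source.replace(old, new, 1)

    # 2. Fallback: match line by line, ignoring leading/trailing whitespace,
    #    so slightly mangled indentation or a stray trailing space still applies.
    src_lines = source.splitlines()
    old_lines = [line.strip() for line in old.splitlines()]
    for i in range(len(src_lines) - len(old_lines) + 1):
        window = [line.strip() for line in src_lines[i:i + len(old_lines)]]
        if window == old_lines:
            replaced = src_lines[:i] + new.splitlines() + src_lines[i + len(old_lines):]
            return "\n".join(replaced)

    # 3. Give up and surface the failure so the agent can re-read the file
    #    and produce a fresh edit.
    raise ValueError("edit did not apply cleanly")
```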
Also, sometimes the models will send back JSON or XML from tool calls that isn't valid, so your tool will need to handle that.
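The usual coping strategy (again, a rough sketch) is to try strict parsing first and then progressively clean up the common failure modes, like markdown fences wrapped around the payload or trailing prose:

```python
import json
import re

def parse_tool_json(raw: str) -> dict:
    """Best-effort parse of a model-produced JSON payload.

    Hypothetical helper: strict parse first, then strip markdown code
    fences, then fall back to the first {...} span it can find.
    """
    # Strict parse; most outputs are fine.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    # Strip ```json ... ``` fences the model sometimes wraps output in.
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass

    # Last resort: grab the outermost-looking {...} span and try that.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        return json.loads(match.group(0))

    raise ValueError("could not recover valid JSON from model output")
```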
These fun implementation details don't come up that often in a coding session, but they come up often enough that a tool which didn't handle them seamlessly would drive you mad if you're doing real work.
Start small, hit issues, fix them, add features, iterate, just like any other software.
There are also a handful of smaller open-source agentic tools out there that you can start from (or just join their communities) rather than writing your own.
ML-related stuff isn't going to matter a ton, since in most cases an LLM inference is just you making an API call.
Web scraping is probably the most similar thing.
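To make the "it's just an API call" point concrete, a single inference is roughly the request below; the endpoint URL, model name, and payload shape are placeholders, so check your provider's docs:

```python
import os
import requests

# Placeholder endpoint and payload shape; real providers differ slightly,
# but the pattern is the same: POST a message list, get text back.
API_URL = "https://api.example.com/v1/chat/completions"

def complete(prompt: str) -> str:
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['LLM_API_KEY']}"},
        json={
            "model": "some-model",
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

print(complete("Summarize this repo's README in one sentence."))
```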
It's probably not enough to have answer prompt -> tool call -> result critic -> apply or refine; there might be something specific they're doing when they fine-tune the loop to the model, or they might even train the model to improve the existing loop.
You would have to first look at their agent loop and then code it up from scratch.
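For reference, the bare-bones shape of that loop is something like the sketch below; `call_model` and `run_tool` are stand-ins for the LLM API call and local tool execution, and whatever extra tricks the vendors layer on top wouldn't show up here:

```python
# Hypothetical stand-ins: a real implementation would call an LLM API and
# execute actual tools (file edits, shell commands, etc.).
def call_model(messages: list[dict]) -> dict:
    return {"role": "assistant", "content": "done", "tool_call": None}

def run_tool(tool_call: dict) -> str:
    return "tool output"

def agent_loop(user_prompt: str, max_steps: int = 20) -> str:
    """Minimal loop: ask the model, run any tool it requests,
    feed the result back, and repeat until it returns a final answer."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = call_model(messages)            # one LLM API round-trip
        messages.append(reply)
        if reply.get("tool_call") is None:      # no tool requested: we're done
            return reply["content"]
        result = run_tool(reply["tool_call"])   # execute the requested tool
        messages.append({"role": "tool", "content": result})
    return "stopped: hit the step limit"
```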
edit: There's a tool, I haven't used it in forever, I think it was netsaint(?), that lets you sniff HTTPS in clear text with some kind of proxy. The enabling requirement is that you're sniffing traffic on localhost, IIRC, which would be the case with CC.
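Roughly, the setup would look like this, assuming the CLI honors the conventional proxy environment variables (I haven't verified that it does; the port, cert path, and variable names below are just the usual conventions, not anything specific to CC):

```python
import os
import subprocess

# Assumptions: the CLI respects HTTPS_PROXY, and (being a Node app) trusts an
# extra CA via NODE_EXTRA_CA_CERTS so the intercepting proxy can decrypt TLS.
env = dict(os.environ)
env["HTTPS_PROXY"] = "http://127.0.0.1:8080"          # local intercepting proxy
env["NODE_EXTRA_CA_CERTS"] = "/path/to/proxy-ca.pem"  # trust the proxy's CA

# Launch the CLI with the proxied environment.
subprocess.run(["claude", "--help"], env=env, check=True)
```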
You think a single person can do better? I don't think that's possible. Opencode is better than Claude Code, and they've also put in perhaps even more man-hours.
It's a collaboration thing, ever improving.