This is true in a sense, but every little papercut at the lower levels of abstraction degrades performance at higher levels, as the LLM has to spend its effort hacking around jank in the Python interpreter instead of solving the real problem.
Is this mostly just for codemode where the MCP calls instead go through a Monty function call? Is it to do some quick maths or pre/post-processing to answer queries? Or maybe to implement CaMeL?
It feels like the power of terminal agents is partly because they can access the network/filesystem, and so sandboxed containers are a natural extension?
> Monty avoids the cost, latency, complexity and general faff of using full container based sandbox for running LLM generated code.
> Instead, it lets you safely run Python code written by an LLM embedded in your agent, with startup times measured in single digit microseconds not hundreds of milliseconds.
My models are writing code all day in three or four different languages, so why would I want to:
a) Restrict them to Python
b) Restrict them to a cut-down, less useful version of Python?
My models write me TypeScript and C# and Python all day with zero issues. Why do I need this?
Only if the training data has enough Python code that doesn't use classes.
(We're in luck that these things are trained on Stack Overflow code snippets.)
(This kind of extremely weak criticism often seems to come from newly created Hacker News accounts, which makes me wonder if it's mostly the same person using sockpuppets.)