upvote
If you want the model to have function calls available, you need to run it in an agentic harness that can do the proper sandboxing etc. to keep things safe, and provide the tool spec and syntax in your system prompt. This is true of any model: AI inference on its own can only guess, it cannot do exact compute.
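As a rough illustration of what such a harness does (all names here are hypothetical, not any particular harness's API): it parses a tool call the model emits, executes it in a sandbox, and feeds the result back, so arithmetic comes from real execution rather than the model's guess:

```python
import json
import subprocess

# Hypothetical sketch of a harness's tool loop: the model emits a tool call
# as JSON, the harness executes it and returns the output to the model.
def run_tool(call: dict) -> str:
    if call["name"] == "python":
        # A real harness would run this inside a proper sandbox
        # (container, jail, seccomp, ...), not bare subprocess.
        proc = subprocess.run(
            ["python3", "-c", call["arguments"]["code"]],
            capture_output=True, text=True, timeout=10,
        )
        return proc.stdout.strip() or proc.stderr.strip()
    return f"unknown tool: {call['name']}"

# Simulated model output requesting exact arithmetic:
model_output = json.dumps({
    "name": "python",
    "arguments": {"code": "print(1775060800 % 3600)"},
})
print(run_tool(json.loads(model_output)))  # prints 1600: exact compute, not a guess
```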
reply
Thanks, I am very new to this and just run models in LM Studio. I think it would be very useful to have a system prompt telling the model to write Python scripts for the calculations LLMs are particularly bad at, and a harness to run those scripts. Can you recommend a harness that you like to use? I suppose the safety of these solutions is its own can of worms, but I am willing to try it.
reply
I use Claude Code. Codex and Opencode both work too. You could even do it with VS Code Copilot.
reply
These are typically coding oriented as opposed to general chat, so their system prompts may be needlessly heavy for that use case. I think the closest thing to a general solution is the emerging "claw" ecosystem, as silly as that sounds. Some of the newer "claws" do provide proper sandboxing.
reply
The date command is not wrong; it works with GNU date. If you are on macOS, try running gdate instead (if it is installed):

  gdate -u -d @1775060800
To install gdate and GNU coreutils:

  brew install coreutils
The date command still prints the incorrect value:

  Wed Apr 1 16:26:40 UTC 2026
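As a quick cross-check (a sketch, assuming Python 3 is available), converting that epoch value directly gives the same instant, which suggests date/gdate is faithfully rendering the timestamp and any 1600-second error is in the timestamp itself:

```python
from datetime import datetime, timezone

# Convert the epoch seconds from the thread into a UTC datetime.
dt = datetime.fromtimestamp(1775060800, tz=timezone.utc)
print(dt)  # 2026-04-01 16:26:40+00:00, matching what date/gdate print
```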
reply
Good catch; I just ran it verbatim in iTerm2 on macOS:

date -u -d @1775060800

date: illegal option -- d

Btw, how do you format commands in an HN comment correctly?

reply
Start the line indented with two or more spaces [1].

[1]: https://news.ycombinator.com/formatdoc

reply
The last paragraph made me chuckle.
reply
Given the working script, I don't follow how a broken verification step is supposed to lead to it being off by 1600 seconds.
reply
The model didn't run the script. As pointed out by @zozbot234 in another response, it would need to be run in an agentic harness. This prompt was executed in LM Studio, so it was just inference.
reply
I'm curious what the thinking trace looked like. Interesting that it can get that close to the answer yet still be off.
reply
[dead]
reply
deleted
reply