I use HugstonOne (its backend is a customized version of llama.cpp). It implements its own two-layer memory. The first layer recalls the full or partial previous session/file and has an ON/OFF switch (it picks up where you left off in the CLI, the server, or both at the same time). The second layer, used when the memory switch is off, reads back a percentage of the current tab: it makes a checkpoint every certain number of tokens, summarizes it, and refers back to it when needed (recalled by certain logic). There is more to it once local RAG is involved (making it a triple memory layer), but that's a long story.
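
To make the checkpoint layer concrete, here is a minimal sketch of the idea in Python. This is not HugstonOne's code: the class name, the whitespace "tokenizer", the first-words summarizer, and the keyword-overlap recall are all my assumptions, standing in for whatever the real tool does.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def naive_summary(text: str, max_words: int = 12) -> str:
    """Placeholder summarizer (assumption): keep the first few words.
    A real setup would ask the local model for a summary instead."""
    return " ".join(text.split()[:max_words])


@dataclass
class CheckpointMemory:
    """Hypothetical checkpoint memory: summarize every N tokens, recall on demand."""
    checkpoint_every: int = 50                      # tokens between checkpoints (assumed value)
    summarize: Callable[[str], str] = naive_summary
    buffer: List[str] = field(default_factory=list)       # tokens since the last checkpoint
    checkpoints: List[str] = field(default_factory=list)  # stored summaries

    def add(self, text: str) -> None:
        # Whitespace splitting stands in for a real tokenizer.
        for tok in text.split():
            self.buffer.append(tok)
            if len(self.buffer) >= self.checkpoint_every:
                self.checkpoints.append(self.summarize(" ".join(self.buffer)))
                self.buffer.clear()

    def recall(self, query: str) -> List[str]:
        # The recall logic is not specified in the comment above;
        # simple keyword overlap is used here as a stand-in.
        wanted = {w.lower() for w in query.split()}
        return [c for c in self.checkpoints
                if wanted & {w.lower() for w in c.split()}]


mem = CheckpointMemory(checkpoint_every=4)
mem.add("alpha beta gamma delta epsilon zeta")
print(mem.checkpoints)            # one checkpoint covering the first four tokens
print(mem.recall("what about GAMMA"))
```

The point is only the shape: a rolling buffer, a periodic summary, and a lookup step that decides which summaries to feed back into the context.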

As for the harness, it depends on what you need it for, but basically, as a universal unit of measure, a harness is multilayered and depends on logic and domain. I would definitely include type of hardware, model parameters/knowledge, model intelligence, model size/context, conversion type, and quantization type (models come with some default tools), and then add your own domain-specific skills, tools, memory, logs, security, RAG, online search... (which, as scary as they sound, are mostly simple logic in a txt file, like "if this, do that").
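
To show how small "simple logic in a txt file" can be, here is a rough Python sketch. The line format `if <trigger> do <action>` and the action names are invented for illustration; no real tool's rule format is implied.

```python
from typing import List, Tuple

# Hypothetical rules file content; the format is an assumption for this sketch.
RULES_TXT = """
# lines starting with '#' are comments
if search the web do online_search
if remember do save_memory
this line is malformed and gets skipped
"""


def parse_rules(text: str) -> List[Tuple[str, str]]:
    """Parse 'if <trigger> do <action>' lines into (trigger, action) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        head, sep, action = line.partition(" do ")
        if not sep or not head.lower().startswith("if "):
            continue  # ignore anything that does not match the assumed format
        rules.append((head[3:].strip().lower(), action.strip()))
    return rules


def actions_for(message: str, rules: List[Tuple[str, str]]) -> List[str]:
    """Return the actions whose trigger text appears in the message."""
    msg = message.lower()
    return [action for trigger, action in rules if trigger in msg]


rules = parse_rules(RULES_TXT)
print(actions_for("Please remember this for later", rules))
```

Each skill, tool, or memory hook can then be one more line in the file, which is roughly what I mean by "if this, do that".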

The full pack is Harness 10; every missing item lowers the harness score.
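
As a rough illustration of the scoring, here is a Python sketch that assumes every item counts equally and the scale is linear from 0 to 10. The equal weighting and the component names are my assumptions; a real score would probably weight hardware and model quality more heavily.

```python
from typing import Iterable

# Component names taken from the list above; treating them as equally
# weighted is an assumption made only for this sketch.
COMPONENTS = [
    "hardware", "model_knowledge", "model_intelligence", "size_context",
    "conversion", "quantization", "skills", "tools", "memory", "logs",
    "security", "rag", "online_search",
]


def harness_score(present: Iterable[str]) -> float:
    """Score from 0 to 10: full marks only when every component is present."""
    have = set(present)
    unknown = have - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return round(10 * len(have) / len(COMPONENTS), 1)


print(harness_score(COMPONENTS))                                     # full pack
print(harness_score(c for c in COMPONENTS if c != "online_search"))  # one item missing
```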

To answer your question, I would definitely recommend something like HugstonOne (or at least the llama.cpp CLI) with a finetuned/distilled Qwen 3.6 35B (DeepSeek 4 or Claude 4.7), and none of the current coding agents out there that are screaming for an internet connection, a proprietary API, and data collection. Do this: if you can find a tool that you can download, point at a local model of your choice (in whatever local folder), and load ready for inference without any need for an internet connection, that is the tool you should aim for. Right now there is none out there.
