Making a sentence like that requires deeply understanding a problem space to the point where such sentences emerge on their own, rather than any "craft" of writing.
So the craft is thinking through a topic, usually by writing about it; then deleting everything you've written, because you arrived at the self-evident position; and then writing from the vantage point of that self-evident statement.
I feel that writing is a personal craft: you must dig it out of yourself through practice rather than learn it from others. Using AI as a resource makes this much clearer to me. You must be confident in your own writing not because it follows best practices or others' techniques, but because it is the best version of your own voice at the time it was written.
> Yes, there is a relative scale level...
> Yes, having the smartest model will...
> Yes, Chinese AI companies have ...
Yes, yes, yes. I didn't say any of that, so why write in a way that insinuates I was thinking it?
I mean, it doesn't come off as AI slop, so that's a win in 2026. But why do you think it is so good?
I think he is referring to the art of refining an idea, though, and I do have something to say about that.
I prefer to run inference on my own HW, with a harness that I control, so I can choose for myself what compromise between speed and result quality is appropriate for my needs.
With complete control, and therefore predictable performance, I can work more efficiently, even with slower HW and somewhat inferior models, than when I am at the mercy of an external provider.
I have a few other computers with 64 GB of DRAM each and with NVIDIA, Intel, or AMD GPUs. Fortunately, all that memory was bought long ago; at today's prices I could not afford to buy more.
However, just last week I started working on modifying llama.cpp to allow optimized execution with weights stored on SSDs, e.g. using a couple of PCIe 5.0 SSDs, in order to use bigger models than those that fit within 128 GB, which is the limit of what I have tested so far.
By coincidence, this week there have been a few threads on HN reporting similar work on running big models locally with weights stored on SSDs, so I believe this will become more common in the near future.
The speeds reported so far for running from SSDs range from one token every few seconds to a few tokens per second. While such speeds would be too low for a chat application, they can be adequate for a coding assistant, if the improved code that is generated compensates for the lower speed.
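For context on how running from SSD works at all: llama.cpp already memory-maps the model file by default, so pages of weights are pulled from disk on first access and evicted under memory pressure; the optimization work is mostly about access patterns and read hints. Here is a minimal Python sketch of the idea, where the file name, the per-SSD bandwidth, and the bytes read per token are all illustrative assumptions, not measurements:

```python
import mmap
import os

WEIGHTS = "weights.gguf"  # hypothetical file name, for illustration only

# Map the weights file read-only: pages are loaded from the SSD on first
# access and evicted by the kernel under memory pressure, so a model
# larger than RAM stays usable.
fd = os.open(WEIGHTS, os.O_RDONLY)
size = os.fstat(fd).st_size
weights = mmap.mmap(fd, size, prot=mmap.PROT_READ)

# Hint that access within a layer is roughly sequential, so kernel
# readahead can keep the SSD queues full (Linux-specific madvise flag).
weights.madvise(mmap.MADV_SEQUENTIAL)

# Back-of-the-envelope throughput bound: tokens/s <= SSD bandwidth
# divided by bytes read per token (every active weight is touched once).
ssd_bandwidth_gbs = 2 * 12.0  # two PCIe 5.0 SSDs, ~12 GB/s each (assumed)
active_bytes_gb = 60.0        # active weights read per token (assumed)
print(f"upper bound: {ssd_bandwidth_gbs / active_bytes_gb:.2f} tokens/s")
```

With these assumed numbers the bound comes out to 0.4 tokens/s, consistent with the range reported above; MoE models read only the active experts per token, which is what makes this approach more attractive for them.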
I'm honestly surprised how many people have subscriptions and expect Anthropic to eat the cost lol
But your article is interesting. Do you think some of the degradation is because, when I think I'm using Opus, they're invisibly giving me Sonnet?
Maybe they're serving Sonnet, or a distilled Opus, or Opus with a reduced context; I'm not quite sure. But intelligence costs compute, so less intelligence means cheaper compute.
The cost of switching is too low for them to be able to get away with the standard enshittification playbook. It takes all of 5 minutes to get a Codex subscription and it works almost exactly the same, down to using the same commands for most actions.