At what $/Mtok would the savings of running stuff locally be worth your time?

Just to be clear, this may sound like a snarky comment, but I'm genuinely curious how you or others see it. I mean, there are some batched, long-running tasks where, ignoring electricity, it's kind of free, but local generation is usually slower (and lower quality), and we all want some stuff to actually get done.

Or is it not about the cost at all, just about not pushing your data into the cloud?
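To make the "ignoring electricity it's kind of free" part concrete, here's a rough back-of-the-envelope sketch of what local generation costs in electricity alone. All the numbers (GPU draw, electricity price, tokens/sec) are made-up placeholders, not measurements; plug in your own:

```python
POWER_W = 350      # assumed GPU draw under load (hypothetical)
PRICE_KWH = 0.30   # assumed electricity price, $/kWh (hypothetical)
TOK_PER_S = 40     # assumed local generation speed (hypothetical)

def local_cost_per_mtok(power_w, price_kwh, tok_per_s):
    """Electricity-only $ per million generated tokens."""
    seconds_per_mtok = 1_000_000 / tok_per_s
    kwh = (power_w / 1000) * (seconds_per_mtok / 3600)
    return kwh * price_kwh

print(round(local_cost_per_mtok(POWER_W, PRICE_KWH, TOK_PER_S), 2))  # → 0.73
```

So under these assumptions local inference runs well under a dollar per Mtok in electricity; the real comparison is that plus your time, hardware amortization, and the quality gap versus hosted models.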

reply
Good question. I agree with what I think you're implying, which is that local generation is not the right choice if you want to maximize results per time/$ spent. In my experience, models like Claude Opus 4.6 are just so effective that it's hard to justify using much else.

Nevertheless, I spend a lot of time with local models because of:

1. Pure engineering/academic curiosity. It's a blast to experiment with low-level settings, finetunes, LoRAs, etc. (I have a Cog Sci/ML/software eng background.)

2. Privacy. I prefer not to share my data with 3rd-party services, and it's nice not having to worry about accidentally pasting sensitive data into prompts (like personal health notes), about wasting $ on silly experiments, or about accidentally poisoning some stateful cross-session 'memories' linked to an account.

3. It's nice to be able to solve simple tasks without having to reason about any external 'side effects' outside my machine.

reply
For me it's a combination of privacy and wanting to be able to experiment as much as I want without limits. I'd happily take something that is 80% as good as SOTA but I can run it locally 24/7. I don't think there's anything out there yet that would 100% obviate my desire to at least occasionally fall back to e.g. Claude, but I think most of it could be done locally if I had infinite tokens to throw at it.
reply
It’s a hard problem. I’ve been working on it for the better part of a year.

Well, granted, my project is trying to do this in a way that works across multiple devices and supports multiple models, to find the best "quality" and the best allocation. That combination makes the search space blow up exponentially.

But “quality” is the hard part. In this case I’m just choosing the largest quants.

reply
Supporting all the various devices does sound quite challenging.

I wouldn't expect a perfect single measurement of "quality" to exist, but it seems like it could be approximated enough to at least be directionally useful. (e.g. comparing subsequent releases of the same model family)
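One minimal sketch of that "directionally useful" idea: score two releases of the same model family on a fixed prompt set and look at the win rate rather than any absolute quality number. The scores below are hypothetical placeholders; a real run would come from your own eval harness:

```python
# Hypothetical per-task scores for two releases on the same fixed prompt set.
old_model = [0.61, 0.55, 0.70, 0.48, 0.66]
new_model = [0.64, 0.53, 0.75, 0.50, 0.71]

def win_rate(a, b):
    """Fraction of tasks where b beats a: crude, but directional."""
    wins = sum(1 for x, y in zip(a, b) if y > x)
    return wins / len(a)

print(win_rate(old_model, new_model))  # 4 of 5 tasks improved → 0.8
```

It says nothing about quality in an absolute sense, but it's enough to answer "did this release move in the right direction on my workload".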

reply
LLMs are just special-purpose calculators, as opposed to normal calculators, which just do numbers and MUST be accurate. There aren't very good ways of knowing what you want, because the people making the models can't read your mind and have different goals.
reply