Probably you're focused on coding agents? I bet someone could use that kind of hardware to filter snarky comments.
reply
Here is an example-- I'm running hermes + qwen3.6-27b on a workstation GPU (an older RTX A6000 which gets 55tok/s, though people run this model on more limited hardware).

A friend and I had previously worked on an entropy extraction scheme and he recently got around to making a writeup about our work: https://wuille.net/posts/binomial-randomness-extractors/

I instructed the agent to read the URL, implement the technique in C++ for 32-bit registers, then make a SIMD version that interleaves several extractors in parallel for better performance. It implemented it (not hard, since there was an implementation there that it read), then wrote more extensive tests. Then it vectorized it. It got confused a few times during debugging because the algorithm uses some number theory tricks so that overflows of intermediate products don't matter, and it was obviously trained on a lot of ordinary code where such overflows are usually fatal. I instructed it to comment the code explaining why the overflows are fine and had it continue, which mostly resolved its confusion.
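To make the overflow point concrete, here's a toy sketch in the same spirit (my own illustration, not the extractor from the writeup or anything the agent wrote): unsigned arithmetic in C++ is defined to wrap modulo 2^32, so when the algorithm only needs results mod 2^32, intermediate products that "overflow" are harmless.

    #include <cstdint>

    // Toy mixing step, NOT the real extractor: the constant and update rule
    // are made up for illustration. Unsigned overflow in C++ is defined
    // behavior -- the result simply wraps, i.e. is reduced mod 2^32.
    uint32_t mix_step(uint32_t state, uint32_t input) {
        // Arbitrary odd constant; odd multipliers are invertible mod 2^32,
        // so the multiply loses no information.
        const uint32_t MULT = 0x9E3779B1u;
        // The mathematical product state*MULT can exceed 32 bits. The wrap
        // to the low 32 bits is exactly the mod-2^32 reduction we want, not
        // a bug -- the kind of comment that stopped the agent from "fixing"
        // correct code.
        return state * MULT + input;
    }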

It successfully got the initial 12MB/s scalar implementation to about 48MB/s. Then I told it to keep optimizing until it reached 100MB/s. I came back the next day and it had stopped after 6 hours, when it achieved just over 100MB/s. Reading what it did: it went off looking at the disassembly, figured out what hardware it was running on, read microarchitectural timing tables online, made some better decisions, tried a lot of things that didn't work, etc. (And of course, the implementation is correct.)
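For flavor, the interleaving idea looks roughly like this (again a toy sketch with my made-up update rule, not the code it actually produced): several independent 32-bit extractor states advance in parallel, one per lane of an AVX2 register, and each lane wraps mod 2^32 on its own.

    #include <cstdint>
    #include <immintrin.h>  // AVX2 intrinsics; build with -mavx2

    // Toy sketch of lane-interleaving, not the real extractor: 8 independent
    // 32-bit states advance in parallel, one per AVX2 lane.
    void mix8(uint32_t state[8], const uint32_t input[8]) {
        const __m256i mult = _mm256_set1_epi32((int)0x9E3779B1u);
        __m256i s = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(state));
        __m256i x = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(input));
        // _mm256_mullo_epi32 keeps the low 32 bits of each 32x32 product,
        // i.e. the per-lane product mod 2^32 -- the same overflow-is-fine
        // reasoning as in the scalar code, applied to every lane at once.
        s = _mm256_add_epi32(_mm256_mullo_epi32(s, mult), x);
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(state), s);
    }

Interleaving independent streams like this is what lets the vector unit do eight extractors' worth of work per instruction.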

I'm pretty skeptical about AI and borderline hateful of many people who (ab)use it and are deluded by it-- but I think this experience shows that a small local model can be objectively useful.

(Oh, and this experience was also while I only had the model running at 19tok/s.)

Running the model in a loop where it can get feedback from actually testing things allows it to make progress in spite of making many mistakes.

I could have done this work myself, but I didn't have to, and I certainly spent less time checking in and prodding it than it would have taken me to do it. In my case I wondered how much faster parallel extractors using SIMD might be-- an idle curiosity that would have gone unanswered if not for the AI.

reply
This is maybe the first time I've seen someone claim to do something useful with such a small model.

Congrats, but you're in the 0.0001% that's not just frying their brains, fapping to their local models, or doing various magic tricks like a toddler entertained by playing with velcro.

At the end of the day you lost an opportunity to improve yourself and exercise your brain. Maybe the opportunity cost is worth it, idk, but I'm going to keep taking things slow.

Handmade Swiss watches > mass-manufactured imitations. Handmade clothes > Walmart clothes.

reply
Sounds like you're coping for the vendor lock-in you cornered yourself into.
reply
This is a change that's been happening gradually over time-- I don't think I could have done this on a local model that could run on a consumer-class GPU a couple of months ago.

There are plenty of other uses that people have been putting local models to for a long time-- e.g. I know someone who uses a fine-tuned local model to sort their incoming email and scan their outgoing messages for accidental privacy leaks.

I don't agree with your assessment of an opportunity lost-- I got my reps in on the original work; the AI gave an incremental step forward which made the whole exercise somewhat more valuable to me at minimal additional cost. I think this improves the cost-vs-benefit tradeoff in a way that makes me more likely to try other pointless activities, knowing that when I run out of gas I can toss it to the AI to try some variations.

Sometimes you're also 27 steps deep in a nested subproblem and you're really just trying to solve something. Even in fine craftsmanship, not every step needs to be about maximum craftsmanship. :) Sometimes it's just good to get something done.

I think this is much like any other tool. One can carve furniture using only hand tools, but the benefits of a router are hard to dispute. Both approaches exist in the world and sometimes both are used in concert.

As far as people frying their brains with AI goes-- you don't need local models for that; plenty of people are driving themselves into deep, personally and socially destructive delusion just using the chat interfaces.

reply
I do think post-training smaller open-source models for very narrow tasks is largely overlooked, and there'll be lots of value there if one puts in the effort. However, in a lot of cases we're just completing a circle back to deterministic behavior at 1000x the memory/compute requirements just to avoid writing regex.

I agree with you that there's a way to use them responsibly, like your router analogy; I just think most aren't doing this correctly and it's a slippery slope. I'll contend that you probably have used them responsibly in your example.

reply