I keep finding more and more use cases for Q3.6 27b (same league), and it performs best when the answer to my question is already in the context.

The moment I try something open-ended or ambitious, Claude/ChatGPT clearly get me to the goal quicker.

For things where there's a way to build a knowledge base, though, the local LLM can definitely be a true contender. Plus, with a big context and no worries about filling it over and over, you can get quite far.

I'm literally writing this while cooking a pasta dish whose ingredients the local LLM ordered for me online. I've built a grocery shopping skill so that it knows roughly what I have in the fridge (loosely), my last 10 representative orders (general preferences plus rich info about shops and SKUs around me), and actual real-time in-stock info. The last part has been my personal pet peeve with every product that promised delivery of cooking ingredients (ones not packaged specifically for that purpose).
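
To make that concrete, here is a rough sketch of how the three sources could be flattened into one context block for the model. It is not my actual skill: `StockItem`, `build_grocery_context`, and the section headings are all made-up names for illustration, and the real data would come from wherever your shop exposes it.

```python
from dataclasses import dataclass


@dataclass
class StockItem:
    # Hypothetical record for one product at one nearby shop.
    sku: str
    name: str
    shop: str
    in_stock: bool


def build_grocery_context(fridge, past_orders, stock, max_orders=10):
    """Flatten fridge contents, recent orders, and live stock into prompt text."""
    lines = ["## Fridge (approximate)"]
    lines += [f"- {item}" for item in fridge]

    lines.append("## Recent representative orders")
    # Keep only the most recent max_orders orders (none if max_orders <= 0).
    recent = past_orders[-max_orders:] if max_orders > 0 else []
    for order in recent:
        lines.append("- " + ", ".join(order))

    lines.append("## In stock right now")
    # Only list products that are actually available at the moment.
    lines += [f"- {s.sku}: {s.name} @ {s.shop}" for s in stock if s.in_stock]
    return "\n".join(lines)
```

The point is just that everything the model needs is already sitting in the context before it answers, which is the case where the local model does best.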

This is what every big tech company with an agent has promised us, and now a local LLM has actually solved it for me fully.

It's also going to fail consistently. When calling Claude, you don't know which version of the model you're talking to; it might be quantized due to load or have been patched.
This is true. The failure modes are simpler. And yes, the ceiling is lower as well. Smaller models are less stable over long sequences, so anything that needs a lot of CoT will be weaker. For example, I had a dumb lock + condvar with multiple defenses against lost wakeups in an N-producer, 1-consumer queue thing. Models generally need a lot of CoT before they realise they can switch it to a semaphore instead. Qwen typically isn't stable over such long CoTs and ends up adding more and more slop and band-aids, whereas a larger model outputs a long CoT and then realises it can swap 3 functions out for 2 lines if we use a semaphore.
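
For anyone wondering what the semaphore version looks like, here is a minimal sketch in Python. It is not my original code: `SemaphoreQueue` and `run_demo` are names I made up for illustration, and it assumes an unbounded queue with a single consumer. The semaphore counts queued items, so a `release()` that happens before the consumer starts waiting is remembered rather than lost, which is exactly the case the condvar version needed extra defenses for.

```python
import threading
from collections import deque


class SemaphoreQueue:
    """Unbounded N-producer, 1-consumer queue (illustrative sketch)."""

    def __init__(self):
        self._items = deque()  # deque appends and pops are thread-safe
        self._available = threading.Semaphore(0)  # counts queued items

    def put(self, item):
        self._items.append(item)
        self._available.release()  # remembered even if nobody is waiting yet

    def get(self):
        self._available.acquire()  # blocks until at least one item was put
        return self._items.popleft()


def run_demo(n_producers=4, items_each=100):
    """Start n_producers threads and drain everything on the calling thread."""
    q = SemaphoreQueue()

    def produce(pid):
        for i in range(items_each):
            q.put((pid, i))

    threads = [threading.Thread(target=produce, args=(p,)) for p in range(n_producers)]
    for t in threads:
        t.start()
    received = [q.get() for _ in range(n_producers * items_each)]
    for t in threads:
        t.join()
    return received
```

`put` and `get` are the "2 lines" each: append then release, acquire then pop. There is no predicate loop and no wakeup flag to get wrong.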