I'm assuming privacy is not a concern since you mentioned using Deepseek already. The cost of V4 Flash for small tasks is so minuscule as to be almost free, and you don't have to deal with a churning laptop (or even buying a high-end laptop, for someone who doesn't already have one).
I guess what I'm really asking is, what's the advantage of using these small local models if privacy isn't a concern?
It depends on the use case, but I've found two cases where a local model is a must rather than an option:
- Running offline, without internet access: for example, I have a project that transcribes and summarizes audio in real time. I've already used it at a few events where Wi-Fi wasn't available: https://github.com/ngxson/llama.cpp-realtime-audio-recap
- Handling private personal data, for example health records. This falls under the same "privacy" category you mentioned, but I want to point out that people value their privacy differently.