It sounds like you're trying to use them like ChatGPT, but I think that's not what they're for.
- On the iPhone keyboard, some sort of tiny language model suggests the words it thinks are most likely to come next as you type. You only tap a suggestion if it matches what you were going to type anyway.
- Speculative decoding is a technique that uses a smaller model to speed up inference for a bigger one: the small model drafts a few tokens ahead and the big model verifies them.
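To make the speculative decoding point concrete, here is a minimal sketch of the greedy variant, using toy stand-in "models" (the `target` and `draft` functions are hypothetical placeholders, not real LLMs). The draft proposes `k` tokens, the target keeps the longest prefix it agrees with plus one token of its own, and the result is identical to decoding with the target alone. In a real system the target scores all `k` drafted positions in one batched forward pass, which is where the speedup comes from; here that pass is simulated with separate calls and counted as one "round". The sampling version swaps the equality check for a probabilistic accept/reject rule so the output still follows the target's distribution.

```python
from typing import Callable, List, Tuple

# A greedy "model" here is just: context (list of token ids) -> next token id.
NextToken = Callable[[List[int]], int]


def target(ctx: List[int]) -> int:
    """Hypothetical stand-in for the big, slow model."""
    return (3 * sum(ctx) + len(ctx)) % 10


def draft(ctx: List[int]) -> int:
    """Hypothetical stand-in for the small model: agrees with target except at every 5th position."""
    guess = target(ctx)
    return guess if len(ctx) % 5 else (guess + 1) % 10


def greedy(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive decoding: one model call per new token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_greedy(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, number of target verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each drafted position (one batched pass in practice).
        rounds += 1
        accepted: List[int] = []
        for i in range(k + 1):
            t = target(out + proposal[:i])
            accepted.append(t)  # the target's own choice is always what gets kept
            # Stop at the first disagreement, or after the bonus token if all k matched.
            if i == k or proposal[i] != t:
                break
        out.extend(accepted)
    return out[: len(prompt) + n_new], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    tokens, rounds = speculative_greedy(target, draft, prompt, n_new=20)
    print(tokens == greedy(target, prompt, 20))  # True: same output as the target alone
    print(rounds)  # 5 target rounds instead of 20 single-token steps
```

Each round emits between 1 and `k + 1` tokens, so the better the small model predicts the big one, the fewer expensive target passes are needed.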
I'm sure smart people will come up with other use cases too.
7B-8B models are great coding assistants if all you need is fast, dumb refactoring: something that can't quite be done with macros and standard editor features but is still primitive, such as "rename every method that takes at least one argument of type SomeType by prefixing its name with ST_".
12B is roughly the threshold where models start writing coherent prose, e.g. Mistral Nemo or Gemma 3 12B.