I am sure these can work fine as ideation devices. I treat this more like basic infra. I would absolutely love a future where most phones have some small LLM built in, kind of like a base layer of infra.

by newswangerd 225 days ago

The best use case I've found for tiny models (<5bn params) is as a reference tool for when I don't have WiFi. I've been using qwen on my MacBook Air as a replacement for Google while I'm writing code on flights. It works great for asking basic questions about syntax and documentation.

by concats 225 days ago

There are use cases where even low accuracy could be useful. I can't predict future products, but here are two that are already in place today:

- On the iPhone keyboard, some sort of tiny language model suggests what it thinks are the most likely next words as you write. You only pick a suggested word if it matches what you were planning to type, so a wrong guess costs nothing.

- Speculative decoding is a technique which uses smaller models to speed up inference for bigger models.
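For anyone curious how the second one works, here is a minimal sketch of the greedy variant. It is a toy under stated assumptions, not how any particular library does it: the two "models" are hypothetical stand-in functions that map a token list to the next token, and the names `speculative_decode`, `greedy_decode`, `toy_target` and `toy_draft` are made up for illustration. The idea is that the small model guesses a few tokens ahead, the big model checks them, and everything up to the first disagreement is kept. A real system checks all the guesses in a single batched forward pass of the big model, which is where the speedup comes from; the loop below just simulates that check.

```python
from typing import Callable, List

Token = int
# Hypothetical interface: a "model" is a function returning its greedy next token.
Model = Callable[[List[Token]], Token]


def greedy_decode(target: Model, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: ask the big model for one token at a time."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[Token],
                       max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding: same output as greedy_decode(target, ...)."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The small draft model cheaply proposes k tokens.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The big model checks each proposed position.
        #    (Simulated one call at a time here; batched in a real system.)
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)        # guess was right, keep it
            else:
                accepted.append(expected)   # first miss: take the big model's token
                break
        else:
            # Every guess matched, so the check also yields one extra token.
            accepted.append(target(out + accepted))

        out.extend(accepted)
    return out[:goal]


# Toy stand-ins: the target counts upward mod 10; the draft agrees
# except that it guesses wrong whenever the last token is 2.
def toy_target(seq: List[Token]) -> Token:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[Token]) -> Token:
    return 0 if seq[-1] == 2 else (seq[-1] + 1) % 10
```

With these toy functions, `speculative_decode(toy_target, toy_draft, [0], 12)` gives the same sequence as `greedy_decode(toy_target, [0], 12)`. The draft's mistakes only cost speed, never correctness, which is why low accuracy is tolerable here.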

I'm sure smart people will invent other future use cases too.

by mkl 225 days ago

Qwen2.5-VL 7B is pretty impressive at turning printed or handwritten maths lecture notes into LaTeX code, and is small enough to run slowly on a laptop without enough VRAM. Gemma3 4B was useless at this though: it got stuck in loops or tried to solve the maths problems instead of just converting the working to LaTeX (but it was much faster, as it fit into VRAM).

It sounds like you're trying to use them like ChatGPT, but I think that's not what they're for.

by eternityforest 224 days ago

Gemma3 4B can answer questions about 80% of the time given access to a ZIM file of Wikipedia.

Unfortunately it still takes 20 seconds to run on a CPU, so I can't think of many practical uses at the moment until we get cheap low power AI accelerators that are a bit easier to develop for....

by omgitspavel 225 days ago

I use the gemma3:1b model (well, gemma3n:e2b since today) to summarize articles in my RSS reader. It works extremely well for such a simple task and runs on CPU on my Hetzner server, so I don't have to pay the electricity bill for running it on a GPU at home.

by iamnotagenius 225 days ago

Tiny models, 4b or less, are designed to be fine-tuned for narrow tasks; that way they can outperform large commercial models for a tiny fraction of the price. They are also great for code autocomplete.

7b-8b models are great coding assistants if all you need is dumb, fast refactoring that cannot quite be done with macros and standard editor functionality but is still primitive, such as "rename all methods having at least one argument of type SomeType by prefixing their names with 'ST_'".
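To make that rename task concrete, here is a rough sketch of what doing it by hand looks like for Python source, using the standard `ast` module. This is only an illustration of the task, not what a model does internally; the function name `prefix_methods` is hypothetical, and it assumes the type shows up as a plain annotation like `x: SomeType`. It is deliberately naive: it renames every reference with a matching name and `ast.unparse` (Python 3.9+) drops comments and formatting, which is part of why handing the job to a small model is tempting.

```python
import ast


def prefix_methods(source: str, type_name: str = "SomeType",
                   prefix: str = "ST_") -> str:
    """Prefix every function/method that takes a parameter annotated as type_name."""
    tree = ast.parse(source)
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)

    # Pass 1: collect names of functions with at least one matching parameter.
    targets = set()
    for node in ast.walk(tree):
        if isinstance(node, func_types):
            params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            if any(isinstance(p.annotation, ast.Name) and p.annotation.id == type_name
                   for p in params):
                targets.add(node.name)

    # Pass 2: rename the definitions and any name-based references to them.
    # Naive on purpose: unrelated attributes with the same name get renamed too.
    for node in ast.walk(tree):
        if isinstance(node, func_types) and node.name in targets:
            node.name = prefix + node.name
        elif isinstance(node, ast.Attribute) and node.attr in targets:
            node.attr = prefix + node.attr
        elif isinstance(node, ast.Name) and node.id in targets:
            node.id = prefix + node.id

    return ast.unparse(tree)
```

Given a class with `def scale(self, x: SomeType)` and a caller `self.scale(1)`, this renames both to `ST_scale` and leaves methods without a `SomeType` argument alone.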

12b is the threshold where models start writing coherent prose, such as Mistral Nemo or Gemma 3 12b.