It all depends on how cheap they can get. Another interesting thought: what if you could stack them? For example, you have a base model module, then new ones come out that work together with the old ones and expand their capabilities.

by brainless 14 hours ago

New GPUs come out all the time. New phones come out (if you count all the manufacturers) all the time. We do not need to always buy the new one.

Current open weight models < 20B are already capable of being useful. With even 1K tokens/second, they would change what it means to interact with them or for models to interact with the computer.

by lm28469 14 hours ago

hm yeah I guess if they stick to shitty models it works out, I was talking about the models people use to actually do things instead of shitposting from openclaw and getting reminders about their next dentist appointment.

by brainless 14 hours ago

The trick with small models is what you ask them to do. I am working on a data extraction app (from emails and files) that works entirely locally. I applied for the Taalas API because it would be an awesome fit.

dwata: Entirely Local Financial Data Extraction from Emails Using Ministral 3 3B with Ollama: https://youtu.be/LVT-jYlvM18

https://github.com/brainless/dwata

by imtringued 13 hours ago

Considering that enamel regrowth is still experimental (only Curodont exists as a commercial product), those dentist appointments are probably the most important routine healthcare appointments in your life. Pick something that is actually useless.

by lm28469 58 minutes ago

If you need a full-blown LLM with root access to all your devices to remind you about an appointment, something is very wrong with your life.

by NinjaTrance 14 hours ago

To run Llama 3.1 8B locally, you would need a GPU with a minimum of 16 GB of VRAM, such as an NVIDIA RTX 3090.

Taalas promises 10x higher throughput while being 10x cheaper and using 10x less electricity.

Looks like a good value proposition.

by ac29 7 hours ago

> To run Llama 3.1 8B locally, you would need a GPU with a minimum of 16 GB of VRAM, such as an NVIDIA RTX 3090

In full (16-bit) precision, yes. But this Taalas chip uses a heavily quantized version (the article calls it "3/6 bit quant", probably similar to Q4_K_M). You don't even need a GPU to run that with reasonable performance; a CPU is fine.
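
A rough back-of-the-envelope sketch of why quantization changes the hardware picture. It only counts the memory taken by the weights (parameters times bits per weight) and ignores the KV cache, activations, and runtime overhead. The helper name `weight_memory_gb` and the nominal 8e9 parameter count are assumptions for illustration, not figures from the article.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Nominal parameter count for an "8B" model (assumption; the real count differs slightly).
PARAMS_8B = 8e9

# Compare unquantized 16-bit weights with a few quantized widths.
for label, bits in [("16-bit", 16), ("6-bit", 6), ("4-bit", 4), ("3-bit", 3)]:
    print(f"{label}: {weight_memory_gb(PARAMS_8B, bits):.1f} GB")
```

At 16 bits the weights alone come to about 16 GB, while at 3 to 6 bits they fit in roughly 3 to 6 GB, which is within reach of ordinary system RAM.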

by lm28469 14 hours ago

What do you do with 8B models? They can't even reliably create a .txt file or do any kind of tool calling.

by joquarky 42 minutes ago

Exploration, summarization, classification, translation

by sowbug 5 hours ago

Re-read Brave New World. Deltas and Epsilons have their place, even if Alphas and Betas got smarter overnight.

Roof! Roof!

by lancebeet 14 hours ago

You obviously don't believe that AGI is coming in two release cycles, and you also don't seem to have much faith in the new models containing massive improvements over the last ones. So the answer to who is going to pay for these custom chips seems to be you.

by lm28469 13 hours ago

Why would I buy chips to run handicapped models when the 10+ LLM players all offer free-tier access to their 1T+ parameter models?

by grosswait 11 hours ago

Do you think the free gravy train will run forever?

by K0balt 11 hours ago

Not all applications are chatbots. Many potential uses for LLMs/VLAMs are latency-constrained.

by amelius 13 hours ago

I'm guessing this development will make the fabrication of custom chips cheaper.

Exciting times.

by casey2 10 hours ago

Probably the datacenters that serve those models?

by imtringued 13 hours ago

Almost all LLM companies have some sort of free tier that does nothing but lose them money.