> False, it creates consumer demand for inference chips, which will be badly utilised.

I think the opposite is true. Local inference doesn't have to go over the wire and through a bunch of firewalls and what have you. The performance from just regular consumer hardware with local, smaller models is already decent. You're utilizing the hardware you already have.

> The performance limitations are inherent to the limited compute and memory.

When you plug a local LLM and inference engine into an agent that is built around the assumption of using a cloud/frontier model, then that's true.

But agents can be built around local assumptions and more specific workflows and problems. That also includes the model orchestration and model choice per task (or even tool).
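A rough sketch of what "model choice per task" could look like. The model names and task labels here are made up for illustration, so swap in whatever you actually run locally:

```python
# Hypothetical task-to-model table; none of these names refer to real models.
ROUTES = {
    "summarize": "small-3b-instruct",
    "code_edit": "coder-7b",
    "plan": "general-14b",
}
DEFAULT_MODEL = "general-14b"


def pick_model(task: str) -> str:
    """Return the model configured for a task, falling back to the default."""
    return ROUTES.get(task, DEFAULT_MODEL)
```

The point is only that the routing decision is a plain lookup made before any inference happens, not something the agent has to reason about.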

The Jevons Paradox comes into play with using cloud models. But when you have fewer resources you are forced to move into more deterministic workflows. That includes tighter control over what the agent can do at any point in time, but also per-project or per-session workflows where you generate intermediate programs/scripts instead of letting the agent just do whatever it wants.

Let me give you an example:

When you ask a cloud-based agent to do something and it wants more information, it will often do a series of tool calls to gather what it thinks it needs before proceeding. Very often you can front-load that part by first writing a testable program that gathers most of the necessary information up front, and only then moving into an agentic workflow.

This approach can produce a bunch of .json and .md files, move things into a structured database, use embeddings, or what have you.

This can save you a lot of inference, make things more reusable and you don't need a model that is as capable if its context is already available and tailored to a specific task.
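A minimal sketch of that front-loading step, assuming the "information" is just a file inventory of a project. The function names, the chosen fields and the suffix filter are all my own placeholders, not any particular tool's API:

```python
import json
from pathlib import Path


def gather_context(root: str, suffixes=(".py", ".md")) -> dict:
    """Deterministically collect path, line count and first line per file."""
    files = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
            files.append({
                "path": path.relative_to(root).as_posix(),
                "lines": len(lines),
                "first_line": lines[0] if lines else "",
            })
    return {"files": files}


def write_context(root: str, out_file: str) -> None:
    """Write the gathered context as JSON for a later agent step to read."""
    Path(out_file).write_text(
        json.dumps(gather_context(root), indent=2), encoding="utf-8"
    )
```

Because this is ordinary code, you can unit test it and rerun it per session, and the agent starts with the JSON already in its context instead of spending tool calls discovering it.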

by pama 9 hours ago

Parallel inference on large compute scales in superlinear ways. There is no way to beat the reduction in memory transfers that a data-center inference model provides with hardware that fits in anything called a home. It is much more energy efficient to process huge batches of parallel requests compared to having one or a handful of queries running on an accelerator.
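A back-of-the-envelope sketch of the memory-transfer point, under simplifying assumptions that are mine: a dense model whose weights are read once per decode step no matter how many requests are in the batch, ignoring KV cache and activation traffic. The 70B parameter count and 2 bytes per parameter are illustrative numbers only:

```python
def weight_bytes_per_token(params: float, bytes_per_param: float, batch_size: int) -> float:
    """Weight bytes read from memory per generated token.

    Assumes one full pass over the weights per decode step, shared by
    every request in the batch (KV cache and activations are ignored).
    """
    return params * bytes_per_param / batch_size


single = weight_bytes_per_token(70e9, 2, 1)    # one query at home
batched = weight_bytes_per_token(70e9, 2, 64)  # 64 requests sharing a step
print(f"{single / 1e9:.1f} GB vs {batched / 1e9:.2f} GB per token")
```

Under those assumptions the weight traffic per token drops in proportion to the batch size, which is the amortization a single-user accelerator cannot get.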

by dudefeliciano 8 hours ago

Aren't data centers extremely energy inefficient due to network latency, memory bottlenecks and so on? I mean the models that run on them are extremely powerful compared to what you can run on consumer hardware, but I wouldn't call them efficient...

by Shorel 5 hours ago

I'm sorry to get into this conversation, but running a model is some orders of magnitude more expensive (meaning it requires far greater amounts of specialized computing power) than the entire network stack of all the nodes involved in the internet traffic of a particular request.

Meaning: these 5000 tokens consume tiny amounts of energy being moved from the data center to your PC, but enormous amounts of energy to generate in the first place. An equivalent webpage with the same amount of text as these tokens would be perceived as instant in any network configuration. Just some kilobytes of text. Much smaller than most background graphics. The two things can't be compared at all.
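To put a rough number on "some kilobytes", here is a quick estimate. The 4 characters per token figure is a common rule of thumb for English text and an assumption on my part, as is counting one byte per character:

```python
def payload_kib(tokens: int, chars_per_token: float = 4.0) -> float:
    """Approximate size in KiB of the plain text for a given token count.

    Assumes ~4 characters per token and 1 byte per character.
    """
    return tokens * chars_per_token / 1024


print(f"{payload_kib(5000):.1f} KiB")  # about 19.5 KiB for 5000 tokens
```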

However, just last week there were huge reductions in the hardware required to run some particular models, thanks to some very clever quantisation. This lowers the memory required 6x on our home hardware, which is great.

In the end, we have spent more energy playing video games over the last two decades than on this whole AI craze, and it was never a problem. We surely can run models locally, and heat our homes in winter.

by txdv 3 hours ago

> False, it creates consumer demand for inference chips, which will be badly utilised.

There are so many CPUs, GPUs, RAM modules and SSDs that are underutilized. I have some in my closet doing 5% load at peak times. Why would inference chips be special once they become commodity hardware?

by iknowstuff 3 hours ago

That's the point: they're better utilized in the cloud.

by locknitpicker 10 hours ago

> What makes you think that?

The fact that today's and yesterday's models are quite capable of handling mundane tasks, and even companies behind frontier models are investing heavily in strategies to manage context instead of blindly plowing through problems with brute-force generalist models.

But let's flip this around: what on earth even suggests to you that most users need frontier models?

by konschubert 7 hours ago

Everybody has difficult decisions to make in their daily lives and in their work.

Having access to a model that is drawing from good sources and takes time to think instead of hallucinating a response is important in many domains of life.

by locknitpicker 6 hours ago

[dead]


by ekianjo 11 hours ago

> What makes you think that?

Looking at actual users of LLMs.

by konschubert 7 hours ago

While not everybody is a professional in YOUR domain, many people are professionals in SOME domain. And even outside of that, they deserve a smart conversation partner, for example on topics like health and politics.