> Apple is counting on something else: model shrink

The most powerful AI interactions I've had involved giving a model a task and then fucking off. At that point, I don't actually care if it takes 5 minutes or an hour. I've queued up a list of background tasks it can work on, and that I can circle back to when I have time. In that context, smaller isn't even the virtue at hand–user patience is. Having a machine that works on my bullshit questions and modelling projects at one tenth the speed of a datacentre could still work out to be a good deal, even before considering the privacy and lock-in problems.

by jiggawatts 15 hours ago

What "tooling" do you use to let AIs work unattended for long periods?

by JumpCrisscross 11 minutes ago

> What "tooling" do you use to let AIs work unattended for long periods?

Claude and Kagi Assistant. I tried tooling up a multi-model environment in Ollama and it was annoying. It's just searching the web, building models and then running a test suite against the model to refine it.

by raincole 16 hours ago

Cool? And it has nothing to do with what kind of consumer hardware Apple should sell. If your use cases are literally "bigger model better" then you should always use the cloud. No matter how much computing power Apple squeezes into their device, it won't be a mighty data center.

by gizajob 15 hours ago

For running the model once it’s been trained, all a datacenter does is give you lower latency. Once devices have enough memory to host the model locally, the need to pay datacenter bills is going to be questioned. I’d rather run OpenClaw on my device plugged into a local LLM than rely on OpenAI or Claude.

16 hours ago

deleted

by root_axis 19 hours ago

> At some point a beefy Mac Studio and the "right sized" model is going to be what people want.

It's pretty clear that this isn't going to happen any time soon, if ever. You can't shrink the models without destroying their coherence, and this is a consistently robust observation across the board.

by sipjca 18 hours ago

I don’t think it’s about literally shrinking the models via quantization, but rather training smaller/more efficient models from scratch.

Smaller models have gotten much more powerful over the last 2 years. Qwen 3.5 is one example of this. The cost/compute requirements of running the same level of intelligence are going down.

by root_axis 8 hours ago

There are no practically useful small models, including Qwen 3.5. Yes, the small models of today are a lot more interesting than the small models of 2 years ago, but they remain broadly incoherent beyond demos and tinkering.

by HerbManic 17 hours ago

I have said for a while that we need a sort of big-little-big model situation.

The inputs are parsed with a large LLM. This gets passed on to a smaller, hyper-specific model. That outputs to a large LLM to make it readable.

Essentially you can blend two model types: Probabilistic Input > Deterministic Function > Probabilistic Output. Have multiple little deterministic models that are chosen for specific tasks. Now all of this is VERY easy to say, and VERY difficult to do.

But if it could be done, it would basically shrink all the models needed. Don't need a huge input/output model if it is more of an interpreter.
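A toy sketch of that big-little-big shape, to make the routing idea concrete. Everything here is hypothetical: `parse_input` and `render_output` are stand-ins for the large models (a real system would call an LLM at both ends), and the two "little models" are plain deterministic functions chosen by task name.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Request:
    task: str                 # which little deterministic function to run
    args: Tuple[float, ...]   # structured arguments extracted from the text


def parse_input(text: str) -> Request:
    """Stand-in for the large input model: free text -> structured request.

    Assumption: keyword matching replaces the LLM call a real system would make.
    """
    words = text.lower().replace("?", "").split()
    numbers = tuple(float(w) for w in words if w.replace(".", "", 1).isdigit())
    task = "sum" if ("add" in words or "sum" in words) else "mean"
    return Request(task, numbers)


# The "little" stage: small, deterministic, task-specific functions.
LITTLE_MODELS: Dict[str, Callable[[Tuple[float, ...]], float]] = {
    "sum": lambda xs: sum(xs),
    "mean": lambda xs: sum(xs) / len(xs),
}


def render_output(req: Request, result: float) -> str:
    """Stand-in for the large output model: structured result -> readable text."""
    return f"The {req.task} of {len(req.args)} numbers is {result:g}."


def answer(text: str) -> str:
    req = parse_input(text)                      # probabilistic input (stubbed)
    result = LITTLE_MODELS[req.task](req.args)   # deterministic function
    return render_output(req, result)            # probabilistic output (stubbed)


print(answer("add 2 and 3.5"))
print(answer("what is the average of 4 8 and 12"))
```

The hard part the comment points at is exactly what the stubs hide: getting the input model to pick the right little function and extract clean arguments every time.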

by kyboren 17 hours ago

Yes, but bigger models are still more capable. Models shrinking (iso-performance) just means that people will train and use more capable models with a longer context.

by sipjca 15 hours ago

Of course they are! Both are important, and both will be around and used for different reasons.

by Forgeties79 20 hours ago

Cheaper than what you’d expect though. You could get a nice setup for $20-40k 6mo ago. As far as enterprise investments go, that’s a rounding error.

by a1o 19 hours ago

Not all enterprises are the same. I imagine many companies have different departments working toward local optima, so someone who could benefit from it to get more productivity might not have access to it because the department doing hardware acquisition is being measured in isolation.

by Forgeties79 7 hours ago

I think it’s a little unnecessary to lecture somebody on HN about how enterprises come in different shapes and sizes. It’s pretty clear what I’m implying here if you aren’t actively trying to assume the most reduced, least charitable version of my statement.

by zer00eyz 19 hours ago

Drop that down to 5k, and make it useful.

Give every iPhone family an in-house Siri that will deal with canceling services and pursuing refunds.

Your customer screw-up results in your site getting an agent-driven DDoS on its CS department till you give in.

Siri: "Hey User, here's your daily update, I see you haven't been to the gym, would you like me to harass their customer service department till they let you out of their onerous contract?"

by Forgeties79 7 hours ago

I’m running a modest setup using a Mistral model (24B) on a 9070 (AMD) and 32 GB of RAM. $1800 machine at the time I built it. It ultimately boils down to what you want to do with it. For me, it’s basically a drafting tool. I use it to break through writer’s block, iterate, or just throw out some ideas. Sometimes summarize, but that can be hit or miss.

I don’t need the latest and greatest, and I fine-tuned LM Studio enough that I get acceptable results in 30 to 90 seconds that help me keep moving ahead. I am not a software engineer, and I am definitely not as much of a “coder” as the average person on HN. So if I can do it for less than $2000, I bet a lot of (smarter, more experienced at coding) people could see great results for $5000.

You can get an M3 Ultra Mac Studio with 96 GB of RAM for $4000. If you’re willing to go up to $6k, it’s 256 GB. Wayyyyy more firepower than my setup. I imagine plenty powerful for a lot of people.
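For anyone sizing hardware against a model, a back-of-the-envelope estimate helps. This sketch assumes weights dominate memory use (parameters times bits per weight, divided by 8) and ignores KV cache, context length, and runtime overhead, so treat the numbers as a floor. The bit widths are illustrative; the comment above doesn't say which quantization is in use.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes).

    Assumption: ignores KV cache, activations, and runtime overhead.
    """
    return params_billions * bits_per_weight / 8


# A 24B-parameter model at a few common precisions.
for bits in (16, 8, 4):
    print(f"24B at {bits}-bit: ~{weight_memory_gb(24, bits):.0f} GB")
```

Compare the result against the VRAM or unified memory you actually have; whatever doesn't fit gets offloaded to slower memory, which is where the 30 to 90 second waits tend to come from.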