Especially for super-constrained applications. I don't care if the language model I use for my extremely specific business domain can solve PhD-level math or recite the works of Shakespeare. I'd trade all of that for pure task-specific accuracy.

by arkmm 7 hours ago

Can you share more details about your use case? The good applications of fine-tuning are usually pretty niche, which tends to make people feel like others might not be interested in hearing the details.

As a result it's really hard to read about real-world use cases online. I think a lot of people would love to hear more details - at least I know I would!

by _the_inflator 6 hours ago

I agree.

Also, for certain use cases there are constraints such as embedded hardware systems with no internet access. These LLMs have to be trained to specialize in clearly defined use cases under hardware constraints.

Frontier LLMs also rarely function in isolation; instead they orchestrate a system of specialized units, i.e. subsystems and agents.

Costs and effort are one thing, but being able to downsize these monster LLMs through fine-tuning in the first place is extremely valuable.

by andriy_koval 5 hours ago

> "Frontier LLMs can do it with enough context" is not really a strong argument against fine-tuning, because they're expensive to run.

I am not an expert on this topic, but I wonder whether a large cached context is actually cheap to run, so that frontier models would be cost-efficient too in such a setting?
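One way to reason about this question is to write the per-request cost down. The sketch below assumes a billing model in which cached prompt tokens are charged at a lower rate than uncached ones; whether that holds, and by how much, depends on the provider. The function name and all prices are hypothetical placeholders, not real price-list values.

```python
def request_cost(prompt_tokens, cached_tokens, output_tokens,
                 price_in, price_cached, price_out):
    """Cost of one request, with all prices given per million tokens.

    Assumes cached prompt tokens are billed at price_cached and the
    remaining prompt tokens at price_in (an assumed billing model).
    """
    uncached_tokens = prompt_tokens - cached_tokens
    total = (uncached_tokens * price_in
             + cached_tokens * price_cached
             + output_tokens * price_out)
    return total / 1_000_000

# Placeholder prices, made up for illustration only.
PRICE_IN, PRICE_CACHED, PRICE_OUT = 10.0, 1.0, 30.0

# Same 100k-token prompt, with and without most of it served from cache.
cold = request_cost(100_000, 0, 500, PRICE_IN, PRICE_CACHED, PRICE_OUT)
warm = request_cost(100_000, 99_000, 500, PRICE_IN, PRICE_CACHED, PRICE_OUT)
print(f"cold: {cold:.3f}  warm: {warm:.3f}")
```

Plugging in real prices for both options (and for a self-hosted fine-tuned model, its hardware cost per request) would be the actual comparison; the sketch only shows which terms caching can shrink and which it cannot, such as output tokens.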

by prettyblocks 31 minutes ago

I'd like to read more about that if anyone has any suggestions.

by derwiki 10 hours ago

Exactly, inference cost is a very good reason to fine-tune something like Qwen.

by Me1000 10 hours ago

Wouldn’t it be better to use a grammar in the token sampler? Tuning is fine, but it doesn’t guarantee syntactically correct structured output. A grammar-aware sampler could.
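To make the grammar idea concrete, here is a minimal sketch of a grammar-aware sampler. Everything in it is a toy assumption: the seven-token vocabulary, the state-machine grammar, and `sloppy_model`, which stands in for a real model's logits. It is not how any particular inference library implements this.

```python
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '}', '"ok"', ':', 'true', 'false', 'hello']

# Toy grammar as a state machine: state -> {allowed token: next state}.
# It accepts exactly {"ok":true} or {"ok":false}.
GRAMMAR = {
    'start': {'{': 'key'},
    'key': {'"ok"': 'colon'},
    'colon': {':': 'value'},
    'value': {'true': 'close', 'false': 'close'},
    'close': {'}': 'done'},
}

def constrained_greedy_decode(logits_fn):
    """Greedy decoding where tokens the grammar forbids are masked to -inf."""
    state, output = 'start', []
    while state != 'done':
        logits = logits_fn(output)
        allowed = GRAMMAR[state]
        masked = [score if token in allowed else -math.inf
                  for token, score in zip(VOCAB, logits)]
        best = VOCAB[masked.index(max(masked))]
        output.append(best)
        state = allowed[best]
    return ''.join(output)

def sloppy_model(prefix):
    """Stand-in for a model: it always scores the invalid token 'hello' highest."""
    return [0.1, 0.2, 0.3, 0.4, 0.6, 0.5, 9.0]

print(constrained_greedy_decode(sloppy_model))
```

With these made-up scores the unconstrained argmax would be `hello` at every step, yet the masked decode yields `{"ok":true}` because it can only emit what the grammar allows. The model's scores still decide between the valid options (`true` vs `false` here), which is the part that tuning would influence.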

by MillionOClock 10 hours ago

I think both should be done; they don't really serve the same purpose.

by butILoveLife 10 hours ago

This is literally what I'm waiting for. I want a ~8B model that works well with OpenClaw.

by prettyblocks 10 hours ago

I don't think you will get that anytime soon, because for a model to work well with something like OpenClaw it needs a massive context window.

by butILoveLife 10 hours ago

but but but but unified memory! (jk, I don't actually believe in Apple marketing words)

There might be future optimizations. For example, have your small model do CoT to work out where to look for relevant memory.

by piyh 10 hours ago

Qwen 9B doesn't?

by butILoveLife 10 hours ago

Nothing is really usable outside Opus.

I've tried too. Wasted a few days trying out even high end paid models.

by throwaway6977 11 hours ago

I agree. I'm currently trying to learn how I can embed a fine-tuned tiny model into my C++ game so it can provide a narrative in prose of certain game-event logs. It needs to be as tiny as possible so it doesn't take resources away from the running game.

by hedgehog 46 minutes ago

There are a bunch of tutorials on how to use GRPO to fine-tune a small Qwen. Depending on what you're doing, LoRA or even just prefix tuning can give pretty good results with no special hardware.
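A rough way to see why LoRA is light on hardware: instead of training a full weight matrix, it trains two low-rank factors whose product is added to the frozen weights. The sketch below only counts trainable parameters for a single layer; the layer size and rank are hypothetical values picked for illustration, and the function names are not from any library.

```python
def full_update_params(d_out, d_in):
    """Trainable weights when fine-tuning a dense d_out x d_in matrix directly."""
    return d_out * d_in

def lora_update_params(d_out, d_in, rank):
    """Trainable weights for a LoRA update B @ A, with B: d_out x rank and A: rank x d_in."""
    return rank * (d_out + d_in)

# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8
full = full_update_params(d_out, d_in)
lora = lora_update_params(d_out, d_in, rank)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {full // lora}x")
```

Fewer trainable parameters means fewer gradients and less optimizer state to hold in memory, although the frozen base weights still have to fit somewhere.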

by lelanthran 4 hours ago

> I agree. I'm currently trying to learn how I can embed a fine-tuned tiny model into my C++ game so it can provide a narrative in prose of certain game-event logs.

Unless your game states have a combinatorial explosion, would it not be better to generate all of that before the build? If it is templated, you can generate a few hundred thousand templates to cover any circumstance, then instantiate and stitch together those templates at game runtime.
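A minimal sketch of the instantiate-and-stitch step, written in Python for brevity even though the game in question is C++. The event types, field names, and template sentences are all hypothetical; in the scheme described above the templates would be generated offline and shipped as data.

```python
import random
from string import Template

# Hypothetical templates, as if generated before the build and shipped as data.
TEMPLATES = {
    'enemy_defeated': [
        Template('$hero struck down the $enemy near $place.'),
        Template('The $enemy fell to $hero at $place.'),
    ],
    'item_found': [
        Template('$hero discovered a $item in $place.'),
    ],
}

def narrate(events, rng):
    """Turn a list of (event_type, fields) log entries into one paragraph."""
    sentences = []
    for event_type, fields in events:
        template = rng.choice(TEMPLATES[event_type])  # pick one variant per event
        sentences.append(template.substitute(fields))
    return ' '.join(sentences)

log = [
    ('item_found', {'hero': 'Ayla', 'item': 'rusty key', 'place': 'the cellar'}),
    ('enemy_defeated', {'hero': 'Ayla', 'enemy': 'troll', 'place': 'the bridge'}),
]
print(narrate(log, random.Random(0)))
```

At runtime this is just a table lookup and string substitution, so it costs almost nothing compared with running a model; the trade-off is that variety is limited to whatever templates were generated ahead of time.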

by yw3410 7 hours ago

How small a model are we talking? Don't even the smallest models which would work need gigabytes of memory?

by lelanthran 4 hours ago

> How small a model are we talking? Don't even the smallest models which would work need gigabytes of memory?

I dunno, for game prose I expect that a tiny, highly quantized model would be sufficient (generating no more than a paragraph), so 300 to 500 MB maybe? Running on CPU rather than GPU is feasible too, I think.
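As a back-of-the-envelope check on that range, the weights of a model take roughly (number of parameters) x (bits per weight) / 8 bytes. The sketch below applies that formula to a few hypothetical parameter counts, which are not specific released models. It ignores the KV cache, activations, and any per-block quantization metadata, so real footprints will be somewhat higher.

```python
def weight_megabytes(num_params, bits_per_weight):
    """Approximate size of the weights alone, in MB (1 MB = 10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1_000_000

# Hypothetical parameter counts, chosen only for illustration.
for num_params in (350_000_000, 500_000_000, 1_000_000_000):
    for bits in (4, 8, 16):
        print(f"{num_params / 1e9:.2f}B params @ {bits}-bit: "
              f"{weight_megabytes(num_params, bits):,.0f} MB")
```

Under these assumptions, a budget of 300 to 500 MB at 4 bits per weight corresponds to roughly 0.6 to 1 billion parameters.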