>But I rarely see an objective accuracy comparison.

There are some perplexity comparison numbers for the previous generation (Orange Pi 5) in the link below.

Bit of a mixed bag, but doesn't seem catastrophic across the board. Some models are showing minimal perplexity loss at Q8...

https://github.com/invisiofficial/rk-llama.cpp/blob/rknpu2/g...
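For anyone unsure what "perplexity loss" is measuring, here is a rough sketch. It is not taken from the linked repo; the function names and the log-probabilities are made up for illustration. Perplexity is the exponential of the mean negative log-likelihood per token, and the loss is how much that number grows after quantization.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def relative_loss(baseline_ppl, quantized_ppl):
    """Fractional perplexity increase of the quantized model over the baseline."""
    return (quantized_ppl - baseline_ppl) / baseline_ppl

# Hypothetical natural-log probabilities for the same four tokens.
# These are illustrative values, not measurements from any real model.
fp16_logprobs = [-1.20, -0.35, -2.10, -0.80]
q8_logprobs = [-1.22, -0.36, -2.15, -0.81]

fp16_ppl = perplexity(fp16_logprobs)
q8_ppl = perplexity(q8_logprobs)
print(f"fp16 ppl={fp16_ppl:.4f}  q8 ppl={q8_ppl:.4f}  "
      f"loss={relative_loss(fp16_ppl, q8_ppl):.2%}")
```

A real comparison would run both models over the same evaluation text and the same tokenization, otherwise the two numbers are not comparable.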

reply
The even more confounding factor is that every vendor of these Cix P1 systems provides its own specific builds: Radxa, Orange Pi, Minisforum, now MetaComputing... It is painful to sort out, even as someone who knows where to look.

I couldn't imagine recommending any of these boards to people who aren't already SBC tinkerers.

reply
I was also on board until he got to the NPU downsides. I don't care about use for an LLM, but I would like to see the ability to run smallish ONNX models generated from a classical ML workflow. Not only is a GPU overkill for the tasks I'm considering, but I'm also concerned that unattended GPUs out on the edge will be repurposed for something else (video games, crypto mining, or just straight up ganked).
reply
Just try to find some benchmark top_k, temp, etc. parameters for llama.cpp. There's no consistent framing of any of these things. Temp should be effectively 0 so it's at least deterministic in its random probabilities.
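A minimal sketch of why temperature 0 removes the sampling randomness. This is a toy sampler, not llama.cpp's code, and `sample_token` is a hypothetical name. At temperature 0 the sampler reduces to a plain argmax, so the random generator is never consulted.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index; temperature <= 0 is treated as greedy argmax."""
    if temperature <= 0:
        # Greedy decoding: no random draw happens at all.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [1.0, 2.0, 3.0]

# Different seeds, same answer: the argmax index.
greedy = {sample_token(logits, 0.0, random.Random(seed)) for seed in range(10)}
print(greedy)

# With temperature 1.0 the result depends on the generator state.
sampled = [sample_token(logits, 1.0, random.Random(seed)) for seed in range(10)]
print(sampled)
```

This only covers the sampling step. It assumes the logits themselves are identical between runs.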
reply
Right. There are countless parameters and seeds and whatnots to tweak. But theoretically, if all the inputs are the same, the outputs should be within epsilon of a known good. I wouldn't even mandate that temperature or any other parameter be a specific value, just that it's the same. That way you can make sure even the pseudorandom processes are the same, so long as nothing pulls from a hardware RNG or something like that. Which seems reasonable for them to do, so idk, maybe an "insecure RNG" mode.
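Something like the following is what I have in mind, as a toy sketch only. `noisy_scores` is a hypothetical stand-in for a pipeline with pseudorandom steps, and the epsilon value is an arbitrary assumption. The point is that a privately seeded generator makes a rerun comparable against a stored known-good result.

```python
import math
import random

def noisy_scores(seed, n=5):
    """Stand-in for a pipeline whose randomness is driven only by `seed`."""
    rng = random.Random(seed)  # private PRNG, no hardware entropy involved
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def within_epsilon(actual, expected, eps=1e-6):
    """True if both sequences have the same length and agree elementwise."""
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=0.0, abs_tol=eps)
        for a, e in zip(actual, expected)
    )

known_good = noisy_scores(seed=1234)
rerun = noisy_scores(seed=1234)      # same inputs, same seed
other = noisy_scores(seed=4321)      # only the seed changed

print(within_epsilon(rerun, known_good))
print(within_epsilon(other, known_good))
```

The tolerance is there because hardware and backends can legitimately differ in the last few bits even when every input matches.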
reply
>Temp should be effectively 0 so it's at least deterministic in its random probabilities.

Is this a thing? I read an article about how due to some implementation detail of GPUs, you don't actually get deterministic outputs even with temp 0.

But I don't understand that, and haven't experimented with it myself.

reply
By default CUDA isn't deterministic because of thread scheduling.

The main difference comes from the order in which the partial results of a reduction are added and rounded.

The difference it makes is small, unless you have an unstable floating-point algorithm. But if you have an unstable floating-point algorithm on a GPU at low precision, you were doomed from the start.
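A CPU-only sketch of the reduction-order effect, assuming nothing about any particular GPU kernel. Floating-point addition is not associative, so summing the same values in a different grouping can round differently. The tree-shaped `pairwise_sum` below is only a rough stand-in for how a parallel reduction groups its adds.

```python
import random

def sequential_sum(values):
    """Left-to-right accumulation, one add at a time."""
    total = 0.0
    for v in values:
        total += v
    return total

def pairwise_sum(values):
    """Tree-shaped reduction: split in half, sum each half, add the results."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])

# Same three numbers, different grouping, different rounding.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Values spanning many orders of magnitude make the effect easier to see.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) * 10 ** rng.randint(-8, 8) for _ in range(10_000)]
a = sequential_sum(values)
b = pairwise_sum(values)
print(a, b, abs(a - b))
```

On a GPU the grouping can change from run to run depending on how threads are scheduled, which is where the run-to-run variation comes from.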

reply