upvote
You could run an LLM like this, and the temperature parameter would become an actual thing...
reply
Totally logical, especially with some sort of thermal mass, as you can throttle down the clock during quiet periods to cool down afterwards. I used this concept in my first sci-fi novel, where the AI was aware of its temperature for these reasons. I run my Pico2 board in my MP3 jukebox at 250 MHz; it has been on for several weeks without missing a beat (pun intended)
reply
LLMs are memory-bandwidth bound, so a higher core frequency would not help much.
reply
deleted
reply
How do we know if a computation is a mistake? Do we verify every computation?

If so, then:

That seems like it would slow the overall computation to no more than the rate at which these computations can be verified.

That makes the verifier the ultimate bottleneck, and the other (fast, expensive -- like an NHRA drag car) pipeline becomes vestigial since it can't be trusted anyway.

reply
Well, the point is that verification can run in parallel: if you can verify at 500 MHz and have twenty of these units, you can run the core at 10 GHz. Minus, of course, the fixed single-instruction verification time penalty, which gets more and more negligible the more parallel you go. Of course there is lots of overhead in that too, as GPUs painfully show.
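Back-of-the-envelope, using the numbers above (the fixed latency figure and batch sizes below are made up for illustration):

```python
# Hypothetical numbers from the comment: 20 verifiers at 500 MHz
# collectively keep pace with one 10 GHz core.
core_hz = 10e9
verifier_hz = 500e6
n_verifiers = 20
assert n_verifiers * verifier_hz == core_hz  # verifiers keep up in aggregate

# The fixed verification latency (made-up figure) is amortized across each
# verified batch, so its per-instruction cost shrinks as batches grow.
fixed_latency_cycles = 100
for batch in (10, 100, 1000):
    print(batch, fixed_latency_cycles / batch)  # overhead per instruction
```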
reply
Right.

So we have 20 verifiers running at 500MHz, and this stack of verifiers is trustworthy. It does reliably-good work.

We also have a single 10GHz CPU core, and this CPU core is not trustworthy. It does spotty work (hence the verifiers).

And both of these things (the stack of verifiers, the single CPU core) peak out at exactly the same computational speeds. (Because otherwise, the CPU's output can't be verified.)

Sounds great! Except I can get even better performance from this system by just skipping the 10GHz CPU core, and doing all the work on the verifiers instead.

("Even better"? Yep. Unlike that glitch-ass CPU core, the verifiers' output is trustworthy. And the verifiers accomplish this reliable work without that extra step of occasionally wasting clock cycles to get things wrong.

If we know what the right answer is, then we already know the right answer. We don't need to have Mr. Spaz compute it in parallel -- or at all.)

reply
If the workload were perfectly parallelizable, your claim would be true. However, if it has serial dependency chains, it is absolutely worth it to compute quickly and unreliably, then verify in parallel.
reply
This is exactly what speculative decoding for LLMs does, and it can yield a nice boost.

A small, hence fast, model predicts the next tokens serially. Then a batch of tokens is validated by the main model in parallel. If there is a mismatch, you reject the speculated token at that position and all subsequent speculated tokens, take the correct token from the main model, and restart speculation from there.

If the predictions are good and the batch parallelism efficiency is high, you can get a significant boost.
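The loop above looks roughly like this. This is only a toy sketch: the "models" are made-up stand-in functions, and the point is just the draft-serially / verify-as-a-batch / reject-on-first-mismatch shape:

```python
def draft_model(context):
    # cheap, fast model: guesses that tokens simply count up
    return context[-1] + 1

def main_model(context):
    # expensive, trusted model: mostly agrees, but sometimes differs.
    # (In real speculative decoding this runs over the whole batch of
    # drafted positions in a single parallel forward pass.)
    last = context[-1]
    return last + 2 if last % 3 == 2 else last + 1

def speculative_step(context, k=4):
    # 1) draft k tokens serially with the cheap model
    drafted, ctx = [], list(context)
    for _ in range(k):
        t = draft_model(ctx)
        drafted.append(t)
        ctx.append(t)
    # 2) check drafted tokens against the main model; at the first
    #    mismatch, keep the main model's token and drop the rest
    accepted, ctx = [], list(context)
    for t in drafted:
        correct = main_model(ctx)
        if t != correct:
            accepted.append(correct)  # reject t and everything after it
            break
        accepted.append(t)
        ctx.append(t)
    return accepted

print(speculative_step([0]))  # -> [1, 2, 4]: draft token 3 was rejected
```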

reply
I have a question about what "validation" means exactly. Does this process work by having the main model compute the "probability" that it would generate the draft sequence, then probabilistically accepting the draft? Wondering if there is a better method that preserves the distribution of the main model.
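My current understanding (from the speculative sampling papers, so take with a grain of salt) is that it's exactly that: accept the draft token x with probability min(1, p(x)/q(x)), and on rejection resample from the renormalized residual max(0, p - q), which supposedly preserves the main model's distribution exactly. A rough sketch with made-up two-token distributions:

```python
import random

def sample_like_main(p, q):
    """One speculative-sampling step for a single token, as I understand
    the standard scheme: draw x from the draft distribution q, accept it
    with probability min(1, p[x]/q[x]), otherwise resample from the
    residual max(0, p - q), renormalized."""
    tokens = list(q)
    x = random.choices(tokens, weights=[q[t] for t in tokens])[0]
    if random.random() < min(1.0, p[x] / q[x]):
        return x  # draft token accepted
    residual = {t: max(0.0, p[t] - q.get(t, 0.0)) for t in p}
    r = random.random() * sum(residual.values())
    for t, w in residual.items():
        r -= w
        if r <= 0:
            return t
    return t  # guard against floating-point leftovers

# quick check with made-up distributions: empirical frequencies should
# track p (the main model), not q (the draft)
random.seed(0)
p = {"a": 0.7, "b": 0.3}
q = {"a": 0.5, "b": 0.5}
n = 20000
freq_a = sum(sample_like_main(p, q) == "a" for _ in range(n)) / n
print(freq_a)  # should be close to p["a"] = 0.7
```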
reply
You can verify in 100-way parallel and without dependence, but you can’t do it with general computation.
reply
Haha, well, you do have a point there. I guess I had the P != NP kind of verification in my head, where it's easy to check whether something is right but not as easy to compute the result. If one could build these verifiers on some kind of checksum basis it might still make sense, but I'm not sure that's possible.
reply
> if you are happy to lose reliability.

The only problem here is that reliability is a statistical thing. You might be lucky, you might not.

reply
Side channel attacks don't stand a chance!
reply
you never had WHEA errors... or PLL issues on CPU C-state transitions...
reply