Hacker News
points
by
embedding-shape
5 hours ago
|
comments
by
tatef
3 hours ago
Hypura reads tensor weights from the GGUF file on NVMe into RAM/GPU memory pools, then compute happens entirely in RAM/GPU.
There is no writing to SSDs during inference with this architecture.
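For anyone who wants to see what a read-only load path looks like in general, here is a minimal Python sketch. It is not Hypura's code: `load_read_only` is a hypothetical helper, and it treats the file as opaque bytes rather than parsing GGUF. The point is only that the file is opened and mapped read-only and the bytes are copied into RAM, so nothing on this path can write to the SSD.

```python
import mmap
import os


def load_read_only(path: str) -> bytes:
    """Copy a file's bytes into RAM without ever opening it for writing."""
    # O_RDONLY: writes through this descriptor are rejected by the OS.
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            # mmap cannot map an empty file, so handle it explicitly.
            return b""
        # ACCESS_READ maps the file read-only; assigning into the map raises TypeError.
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as view:
            # Slicing copies the mapped bytes into an ordinary in-RAM bytes object.
            return view[:]
    finally:
        os.close(fd)
```

Calling `load_read_only("model.gguf")` (a placeholder path) returns the file contents as `bytes`. A real loader would presumably read tensor by tensor into preallocated pools instead of copying the whole file at once.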
by
embedding-shape
2 hours ago
Even if there were a ton of writing, I'm not sure where NVMe even comes into the picture. Write durability is about the flash cells on SSDs and has nothing to do with the interface. Someone correct me if I'm wrong.
by
hrmtst93837
2 hours ago
People talk about "SSD endurance", but enough parallel I/O on M1/M2 can make the NVMe controller choke, with very weird latency spikes.
by
Insanity
4 hours ago
I had assumed the concern was heat generation on the controller if it's continuously reading. But maybe that's not actually bad.
by
throwway120385
4 hours ago
Just pop a heatsink on it and call it good.