> The kernel's page fault handler is reactive — it doesn't know you're about to read layer 47's FFN weights, so it can't prefetch.

man 2 madvise
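
For anyone who hasn't used it, here's a rough sketch of what that hint looks like from userspace, via Python's `mmap.madvise` wrapper (3.8+, Unix). The helper name `prefetch_range`, the temp file, and the offsets are all made up for illustration; a real loader would take offsets from the model file's tensor index. It's only a hint, so the kernel is free to ignore it.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

def prefetch_range(mm, offset, length):
    """Hint that bytes [offset, offset + length) of the mapping will be needed soon.

    Hypothetical helper: the start is rounded down to a page boundary
    because madvise() expects a page-aligned address.
    """
    start = offset - (offset % PAGE)
    span = min(len(mm), offset + length) - start
    # madvise is not exposed on every platform, so guard the call.
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
        mm.madvise(mmap.MADV_WILLNEED, start, span)
    return start, span

# Stand-in for a weights file: 16 pages of random bytes.
with tempfile.TemporaryFile() as f:
    f.write(os.urandom(16 * PAGE))
    f.flush()
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Pretend the "next layer" lives 100 bytes into page 5 and is 2 pages long.
        start, span = prefetch_range(mm, 5 * PAGE + 100, 2 * PAGE)
        first_byte = mm[5 * PAGE + 100]  # the later access, ideally no longer a major fault
```

Swapping `MADV_WILLNEED` for `MADV_RANDOM` or `MADV_SEQUENTIAL` is how you'd tune or disable readahead for the same range.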

reply
That works for readahead but it's not good for random access. readv, aio, dispatch_io are better there.
reply
This comparison is a bit apples and oranges (no pun intended!). madvise is all about providing hints to the kernel to tune the page cache and readahead (including possibly disabling readahead altogether). It's not about performing reads into private memory buffers, which is where the options you mentioned fit in.
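
To make the distinction concrete, here's a small sketch of the other side: explicit positional reads that copy data into buffers the process owns, rather than hinting about a mapping. It uses `os.preadv` where available and falls back to `os.pread`; the helper name `read_scattered`, the test file, and the offsets are invented for the example.

```python
import os
import tempfile

def read_scattered(fd, requests):
    """Read each (offset, length) request into its own private buffer.

    Hypothetical helper. Unlike madvise, this copies the data out of the
    page cache into memory owned by the caller.
    """
    chunks = []
    for offset, length in requests:
        buf = bytearray(length)
        if hasattr(os, "preadv"):
            n = os.preadv(fd, [buf], offset)  # fills buf in place
        else:
            data = os.pread(fd, length, offset)
            n = len(data)
            buf[:n] = data
        chunks.append(buf[:n])  # n can be short near end of file
    return chunks

# Stand-in data file with a predictable pattern: byte i holds i % 256.
with tempfile.TemporaryFile() as f:
    f.write(bytes(range(256)) * 64)
    f.flush()
    chunks = read_scattered(f.fileno(), [(1000, 4), (10, 3)])
```

The requests here are issued one at a time and block; aio or dispatch_io would be the way to keep several in flight at once.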
reply
That assumes you have significant work to do between fetches (so you can prefetch while using the current data). With LLM decode you don't.
reply