upvote
The extra L1 cache from a 64k page is on it's own a ~5-10% perf improvement (and it decreases power use by reducing the number of times you go out to L2.
reply
Funny, most of what you described sums up the Alpha architecture. 8KB pages + huge pages and, initially, only word-addressable memory, no byte access.

(Of course, it only took a few years for this to be rectified with the byte-word extension, which became required by ~all "real software" that supported Alpha)

It's also one of the only architectures Windows NT supported that didn't have 4KB pages, along with Itanium. I've wondered how (or if?) it handled programs that expect 4KB pages, especially in the x86 translation subsystem.

reply