That C64 demo doing sprite wizardy and 8088MPH comes to my mind. The latter one, as you most probably know, can't be emulated since it (ab)uses hardware directly. :D
As a trivia: After watching .the .product, I declared "if a computer can do this with a 64kB binary, and people can make a computer do this, I can do this", and high performance/efficient programming became my passion.
From any mundane utility to something performance sensitive, that demo is my northern star. The code I write shall be as small, performant and efficient as possible while cutting no corners. This doesn't mean everything is written in assembly, but utmost care is given how something I wrote works and feels while it's running.