Nothing since has packed nearly the same impact, with the exception of going from spinning disks to SSDs.
SSDs provided a huge bump in performance to each individual computer, but trickled their way into market saturation over a generation or two of computers, so you'd be effectively running the same software but in a much more responsive environment.
SSDs booted faster and launched programs faster and were a very nice change, but they weren't that same sort of night-and-day 80s/90s era change.
The software, in those days, was similarly making much bigger leaps every few years. 256 colors to millions, resolution, capabilities (real time spellcheck! a miracle at the time.) A chat app isn't a great comparison. Games are the most extreme example - Sim City to Sim City 2000; Doom to Quake; Unreal Tournament to Battlefield 1942 - but consider also a 1995 web browser vs a 1999 one.
The drag down of swapping became almost a non-issue with the SSD changeover.
I suppose going from a //e to a IIgs was that kind of leap but that was more about the whole computer than a cpu.
Now I have to say, swapping to an SSD on my windows machines at work was far less impressive than going to SSD with my macs. I sort of wrote that off as all the antivirus crap that was running. It was very disappointing compared to the transformation on mac. On my macs it was like I suddenly heard the hallelujah chorus when I powered on.
Also, going from SimCity to SimCity 2000 was pre-bloat. Over the course of five years, the new version was significantly better than the original, but they both targeted the same 486 processor generation, which was brand new when the original SimCity was released, but rather old by the time SimCity 2000 was released. Another five years later, SimCity 3000 added minimal functionality, but required not just a Pentium processor, but a fast one.
I guess what I'm getting at is that a faster CPU means programs released after it will run better, but faster storage means that all programs, old and new, will run better.
I think there's a difference between bloat and actually useful features or performance.
For example, I started making music with computers in the early 90s. They were only powerful enough to control external equipment like synthesizers.
Nowadays, I can do everything I could do with all that equipment on an iPad! I would not call that bloat.
On the other hand, comparing MS Teams to say ICQ, yeah, a lot of that is bloat.
Tell that to ScreamTracker!
And we were mostly ripping those samples from records on cassettes and CDs, or other mods.
https://www.c64-wiki.de/images/f/f1/rockmon3.png
Or also at https://www.youtube.com/watch?v=roBkg-iPrbw&t=400s in the video already linked below. And yes, I had to type in that listing.
For me they were.
I still remember the first PC I put together for someone with a SSD.
I had a quite beefy machine at the time and it would take 30 seconds or more to boot Windows, and around 45s to fully load Photoshop.
Built this machine for someone with entirely low-end (think like "i3" not "Celeron") components, but it was more than enough for what they wanted it for. It would hit the desktop in around 10 seconds, and Photoshop was ready to go in about 2 seconds.
(Or thereabouts--I did time it, but I'm remembering numbers from like a decade and a half ago.)
For a _lot_ of operations, the SSD made an order of magnitude difference. Blew my mind at the time.
So it was the only way to get that visceral improvement in user experience like CPU and platform upgrades were in the mid 90's to very early 00's.
The experience of just slapping a new SSD into a 3-year-old machine gave a different generation of computer nerds a similar thrill.
Nothing could really match the night and day difference of an entire machine being double to triple the performance in a single upgrade though. Not even the upgrade from spinning disks to SSD. You'd go from a game being unplayable on your old PC to it being smooth as butter overnight. Not these 20% incremental improvements. Sure, load times didn't get too much better - but those started to matter more when the CPU upgrades were no longer a defining experience.
Would you take the SSD and a 500MHz processor, or a 2GHz dual-core with a 7,200 or 10,000 RPM HDD? It's "some operations are faster" vs the "every single thing is wildly faster" of the every-few-years quadrupling-plus of CPU perf, memory amounts, disk space, etc.
(45sec to load Photoshop also isn't tracking with my memory, though 30s-1min boot certainly is, but I'm not invested enough to go try to dig up my G4 PowerBook and test it out... :) )
Never witnessed anything before or after with that jump in specs
I'd say software never really "caught up" to the general slowness that we had to endure in the HDD era either. Even my 14 year old desktop starts Word in a few seconds compared to upwards of 60s in the 90s.
The closest I've seen is the shitty low end Samsung Android tablet we got for our kids. It's soooo slow and laggy. I suspect it's the storage. And that was actually an upgrade over the Amazon Fire tablet we used to have, which was so slow it was literally unusable. Again I suspect slow storage is the culprit.
The only thing more impressive than hardware engineers delivering continuous massive performance improvements for the past several decades is software engineers' ability to completely erase that with more and more bloated programs that do essentially the same thing.
One of the co-signers of the Agile Manifesto had previously stated that "The best way to get the right answer on the Internet is not to ask a question; it's to post the wrong answer." (https://en.wikipedia.org/w/index.php?title=Ward_Cunningham#L...) I'm convinced that the Agile Manifesto was an attempt to make an internet post of the most-wrong way to manage a software project, in hopes someone would correct it with the right answer, but instead it was adopted as-is.
I feel this. Humanity has peaked.
Nowadays, you really don't get these magical moments when you upgrade, not on the device itself. The upgrade from Windows 10 to Windows 11 was basically just more ads. Games released today look about as good as games released 5-10 years ago. The music-making or photo-editing program you installed back then is still good. Your email works the same as before. In fact, I'm not sure I have a single program on my desktop that feels more capable or more responsive than it did in 2016.
There's some magic with AI, but that's all in the cloud.
Windows 11, Discord: 4GB is not enough to run them well.
FYI, Kopete allowed inline LaTeX, YouTube videos (low res, ok, 480p maybe, but it worked), emoticons, animations, videoconferencing, themes, maybe basic HTML tags and whatnot. And it ran fast.
"Bananas" core-counts gave me the same experience. Some year ago I moved to Ryzen Threadripper and experienced similar "Wow, compiling this project is now 4x faster" or "processing this TBs of data is now 8x faster", but of course it's very specific to specific workloads where concurrency and parallism is thought of from the ground up, not a general 2x speed up in everything.
About a week ago, completely out of the blue, YouTube recommended this old gem to me: https://www.youtube.com/watch?v=z0jQZxH7NgM
A Pentium 4, overclocked to 5GHz with liquid nitrogen cooling.
Watching this was such an amazing throwback. I remember clearly the last time I saw it, which was when an excited friend showed it to me on a PC at our school's library, a year or so before YouTube even existed.
By 2005, my Pentium 4 Prescott at home ran at some 3.6GHz without overclocking, 4GHz models for the consumer market had already been announced (but were plagued by delays), and surely 10GHz was "just a few more years away".
But with longer pipelines comes larger penalties when the pipeline needs to be flushed, so the P4 eventually hit a wall and Intel returned to the late Pentium 3 Tualatin core, refining it into the Pentium M which later evolved into the first Core CPUs.
https://www.tomshardware.com/pc-components/cpus/core-i9-1490...
Between IPC (~50 to 100-fold improvement) and clock speed increases (1000-fold alone), I estimated that single-thread performance has increased on the order of 50,000x - 100,000x since the 4.77 MHz 8088.
In human terms this is like one minute compared to one month!
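As a rough sanity check on those figures (my arithmetic, not the parent's):

    50 \times 1000 = 5\times10^{4}, \qquad 100 \times 1000 = 10^{5}
    1\ \text{month} \approx 30 \times 24 \times 60\ \text{minutes} = 43{,}200\ \text{minutes}

so the minute-to-month comparison lands right around the low end of that 50,000x-100,000x range.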
But that was several years after the book cited by the GP was published (1994, shortly after the release of the original Pentium).
It took a long time before I felt a need to improve my PC's performance again after that.
I remember loading up Doom, plugging in my shitty earbuds that had a barely long enough cable, and hearing the “real” shotgun sound for the first time. Oo-wee
I didn't feel any huge speed boosts like that until the M1 MacBook in 2020.
I can see why you wouldn’t consider it as impactful if you weren’t into gaming at the time.
Up until the 486, the clock speed and bus speed were basically the same and topped out at about 33MHz (IIRC). The 486 started the thing of making the CPU speed a multiple of the bus speed, eg the 486DX2/66 (66MHz CPU on a 33MHz bus) and the 486DX4/100 (100MHz CPU on a 33MHz bus - a 3x multiplier despite the name). And that's continued to this day (kind of).
But the point is the CPU became a lot faster than the IO speed, including memory. So these "overdrive" CPUs were faster but not 2-4x faster.
Also, in terms of impact, yeah there was a massive increase in performance through the 1990s, but let's not forget the first consumer GPUs, namely 3dfx Voodoo and later NVidia and ATI. Oh, Matrox Millennium anyone?
It's actually kind of wild that NVidia is now a trillion dollar company. It listed in 1999 for $12/share and, adjusted for splits, Google is telling me it's ~3700x now.
With Apple Silicon, the chassis was finally allowed to house an appropriate cooling solution, too. The machines are much quieter than the equivalent Intel laptops when dissipating the same power levels.
Apple’s power efficiency was a great bump forward, but the performance claims were a little exaggerated. I love my Apple Silicon devices but I still switch over to a desktop for GPU work because it’s so much faster, for example.
Apple had that famously misleading chart at the M1 launch showing their GPU keeping pace with a flagship nVidia card. In practice they’re not even close to flagship desktop accelerators, unfortunately.
They have excellent idle power consumption though. Great for a laptop.
We had a hand-me-down DEC x86 desktop at home with a Pentium II running at 233 MHz until I want to say 2002? This was around the time I learned how to build a PC since doing that was cheaper than buying one and no-one in my family had the money for that!
I saved whatever money I could to buy a 128MB stick of RAM from Staples (maybe it was 256MB?), a few other things from TigerDirect/Newegg and _this processor_. With some help from my uncle and a guide I printed from a website whose name started with '3D' (it was quite popular back then; I don't think it exists anymore), I got it done.
Going from 233 MHz to this was like going from walking to flying in a jet! Everything was SO MUCH F**ING FASTER. Windows XP _flew_. (The DEC barely made the minimum requirements for it, and boy did I feel it.) Trying to install Longhorn on it a year or two later brought me back into walking again, though. :D
I remember my teen years, doing odd jobs to get some cash, buying a part at a time until the build was complete. Worrying that if you didn't scrape together enough parts soon there might be an architecture change. Finally getting it all together and the feeling of pure bliss installing the OS, troubleshooting drivers, installing this or that. Good times.
They were both "seventh generation" according to their marketing, but you could get an entire GHz+ Athlon XP machine for much less than half the $990 tray price from the article.
I distinctly remember the day work bought a 5 or 6 node cluster for $2000. (A local computer shop gave us a bulk discount and assembled it for them, so sadly, I didn't poke around inside the boxes much.)
We had a Solaris workstation that retailed for $10K in the same office. Its per-core speed was comparable to one Athlon machine, so the cluster ran circles around it for our workload.
Intel was completely missing in action at that point, despite being the market leader. They were about to release the Pentium 4, and didn't put anything decent out from then to the Core 2 Duo. (The Pentium 4 had high clock rates, but low instructions per cycle, so it didn't really matter. Then AMD beat Intel to market with 64 bit support.)
I suspect history is in the process of repeating itself. My $550 AMD box happily runs Qwen 3.5 (32B parameters). An nvidia board that can run that costs > 4x as much.
That same article also says that extending x86 to 64 bits "wasn't hard", which I'm not so sure about. There are plenty of mistakes AMD could have made and cleanups they could have missed, but they handled it all quite well AFAICS.
All the later CMOS fabrication processes, starting with the 90-nm process (in 2004), have provided only very small improvements in the clock frequency, so that now, more than two decades after 2003, desktop CPUs still have not reached double that clock frequency.
In the history of computers, the decade with the highest rate of clock frequency increase has been 1993 to 2003, during which the clock frequency increased from 66 MHz in the first Pentium of 1993 up to 3.2 GHz in the last Northwood Pentium 4. So the clock frequency increased almost 50 times during that decade.
For comparison, in the previous decade, 1983 to 1993, the clock frequency in mass-produced CPUs had increased only around 5 times, i.e. at a rate about 10 times slower than in the next decade.
I'd argue you'd need to use AMD's Athlon XP or 64-bit processors, or Intel's Pentium 3 / Core 2 Duo, to figure out when clock speeds stopped increasing.
But, we can be slightly less pessimistic if we’re more specific. Already by the early 90’s, a lot of the performance increase came from strategies like pipelining, superscalar execution, and branch prediction - instruction-level parallelism. Then in 200X we started using additional parallelism strategies like multicore and SMT.
It isn’t a meaningless distinction. There’s a real difference between parallelism that the compiler and hardware can usually figure out, and parallelism that the programmer usually has to expose.
But there’s some artificiality to it. We’re talking about the ability of parallel hardware to provide the illusion of sequential execution. And we know that if we want full “single threaded” performance, we have to think about the instruction level parallelism. It’s just implicit rather than explicit like thread-level parallelism. And the explicit parallelism is right there in any modern compiler.
If the syntax of C was slightly different, to the point where it could automatically add OpenMP pragmas to all its for loops, we'd have 30GHz processors by now, haha.
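For anyone who hasn't run into OpenMP, the joke boils down to something like this minimal sketch (the function and names are just illustrative, not from the thread): one pragma turns an ordinary sequential C loop into an explicitly parallel one; build with e.g. gcc -fopenmp.

    /* Sketch only: without the pragma this is a plain sequential loop and the
       hardware can only mine it for instruction-level parallelism; with the
       pragma, OpenMP splits the independent iterations across all cores. */
    void scale(double *dst, const double *src, double k, long n) {
        #pragma omp parallel for
        for (long i = 0; i < n; i++)
            dst[i] = k * src[i];   /* iterations don't depend on each other */
    }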
It's not quite apples-to-apples, of course, due to floating point precision decreasing since then, vectorization, etc, but it's not like progress stopped in 2000!
I was in high school and was running a "computer games club" (~ Internet cafe for games and kids) since 1998 when we got a place, renovated it ourselves, got custom built furniture (cheap narrow desks) and initially 6 computers - AMDs at 300Mhz. By 2000 we broke a wall in the adjacent space and had ~15, cable + satellite internet for downloads and whatever video cards we could buy or scrap. It was wild.
Nah.. Cassettes, computers-in-a-keyboard, booting straight into BASIC.. THIS is where it all started, grandkids.
It could be done if either silicon is replaced with another semiconductor, or semiconductors are replaced with something else for making logic gates, e.g. organic molecules, which would make it possible to design a logic gate atom by atom.
For the first variant, i.e. replacing silicon with another semiconductor, research is fairly advanced, but this would increase the fabrication cost, so it will be done only when the methods for further improving silicon integrated circuits become ineffective or too expensive, which is unlikely to happen earlier than a decade from now.
For the latter case, 6 GHz has barely been reached, in CPUs that cannot be produced in large quantities and whose reliability is dubious.
It's simply impossible at room temperature, i.e. without extreme cooling.
Also you will run into interconnect speed issues, since 10GHz corresponds to 0.1 nanoseconds, which corresponds to 3 centimeters (assuming light speed; in reality signal propagation is slower).
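Spelled out (same assumption of vacuum light speed):

    d = c \cdot T = (3\times10^{8}\ \text{m/s}) \times (10^{-10}\ \text{s}) = 0.03\ \text{m} = 3\ \text{cm}

and real on-chip interconnect is slower than c, so the reachable radius per clock is smaller still.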
So sadly, we'll be stuck in this "clock-speed winter" for a little longer.
Some neat startups to watch for in this space.
Newer process nodes decrease the per-gate capacitance, increasing the optimal operating frequency.
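Very roughly, using the standard first-order CMOS relations (not specific to any particular node):

    t_{gate} \propto \frac{C\,V_{dd}}{I_{drive}}, \qquad f_{max} \propto \frac{1}{t_{gate}}, \qquad P_{dyn} \approx \alpha\,C\,V_{dd}^{2}\,f

so a smaller per-gate capacitance C shortens the gate delay (allowing a higher clock) and lowers dynamic power at the same time.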
Maybe reversible computing will help unlock several more orders of magnitude of growth.
The current direction of adding more cores makes more sense, since this is really what CPU intensive programs generally need - more parallelism.
Their purpose is to provide parallel execution at a lower cost in die area and at a better energy efficiency than by multiplying the number of cores. For instance, having 16 cores with 8-wide vector execution units provides the same throughput as 128 cores, but at a much lower power consumption and at a much smaller die area. However, both structures need groups of 128 independent operations every clock cycle, to keep busy all execution units.
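To make the "8-wide" part concrete, here is a minimal sketch (my example, not the parent's) using x86 AVX intrinsics, where one instruction adds 8 single-precision floats at a time - the work of 8 scalar iterations from a single core, provided there really are 8 independent operations to feed it every iteration:

    #include <immintrin.h>   /* AVX intrinsics; needs an AVX-capable CPU and -mavx */

    void add_arrays(float *dst, const float *a, const float *b, long n) {
        long i = 0;
        for (; i + 8 <= n; i += 8) {                     /* 8 lanes per iteration */
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
        }
        for (; i < n; i++)                               /* scalar tail */
            dst[i] = a[i] + b[i];
    }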
The terms "single-thread" performance vs. "multi-threaded" performance are not really correct.
What matters is the 2 performance values that characterize a CPU when executing a set of independent operations vs. executing a set of operations that are functionally-dependent, i.e. the result of each operation is an operand for the next operation.
When executing a chain of dependent operations, the performance is determined by the sum of the latencies of the operations, and it is very difficult to improve the performance other than by raising the clock frequency.
On the other hand, when the operations are independent, they can be executed concurrently and with enough execution units the performance may be limited only by the operation with the longest duration, no matter how many other operations are executed in parallel.
For parallel execution, there are many implementation methods that are used together, because for most of them there are limits for the maximum multiplication factor, caused by constraints like the lengths of the interconnection traces on the silicon die.
So some of the concurrently executed operations are executed in different stages of an execution pipeline, others are executed in different execution pipelines (superscalar execution), others are executed in different SIMD lanes of a vector execution pipeline, others are executed in different CPU cores of the same CPU complex, others are executed in different CPU cores that are located on separate dies in the same package, others are executed in CPU cores located in a different socket in the same motherboard, others in CPU cores located in other cases in the same rack, and so on.
Instead of the terms "single-thread performance" and "multi-threaded performance" it would have been better to talk about performance for dependent operations and performance for independent operations.
There is little if anything that can be done by a programmer to improve the performance for the execution of a chain of dependent instructions. This is determined by the design and the fabrication of the CPU.
On the other hand, either the compiler or the programmer must ensure that the possibility of executing operations in parallel is exploited to the maximum extent possible, by using various means, e.g. creating multiple threads, which will be scheduled on different CPU cores, using the available SIMD instructions, and interleaving any chains of dependent instructions, so that the adjacent instructions will be independent and will be executed either in different pipeline stages or in different execution pipelines. Most modern CPUs use out-of-order execution, so the exact order of interleaved dependent instructions is not critical, because they will be reordered by the CPU, but some interleaving done by the compiler or by the programmer is still necessary, because the hardware uses a limited instruction window within which reordering is possible.
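A small illustration of that last point (the function names and the unroll factor of 4 are mine): the first loop is one long chain of dependent additions, so it is bound by add latency; the second keeps four independent chains in flight, which an out-of-order core can overlap and so approach the throughput limit. (The floating-point result can differ slightly because the additions are reassociated.)

    double sum_chained(const double *x, long n) {
        double s = 0.0;
        for (long i = 0; i < n; i++)
            s += x[i];                    /* each add waits for the previous one */
        return s;
    }

    double sum_interleaved(const double *x, long n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        long i = 0;
        for (; i + 4 <= n; i += 4) {      /* four independent dependency chains */
            s0 += x[i];
            s1 += x[i + 1];
            s2 += x[i + 2];
            s3 += x[i + 3];
        }
        for (; i < n; i++)                /* leftover elements */
            s0 += x[i];
        return (s0 + s1) + (s2 + s3);
    }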
To browse the web is debatable. But for svchost.exe, Teams, Office 365 and Notepad, you definitely need one. /s
Programming is a lost art.
Couple a modern AAA title (like Battlefield 6, etc.) with a proper Atmos sound system and you will likely be pretty amazed. Even a simple 5.1 setup is pretty decent for hearing footsteps behind you/etc. which actually does help with gameplay.
I haven't kept up on it as my computer gaming area doesn't lend itself towards a proper speaker setup these days, but playing with headphones on lately has made me start to look into this again. I need to find some high quality tiny cube speakers or something to be able to put in weird spots on the ceilings/walls.
The speed was nice, and some competition helped lower prices.
It was the workstation on which I learned Logic Audio before, you know, Apple bought Emagic. I took that machine, running Reason at very low latency, to live gigs with my band.
Carting around a full-tower computer (not to mention the large CRT monitor we needed) next to a bunch of tube Fender & Ampeg amps was wild at the time. Finding a good drummer was hard; we turned that challenge into a lot of fun programming rhythm sections we could jam to, and control in real-time, live.
Fun fact #1: many today may not know that the only reason Intel switched to the Pentium name was because a court ruled that they couldn't trademark a number, and Intel had cross-licensed the microarchitecture and instruction set to AMD and Cyrix.
It was with the Pentium 4 that clock speeds went insane and became a huge marketing point, even though Pentium chips had lower IPC than Athlons (at that time). There was a belief that CPUs would keep going to 10GHz+. Instead they hit a ceiling at about 3GHz that's barely increased to this day (ignoring burst modes).
Intel originally intended to move workstations and servers to the EPIC architecture (e.g. Merced was an early chip in this series). This began in the 1990s but was years delayed and required writing software in a very particular way. It never delivered on its promise.
And AMD, thanks to the earlier cross-licensing agreement, just ate Intel's lunch with the Athlon 64 starting in 2003 by adding the x86_64 instructions, which we still use today.
Fun Fact #2: it was the Pentium 3 that saved Intel's hide long after it was discontinued in favor of the Pentium 4.
The early 2000s were the nascent era of multi-core CPUs. The Pentium 3 had survived in mobile chips and become the Pentium M and then the Core Duo (and Core 2 Duo later). This was the Centrino platform and included wireless (IIRC 802.11b/g). The Pentium 4 hit the gigahertz ceiling and EPIC wasn't going to happen, so Intel went back to the drawing board, revived the mobile Pentium 3 platform, added AMD's 64-bit instructions, and released their desktop CPUs. Even modern Intel CPUs are in many ways a derivation of the Pentium 3 [1].
[1]: https://en.wikipedia.org/wiki/List_of_Intel_Core_processors
The GHz barrier wasn't special. What was much more important was the fact that AMD was giving Intel a hard time and there was finally hard competition.
In reality, of course what you say is true, and the fact that Athlon could provide a few hundred extra MHz of clock frequency was not decisive.
Athlon had many improvements in microarchitecture in comparison with Pentium III, which ensured a much better performance even at equal clock frequency. For instance, Athlon was the first x86 CPU that was able to do both a floating-point multiplication and a floating-point addition in a single clock cycle. Pentium III, like all previous Intel Pentium CPUs, required 2 clock cycles for this pair of operations.
This much better floating-point performance of Athlon vs. Intel contrasted with the previous generation, where AMD K6 had competitive integer performance with Intel, but its floating-point performance was well below that of the various Intel Pentium models (which had hurt its performance in some games).