That's not going to happen, but there's alternative research such as [1] where we get rid of the clock and use self-timed circuits.

[1]:https://arc.cecs.pdx.edu/

reply
Like any doubling rule, it has to stop somewhere. Higher energy usage plus smaller geometry means much more exotic analog physics to worry about in chips. I'm not a silicon engineer by any means, but I'd expect 10 GHz clocks will be optical, or very exotically cooled, or not coming at us at all.
reply
Reaching 10 GHz for a CPU will never be done in silicon.

It could be done if either silicon is replaced with another semiconductor, or semiconductors are replaced with something else for making logic gates, e.g. organic molecules, which would allow designing a logic gate atom by atom.

For the first variant, i.e. replacing silicon with another semiconductor, research is fairly advanced, but it would increase fabrication cost, so it will be done only when all methods for further improving silicon integrated circuits become ineffective or too expensive, which is unlikely to happen sooner than a decade from now.

reply
Overclockers are pretty close.
reply
What can be done by raising the power consumption per core to hundreds of watts, while cooling the package with liquid nitrogen, is completely irrelevant to what can be done with a CPU that must operate reliably for years, at an acceptable energy cost.

In the latter case, 6 GHz has barely been reached, and only in CPUs that cannot be produced in large quantities and whose reliability is dubious.

reply
Having RAM read / write faster will be of way more benefit
reply
There have been overclockers who reached 9GHz using liquid helium.

It's simply impossible at room temperature; it takes extreme cooling.

Also, you will run into interconnect speed issues: 10 GHz corresponds to a 0.1-nanosecond period, which corresponds to about 3 centimeters at the speed of light (and on-chip signal propagation is slower than that).
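The back-of-the-envelope arithmetic checks out; a quick sketch (the vacuum speed of light is exact, the on-chip slowdown factor is just a rough rule of thumb):

```python
# Distance a signal can travel in one clock period at 10 GHz
C_VACUUM = 299_792_458           # speed of light in vacuum, m/s
f = 10e9                         # 10 GHz
period_s = 1 / f                 # 1e-10 s = 0.1 ns
distance_cm = C_VACUUM * period_s * 100
print(round(distance_cm, 2))     # 3.0 cm in vacuum

# On-chip wires propagate signals well below c (roughly half is a
# common rule of thumb), so the reachable distance per cycle is
# even shorter than 3 cm.
```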

So sadly, we'll be stuck in this "clock-speed winter" for a little longer.

reply
None for normal compute, since energy density is still the fundamental limit. But the interesting option is cryogenic computing, which can have near-zero switching energy and clock rates in the tens of GHz.

Some neat startups to watch for in this space.

reply
The dynamic power consumed is C·V²·f, and higher frequencies generally also require higher voltage. It makes no sense to keep increasing frequency when it makes power so much worse.
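A quick sketch of why the C·V²·f relation bites: frequency enters linearly, but the voltage needed to sustain it enters squared. The capacitance and voltage numbers below are purely illustrative, not real chip values:

```python
def dynamic_power(c_farads, v_volts, f_hertz):
    """Dynamic switching power: P = C * V^2 * f."""
    return c_farads * v_volts**2 * f_hertz

# Illustrative values only: doubling the clock while bumping the
# voltage by 20% nearly triples the dynamic power.
p_base = dynamic_power(1e-9, 1.0, 5e9)    # ~5 W
p_fast = dynamic_power(1e-9, 1.2, 10e9)   # ~14.4 W
print(round(p_fast / p_base, 2))          # 2.88
```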
reply
At lower frequencies, leakage current accounts for a larger share of power than switching the gate capacitance does, so for any given process node there's a sweet spot. At medium to low loads, it takes less power to run at a higher frequency than needed and rapidly power-gate the core in between than to run continuously at a lower frequency.

Newer process nodes decrease the per-gate capacitance, increasing the optimal operating frequency.
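A toy energy model illustrates the race-to-idle effect. All constants here are invented for illustration; real leakage and capacitance are process-specific, and real chips also need more voltage at higher clocks, which is what creates the sweet spot:

```python
def energy_joules(ops, f_hertz, c=1e-9, v=1.0, p_leak=0.5):
    """Energy to finish `ops` operations at frequency f, then power-gate.

    Dynamic power C*V^2*f plus a fixed leakage power, both paid only
    while the core is running. Voltage held constant for simplicity.
    """
    runtime = ops / f_hertz
    return (c * v**2 * f_hertz + p_leak) * runtime

# Same work, two strategies: the dynamic energy is identical
# (C*V^2 per operation), but finishing sooner pays less leakage.
slow = energy_joules(1e9, 1e9)   # ~1.5 J at 1 GHz
fast = energy_joules(1e9, 4e9)   # ~1.125 J at 4 GHz
print(fast < slow)               # True
```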

reply
So, heat. There are efforts to switch to optics, which doesn't have the heat problem to the same degree, but it's really hard to build an optical transistor. Plus, anywhere you're interfacing with the electrical world, you're back to the heat problem.

Maybe reversible computing will help unlock several more orders of magnitude of growth.

reply
What would be the benefit? You don't need a 10GHz processor to browse the web, or edit a spreadsheet, and in any case things like that are already multi-threaded.

The current direction of adding more cores makes more sense, since this is really what CPU intensive programs generally need - more parallelism.

reply
Because someone decided to write all the software in javascript and python, which don't benefit from the added cores.
reply
Single core speed is absolutely a thing that is needed and preferred to multicore. That's why we have avx, amx, etc.
reply
Meh, avx is also just parallelism. That won't get you around Amdahl's law.
reply
Not sure what you mean. It lets you do 64 operations with one instruction. Where's the diminishing returns?
reply
Vector or matrix instructions do not improve single-thread speed in the proper sense of the term, because they cannot improve the speed of a program that executes a sequence of dependent operations.

Their purpose is to provide parallel execution at a lower cost in die area and at a better energy efficiency than by multiplying the number of cores. For instance, having 16 cores with 8-wide vector execution units provides the same throughput as 128 cores, but at a much lower power consumption and at a much smaller die area. However, both structures need groups of 128 independent operations every clock cycle, to keep busy all execution units.
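The equivalence above is just multiplication, but it makes the trade-off concrete; a sketch using the numbers from the comment:

```python
def peak_ops_per_cycle(cores, simd_width):
    """Peak independent operations issued per clock cycle."""
    return cores * simd_width

# 16 cores with 8-wide vector units match 128 scalar cores in peak
# throughput, at far lower power and die area...
assert peak_ops_per_cycle(16, 8) == peak_ops_per_cycle(128, 1) == 128
# ...but both need 128 independent operations every cycle to stay busy.
```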

The terms "single-thread" performance vs. "multi-threaded" performance are not really correct.

What matters is the pair of performance values that characterize a CPU: its performance when executing a set of independent operations vs. when executing a set of functionally dependent operations, i.e. where the result of each operation is an operand of the next.

When executing a chain of dependent operations, the performance is determined by the sum of the latencies of the operations, and it is very difficult to improve it other than by raising the clock frequency.

On the other hand, when the operations are independent, they can be executed concurrently and with enough execution units the performance may be limited only by the operation with the longest duration, no matter how many other operations are executed in parallel.

For parallel execution, there are many implementation methods that are used together, because for most of them there are limits for the maximum multiplication factor, caused by constraints like the lengths of the interconnection traces on the silicon die.

So some of the concurrently executed operations are executed in different stages of an execution pipeline, others are executed in different execution pipelines (superscalar execution), others are executed in different SIMD lanes of a vector execution pipeline, others are executed in different CPU cores of the same CPU complex, others are executed in different CPU cores that are located on separate dies in the same package, others are executed in CPU cores located in a different socket in the same motherboard, others in CPU cores located in other cases in the same rack, and so on.

Instead of the terms "single-thread performance" and "multi-threaded performance" it would have been better to talk about performance for dependent operations and performance for independent operations.

There is little if anything that can be done by a programmer to improve the performance for the execution of a chain of dependent instructions. This is determined by the design and the fabrication of the CPU.

On the other hand, either the compiler or the programmer must ensure that opportunities for parallel execution are exploited to the maximum extent possible, by various means: creating multiple threads, which will be scheduled on different CPU cores; using the available SIMD instructions; and interleaving any chains of dependent instructions so that adjacent instructions are independent and can execute in different pipeline stages or in different execution pipelines. Most modern CPUs use out-of-order execution, so the exact order of interleaved dependent instructions is not critical, because the CPU will reorder them; but some interleaving by the compiler or the programmer is still necessary, because the hardware can reorder only within a limited instruction window.
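A toy cycle-count model (hypothetical latency and issue-width numbers, assuming one fully pipelined execution unit issuing at most one operation per cycle) shows why interleaving independent chains hides latency while a single dependent chain cannot:

```python
def cycles(total_ops, latency, chains):
    """Approximate cycles to execute `total_ops`, split evenly into
    `chains` independent dependent-chains, on one fully pipelined
    execution unit that issues at most one operation per cycle."""
    per_chain = total_ops // chains
    if chains >= latency:
        # Enough independent work to issue every cycle; only the
        # final operation's latency remains exposed.
        return total_ops + latency - 1
    # Each chain must wait `latency` cycles between its own operations.
    return per_chain * latency

# 1000 operations, each with a 4-cycle latency:
print(cycles(1000, 4, 1))   # 4000 - one dependent chain, latency-bound
print(cycles(1000, 4, 2))   # 2000 - two interleaved chains
print(cycles(1000, 4, 4))   # 1003 - latency fully hidden
```

With enough independent chains the unit issues every cycle and throughput, not latency, sets the time, which is the point the comment makes about keeping execution units busy.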

reply
You technically don't even need a 300 MHz processor for the use cases that you name. But Intel and others kept developing faster CPUs anyway.
reply
For parallelism we already have SIMD units like AVX and well... GPUs. CPUs need higher single thread speeds for tasks that simply cannot make effective use of it.
reply
> You don't need a 10GHz processor to browse the web, or edit a spreadsheet,

To browse the web is debatable. But for svchost.exe, Teams, Office 365 and Notepad, you definitely need one. /s

Programming is a lost art.

reply