Here is an idea for a CPU designer...
Observe that you can get way more performance (increased clock speed) or more performance per watt (lower core voltage) if you are happy to lose reliability.
Also observe that many CPUs do superscalar out-of-order execution, which requires the ability to backtrack; this is normally implemented with a queue and a 'commit' phase.
Finally, observe that verifying this commit queue is a fully parallel operation, and therefore can be checked slower and in a more power efficient way.
So, here's the idea. You run a blazing fast superscalar CPU well past its safe clock speed limits, so that it makes hundreds of computation or flow control mistakes per second. You have slow but parallel verification circuitry to verify the execution trace. Whenever a mistake is made, you put a pipeline bubble in the main CPU, clear the commit queue, put in the correct result from the verification system, and continue - just like you would with a branch misprediction.
This happening a few hundred times per second will have a negligible impact on performance. (Consider a 100-cycle 'reset' penalty: 100*100 cycles per second is a tiny fraction of 4GHz.)
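To put a number on that parenthetical, here is a rough sketch in Python. The function name and the 1% budget are made up for illustration; the 100 mistakes per second, 100-cycle penalty and 4GHz clock are the figures from the comment above.

```python
def overhead_fraction(mistakes_per_second, penalty_cycles, clock_hz):
    """Fraction of all clock cycles spent recovering from mistakes."""
    return mistakes_per_second * penalty_cycles / clock_hz

# Figures from the comment: 100 mistakes/s, 100-cycle reset penalty, 4 GHz clock.
fraction = overhead_fraction(100, 100, 4e9)
print(f"overhead: {fraction:.2e} of all cycles")

# Hypothetical budget: how many mistakes per second would cost 1% of the cycles?
budget = 0.01
mistakes_for_budget = budget * 4e9 / 100
print(f"mistakes/s for a 1% slowdown: {mistakes_for_budget:,.0f}")
```

Under these assumptions the overhead is on the order of a few millionths of the available cycles, so the error rate could be several orders of magnitude higher before the recovery penalty itself became noticeable.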
The main fast CPU could also make deliberate mistakes - for example assuming floats aren't NaN, assuming division won't be by zero, etc. Trimming off rarely used logic makes the core smaller, making it easier to make it even faster or more power efficient (since wire length determines power consumption per bit).
If so, then:
That seems like it would slow the ultimate computation to no more than the rate at which these computations can be verified.
That makes the verifier the ultimate bottleneck, and the other (fast, expensive -- like an NHRA drag car) pipeline becomes vestigial since it can't be trusted anyway.
So we have 20 verifiers running at 500MHz, and this stack of verifiers is trustworthy. It does reliably-good work.
We also have a single 10GHz CPU core, and this CPU core is not trustworthy. It does spotty work (hence the verifiers).
And both of these things (the stack of verifiers, the single CPU core) peak out at exactly the same computational speeds. (Because otherwise, the CPU's output can't be verified.)
Sounds great! Except I can get even better performance from this system by just skipping the 10GHz CPU core, and doing all the work on the verifiers instead.
("Even better"? Yep. Unlike that glitch-ass CPU core, the verifiers' output is trustworthy. And the verifiers accomplish this reliable work without that extra step of occasionally wasting clock cycles to get things wrong.
If we know what the right answer is, then we already know the right answer. We don't need to have Mr. Spaz compute it in parallel -- or at all.)
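The numbers in this objection can be sketched as follows. This takes the comment's own premise at face value: that the work splits perfectly across the verifiers and that every unit retires the same amount of work per cycle. Both are assumptions, and all names are illustrative.

```python
VERIFIER_COUNT = 20
VERIFIER_CLOCK_HZ = 500e6    # 500 MHz each, as in the comment
FAST_CORE_CLOCK_HZ = 10e9    # single 10 GHz core, as in the comment

def aggregate_rate(units, clock_hz, ops_per_cycle=1.0):
    """Idealised throughput: assumes perfect scaling across units."""
    return units * clock_hz * ops_per_cycle

verifier_rate = aggregate_rate(VERIFIER_COUNT, VERIFIER_CLOCK_HZ)
fast_core_rate = aggregate_rate(1, FAST_CORE_CLOCK_HZ)

# Nothing can be committed faster than it can be verified,
# so the slower of the two rates bounds the whole system.
committed_rate = min(verifier_rate, fast_core_rate)

print(f"verifiers: {verifier_rate:.1e} ops/s")
print(f"fast core: {fast_core_rate:.1e} ops/s")
print(f"committed: {committed_rate:.1e} ops/s")
```

With these figures both sides come out at the same rate, which is the point the comment is making. Whether the verifiers could actually do the original work at that rate depends on the perfect-scaling assumption.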
The only problem here is that reliability is a statistical thing. You might be lucky, you might not.
It wouldn't be surprising if the RP2350 gets officially certified to run at something above the max supported clock at launch (150MHz), though obviously nothing close to 800MHz. That happened to the RP2040[1], which nominally supported 133MHz at launch but is now certified up to 200MHz (the SDK still defaults to 125MHz for compatibility, but getting 200MHz is as simple as toggling a config flag[2]).
[1] https://www.tomshardware.com/raspberry-pi/the-raspberry-pi-p...
[2] https://github.com/raspberrypi/pico-sdk/releases/tag/2.1.1
Around 20 bucks for the Wi-Fi variant. 1GHz, 256MB RAM, USB OTG, GPIO and full Linux support while drawing less than 1W without any power optimizations, and it even supports sub-$15 2.8" LCDs out of the box.
And Rust can be compiled to be used with it...
https://github.com/scpcom/LicheeSG-Nano-Build/
Take a look at the `best-practise.md`.
It is also the base board of NanoKVM[1]
no such thing, 5V tolerant buffers will run you more than that
I'm currently prototyping a tiny portable audio player[1] whose battery life could benefit a lot from this.
That said: it's a bit sad there's so little (if anything) in the space between microcontrollers & feature-packed Linux-capable SoCs.
I mean: these days a multi-core, 64 bit CPU & a few GB of RAM seems to be the absolute minimum for smartphones, tablets etc, let alone desktop style work. But remember that around y2k, masses of people were using single core, sub-1GHz CPUs with a few hundred MB of RAM or less, and running full-featured GUIs, Quake1/2/3 & co, web surfing etc etc on that. GUIs have been done on machines with less than 1MB of RAM.
Microcontrollers otoh seem to top out on ~512KB RAM. I for one would love a part with integrated:
- Multi-core, but 32 bit CPU. 8+ cores cost 'nothing' in this context.
- Say, 8 MB+ RAM (up to a couple hundred MB)
- Simple 2D graphics, maybe a blitter, some sound hw etc
- A few options for display output. Like, DisplayPort & VGA.
Read: relatively low complexity, but with the speed & power-efficient integration of modern ICs. The RP2350pc goes in this direction, but just isn't (quite) there.
The PIO units on the RP2040 are... overrated. Very hard to configure, badly documented, and there are only 8 state machines in total. WS2812 control from the Pico is unreliable at best in my experience.
Eventually it will be seen as a feature.
I recently turned turbo off on a small, lightly loaded Intel server. This reduced power by about a factor of 2, core temperature by 30-40C, and allowed running the fans much quieter. I’m baffled as to why the CPU didn’t do this on its own. (Apple gets these details right. Intel, not so much.)
This is a boring NVR workload with a bit of GPU usage, with total system utilization around 10% with turbo off. Apparently the default behavior is to turbo from the normal ~3GHz up to 5.4GHz, and I don’t know why the results were quite so poor.
This is an i9-13900H (Minisforum MS-01) machine, so maybe it has some weird tuning for gaming workloads? Still seems a bit pathetic. I have not tried monitoring the voltages with turbo on and off to understand exactly why it’s performing quite so inefficiently.
I bet if you designed a custom board it could do a little better
Credit where it's due: Mike is a wizard. He's been involved in some of our more adventurous tinkering, and his input on the more complex areas of our product software has been invaluable. Check out his GitHub for some really interesting projects: https://github.com/MichaelBell
Blatant plug: We have a wide range of boards based on the RP2350 for all sorts of projects! https://shop.pimoroni.com/collections/pico :-)
They're unstable enough at stock if taken outside an air conditioned room.