Why not? I understand not wanting to deal with unnecessary complexity as a hobbyist, but you'll find yourself creating far more complexity trying to implement all of this yourself (and vendors certainly don't want to support you in this). Secondly, I think the number of customers for chip vendors who are uncomfortable with setting up an embedded Linux environment, but perfectly confident in routing DDR and PCIe signals is approximately 0.
> What I am trying to point out is that there is a huge market gap.
> i.MX8 is not realtime and the support for running bare metal code is very much non existent.
This isn't quite true and is what I'm trying to get at. Most of these embedded SoCs contain both a Cortex-M and a Cortex-A core (not all, but quite a lot). High-performance DRAM, external PCIe devices, and large internal caches are fantastic for compute performance, but most of the things you want to do with a PCIe device (networking, asynchronous compute) don't require cycle-accurate determinism. Generally there isn't much you need to do with such stringent timing requirements, so you can offload that work to a secondary Cortex-M33 core with a shared memory interface to the main core and get the best of both worlds.
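To make the "shared memory interface" idea concrete, here is a rough sketch of a single-producer/single-consumer ring buffer of the kind the real-time core could fill and the application core could drain. Everything in it is my own illustration, not any vendor's API: `shared_ring_t`, `sample_t`, `ring_push`, `ring_pop` and `RING_SLOTS` are hypothetical names. It also assumes the struct sits in a memory region both cores can see coherently (non-cached, or with explicit cache maintenance) and that C11 atomics behave sensibly across the two cores, which you would need to check for your particular part. In practice you would more likely reach for the vendor's RPMsg/OpenAMP support than roll this by hand.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring size; a power of two so indices can be masked. */
#define RING_SLOTS 8u
static_assert((RING_SLOTS & (RING_SLOTS - 1u)) == 0u,
              "RING_SLOTS must be a power of two");

/* Hypothetical message: one reading produced by the real-time core. */
typedef struct {
    uint32_t channel;
    uint32_t value;
} sample_t;

/* Assumed to live in memory visible to both cores. */
typedef struct {
    _Atomic uint32_t head;      /* written only by the producer */
    _Atomic uint32_t tail;      /* written only by the consumer */
    sample_t slots[RING_SLOTS];
} shared_ring_t;

/* Producer side (e.g. the Cortex-M core). Returns false if the ring is full. */
static bool ring_push(shared_ring_t *r, sample_t s)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS) {
        return false;           /* full: consumer has not caught up */
    }
    r->slots[head & (RING_SLOTS - 1u)] = s;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side (e.g. the Cortex-A core). Returns false if the ring is empty. */
static bool ring_pop(shared_ring_t *r, sample_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) {
        return false;           /* empty: nothing new from the producer */
    }
    *out = r->slots[tail & (RING_SLOTS - 1u)];
    /* Release: the slot is free for reuse only after we have copied it. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

The point of the split is that neither side ever blocks: the real-time core calls `ring_push` from its timing-critical loop and simply drops or counts a sample when the ring is full, while the Linux side polls `ring_pop` (or is woken by a mailbox interrupt) whenever it gets around to it.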
I see so many systems that try to take advantage of the impressive compute power of modern MCUs (which is really cool!) but often end up just re-inventing the cooperative multitasking OS, only worse.
Looking at e.g. STM32H755: 1x Cortex-M7 480 MHz, 1x Cortex-M4 240 MHz, USB2, 100 Mbit Ethernet, DAC.
Comparing to AM6421: 1x Cortex-A53 1GHz, 2x Cortex-R5F 800MHz, 1x Cortex-M4F 400MHz, 2x PRU, USB3, 1 Gbit Ethernet, PCIe Gen2.
I can hardly believe the STM32H755 microcontroller is almost as costly as the AM6421 SoC.
For example, the AMD Xilinx UltraScale+ devices, as used in the AMD Kria modules and development kits (3-digit prices), include some Cortex-R5 cores, which provide deterministic operation, like Cortex-M.
Cortex-R5 cores are somewhat slower than Cortex-M7 at the same clock frequency, but they are available at higher clock frequencies than many Cortex-M7 implementations.
If you can implement some custom peripherals in the FPGA logic array, then you can obtain much higher performance than with a microcontroller alone.
So they are similar to an older Raspberry Pi and they have far more computational power than a Cortex-M7 or Cortex-M85 CPU, even if they are very slow in comparison with modern Cortex-A7x or Cortex-A7xx cores.
I have never heard of any FPGA containing better CPU cores than Cortex-A78, but even those with Cortex-A78 are extremely expensive, so they may be worthwhile only for their FPGA part, not for a CPU that is much slower than cheaper alternatives.
The same is true even for the cheaper modules with UltraScale+ FPGAs, like AMD Kria, which cost as much as one of the cheaper mini-PCs with a much faster Intel or AMD CPU, so they are worthwhile only if you can implement an essential part of the functionality in the FPGA.
There is however another advantage of the FPGAs with ARM cores, besides implementing fast peripherals with hard real-time requirements.
With most non-microcontroller ARM CPUs, the vendor keeps various things secret, including the boot loader, so you cannot be absolutely certain about what the vendor does. ARM has followed the example of Intel and has introduced a potential Trojan horse in its CPUs, i.e. an execution mode controlled by the vendor, which is more privileged than even a hypervisor. In the FPGAs with ARM cores, by contrast, you have complete documentation and absolute control over what the CPU does, so you could implement devices for which security is important with greater confidence.