The 25MHz number I cite as the performance expectation is "relaxed": I don't want to set unrealistic expectations on the core's performance, because I want everyone to have fun and be happy coding for it - even relatively new programmers.
However, with a combination of overclocking and optimization, higher speeds are definitely on the horizon. Someone on the Baochip Discord thought up a clever trick I hadn't considered that could potentially get toggle rates into the hundreds of MHz's. So, there's likely a lot to be discovered about the core that I don't even know about, once it gets into the hands of more people.
On rp2350 it is pio (wait for clock) -> pio (read address bus) -> dma (addr into lower bits of dma source for next channel) -> dma (Data from SRAM to PIO) -> pio (write data to data bus) chain and it barely keeps up.