It is objectively slow: roughly 100x slower than what most people consider usable. The quality is also severely degraded to achieve even that speed.
> but the point of this is that you can run cheap inference in bulk on very low-end hardware.
You always could, as long as you didn't care about speed or efficiency.
The iPhone 17 Pro outperforms AMD's Ryzen 9 9950X, per https://www.igorslab.de/en/iphone-17-pro-a19-pro-chip-uebert...