That doesn’t match my experience or the numbers:
Plausibly, take all the Nvidia hype and multiply that by a factor and that's what 'Groq' could be worth.
And there is no real commodification - there's Nvidia, Cerebras, Groq ... not many others.
They're not really competing with Nvidia because 1) Nvidia owns their chips now, and 2) Nvidia is not really an inference provider.
Nvidia doesn't own them or all their IP now; we don't quite know the terms of the deal.
Was this comment created using quantized llama 3?
I love Groq, but across every single line break in your post there is a glaring issue that is easy to refute within 15 seconds, even without 300t/s of throughput.
Groq is more performant for the growing categories of inference-based tasks, whereas Nvidia's advantage in inference depends on bulk/batch processing, which will make up a smaller category over time, in relative terms.
The future of AI silicon is inference, and the cost structure of AI data centres is constrained by the current necessity to have 'high GPU utilization'; otherwise, the cost / amortization of the chips doesn't work out.
That cost structure is a limitation of Nvidia architecture.
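To make the utilization point concrete, here is a rough sketch of how amortized hardware cost per token scales with utilization. Every number in it (chip price, lifetime, peak throughput) is a made-up placeholder, not a figure for any Nvidia or Groq part, and the function name is just for illustration. It also ignores power, hosting and staff.

```python
def cost_per_million_tokens(chip_cost_usd, lifetime_years,
                            peak_tokens_per_sec, utilization):
    """Amortized hardware cost (USD) per 1M tokens served.

    Hardware only: ignores power, hosting, networking and staff.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    lifetime_seconds = lifetime_years * 365 * 24 * 3600
    # Tokens actually served over the chip's life at this utilization.
    tokens_served = peak_tokens_per_sec * utilization * lifetime_seconds
    return chip_cost_usd / tokens_served * 1_000_000


# Hypothetical placeholder values, chosen only to show the shape of the curve.
CHIP_COST_USD = 30_000
LIFETIME_YEARS = 4
PEAK_TOKENS_PER_SEC = 2_000

for utilization in (0.9, 0.5, 0.1):
    cost = cost_per_million_tokens(CHIP_COST_USD, LIFETIME_YEARS,
                                   PEAK_TOKENS_PER_SEC, utilization)
    print(f"utilization {utilization:.0%}: ${cost:.4f} per 1M tokens")
```

The cost is inversely proportional to utilization, so in this simple model a chip that sits idle half the time costs twice as much per token.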
Groq serves a lot faster, and without the limiting batching requirement, which opens up the hosting arrangements common in most classical hosting scenarios, i.e. without necessarily needing high utilization.
Groq has bespoke hardware, no CUDA, obviously much lower memory density, and they don't have the deep distribution networks and leverage over TSMC that Nvidia has - but pound for pound, were we able to 'fire up a server' for our inference needs, it would be Groq, not Nvidia, that we'd turn to.
Were they not a later market entrant facing those barriers to entry, they'd be gigantic.
Google's eighth-generation TPU inference chip has 384 MB of on-chip SRAM vs 500 MB for Groq's third-generation LPU.