> they have a totally superior product to Nvidia for inference based tasks

They're not really competing with Nvidia because 1) Nvidia owns their chips now, and 2) Nvidia is not really an inference provider.

reply
Groq is a silicon maker; the inference provider stuff is a path to market, not really a reflection of their market potential.

Nvidia doesn't own them, or all of their IP, now; we don't quite know the terms of the deal.

reply
AFAIK the terms were that the chip-making + talent stuff went to Nvidia, and the API provider stuff gets to keep existing separately.
reply
Define “totally superior”?

Was this comment created using quantized llama 3?

I love Groq, but across every single line break in your post there is a glaring issue that is easy to refute within 15 seconds, even without 300t/s of throughput.

reply
You wasted all of your commentary on snark and sadly unfunny humour, and yet still managed to add nothing.

Groq is more performant for the growing categories of inference-based tasks, whereas Nvidia's advantage in inference depends on bulk/batch processing, which will make up a smaller category over time, in relative terms.

The future of AI silicon is inference, and the cost structure of AI data centres is constrained by the current necessity to have 'high GPU utilization'; otherwise, the cost / amortization of the chips doesn't work out.

That cost structure is a limitation of Nvidia's architecture.

Groq serves a lot faster, and without the limiting batching requirement, which opens up the kind of arrangements common in most classical hosting scenarios, i.e. without necessarily the high utilization requirements.
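To make the utilization point concrete, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name, the chip price, the lifetime, and the peak throughput are hypothetical placeholders, not figures for any Nvidia or Groq part, and it ignores power, hosting, and networking costs. It only shows that the amortized hardware cost per token scales inversely with utilization.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def amortized_cost_per_million_tokens(chip_cost_usd, lifetime_years,
                                      peak_tokens_per_sec, utilization):
    """Hardware-only cost (USD) per million tokens served.

    utilization is the fraction of peak throughput actually used,
    averaged over the chip's lifetime (0 < utilization <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if lifetime_years <= 0 or peak_tokens_per_sec <= 0:
        raise ValueError("lifetime and peak throughput must be positive")

    # Total tokens the chip actually serves before it is written off.
    tokens_served = (peak_tokens_per_sec * utilization
                     * lifetime_years * SECONDS_PER_YEAR)
    return chip_cost_usd / (tokens_served / 1_000_000)


if __name__ == "__main__":
    # Hypothetical accelerator: $30k, 3-year write-off, 1000 tokens/s peak.
    for utilization in (0.9, 0.5, 0.1):
        cost = amortized_cost_per_million_tokens(30_000, 3, 1_000, utilization)
        print(f"utilization {utilization:.0%}: ${cost:.2f} per million tokens")
```

Halving utilization doubles the hardware cost per token, which is why an architecture that only reaches its peak throughput with large batches needs to be kept busy for the numbers to work.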

Groq has bespoke hardware, no CUDA, and obviously much lower memory density, and they don't have the deep distribution networks and leverage over TSMC that Nvidia has - but pound for pound, were we able to 'fire up a server' for our inference needs, it would be Groq, not Nvidia, that we'd turn to.

Were they not a later market entrant, and did they not face those barriers to entry, they'd be gigantic.

reply
Is Groq still using 6 racks to serve Llama3-70B or is that old news?
reply
The new chip isn't out yet so that's the only thing they could be doing.
reply
deleted
reply
Google has been releasing a new TPU generation every year since 2023, and the eighth generation consists of a training-optimized and an inference-optimized design.

Google's eighth-generation TPU inference chip has 384 MB of on-chip SRAM vs 500 MB for Groq's third-generation LPU.

reply
[flagged]
reply
This is not about xAI.
reply
Ahh, well, I stand corrected. But Groq sold out. Where are our revolutionary chips? Locked behind a monopoly. :)
reply
Groq != grok
reply
I have the feeling quite a few up- and downvotes here did not take that into account.
reply