> they have a totally superior product to Nvidia for inference based tasks

They're not really competing with Nvidia because 1) Nvidia owns their chips now, and 2) Nvidia is not really an inference provider.

by bluegatty13 hours ago|

Groq is a silicon maker; the inference provider business is a path to market, not really a reflection of their market potential.

Nvidia doesn't own them or all their IP now; we don't quite know the terms of the deal.

by TurdF3rguson12 hours ago|

AFAIK the terms were that the chip-making and talent side went to Nvidia, while the API provider business keeps existing separately.

by 7thpower14 hours ago|

Define “totally superior”?

Was this comment created using quantized llama 3?

I love Groq, but across every single line break in your post there is a glaring issue that is easy to refute within 15 seconds, even without 300t/s of throughput.

by bluegatty13 hours ago|

You wasted all of your commentary on snark and sadly unfunny humour, and yet still managed to add nothing.

Groq is more performant for the growing categories of inference-based tasks, whereas Nvidia's advantage in inference depends on bulk/batch processing, which will make up a smaller category over time, in relative terms.

The future of AI silicon is inference, and the cost structure of AI data centres is constrained by the current need for 'high GPU utilization'; otherwise, the cost/amortization of the chips doesn't work out.

That cost structure is a limitation of Nvidia architecture.

Groq serves a lot faster, and without the limiting batching requirement, which opens up the hosting arrangements common in most classical hosting scenarios, i.e. without necessarily needing high utilization.

Groq has bespoke hardware, no CUDA, obviously much lower memory density, and they don't have the deep distribution networks and leverage over TSMC that Nvidia has - but pound for pound, if we could just 'fire up a server' for our inference needs, it would be Groq, not Nvidia, that we'd turn to.

Were they not a late market entrant facing those barriers to entry, they'd be gigantic.
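
To make the utilization argument above concrete, here is a rough sketch of how amortized cost per token scales with utilization. The function name and every number in it (hourly cost, peak throughput) are hypothetical placeholders, not figures for any real Nvidia or Groq hardware.

```python
def cost_per_million_tokens(hourly_cost_usd, peak_tokens_per_sec, utilization):
    """Amortized cost per 1M tokens for an accelerator that is busy only
    a fraction `utilization` of the time (all inputs are assumptions)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Tokens actually produced in one hour at the given utilization.
    tokens_per_hour = peak_tokens_per_sec * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical: $4/hour amortized hardware cost, 2000 tokens/s at peak.
for u in (0.9, 0.5, 0.1):
    cost = cost_per_million_tokens(4.0, 2000, u)
    print(f"{u:.0%} utilization: ${cost:.2f} per 1M tokens")
```

Cost per token is inversely proportional to utilization in this toy model, which is the sense in which the amortization "doesn't work out" when the chips sit idle.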

by dnautics13 hours ago|

is groq still using 6 racks to serve Llama3-70B or is that old news?

by wmf11 hours ago|

The new chip isn't out yet so that's the only thing they could be doing.

by 13 hours ago|

deleted

by imtringued9 hours ago|

Google has been releasing a new TPU generation every year since 2023, and the eighth generation consists of a training-optimized and an inference-optimized design.

Google's eighth generation TPU inference chip has 384 MB of on-chip SRAM vs 500 MB for Groq's third generation LPU.
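
For a sense of scale, a back-of-the-envelope sketch of how many chips it would take to hold a 70B-parameter model's weights entirely in on-chip SRAM. The SRAM sizes are the ones quoted in this comment (not independently verified); the helper name, the 1-byte-per-parameter quantization, decimal megabytes, and ignoring KV cache and activations are all assumptions.

```python
import math


def chips_to_hold_weights(n_params, bytes_per_param, sram_mb_per_chip):
    """Minimum chip count whose combined SRAM fits the weights alone.
    Ignores KV cache, activations, and any replication (assumption)."""
    total_bytes = n_params * bytes_per_param
    sram_bytes = sram_mb_per_chip * 1_000_000  # decimal MB, an assumption
    return math.ceil(total_bytes / sram_bytes)


N_PARAMS = 70_000_000_000  # roughly Llama3-70B

for name, sram_mb in (("384 MB chip", 384), ("500 MB chip", 500)):
    n = chips_to_hold_weights(N_PARAMS, 1, sram_mb)  # 8-bit weights assumed
    print(f"{name}: {n} chips")
```

Even at 8-bit weights this comes out to well over a hundred chips for either SRAM size, which is why SRAM-only serving of a 70B model spans multiple racks.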

by digitaltrees13 hours ago|

[flagged]

by Renaud13 hours ago|

This is not about xAI.

by digitaltrees12 hours ago|
