by ViscountPenguin 15 hours ago

by petesergeant 9 hours ago

> the performance is really poor for the token price

That doesn’t match my experience or the numbers:

https://openrouter.ai/openai/gpt-oss-120b?sort=throughput

by bluegatty 15 hours ago

Groq is considerably faster and better at inference, they have a totally superior product to Nvidia for inference based tasks, which will be the dominant concern in the future.

Plausibly, take all the Nvidia hype and multiply that by a factor and that's what 'Groq' could be worth.

And there is no real commodification - there's Nvidia, Cerebras, Groq ... not many others.

by TurdF3rguson 14 hours ago

> they have a totally superior product to Nvidia for inference based tasks

They're not really competing with Nvidia because 1) Nvidia owns their chips now, and 2) Nvidia is not really an inference provider.

by bluegatty 13 hours ago

Groq is a silicon maker; the inference provider stuff is a path to market, not really a reflection of their market potential.

Nvidia doesn't own them or all their IP now, we don't quite know the terms of the deal.

by TurdF3rguson 12 hours ago

AFAIK the terms were that the chip-making + talent stuff went to Nvidia, and the API provider stuff gets to keep existing separately.

by 7thpower 14 hours ago

Define “totally superior”?

Was this comment created using quantized llama 3?

I love Groq, but across every single line break in your post there is a glaring issue that is easy to refute within 15 seconds, even without 300t/s of throughput.

by bluegatty 13 hours ago

You wasted all of your commentary on snark and sadly unfunny humour, and yet still managed to add nothing.

Groq is more performant for the growing categories of inference-based tasks, whereas Nvidia's advantage in inference depends on bulk/batch processing, which will make up a smaller category over time, in relative terms.

The future of AI silicon is inference, and the cost structure of AI data centres is constrained by the current necessity of 'high GPU utilization'; otherwise, the cost / amortization of the chips doesn't work out.

That cost structure is a limitation of Nvidia architecture.

Groq serves a lot faster, and without the limiting batching requirement, which opens up the arrangements common in most classical hosting scenarios, i.e. without necessarily needing high utilization.

Groq has bespoke hardware, no CUDA, and obviously much lower memory density, and they don't have the deep distribution networks and leverage over TSMC that Nvidia has - but pound for pound, were we able to 'fire up a server' for our inference needs, it would be Groq, not Nvidia, that we'd turn to.

Were they not a later market entrant facing those barriers to entry, they'd be gigantic.

by dnautics 13 hours ago

Is Groq still using 6 racks to serve Llama3-70B or is that old news?

by wmf 11 hours ago

The new chip isn't out yet so that's the only thing they could be doing.

by imtringued 9 hours ago

Google has been releasing a new TPU generation every year since 2023, and the eighth generation consists of a training-optimized and an inference-optimized design.

Google's eighth-generation TPU inference chip has 384 MB of on-chip SRAM vs 500 MB for Groq's third-generation LPU.

by digitaltrees 13 hours ago

[flagged]

by Renaud 13 hours ago

This is not about xAI.

by digitaltrees 12 hours ago
