The answer wasn't dumb like the ones others are getting. It was pretty comprehensive and useful.
While the idea of a feline submarine is adorable, please be aware that building a real submarine requires significant expertise, specialized equipment, and resources.

Generate lots of solutions and mix and match. That's a whole new way to look at LLMs.
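To make that concrete: "generate lots of solutions and mix and match" is basically best-of-N sampling, which cheap fast tokens make practical. A toy sketch in Python, where generate and score are placeholders for whatever model and judge you'd actually plug in:

    import random

    def generate(prompt: str) -> str:
        """Placeholder for a real model call."""
        return f"candidate answer {random.randint(0, 999)}"

    def score(answer: str) -> float:
        """Placeholder judge: rank candidates somehow
        (another model, unit tests, a human)."""
        return random.random()

    def best_of_n(prompt: str, n: int = 16) -> str:
        # When tokens are nearly free, sample many candidates
        # and keep only the best-scoring one.
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=score)

    print(best_of_n("design a feline submarine"))

The point is that the sampling loop gets cheaper with faster inference, so the judge becomes the bottleneck, not the generator.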
What's the moat with these giant data centers that are being built with hundreds of billions of dollars of Nvidia chips?
If such chips can be built so easily, and offer this insane level of performance at 10x efficiency, then one thing is 100% sure: more such startups are coming... and with that, an entire new ecosystem.
(And people nowadays: "Who's Cisco?")
I need some smarts to route my question to the correct model. I won't care which one that is. Selling commodities is notorious for slow and steady growth.
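A minimal sketch of what that routing layer could look like. The model names and the keyword heuristic here are made up for illustration, not any real product's API:

    FAST_CHEAP = "llama-3-8b-on-silicon"    # placeholder: fixed-function chip
    SLOW_SMART = "frontier-reasoning-model" # placeholder: big general model

    def route(prompt: str) -> str:
        """Send short, simple prompts to the fast commodity model,
        and anything that smells like multi-step reasoning to the big one."""
        needs_reasoning = any(
            kw in prompt.lower()
            for kw in ("prove", "step by step", "debug", "why")
        )
        if needs_reasoning or len(prompt.split()) > 200:
            return SLOW_SMART
        return FAST_CHEAP

    print(route("How many r's in strawberry?"))  # -> llama-3-8b-on-silicon

In practice the router would itself be a small cheap model rather than keyword matching, but the commodity point stands either way: the caller stops caring what's behind the dispatch.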
Me: "How many r's in strawberry?"
Jimmy: There are 2 r's in "strawberry".
Generated in 0.001s • 17,825 tok/s
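For the record, the fast answer is also the wrong one; "strawberry" has three r's, which a one-liner confirms:

    >>> "strawberry".count("r")
    3

17,825 tok/s of a wrong answer is still a wrong answer.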
The question is not about how fast it is. The real questions are:

1. How is this worth it over diffusion LLMs? (No mention of diffusion LLMs at all in this thread. This also assumes that diffusion LLMs will get faster.)

2. Will Taalas also work with reasoning models, especially those beyond 100B parameters, and will the output still be correct?

3. How long will it take for newer models to be turned into silicon? (This industry moves faster than Taalas.)

4. How does this work when one needs to fine-tune the model but still wants the speed advantages?

I don't get these posts about ChatJimmy's intelligence. It's a heavily quantized Llama 3, using a custom quantization scheme (a rough sketch of the general idea is below), because Llama 3 was state of the art when they started. They claim they can update quickly (so I wonder why they didn't wait a few more months tbh and fab a newer model). Llama 3 wasn't very smart, but so what; a lot of LLM use cases don't need smart, they need fast and cheap.
Apparently they can also run DeepSeek R1, and they have benchmarks for that. New models only require a couple of new masks, so they're flexible.
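Taalas hasn't published the actual scheme, so here's just generic symmetric int4 weight quantization to show what "heavily quantized" means, nothing more:

    import numpy as np

    def quantize_int4(w: np.ndarray):
        """Symmetric per-tensor int4: map floats into [-8, 7]."""
        scale = np.abs(w).max() / 7.0           # one scale per tensor
        q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_int4(w)
    print(np.abs(w - dequantize(q, s)).max())   # worst-case rounding error

The interesting part in their case is presumably that the quantized weights are baked into the die as fixed constants, so there's nothing to load at all. But again: the sketch above is the textbook version, not their custom scheme.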
Jimmy replied with “2022 and 2023 openings:”
0_0
I can produce total gibberish even faster; that doesn't mean I'd produce Einstein-level thought if I slowed down.
It isn't about model capability - it's about inference hardware. Same smarts, faster.