by BatteryMountain 10 hours ago

by tomtom1337 8 hours ago

This is literally what Taalas has done with chatjimmy.ai.

Try it: it's Llama 3.1 8B at 16,000 tokens per second.

chatjimmy.ai https://taalas.com/the-path-to-ubiquitous-ai/

by jupr 4 hours ago

Wow, that's incredibly fast. I like this outcome more than centralized datacenters.

by mr_toad 4 hours ago

But it can only run that model, so it will be outdated in a few years at best.

by rusk 7 hours ago

There are lots of things you can do in hardware that could be done in software, but at a cost. FPGAs should have solved this long ago, but apparently the guys who own the IP want to make it as hard as possible to use …
