But practically, AI inference requires substantial local computing resources. It's not some web app; it needs an order of magnitude more compute.
Hopefully now you understand why people want smaller models.
Not really. I run a production service on a basic server using these Gemma models, and the server is weaker than my MacBook. Most people's laptops and even phones can actually run local models; most people simply don't know how. Run Unsloth Studio and you'll see how easy it is.

As the sibling comment says, this is why people want smaller but still performant models.
