The benchmark game is wholly gamed, but the proof is in the pudding. I know people using Anthropic, OpenAI, and Gemini, and others running Chinese models locally. But who uses Grok for anything but porn? Whatever the benchmarks might say, Grok is just trash in practice. They spent too much time teaching it to be edgy and not enough time teaching it to code.