Kimi and GLM models have coined a new term: Thinkslop. They run a chain of thought up to 10x longer than other models', and it seems that, through a lookback mechanism, they can use the CoT to reason their way to solutions for tasks they couldn't otherwise solve.

The downside, of course, is that they consume many more tokens from your plan and are significantly slower. Kimi K2.7 takes about 7x longer to finish the same benchmark tasks as DeepSeek V4 Pro on my router benchmarks (https://role-model.dev/).

So for now I'm happy with just two models: GPT and DeepSeek.

by PhilippGille 10 hours ago

> Kimi and GLM models have coined a new term: Thinkslop.
> [...]
> So for now I'm happy with just two models: GPT and DeepSeek.

1. DeepSeek V3.2, V4 Flash, or V4 Pro? At high or max thinking? When recommending a model, always name a precise model, not just an AI lab.

2. DeepSeek V4 Flash at max thinking is the most verbose model (among top models) in the AA benchmarks. See the "Intelligence Index Token Use" chart: [1]

[1]: https://artificialanalysis.ai/models?models=gpt-5-5-high%2Cg...

by try-working 6 hours ago

I specifically said V4 Pro. Flash is not the most verbose; that's more likely to be Kimi.

by guybedo 15 hours ago

Yeah, Kimi K2.7 was doing OK but was painfully slow. The coding plan limits were good though.

I haven't tried DeepSeek yet; I should check it out.

by try-working 13 hours ago

After the release of K2.7, the Kimi plan quotas have been reduced by about 80%.

by spwa4 13 hours ago

Turning up the thinking lever (max time spent thinking) really changes model performance, even for tiny models. But it's irritating because it adds a lot of time.

by jubilanti 18 hours ago

> The model is good, the plan is a scam

If it needs to generate that many tokens to do the same tasks, then it probably has higher inference costs. So (for you) the model is bad, the plan is the same plan.
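To put rough numbers on that, here is a minimal sketch of how a fixed token quota shrinks in terms of finished tasks as a model gets more verbose. The quota and per-task token counts are hypothetical, picked only to show the ratio; the 10x factor is the "up to 10x longer" chain-of-thought figure from upthread, applied to the whole task as a simplification.

```python
def tasks_per_quota(quota_tokens: int, tokens_per_task: int) -> int:
    """Number of whole tasks a fixed token quota covers."""
    if tokens_per_task <= 0:
        raise ValueError("tokens_per_task must be positive")
    return quota_tokens // tokens_per_task


# Hypothetical numbers, not taken from any provider's plan.
QUOTA = 10_000_000
BASELINE_TOKENS_PER_TASK = 20_000
VERBOSITY = 10  # assumption: total tokens scale with the "up to 10x" CoT length

baseline = tasks_per_quota(QUOTA, BASELINE_TOKENS_PER_TASK)
verbose = tasks_per_quota(QUOTA, BASELINE_TOKENS_PER_TASK * VERBOSITY)
print(baseline, verbose)
```

Under those assumptions the same plan covers a tenth as many tasks, even if the quota itself never changes.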

by thefourthchime 3 hours ago

I gave it my standard:

"Make a pac-man game in a single html page"

It went off and argued with itself for 20 minutes about how to lay out the map and then timed out.

by anatoliikmt 17 hours ago

What kind of tasks have you been using it for?