The H200 is a more powerful H100, and the H100 is far from obsolete: for example, it is what Musk's Colossus-1 data center, currently being rented to Anthropic, uses.

The only difference between using a slower chip such as H100 (or Huawei's Ascend 750) vs NVIDIA's newer Blackwell chips (B200 etc) is that you need more of the slower chips to achieve the same total FLOPs in your cluster. It has zero effect on what models you can run on it.
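To put rough numbers on the "you need more of the slower chips" point, here is a minimal sketch. The throughput figures are made-up placeholders in arbitrary units, not real specs for any chip, and `chips_needed` is a hypothetical helper. It only counts peak FLOPs and deliberately ignores interconnect, memory and power.

```python
import math

def chips_needed(target_flops: float, per_chip_flops: float) -> int:
    """Smallest number of chips whose combined peak throughput reaches the target."""
    return math.ceil(target_flops / per_chip_flops)

# Hypothetical throughput figures in arbitrary units (assumptions, not real chip specs).
FAST_CHIP = 2.0
SLOW_CHIP = 1.0
CLUSTER_TARGET = 1000.0

fast_count = chips_needed(CLUSTER_TARGET, FAST_CHIP)
slow_count = chips_needed(CLUSTER_TARGET, SLOW_CHIP)
print(fast_count, slow_count)
```

Under these placeholder numbers, halving per-chip throughput doubles the chip count for the same cluster total, which is the trade-off the comment describes.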

by zrn9001 hours ago

It's hard to understand: do you think that having to use chips that are 20% less performant would lock China out of anything? Are you not aware that, with the low costs they have, they can just stack ten times as many datacenters or more and run workloads in parallel to make up for that performance difference, even if there were actually one that large?

by dgellow 13 hours ago

No? They are actively in the race. What are you talking about?

by solenoid0937 13 hours ago

By "the race" I mean "the frontier, and the race to superintelligence." They are categorically behind. The best they can do with the capacity they have is to distill US models, but that doesn't enable them to reach the scale needed to leapfrog the US in the race to superintelligence.

by nl 12 hours ago

It isn't distillation that gave GLM 5.2 its jump in performance.

To quote Pat Toulme:

There’s a big misconception about how GLM 5.2 was trained. Yes, they distilled Claude and GPT 5.5 — but distillation is not how they matched Opus quality. Distillation only fixed the cold start problem in RL.

RLing an agentic coding model isn’t rocket science. In simplified terms:

1. RL needs trajectories — rollouts where the model actually completed a task in some env

2. No successful trajectory on a task = zero gradient = you can’t RL it. This is the cold start problem

3. Distillation solves it. You seed your model with knowledge from a smarter one (Claude, GPT) on tasks it can’t do yet

4. Now it produces positive trajectories on those tasks

5. RL on those trajectories and hill climb agentic coding

6. At that point you no longer need to distill and can solely hill climb RL to better models

This is an interesting curve. I’d argue it’s harder to get to Opus 4.8 from scratch than to go from Opus 4.8 → Fable/Mythos tier.

GLM 5.2 is already producing positive trajectories, so they have plenty to RL on — they’ll keep climbing to Mythos quality without distilling any further. They no longer need American models.

https://x.com/PatrickToulme/status/2069211575437627743
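Point 2 of the quoted list (no successful trajectory means zero gradient) can be illustrated with a toy sketch. It assumes binary task rewards and a group-mean baseline, which is a common policy-gradient setup but is my assumption and not something the quote specifies; `advantages` is a hypothetical helper, not anyone's actual training code.

```python
from statistics import mean

def advantages(rewards):
    """Reward minus the group-mean baseline for each rollout (assumed setup)."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]

# Cold start: every rollout on the task fails, so every reward is 0.
cold = advantages([0.0, 0.0, 0.0, 0.0])

# After seeding: one rollout succeeds, so the rewards now differ.
warm = advantages([0.0, 1.0, 0.0, 0.0])

print(cold)
print(warm)
```

In a policy-gradient update each rollout's log-probability gradient is weighted by its advantage, so an all-zero `cold` list contributes no learning signal, while `warm` does. The same holds without a baseline, since a reward of 0 weights the gradient by 0.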

I'm not exactly sure what the finish line in "the race to superintelligence" looks like, and it's even less clear why you think getting there first is a critical benefit.

by aspenmartin 7 hours ago

Yes, but in a steady state, compute and data advantages are, to first order, all you need. China does not yet have a compute advantage. RL is indeed the magic sauce for coding agents, but the bottleneck on how much progress you can make, for both the US and China, is compute. The US, at least for the next few years, has a clear advantage here.

by dgellow 6 hours ago

So, China is in the race, just not leading yet.

by aspenmartin 5 hours ago

Exactly -- the hope behind US strategy is that you can slow them down a lot, though not forever, and that slowing them down is in itself enough to keep a strategic advantage over them, both in economic growth and in offensive capabilities such as cyber attacks, intelligence, and things like drones.