Interesting, I didn't know about this, but it makes sense after reading the article. They are benchmarking a 20B-parameter model on a single GPU. Does it scale across 60 H100s over NVLink/NVSwitch? I would be interested to see those benchmarks.
Regardless, the idea that everyone is spinning up $2 million worth of GPUs to scan their email inbox, search the web, or avoid learning something still seems ridiculous to me.