points
TPUv5e with 16 tensor cores for 2 days for the 200M param model.
Claude reckons this is 60 hours on a 8xA100 rig, so very accessibile compared to LLMs for smaller labs