And someone has already converted it to ONNX format: https://huggingface.co/eschmidbauer/cohere-transcribe-03-202... - so it can be run on CPU instead of GPU.
This kind of makes sense, because "compiling" (training) the model is prohibitively expensive, and we can still benefit from the artifacts.