I use openrouter.ai as the benchmark because it's the foundational API layer for innovator apps, which are always the quickest to adopt new tech.
But I don't think either is very meaningful when there are actual benchmarks that measure model quality on specific tasks.