This example image was generated using the API on the high reasoning setting, not the low reasoning version (it is slow and takes 2 minutes lol).
The reasoning amount is part of the evaluation, isn't it?