I asked Gemini to format some URLs into an XML format. It got halfway through and gave up. I asked if it truncated the output, and it said yes and then told _me_ to write a python script to do it.
Given that Gemini seems to have frequent availability issues, I wonder if this is a strategy to offload low-hanging fruit (from a human-effort pov) to the user. If it is, I think that's still kinda impressive.
Somehow I like this. I hate that current LLMs act like yes-men, you can't trust them to give unbiased results. If it told me my approach is stupid, and why, I would appreciate it.
I've noticed Gemini pushing back more as well, whereas Claude will just butter me up and happily march on unless I specifically request a critical evaluation.