I asked Gemini to convert a list of URLs into XML. It got halfway through and gave up. When I asked if it had truncated the output, it said yes, and then told _me_ to write a Python script to do it.
On the one hand, it did better than ChatGPT at understanding what I wanted and actually transforming my data.
On the other, truncating my dataset halfway through is nearly as worthless as not doing it at all (and I was working with a single file, maybe hundreds of kilobytes).
Given that Gemini seems to have frequent availability issues, I wonder if this is a strategy to offload low-hanging fruit (from a human-effort pov) to the user. If it is, I think that's still kinda impressive.
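For what it's worth, the script it punted to me is only a few lines. A minimal sketch, assuming the input is a plain text file with one URL per line and the output schema is just `<url>` elements under a `<urls>` root (the file names and element names here are hypothetical, adjust to whatever schema you actually need):

```python
# Sketch of the URL -> XML transform, under the assumptions above:
# one URL per line in urls.txt, output as <urls><url>...</url></urls>.
import xml.etree.ElementTree as ET

def urls_to_xml(in_path: str, out_path: str) -> None:
    root = ET.Element("urls")
    with open(in_path, encoding="utf-8") as f:
        for line in f:
            url = line.strip()
            if url:
                # ElementTree escapes &, <, > in text nodes for us
                ET.SubElement(root, "url").text = url
    tree = ET.ElementTree(root)
    ET.indent(tree)  # pretty-print; Python 3.9+
    tree.write(out_path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    urls_to_xml("urls.txt", "urls.xml")
```

Which is exactly why the truncation is so annoying: the task is trivial to do deterministically, and the whole point of handing it to the model was not having to write this.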
Somehow I like this. I hate that current LLMs act like yes-men; you can't trust them to give unbiased results. If it told me my approach was stupid, and why, I would appreciate it.
I just asked ChatGPT to help me design a house where the walls are made of fleas, and it told me the idea is not going to work and that it also raises ethical concerns.
I tried it with a Gemini personality that uses this kind of attack, and since that kind of prompt strongly encourages it to provide a working answer, it decided that the fleas were a metaphor for botnet clients and the walls were my network, all so it could give an actionable answer.
I've noticed Gemini pushing back more as well, whereas Claude will just butter me up and happily march on unless I specifically request a critical evaluation.