This isn't a good test for any model since LLMs can't math (even though frontier models can sometimes correctly simulate mathing), which is why one would always use a tool for this.

by 6persimmon, 3 hours ago

I was about to try it until I saw this. If it's Siri the Silly, it doesn't even make up for the opportunity cost.

by xp84, 5 hours ago

Real experience I've had:

"Text Carol bring me a glass of water please"

"I'm sorry, I don't see a 'Carol Bring' in your contacts"