This, to me, is the big leap from being good at coding to being good at many other tasks.

Coding can be treated as a low-stakes closed-loop system (retries cost little time or money), whereas most other tasks cannot.

If an agent screws up booking your flight or hotel room, how does it verify this? And even if it does verify it, there is an actual cost to changes and cancellations.

It's similar with agentic e-commerce: there are lots of ways to screw it up, and it seems ripe for fraud and for being picked off by bad actors.

It seems like to make agents safe, we need tentative, reversible transactions. How do you set up a travel plan and then review it? How do you modify it later?

Unfortunately, travel keeps getting less flexible, with worse cancelation policies.
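
To make "tentative, reversible" concrete, here is a rough sketch of what I have in mind, not any real booking API. Every name in it (`TravelPlan`, `Reservation`, `hold`, `confirm_all`) is made up for illustration, and it assumes the provider supports free holds, which is exactly the part that is getting rarer. The agent can only place holds; a human reviews and commits.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    HELD = "held"            # tentative, nothing charged
    CONFIRMED = "confirmed"  # committed, money spent
    CANCELLED = "cancelled"  # released before any charge


@dataclass
class Reservation:
    description: str
    price: float
    state: State = State.HELD


@dataclass
class TravelPlan:
    """Hypothetical plan object: the agent adds holds, a human commits them."""
    items: list[Reservation] = field(default_factory=list)

    def hold(self, description: str, price: float) -> Reservation:
        # Agent-side step: record a tentative item without charging anything.
        item = Reservation(description, price)
        self.items.append(item)
        return item

    def review(self) -> list[str]:
        # Human-readable summary of every item and its current state.
        return [f"{r.description}: ${r.price:.2f} [{r.state.value}]" for r in self.items]

    def cancel(self, item: Reservation) -> None:
        # Reversal is free only while the item is still a hold.
        if item.state is not State.HELD:
            raise ValueError("only held items can be cancelled at no cost")
        item.state = State.CANCELLED

    def confirm_all(self) -> float:
        # Human-side step: commit remaining holds and return the total charged.
        total = 0.0
        for r in self.items:
            if r.state is State.HELD:
                r.state = State.CONFIRMED
                total += r.price
        return total


# Usage: the agent proposes, the human drops one item, then commits the rest.
plan = TravelPlan()
flight = plan.hold("Flight (hypothetical)", 320.0)
hotel = plan.hold("Hotel, 2 nights (hypothetical)", 410.0)
plan.cancel(hotel)
charged = plan.confirm_all()
```

Modifying the plan later is the hard part: once an item is confirmed, `cancel` refuses, which is where the real-world change and cancellation fees come in.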

To reply to myself here:

I can STILL replicate this behavior in Google AI summaries 10% of the time:

"is <SOMEPLANT> ok for cats"

to which it replies: "Yes, <SOMEPLANT LONG SCIENTIFIC NAME VERBOSE PHRASING> is toxic for cats"

The other one going around this weekend: "how long hot dogs on grill"

Summary: "The hot dogs on your grill are likely around 5-6 inches long .. "

So scale this category of error to unsupervised agents with access to your credit card.
