Agents run tools in a loop.

Yes, you can give the Big Brain Thing a vague task and expect results; sometimes it'll do it right.

But if you want repeatability, give it tools to determine what is a "disruption".

I have this exact system running on a WhateverClaw with a few simple tools: weather, train timetables (I commute by train), a read-only view of my "will I be at the office or remote today" calendar, and a tool to notify me via Telegram.

It gets this information using the given tools and determines if it's worth notifying me.

TBH this doesn't need a "claw"; I could just run the tools from cron, construct a prompt with that data, and run any LLM on it.
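A minimal sketch of that cron-driven version. The tool functions here are hypothetical stand-ins (the real ones would hit a weather API, a train timetable API, and a calendar); the point is just that cron gathers the data and the LLM only does the judgment call:

```python
import json

# Hypothetical stand-ins for the real tools; each would fetch live data.
def get_weather():
    return {"condition": "snow", "temp_c": -4}

def get_train_departures():
    return [{"line": "R", "scheduled": "07:42", "delay_min": 15}]

def get_calendar_location():
    return "office"  # or "remote"

def build_prompt():
    """Compose the tool output into a single prompt for any LLM."""
    data = {
        "weather": get_weather(),
        "trains": get_train_departures(),
        "today": get_calendar_location(),
    }
    return (
        "Given this morning's data, decide whether my commute is disrupted "
        "enough to warrant a notification. Reply YES or NO plus one sentence.\n"
        + json.dumps(data, indent=2)
    )

if __name__ == "__main__":
    prompt = build_prompt()
    print(prompt)
    # cron runs this script; send `prompt` to whichever model you like,
    # then forward the answer via Telegram only if it says YES
```

A crontab line like `0 8 * * 1-5 /usr/bin/python3 commute_check.py` would cover the "every weekday morning" part with no agent loop at all.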

reply
This is because for some reason all agentic systems think that slapping cron on it is enough, but that completely ignores decades of knowledge about prospective memory. Take a look at https://theredbeard.io/blog/the-missing-memory-type/ for a write-up on exactly that.
reply
“A programmer is going to the store and his wife tells him to buy a gallon of milk, and if there are eggs, buy a dozen. So the programmer goes shopping, does as she says, and returns home to show his wife what he bought. But she gets angry and asks, ‘Why’d you buy 13 gallons of milk?’ The programmer replies, ‘There were eggs!’”

You need to write a clearer prompt.

reply
"I need to fly to NY next weekend, make the necessary arrangements."

Your AI assistant orders an experimental jetpack from a random startup lab. Would you have honestly guessed that the prompt was "ambiguous" before you knew how the AI was going to act on it?

reply
Did GP edit their comment? Or did you read the prompt they used somewhere else?
reply
Why not set up your own evals with something like pi-mono for that? https://github.com/badlogic/pi-mono/

You'll define exactly what good looks like.

reply
Me too. It doesn't have the ability to alert only on true positives; it also alerts on true negatives. So dumb.
reply
This doesn't seem too hard to solve, except for the ever-recurring LLM output validation problem. If true positives are rare, you don't know if the earthquake alert system works until there's an earthquake.
reply
... just force the data into a structured format, then use "hard code" on the structure.

"Generate the following JSON formatted object array representing the interruptions in my daily traffic. If no results, emit []. Send this at 8am every morning. {some schema}. Then run jsonreporter.py"

Then just let jsonreporter.py discriminate however it likes. Keep the LLMs doing what they are good at, and keep hard code doing what it's good at.
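What `jsonreporter.py` might look like, as a sketch. The key names (`line`, `kind`, `delay_min`) and the 10-minute threshold are assumptions standing in for "{some schema}": the hard code validates the shape, rejects anything malformed, and applies a plain rule the LLM never gets to fudge:

```python
import json

# Hypothetical schema: the keys we told the model to emit per disruption.
REQUIRED_KEYS = {"line", "kind", "delay_min"}

def parse_report(raw: str):
    """Validate the LLM's output; reject anything that isn't the agreed shape."""
    items = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    for item in items:
        if not isinstance(item, dict) or not REQUIRED_KEYS <= item.keys():
            raise ValueError(f"bad item: {item!r}")
    return items

def should_notify(items) -> bool:
    """Plain hard-coded policy: only real delays trigger an alert."""
    return any(item["delay_min"] >= 10 for item in items)

if __name__ == "__main__":
    import sys
    items = parse_report(sys.stdin.read())
    # An empty array from the model means a quiet morning: no alert.
    if should_notify(items):
        print("ALERT:", json.dumps(items))
```

If the model emits junk, `parse_report` raises instead of silently alerting (or silently staying quiet), which is exactly the separation of duties the comment argues for.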

reply