by cjbarber 13 hours ago

by troupo 13 hours ago

> What are you using today? In my experience LLMs are already pretty good at this.

LLMs are good at "find me a two week vacation two months from now"?

Or at "do my taxes"?

> how to use Cowork.

Yes, and I taught my mom how to use Apple Books, and have to re-teach her every time Apple breaks the interface.

Ask your non-tech friends what they do with and how they feel about Cowork in a few weeks.

> I think where our view differs is I expect that models will be able to get good at making custom interfaces, and then help the user personalize it to their tasks.

How many users do you see personalizing anything to their tasks? Why would they want every app to be personalized? There's insane value in consistency across apps and interfaces. How will apps personalize their UIs to every user? By collecting even more copious amounts of user data?

by baq 13 hours ago

> Or at "do my taxes"?

codex did my taxes this year (well it actually implemented a normalization pipeline and a tax computing engine which then did the taxes, but close enough)

by William_BB 12 hours ago

> well it actually implemented a normalization pipeline and a tax computing engine which then did the taxes, but close enough

You can't seriously believe laymen will try to implement their own tax calculators.

by baq 12 hours ago

of course not.

what I believe is that laymen will put all their tax docs into codex and tell it to 'do their taxes' and the tool will decide to implement the calculator, do the taxes and present only the final numbers. the layman won't even know there was a calculator implemented.

by William_BB 12 hours ago

Yeah, good luck trusting the output!

by baq 12 hours ago

check back in a couple of years!

by William_BB 11 hours ago

Ah right! Reminds me of AGI by 2025 :D

by tsimionescu 12 hours ago

If your prompt was more complex than "do my taxes", then this is irrelevant.

by baq 12 hours ago

it was many hours of working with codex, guidance and comparing to known-good outputs from previous years, but a sufficiently smart model would be able to just do it without any steering; it'd still take hours, but my input wouldn't be necessary. a harness for getting this done probably exists today, gastown perhaps or something that the frontier labs are sitting on.

by troupo 11 hours ago

> but a sufficiently smart model would be able to just do it without any steering;

Yeah, yeah, we've heard "our models will be doing everything" for close to three years now.

> a harness for getting this done probably exists today, gastown perhaps

That got a chuckle and a facepalm out of me. I would at least consider you half-serious if you said "openclaw", at least those people pretend to be attempting to automate their lives through LLMs (with zero tangible results, and with zero results available to non-tech people).

by ravenstine 11 hours ago

Sounds fascinating! If you wrote an article on this I bet it'd have a good shot at making it to the home page of HN.

by jeffgreco 11 hours ago

> LLMs are good at "find me a two week vacation two months from now"?

Yes?

===

edit: Just tested it with that exact prompt on Claude. It asked me who I was traveling with, what type of trip and budget (with multiple choice buttons) and gave me a detailed itinerary with links to buy the flights ( https://www.kayak.com/flights/ORD-LIS/2026-06-13/OPO-ORD/202... )

by troupo 9 hours ago

I'd love to try and replicate, but I'm not letting any of these tools anywhere near a real browser and capabilities :)
