by rvnx 23 hours ago

by jazzyjackson 22 hours ago

I’ll explain it: these tools are non-deterministic and people have different experiences with them. For a few people every interaction is totally fumbled and they think the cheerleaders of gen AI must be lying; for others the chatbot hits one home run after another and lets them add microcontrollers to their CAN bus. When these people’s good luck runs out and they start getting mixed results like the average user, they assert the service must have been downgraded.

by triMichael 22 hours ago

I'll add to that: you are more likely to have a good experience if it has a lot of relevant data that it was trained on. You are also more likely to have a good experience if errors don't cause major issues.

So one-shotting a game of Snake should be great (tons of training data, errors are easily caught because it's a small program). The same goes for building a lot of web UI front ends, or one-shotting a personal project. On the other hand, I haven't been convinced that it's good enough to maintain large codebases or assist with niche topics that are not very well documented.

by thewebguyd 21 hours ago

> if it has a lot of relevant data that it was trained on

This became evident to me the moment I tried to have these models work on some PowerShell tasks for me. Even Opus today struggles with PowerShell.

Since anything in PS is probably some internal sysadmin tool, there's not much public code out there outside of Microsoft's documentation. Plus the Verb-Noun naming scheme makes it really easy to just hallucinate cmdlets (which it does, often). It's easier to have the LLM just do things in Python using the M365 Graph API than with any of the provided PowerShell cmdlets.

OTOH, I've been using Claude for a lot of Swift & SwiftUI work lately and it has no problems there, and I'd imagine there's even less publicly available training data for that. So to be honest, I'm not entirely sure why it fails so badly at PowerShell.

by picofarad 13 hours ago

I have deepseek or grok write bash-likes in pwsh often enough to wonder what sort of things you're doing in pwsh...

I use it to wrap ping.exe with colors and fewer columns, for example. yt-dlp wrapper to fetch 480p bestaudio with English subtitles, no playlist, works on a surprising number of video sites.

It does make cmdlets up, you're right there.

by lowbloodsugar 21 hours ago

> On the other hand, I haven't been convinced that it's good enough to maintain large codebases or assist with niche topics that are not very well documented.

Same is true of humans. So far my experience is that addressing the issue with the help of AI (i.e. comprehending the system and creating the documentation) is faster than doing it without.

by cauch 19 hours ago

I don't understand comments of the "same is true of humans" kind.

This feels a bit like whataboutism.

It also feels like people don't listen to each other.

For example, reading the previous comment, it feels like the thing that reduced the enthusiasm was that at first GenAI looked like it was "reading, understanding and using its own knowledge to answer the problem", but as soon as the situation is more niche or more complex, GenAI looks like it "does not understand the code, just does the equivalent of a StackOverflow search and tries to apply the solutions that it found there, and this is why it felt like it understood the code before".

It does not at all mean that GenAI is not terribly useful. It is even better than humans in some situations.

But it feels like answering "same with humans" misses this point: it's the opposite. Humans usually try to understand the code and are bad at covering a very large range of very well documented subjects. That's the "uncanny valley" they talk about: they assumed GenAI performance on a subject X is due to a "human-like" approach, and it feels very strange when this impression falls apart.

by lowbloodsugar 13 hours ago

No, I mean I’m in the camp that believes AI and the human brain are analogous and work the same way. Someone once replied, “then why do I need to supervise them?” and I pointed out that there are people whose job is literally “supervisor”.

by cauch 33 minutes ago

I don't think that is what the parent comment you are answering means.

The comment you are answering says that their experience is that AI and the human brain are not analogous: AI is good at storing a large amount of knowledge and repeating it (or extrapolating from patterns in that knowledge), but bad at understanding the code as a human does. That explains why a human is more efficient when dealing with something that doesn't have a lot of documentation (on which the AI built its knowledge).

Humans are bad at storing large amounts of knowledge, and this is why we need supervisors for humans.

AI is bad at understanding new stuff: it needs to be able to connect the new stuff with a lot of examples it has been trained on (this does not mean the stuff is "identical", but it means "connected"), and this is why we need supervisors for AI.

We need supervisors for both humans and AI, but for different, uncorrelated reasons.

by dyauspitr 22 hours ago

I still don’t get it. I can dictate a prompt, and sometimes I do it so quickly the text looks like a drunken parrot dictated it, and it still always gets exactly what I’m asking for. I’m just going to attribute malice to the naysayers.

by bonoboTP 22 hours ago

Some people are really bad at specifying what they want to ask for. Or they already start prompting with the attitude that it can't possibly work so they don't even really try, or stop at the first failure to point and say how bad it is.

by thewebguyd 21 hours ago

People are really, really bad at specifying what they actually want. I've worked in IT for my whole career, starting in help desk (now an IT manager). My days on the service desk were enough proof that people have no idea what they actually want, or at least, they really struggle to put it into words.

It's the famous "email broken, fix pls" but in the form of an LLM prompt.

by bonoboTP 19 hours ago

Well, today's multimodal LLM agents with tools would at least have a good chance of doing something with even such an underspecified query. Because fixing things is simpler to specify, the agent could look at config, network settings, send a test email, take a screenshot etc. and get a good idea of what's broken. But when you want some new feature or new app, you can't do without actually asking for specifics, or at least you shouldn't complain if it didn't read your mind correctly. Or at least accept that you have to iterate. I think many average people can get this if they are motivated, and they can incrementally say what they don't like even in vague terms and it can get better. But some just stop without trying to ask for changes.

It can be frustrating to observe people interacting with these things. But it was just as frustrating 20 years ago, so maybe it's just a constant.

by rvnx 20 hours ago

Similarly, doing service desk, the thing that makes me flip the table is how people start by explaining what does not work, instead of explaining what they are trying to do.

by bonoboTP 19 hours ago

It's hard even at the highest levels, such as in writing scientific papers or giving scientific conference talks. People just generally have a hard time stepping outside of their context and thinking with the head of someone who has a different set of facts and assumptions in their context. It's hard to know how much context you both share, and how to tailor the explanation so that you don't start from Adam and Eve but still explain just enough context and strip irrelevant tangents.

I don't think this is just about intention and willingness, it's just simply hard.

by skydhash 20 hours ago

Or maybe people see how complex the code is and all the failure points, and don’t feel it’s ethical to use the output. In most of the comments, the most relevant point is that the poster is not an expert in the domain they got help with. While they can observe the result, they don’t have a causal model of the situation.

by camel_gopher 22 hours ago

It’s a probabilistic parrot

by foobarbecue 20 hours ago

What's the difference (stochastic vs probabilistic)?

Or... were you illustrating?

by amelius 19 hours ago

I still would like to hear a public apology from the stochastic parrot crowd for their deceptive framing. Or maybe it was just incompetence.

by trumpdong 10 hours ago

"everyone who doesn't share my opinion is deceptive or maybe incompetent"
