by dkersten11 hours ago |

by coldtea10 hours ago|

>I would love to have a product sheet showing what each model's strengths and weaknesses are, so that I can have a clear decision tree of "if this kind of work, use model X", or "model Y should be used in ways Z". But they all look the same from the outside and the only way to figure out which might be marginally better at what is to do extensive, time-consuming, and perhaps expensive testing.

Think of it less like a static tool, and more like a human helper, where the same holds.

by mahidhar6 hours ago|

Well, unlike a human, I cannot expect any of these LLMs to take any ownership of the work they do. I cannot expect any given model and version (Sonnet 4.6) to learn, improve and adapt over time. I cannot expect its limitations to ever go away at the model level. So it is not like a human in most ways that I actually care about.

That said, I can't wait for LLMs to stop being AI and start being just another tool. Anything cursed with the "AI" label seems to go through this mess. In the earlier AI cycles, rules engines were considered "human-ish" and got hyped up, but today we see them as just another tool available to us, and we're better off for it.

by squidbeak2 hours ago|

You're on the hook for their work in the way a manager is for their staff's output. The insistence on AI being a mere tool very often comes with this strange desire to be free of responsibility for its work. People seem to forget that the big advantage of these things is the range they have for obscure insight and creative solutions, both impossible with determinism.

by themgt3 hours ago|

>That said, I can't wait for LLMs to stop being AI and start being just another tool.

From a horse's perspective, the internal combustion engine is just another tool for making scary noises and powering horse trailers to take me on fun horse adventures. So ... perhaps.

by kolinko4 hours ago|

Models don’t improve, but the harnesses/tools/rules around them grow with the project.

by ACCount379 hours ago|

One issue with that is that human helpers last longer. LLMs cycle in and out in months, and what held for Your Favorite LLM 6.7 may not hold for Your Favorite LLM 6.9.

by renegade-otter7 hours ago|

Right, this is why I would slam the brakes on investing all of your time and effort into your workflow, because 2 months from now it may be out the window. Frontier models are also constantly being tweaked, so what worked yesterday may be off today.

ChatGPT was obedient with the grill-me technique, just wrote a plan. Yesterday it started jumping to implementation. Why?

by HappySweeney6 hours ago|

I find that when an LLM jumps into tasks it was not told to do (or, even worse, does things it was explicitly told not to), it is a good sign the context is too full, and you should do a controlled hand-off to a new instance.

by renegade-otter5 hours ago|

I wipe my context relentlessly. I never have long-running conversations. In and out like SEAL Team Six.

by madeofpalk9 hours ago|

Except that every model and version is like a different person, and you need to relearn their idiosyncrasies every other month.

It's a very, very bizarre way to use a computer.

Personally, I just don't. I'll use and prompt the LLMs the way that feels natural to me and move on with my life. Maybe I don't always get completely optimal results from them, but I'm also not spending half my day pleading with the computer to do a task.

by user439287 hours ago|

I also don't think I need to prompt Claude differently than Codex.

The most important thing to be aware of in my opinion would be that Claude is better at UI design, and leaves a lot more comments in the code.

Other than that the results seem similar, at least functionally. I do not usually review the code style.

by cassianoleal8 hours ago|

They are not human. Humans have names, faces, voices, personality, a personal history, family, care for whatever they call their community.

With humans it's actually good and worthwhile to create and strengthen connections. With an LLM, that's psychosis.

by tekne8 hours ago|

To be fair: a voice, personality, and personal history sounds a lot like training data.

I don't think LLMs are people in any sense, at least as they're constructed now -- but they very much have what we would call "culture" and "personality" in suitably alien forms.

This is not the same as, e.g., feelings, experience, or humanity, or actual opinions or ideas (versus essentially "distilled vibes") and I feel that AI will more and more force us to confront that (including if new AIs are ever developed that may have the latter, as well!)

by epicepicurean5 hours ago|

They are not human, but it helps to prompt them similarly. See: https://www.anthropic.com/research/emotion-concepts-function

by anthonyrstevens4 hours ago|

Good read. Thanks for sharing.

by Wowfunhappy7 hours ago|

They're not human. But they are trained on human language, and thinking of them as similar to a human helps me work with them effectively.

by malwrar8 hours ago|

These things passing the Turing Test makes anthropomorphizing their behavior awkward, but don’t forget it’s just an analogy to communicate an experience. If you convey a certain written voice to these models in your input, you get a somewhat consistent end effect. I think that’s all that is being communicated.

by scotty798 hours ago|

If you have a toolbox full of similar but different tools, getting to know them is a prudent thing to do, not a psychosis. There's no connection because the tool is immutable (except for adjustments you made) but you do develop a specific relation with that tool. Some people even love some of their tools at some level.

And if humans are anything, they are tool users.

by coldtea7 hours ago|

>If you have a toolbox full of similar but different tools, getting to know them is a prudent thing to do, not a psychosis

Can be both. Use of some tools like LLMs might be more psychosis-inducing than use of others like plain compilers or hammers.

>And if humans are anything, they are tool users.

To the point of self-destruction sometimes.

by scotty796 hours ago|

> Use of some tools like LLMs might be more psychosis-inducing than use of others like plain compilers or hammers.

I really don't get it. Why is the fact that it outputs words so goddamn important for everybody? How does it suddenly make you so emotionally vulnerable? Does my brain work in a different way than the rest of humanity? Can't you disregard what's irrelevant? Is every programmer suddenly a Trump supporter that has no ability to recognize empty words? To recognize lies about emotions and facts?

Words are just input. Mostly garbage. Emotion-inducing words are garbage 10 times more often than any other. I could expect a romance reader to be affected, or somebody with an IQ of 70. But how is the caste of some of the most technical people ever afraid of catching psychosis just because they might read some words?

by chadgpt35 hours ago|

It's a certain percentage of people, and yes, it's different for them because it outputs words and triggers some kind of emotional trust response.

by scotty794 hours ago|

As good an opportunity as any to acquire some emotional intelligence.

by j-bos8 hours ago|

Yeah, AI tools bring software developers closer to the messy real world where 0 and 1 aren't always exactly 0 and 1.

by skydhash4 hours ago|

Computing is useful exactly for getting away from the messy real world of humans. I don’t need random errors in my financial transactions. I don’t want random errors when doctors are retrieving my medical history. And I don’t want random errors in my backups. There are plenty of non-deterministic things in my life; I don’t want my computer to follow suit.

by gib4449 hours ago|

No, I won't anthropomorphise LLMs.

by coldtea7 hours ago|

If there was anything that made sense to anthropomorphise it would be a machine meant to mimic talking, thinking and answering like a human, one that even passes the Turing test.

When we built the idea that anthropomorphising is wrong, we meant doing it for rocks or trees or thunder or deer or some such.

by TeMPOraL3 hours ago|

That's your prerogative, but be aware you'll continue to remain confused about LLMs. Anthropomorphizing them is what gives you the best high-level intuition about where and how to employ them, and where and how not to.

by yeer27 hours ago|

This is so dumb and goes against all the principles that enabled computers and smartphones to achieve wide adoption - the technology should evolve to fit the human. Not the other way around.

by duckmysick6 hours ago|

I'd argue the opposite. Technology in the past few decades was (is) limited and humans had to adapt to it.

We communicate with other humans using voice and three-dimensional hand gestures. To use computers and early phones we had to learn to operate new input devices: keyboards and mice. Later, with touchscreens, we moved to two-dimensional hand (finger) gestures. Only recently have we barely started making voice commands work with our devices.

Then, a large number of humans are figuratively tethered to their desks because the devices need power and a stable internet connection. Mobile devices break this relationship a bit, but you still need to charge them and be close to some sort of access point. In any case, the devices encourage sitting in one place for hours at a time.

And this is just computers and smartphones. Humans adapted their entire lifestyles and transformed the landscape to cater to cars.

by skydhash4 hours ago|

> Technology in the past few decades was (is) limited and humans had to adapt to it.

Was it? Think first about what it replaced. Lots of manual computation in the bookkeeping and financial sectors. Telegrams and snail mail moved to email. Typesetting for books and magazines became easier and widely available.

If there’s one thing that you can’t say about computers, it’s that they’re limited.

by duckmysick4 hours ago|

No doubt that computers enabled a lot of automation. We can both agree with that.

The context was that technology should evolve to fit the humans [not the other way around]. And if contemporary technology didn't have limitations, it would be correct.

But it did and humans had to adapt to the computers. Humans had to develop and learn special languages so they could communicate with computers to do all those useful things you mentioned. Why? They were limited in understanding (or parsing) human languages. It took us decades before we could talk to computers in human languages. We're getting pretty close - especially in the past few years - but there's still some friction.

by skydhash3 hours ago|

> Humans had to develop and learn special languages so they could communicate with computers to do all those useful things you mentioned. Why? They were limited in understanding (or parsing) human languages

You may need to revisit your computation theory courses. Computers are the embodiment of a mathematical model and thus the inputs and outputs are formalized.

Do you just hold a pen and words are written automatically? Do you just hover your hands over a piano and have the moonlight sonata played? No, you have to do precise mechanical movements because that’s how the output is realized.

There are no such things as words, sentences, keywords, or statements at the computer level. What it does is symbol manipulation. You provide it a string of symbols and the rules for the manipulation, and it will provide a string of symbols as the output.

Which symbols and which rules are completely arbitrary. We just found that {1,0} are all that we needed as the set of symbols and that Context-Free Grammar is perfect for specifying the rules.

We still need to encode everything down to binary (ascii, unicode, bcd, floating points, pixel formats, PCM,…) and use a programming language (as defined by a grammar) to get the computer to do anything. Inference is made possible by those two mechanisms. It’s not a new computation model.

by Wowfunhappy7 hours ago|

I mean, like, you can lament the state of the world all you want. It is what it is. Of course the AI labs would also like to make their models more consistent, but it's not how the technology works. They're black boxes to everybody.

by dreambuffer8 hours ago|

Please do not think of LLMs like human helpers; that is a recipe for long-term sociopathy.

by egwor4 hours ago|

Maybe this is similar to web search too. We know how to get Google to return the results we want, and when we use other tools like Bing we get other behaviour.

by dotancohen10 hours ago|

Honestly, the differences between AI models always felt to me like the differences between coworkers or job candidates. They don't all share the same strengths and weaknesses - and they all have both good days and bad days.

Realising this made me take the "I" in "AI" a bit more seriously.

by amelius10 hours ago|

Yes, but benchmarks can be gamed.

Maybe we need better reviewers then?

by yunohn6 hours ago|

> a product sheet showing what each model's strengths and weaknesses are

This presumes that the labs themselves know how well their models perform. But all they have are overtuned benchmarks and hype vibes.

by couscouspie10 hours ago|

That would be ideal, but AI is less like a tool and more like a human in this regard, and you don't have character sheets for each of your colleagues either.

by supergarfield9 hours ago|

If my coworker was part of a clone series of 100 million units, requesting a character sheet would be pretty reasonable.

by bluegatty9 hours ago|

These are $1 trillion companies that can't produce explicit details on how their products work? It's nonsense.

by sixothree3 hours ago|

I think if they could explain how they work, their strengths and weaknesses, they would reveal to the world whose data they've been appropriating.

by bluegatty3 hours ago|

That's another thing altogether. They can characterize the behaviour without quite giving up whose data it is and where it comes from.

Admittedly, yes, there's some overlap there.

They would have to admit 'seen it in the training data' as a factor, and that opens a can of worms.

by epolanski7 hours ago|

The problem is that this is very hard to replicate and benchmarks focus on E2E tests, going from one prompt to the final solution.

They do not test how models perform when used interactively, like most of us do.
