Didn't understand those either and used the fuck out of them because "the experts" said we should.
Just like the invention of fire happened ages ago, but is still a crucial part of life today.
The mechanism behind engines were fully understood, any experiments with engines were reproducible and measurable. You could get an engine and create schematics by reverse engireening it.
LLMs, useful as they may be, are not that.
I had a specialization in Chemistry in High School. For some analysis, the fist step is to dissolve everything in boiling Nitric Acid. But stainless steel has Chrome is like a spell of protection, so you must use boiling Hydrochloric Acid instead. I have no idea why. It's just like magic. It may have Nickel, Molybdenum, and other metals, that give it more magical properties.
A few years ago there was a nice post about copying a normal steel alloy for knives to get an equivalent made of stainless steel. You need to reduce the the Carbon content to make it less brittle. And they had to add Vanadium so it keeps the sharpness of the knives. I have no idea why. It's just like magic.
If you have half an hour, it's worth reading, but beware that it has too many technical details that are close to magical https://knifesteelnerds.com/2021/03/25/cpm-magnacut/ (HN discussion https://news.ycombinator.com/item?id=29696120 | 375 points | Dec 2021 | 108 comments)
Humans have been using steel for however long, when and where it was understood to be an appropriate solution to a problem. In some sense, engineering is the development and application of that understanding. You do not need to have a molecular explanation of the interaction between carbon and iron to do effective engineering[-1] with steel.[0] Science seeks to explain how and why things are the way they are, and this can inform engineering, but it is not prerequisite.
I think that machine learning as a field has more of an understanding of how LLMs work than your parent post makes out. But I agree with the thrust of that comment because it's obvious that the reckless startups that are pushing LLMs as a solution to everything are not doing effective engineering.
[-1] "effective engineering" -- that's getting results, yes, but only with reasonable efficiency and always with safety being a fundamental consideration throughout
[0] No, I'm not saying that every instance of the use of steel has been effective/efficient/safe.
“”” Humanity has been using celibacy for over a millenia, however it's only in the past 100 years or so we have a good understanding of not having sex affects the psychology of a person, turning them into an ubermensch. Based on this argument, we should have never stopped having sex, until we had a complete first principles understanding. “””
Analogies can produce a lot of words, making it appear to be a high effort comment, but it also shifts the argument to why or why not an analogy is good or not, and away from the points the original poster was trying to make. And, by Sturgeon’s Law, most analogies are utter crap on top of being an already weak way to form an argument.
Humans could understand properties of steel long before they knew how Carbon interacted with Iron. Steel always behaved in a predictable, reproducible way. Empirical experiments with steel usage yielded outputs that could be documented and passed along. You could measure steel for its quality, etc.
The same cannot be said of LLMs. This is not to say they are not useful, this was never the claim of people that point at it's nondeterministic behavior and our lack of understanding of their workings to incorporate them into established processes.
Of course the hype merchants don't really care about any of this. They want to make destructive amounts of money out of it, consequences be damned.
The correct analogy is: if we just scale and improve steel enough, we'll get a flying car.
I strongly suspect, that we will come to a point, where it gets impossible to tell if something is AGI and consciouss or not.
That's exactly my point. In this analogy LLMs are steel, but the flying things are made out of aluminum, lithium and titanium and not steel. We need a better idea than LLMs because LLMs's are not suddenly going to turn into something they are not.
LLMs are literally stochastic by nature and can't be relied on for anything critical as its impossible to determine why they fail, regardless of the deterministic tooling you build around them.
Ahh, yes, unlike humans, who are completely deterministic, and thus can be trusted.
There are billions of people, you can interview/hire/fire until you get the right match.
There are 2? frontier LLM providers. 5? if you are more generous / ok with more trailing edge.
Everyone thought OpenAI was great, until Claude got better in Q1 and they switched to Anthropic, and then Codex got better and a good chunk moved back to OpenAI.. Seems kind of binary currently.
> Ad hoc fallacy is a fallacious rhetorical strategy in which a person presents a new explanation – that is unjustified or simply unreasonable – of why their original belief or hypothesis is correct after evidence that contradicts the previous explanation has emerged.
https://cerebralfaith.net/logical-fallacy-series-part-13-ad-...
> An argument is ad hoc if its only given in an attempt to avoid the proponent’s belief from being falsified. A person who is caught in a lie and then has to make up new lies in order to preserve the original lie is acting in an ad hoc manner.
It should be clear why the ad hoc fallacy is a fallacy.
There is probably a whole testing workflow at AI companies to tweak each new model until it "looks" acceptable.
But they still don't understand what they are doing. This is purely empirical.
That Nerdy personality prompt made me gag. As a card-carrying Nerd, I feel offended
To me they seem to be pretty damn smart, to put it mildly. They sometimes do stupid things - but so do smart people!
A calculator can do very complex sums very quickly, but we don't tend to call it "smart" because we don't think it's operating intelligently to some internal model of the world. I think the "LLMs are AGI" crowd would say that LLMs are, but it's perfectly consistent to think the output of LLMs is consistent/impressive/useful, but still maintain that they aren't "smart" in any meaningful way.
Okay, but you have to actually address why you think LLMs lack an "internal model of the world"
You can train one on 1930s text, and then teach it Python in-context.
They've produced multiple novel mathematical proofs now; Terrance Tao is impressed with them as research assistants.
You can very clearly ask them questions about the world, and they'll produce answers that match what you'd get from a "model" of the world.
What are weights, if not a model of the world? It's got a very skewed perspective, certainly, since it's terminally online and has never touched grass, but it still very clearly has a model of the world.
I'd dare say it's probably a more accurate model than the average person has, too, thanks to having Wikipedia and such baked in.
Clearly there's a limit. For example, if an alien autocomplete implementation were to fall out of a wormhole that somehow manages to, say, accurately complete sentences like "S&P 500, <tomorrow's date>:" with tomorrow's actual closing value today, I'd call that something else.
> At what point does autocomplete stop being "just autocomplete"?
Every single discussion on the internet is a repeat of https://en.wikipedia.org/wiki/Loki%27s_wager it seems…
That's the sorcery mentioned in the GP, the issue comes when people believe it to be smart however in reality it is just a next word prediction. Gives the impression it's actually thinking, and this is by design. Personally I think it's dangerous in the sense it gives users a false sense of confidence in the LLM and so a LOT of people will blindly trust it. This isn't a good thing.
edit:
You cannot predict all the actions or words of someone smarter than you. If I could always predict Magnus Carlsen's next chess move, I'd be at least as good at chess as Magnus - and that would have to involve a deep understanding of chess, even if I can't explain my understanding.
I can't predict the next token in a novel mathematical proof unless I've already understood the solution.
If you can predict the words a bright person will say about X... Isn't that some truly astounding tool? That could be used in myriad useful ways if one is a little creative with it
Since it's also "alien" it can also detect and explore paths that we simply haven't noticed since their biases aren't quite the same as ours
What would it take for you to concede a future model was smart?
For example, it's training set it purely engineering and code with general language data set, would be "aware" what art is, but has never seen an artistic image, aware what colours are and able to create something it never saw before.
Like a child with a paintbrush, there is an intuitive behavior that happens.
They can already create something they've never seen - you can prompt ChatGPT to generate images, and there's a few dedicated models for it: https://chatgpt.com/images/
Terence Tao feels like they've done innovative work on mathematics: https://www.scientificamerican.com/article/amateur-armed-wit...
They are useful but a cul de sac for heading toward AGI.
A better model to use is this: LLMs possess a different type of intelligence than us, just like an intelligent alien species from another planet might.
A calculator has a very narrow sort of intelligence. It has near perfect capability in a subset of algebra with finite precision numbers, but that's it.
An old-school expert system has its own kind of intelligence, albeit brittle and limited to the scope of its pre-programmed if-then-else statements.
By extension, an AI chat bot has a type of intelligence too. Not the same as ours, but in many ways superior, just as how a calculator is superior to a human at basic numeric algebra. We make mistakes, the calculator does not. We make grammar and syntax errors all the time, the AI chat bots generally never do. We speak at most half a dozen languages fluently, the chat bots over a hundred. We're experts in at most a couple of fields of study, the chat bots have a very wide but shallow understanding. Etc.
Don't be so narrow minded! Start viewing all machines (and creatures) as having some type of intelligence instead of a boolean "have" or "have not" intelligence.
Have you ever heard anyone refer to a calculator as intelligent?
These companies have a vested interest in making the product appear more human/smart than it is. It's new tech smeared with the same ole marketing matter.
The LLM tasks is to produce a string of words according to an internal model trained on texts written by humans (and now generted by other LLMs). This is not intelligence.
Where it fails is generally the first step. It’s kinda like the old saying “you have to ask the right question”. In all problem solving matters, the definition of problem is the first step. It may not be the hardest (we have problems that are well defined, but unresolved), but not being able to do it is often a clear indication of not being able to do the rest.
> What would convince you that you're wrong?
Maybe when I can have the same interaction as with my fellow humans, where I can describe the issue (which is not the problem) and they can go solve it and provide either a sound plan to make the issue disappear. Issue here refer to unpleasantness or frustrating situation.
Until then, I see them as tools. Often to speed up my writing pace (generic code and generic presentation), or as a weird database where what goes in have a high probability to appear.
It’s a fancy autocomplete that takes a bunch of text in and produces the most “likely” continuation for the source text “at once and in full”. So when you add to the source text something like: “You’re an edgy nerd”, it’s very much not surprising that the responses start referencing D&D tropes.
If you then use those outputs to train your base models further it’s not at all surprising that the “likely” continuations said models end up producing also start including D&D tropes because you just elevated those types of responses from “niche” to “not niche”.
The post-mortem is hilarious in that sense. “Oh, the goblin references only come up for ‘Nerdy’ prompt”. No shit.
>LLM is a sorcery tech that we don't understand at all
We do, and I'm sure that people at OpenAI did intuitively know why this is happening. As soon as I saw the persona mention, it was clear that the "Nerdy" behavior puts it in the same "hyperdimensional cluster" as goblins, dungeons and dragons, orcs, fantasy, quirky nerd-culture references. Especially since they instruct the model to be playful, and playful + nerdy is quite close to goblin or gremlin. Just imagine a nerdy funny subreddit, and you can probably imagine the large usage of goblin or gremlin there. And the rewards system will of course hack it, because a text containing Goblin or Gremlin is much more likely to be nerdy and quirky than not. You don't need GPT 5 for that, you would probably see the same behavior on text completion only GPT3 models like Ada or DaVinci. They specifically dissect how it came to this and how they fixed it. You can't do that with "sorcery we dont understand". Hell, I don't know their data and I easily understood why this is going on.
>they want you to think that LLMs are smart beasts (they are not)
I mean, depends on what you consider smart. It's hard to measure what you can't define, that's why we have benchmarks for model "smartness", but we cannot expect full AGI from them. They are smart in their own way, in some kind of technical intelligence way that finds the most probable average solution to a given problem. A universal function approximator. A "common sense in a box" type of smart. Not your "smart human" smart because their exact architecture doesn't allow for that.
>and that we know what LLMs are doing (we don't)
But we do. We understand them, we know how they work, we built thousands of different iterations of them, probing systems, replications in excel, graphic implementations, all kinds of LLM's. We know how they work, and we can understand them.
The big thing we can't do as humans is the same math that they do at the same speed, combining the same weights and keeping them all in our heads - it's a task our minds are just not built for. But instead of thinking you have to do "hyperdimensional math" to understand them 100%, you can just develop an intuition for what I call "hyperdimensional surfing", and it isn't even prompting, more like understanding what words mean to an LLM and into which pocket of their weights will it bring you.
It's like saying we can't understand CPU's because there is like 10 people on earth who can hold modern x86-64 opcodes in their head together with a memory table, so they must be magic. But you don't need to be able to do that to understand how CPU's work. You can take a 6502, understand it, develop an intuition for it, which will make understanding it 100x easier. Yeah, 6502 is nothing close to modern CPU's, but the core ideas and concepts help you develop the foundations. And same goes with LLM's.
>personally side with Yann Le Cun in believing that LLM is not a path to AGI
I agree, but it is the closest we currently have and it's a tech that can get us there faster. LLM's have an insane amount of uses as glue, as connectors, as human<>machine translators, as code writers, as data sorters and analysts, as experimenters, observers, watchers, and those usages will just keep growing. Maybe we won't need them when we reach AGI, but the amount of value we can unlock with these "common sense" machines is amazing and they will only speed up our search for AGI.
For example:
If you train it on a dataset of Othello games, or a dataset including these, you are basically creating a map of all possible moves and states that have ever happened, odds of transitions between them, effective and un-effective transitions.
By querying it, you basically start navigating the map from a spot, and it just follows the semi-randomly sampled highest confidence weights when navigating "the map".
And in the multidimensional cross-section of all these states and transitions, existence of a "board map" is implied, as it is a set of common weights shared between all of them. And it becomes even more obvious with championship models in Othello paper, as it was trained on better games in which the wider state of the board was more important than the local one, thus the overall board state mattered more for responses.
The second research you linked is also has a pretty obvious conclusion. It's telling us more about us as humans than about LLM's, about our culture and colors and how we communicate it's perception through text. If you want to try something similar, try kiki bouba style experiments on old diffusion models or old LLM's. A Dzzkwok grWzzz, will get you a much rougher and darker looking things than Olulola Opolili's cloudy vibes.
The active research is as much as:
- probing and seeing "hey lets see if funky machine also does X"
- finding a way to scientifically verify and explain LLMs behaviors we know
- pure BS in some cases
- academics learning about LLM's
And not a proof of where our understanding/frontier is. It is basically standardizing and exploring the intuition that people who actively work with models already have. It's like saying we don't understand math, because people outside the math circles still do not know all behaviors and possibilities of a monoid.