It's like a giant plinko board where the shape of the original disc guides how it falls through the apparatus, and the apparatus has been tuned so that different discs end up in the exits we want them to
reply
That's a very concise and illuminating way to think about what's happening, IF (and only if) you already know how these models work. Thanks for that.
reply
Yes, this is more a compression that helps you remember than something you can learn or understand from.
reply
Compression is the reason why these models are able to learn and understand.

My brain is doing the exact same thing.

I learned enough to compress concepts like a bike: what a bike does and what I can use a bike for.

Ask an LLM and it will answer you similarly to a human.

Blind people learn concepts of bikes too and in a similar way: by description.

LLMs have so much data available in the form of text, and are able to ingest all of it, that the LLM compression algorithm doesn't have to be as good or as fine-tuned as ours.

But I would assume that Yann LeCun's JEPA or other breakthroughs in the next few years will get us there.

reply
> Blind people learn concepts of bikes too and in a similar way: by description.

And by touch and sound. And maybe some were daring enough to ride one, or unlucky enough to get hit by one. But they have way more input than just text.

reply
Invisibilia's episode was my first exposure to it.

https://www.npr.org/programs/invisibilia/378577902/how-to-be...

The man posits that clicking is instinctual for blind people, but they are told to quiet down in class and most never develop their echolocation abilities.

reply
Wow. Thank you.
reply
LLMs also have other inputs, like audio and images. They get encoded (just like a human eye encodes an image) and passed to the weights.
reply
So a blind person can only describe lava to you after they have touched and heard it?
reply
A blind person has touched warm and hot things and gotten burned before, and then they are told lava is this molten liquid that is even hotter than anything they have touched. That is enough for them to understand.

A blind person that never touched a hot object wouldn't really know though, there is a reason we dismiss talk from people who lack experience.

reply
You don't know that. You don't know what someone would think if you tell them the general concept of cold and warm.

The reaction you should have, the feeling etc.

I asked ChatGPT how it would describe a scene without mentioning temperature. It was very good at describing what a human would describe.

I'm aware of the bias we have against LLMs but I think people just underestimate how much data is there.

I'm not saying a robot or an LLM wouldn't be better with this information, and robots do actually use temperature sensors so they can control movement speed and dexterity when elements overheat, but the gap is small.

reply
In what way is that different from any other model of reality that you'd use to winnow a dataset into an answer to a question? The only major difference I see is that beyond a certain number of transformations, people are willing to treat it as some sort of miracle, and too tired to figure out why it came up with the answer it came up with. It's almost like people desperately want to give up their agency and creativity to black boxes, whether those weights produce answers that are right or wrong. Factor in that psychology and it looks a lot less like we have invented something useful, and a lot more like we as a species are choosing to quit life en masse.
reply
> The only major difference I see is that beyond a certain number of transformations, people are willing to treat it as some sort of miracle, and too tired to figure out why it came up with the answer it came up with.

It’s funny, because I thought you were talking about humans here when you wrote this. We know some things about how our bodies encode information that is sent to the brain, and we know some things about how neurons receive information and act on it, but after that we get too tired and give up on how the brain works and treat it like a miracle.

It’s like we desperately want to believe our consciousness is not just electrical impulses in our brain, and we want to ascribe agency and uniqueness to the physical processes going on in our head.

reply
There's definitely a sizable contingent of people who desperately want to believe consciousness is just electrical impulses in our brain. Because "what else could it be"? The fact is that we just don't know, and "abiding in the not-knowing" is for many the most uncomfortable thing ever. Especially for the curious- and rational-minded people this forum tends to attract. I'm one of them, too.
reply
> but after that we get too tired and give up on how the brain works and treat it like a miracle.

I disagree. We know very well how neurons work, and we have a pretty good idea of how neural activity translates to behavior. In other words, we have a pretty good idea of how the brain works. We stop at consciousness because as of yet it is in the realm of philosophy, not science. We don't know what consciousness is or even whether or not it is useful for science, and we are simply waiting for the philosophers to guide us out of that situation.

Note that both cognitive psychology and behavioral psychology have done fine without tackling consciousness. When neuropsychology emerged in the 1980s it complemented both these fields perfectly. The situation is the opposite with the philosophy of mind, which grew significantly around the same time.

There have been some attempts to describe consciousness as an emergent phenomenon arising out of neural activity, but so far all of these attempts have failed, or at least failed to turn consciousness into a useful term in psychology (the way gravity is a useful term in physics). I think it is as likely that these attempts have failed because consciousness is simply not a useful term in psychology as it is that we simply don't understand it well enough.

reply
Saying we have a good idea of how the brain works massively overstates the case...

We know how neurons fire. We do not know how a brain turns that into thought, meaning, intention, experience and on and on. That is not "pretty well understanding the brain", it's understanding some components and hand waving the thing we actually care about.

reply
What I actually care about is how neural activity translates to behavior. And we have a good enough idea of that that we can design SSRI medicine to treat depression, or neurological tests to detect Alzheimer's. As for experience, we do know something and we are learning more with cognitive psychology, e.g. in priming experiments.

I feel like the search for consciousness is to psychology what the search for the Aether was for physics and chemistry. I think it is a worthwhile search, and maybe we will discover something important during that search, but we should also be prepared to find out that the thing might not exist, or that its presumed properties are better explained with a different model.

reply
SSRIs are not evidence that we understand how neural activity becomes behavior. They are evidence that you can perturb a system usefully without understanding it very well. That is exactly my point.

Respectfully, you are miles out of your depth here.

reply
I don't see why you felt the need to insult me here. We are having a very common disagreement here, one which philosophers of science have been actively debating for several decades.

My point with the SSRI is that we know that serotonin is a chemical which incites certain neurons, and we know that a lack of activity of neurons in that general area in the brain is correlated with depression, so scientists were able to accurately predict that keeping the serotonin in that brain area for longer would increase brain activity there and decrease the level of depression.

This counts as pretty good understanding in my books at least. It teaches us very little about consciousness but my point is that it doesn’t have to. Just like Newton’s theory of gravity did not have to teach us about some deeper cosmological truth.

reply
> beyond a certain number of transformations, people are willing to treat it as some sort of miracle, and too tired to figure out why it came up with the answer it came up with

It’s less about being too tired and more about being realistic about the limits of understanding.

Consider mass and energy flows in planet-scale systems: At some point we call these “weather” and change the tools with which we study them, but we never stopped trying to understand the phenomenon.

reply
If you're going to make something smarter than a person, you got to be convinced that you're only going to be able to understand it on the single training step level and then inductively trust that the rest of it works. We do empirical testing of course with evals, but there's sort of an art to figuring out what is theoretically going to improve eval performance. Trying to fit the meaning of all those weights in your little human brain and working back from there isn't going to work for more than a little slice of the dataset at a time because that's all we can fit in our understanding.
reply
When we attempt to recreate those complex, planetary atmospheric phenomena in a box, we're doing so in order to measure and study them.

Making random turbulence in a box until it resembles the outside world, and calling it weather and extrapolating some predictive meaning from the result, is the total antithesis of what you're describing about why we come up with simplified models for impossibly complex systems. The purpose of [mathematical] models that are built thoughtfully is to explain why complex systems are the way they are, with data and algorithms, however imperfectly. [Whereas] The purpose of LLM models is to give the illusion of answering questions while never answering why the answer was given. The difference is the difference between a scientist and a tarot card reader, an equation and an oracle.

People have a well-known tendency to gravitate toward the shamanistic, oracular, and superstitious. Listen, I ran a casino for 6 years, I know. The impossibility of knowing how 80 layers of matrix multiplication led to a particular answer is in itself a psychological factor in choosing whether to accept the answer or to question it. People tend to err on the side of the over, in sports betting terms... or on the lazy side in general... and they will make whatever excuses they need to after the fact to justify their decisions. So now we have a machine that can act like an oracle and which you can also blame, but the blame goes into a void because this machine is stateless and is only a reflection of information, not an intentional refinery of data.

Sit next to a bank of slot machines for an hour and listen to the absolutely ridiculous shit most people will come up with to explain how they "know" if a machine is going to pay out soon, and then tell me if you think it's a good idea to give them an LLM in their pocket to answer their questions in whatever way they frame them.

reply
> The purpose of [mathematical] models that are built thoughtfully is to explain why complex systems are the way they are, with data and algorithms, however imperfectly.

Nope. The main purpose of the whole endeavor is usually to predict the behavior of a complex system, because that's actually what we care about. If we can predict it, we can adapt to it, and eventually use it to our advantage.

Explaining why a complex system is the way it is, is merely nice-to-have. Models are opinions. All of them are wrong, but some are useful, and we rank them by how useful they are. The models and explanations are important because, beyond their elegance and convenience, it's also the case that more accurate models give you better predictions across larger domains, meaning we get better at getting something useful out of the complex system.

People get fixated on modern theoretical science, with bottom-up mathematical explanations traced through seas of empirical data, with whole magical rituals of peer review and double-blind studies and statistical significance around them. But they forget that the core of empirical science is literally throwing shit at a wall to see what sticks. That is the guiding principle, everything else is just making the process more efficient.

Understanding complex natural systems (or even engineered ones that got too complex) always starts with tests - tests on the real thing, then on approximate models that we poke and prod and bash into shape until they start acting similarly to the real thing. It's through the poking and bashing, and how they affect our proxy model, that we glean insights into nature of the simulated phenomena, and eventually formulate general theories - but more importantly, the models give us useful predictions from the start, before we have any theories explaining why.

reply
I don't know - this is a highly specific interpretation of both what science is and why people choose to do it.

I'm a scientist. Believe it or not, I believe in substantially more than prediction and I think it's rather trivial to come up with examples where mere prediction is insufficient to meet a normal person's notion of an account of a thing (e.g., pre-Copernican planetary motion). I'm not saying you are wrong, per se, just that the idea that "it was prediction all along" is a very specific idea of what human beings are interested in and what we are up to.

> that we glean insights into nature of the simulated phenomena

That is right - most people believe that there is a simulated phenomenon "out there" that we learn about. I think there are strong reasons to believe this having to do with how models are related to predictions. The wrong ontology can make prediction very hard and the right one can make prediction substantially easier. Arguably, we are in that situation right now with language models - we just threw a lot of parameters at the problem and now we are able to predict but we still don't really understand. This is perhaps inevitable in the case of language, but I don't think we should look at models with tons of degrees of freedom and the ability to predict things as a death knell for the very idea of deeper understanding.

reply
> The purpose of LLM models is to give the illusion of answering questions while never answering why the answer was given.

This is just your own idiosyncratic and biased belief. You're not describing anything objective about LLMs, you're describing your personal attitude to them. This colors your understanding in a way that can't really be reasoned with until you let go of the artificial constraints you're imposing on your own understanding.

reply
> Sit next to a bank of slot machines for an hour and listen to the absolutely ridiculous shit most people will come up with to explain how they "know" if a machine is going to pay out soon, and then tell me if you think it's a good idea to give them an LLM in their pocket to answer their questions in whatever way they frame them.

If the LLM in their pocket has a more robust world model than they themselves and is e.g. able to refute their irrational convictions, it actually seems like a very good idea. (Big if, of course.)

reply
>It's almost like people desperately want to give up their agency and creativity

Don't make me think!

Also don't make me take responsibility. (This seems to be the actual function of every organization.)

reply
Agency?

What are you talking about?

I want freedom.

I want freedom to do what I want rather than sitting in front of a computer and coding for some company.

Please AI, let's burn down knowledge work and labor work. Let's create so much stress on our society that we start rethinking what work means.

Let's redefine work into discovering the world again. Let people do old handcraft jobs, let them do more sports, let them read more, let them write and make more. Let them enjoy nature.

reply
Work has never been about "discovering the world". There have been a handful of privileged folks who had the time to "discover the world". Work has traditionally been "let's find enough food for my family". If you want to think of a future of abundance then perhaps we can discover the world.
reply
> Let's redefine work into discovering the world again. Let people do old handcraft jobs, let them do more sports, let them read more, let them write and make more. Let them enjoy nature.

Why leave something so important up to what AI does or doesn't do?

reply
Because capitalism doesn't allow for that.

Only a fundamental change to our society will allow this for the masses, once pressure on the rich skyrockets.

reply
This seems to be a little naive about how humans consume the benefits we create in society.

"Let people do old handcraft jobs, let them do more sports, let them read more, let them write and make more. Let them enjoy nature."

Very nice thoughts. You know we all could do this today without "burning it down"? Get in your pod, eat your slop, and watch your screen is where this is headed.

"I want freedom to do what I want rather than sitting in front of a computer and coding for some company."

You get that it's you creating the misery here? Then stop? Don't do it. Go start a farm or whatever you think will solve your problems. At some point this all boils down to "chop wood and fetch water" so if the modern way of doing that is so terrible then stop. Go fetch water the old fashioned way and be free.

reply
The solution we've come up with is move all the unpleasant work stuff to China where people don't complain about doing it because they already have communism, and therefore everything is of course effortlessly perfect there.
reply
"I want freedom to do what I want rather than sitting in front of a computer and coding for some company."

"Please AI, let's burn down knowledge work and labor work"

"Let people do old handcraft jobs."

So many presuppositions about what people want to do.

As a child I spent a lot of time programming and doing "knowledge work" because it's fun - I don't enjoy "old hand-crafted jobs". Sure, let's definitely destroy capitalism in its current state, I suppose. But I find people like you, who hate knowledge work/coding and think everyone else must feel the same and only do it for the money, a bit out of touch.

reply
right, these knowledge work and coding jobs are, by my lights, about the best possible job. From my perspective we've invented a machine that does the fun parts while leaving me the less fun parts (review, various hard-to-claude janitorial tasks, etc).

I might like woodworking as a hobby (for example), but I sure as heck don't want to be a carpenter or to depend on my ability to hand craft enough widgets people like to survive

reply
I differentiate between things you have to do (work) and things you want to do. Work means someone else is telling you your priorities.

If you want to write code and think, you would be welcome in my utopian vision.

But when I write code, it's business shit. And it's business shit someone else already solved a few times.

reply
The weights are code, the prompt is code, the output is code.

Is the meat code?

reply
The data is the code. Training algorithm is the compiler. The weights are the byte code produced to run on the inference VM.
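
A toy sketch of that analogy, with everything in it made up for illustration: the dataset, `train`, and `infer` are hypothetical names, and a closed-form least-squares line fit stands in for a real training algorithm.

```python
# Toy illustration only: "data is the source, training is the compiler,
# weights are the byte code, inference is the VM".
# The dataset and function names are hypothetical.

DATA = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # (x, y) pairs


def train(data):
    """'Compile' the data into weights via closed-form least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return {"w": slope, "b": intercept}


def infer(weights, x):
    """'Run' the weights: the inference step never looks at DATA again."""
    return weights["w"] * x + weights["b"]


weights = train(DATA)
prediction = infer(weights, 10.0)
```

Change `DATA` and you get a different "program" out of the same `train` and `infer`, which is roughly the point of the analogy.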
reply
The data is the code is the data. Reality has no distinction between "data" and "code". These terms are categories we impose on systems we design, to make it easier for us to build and reason about them, but they're nothing but mere opinions, and depend less on the system structure, and more on the perspective of the person asking which is which.

This is related to, and possibly equivalent with, the core point of both this story and the original one: computation is independent from substrate.

You can build a computer out of anything, whether it's semiconductors or lasers or meat or magnetic fields or water flowing downhill or abstract thought, and that computer will happily perform the same computation as every other equivalent construct from whatever substrate. That's because computers are ultimately made of math, and we design "real ones" by finding ways to approximate the mathematical constraints with physical systems. But the choice of how to map the math to physical systems is completely arbitrary, and any such mappings are equivalent from POV of information processing ability.

(Of course substrate is not arbitrary from economic POV, which is why we build most of our computers out of silicon and plastic, and make it work with electric current and lasers.)

reply
> Reality has no distinction between "data" and "code"

yes, yes, ostensibly the universe is built on lisp.

But we all know that it was hacked together with a lot of perl[1].

[1] you all know the reference.

reply
One of the best things I've done for my career (as a self-taught software developer, but with a degree in electronic engineering) is to learn computation theory.

Computation is math (and a very restricted subset of math). It's mostly specific sequences of set manipulations. Which sets and which manipulations are defined by people, not by the idea of computation.

The best thing is that as soon as you specify the sequence of manipulations, it becomes a set that you can manipulate in turn. That can be a difficult concept to grasp, but that's what helps in designing notations that are more appropriate for the human mind to describe a solution to a specific problem.
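
A minimal sketch of what I mean, using a made-up two-instruction notation (the `add`/`mul` operations and the helper names are hypothetical, not from any real system): once the sequence of manipulations is written down as a value, another function can take it apart and rewrite it before anything runs.

```python
# A "program" written as plain data: a list of (operation, argument) pairs.
# The notation is invented for this example.
program = [("add", 3), ("mul", 2), ("add", 0), ("mul", 1)]


def run(steps, x):
    """Interpret the list of steps against a starting value."""
    for op, arg in steps:
        if op == "add":
            x += arg
        elif op == "mul":
            x *= arg
        else:
            raise ValueError(f"unknown operation: {op}")
    return x


def drop_noops(steps):
    """Manipulate the program itself: remove steps that cannot change x."""
    noops = {("add", 0), ("mul", 1)}
    return [step for step in steps if step not in noops]


simplified = drop_noops(program)
original_result = run(program, 1)
simplified_result = run(simplified, 1)
```

`drop_noops` never executes anything. It only treats the program as a collection to filter, and the filtered program still computes the same result.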

reply
Yes. Is it data? Yes.

Is the distinction between "code" and "data" just someone's opinion? Yes. There is no such distinction in reality.

reply
This is a good model. If you take an old ROM dump from a video game, it's just a pile of bits. You don't know what bits represent code, what represent an image, what represent text, etc. You have to analyze them contextually to actually figure out what is code and what is "data" in context, because without context they are truly one and the same.
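
A small sketch of the same idea, assuming a made-up four-byte fragment rather than a real ROM (the bytes and the helper names are hypothetical): the bits do not change, only the reader's interpretation does.

```python
import struct

# Hypothetical 4-byte fragment; in a real ROM dump you would not know
# in advance which of these readings (if any) is the intended one.
blob = bytes([0x48, 0x69, 0x21, 0x00])


def as_text(raw):
    """Read the bytes as Latin-1 characters."""
    return raw.decode("latin-1")


def as_u32_le(raw):
    """Read the same bytes as one little-endian unsigned 32-bit integer."""
    return struct.unpack("<I", raw)[0]


def as_pixels(raw):
    """Read the same bytes as four 8-bit grayscale pixel values."""
    return list(raw)


text_view = as_text(blob)
int_view = as_u32_le(blob)
pixel_view = as_pixels(blob)
```

All three readings are equally "valid" as far as the bytes are concerned. Only context, such as where the CPU's program counter lands or which routine reads that address, tells you which one was meant.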
reply
That's why encountering something like LISP for the first time (by writing a LISP interpreter, for example) creates a big bang event in form of an imminent intellectual catharsis. People who encountered it just once, will never be able to see the world through the old "meaty" lenses afterwards.
reply
Is matter code? There is some sort of computation happening in space over time.
reply
By Fermat's principle, a ray of light has to know where it will ultimately end up before it can choose the direction to begin moving in.

So either something is computing it or some exploration is happening at quantum level and we just see the final result.

reply
Fermat's principle is an outcome of constructive interference of waves. It works for both classical and quantum-mechanical descriptions. E.g. check https://phys.libretexts.org/Bookshelves/University_Physics/U...
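
A rough numerical sketch of that interference picture, under assumptions that are mine and not from the link: a mirror along y = 0, hypothetical endpoints `A` and `B`, an arbitrary wavelength, and an equal-weight phasor for every path that bounces off the mirror at position x.

```python
import cmath
import math

# Assumed toy geometry: light goes from A to B via a bounce at (x, 0).
A = (0.0, 1.0)
B = (2.0, 1.0)
WAVELENGTH = 0.01                 # arbitrary, small compared to the geometry
K = 2 * math.pi / WAVELENGTH      # wavenumber


def path_length(x):
    """Total length of the path A -> (x, 0) -> B."""
    return math.hypot(x - A[0], A[1]) + math.hypot(B[0] - x, B[1])


def window_amplitude(center, half_width=0.1, samples=2001):
    """Average the phasors exp(i*K*L(x)) over bounce points near `center`.

    A value near 1 means the paths add up in phase; a value near 0 means
    they cancel each other out.
    """
    total = 0j
    for i in range(samples):
        x = center - half_width + 2 * half_width * i / (samples - 1)
        total += cmath.exp(1j * K * path_length(x))
    return abs(total) / samples


near = window_amplitude(1.0)    # around the shortest path (x = 1 by symmetry)
far = window_amplitude(-0.8)    # a stretch of mirror far from it
```

In this setup `near` comes out much larger than `far`: paths close to the stationary one have nearly equal phases and reinforce, while paths elsewhere spin through many cycles and cancel. Nothing has to "know" the destination in advance.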
reply
> a ray of light has to know where it will ultimately end up before it can choose the direction to begin moving in

A ray of light doesn't know or choose because it has no agency, just like an apple doesn't know or decide to fall because of gravity. It's an anthropomorphization.

reply
True, so the interference is the "computation"(heavy emphasis on quotes) which gives rise to the principle.
reply
> a ray of light has to know where it will ultimately end up before it can choose the direction to begin moving in

I'm no physicist, so I guess I'll ask it: Why?

Also related, why do some rays of light then "see" a black hole yet decide to head into it anyway, if they saw it before they went in that direction? Seems like a dumb move :)

reply
Its future isn't over there because it moves in that direction, instead it moves in that direction because its future lies over there.

Relatedly:

> [General Relativity] basically says that the reason you are sticking to the floor right now is that the shortest distance between today and tomorrow is through the center of the Earth.

https://physics.stackexchange.com/questions/250800/gr-and-my...

reply
Does anyone have a link to a good video visualisation of training & inference?
reply
3Blue1Brown has a great visual introduction to transformers, the heart of LLMs.

It's chapter 5. Start at chapter 1 if you want more background on neural nets and backprop.

https://youtu.be/wjZofJX0v4M?si=HFXbrB-5cArprGaU

reply
Yes, yes, but what fertile fallacies and common misunderstanding can politicians use to acquire more power via exploiting the difference between the common person's flawed understanding due to cargo culting, cognitive biases, and/or outdated or inappropriate analogies vs actual reality? Is there any way we can get the AI to say give all political power to narrator is the solution to all problems and use the common person's mistaken worship of AI as a spiritual all knowing conscious being with unusual sensitivity and caring about everyone to cement that power? Certainly one of you eggheads can tweak that for me? What? It's against your ethics? We're trying to save the world here. Here, let me call up Bernie Sanders to propose nationalizing half your companies so we can do that.
reply