TBH, this is (very much my opinion btw) the least surprising thing. LLMs (and especially their emergent properties) are still black boxes. Humans have been studying the human brain for millennia, and we are barely better at predicting how humans work (or, e.g., to what extent free will is a thing). Hell, the emergent properties of traffic weren't understood or given proper attention, even though a researcher, as a driver, knows what a driver does. Right now, on the front page, is this post:
> 14. Claude Code Found a Linux Vulnerability Hidden for 23 Years (mtlynch.io)
So it's pretty cool we're learning new things about LLMs, sure, but it's barely surprising that we're still learning it.
(Sorry, mini grumpy man rant over. I just wish we knew more of the world but I know that's not realistic.)
I dare say that in some ways, we understand LLMs better than humans, or at least the interpretability tools are now superior. Awkward place to be, but an interesting one.
Are you surprised we understand them better than brains?
That's a bit of an overstatement.
The entire field of ML is aimed at problems where deterministic code would work just fine, but the number of cases it would need to cover is too large to be practical (note: this has nothing to do with the impossibility of its design), AND there's a sufficient corpus of data that allows plausible enough models to be trained. So we accept the occasionally questionable precision of ML models over the huge time and money costs of engineering these kinds of systems the traditional way. LLMs are no different.
What you are saying is fantasy nonsense.
> but the amount of cases it would need to cover is too large to be practical (note, this has nothing to do with the impossibility of its design)
So it doesn't work.
You would be sorely mistaken to think I'm utterly uninformed about LLM research, even if I would never dare to claim to be a domain expert.
LLMs draw their origins from both n-gram language models (ca. 1990s) and neural networks and deep learning (ca. 2000). So we've only had really good ones for maybe 6-8 years, but the roots of the study go back 30 years at least.
Psychiatry, psychology, and neurology, on the other hand, are only roughly 150 years old. Before that, there wasn't enough information about the human body to be able to study it, let alone the resources or biochemical knowledge necessary to understand it or do much of anything with it.
So, sure, we've studied it longer. But only 5 times longer. And, I mean, we've studied language, geometry, and reasoning for literally thousands of years. Markov chains are like 120 years old, so older than computer science, and you need those to make an LLM.
And if you think we went down some dead-end directions with language models in the last 30 years, boy, have I got some bad news for you about how badly we botched psychiatry, psychology, and neurology!
Very, monsieur Laplace.
We have tons of low-hanging fruit across all fields of science and engineering to be picked, in the form of different ways to apply and chain the models we have, different ways to interact with them, etc. - enough to fuel a good decade of continued progress in everything.
> What is (not) here to stay are the techbros who think every problem can be solved with LLMs.
LLMs are in all likelihood here to stay, but the scumbags doing business around them right now are hopefully going away eventually.
Much as Diogenes mocked Plato's definition of a man with a plucked chicken, LLMs revealed what "real" AI would require: continuous learning. That isn't to diminish the power of LLMs (they are useful), but that limitation is a fairly hard one to overcome if true AGI is your goal.
From what I understand, a living neural network learns several orders of magnitude more efficiently than an artificial one.
I'm not sure where that difference comes from, but my brain probably isn't doing backpropagation; it's probably doing something very different.
(eg different kinds of learning for long-term memory, short-term memory, languages, faces and reflexes.)
The intersection of what with physics?
Sir Roger Penrose on quantum consciousness (and there is some regret on his part here), or Jacob Barandes for much more current thinking on this sort of intersectional exploration.
> The earliest reference to the brain occurs in the Edwin Smith Surgical Papyrus, written in the 17th century BC.
I was actually thinking of the ancient Greeks when writing my comment, but I suppose the Egyptians have even older records than them.
I think that with grammar-aware sampling / constrained decoding [0][1] it is possible to sometimes skip calling the model altogether if only one token is allowed by grammar and just insert it, but I don't think that any of the current, widely used combinations of models/harnesses use it. And it only skips inference in rare edge cases.
I wonder if there is a more general solution that can make models spend more compute on making important choices, while making generation of the "obvious" tokens cheaper and faster.
[0] https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...
[1] https://developers.redhat.com/articles/2025/06/03/structured...
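As a toy illustration of the skip-inference idea: when the grammar permits exactly one next token, it can be appended without ever calling the model. The grammar and sampler below are entirely made up for the sketch (a real harness like llama.cpp compiles a GBNF grammar into an automaton rather than keying on full prefixes):

```python
def generate(grammar, model_sample, max_len=10):
    """Grammar-constrained decoding sketch: only call the model when the
    grammar leaves a real choice; forced tokens are inserted for free."""
    out = []
    for _ in range(max_len):
        allowed = grammar.get(tuple(out), set())
        if not allowed:
            break  # grammar has no continuation: we're done
        if len(allowed) == 1:
            # Exactly one legal token: skip inference entirely.
            out.append(next(iter(allowed)))
        else:
            # A real choice: ask the model, restricted to legal tokens.
            out.append(model_sample(out, allowed))
    return out

# Toy JSON-ish grammar keyed by the full prefix so far.
grammar = {
    (): {"{"},
    ("{",): {'"key"'},
    ("{", '"key"'): {":"},
    ("{", '"key"', ":"): {"1", "2"},   # the only point needing a model call
    ("{", '"key"', ":", "1"): {"}"},
    ("{", '"key"', ":", "2"): {"}"},
}

# A stand-in "model" that just picks the first legal token.
print(generate(grammar, lambda ctx, allowed: sorted(allowed)[0]))
```

Here five tokens are emitted but the "model" is consulted exactly once, at the value position; everything else is forced by the grammar.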
Making coding agents spit out syntactically correct code token by token is like asking a human to code on a whiteboard.
We kinda have a little bit of it with some coding harnesses giving the model access to an LSP, but I think we can insert this knowledge at a lower level if we find a clever way to somehow utilize it during sampling.
I think that there is a lot of low hanging fruit in this area.
And in general, I think people try to use LLMs too much to solve problems that can be easily solved by cheaper (computationally) and, more importantly, deterministic tools.
For example, back when LLM-assisted coding first became a thing, people very often complained about models generating syntactically incorrect code and inventing non-existent library methods.
Well, I, an experienced human programmer, would probably also make syntax mistakes and invent non-existent methods if you stripped me of my tools and made me write code in a bare text editor without syntax highlighting.
Thankfully, my IDE would autocomplete real syntax and actually existing library methods for me and immediately give me feedback if I make a mistake anyway. And all of it is achieved using reliable deterministic code without the inherent issues of statistical models.
I think that it is really inefficient to reach for an expensive and unreliable tool when a cheap and reliable tool will do.
1. code
2. syntax check / build / format / lint (details language dependent)
3. test
and they can hop between 1 and 2 however many times they want.
I do think there is some merit in a tool that dumps all namespaces and reachable symbols so the agent can do its own autocomplete without a round-trip.
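A minimal sketch of that kind of symbol dump, using only Python's introspection (the function name and output format are hypothetical; a real tool would walk the project's whole import graph, not a single module):

```python
import inspect
import json

def dump_symbols(module):
    """List a module's public symbols with rough kind info, so an agent
    can 'autocomplete' against real names without an LSP round-trip."""
    symbols = {}
    for name, obj in vars(module).items():
        if name.startswith("_"):
            continue  # skip private/dunder names
        if inspect.isclass(obj):
            kind = "class"
        elif inspect.isfunction(obj) or inspect.isbuiltin(obj):
            kind = "function"
        elif inspect.ismodule(obj):
            kind = "module"
        else:
            kind = type(obj).__name__  # e.g. constants report their type
        symbols[name] = kind
    return symbols

import math
syms = dump_symbols(math)
print(json.dumps({k: syms[k] for k in ("sqrt", "pi", "tau")}, indent=2))
```

Dumping this once per dependency and pasting it into the context gives the model the same "only these names exist" signal an IDE gives a human.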
As a human coder you don’t summon intellisense. It’s just popped up into your visual field as extra input - contextual cues.
You could force intellisense state into the context vector the LLM receives.
I once asked an LLM if it could ingest code from an interactive session more easily if it were in appropriately typed markdown fences, and it said absolutely yes, and that the syntax highlighting fed to it that way helps it immensely. I was downright shocked that syntax highlighting was anything more than noise for them.
I think speculative decoding counts as a (perhaps crude) way of implementing this?
There's a lot of work going on in various streams towards making it possible to vary compute per-token, dynamically, e.g. universal transformers. Maybe one day it'll work well enough to beat conventional techniques.
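For the speculative-decoding connection above, here is a toy greedy-only sketch: a cheap draft model proposes a run of tokens, and the expensive target model verifies them, keeping the longest agreeing prefix (real implementations verify in one batched pass and accept/reject probabilistically; the "models" here are stand-in lambdas):

```python
def speculative_decode(target, draft, prompt, n_draft=4, steps=3):
    """Greedy speculative decoding sketch: cheap 'obvious' tokens come
    from the draft model; the target only corrects disagreements."""
    out = list(prompt)
    for _ in range(steps):
        # Draft model proposes n_draft tokens autoregressively (cheap).
        proposed = []
        for _ in range(n_draft):
            proposed.append(draft(out + proposed))
        # Target model checks each proposed position.
        accepted = []
        for i, tok in enumerate(proposed):
            correct = target(out + proposed[:i])
            accepted.append(correct)
            if correct != tok:
                break  # first disagreement: keep target's token, discard the rest
        out += accepted
    return out

# Toy "models": the target repeats the last token; the draft usually
# agrees but occasionally guesses "x".
target = lambda ctx: ctx[-1]
draft = lambda ctx: ctx[-1] if len(ctx) % 5 else "x"
print(speculative_decode(target, draft, ["a"]))
```

When draft and target agree, several tokens land per expensive step, which is exactly the "spend less on obvious tokens" effect.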
I got unstuck by randomizing the field order for each row at training time?!? And now I'm thinking I should do the same at inference time...
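A sketch of the trick described above: shuffle the key order each time a record is serialized into a training string, so the model can't lean on positional shortcuts between fields (the `k=v` format and field names are made up for the example):

```python
import random

def serialize_row(row, rng):
    """Serialize one record with its fields in a random order per call,
    preventing the model from memorizing a fixed field layout."""
    fields = list(row.items())
    rng.shuffle(fields)
    return ", ".join(f"{k}={v}" for k, v in fields)

rng = random.Random(0)
row = {"id": 7, "name": "ada", "score": 0.93}
print(serialize_row(row, rng))
print(serialize_row(row, rng))  # typically a different order each call
```

The content is identical each time; only the order varies, which is what makes the augmentation "free".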
> This is probably due to the way larger numbers are tokenised, as big numbers can be split up into arbitrary forms. Take the integer 123456789. A BPE tokenizer (e.g., GPT-style) might split it like: ‘123’ ‘456’ ‘789’ or: ‘12’ ‘345’ ‘67’ ‘89’
One of the craziest LLM hacks that doesn't get love is https://polymathic-ai.org/blog/xval/
xVal basically says "tokenizing numbers is hard: what if instead of outputting tokens that combine to represent numbers, we just output the numbers themselves, right there in the output embedding?"
It works! Imagine you're discussing math with someone. Instead of saying "x is twenty five, which is large" in words, you'd say "x is", then switch to making a whistling noise in which the pitch of your whistle, in its position within your output frequency range, communicated the concept of 25.00 +/- epsilon. Then you'd resume speech and say "which is large".
I think the sentiment is that today's models are big and well-trained enough that receiving and delivering quantities as tokens representing numbers doesn't hurt capabilities much, but I'm still fascinated by xVal's much more elegant approach.
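A toy version of the xVal idea, with made-up tokens and tiny embedding vectors: every number in the text is replaced by a single [NUM] token, and that token's embedding is the shared [NUM] vector scaled by the numeric value, so the magnitude lives in the embedding itself rather than in digit tokens (the real paper also normalizes values and uses a dedicated number head on the output side):

```python
def embed_with_xval(tokens, values, vocab_emb, num_token="[NUM]"):
    """xVal-style encoding sketch: scale the shared [NUM] embedding by
    each number's value instead of tokenizing its digits."""
    out = []
    it = iter(values)  # numeric values, in order of [NUM] occurrences
    for tok in tokens:
        vec = list(vocab_emb[tok])
        if tok == num_token:
            x = next(it)
            vec = [v * x for v in vec]  # carry the magnitude in the vector
        out.append(vec)
    return out

# Tiny made-up vocabulary with 2-d embeddings.
vocab_emb = {"x": [0.1, 0.2], "is": [0.3, 0.4], "[NUM]": [1.0, -1.0],
             "which": [0.5, 0.5], "large": [0.7, 0.1]}
# "x is 25 which is large" -> the 25 becomes a scaled [NUM] embedding.
emb = embed_with_xval(["x", "is", "[NUM]", "which", "is", "large"],
                      [25.0], vocab_emb)
print(emb[2])
```

Nearby numbers get nearby embeddings for free, which is exactly what digit tokenization fails to guarantee.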
Completely artistic creation, creating something that does not exist and that cannot produce things out of itself, means that the locking can be more diffuse, not as settled.
>I love that we're still learning the emergent properties of LLMs!
There are tons of low-hanging fruits there.
In Nemotron, the high-perplexity solutions are selected for RL; in VLM training, a few people are looking at the entropy distributions of the training set; etc.
I think you are implying a reverse causation. They used a metaphor from us.
"Simple Self-Distillation". We had an acronym for Solid-State Drive. Don't know about that technique but the naming sure sound.. Simple?