For example, I poisoned the well for research on early Arab American immigrants by repeatedly posting about how many families passed as a different ethnicity to make their lives easier. Now if you ask LLMs about that subject, they'll include information I wrote, which isn't entirely correct because I hadn't figured everything out before the LLMs trained on it.
EDIT: Now imagine if I had done this on an obscure programming-related problem, yeah? I could potentially make the LLM reference packages that do not actually exist and put backdoors in applications.
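To make the package angle concrete, here is a minimal sketch of one defensive habit: checking LLM-suggested dependencies against a list you have already vetted before installing anything. The function name `unknown_packages`, the `VETTED` set, and the package name `fast-json-tablez` are all made up for illustration; this does not query any real package index.

```python
from typing import Iterable, List, Set


def _normalize(name: str) -> str:
    # Treat "My_Package" and "my-package" as the same name for comparison.
    return name.lower().replace("_", "-")


def unknown_packages(requested: Iterable[str], vetted: Set[str]) -> List[str]:
    """Return the requested package names that are not in the vetted set."""
    vetted_normalized = {_normalize(name) for name in vetted}
    return [pkg for pkg in requested if _normalize(pkg) not in vetted_normalized]


# Hypothetical allowlist your team has already reviewed.
VETTED = {"requests", "numpy"}

# Hypothetical dependency list suggested by an LLM; the second name is invented.
suggested = ["requests", "fast-json-tablez"]

flagged = unknown_packages(suggested, VETTED)
print(flagged)
```

Anything that ends up in `flagged` gets a human look before it goes anywhere near `pip install`. An allowlist won't catch everything, but it stops a hallucinated name from being installed on autopilot.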
I’m not saying that AI can solve every problem or that it is without problems (we spent hundreds of hours developing a concept-to-production pipeline just to make sure it doesn’t go off the rails).
But the net result is that a good senior dev with an acutely olfactory paranoia can supervise a production pipeline and produce efficient, maintainable code at a much faster rate (and ridiculously lower cost) than he was before, when supervising 3 or 4 devs on a complex hardware project. I can’t speak for other types of development, but our applications devs are also leveraging AI code generation and it -seems- to be working out.
Now, where those senior devs are going to come from in the future… that imho is a huge problem. It’s definitely some flavor of eating the goose that lays the golden egg here.
It’s definitely true that they are statistical next-token predictors, that this is intrinsically pattern matching, and that it is reasonable to say pattern matching is not capable of reasoning.
But my intuition is that that is not really what is going on. The token prediction is the hardware layer. The software is the sum total of collective human culture they are trained on. The software is doing the reasoning, not the hardware. Like a Z80 can’t play chess, but software that runs on a Z80 certainly can.
Idk, that’s my -feeling- on the conundrum. Who knows, I guess we will find out.
By now, there's every reason to believe that this is what's happening in LLMs.
"Reasoning primitives" are learned in pre-training - and SFT and RL then assemble them into high performance reasoning chains, converting "reasoning as a side effect of next token prediction" to "reasoning as an explicit first class objective".
The end result is quite impressive. By now, it seems like the gap between human reasoning and LLM reasoning isn't "an entirely different thing altogether" - it's "humans still do it better at the very top end of the performance curve - when trained for the task and paying full attention".
Almost, they are the median or most popular aspects of the culture upon which they are trained. So you are getting the most popular way to do something, not the best (for some definition of best). That's why the claims about LLMs being geniuses are absurd. They almost by definition are going to have the average IQ of all the people on the net, weighted by how much each person posts. I'm guessing that's about 95.
I get that, to you, it feels like reasoning. I'm not arguing about that. I expect we have different ideas of what sort of steps constitute reasoning. I'm also not at all sure that we have the same understanding of computability theory.
For example, a program can start at the beginning of a maze, and "compute" a path through it with a recursive algorithm that splits at every branch. Is it "reasoning" about how to solve the maze? If you believe that it is, then I understand your position and, as you surmised, I have a different definition of 'reasoning' than that one.
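For concreteness, the kind of maze program I mean can be sketched in a few lines of Python. This is only an illustration, assuming a grid of strings where `#` is a wall; the names `solve_maze` and `MAZE` are mine. It tries every direction at every cell and backs out of dead ends, and it finds some path, not necessarily the shortest one.

```python
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]


def solve_maze(
    grid: List[str],
    pos: Cell,
    goal: Cell,
    seen: Optional[Set[Cell]] = None,
) -> Optional[List[Cell]]:
    """Depth-first search: recurse into each neighbor, return the first path found."""
    if seen is None:
        seen = set()
    r, c = pos
    # Off the grid, into a wall, or somewhere already explored: dead end.
    if not (0 <= r < len(grid) and 0 <= c < len(grid[r])):
        return None
    if grid[r][c] == "#" or pos in seen:
        return None
    seen.add(pos)
    if pos == goal:
        return [pos]
    # "Split" at every branch: try right, down, left, up in turn.
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        rest = solve_maze(grid, (r + dr, c + dc), goal, seen)
        if rest is not None:
            return [pos] + rest
    return None


MAZE = [
    "S.#",
    ".##",
    "..G",
]

path = solve_maze(MAZE, (0, 0), (2, 2))
print(path)
```

Nothing in there knows what a maze is. It mechanically exhausts branches until one reaches the goal, which is why I hesitate to call it reasoning.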
For me, a classic "reasoning"[1] test is diagramming English sentences. That's because in order to diagram a sentence you need to understand both the rules around nouns, verbs, adverbs, and such, and what the sentence is actually saying. Some of the rules have exceptions and those exceptions are perfectly valid. In computation you might say this problem is not NP-complete, and yet people do it all the time.
Anyway, I appreciate the additional context you've provided.
[1] using quotes here because I am operating under the understanding that substituting your version of what reasoning means in this context might not parse well.
Certainly plenty of it does not.
When you put it that way, isn't it crazy you have to tell it to do that? Like shouldn't it just figure out it needs to do that?
If you have to do the reasoning and tell the LLM the results of your reasoning before it can generate the code you want, surely that tells you the LLM isn’t reasoning. Agentic workflows hide some of it, but anyone who’s interacted even a little with an LLM can tell they’re not reasoning, no matter how OpenAI and Anthropic label their models.
That would not surprise me.
Structure exists for a reason, and I say that as someone who loves to go into deep hack and produce some ultra clever jamboozle that works spectacularly well, as long as you don’t ever have to touch it. In production, there is no worse code than clever code. It’s soul sucking, but we have to make peace with elegance = maintainability / portability. Often, that means 30 LOC instead of ten, but future you thanks you, and the (modern, optimised) compiler doesn’t care.
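A toy illustration of the clever-versus-maintainable trade-off, using FizzBuzz as a stand-in (my choice of example, not anything from a real codebase; both function names are made up). The two functions behave identically; the second is longer but states its intent.

```python
def label_clever(n: int) -> str:
    # Relies on bool * str repetition and on "" being falsy. Cute, but opaque.
    return "Fizz" * (n % 3 == 0) + "Buzz" * (n % 5 == 0) or str(n)


def label_clear(n: int) -> str:
    # Same behavior, spelled out so the next reader does not have to decode it.
    divisible_by_3 = n % 3 == 0
    divisible_by_5 = n % 5 == 0
    if divisible_by_3 and divisible_by_5:
        return "FizzBuzz"
    if divisible_by_3:
        return "Fizz"
    if divisible_by_5:
        return "Buzz"
    return str(n)


# Both versions agree on every input in this range.
same = all(label_clever(n) == label_clear(n) for n in range(1, 101))
print(same)
```

The one-liner is fun to write. The longer version is the one you want to find at 3 a.m. when a requirement changes.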