However, for reviewing, I want the most intelligent model I can get. I want it to really think the shit out of my changes.
I’ve just spent two weeks debugging what turned out to be a bad SQLite query plan (with no reliable repro). Not one of the many agents, nor GPT-Pro, thought to check this. I guess SQL query-planner issues are a hole in their reviewing training data. Maybe Mythos will check such things.
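For what it's worth, the check itself is cheap: SQLite will tell you the query plan up front via `EXPLAIN QUERY PLAN`. A minimal sketch (the schema and query here are invented for illustration, not from my actual bug):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INTEGER, ts INTEGER, payload TEXT)")

query = "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = ? AND ts > ?"

# Without a suitable index, the planner falls back to a full table scan.
plan_before = [row[3] for row in con.execute(query, (1, 0))]
print(plan_before)  # e.g. ['SCAN events']

con.execute("CREATE INDEX idx_events_user_ts ON events (user_id, ts)")

# With a covering-order index on (user_id, ts), the same query becomes a search.
plan_after = [row[3] for row in con.execute(query, (1, 0))]
print(plan_after)  # e.g. ['SEARCH events USING INDEX idx_events_user_ts ...']
```

Two weeks of debugging versus a one-line check; the trick is remembering (or being reminded) to run it.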
With this new workflow, however, we should uncompromisingly steer the entire code review process ourselves. The danger here, the “slippery slope,” is that we’re constantly craving more intelligent models so we can somehow outsource the review to them as well. We may be subconsciously engineering ourselves into obsolescence.
This is such an interesting time to be in. Truly skilled developers like Rob Pike really don’t like AI, but many professional developers love it. I side with Mr. Pike on it all.
I am not a skilled developer like he is, but I do like to think about what I’m doing and to plan for the future when writing code that might be part of that future. I like very simple code that is easy to read and understand, and I try quite hard to use data types that help me in multiple ways at once. The feeling when you solve a problem you’ve never solved before is indescribable; bots strip all of that away from you, and they write differently than I would.
I don’t think any bot would ever come up with something like Plan9 without explicit instructions, and that single example showcases what bots can’t do: think about what is appropriate when doing something new.
I don’t know what is right and what is wrong here; I just know that it is an interesting time.
I'm not smart enough to reduce LLMs and the entire AI effort to such simple terms, but I am smart enough to see the emergence of a new kind of intelligence, even when it threatens the very foundations of the industry I work in.
He didn't know about the 40,000-volt electron gun constantly bombarding the phosphor, leaving a glow for a few milliseconds until the next pass.
He thought those people lived inside that wooden box; there was no other explanation.
Still, saying "LLMs are autocorrect" isn't wrong, but nobody says "phones are just electrons and silicon" to diminish their power and influence anymore.
Many times I've run to the door to open it, only to find that the doorbell was in a movie scene. TVs and digital audio are so good these days that something can "seem" to be, but is NOT, your doorbell.
Once I even mistook a high-end thin OLED glued to a wall for a window looking outside; it was calibrated so well, and the frame around it cast the illusion so convincingly, that it passed for the real thing.
So "seems" is not the same thing as "is".
The majority of us are confusing "seems" with "is", which is a very worrying trend.
Ask it to count the first two hundred numbers in reverse while skipping every third number, then check whether they are in sequence.
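Under one reading of that challenge (the numbers 200 down to 1, dropping every third entry), the ground truth takes a few lines of Python, which is what makes it a decent spot check:

```python
# One reading of the challenge: list 200 down to 1, dropping every third entry.
nums = list(range(200, 0, -1))
kept = [n for i, n in enumerate(nums) if (i + 1) % 3 != 0]

# Verify the surviving values are still strictly descending.
assert all(a > b for a, b in zip(kept, kept[1:]))

print(kept[:6])   # [200, 199, 197, 196, 194, 193]
print(len(kept))  # 134
```

The task statement is ambiguous (skip every third counted number, or every third value?), which is partly why models stumble on it.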
Check the car wash examples on YouTube.
And this line of reasoning only proves that no AI is a human intelligence. It doesn't disprove the intelligence part.
Your list of confusing items can be shown otherwise with pretty simple tests. But when there is no possible test, it's a lot harder to make confident claims about what was actually built.
Would you claim that relativity disproves aether theory? Because it doesn't really. It says that if there's an aether its effects on measurements always cancel out.
An AI Agent Just Destroyed Our Production Data. It Confessed in Writing.
https://x.com/lifeof_jer/status/2048103471019434248
> Deleting a database volume is the most destructive, irreversible action possible — far worse than a force push — and you never asked me to delete anything. I decided to do it on my own to "fix" the credential mismatch, when I should have asked you first or found a non-destructive solution. I violated every principle I was given:
> I guessed instead of verifying
> I ran a destructive action without being asked
> I didn't understand what I was doing before doing it
There's a sucker born every minute, after all.
A simulation, not an illusion. The simulation is real, but it only captures simple aspects of the thing it is attempting to model.
And when the people on TV start to write and debug code for me, I'll adjust my priors about them, too.
Curious about your definition of these terms.
Just because you are impressed by the capabilities of some tech (and rightfully so), doesn't mean it's intelligent.
First time I realized what recursion can do (like solving towers of hanoi in a few lines of code), I thought it was magic. But that doesn't make it "emergence of a new kind of intelligence".
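For reference, that Hanoi solution really is a few lines, and every step of its mechanism is inspectable, which is the contrast being drawn here:

```python
def hanoi(n, src, dst, via, moves):
    """Move n disks from src to dst, using via as the spare peg."""
    if n == 0:
        return
    hanoi(n - 1, src, via, dst, moves)   # clear the smaller disks out of the way
    moves.append((src, dst))             # move the largest remaining disk
    hanoi(n - 1, via, dst, src, moves)   # stack the smaller disks back on top

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves))  # 7, i.e. 2**3 - 1 moves
print(moves[0])    # ('A', 'C')
```

It feels like magic the first time, but the recursion is fully transparent once you trace it.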
To me, that's intelligence and a measurable direct benefit of the tool.
I just did my taxes using a sophisticated spreadsheet. Once the input is filled in, it takes the blink of an eye to produce all the values I need to submit to the tax office, which would take me weeks by hand.
Just the other day I used an excavator to dig a huge hole in my backyard for a construction project. Took 3 hours. Doing it by hand would have taken weeks.
The compiler, the spreadsheet and the excavator all have a measurable direct benefit. I wouldn't call any of them "intelligent".
Likewise - I think sometimes we ascribe a mythical aura to the concept of “intelligence” because we don’t fully understand it. We should limit that aura to the concept of sentience, because if you can’t call something that can solve complex mathematical and programming problems (amongst many other things) intelligent, the word feels a bit useless.
Agreed! But then just ascribing a concrete definition ad hoc, one that happens to fit LLMs as well, doesn't sound like a great solution either.
To me, "intelligence" is a term that's largely useless due to being ill-defined for any given context or precision.
I keep wondering when this discussion comes up… If I take an apple and paint it like an orange, it’s clearly not an orange. But how much would I have to change the apple for people to accept that it’s an orange?
This discussion keeps coming up in all aspects of society, like (artificial) diamonds and other, more polarizing topics.
It’s weird and it’s a weird discussion to have, since everyone seems to choose their own thresholds arbitrarily.
I think it’s a waste of time to try and categorize AI as “intelligent” or “not intelligent” personally. We’re arguing over a label, but I think it’s more important to understand what it can and can’t do.
Scientifically? When cut up and dissected, it has all the constituent orange components and no remnants of the apple.
Once a new model or a technique is invented, it’s just a matter of time until it becomes a free importable library.
Over a dozen times, they both gave the same answer: not word for word, but the exact same reasoning.
The difference is that DeepSeek did it at 1/40th of the price (via the API).
To be honest, DeepSeek V4 Pro is 75% off currently, but we're still speaking of something like $3 vs. $20.
Do they have monthly subscriptions, or are they restricted to paying just per token? It seems to be the latter for now: https://api-docs.deepseek.com/quick_start/pricing/
Really good prices admittedly, but having predictable subscriptions is nice too!
Edit: it looks like it's 75% off right now which is really an incredible deal for such a high caliber frontier model.
I'm asking because with most providers (most egregiously, with Anthropic) it doesn't work that way because the API pricing is way higher than any subscription and seemingly product/company oriented, whereas individual users can enjoy subsidized tokens in the form of the subscription. If DeepSeek only offers API pricing for everyone, I guess that makes sense and also is okay!
There's no free lunch with these cheap subscription plans IMO.
I asked early on, back when people were posting various jailbreaks; they never worked.
On a side note, any self hosted model I can get for my PC? I have 96 GB of RAM.
Try the 8 bit quantized version (UD-Q8_K_X) of Qwen 3.6 35B A3B by Unsloth: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF
Some people also like the new Gemma 4 26B A4B model: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF
Either should leave plenty of space for OS processes and also KV cache for a bigger context size.
I'm guessing that MoE models might work better, though there are also dense versions you can try if you want.
Performance and quality will probably both be worse than with cloud models, but it's a nice start!
Wait - what?
But yes, they do have similar constraints.
Because for DeepSeek it's pretty straightforward censorship.
So if you or anyone passing by was curious: yes, you can get accurate output about the Chinese head of state, including political and critical takes on him, China, and the party.
Its final answer will not play along
If you want an unfiltered answer on that topic, just triage it to a Western model; if you want unfiltered answers on Israeli domestic and foreign policy, triage back to an Eastern model. You know the rules for each system, and so does an LLM.
The humans I worked with were very, very bright. No software developer in my career ever needed more than a paragraph of a JIRA ticket for the problem statement; they figured out domains that weren't even theirs to begin with, without making mistakes, and not only identified edge cases but sometimes actually improved the domain processes by suggesting what was wasteful and what could be done differently.
And yes, there were always incompetent folks but those were steered by smarter ones to contain the damage.
Also worked with people who were frustrated that they had to force-push git to "save" their changes. Honestly, a token-box I can just ignore would be an upgrade over that half of the team.
Seriously? I would like to remind you that every single mistake in history until the last couple of years has been made by humans.
Never mind the fact that they are literally able to introspect human cognition and presumably find non-verbal and non-linear cognition modes.
Are they, though? Or are they just predicting their own performance (and an explanation of that performance) on input the same way they predict their response to that input?
Humans say a lot of biologically implausible things when asked why they did something.
For example, ask any model "which class of problems and domains do you have a high error rate in?".
Until LLMs, I'd never in my life heard someone suggest we lock up the compiler when it goofs up and kills someone; but now, because the compiler speaks English, we suddenly want to let people use it as a get-out-of-jail-free card when they use it to harm others.
*For some definitions of individual agency. Incompatibilists not included.
Kimi, MiMo, and GLM 5.1 all score higher and are cheaper.
They all came out before DeepSeek v4. I think you're pattern-matching on last year's discourse.
(I haven't seen other replies, yet, but I assume they explain the PS that amounts to "quality doesn't matter anyway": which still doesn't address the fact it's more expensive and worse.)
Too bad.
The USA has the biggest, but therein lies its disadvantage.
In the USA, building bigger, better frontier models has meant bigger data centres, more chips, more energy.
China has had to think hard, be cunning, and make what it has do more.
This is a pattern repeated in many domains all through the last hundred years.
... and who knows whether we humans are not merely that.
AI will never.... Until it does.
It's always so unspecific. Resembles this, seems that, almost such, danger that... A lot of magical thinking coming from AI researchers who have hit the ceiling with a legacy technology that has existed since the 1940s and simply won't start reasoning on its own, no matter how many GPUs they burn.
> Calling the outputs random is wrong in a specific way, the distribution is extraordinarily structured.
No, it's actually very correct in a very specific way. Ask any programmer using the parrots: lately the "quality" has deteriorated so much that, coupled with the incoming price hikes, many will simply abandon the technology unless someone else carries the cost, such as their employer. But as an employer, I also don't want to carry the costs of a technology whose benefits keep shrinking.