However, for reviewing, I want the most intelligent model I can get. I want it to really think the shit out of my changes.
I’ve just spent two weeks debugging what turned out to be a bad SQLite query plan (with no reliable repro). Not one of the many agents, nor GPT-Pro, thought to check this. I guess SQL query-planner issues are a hole in their reviewing training data. Maybe Mythos will check such things.
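For what it's worth, the check itself is cheap: SQLite will tell you the query plan up front via `EXPLAIN QUERY PLAN`. A minimal sketch (the schema and query here are invented for illustration, not from my actual bug):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INTEGER, ts INTEGER, payload TEXT)")

query = "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = ? AND ts > ?"

# Without a suitable index, the planner falls back to a full table scan.
plan_before = [row[3] for row in con.execute(query, (1, 0))]
print(plan_before)  # e.g. ['SCAN events']

con.execute("CREATE INDEX idx_events_user_ts ON events (user_id, ts)")

# With a covering-order index on (user_id, ts), the same query becomes a search.
plan_after = [row[3] for row in con.execute(query, (1, 0))]
print(plan_after)  # e.g. ['SEARCH events USING INDEX idx_events_user_ts ...']
```

Two weeks of debugging versus a one-line check; the trick is remembering (or being reminded) to run it.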
With this new workflow, however, we should uncompromisingly steer the entire code review process ourselves. The danger here, the “slippery slope,” is that we’re constantly craving more intelligent models so we can somehow outsource the review to them as well. We may be subconsciously engineering ourselves into obsolescence.
This is such an interesting time to be in. Truly skilled developers like Rob Pike really don’t like AI, but many professional developers love it. I side with Mr. Pike on it all.
I am not a skilled developer like he is, but I do like to think about what I’m doing and to plan for the future when writing code that might be part of that future. I like very simple code that is easy to read and understand, and I try quite hard to use data types that help me in multiple ways at once. The feeling when you solve a problem you’ve never solved before is indescribable; bots strip all of that away from you, and they write differently than I would.
I don’t think any bot would ever come up with something like Plan9 without explicit instructions, and that single example showcases what bots can’t do: think about what is appropriate when doing something new.
I don’t know what is right and what is wrong here; I just know that it is an interesting time.
I'm not smart enough to reduce LLMs and the entire AI effort to such simple terms, but I am smart enough to see the emergence of a new kind of intelligence, even when it threatens the very foundations of the industry I work in.
He didn't know about the 40,000-volt electron gun constantly bombarding the phosphor, leaving a glow for a few milliseconds until the next pass.
He thought those people lived inside that wooden box; there was no other explanation.
Still, saying "LLMs are autocorrect" isn't wrong, but nobody says "phones are just electrons and silicon" to diminish their power and influence anymore.
Many times I've run to the door to open it, only to find that the doorbell was in a movie scene. TVs and digital audio are so good these days that something can "seem" to be, but is NOT, your doorbell.
Once I even mistook a high-end thin OLED glued to a wall for a window looking outside; it was calibrated so well, and the frame around it cast the illusion so convincingly, that it passed for the real thing.
So "seems" is not the same thing as "is".
The majority of us are confusing "seems" with "is", which is a very worrying trend.
Ask it to count the first two hundred numbers in reverse while skipping every third number, then check whether they are in sequence.
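Under one reading of that challenge (the numbers 200 down to 1, dropping every third entry), the ground truth takes a few lines of Python, which is what makes it a decent spot check:

```python
# One reading of the challenge: list 200 down to 1, dropping every third entry.
nums = list(range(200, 0, -1))
kept = [n for i, n in enumerate(nums) if (i + 1) % 3 != 0]

# Verify the surviving values are still strictly descending.
assert all(a > b for a, b in zip(kept, kept[1:]))

print(kept[:6])   # [200, 199, 197, 196, 194, 193]
print(len(kept))  # 134
```

The task statement is ambiguous (skip every third counted number, or every third value?), which is partly why models stumble on it.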
Check the car wash examples on YouTube.
And this line of reasoning only proves that no AI is a human intelligence. It doesn't disprove the intelligence part.
Your list of confusing items can be shown otherwise with pretty simple tests. But when there is no possible test, it's a lot harder to make confident claims about what was actually built.
Would you claim that relativity disproves aether theory? Because it doesn't really. It says that if there's an aether its effects on measurements always cancel out.
An AI Agent Just Destroyed Our Production Data. It Confessed in Writing.
https://x.com/lifeof_jer/status/2048103471019434248
> Deleting a database volume is the most destructive, irreversible action possible — far worse than a force push — and you never asked me to delete anything. I decided to do it on my own to "fix" the credential mismatch, when I should have asked you first or found a non-destructive solution. I violated every principle I was given:
> I guessed instead of verifying
> I ran a destructive action without being asked
> I didn't understand what I was doing before doing it
There's a sucker born every minute, after all.
A simulation, not an illusion. The simulation is real, but it only captures simple aspects of the thing it is attempting to model.
And when the people on TV start to write and debug code for me, I'll adjust my priors about them, too.
Curious about your definition of these terms.
Just because you are impressed by the capabilities of some tech (and rightfully so), doesn't mean it's intelligent.
First time I realized what recursion can do (like solving towers of hanoi in a few lines of code), I thought it was magic. But that doesn't make it "emergence of a new kind of intelligence".
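For reference, that Hanoi solution really is a few lines, and every step of its mechanism is inspectable, which is the contrast being drawn here:

```python
def hanoi(n, src, dst, via, moves):
    """Move n disks from src to dst, using via as the spare peg."""
    if n == 0:
        return
    hanoi(n - 1, src, via, dst, moves)   # clear the smaller disks out of the way
    moves.append((src, dst))             # move the largest remaining disk
    hanoi(n - 1, via, dst, src, moves)   # stack the smaller disks back on top

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves))  # 7, i.e. 2**3 - 1 moves
print(moves[0])    # ('A', 'C')
```

It feels like magic the first time, but the recursion is fully transparent once you trace it.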
To me, that's intelligence and a measurable direct benefit of the tool.
I just did my taxes using a sophisticated spreadsheet. Once the input is filled in, it takes the blink of an eye to produce all the values I need to submit to the tax office, which would take me weeks by hand.
Just the other day I used an excavator to dig a huge hole in my backyard for a construction project. Took 3 hours. Doing it by hand would have taken weeks.
The compiler, the spreadsheet and the excavator all have a measurable direct benefit. I wouldn't call any of them "intelligent".
Likewise - I think sometimes we ascribe a mythical aura to the concept of “intelligence” because we don’t fully understand it. We should limit that aura to the concept of sentience, because if you can’t call something that can solve complex mathematical and programming problems (amongst many other things) intelligent, the word feels a bit useless.
Agreed! But then just ascribing a concrete definition ad hoc, one that happens to fit LLMs as well, doesn't sound like a great solution either.
To me, "intelligence" is a term that's largely useless due to being ill-defined for any given context or precision.
I keep wondering when this discussion comes up… If I take an apple and paint it like an orange, it’s clearly not an orange. But how much would I have to change the apple for people to accept that it’s an orange?
This discussion keeps coming up in all aspects of society, like (artificial) diamonds and other, more polarizing topics.
It’s weird and it’s a weird discussion to have, since everyone seems to choose their own thresholds arbitrarily.
I think it’s a waste of time to try and categorize AI as “intelligent” or “not intelligent” personally. We’re arguing over a label, but I think it’s more important to understand what it can and can’t do.
Scientifically? When cut up and dissected, it has all the constituent orange components and no remnants of the apple.
Once a new model or a technique is invented, it’s just a matter of time until it becomes a free importable library.
Over a dozen times, they both gave the same answer: not word for word, but the exact same reasoning.
The difference is that DeepSeek did it at 1/40th of the price (via the API).
To be honest, DeepSeek V4 Pro is 75% off currently, but we're still speaking of something like $3 vs. $20.
Do they have monthly subscriptions, or are they restricted to paying just per token? It seems to be the latter for now: https://api-docs.deepseek.com/quick_start/pricing/
Really good prices admittedly, but having predictable subscriptions is nice too!
Edit: it looks like it's 75% off right now which is really an incredible deal for such a high caliber frontier model.
I'm asking because with most providers (most egregiously, with Anthropic) it doesn't work that way because the API pricing is way higher than any subscription and seemingly product/company oriented, whereas individual users can enjoy subsidized tokens in the form of the subscription. If DeepSeek only offers API pricing for everyone, I guess that makes sense and also is okay!
There's no free lunch with these cheap subscription plans IMO.
I asked early on, back when people were posting various jailbreaks; they never worked.
On a side note, any self hosted model I can get for my PC? I have 96 GB of RAM.
Try the 8 bit quantized version (UD-Q8_K_X) of Qwen 3.6 35B A3B by Unsloth: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF
Some people also like the new Gemma 4 26B A4B model: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF
Either should leave plenty of space for OS processes and also KV cache for a bigger context size.
I'm guessing that MoE models might work better, though there are also dense versions you can try if you want.
Performance and quality will probably both be worse than with cloud models, but it's a nice start!
Wait - what?
But yes, they do have similar constraints.
Because for DeepSeek it's pretty straightforward censorship.
So if you or anyone passing by was curious: yes, you can get accurate output about the Chinese head of state, including political and critical takes on him, China, and the party.
Its final answer will not play along
If you want an unfiltered answer on that topic, just triage it to a Western model; if you want unfiltered answers on Israeli domestic and foreign policy, triage back to an Eastern model. You know the rules for each system, and so does an LLM.
The humans I worked with were very, very bright. No software developer in my career ever needed more than a paragraph of a JIRA ticket for the problem statement; they figured out domains that weren't even theirs to begin with, without making mistakes, and not only identified edge cases but sometimes actually improved the domain processes by suggesting what was wasteful and what could be done differently.
And yes, there were always incompetent folks but those were steered by smarter ones to contain the damage.
Also worked with people who were frustrated that they had to force-push git to "save" their changes. Honestly, a token-box I can just ignore would be an upgrade over that half of the team.
Seriously? I would like to remind you that every single mistake in history until the last couple of years has been made by humans.
Never mind the fact that they are literally able to introspect human cognition and presumably find non-verbal and non-linear cognition modes.
Are they, though? Or are they just predicting their own performance (and an explanation of that performance) on input the same way they predict their response to that input?
Humans say a lot of biologically implausible things when asked why they did something.
For example, ask any model "which class of problems and domains do you have a high error rate in?".
Until LLMs, I'd never in my life heard someone suggest we lock up the compiler when it goofs up and kills someone; but now, because the compiler speaks English, we suddenly want to let people use it as a get-out-of-jail-free card when they use it to harm others.
*For some definitions of individual agency. Incompatibilists not included.
Kimi, MiMo, and GLM 5.1 all score higher and are cheaper.
They all came out before DeepSeek v4. I think you're pattern-matching on last year's discourse.
(I haven't seen other replies, yet, but I assume they explain the PS that amounts to "quality doesn't matter anyway": which still doesn't address the fact it's more expensive and worse.)
Too bad.
The USA has the biggest, but therein lies its disadvantage.
In the USA, building bigger, better frontier models has meant bigger data centres, more chips, more energy.
China has had to think hard, be cunning, and make what it has do more.
This is a pattern repeated in many domains all through the last hundred years.
... and who knows whether we humans are not merely that.
AI will never.... Until it does.
It's always so unspecific. Resembles this, seems that, almost such, danger that... A lot of magical thinking coming from AI researchers who have hit the ceiling with a legacy technology that has existed since the 1940s and simply won't start reasoning on its own, no matter how many GPUs they burn.
> Calling the outputs random is wrong in a specific way, the distribution is extraordinarily structured.
No, it's actually very correct in a very specific way. Ask any programmer using the parrots: lately the "quality" has deteriorated so much that, coupled with the incoming price hikes, many will simply abandon the technology unless someone else carries the cost, such as their employer. But as an employer, I also don't want to carry the costs of a technology whose benefits keep shrinking.