Intermittent variable rewards, whether produced by design or merely as a byproduct, will induce compulsive behavior, no matter the optimization target. This applies to Claude.
Does this mean I should not garden because it's a variable reward? Of course not.
Sometimes I will go out fishing and I won't catch a damn thing. Should I stop fishing?
Obviously no.
So what's the difference? What is the precise mechanism you're pointing at here? Because by that logic, "sometimes life is disappointing" is a reason to do nothing. And yet.
Anthropic's optimization target is getting you to spend tokens, not producing the right answer. It's producing an answer plausible enough, but incomplete enough, that you'll keep spending as many tokens as possible for as long as possible. That's about as close to a slot machine as I can imagine. Slot machine payouts are designed to keep you interested as long as possible, on the premise that you _might_ get what you want, the jackpot, if you play long enough.
Anthropic's game isn't limited to a single spin either. The small wins (small prompts with well-defined answers) are what keep you playing through the big losses (trying to one-shot a whole production-grade program).
The majority of us are using their subscription plans with flat-rate fees.
Their incentive is the precise opposite of what you say. The less we use the product, the more they benefit. It's like a gym membership.
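To put rough numbers on the gym-membership point (the fee and per-token cost below are made-up illustrations, not Anthropic's actual pricing or costs):

```python
# Back-of-the-envelope margin on a flat-rate subscription.
# All figures are hypothetical, chosen only to show the shape of the
# incentive: on a flat fee, every token served is a cost, not revenue.

MONTHLY_FEE = 20.00               # assumed flat subscription fee, USD
COST_PER_MILLION_TOKENS = 3.00    # assumed inference cost, USD

def monthly_margin(tokens_served: int) -> float:
    """Provider's profit on one subscriber for the month."""
    inference_cost = tokens_served / 1_000_000 * COST_PER_MILLION_TOKENS
    return MONTHLY_FEE - inference_cost

for tokens in (100_000, 1_000_000, 10_000_000):
    print(f"{tokens:>12,} tokens -> margin ${monthly_margin(tokens):7.2f}")
# 100k tokens -> $19.70; 1M -> $17.00; 10M -> -$10.00 (a loss).
```

The heavier the subscriber, the thinner the margin, which is exactly backwards from the slot-machine claim.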
I think all of the gambling addiction analogies in this thread are just so strained that I can't take them seriously. The basic facts aren't even consistent with the real situation.
They want me to not spend tokens. That way my subscription makes money for them rather than costing them electricity and degrading their GPUs.
If you're on anything but their highest tier, it's not altogether unreasonable for them to optimize for the greatest number of plan upgrades (people who decide they need more tokens) while minimizing cancellations (people frustrated by the number of tokens they need). On the highest tier, this sort of falls apart, but it's a problem easily solved by just adding more tiers :)
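As a toy model (every price and probability here is invented for illustration), the tradeoff looks something like this:

```python
# Toy model of the tier-upgrade incentive sketched above.
# All prices and probabilities are hypothetical illustrations.

BASE_FEE, PLUS_FEE = 20.00, 100.00   # assumed monthly tier prices, USD

def expected_revenue(p_upgrade: float, p_cancel: float) -> float:
    """Expected monthly revenue per base-tier user, where token
    frustration drives some users to upgrade and others to cancel."""
    p_stay = 1.0 - p_upgrade - p_cancel
    return p_upgrade * PLUS_FEE + p_stay * BASE_FEE  # cancellers pay 0

print(expected_revenue(0.00, 0.00))  # 20.00: no frustration, no upgrades
print(expected_revenue(0.10, 0.02))  # 27.60: mild frustration pays
print(expected_revenue(0.15, 0.40))  # 24.00: too much, cancellations bite
```

The maximum sits at "frustrating enough to upgrade, not frustrating enough to cancel."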
Of course, I don't think this is actually what's going on, but it's not irrational.
I mean, this only works if Anthropic is the only game in town. In your analogy, if anyone else builds a casino with a higher payout, then they lose the game. Given the rate of LLM improvement over the years, this doesn't seem like a stable business model.
Dealing with organic and natural systems will, most of the time, involve variable rewards. The real issue comes from systems and services designed to be accessible only through intermittent variable rewards.
Oh, and don't confuse Claude's artifacts working most of the time with Anthropic actually optimizing for that. They're optimizing to maximize token usage, i.e., LLMs have been fine-tuned to default to verbose responses. Verbose responses impress less experienced developers, make certain types of errors easier to spot (e.g., improper typing), and make you use more tokens.
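For a rough sense of what verbosity costs, here's a sketch that counts tokens in a terse vs. a padded answer (it uses OpenAI's tiktoken tokenizer as a stand-in, since the exact tokenizer doesn't change the point; both example strings are invented):

```python
# Verbosity directly multiplies billable tokens.
# tiktoken is OpenAI's tokenizer, used here only as a stand-in;
# counts differ by model, but the ratio is what matters.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

terse = "Use functools.lru_cache to memoize the function."
verbose = (
    "Great question! There are several approaches you could take here. "
    "One particularly elegant and Pythonic option is the "
    "functools.lru_cache decorator, which transparently memoizes the "
    "function for you, avoiding redundant computation. Let me know if "
    "you'd like me to elaborate or explore alternative approaches!"
)

for label, text in (("terse", terse), ("verbose", verbose)):
    print(f"{label:>8}: {len(enc.encode(text))} tokens")
# Same actionable content; several times the tokens.
```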
This is an incorrect understanding of intermittent variable reward research.
Claims that it "will induce compulsive behavior" are not consistent with the research. Most rewards in life are variable and intermittent, and people aren't out there developing compulsive behavior toward everything that fits that description.
There are many counter-examples, such as job searching: applying for jobs and occasionally landing a good offer is clearly an intermittent variable reward, but it doesn't turn people into compulsive job-applying robots.
The strongest drug addictions also have little to do with rewards being intermittent or variable. Someone can take a precisely measured abuse-threshold dose of a drug on a strict schedule and still develop compulsions to take more, at a level that eclipses anything they'd encounter naturally.
Intermittent variable reward schedules can be a factor in increasing anticipatory behavior and rewards, but claiming that they "will induce compulsive behavior" is a severe misunderstanding of the science.
The variability is there in, e.g., soccer kicks or basketball shots too, but there is clearly a skill element and a potential for progress. The same goes for many other activities. Coding with LLMs is not so different: there are clearly ways you can do it better, and it's not pure randomness.
So you're saying businesses shouldn't hire people either?
There is absolutely no incentive for any of these companies to do that. The incentive is to make the model just bad enough that you keep coming back, but not so bad that you go to a competitor.
We've already seen this play out. We know Google made their search results worse to drive up ad revenue. The exact same incentives are at play here, only worse.
IF I USE FEWER TOKENS, ANTHROPIC GETS MORE MONEY! You are blindly pattern matching to "corporation bad!" without actually considering the underlying structure of the situation. I believe there's a phrase for this to do with probabilistic avians?
Are you totally sure they are not measuring/optimizing engagement metrics? I'd bet OpenAI, at least, is doing that with every product they offer.
The analogy was too strained to make sense.
Despite being framed as a helpful plea to gambling addicts, I think it’s clear this post was actually targeted at an anti-LLM audience. It’s supposed to make the reader feel good for choosing not to use them by portraying LLM users as poor gambling addicts.
If Dave the developer is paying, Dave is incentivized to optimize token use along with Anthropic (for the different reasons mentioned).
If Dave's employer, Earl, is paying, and Earl is mostly interested in getting Dave to work more, then what incentive does Dave have to minimize tokens? He's mostly incentivized by Earl to produce more code, and now also, by Anthropic's accidentally variable-reward coding system, to code more...?