I must be missing something or supremely lucky because I feel like I’ve never hit these “stupid” moments.
If I do, it’s probably because I forgot to switch off of haiku for some tiny side thing I was doing before going back to planning.
But even when Opus is running healthy, it still doesn't address the underlying issue that these models can only do so much. I have had Opus build out a bunch of apps, but I still find my time absorbed as soon as anything genuinely exceeds "CRUD-level difficulty". Asking it to fix a subtle visual alignment issue, make a small change to a completely novel algorithm, or just fix a tiny bug without having to watch for "Oh, this means I should rewrite module <X>" simply isn't possible while still being able to stand over the work.
That's not to say I don't get a massive benefit from these tools. I just think it's possible to be asking too much of them, and that's maybe the real problem to solve.
2 weeks ago, I had only hit my limit a single time and that was when I had multiple agents doing codebase audits.
They didn’t do a great job of explaining it. I wonder how many people got used to the 2X limits and now think Anthropic has done something bad by going back to normal.
That's EXACTLY and ALL I've been doing!
Using Codex and Claude both side by side to view my Godot components framework open source project (link in profile)
Claude has been..ugh.. bad, to put it mildly, on the same content and the same prompts.
For JSON-to-text formatting it works well on a one-round basis. So realistically you should have an evaluation ready to go that you can run against these models. I currently judge them myself, but people often use a smart LLM as the judge.
Today, writing an eval harness with Claude is a five-minute job. Build one yourself so you can keep re-running it as the quants of Gemma get better.
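To make that concrete, here is a minimal sketch of such a harness. The `model_format` function is a hypothetical stand-in for whatever model call you're evaluating (a Claude API call, a local Gemma quant, etc.), and the key-phrase scoring is just one simple alternative to the LLM-as-judge approach mentioned above:

```python
# Hypothetical stand-in for the model under test -- swap in your
# actual client call (Claude, a local Gemma quant, etc.).
def model_format(record: dict) -> str:
    # Dummy implementation so the harness runs end to end.
    return f"{record['name']} is {record['age']} years old."

# Each case: a JSON-like input plus key phrases the output must contain.
CASES = [
    ({"name": "Ada", "age": 36}, ["Ada", "36"]),
    ({"name": "Alan", "age": 41}, ["Alan", "41"]),
]

def run_eval(format_fn) -> float:
    """Return the fraction of cases whose output contains every key phrase."""
    passed = 0
    for record, must_contain in CASES:
        output = format_fn(record)
        if all(phrase in output for phrase in must_contain):
            passed += 1
    return passed / len(CASES)

if __name__ == "__main__":
    print(f"score: {run_eval(model_format):.2f}")
```

Once this exists, comparing a new quant is just swapping the function passed to `run_eval` and diffing the scores; replacing the key-phrase check with an LLM judge only changes the body of `run_eval`.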
On top of that, their $20 plan has much higher usage limits than Anthropic's $20 plan, and they allow its use in e.g. opencode. So you can set up opencode to use both OpenAI's Codex plan plus one of the more intelligent Chinese models to maximize your usage: have it fully plan things out using GPT 5.4, write code using e.g. Qwen 3.6, then switch back to GPT 5.4 for review.
That is, I get more variance between Opus 4.6 and itself than I do between the SOTA models.
I don’t have the budget for statistical significance, but I’m convinced people claiming broad differences are just vibing, or there are times when agent features make a big difference.
Oh no, there's plenty of us willing to say we told you so.
What's more interesting to me is what it's going to look like if big companies start removing "AI usage" from their performance metrics and cease compelling us to use it. More than anything else, that's been the dumbest thing to happen with this whole craze.
Would really love some path forward where the AI parts only poke out as single fields in traditional user interfaces and we can forget this whole episode
My primary interest is using small edge models to perform specific engineering tasks. In this pursuit I do like to use gemini-cli or Antigravity with Claude a few times a week as coding assistants, but I am using relatively few tokens to do this.
I also waste a lot of time, but this is fun time: experimenting with open source coding agents with local models just to see what kinds of results I can get. This is mostly a waste of time, but I enjoy it.
My other favorite use pattern: once or twice a week I like to use the iOS Gemini app in voice mode, and once a month also use video input. I really like this, but it is not life changing.
Externalities matter: I never use frontier LLM-based AI without thinking of energy, data center, and environmental costs.
And video calling did take off: plenty of people use FaceTime, and almost everybody working in an office uses some form of video calls. Criticizing the early attempts at getting video calling working because they hadn't taken off yet misses the point (I remember them being advertised on "video phones" with 56k modems); of course someone was going to have the idea and implement it before it was quite practical.
To help with understanding that perspective, I cannot imagine a scenario where I would ask a device connected to the internet to turn off the lights. I literally never wanted this. A physical switch is 100% non-negotiable for me. I feel the same way about non-mechanical car doors.
Perhaps due to that outlook I was always puzzled by the entire idea of an "assistant". It's interesting for me to see that there are people out there who actually want that "assistant".
Ever end up cooking or something when the phone/doorbell rings and you want to pause the music? Have your hands full and wanted to open a door? Hear the weather and then the news as you brew coffee or put your shoes on (without interaction with a bright screen)?
You should save some money and keep some privacy doing it your way :)
Maybe you're a little strange but it cannot be that much of a stretch for you to consider using speech to ask for things.
Not wanting to hide things behind Internet connected computers is fine, being unable to imagine wanting to use your voice to ask for things is a little silly.
I regret paying Google for a one-year AI subscription last spring, because it has kept me from experimenting with many vendors, even though it was a deep discount over the regular $20/month cost and a fantastic deal financially.
I just put a reminder on my calendar to try OpenCode zen when my subscription ends.
Dealing with these right now with ChatGPT. Bricked a thread which I didn’t even know was possible.
I’m kind of confused by these takes from HN readers. I could see LinkedIn bros getting reality checked when they finally discover that LLMs aren’t magic, but I’m confused about how a developer could go all-in on AI and not immediately realize the limitations of the output.
I'm "all-in" on AI code generation. I very much realise their limitations, it's like any tool really. I do think they're magic, you just need to learn how to weld the power.