Don't keep up. Much like with news, you'll know when you need to know, because someone else will tell you first.
reply
This is only good advice if you don't need to understand what's happening at the edge of the frontier. If you do, you lose the compounding benefit of staying engaged with the major developments.
reply
Not all developments are equal. Many are experimental branches, testing things out, that usually get merged back into the core, so to speak. For example, I knew someone who was all in on building their own harness, implementing the Ralph loop, and various other things, spending a lot of time on it. And now, guess what? All of that is in Claude Code or another harness, and I didn't have to spend any time on it, because ultimately they're implementation details.
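
For reference, the Ralph loop is basically just re-running an agent against the same prompt until the work converges. A minimal sketch in Python, with the `agent` CLI name and its exit-code convention made up purely for illustration:

    import subprocess

    # Minimal "Ralph loop": keep re-invoking a coding agent with the same
    # prompt until it signals there's nothing left to do. The `agent` CLI
    # and its exit-code convention are placeholders, not any real tool.
    with open("PROMPT.md") as f:
        prompt = f.read()

    while True:
        result = subprocess.run(["agent", "--prompt", prompt])
        if result.returncode == 0:  # assume 0 means "task complete"
            break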

It's like ricing your Linux distro: sure, it's fun to spend that time, but don't make the mistake of thinking it's productive. It's just another form of procrastination (or, to put it more charitably, a hobby).

reply
The players barely ever change. People don't have problems following sports; you shouldn't struggle so much with this once you accept that the top spot changes.
reply
I didn't express this well, but my interest isn't "who is in the top spot"; it's more the _why_ and _how_ of the results the various labs get. This is magnified by the fact that I'm interested not only in hosted inference providers but in local models as well. What's your take on the best model to run for coding on 24GB of VRAM locally after the last few weeks of releases? Which harness do you prefer? What quants do you think are best? To use your sports metaphor, it's not just following the national leagues but also following college and even high school leagues. And the real interest isn't even who's doing well but WHY, at each level.
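
To make the 24GB constraint concrete, here's the back-of-envelope math I use, treating quantization as bits per weight (illustrative only; KV cache, context length, and runtime overhead add several GB on top):

    # Rough VRAM estimate: weight memory ~= params x bits-per-weight / 8.
    # Numbers are illustrative; KV cache and overhead add several GB.
    def weight_gb(params_b: float, bits: float) -> float:
        return params_b * bits / 8  # ~1 GB per billion params at 8-bit

    for params_b, bits, label in [(32, 4.5, "32B at ~Q4"),
                                  (14, 8.0, "14B at Q8"),
                                  (70, 4.5, "70B at ~Q4")]:
        verdict = "fits" if weight_gb(params_b, bits) < 24 else "doesn't fit"
        print(f"{label}: ~{weight_gb(params_b, bits):.0f} GB weights, {verdict} in 24 GB")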
reply
The technical report discussing the why and how is here: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
reply
Follow the AI newsletters. They bundle the news along with op-ed commentary and summarize it better.
reply
Tips on which newsletters are worth signing up for?
reply
Can you suggest some good ones?
reply
I really like latent.space and simonwillison.net.

Also (shameless self-promo) I publish a 2x weekly blog just to force myself to keep up: https://aimlbling-about.ninerealmlabs.com/treadmill/

reply
It is funny seeing people ping-pong between Anthropic and ChatGPT, with similar rhetoric in both directions.

At this point I would just pick the one whose "ethics" and user experience you prefer. The difference in performance between these releases has had no impact on the meaningful work one can do with them, unless perhaps they're on the fringes of some domain.

Personally I'm trying out cloud-hosted open models, since I'm not interested in being rug-pulled by the big two providers. They've come a long way, and for all the work I actually trust to an LLM they seem to be sufficient.
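
The nice part is that most of these hosts expose an OpenAI-compatible endpoint, so switching is mostly a config change. Roughly like this, where the base URL, key, and model name are placeholders for whichever host you pick:

    from openai import OpenAI

    # Most open-model hosts speak the OpenAI-compatible chat API, so moving
    # off the big two is often just a base_url + model swap. Placeholders only.
    client = OpenAI(
        base_url="https://example-host.com/v1",  # hypothetical host
        api_key="YOUR_API_KEY",
    )

    resp = client.chat.completions.create(
        model="some-open-weights-model",  # whatever the host serves
        messages=[{"role": "user", "content": "Summarize this diff in one line."}],
    )
    print(resp.choices[0].message.content)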

reply
The financial projections that a big part of their valuation and investor story is built on involve actually making money, and lots of it, at some point. That money has to come from somewhere.
reply
I find ChatGPT mostly annoying.
reply
Open Settings > Personalization. Set the base style to "efficient". Turn off enthusiasm and warmth. You're welcome.
reply
Yeah, but even then it's still annoying. It's not the enthusiasm and warmth so much as the general tone.
reply
Setting “base style and tone” to “efficient” works fine for me.
reply
It honestly has all kinda felt like more of the same ever since maybe GPT-4?

New model comes out, has some nice benchmarks, but the subjective experience of actually using it stays the same. Nothing's really blown my mind since.

Feels like the field has stagnated to a point where only the enthusiasts care.

reply
For coding, Opus 4.5 in Q3 2025 was still the best model I've used.

Since then it's just been a cycle of the old model being progressively lobotomised and a "new" one coming out that, if you're lucky, might be as good as the OG Opus 4.5 for a couple of weeks.

Subjective, but as far as I can tell there's been no progress in almost a year, which is a lifetime on 2022-25 LLM timelines.

reply
Another annoyance (more on the API side) is summarized/hidden reasoning traces. It makes prompt debugging and optimization much harder, since you have very little visibility into the real thinking process.
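
For example, with the OpenAI Responses API (caveat: going from memory on the exact shapes here), the most you can get back is a summary of the reasoning, never the raw trace:

    from openai import OpenAI

    client = OpenAI()

    # Request reasoning summaries; the raw chain of thought is not returned,
    # so prompt debugging can only work from the summarized trace.
    resp = client.responses.create(
        model="o4-mini",
        reasoning={"effort": "medium", "summary": "auto"},
        input="Why does this regex fail on multiline input?",
    )

    for item in resp.output:
        if item.type == "reasoning":
            for s in item.summary:  # summary parts, not the actual trace
                print(s.text)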
reply
Holy shit, I'm right there with you.
reply