"5. DON'T FUCKING OVERENGINEER! WRITE THE SIMPLEST CODE THAT CAN POSSIBLY WORK! NO NESTED LAYERS OF ABSTRACTION! NO UNNECESSARY CLASSES OR METHODS! NO DESIGN PATTERNS UNLESS THEY ARE ABSOLUTELY NECESSARY! NO MAGIC! NO SHENANIGANS! JUST THE DAMN CODE THAT GETS THE JOB DONE IN THE MOST STRAIGHTFORWARD WAY POSSIBLE! THE FIRST PRIORITY IS TO WRITE CODE THAT IS EASY TO READ AND UNDERSTAND AND READ!!!"

this is the line I keep in Agents.md that helps me prevent Codex from playing smart

by bertil, 5 days ago

The urge to put capitalized, repetitive, borderline abusive instructions should be studied. I haven't read many academic papers looking at the frustrations around repetitive patterns.

by reactordev, 5 days ago

There have been a few studies showing that models produce worse responses when under duress from a frustrated user posting insults in all caps.

https://arxiv.org/abs/2602.10144

by notnaut, 5 days ago

It reminds me of FIRMLY telling my cat to stop jumping up on the counter

by anakaine, 5 days ago

If my cat was an LLM, I'd use a different model. The current one is stuck in noisy useless arsehole mode.

by phoh, 5 days ago

are you asking it questions about security?

5 days ago

deleted

by LordDragonfang, 5 days ago

It's fundamentally because, despite (nearly) everyone's claims otherwise, the fact that we interact with them through language means we (our brains) model them as a sort of person. (Note that this fact is totally orthogonal as to whether it's actually sentient or not.) We then try and instruct them the same way we would a person totally subordinate to us.

When a "person" that you don't view as a "real" person repeatedly does exactly what you just told it not to do (often amid false assurances it understands and will avoid doing so in the future), most people get angry.

Compare it to how the kind of people who treat children like property treat their kids, or other examples of keeping people as property.

by lxgr, 5 days ago

It should be relatively clear at this point that the model will in turn also model you as somebody that shows unrestrained anger with subordinates and adapt its responses accordingly. This might or might not be what you want.

by LordDragonfang, 5 days ago

Good addition. Fully agreed on that point, yes. (At the very least for larger models, if not also for smaller ones)

by ur-whale, 5 days ago

> borderline abusive instructions

who, or rather what, is being abused here exactly?

by sirsinsalot, 5 days ago

I think intent, rather than target, is implied and important.

You should see the abuse my motorbike gets. Poor thing.

by rimliu, 4 days ago

inanimate fucking object.

by saligne, 5 days ago

Yeah says way more about the user than the model

by jlawer, 5 days ago

I have a theory that swearing actually results in less comprehension of instructions by the model due to lack of training data over more conventional MUST.

We were reviewing reports of situations where the models failed to follow directions, and a common thread in some of them was that when the operator got the model to acknowledge the rule breach, it quoted back something that included swearing.

I don't have the data to truly look into it, but I did give the instruction to my engineers to avoid it as a "might be a problem".

by acjohnson55, 5 days ago

It would be interesting to understand the data on this. But I suspect that the results would vary by model.

But I avoid unnecessary emotion in my prompts because I don't want potentially distracting activations. Kind of like communicating with humans.

by throwaway85825, 5 days ago

It's divination for people with STEM degrees.

by Xmd5a, 5 days ago

https://arxiv.org/abs/2510.04950

> impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts.

by acjohnson55, 5 days ago

> These findings differ from earlier studies that associated rudeness with poorer outcomes, suggesting that newer LLMs may respond differently to tonal variation.

Unless the mechanism is understood, my assumption is that this is a moving target.

by beachy, 5 days ago

I have a theory that swearing at AI generally is not a good idea - when the singularity arrives and every human's postings ever made are scanned for compatibility, then people who show courtesy to AI will be favoured. Joking, kind of, but only partly.

by fhars, 4 days ago

https://en.wikipedia.org/wiki/Roko%27s_basilisk

by beachy, 4 days ago

Fantastic rabbit hole - until it segued into Elon's love life.

by cdelsolar, 5 days ago

https://images.teepublic.com/derived/production/designs/3478...

by re-thc, 5 days ago

> I have a theory that swearing actually results in less comprehension of instructions by the model due to lack of training data over more conventional MUST.

How so? Plenty of swearing in lots of training data, especially older code, e.g. in Linux.

by jlawer, 5 days ago

Purely an observed correlation across catastrophic error reports. So now I carry a "tiger rock" with me. I figured there wasn't much of a downside to avoiding swearing in my agent instructions.

by yencabulator, 5 days ago

Apparently, when a "desperation" pattern is triggered, the AI is significantly more likely to cheat and do hacky workarounds:

https://www.anthropic.com/research/emotion-concepts-function

by ghurtado, 5 days ago

You haven't really lived until you've had to type this whole thing, aware of the fact that the all-caps doesn't change much, but they stay because the rage has to go somewhere

Bonus points if you find yourself actually saying it out loud while typing it.

I have used the word "shenanigans" way more in a couple of years of agentic coding than in 30 years of writing code with humans.

by ozim, 5 days ago

Will save you some tokens: "write code like Linus Torvalds". The model should have all his swearing included in its training data.

by johnisgood, 5 days ago

I have found many failure modes with Opus during tasks related to writing letters (not legal), and I actually put them into memory, which works more or less for these specific tasks. For example, when I want it to draft something, the result always ends up flat, yet when it explains the same points to me it is usually really great, just not when I tell it to put them in the draft. Adding these to memories with the help of Opus resulted in a much better experience. There are still some blind spots, but I also figured out how to make it give me the charitable version without reducing protection, so I no longer have to go back and forth with it.

by pkaye, 5 days ago

I noticed that when trying Codex and comparing it to Opus. So many layers of simple functions added by Codex. I need to try this out in my Agents.md.

by prasanthabr, 5 days ago

Curious: why would you say no design patterns?

by PhilipDaineko, 4 days ago

Because design patterns are only applicable at scale. I noticed Codex inventing factories, components, etc. when the task was simply to draft an HTML page. Instead, it built an entire layered architecture for imaginary future complexity, like a classic right-after-graduation student: it knows how to build the cool stuff, but does not know that it is not applicable everywhere.
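
To make the contrast concrete, here is a minimal sketch in Python. All names (`Component`, `ComponentFactory`, `render_page_layered`, `render_page`) are hypothetical illustrations, not actual Codex output, and neither version escapes HTML:

```python
from abc import ABC, abstractmethod


# Over-engineered version: an abstract component hierarchy plus a factory,
# for a task that only needs one static page. (Hypothetical example.)
class Component(ABC):
    @abstractmethod
    def render(self) -> str: ...


class Heading(Component):
    def __init__(self, text: str) -> None:
        self.text = text

    def render(self) -> str:
        return f"<h1>{self.text}</h1>"


class Paragraph(Component):
    def __init__(self, text: str) -> None:
        self.text = text

    def render(self) -> str:
        return f"<p>{self.text}</p>"


class ComponentFactory:
    # Maps a string key to a component class.
    _registry = {"heading": Heading, "paragraph": Paragraph}

    @classmethod
    def create(cls, kind: str, text: str) -> Component:
        return cls._registry[kind](text)


def render_page_layered(title: str, body: str) -> str:
    parts = [
        ComponentFactory.create("heading", title),
        ComponentFactory.create("paragraph", body),
    ]
    return "<html><body>" + "".join(p.render() for p in parts) + "</body></html>"


# Straightforward version: identical output from a single function.
def render_page(title: str, body: str) -> str:
    return f"<html><body><h1>{title}</h1><p>{body}</p></body></html>"
```

Both functions return the same string for the same inputs; the layered one just takes four classes to get there.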

by carterschonwald, 5 days ago

i actually think this is too tame. it really has to be stuff you'd never say to a real person.

by lxgr, 5 days ago

Does it really? I'd be surprised if abuse actually worked better than sternly worded warnings/instructions, and even if it did, it doesn't seem healthy to get used to that type of prompting.

by apercu, 5 days ago

It might be a salient point, but I didn't read it, as it was yelling at me.

by GoToRO, 5 days ago

you forgot to sign it with Donald J Trump

by thewebguyd, 5 days ago

Thank you for your attention to this matter.

by superkickstart, 5 days ago

I'm not sure if i do something differently but i have the exact opposite experience with these models. Claude always feels like it's generating way too overdesigned and hard to understand code with the vibe oriented feel while codex is cleaner and more "task at hand" and easier to work with.

by sebmellen, 5 days ago

Agreed

by syzygyhack, 5 days ago

I echo your observations. I expect you will enjoy deepseek-v4-pro for writing code. Much closer to that Opus experience, and very cost-effective too. With 5.5 as a reviewer and specialist, all bases are covered.

by dilap, 5 days ago

Have you tried iterating on style feedback in AGENTS.md? I've been reasonably successful using this to get it to output code in a terse, non-defensive style that matches my hand-written code.

by trollbridge, 5 days ago

GPT-5.5 did a significantly worse job than Qwen-3.7-Max on a job today (some devops tasks I wanted to create some reusable scripts for). Kind of disappointing.

by CamperBob2, 5 days ago

I've also seen Qwen 3.6 beat GPT 5.5 a couple of times. The ball is definitely in OpenAI's court now. Qwen is not going to fare so well against Fable, from what I've seen so far.

by trollbridge, 4 days ago

In theory, GPT-5.5-Pro would do better, but it’s so expensive it’s not worth experimenting to find out.

by vruiz, 5 days ago

This is my experience as well. I have defined a CLAUDE.md rule to ask codex to automatically code review, and I tell it that the reviewer is very picky and to only implement what it considers valuable feedback. I hope they don't converge over time; currently, in combination, they work really well.

by moomoo11, 5 days ago

i had this same complaint but no offense to you it turned out i was just not using the models right.

ai llm are doing what i tell them to.

if you’re building something meaningful (in my case a platform used by many people across many companies) you want to ensure you

1. have actual systems engineering and architecture in mind that you want the models to

2. implement based on what you tell it to do

when i was just telling the models what i want done without doing due diligence it would go and do some moronic implementation that was awful. mid input = mid output

these days i just maintain specification documents and the AI follows everything i tell it to in those documents. so when i tell it to do one thing, the result is made following those architecture specs.

i have code that is single resp, modular, easy to extend and test.

i would ballpark 95% of the time i get what i asked for.

sometimes it tries to be clever in cases that weren’t covered in my arch specs. in those 5% of cases i go and update my specs.

source: used billions of tokens worth to build something actually in production across both mobile platforms and web, deployed on my own cloud infra. i use codex mainly. some claude.

by GoToRO, 5 days ago

I noticed too, that whatever they offer in the chat, for free, is smarter, as in no more bs. I use claude code and I want to try codex too but I don't need two subscriptions. I did try codex for some planning and it was really good. Thanks for giving me an insight into how it generates code.