> If good code was enough on its own we would read the source instead of documentation.

That's 100% how I work -- reading the source. If the code is confusing, the code needs to be fixed.

reply
Confusing code is one thing, but projects with more complex requirements or edge cases benefit from additional comments and documentation. Not everything is easily inferred from code or can be easily found in a large codebase. You can also describe e.g. chosen tradeoffs.
reply
Exactly, that's why a good project will use comments sparingly and have them only where they matter to actually meaningfully augment the code. The rest is noise.
reply
There's no way around just learning the codebase. I have never seen code documentation that was complete or correct, let alone both.
reply
I have written code that was correct and necessarily written the way it was, only to have it repeatedly altered by well-meaning colleagues who thought it looked wrong, inefficient, or unidiomatic. Eventually I had to fill it with warning comments and write a substantial essay explaining why it had to be the way it was.

Code tells you what is happening, but not always in a way that is easy to understand, and it almost never tells you why something is the way it is.

reply
Difficult to say without an example, but "code isn't enough" is just one possible conclusion in this case. Another one could be that the code is not actually as good as expected, and another one is that the colleagues may need to... do something about it.

An obvious example I have is CMake. I have seen so many people complaining about CMake being incomprehensible, refactoring it to make it terrible, even wrapping it in Makefiles (and then wrapping that in Dockerfiles). But the problem wasn't the original CMakeLists or a lack of comments in it. The problem was that those developers had absolutely no clue about how CMake works, and felt like they should spend a few hours modifying it instead of spending a few hours understanding it.

However, I do agree that sometimes there is a need for a comment because something is genuinely tricky. But that is rare enough that I call it "a comment" and not "literate programming".

reply
I always think the biggest mistake is using CMake in the first place. I’ve never come across a project as convoluted and poorly documented as it.
reply
What do you mean by "poorly documented"? I have been using it for 20 years, I have yet to find something that is not documented.

As for convoluted, I don't find it harder than the other build systems I use.

Really the problem I have with CMake is the amount of terribly-written CMakeLists. The norm seems to be to not know the basics of CMake but to still write a mess and then complain about CMake. If people wrote C the way they write CMake, we wouldn't blame the language.

reply
But the documentation can really help in telling why we are doing things. That also seeps in to naming things like classes. If that were not so, we'd just name everything Class1, Class2, Method1, Method2 and so on.
reply
My point is that if your code is well written, it is self-documenting. Obviously Class1 and var2 are not self-documenting.
reply
The code is what it does. The comments should contain what it's supposed to do.

Even if you give them equal roles, self-documenting code versus commented code is like having data on one disk versus having data in a RAID array.

Remember: Redundancy is a feature. Mismatches are information. Consider this:

    // Calculate the sum of one and one
    sum = 1 + 2;

You don't have to know anything else to see that something is wrong here. It could be that the comment is outdated, which has no direct effects and is easily solved. It could be that this is a bug in the code. In any case it is information and a great starting point for looking into a possible problem (with a simple git blame). Again, without needing any context, knowledge of the project or external documentation.

My take on developers arguing for self-documenting code is that they are undisciplined or do not use their tools well. The arguments against copious inline comments are "but people don't update them" and "I can see less of the code".

reply
> Redundancy is a feature. Mismatches are information. Consider this:

Respectfully, if someone wrote code like this, I wouldn't want to work with them. I mean next step is "I copy paste code instead of writing functions, and in the comment above I mention all the other copies, so that it's easy to check that they are all doing the same thing redundantly".

> The arguments against copious inline comments are "but people don't update them" and "I can see less of the code".

Well no, that's not my argument. I have been navigating code for 20 years and in good codebases, comments are rare and describe something "surprising". Good code is hardly surprising.

My problem with "literate programming" (which means "add a lot of comments in the implementation details") is that I find it hard to trust developers who genuinely cannot understand unsurprising code without comments. I am fine with a junior needing more time to learn, but after a few years if a developer cannot do it, it concerns me.

reply
You did not engage with my main arguments. You should still do so.

1. Redundancy: "The code is what it does. The comments should contain what it's supposed to do. [...] You don't have to know anything else to see that something is wrong here." and specifically the concrete trivial (but effective) example.

2. "My take on developers arguing for self-documenting code is that they are undisciplined or do not use their tools well. The arguments against copious inline comments are "but people don't update them" and "I can see less of the code"."

> Respectfully, if someone wrote code like this, I wouldn't want to work with them. I mean next step is "I copy paste code [...]

This is a nonsensical slippery slope fallacy. In no way does that behavior follow from placing many comments in code. It also says nothing about the clearly demonstrated value of redundancy.

> I have been navigating code for 20 years and in good codebases, comments are rare and describe something "surprising".

Your definition of good here is circular. No argument on why they are good codebases. Did you measure how easy they were to maintain? How easy it was to onboard new developers? How many bugs it contained? Note also that correlation != causation: it might very well be that the good codebases you encountered were solo-projects by highly capable motivated developers and the comment-rich ones were complicated multi-developer projects with lots of developer churn.

> My problem with "literate programming" [...] is that I find it hard to trust developers who genuinely cannot understand unsurprising code without comments.

This is gatekeeping code by making it less understandable and essentially an admission that code with comments is easier to understand. I see the logic of this, but it is solving a problem in the wrong place. Developer competence should not be ascertained by intentionally making the code worse.

reply
You talk as if you had scientific proof that literate programming is objectively better, and I was the weirdo contradicting it without bringing any scientific proof.

Fact is, you don't have any proof at all, you just have your intuition and experience. And I have mine.

> It also says nothing about the clearly demonstrated value of redundancy.

Clearly demonstrated, as in your example of "Calculate the sum of one and one"? I wouldn't call that a clear demonstration.

> This is gatekeeping code by making it less understandable

I don't feel like I am making it less understandable. My opinion is that a professional worker should have the required level of competence (otherwise they are not a professional in that field). In software engineering, we feed code to a compiler, and we trust that the compiler makes sure that the machine executes the code we write. The role of the software engineer is to understand that code.

Literate programming essentially says "I am incapable of writing code that is understandable, ever, so I always need to explain it in a natural language". Or "I am incapable of reading code, so I need it explained in a natural language". My experience is that good code is readable by competent software engineers without explaining everything. But not only that: code is more readable when it is more concise and not littered with comments.

> and essentially an admission that code with comments is easier to understand.

I disagree again. Code with comments is easier to understand for the people who cannot understand it without the comments. Now the question is, again: are those people competent to handle code professionally? Because if they don't understand the code without comments, many times they will just have to trust the comments. If they used the comments to actually understand the code, pretty quickly they would be competent enough to not require the comments. Which means that at the point where they need it, they are not yet professionals, but rather apprentices.

reply
    def reallyDumbIdeaByManagerWorkaroundMethodToGetCoverageToNinetyPercent(self):
        """Dont worry, this is a clear description of the method.
        """
        return False
reply
You exaggerate, but in this situation, I think putting a link to a Jira ticket or Slack convo (or whatever) as a comment is best.
reply
Code alone can never describe intent or rationale.
reply
Indeed, you need both!

But documentation should not go too deep into the "how", otherwise it risks telling a lie after a while as the code changes but the documentation lags.

reply
https://diataxis.fr/

(originally developed at: https://docs.divio.com/documentation-system/) --- divides documentation along two axes:

- Action (Practical) vs. Cognition (Theoretical)

- Acquisition (Studying) vs. Application (Working)

which for my current project has resulted in:

- readme.md --- (Overview) Explanation (understanding-oriented)

- Templates (small source snippets) --- Tutorials (learning-oriented)

- Literate Source (pdf) --- How-to Guides (problem-oriented)

- Index (of the above pdf) --- Reference (information-oriented)

reply
I've been trying to implement this as closely as possible from scratch in an existing FOSS project:

https://github.com/super-productivity/super-productivity/wik...

Even with a well-described framework it is still hard to maintain proper boundaries and there is always a temptation to mix things together.

reply

    README => AGENTS.md
    HOWTO => SKILLS.md
    INFO => Plan/Arch/Guide
    REFERENCE => JavaDoc-ish

I'm very near the idea that "LLMs are randomized compilers" and that the human prompts should be treated with 1000% more care. Don't (necessarily) git commit the whole megabytes of token-blathering from the LLM, but keep the human prompts:

"Hey, we're going to work on Feature X... now some test cases... I've done more testing and Z is not covered... ok, now we'll extend to cover Case Y..."

Let me hover over the 50-100 character commit message and then see the raw discussion (source) that led to the AI-generated (compiled) code. Allow AI.next to review the discussion/response/diff/tests and see if it can expose any flaws with the benefit of hindsight!

reply
> If good code was enough on its own we would read the source instead of documentation.

An axiom I have long held regarding documenting code is:

  Code answers what it does, how it does it, when it is used, 
  and who uses it.  What it cannot answer is why it exists.  
  Comments accomplish this.
reply
An important addendum: code can sometimes, with a bit of extra thinking on the part of the reader, answer the 'why' question. But it's even harder for code to answer the 'why not' question. I.e., what were the other approaches that we tried and that didn't work? Or what business requirements preclude these other approaches?
reply
> But it's even harder for code to answer the 'why not' question.

Great point. Well-placed documentation as to why an approach was not taken can be quite valuable.

For example, documenting that domain events are persisted in the same DB transaction as changes to corresponding entities and then picked up by a different workflow instead of being sent immediately after a commit.

reply
I don't think this is enough to completely obsolete comments, but a good chunk of that information can be encoded in a VCS. It encodes all past approaches and also contains the reasoning and why not in annotation. You can also query this per line of your project.
reply
Git history is incredibly important, yes, but also limited.

Practically, it only encodes information that made it into `main`, not what an author just mulled over in their head or just had a brief prototype for, or ran an unrelated toy simulation over.

reply
In fairness to GP, they said VCS, not Git, even if they are somewhat synonymous today. Other VCSes did support graph histories.

Still, "3rd dimension" code reasoning (backwards in time) has never been merged well with code editing.

reply
> In fairness to GP, they said VCS, not Git

I did say VCS, but I also don't know what Git is missing in this regard.

> Other VCSes did support graph histories.

How does Git not?

> Still, "3rd dimension" code reasoning (backwards in time) has never been merged well with code editing.

Maybe it's not perfect, but Git seems to do that just fine for my taste. What is missing there?

reply
> Other VCSes did support graph histories.

Yes, git ain't the only one, but apart from interface differences, they are pretty much compatible in what they allow you to record in the history, I think?

Part of the problem here is that we use git for two only weakly correlated purposes:

- A history of the code

- Make nice and reviewable proposals for code changes ('Pull Request')

For the former, you want to be honest. For the latter, you want to present a polished 'lie'.

reply
> - A history of the code

Which is a causal history, not an editing log. So I don't perceive these to be actually different.

reply
Not really. Launchpad.net does not have any public branches I could share atm as an example, but Bazaar (now breezy) allowed having a nested "merge commit": your trunk would have "flattened" merge commits ("Merge branch foo"), and under it you could easily get to each individual commit by a developer ("Prototype", "Add test"...). It would really be shown as a tree, but the smartness was even richer.

This was made possible by using a DAG for commit storage and referencing, instead of relying on file contents and series of commits per reference. Merge behaviour was much smarter in case of diverging tip or criss-cross merges. But this ultimately was harder and slower to implement, and developers did not value this enough and they instead accepted the Git trade-offs.

So you seamlessly did both with a different VCS without splitting those up: in a sense, computers and software worried about that for us.

reply
I am not quite sure what you are describing here. Git's underlying commit graph is a DAG.

You can use different, custom merge-drivers (or whatever it's called) for Git to get the behaviour you describe here.

reply
Certainly, but merges are treated differently by default, and getting to this sort of output would require "custom" tooling for things like "git log".

Whereas bzr just did the expected thing.

reply
You can select whether you want the diff to the first or the second parent, which is the difference between collapsing and expanding merges. You can also completely collapse merges by showing first-parent-history.

Or I do not understand what you mean with "the expected thing".
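
For readers who have not used these options, a toy model may help. The sketch below is not how Git is implemented; it models a commit graph as a plain dictionary (all commit names are made up) and shows the two views being discussed. Following only first parents collapses a merge into a single trunk entry, while taking what is reachable from the second parent but not from the first expands it again. In Git itself the collapsed view corresponds to `git log --first-parent`.

```python
# Toy commit graph: commit name -> list of parents, first parent listed first.
# All names are hypothetical.
PARENTS = {
    "base": [],
    "trunk-fix": ["base"],
    "prototype": ["base"],          # first commit on the feature branch
    "add-test": ["prototype"],      # second commit on the feature branch
    "merge-foo": ["trunk-fix", "add-test"],
}


def first_parent_history(tip):
    """Walk back along first parents only: merges appear as one trunk entry."""
    history = []
    current = tip
    while current is not None:
        history.append(current)
        parents = PARENTS[current]
        current = parents[0] if parents else None
    return history


def reachable(tip):
    """Every commit reachable from `tip`, including `tip` itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def commits_brought_in(merge):
    """Expand a merge: commits reachable from its second parent only."""
    first, second = PARENTS[merge]
    return sorted(reachable(second) - reachable(first))


collapsed = first_parent_history("merge-foo")
expanded = commits_brought_in("merge-foo")
print(collapsed)  # ['merge-foo', 'trunk-fix', 'base']
print(expanded)   # ['add-test', 'prototype']
```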

reply
If you throw away commit messages, that is on you, it is not a limitation of Git. If I am cleaning up before merging, I'm maybe rephrasing things, but I am not throwing that information away. I regularly push branches under 'draft/...' or 'fail/...' to the central project repository.
reply
Sure, but you are still supposed to clean things up to make the life of the reviewer easier.

There's an inherent tension between honest history and a polished 'lie' to make the reviewer's life easier.

reply
The WIP commits I initially recorded also didn't necessarily exist as such in my file system and often don't really work completely, so I don't know why the commit after a rebase is any more of a lie than the commit before the rebase.
reply
The "honest" historical record of when I decided to use "git commit" while working on something is 100% useless for anyone but me (for me it's 90% useless).

git tracks revisions, not history of file changes.

reply
Sounds easier (for everybody) to just use comments.
reply
You put past failed implementations in comments? That sounds like a nightmare. I'd rather only include a short description in the comment that can then link to the older implementation if necessary.
reply
But why would you ever put that into your VCS as opposed to code comments?

The VCS history has to be actively pulled up and reading through it is a slog, and history becomes exceptionally difficult to retrace in certain kinds of refactoring.

In contrast, code comments are exactly what you need and no more, you can't accidentally miss them, and you don't have to do extra work to find them.

I have never understood the idea of relying on code history instead of code comments. It seems like it's all downsides, zero upsides.

reply
Because comments are a bad fit to encode the evolution of code. We implemented systems to do that for a reason.

> The VCS history has to be actively pulled up and reading through it is a slog

Yes, but it also allows you to query history, e.g. by function, which gets me to understanding much faster than wading through the current state and trying to piece information together from the status quo and comments.

> history becomes exceptionally difficult to retrace in certain kinds of refactoring.

True, but these refactorings also make it more difficult to understand other properties of code that still refers to the architecture pre-refactoring.

> I have never understood the idea of relying on code history instead of code comments. It seems like it's all downsides, zero upsides.

Comments are inherently linear to the code. That is sometimes what you need, but for complex behaviour you would rather comment things along another dimension, and that is what a VCS provides.

What I write is this:

    /* This used to do X, but this causes Y and Z 
       and also conflicts with the FOO introduced 
       in 5d066d46a5541673d7059705ccaec8f086415102.
       Therefore it does now do BAR, 
       see c7124e6c1b247b5ec713c7fb8c53d1251f31a6af */
reply
Both have their place. While I mostly agree with you, there's a clear example where git history is better: delete old or dead or unused code, rather than comment it out.
reply
Good naming and good tests can get you 90% of the way to "why" too.
reply
Agreed. Tests are documentation too. Tests are the "contract": "my code solves those issues. If you have to modify my tests, you have a different understanding than I had and should make sure it is what you want".
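
A small sketch of what "tests as the contract" can look like. The business rule, the function, and the numbers are all hypothetical; the idea is that the test names and the one comment carry the "why", so anyone who has to change a test is told which understanding they are about to overturn.

```python
# Hypothetical business rule, in cents to avoid float rounding.
FREE_SHIPPING_THRESHOLD_CENTS = 5000
FLAT_SHIPPING_FEE_CENTS = 500


def shipping_cost_cents(order_total_cents):
    if order_total_cents >= FREE_SHIPPING_THRESHOLD_CENTS:
        return 0
    return FLAT_SHIPPING_FEE_CENTS


def test_order_exactly_at_threshold_ships_free():
    # The boundary is inclusive on purpose (assumed requirement): an order of
    # exactly the threshold amount must not be charged shipping.
    assert shipping_cost_cents(5000) == 0


def test_order_just_below_threshold_pays_flat_fee():
    assert shipping_cost_cents(4999) == 500


# Minimal runner so the sketch works without a test framework.
for check in (
    test_order_exactly_at_threshold_ships_free,
    test_order_just_below_threshold_pays_flat_fee,
):
    check()
print("contract holds")
```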
reply
Having "grown up" on free software, I've always been quick to jump into code when documentation was dubious or lacking: there is only one canonical source of truth, and you need to be good at reading it.

Though I'd note two kinds of documentation: docs on how software is built (seldom needed if you have good source code), and on how it is operated. When it comes to the former, I jump into code even sooner as documentation rarely answers my questions.

Still, I do believe that literate programming is the best of both worlds, and I frequently lament the dead practice of doing "doctests" with Python (though I guess Jupyter notebooks are in a similar vein).

Usually, the automated tests are the best documentation you can have!
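
For anyone who has not seen a doctest, a minimal sketch using the standard library `doctest` module follows. The `clamp` function is a made-up example. The examples in the docstring are both documentation and tests, so the prose cannot drift from the code without a failure showing up.

```python
import doctest


def clamp(value, low, high):
    """Limit value to the closed interval [low, high].

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(value, high))


# Run the docstring examples explicitly instead of relying on how the file
# is invoked: parse the docstring into a test, then execute it.
parser = doctest.DocTestParser()
test = parser.get_doctest(clamp.__doc__, {"clamp": clamp}, "clamp", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```

In a normal project you would more likely run `python -m doctest yourfile.py` and skip the explicit runner.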

reply
I do read the code instead of the documentation, whenever that is an option.

Interesting factoid. The number of times I've found the code to describe what the software does more accurately than the documentation: many.

The number of times I've found the documentation to describe what the software does more accurately than the code: never.

reply
You seem to misunderstand the purpose of documentation.

It's not to be more accurate than the code itself. That would be absurd, and is by definition impossible, of course.

It's to save you time and clarify why's. Hopefully, reading the documentation is about 100x faster than reading the code. And explains what things are for, as opposed to just what they are.

reply
Clearly.

Crazy thing.

Number of times reading the source saved time and clarified why: many.

Number of times reading the documentation saved time and clarified why: never.

Perhaps I've just been unlucky?

EDIT:

The hilarious part to me is that everyone can talk past each other all day (reading the documentation) or we can show each other examples of good/bad documentation or good/bad code (reading the code) and understand immediately.

reply
> Number of times reading the documentation saved time and clarified why: never.

OK, so let's use an example... if you need to e.g. make a quick plot with Matplotlib. You just... what? Block off a couple weeks and read the source code start to finish? Or maybe reduce it to just a couple days, if you're trying to locate and understand the code just for the one type of plot you're trying to create? And the several function calls you need to set it up and display it in the end?

Instead of looking at the docs and figuring out how to do it in 5 or 10 min?

Because I am genuinely baffled here.

reply
Literate programming is not about documenting the public API, it's about documenting the implementation details, right? Otherwise no need for a new name, it's just "API documentation".

> if you need to e.g. make a quick plot with Matplotlib. You just... what?

Read the API documentation.

Now if you need to fix a bug in Matplotlib, or contribute a feature to it, then you read the code.

reply
> If good code was enough on its own we would read the source instead of documentation.

Uh. We do. We, in fact, do this very thing. Lots of comments in code is a code smell. Yes, really.

If I see lots of comments in code, I'm gonna go looking for the intern who just put up their first PR.

> I believe part of good software is good documentation

It is not. Docs tell you how to use the software. If you need to know what it does, you read the code.

reply
> Lots of comments in code is a code smell. Yes, really.

No, not really. It's actually a sign of devs who are helping future devs who will maintain and extend the code, so they can understand it faster. It's professionalism and respect.

> If I see lots of comments in code, I'm gonna go looking for the intern who just put up their first PR.

And I'm going to find them to say good job, keep it up! You're saving us time and money in the future.

reply
> It's professionalism and respect.

If someone gives me code full of superfluous comments, I don't consider it professional. Sounds like an intern who felt the need to comment everything because every single line seemed very complex to them.

reply
Nobody said anything about "superfluous" comments.

I'm assuming "lots of comments" means lots of meaningful comments. As complex code often requires. Nobody's talking about `i++; // increment i` here.

reply
> I'm assuming "lots of comments" means lots of meaningful comments.

That's not what literate programming is. Literate programming says that you explain everything in a natural language.

IMO, good code is largely unsurprising. I don't need comments for unsurprising code. I need comments for surprising code, but that is the exception, not the rule. Literate programming says that it is the rule, and I disagree.

reply
> Literate programming says that you explain everything in a natural language.

At a high level. Not line-by-line comments.

> IMO, good code is largely unsurprising. I don't need comments for unsurprising code.

I've never heard anything like that, and could not disagree more. Twenty different considerations might go into a single line of code. Often, one of them is something non-obvious. So you comment that thing. The idea that "good" code avoids anything non-obvious, that those are "exceptions", is frankly bizarre to me. Unless the code you write is 99% boilerplate or something.

reply
> So you comment that thing. The idea that "good" code avoids anything non-obvious, that those are "exceptions", is frankly bizarre to me.

What I find interesting from the comments here is that there are obviously different perspectives on that. Granted, I cannot say that my way is better. Just as you cannot say that your way is better.

But I am annoyed when I have to deal with code following your standards, and I assume you are annoyed when you have to deal with code following mine :-).

Or maybe, I imagine that people who defend literate programming mean more comments than I think is reasonable, and people who disagree with me (like you) imagine that I mean fewer comments than you think is reasonable. And maybe in reality, given actual code samples, we would totally agree :-).

Communication is hard.

reply
> If you need to know what it does, you read the code.

True.

But if you need to know why it does what it does, you read the comments. And often you need that knowledge if you are about to modify it.

reply
Do you have an example of such knowledge that you need to get from the comments? I have been programming for 20 years, and I genuinely don't see that much code that is so complex that it needs comments.

Not that it doesn't exist; sometimes it's needed. But so rarely that I call it "comments", and not a whole discipline in itself that is apparently called "literate programming". Literate programming sounds like "you need to comment pretty much everything because code is generally hard to understand". I disagree with that. Most code is trivial, though you may need to learn about the domain.

reply
> Literate programming sounds like "you need to comment pretty much everything because code is generally hard to understand".

You and I read code. Came so naturally for me that I didn't realize others don't. But over the years and with some weird chats I've realized that for a lot of developers it's more like "deciphering code", like they're slowly translating a human language they only vaguely know - and it never even crossed their mind that it was possible to learn a programming language to the point you could just read it.

reply
I've never properly tried literate programming, overkill for hobby projects and not practical for a team unless everyone agrees.

Examples of code that needs comments in my career tend to come from projects that model the behaviour of electrical machines. The longest running such project was a large object oriented model (one of the few places where OOP really makes sense). The calculations were extremely time consuming and there were places where we were operating with small differences between large numbers.

As team members came and went and as the project matured the team changed from one composed of electrical engineers, physicists, and mathematicians who knew the domain inside out to one where the bulk of the programmers were young computer science graduates who generally had no physical science background at all.

This meant that they often had no idea what the various parts of the program were doing and had no intuition that would make them stop and think or ask a question before fixing a bug in what seemed the most efficient way.

The problem in this case is that sometimes you have to sacrifice runtime speed for correctness and numerical stability. You can't always re-order operations to reduce the number of assignments say and expect to get the same answers.

Of course you can write unit and functional tests to catch some such errors but my experience says that tests need even better comments than the code that is being tested.
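
A tiny illustration of the re-ordering hazard, not taken from that project: the values are picked only to make the effect visible in ordinary double-precision floats. Algebraically both orders give the same result; numerically they do not, and that is exactly the kind of thing a "do not reorder" comment has to protect.

```python
import math

big = 1e16
small = 1.0

# Adding first: `small` is below the spacing between doubles near 1e16,
# so it is absorbed and then lost when `big` is subtracted again.
absorbed = (big + small) - big

# Algebraically identical, but the large terms cancel first, so `small` survives.
preserved = (big - big) + small


def running_total(values):
    """Plain left-to-right accumulation, the 'obvious' way to add things up."""
    total = 0.0
    for value in values:
        total += value
    return total


# Same numbers, different order, different answer.
order_a = running_total([1e16, 1.0, -1e16])
order_b = running_total([1e16, -1e16, 1.0])

# math.fsum tracks the lost low-order bits; it is slower but order-independent here.
careful = math.fsum([1e16, 1.0, -1e16])

print(absorbed, preserved)        # 0.0 1.0
print(order_a, order_b, careful)  # 0.0 1.0 1.0
```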

reply
Because the why can be completely unrelated to the code (odd business requirements etc). The code can be known to be non-optimal but it is still the correct way because the embedded system used in product XYZ has some dumb chip in it that needs it this weird way etc. Or the CEO loves this way of doing things and fires everyone who touches it. So many possibilities, most technical projects have a huge amount of politics and weird legacy behavior that someone depends on (including on internal stuff, private methods are not guaranteed to not be used by a client for example). And comments can guard against it, both for the dev and the reviewer. Hell, we currently have clients depend on the exact internal layout of some PDF reports, and not even the rendered layout but the actual definitions.
reply
Again, if it's a comment saying "we need this hack because the hardware doesn't support anything", I don't call it "literate programming".

Literate programming seems to be the idea that you should write prose next to the code, because code "is difficult to understand". I disagree with that. Most good code is simple to understand (doesn't mean it's easy to write good code).

And the comments here prove my point, I believe: whenever I ask for examples where a comment is needed, the answer is something very rare and specific (e.g. a hardware limitation). The answer to that is comments where those rare and specific situations arise. Not a whole concept of "literate programming".

reply
Most of my comments related to the outside world not behaving quite as you would expect.

Usually something like the spec says this but the actual behaviour is something else.

reply
Not for everything. For code you own, yes, this is often the case. For the majority of the layers you still rely on documentation. Take the project where you mention going straight to the source: did you follow this thread all the way down through each compiler involved in building the project? Of course not.
reply
My understanding is that "literate programming" doesn't say "you should document the public API". It says "you should document the implementation details, because code is hard to understand".

My opinion is that if whoever is interested in reading the implementation details cannot understand it, either the code is bad or they need to improve themselves. Most of the time at least. But I hear a lot of "I am very smart, so if I don't understand it without any effort, it means it's too complicated".

reply