by embedding-shape11 hours ago

by rsav11 hours ago

There's also:

>I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

-- Linus Torvalds

by aleph_minus_one6 hours ago

> >I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

> -- Linus Torvalds

What about programmers

- for whom the code is a data structure?

- who formulate their data structures in a way (e.g. in a very powerful type system) such that all the data structures are code?

- who invent a completely novel way of thinking about computer programs such that in this paradigm both code and data structures are just trivial special cases of some mind-blowing concept ζ of which there exist other special cases that are useful to write powerful programs, but these special cases are completely alien from anything that could be called "code" or "data (structures)", i.e. these programmers don't think/worry about code or data structures, but about ζ?

by sph8 hours ago

From what I understand from the vibe coders, they tell a machine what the code should do, but not how it should do it. They leave the important decisions (the shape of data) to an LLM and thus run afoul of this, which is gonna bite them in the ass eventually.

by mikepurvis10 hours ago

I think this is sometimes a barrier to getting started for me. I know that I need to explore the data structure design in the context of the code that will interact with it, and some of that code will be thrown out as the data structure becomes clearer, but it can still be hard to get off the ground when my gut instinct is that the data design isn't right.

This kind of exploration can be a really positive use case for AI I think, like show me a sketch of this design vs that design and let's compare them together.

by sph8 hours ago

AI is terrible for this.

My recommendation is to truly learn a functional language and apply it to a real world product. Then you’ll learn how to think about data, in its pure state, and how it is transformed to get from point A to point B. These lessons will make for much cleaner design that will be applicable to imperative languages as well.

Or learn C where you do not have the luxury of using high-level crutches.

by ignoramous9 hours ago

> This kind of exploration can be a really positive use case for AI I think

Not sure if SoTA codegen models are capable of navigating the design space and coming up with optimal solutions. As in cybersecurity, maybe specialized models (like DeepMind's Sec-Gemini), if there are any, could?

I reckon a programmer who has already learnt about / explored the design space will be able to prompt more pointedly and evaluate the output qualitatively.

> sometimes a barrier to getting started for me

Plenty of great books on the topic (:

Algorithms + Data Structures = Programs (1976), https://en.wikipedia.org/wiki/Algorithms_%2B_Data_Structures...

by mikepurvis9 hours ago

Yeah, the key word is exploration. It's not "hey Claude, write the design doc for me" but rather: here are two possible directions for how to structure my solution, help me sketch each out a bit further so that I can get a better sense of what roadblocks I may hit 50-100 hours into implementation, when the cost of changing course is far greater.

by Zamicol6 hours ago

That is excellent. I'm putting that in my notes.

by Intermernet11 hours ago

I believe the actual quote is:

"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." -- Fred Brooks, The Mythical Man Month (1975)

by bfivyvysj11 hours ago

This is the biggest issue I see with AI driven development. The data structures are incredibly naive. Yes it's easy to steer them in a different direction but that comes at a long term cost. The further you move from naive the more often you will need to resteer downstream and no amount of context management will help you, it is fighting against the literal mean.

by nostrademons8 hours ago

The rule may not hold with AI driven development. The rule exists because it's expensive to rewrite code that depends on a given data structure arrangement, and so programmers usually resort to hacks (eg. writing translation layers or views & traversals of the data) so they can work with a more convenient data structure with functionality that's written later. If writing code becomes free, the AI will just rewrite the whole program to fit the new requirements.

This is what I've observed with using AI on relatively small (~1000 line) programs. When I add a requirement that requires a different data structure, Claude will happily move to the new optimal data structure, and rewrite literally everything accordingly.

I've heard that it gets dicier when you have source files that are 30K-40K lines and programs that are in the million+ line range. My reports have reported that Gemini falls down badly in this case, because the source file blows the context window. But even then, they've also reported that you can make progress by asking Gemini to come up with the new design, and then asking it to come up with a list of modules that depend upon the old structure, and then asking it to write a shim layer module-by-module to have the old code use the new data structure, and then have it replace the old data structure with the new one, and then have it remove the shim layer and rewrite the code of each module to natively use the new data structure. Basically, babysit it through the same refactoring that an experienced programmer would use to do a large-scale refactoring in a million+ line codebase, but have the AI rewrite modules in 5 minutes that would take a programmer 5 weeks.
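A minimal sketch of the shim step described above, assuming a hypothetical codebase where old modules read users as a sorted list of `(id, name)` tuples and the new structure is a dict keyed by id. `UserStore`, `LegacyUserListShim`, `legacy_find_name` and `find_name` are illustrative names, not anything from the comment.

```python
from __future__ import annotations

from dataclasses import dataclass


# New data structure (hypothetical): users indexed by id for O(1) lookup.
@dataclass
class UserStore:
    by_id: dict[int, str]


# Shim (hypothetical): exposes the new store in the old list-of-tuples shape
# so modules that have not been migrated yet keep working unchanged.
class LegacyUserListShim:
    def __init__(self, store: UserStore) -> None:
        self._store = store

    def as_tuples(self) -> list[tuple[int, str]]:
        # Old modules expected [(id, name), ...] sorted by id.
        return sorted(self._store.by_id.items())


# An old module, not yet migrated: linear scan over tuples.
def legacy_find_name(users: list[tuple[int, str]], user_id: int) -> str | None:
    for uid, name in users:
        if uid == user_id:
            return name
    return None


# The same module after migration: uses the new structure natively.
# Once every caller looks like this, the shim can be deleted.
def find_name(store: UserStore, user_id: int) -> str | None:
    return store.by_id.get(user_id)
```

During the migration both paths should agree, e.g. `legacy_find_name(LegacyUserListShim(store).as_tuples(), 2)` and `find_name(store, 2)`; the last step of the refactoring is removing the shim class.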

by Intermernet11 hours ago

Naive doesn't mean bad. 99% of software can be written with understood, well-documented data structures. One of the problems with AI is that it allows people to create software without understanding the trade-offs of certain data structures, algorithms and more fundamental hardware management strategies.

You don't need to be able to pass a leet code interview, but you should know about big O complexity, you should be able to work out if a linked list is better than an array, you should be able to program a trie, and you should be at least aware of concepts like cache coherence / locality. You don't need to be an expert, but these are realities of the way software and hardware work. They're also not super complex to gain a working knowledge of, and various LLMs are probably a really good way to gain that knowledge.
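For a sense of scale, a trie (prefix tree) is only a few dozen lines. A minimal Python sketch, assuming string keys and a dict of children per node; the class and method names are illustrative:

```python
class TrieNode:
    def __init__(self) -> None:
        self.children = {}    # next character -> child TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            # Create the child on first use, then descend into it.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word: str) -> bool:
        """Exact-match lookup: the path must exist and end on a word."""
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix: str) -> bool:
        """Prefix lookup: only the path has to exist."""
        return self._walk(prefix) is not None

    def _walk(self, s: str):
        # Follow s character by character; None if the path breaks off.
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node
```

After inserting "car", "cart" and "cat", `contains("ca")` is False but `starts_with("ca")` is True; lookups cost O(length of key) regardless of how many words are stored.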

by dotancohen10 hours ago

Then don't let the AI write the data structures. I don't. I usually don't even let the AI write the class or method names. I give it a skeleton application and let it fill in the code. Works great, and I retain knowledge of how the application works.

by andsoitis11 hours ago

> This is the biggest issue I see with AI driven development. The data structures are incredibly naive.

Bill Gates, for example, always advocated for thinking through the entire program design and data structures before writing any code, emphasizing that structure is crucial to success.

by neocron10 hours ago

Ah Bill Gates, the epitome of good software

by andsoitis10 hours ago

> Ah Bill Gates, the epitome of good software

While developing Altair BASIC, his choice of data structures and algorithms enabled him to fit the code into just 4 kilobytes.

by dotancohen10 hours ago

Yes, actually. Gates wrote great software.

Microsoft is another story.

by jll299 hours ago

And Paul Allen wrote a whole Altair emulator so that they could use an (academic) Harvard computer for their little (commercial) project and test/run Bill Gates' BASIC interpreter on it.

by PaulDavisThe1st7 hours ago

I'd like to see Gates or anyone else do that for a project that lasts (at least) a quarter century and sees a many-fold increase in CPU speed, RAM availability, disk capacity etc.

by mock-possum8 hours ago

I’m really going to need to see both. There’s a lot of business logic that simply is not encoded in a data storage model.

by jerf9 hours ago

As I'm sure more and more people are using AI to document old systems, even just to get a foothold in them personally if they don't intend to share it, here's a hint related to that: By default, if you fire an AI at a programming base, at least in my experience you get the usual documentation you expect from a system: This is the list of "key modules", this module does this, this module does that, this module does the other thing.

This is the worst sort of documentation; technically true but quite unenlightening. It is, in the parlance of the Fred Brooks quote mentioned in a sibling comment, neither the "flowchart" nor the "tables"; it is simply a brute enumeration of code.

To which the fix is, ask for the right thing. Ask for it to analyze the key data structures (tables) and provide you the flow through the program (the flowchart). It'll do it no problem. Might be inaccurate, as is a hazard with all documentation, but it makes as good a try at this style of documentation as "conventional" documentation.

Honestly one of the biggest problems I have with AI coding and documentation is just that the training set is filled to the brim with mediocrity and the defaults are inferior like this on numerous fronts. Also relevant to this conversation is that AI tends to code the same way it documents and it won't have either clear flow charts or tables unless you carefully prompt for them. It's pretty good at doing it when you ask, but if you don't ask you're gonna get a mess.

(And I find, at least in my contexts, using opus, you can't seem to prompt it to "use good data structures" in advance, it just writes scripting code like it always does and like that part of the prompt wasn't there. You pretty much have to come back in after its first cut and tell it what data structures to create. Then it's really good at the rest. YMMV, as is the way of AI.)

by 0xpgm10 hours ago

Reminded me of this thread between Alan Kay and Rich Hickey where Alan Kay thinks "data" is a bad idea.

My interpretation of his point of view is that what you need is a process/interpreter/live object that 'explains' the data.

https://news.ycombinator.com/item?id=11945722

EDIT: He writes more about it on Quora. In brief, he says it is 'meaning', not 'data', that is central to programming.

https://qr.ae/pCVB9m

by gregw29 hours ago

Thanks for the pointer to this 2016 dialog!

One part of it has interesting new resonance in the era of agentic LLMs:

alankay on June 21, 2016:

> This is why "the objects of the future" have to be ambassadors that can negotiate with other objects they've never seen. Think about this as one of the consequences of massive scaling ...

Nowadays, rather than the methods associated with data objects, we are dealing with "context" and "prompts".

by 0xpgm9 hours ago

Quite a nice insight there!

I should probably be thinking more in this direction.

by johnmaguire9 hours ago

Hm, not sure. Data on its own (say, a string of numbers) might be meaningless - but structured data? Sure, there may be ambiguity but well-structured data generally ought to have a clear/obvious interpretation. This is the whole idea of nailing your data structures.

by 0xpgm9 hours ago

Yeah, structured data implies some processing on raw data to improve its meaning. Alan Kay seems to want to push this idea to encapsulate data with rich behaviour.

by christophilus9 hours ago

I’m with Rich Hickey on this one, though I generally prefer my data be statically typed.

by 0xpgm9 hours ago

Sure, static typing adds some sort of process that provides a coarse interpretation of the data.

by mchaver10 hours ago

I find languages like Haskell, ReScript/OCaml to work really well for CRUD applications because they push you to think about your data and types first. Then you think about the transformations you want to make on the data via functions. When looking at new code I usually look for the types first, specifically what is getting stored and read.
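Python has no ML-style algebraic data types, but a frozen dataclass plus an `Enum` gives a rough approximation of the types-first workflow described here. A sketch only; `Order`, `Status`, `ship` and `revenue_cents` are hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


# The stored shape comes first; a reader can start here.
@dataclass(frozen=True)
class Order:
    order_id: int
    customer: str
    total_cents: int
    status: Status


# Transformations are plain functions from data to new data.
def ship(order: Order) -> Order:
    if order.status is not Status.PENDING:
        raise ValueError(f"cannot ship an order that is {order.status.value}")
    return replace(order, status=Status.SHIPPED)  # a copy; the input is unchanged


def revenue_cents(orders: list[Order]) -> int:
    # In this sketch only shipped orders count as revenue.
    return sum(o.total_cents for o in orders if o.status is Status.SHIPPED)
```

Reading `Order` and `Status` first tells you what is stored and read; the functions are then just the edges between states. In Haskell or OCaml the compiler would additionally check that every `Status` case is handled.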

by embedding-shape10 hours ago

Similarly, that approach works really well in Clojure too, albeit with a lot less concern for types, but the "data and data structures first" principle is widespread in the ecosystem.

by mchaver8 hours ago

I've heard good things about Clojure, and it's different from what I am used to (bonus points, because I like an intellectual challenge), so trying it out is definitely on my to-do list.

by tangus10 hours ago

Aren't they basically saying opposite things? Perlis is saying "don't choose the right data structure, shoehorn your data into the most popular one". This advice might have made sense before generic programming was widespread; I think it's obsolete.

by Rygian9 hours ago

Pike: strongly typed logic is great!

Perlis: stringly typed logic is great!

by embedding-shape8 hours ago

> Perlis is saying "don't choose the right data structure, shoehorn your data into the most popular one"

I don't take it like that. A map could be the right data structure for something people typically reach for classes to do, and then you get a whole bunch of functions that can already operate on a map-like thing for free.

If you take a look at the standard library and the data structures of Clojure you'd see this approach taken to a somewhat extreme amount.
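A rough Python analogue of that idea, assuming hypothetical employee records: keep each record as a plain dict instead of a bespoke class, and a handful of generic helpers (loosely mirroring Clojure core functions such as `group-by` and `update`) then work on it, and on any other list of dicts, for free.

```python
# Records as plain dicts (hypothetical fields) instead of a bespoke class.
people = [
    {"name": "Ada", "dept": "eng", "salary": 120},
    {"name": "Grace", "dept": "eng", "salary": 130},
    {"name": "Linus", "dept": "ops", "salary": 110},
]


# Generic helpers: none of them knows anything about `people`.
def where(rows, **criteria):
    """Keep rows whose fields equal all the given criteria."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def pluck(rows, key):
    """Project a single field out of every row."""
    return [r[key] for r in rows]


def group_by(rows, key):
    """Map each distinct value of `key` to the list of rows that have it."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    return groups


def update(row, key, fn):
    """Return a copy of `row` with `fn` applied to one field."""
    return {**row, key: fn(row[key])}
```

For example, `pluck(where(people, dept="eng"), "name")` returns `["Ada", "Grace"]`, and `update(people[0], "salary", lambda s: s + 10)` returns a new dict while leaving `people[0]` untouched. The trade-off, as christophilus notes elsewhere in the thread, is that nothing checks the field names statically.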

by alberto-m10 hours ago

This quote from “Dive into Python”, which I read as a fresh graduate, is one of the most impactful lines I have ever read in a programming book.

> Busywork code is not important. Data is important. And data is not difficult. It's only data. If you have too much, filter it. If it's not what you want, map it. Focus on the data; leave the busywork behind.

by TYPE_FASTER10 hours ago

> Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

If I have learned one thing in my 30-40 years spent writing code, it is this.

by seanalltogether8 hours ago

I agree. The biggest lesson I try to drive home to newer programmers who join my projects is that it's always best to transform the data into the structure you need at the very end of the chain, not at the beginning or middle. Keep the data in its purest form and then transform it right before displaying it to the user, or right before providing it in the final API for others to consume.

You never know how requirements are going to change over the next 5 years, and pure structures are always the most flexible to work with.

by bluGill8 hours ago

Related: your business logic should work on metric units. It is a UI concern if the user wants to see some other measurement system. Convert to feet, chains, cubits... or whatever obscure measurement system the user wants at display time. (if you do get an embedded device that reports non-metric units convert when it comes in - you will get a different device in the future that reports different units anyway)

You still have to worry about someone using kg when you use g, but you avoid a large class of problems and make your logic easier.
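A minimal sketch of that split, assuming a hypothetical device that reports lengths in feet. Input is converted to metres at the boundary, the core logic only ever sees metres, and conversion back happens only when rendering. The international foot is defined as exactly 0.3048 m.

```python
from __future__ import annotations

METRES_PER_FOOT = 0.3048  # exact, by definition of the international foot


def from_feet(feet: float) -> float:
    """Boundary conversion, e.g. for a (hypothetical) sensor reporting feet."""
    return feet * METRES_PER_FOOT


def total_length_m(segments_m: list[float]) -> float:
    """Business logic: inputs and output are always metres."""
    return sum(segments_m)


def display_length(metres: float, unit: str) -> str:
    """UI concern: convert to the user's preferred unit only when rendering."""
    if unit == "m":
        return f"{metres:.2f} m"
    if unit == "ft":
        return f"{metres / METRES_PER_FOOT:.2f} ft"
    raise ValueError(f"unsupported display unit: {unit!r}")
```

With a 2 m segment and a 10 ft reading, `total_length_m([2.0, from_feet(10)])` is about 5.048 m, which displays as "5.05 m" or "16.56 ft". Adding cubits means touching `display_length` only.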

by dcuthbertson9 hours ago

But doesn't No. 2 directly conflict with Pike's 5th rule? It seems to me these are all aphorisms that have to be taken with a grain of salt.

> 2. Functions delay binding; data structures induce binding. Moral: Structure data late in the programming process.

by linhns10 hours ago

Nice to see Perlis mentioned once in a while. Reading SICP again, still learning new things.

by WillAdams10 hours ago

There is a matching video series for SICP:

https://ocw.mit.edu/courses/6-001-structure-and-interpretati...

which I found very helpful in (finally) managing to get through that entire text (and do all the exercises).

by Hendrikto10 hours ago

I feel like these are far more vague and less actionable than the 5 Pike rules.

by JanisErdmanis10 hours ago

With 100 functions and one data structure, it is almost like programming with global variables, where a new instance is equivalent to a new process. Doesn’t seem like a good rule to follow.

by embedding-shape9 hours ago

The scope of where that data structure or functions are available is a different concern though, "100 functions + 1 data structure" doesn't require globals or private, it's a separate thing.

by JanisErdmanis6 hours ago

One can always look at global variables as equivalent to a context object that is passed to every function. It’s just a syntactic difference whether one constructs such a data structure explicitly or uses it implicitly via globals.

What I am getting at is that when one has such gigantic data structure there is no separation of concerns.

by CyberDildonics4 hours ago

Does one need one's separation of concerns if one's concerns shouldn't be separated in the first place?

Anytime one has access to a database, one has access to one large global data structure that one can access from anywhere in a program.

This same concept goes for one's global state in one's game if one is making a game.

by JanisErdmanis4 hours ago

Separation of concerns is still a valid paradigm with a single global data structure, such as a GUI, a microservice, a database, etc. In such a situation one can still separate concerns by composing the global data structure from smaller units and defining methods with respect to those smaller units. That way one does not need to wonder whether there are unintended side effects when calling a function that mutates the state.

by CyberDildonics3 hours ago

Seems like one is backpedaling because one was just talking about one's separation of one's concerns and now one is defending one's separation of concerns with respect to one's global data structure.

by JanisErdmanis2 hours ago

I still firmly believe that one ctx object and a hundred functions/methods is as bad as programming with plain variables defined in the global scope. If the ctx is composed of smaller data structures on which the functions are defined, then all is good. This is the opposite of the rule.
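A minimal sketch of the composed-context alternative described in this comment, using a hypothetical game state: one top-level object, but built from smaller units, with each function taking only the unit it is allowed to mutate. All names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    items: dict = field(default_factory=dict)  # item name -> count


@dataclass
class Wallet:
    gold: int = 0


@dataclass
class GameState:
    # The single "global" structure, composed of smaller units.
    inventory: Inventory = field(default_factory=Inventory)
    wallet: Wallet = field(default_factory=Wallet)


# Each function is defined against the smallest unit it needs, so its
# signature documents which part of the state it can touch.
def add_item(inv: Inventory, name: str, n: int = 1) -> None:
    inv.items[name] = inv.items.get(name, 0) + n


def spend(wallet: Wallet, amount: int) -> None:
    if amount > wallet.gold:
        raise ValueError("not enough gold")
    wallet.gold -= amount


def buy(state: GameState, name: str, price: int) -> None:
    # Composite operation: touches exactly two sub-structures.
    # spend() runs first, so a failed purchase leaves the inventory unchanged.
    spend(state.wallet, price)
    add_item(state.inventory, name)
```

Only `buy` sees the whole `GameState`; `add_item` and `spend` cannot reach anything outside their own unit, which is the "no unintended side effects" property being argued for.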

by CyberDildonics1 hours ago

But why?

You keep saying you believe it, but that is literally what a database is, and the same goes for game state manipulation, string manipulation, iterator algorithms, list comprehensions, range algorithms, image manipulation, etc. These are all instances where you use the same data structures over and over with as many algorithms and functions as you need.

by Pxtl10 hours ago

As much as relational DBs have held back enterprise software for a very long time by being so conservative in their development, the fact that they force you to put this relationship absolutely front-of-mind is excellent.

by embedding-shape10 hours ago

I'd personally consider "persistence" AKA "how to store shit" to be a very different concern from the data structures that you use in the program. Ideally, your design shouldn't care about how things are stored, unless there is a particular concern about how fast reads/writes are.

by mosura8 hours ago

Often significant improvements to every aspect of a system that interacts with a database can be made by proper design of the primary keys, instead of the generic id way too many people jump to.

The key difficulty is that identifying what these are is far from obvious upfront, and so often an index appears adjacent to a table that represents what the table should have been in the first place.
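A small SQLite sketch of the two shapes, assuming a hypothetical student/course enrollment table: a generic `id` key with the real identity pushed into an adjacent unique index, versus the domain pair used directly as a composite primary key. (Pxtl's reply in this subthread argues for the surrogate key, so this illustrates one side of the trade-off, not a verdict.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Generic surrogate key: the real identity (student, course) ends up in a
# separate unique index sitting next to the table.
conn.execute(
    """CREATE TABLE enrollment_generic (
           id         INTEGER PRIMARY KEY,
           student_id INTEGER NOT NULL,
           course_id  INTEGER NOT NULL
       )"""
)
conn.execute(
    "CREATE UNIQUE INDEX enrollment_generic_uk "
    "ON enrollment_generic (student_id, course_id)"
)

# Key taken from the domain: the pair *is* the row's identity, so no extra
# index is needed. WITHOUT ROWID (SQLite-specific) stores rows ordered by it.
conn.execute(
    """CREATE TABLE enrollment (
           student_id INTEGER NOT NULL,
           course_id  INTEGER NOT NULL,
           PRIMARY KEY (student_id, course_id)
       ) WITHOUT ROWID"""
)


def enroll(student_id: int, course_id: int) -> bool:
    """Return True if inserted, False if the enrollment already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO enrollment (student_id, course_id) VALUES (?, ?)",
                (student_id, course_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = enroll(1, 101)
duplicate = enroll(1, 101)  # rejected by the primary key itself
```

In the natural-key version, "list a student's courses" and "is this pair enrolled?" are both served by the table's own key rather than a side index.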

by embedding-shape8 hours ago

I guess that might be true also, to some extent. Most of the times I've seen something "messy" in software design, it has almost always been about domain code being made overly complicated compared to what it has to do, and almost never about how domain data gets written to or read from a database, although it's very common. Of course storage/persistence isn't non-essential; it's just a less common problem than the typical design/architecture spaghetti I encounter.

by Pxtl8 hours ago

I'm a firm believer in always using an auto-generated surrogate key for the PK because domain PKs always eventually become a pain point. The problem is that doing so does real damage to the ergonomics of the DB.

This is why I fundamentally find SQL too conservative and outdated. There are obvious patterns for cross-cutting concerns that would mitigate things like this but enterprise SQL products like Oracle and MS are awful at providing ways to do these reusable cross-cutting concerns consistently.

by Pxtl8 hours ago

I meant to reply to a different comment originally, specifically the one including this quote from Torvalds:

> Good programmers worry about data structures and their relationships.

> -- Linus Torvalds

I was specifically thinking about the "relationship" issues. The worst messes to fix are the ones where the programmer didn't consider how to relate the objects together - which relationships need to be direct PK bindings, which can be indirect, which things have to be cached vs calculated live, which things are the cache (vs the master copy), what the cardinality of each relationship is, which relationships are semantically ownerships vs peers, which data is part of the system itself vs configuration data vs live, how you handle changes to the data (event sourcing vs changelogging vs append-only vs yolo update), etc.

Not quite "data structures" I admit but absolutely thinking hard about the relationship between all the data you have.

SQL doesn't frame all of these questions out for you, but it's good at getting you to start thinking about them in a way you might not otherwise.

by DaleBiagio10 hours ago

"9. It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures."

That's great

by mpalmer11 hours ago

Was the "J" short for "Cassandra"?

    When someone says "I want a programming language in which I need only say what I wish done," give him a lollipop.

by bandrami10 hours ago

Also basically everything DHH ever said (I stopped using Rails 15 years ago but just defining data relationships in YAML and typing a single command to get a functioning website and database was in fact pretty cool in the oughts).

by mosura11 hours ago

Perlis is just wrong in that way academics so often are.

Pike is right.

by Intermernet11 hours ago

Hang on, they mostly agree with each other. I've spoken to Rob Pike a few times and I never heard him call out Perlis as being wrong. On this particular point, Perlis and Pike are both extending an existing idea put forward by Fred Brooks.

by mosura10 hours ago

Perlis absolutely is not saying the same thing, and as the commenter notes the functional community interpret it in a particularly extreme way.

I would guess Pike is simply wise enough not to get involved in such arguments.

by jacquesm10 hours ago

Perlis is right in the way that academics so often are and Pike is right in the way that practitioners often are. They also happen to be in rough agreement on this, unsurprisingly so.

by hrmtst9383710 hours ago

Treating either as gospel is lazy, Perlis was pushing back on dogma and Pike on theory, while legacy code makes both look cleaner on paper.

by AnimalMuppet10 hours ago

Could you be more specific?

by mosura10 hours ago

Promoting the idea of one data structure with many functions contradicts:

“If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident.”

And:

“Use simple algorithms as well as simple data structures.”

A data structure general enough to solve enough problems to be meaningful will either be poorly suited to some problems or have complex algorithms for those problems, or both.

There are reasons we don’t all use graph databases or triple stores, and rely on abstractions over our byte arrays.

by AnimalMuppet8 hours ago

I think you are badly misinterpreting the statement.

Let's say you're working for the DMV on a program for driver's licenses. The idea is to use one structure for driver's license data, as opposed to using one structure for new driver's licenses, a different one for renewals, and yet a third for expired ones, and a fourth one for name changes.

It is not saying that you should use byte arrays for driver's license records, so that you can use the same data structure for driver's license data and missile tracks. Generalize within your program, not across all possible programs running on all computers.
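A sketch of that DMV reading, assuming hypothetical field names: a single `License` record covers new, renewed, expired and renamed licenses, and each event is just another function over it rather than another record type.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    EXPIRED = "expired"


# One record type for every license, whatever has happened to it.
@dataclass(frozen=True)
class License:
    number: str
    holder_name: str
    expires: date
    status: Status = Status.ACTIVE


# Many functions over the one structure, instead of one structure per event.
def issue(number: str, name: str, expires: date) -> License:
    return License(number, name, expires)


def renew(lic: License, new_expiry: date) -> License:
    return replace(lic, expires=new_expiry, status=Status.ACTIVE)


def expire_if_due(lic: License, today: date) -> License:
    # Returns the same object untouched if it is not yet past its expiry.
    return replace(lic, status=Status.EXPIRED) if today > lic.expires else lic


def change_name(lic: License, new_name: str) -> License:
    return replace(lic, holder_name=new_name)
```

Any code that prints, stores or validates a `License` works for every one of these cases, because there is only one shape to handle.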

by mosura7 hours ago

Your admittedly exaggerated example is arguing against the entire concept of relational databases, which is not a winning proposition.

You do not write programs with one map of id to thing as you are suggesting here.
