> Do we, really?

Yes, or pretty close to it. What we don't know how to do (AFAIK) is do it at a cost that would be acceptable for most software. So yes, it mostly gets done for (components of) planes, spacecraft, medical devices, etc.

Totally agreed that most software is a morass of bugs. But giving examples of buggy software doesn't provide any information about whether we know how to make non-buggy software. It only provides information about whether we know how to make buggy software—spoiler alert: we do :)

reply
There is a huge wetware problem too. Like if I can send you an email or other message that tricks you and gets you to send me $10k, what do I care if the industry is 100% effective at blocking RCE?
reply
The social hack executed in digital space. 100% agree.
reply
> So yes, it mostly gets done for (components of) planes, spacecraft, medical devices, etc.

I have to disagree here. All of the ones you mentioned regularly have bugs. Multiple spacecraft have been lost because of them. For planes there's the not-so-distant Boeing 737 MAX fiasco (admittedly this was bad software behavior caused by sensor failure). And for medical devices, news about their bugs pops up semi-regularly. So while the software for these might do a bit better than the rest, it is certainly not anywhere close to being bug-free.

And the same goes for the specifications the software is based on. Those aren't bug-free either. And writing software based on a flawed specification will inevitably result in flawed software.

That's not to say we should give up on trying to write bug free software. But we currently don't know how to do so.

reply
That software also often has bugs. It's usually a bit more likely that they are documented, though, and unlikely to cause a significant failure on their own.
reply
Building around bugs that you know exist, but can't locate, is also a part of it. Reliability in the face of bugs. The mere existence of bugs isn't enough to call the software buggy if the outcome is reliable (e.g., triple modular redundancy).
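
For what it's worth, here is a minimal sketch of the voting idea behind triple modular redundancy, in Python. The function names and the deliberately buggy replica are made up for illustration, and real TMR usually votes across independent hardware or independently written implementations, which this does not model.

```python
from collections import Counter

def majority_vote(results):
    """Return the value that a strict majority of replicas agree on."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("replicas disagree: no majority")
    return value

def run_redundant(replicas, x):
    """Run every replica on the same input and vote on the outputs."""
    return majority_vote([replica(x) for replica in replicas])

# Hypothetical replicas: two correct, one with a bug that only shows for x == 3.
def square_a(x):
    return x * x

def square_b(x):
    return x ** 2

def square_buggy(x):
    return x * x + 1 if x == 3 else x * x

replicas = [square_a, square_b, square_buggy]
print(run_redundant(replicas, 3))  # prints 9: the buggy replica is outvoted
```

The bug is still there; the system as a whole just doesn't act on it, as long as the replicas don't share the same fault.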
reply
For a silly example, see how Python programs have plenty of bugs, but they still (usually) don't allow for the kind of memory exploits that C programs give you.

You could say that Python is designed around preventing these memory bugs.
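
A small sketch of what that looks like in practice, assuming plain Python code on a standard interpreter (C extensions and `ctypes` can still break these guarantees). The `read_slot` helper is a made-up name for illustration.

```python
def read_slot(buffer, index):
    """Return buffer[index], or None if the index is out of range."""
    try:
        return buffer[index]
    except IndexError:
        # The interpreter refused the access; no neighbouring memory was read.
        return None

buffer = [10, 20, 30]
print(read_slot(buffer, 3))   # None: out of range raises IndexError instead of reading past the end
print(read_slot(buffer, -1))  # 30: negative indices count from the end, so a logic bug is still possible
```

The second call is the "plenty of bugs" part: an off-by-one can still hand back the wrong element, but it is always an element of the list, never whatever happens to sit next to it in memory.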

reply
Then we can't do it. Cost is a requirement.
reply
Cost is a parameter subject to engineering tradeoffs, just like performance, feature sets, and implementation time.

Security and reliability are also parameters that exist on a sliding scale; the industry has simply chosen to slide the "cost" parameter all the way to one end of the spectrum. As a result, the number of bugs and hacks observed are far enough from the desired value of zero that it's clear the true requirements for those parameters cannot be honestly said to be zero.

reply
> the number of bugs and hacks observed are far enough from the desired value of zero

Zero is not the desired number, particularly not when discussing "hacks". This may not matter in the current situation, but there's a lot of "security maximalism" in industry conversations today, and people seem not to realize that dragging the "security" slider all the way to the right means not just the costs becoming practically infinite, but also the functionality and utility of the product falling down to 0.

reply
I know a lot of security researchers will disagree with this notion, but I personally think that security (& privacy; I'm going to refer to both as "security" for brevity here) is an overhead. I think that's why it needs to exist *and be discussed* as a sliding scale. I do find a lot of people in this space chase some ideal without consideration for practicality.

Mind, I'm not talking about financial overhead for the company/developer(s), but rather a UX overhead for the user. It often increases friction and might even need education/training to make use of the software it's attached to. It's much like how body armor increases the weight one has to carry and decreases mobility; security has (conceptually) very similar tradeoffs (cognitive instead of physical overhead, and time/interactions/hoops instead of mobility). Likewise, sometimes one might pick a lighter Kevlar suit, whereas other times a ceramic plate is appropriate.

Now, body armor is still a very good idea if you're expecting to be engaged in a fight, but I think we can all agree that not everyone on the street in, say, a random village in Austria, needs to wear ceramic plates all the time.

The analogy does have its limits, of course ... for example, one issue with security (which firmly slides it towards erring on the safe side) as compared to warfare is that you generally know if someone shot at you and body armor saved you; with security (and, again, privacy), you often won't even know you needed it even if it helped you. And both share the trait that if you needed it and didn't have it, it's often too late.

Nevertheless, whether worth it or not (and to be clear, I think it's very worth it), I think it's important that people don't forget that this is not free. There's no free lunch --- security & privacy are no exception.

Ultimately, you can have a super-secure system with an explicit trust system that will be too much for most people to use daily; or something simpler (e.g. Signal) that sacrifices a few guarantees to make it easier to use ... with the lower barrier to entry ensuring more people have at least a baseline of security & privacy in their chats.

Both have value and both should exist, but we shouldn't pretend the latter is worthless because there are more secure systems out there.

reply
In my experience, the proper infosec professionals both know and balance this well; it's the amateurs and posers who get it wrong.
reply
> utility of the product falling down to 0.

Today a bank really sent me a legitimate email about trying their new site. Went over, it was their site alright, logged in with correct username and password - poof, instantly blocked for suspicious access (from my usual home machine), call helpline to fix.

Now that's safe ... and useless. But safe.

reply
Reminds me of repl.it, which perma-blocked my newly created account before I even had a chance to type in the e-mail verification code; in fact the notice about the account block came before the one-time e-mail verification code.

I still wonder what I did wrong (support isn't responsive). But it's true that we're both safe from having a user/vendor relationship now.

reply
Is it the industry making this choice or the customer?

You could make a car that's safer than others at 10x the price but what would the demand look like at that price?

Would you pay 2x for your favourite software and forego some of the more complex features to get a version with half the security issues?

reply
Sometimes I want VSCode and sometimes I want Notepad.

Well.. except that I never want either of those. So sometimes I want Kate editor and sometimes I want Akelpad.

reply
The question was not if it was possible within price boundary X, but if it was possible at all. There is a difference, please don't confound possibility with feasibility.
reply
Is having problematic features that cause problems also a requirement?

The answer to the above question will reveal if someone is an engineer or an electrician/plumber/code monkey.

In virtually every other engineering discipline engineers have a very prominent seat at the table, and the opposite is only true in very corrupt situations.

reply
Unlimited budget and unlimited people won't solve unlimited problems with perfection.

Even basic theorems of science are incorrect.

reply
Also people keep insisting on using unsafe languages like C.

It depends on exactly what you are doing, but there are many languages which are efficient to develop in if less efficient to execute like Java and Javascript and Python, which are better in many respects, and other languages which are less efficient to develop in but more efficient to run, like Rust. So at the very least it is a trilemma and not a dilemma.

reply
C is about the safest language you can choose: between CBMC, Frama-C and Coccinelle, there is hardly another language with comparable tooling for writing actually safe software that you can actually securely run on single-core hardened systems. I would be really interested to hear the alternatives, though!
reply
> if less efficient to execute like Java and Javascript and Python

One of these is not like the others...

Java (JVM) is extremely fast.

reply
JVM is fast for certain use cases but not for all use cases. It loads slowly, takes a while to warm up, generally needs a lot of memory and the runtime is large and idiosyncratic. You don't see lots of shared libraries, terminal applications or embedded programs written in Java, even though they are all technically possible.
reply
The JVM has been extremely fast for a long, long time now. Even JavaScript is really fast, and if you really need performance there are also others in the same performance class like C#, Rust, Go.

Hot take, but: Performance hasn’t been a major factor in choosing C or C++ for almost two decades now.

reply
I think it is the perception of performance instead of the actual performance, and also that C/C++ encroaches on “close to the metal” assembly for many applications. (E.g. when I think how much C moves the stack pointer around meaninglessly in my AVR-8 programs it drives me nuts, but AVR-8 has a hard limit and C programs are portable to the much faster ESP32 and ARM.)

A while back when my son was playing Chess I wrote a chess engine in Python and then tried to make a better one in Java which could respect time control. It was not hard to make the main search routine work without allocating memory, but when I tried to do transposition tables with Java objects it made the engine slower, not faster. I could have implemented them with off-heap memory, but around that time my son switched from Chess to guitar so I started thinking about audio processing instead.

The Rust vs Java comparison is also pointed. I was excited about Rust the same way I was excited about Cyclone when it came out, but seeing people struggle with async is painful for me to watch, and it makes it look like the whole idea doesn’t really work when you get away from what you can do with stack allocation. People think they can’t live with Java’s GC pauses.

reply
The language plays a role, but I think the best example of software with very few bugs is something like qmail and that's written in C. qmail did have bugs, but impressively few.

Writing code that carefully, however, is really not something you just do; it would require a massive improvement of skills overall. The majority of developers simply aren't skilled enough to write something anywhere near the quality of qmail.

Most software also doesn't need to be that good, but then we need to be more careful with deployments. The fact that someone just installs WordPress (which itself is pretty good in terms of quality) and starts installing plugins from untrusted developers indicates that many still don't have a security mindset. You really should review the code you deploy, but I understand why many don't.

reply
I was a qmail fanboi back in the day and loved how djb wrote his own string handling library. I built things with qmail that were much more than an email server (think cgi-bin for web servers) and knew the people who ran the largest email installation in the world (not sure how good they were about opt-in…)

Djb didn’t allow forking and repackaging, so qmail did not keep up with an increasingly hostile environment. It got so bad that when the love letter virus came out it was insufficient to add content filtering to qmail, and I had to write scripts that blocked senders at the firewall. Security was no longer a 0 and 1 problem. It was certainly possible to patch up and extend qmail to survive in that environment, but there was something to be said for having it all in one nice package…. And once the deliverability crisis started, I gave up on running email servers entirely.

reply
qmail was a lot of fun, so were djbdns and daemontools, but you're right, it failed to keep up and DJB's attitude didn't help.

We built a weird solution where two systems would sync data via email. Upstream would do a dump from an Oracle database, pipe it to us via SMTP and a hook in qmail would pick up the email, get the attachment and update our systems. I remember getting a call one or two years after leaving the organisation, the new systems administrator wanted to know how their database was always kept up to date. It worked brilliantly, but they felt unsafe not knowing how. I really should have documented that part better.

reply
> No matter where I look, up and down the stack, across different OSes and tech stacks, there are bugs.

I’m not sure I’d go quite as far as GP, but they did caveat that we often choose not to write software with few bugs. And empirically, that’s pretty true.

The software I’ve written for myself, or where I’ve taken the time to do things better or rewrite parts I wasn’t happy with, has had remarkably few bugs. I have critical software still running, unmodified, at former employers which hasn’t been touched in nearly a decade. Perhaps not totally bug-free, but close enough that the bugs haven’t been noticed or mattered enough to bother pushing a fix and cutting a release.

Personally I think it’s clear we have the tools and capabilities to write software with one or two orders of magnitude fewer bugs than we choose to. If anything, my hope for AI-coded software development is that it drops the marginal cost difference between writing crap and writing good software, rebalancing the economic calculus in favor of quality for once.

reply
> I’m not sure I’d go quite as far as GP, but they did caveat that we often choose not to write software with few bugs. And empirically, that’s pretty true.

Blame PMs for this. Delivering by some arbitrary date on a calendar means that something is getting shipped regardless of quality. Make it functional for 80% of use, then we'll fix the remaining bits in later releases. However, that doesn't happen, as the team is assigned new tasks because new tasks/features are what bring in new users, not fixing existing problems.

reply
I don’t disagree but is the alternative unbounded dev where you write code until it’s perfect? That doesn’t sound like a better business outcome. The trade off can’t be “take as long as you want”
reply
“The alternative is that nothing will ever get released because devs will take forever making it perfect” is a really lame take.

We have literally countless examples of software that devs have released entirely of their own volition when they felt it was ready.

If anything, in my experience, software that’s written a little slower and to a higher standard of quality is faster-releasing in the long (and medium) run. You’d be shocked at how productive your developers are when they aren’t task-switching every thirty minutes to put out fires, or when feature work isn’t constantly burdened by having to upend unrelated parts of the code due to hopelessly interwoven design.

reply
I'm happy to be reoriented with examples. Please provide some? You said countless but mentioned none.
reply
I'd say SQLite is one good example:

https://sqlite.org/chronology.html

Regular releases for over a quarter of a century now, and it's renowned for its reliability.

reply
TeX was pretty bug-free.
reply
I think PMs fail to understand categories of change in terms of complexity because they focus on the user-facing surface and deal in timelines. A change that brings in a big feature can be straightforward because it perfectly fits the existing landscape. A seemingly trivial change can have a lot of complexities that are hard to predict in terms of timelines.

There is also the angle of asking for an estimate without allocating time for the estimation itself.

For lack of a better word, I think it should derive from "complexity". The hardness of an estimate should be inversely proportional to the complexity. Adding a field to a UI when it is also exposed via the API is generally low complexity, so my estimate would likely hold. We can provide an estimate for a major change, but the estimate would be soft and subject to stretch, and it is the role of the PM to communicate it accordingly to the stakeholders.

reply
Some coding doesn't fit your schedule. If you've scheduled 2 weeks, but it takes 3, then it takes 3. Scheduling it to take 2 does nothing to actually make the coding faster.
reply
3 sounds fine.

Then I ask: why not add a week to how long that thing will take, meaning it stretches two sprints (or whatever you call it).

Add upfront. Then if you get to a hard convo where someone says “do it sooner” you say “not possible.”

reply
The fundamental problem remains: it’s difficult to predict how long it will take to solve a series of puzzles. I worked in a dev group where we’d take the happy path estimate and double it… it didn’t help much. So often I’d think something would take me a week, so two weeks were allotted, but I made a discovery in my first like hour/day whatever that reduced the dev time to like a couple days. Then, there were tasks that I thought I’d solve in a few days that took me weeks because I couldn’t foresee some series of problems to overcome. Taking a guess and adding time to it just shifts the endpoint of the guess. That didn’t help us much.
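
To make the "just shifts the endpoint" part concrete, here is a rough sketch in Python. The task numbers are hypothetical, invented purely for illustration, and it uses a flat buffer rather than doubling so the arithmetic is exact.

```python
from statistics import mean, pstdev

# Hypothetical (estimate_days, actual_days) pairs, invented for illustration.
TASKS = [(5, 2), (5, 5), (3, 15), (5, 6), (2, 10)]

def misses(tasks, buffer_days=0):
    """How far each task overran (+) or underran (-) its padded estimate."""
    return [actual - (estimate + buffer_days) for estimate, actual in tasks]

raw = misses(TASKS)
padded = misses(TASKS, buffer_days=5)

print(mean(raw), mean(padded))      # the average miss moves by exactly the buffer
print(pstdev(raw), pstdev(padded))  # the spread of the misses does not move at all
```

Padding changes where the misses are centred, not how widely they scatter. Doubling rescales things differently, but it does nothing about the unpredictability of the actuals either.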
reply
That's the point I am making, and the point of asking "what is the alternative"

Developers aren't alone in adhering to schedules. Many folks in many roles do it. All deal with missed deadlines, success, expectation management, etc. No one operates in magical no-timeline land unless they do not at all answer to anyone or any user. Not the predominant model, right?

So rather than just say "you can blame the PMs" I'd love to hear a realistic-to-business flow idea.

I am not saying I have the answers or a "take". I've both asked for and been asked for estimates and many times told people "I can't estimate that because I don't know what will happen along the way."

So, it's not just PMs. It's the whole system. Is there a real solution or are we pretending there might be? Honest inquiry.

reply
Software release dates are so arbitrary though. We no longer make physical media that needs time to make and ship. Why does software need to be released on February 15th instead of March 7th?
reply
You could ask the same question about the contents of the release. Why does software need to be released with features X, Y, and Z on March 7th when it could be released with features X and Y on February 15th?

It's inevitable that work will slip. That doesn't necessarily mean the release will slip. Sometimes you actually need the thing, but often the work is something you want to include in the release but don't absolutely have to. Then you can decide which tradeoff you prefer, delaying the release or reducing its scope.

reply
This is the direction of my thinking, too.

Earlier discussion focuses on writing software at a slower pace to inject more accuracy and robust thinking/design/code. Conceptually, yes, I get it!

But in numerous practical scenarios, some adherence to a recurring schedule seems like the only way to align software to business outcomes. My thinking is tied more to enterprise products (both external and internal) rather than open-source.

I like an active dialog with engineers. (I'm neither SWE nor PM). Let's talk together about estimates. What's possible and not possible. Where do you feel most uncertain, most certain. What dependencies/externalities do you expect to cause problems.

Those conversations help me (business/analytics-side) do things like adjust my own deadlines, schedules. Communicate with c-suite to realign on what's possible and not. Adjust time.

reply
The main problem I’ve had is the unpredictability of where the complexity lies. Unless you’ve done exactly what you’re doing, before, with the same tools and requirements, there’s a good chance that some discrete trivial aspect could take up an incredible amount of time, and that won’t indicate whether the main goal will take more or less time. I’ve worked both as a developer and as a designer, and while some aspects of design can be really nebulous and uncertain compared to dev work, it lacks some of the unpredictability — it’s not like I’m going to unexpectedly have to re-make the logo.

I feel for anyone that has to wrangle these tasks into a business-consumable time frame.

reply
> The main problem I’ve had is the unpredictability of where the complexity lies. Unless you’ve done exactly what you’re doing, before, with the same tools and requirements, there’s a good chance that some discrete trivial aspect could take up an incredible amount of time, and that won’t indicate whether the main goal will take more or less time

What a great articulation. Completely agree.

This is why I don't blame PMs any more than devs, or any more than business folks throwing requirements at PMs. It's possible to find fault everywhere.

I think the broader problem is scale and growth. Many people in many roles are caught in growth-mind or scale-mind companies where the business wants to operate at a velocity that may not align with the realistic development work we're discussing. PMs are similarly caught with less time to understand, scope, plan, etc. Business folks ask questions like "why isn't this ready" to devs that may not understand the reasons why the business operates the way it does, or the business at all.

Full disclosure: I'm in insurance. Seeing lots of these problems play out in front of me. C-suite moving at speed 100, devs moving at a perceived speed of 50. Silos and communication problems and unclear requirements up and down the stack.

So, in my interactions, the way I try to help is just to understand the most basic components and their ability to come alive or not. Is there anything to show? Yes, ok - let's celebrate a small win. Is there a rather large delay? Why - ok, let's use that to reinforce building something robust vs. crap.

But, there are schedules! Someone above mentioned sqlite. Another example comes to mind: Obsidian. I think they're anomalies (good ones) rather than examples that broadly prove the point to slow down.

reply
> Why does software need to be released on February 15th instead of March 7th?

Because it has to be released at some point, and without picking a point in advance, you can never reach it.

https://en.wikipedia.org/wiki/Parkinson%27s_law

reply
I disagree with that entirely. Some features just take longer to develop. If that feature is part of the release, then release it when it is finished and not kind of working. If that feature is just not achievable, then PMs have really screwed up their role by putting it in the release in the first place.
reply
You assume that PMs will just accept whatever estimate you give and not just say 2 weeks from the off and refuse to budge.
reply
So, could you say "ok, but I still can't do that"?
reply
In this day and age of code-in-bulk enabled by AI, they will find someone who does in a blink of an eye.
reply
Hearty agree. I think the PMs fall victim to a wildly optimistic imagination of how fast and easy it will be to correct from “good enough to not get yelled at by CEO for not shipping by X date” to “works correctly and isn’t creating more bugs” - and importantly, it seems like they repeat this mistake every project, compounding the problem. So we perpetually have an increasing number of hacks, interacting with each other to cause difficult issues, all of which the PM says we will fix next sprint, just as soon as we ship one more Important Feature.

Not all orgs of course. But most I’ve personally seen seem to be like this.

reply
I think this discussion distracts a bit from the main point.

The main point is that there are super widespread software systems in use that we know aren't secure, and we certainly could do better if we (as the industry, as customers, as vendors) really wanted.

A prime example is VPN appliances ("VPN concentrators") to enable remote access to internal company networks. These are pretty much by definition Internet-facing, security-critical appliances. And yet, all such products from big vendors (be they Fortinet, Cisco, Juniper, you name it) had a flood of really embarrassing, high-severity CVEs in the last few years.

That's because most of these products are actually from the 80s or 90s, with some web GUIs slapped on, often dredged through multiple company acquisitions and renames. If you asked a competent software architect to come up with a structure and development process that are much less prone to security bugs, they'd suggest something very different, more expensive to build, but also much more secure.

It's really a matter of incentives. Just imagine a world where purchasing decisions were made to optimize for actual security. Imagine a world where software vendors were much more liable for damage incurred by security incidents. If both came together, we'd spend more money on up-front development / purchase, and less on incident remediation.

reply
But we need to be careful: such strict liability rewards larger companies that can afford such risk. Small companies and freelancers could be hung out to dry.
reply
> Do we, really?

Yes. There’s a ton of lessons learned, best practices, etc. We’ve known for decades.

It’s just expensive and difficult. Since end users seem to have no issue paying for crud, why bother?

reply
> > We know how to write software with very few bugs

> Do we, really? Because a week doesn’t go by when I don’t run into bugs of some sort.

I mean, we do know how to do it, but we don't because business needs tend to throw quality under the bus in exchange for almost everything else: (especially) speed to develop, but also developer comfort, feature cram, visual refreshes, and so on always trump bugs, so every project ends up with bugs.

I have a few hobby projects which I would stick my neck out and say have no bugs. I know, I'm going to get roasted for this claim, but the projects are ultra simple enough in scope, and I'm under no pressure to ever release them publicly, so I was able to prioritize getting them right. No actual businesses are going to be doing this level of polish and care, and they all need to cut corners and actually ship, so they have bugs. And no ultra-complex project (even if it's done with love and care) is capable of this either, purely due to its size and number of moving parts.

So, it's not like we don't know how to do it, but that we choose not to for practical reasons.

reply
The simplest recipe for writing "almost bug-free" software is:

  1.  Freeze the set of features.
  2.  Continue to pay programmers to polish the software for several years while it is being actively used by many people.
  3.  Resist adding new features or updating the software to feel modern.

If you do that, your program will asymptotically approach zero bugs.

Of course, your users will complain about missing features, how ugly and ancient your products look, and how they wished you were more like your buggy competitors.

And if your users are unhappy, then you probably lose the "used heavily by a lot of people" part that reveals the bugs.
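
A toy model of that recipe, sketched in Python. Everything here is an assumption made for illustration: that each polishing period fixes a constant fraction of the remaining bugs, and that feature work adds a constant number of new bugs per period. Real projects are messier than this.

```python
def remaining_bugs(initial, fix_rate, new_bugs_per_period, periods):
    """Bug count after each period: fix a fraction of what's left, then add new ones."""
    bugs = float(initial)
    history = [bugs]
    for _ in range(periods):
        bugs = bugs * (1 - fix_rate) + new_bugs_per_period
        history.append(bugs)
    return history

# Hypothetical numbers: 100 known-or-unknown bugs, 20% of them fixed per period.
frozen = remaining_bugs(100, 0.2, 0, 50)    # features frozen: no new bugs arrive
churning = remaining_bugs(100, 0.2, 2, 200) # two new bugs per period from feature work

print(frozen[-1])    # keeps shrinking toward zero
print(churning[-1])  # settles near new_bugs_per_period / fix_rate instead
```

Under these assumptions the frozen program decays geometrically toward zero, while the one that keeps gaining features levels off at a floor (here 2 / 0.2 = 10) no matter how long you polish.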

reply
There is no system without exploitable breaches, whether technical or social. The real questions are who has the incentive to exploit them, how much it costs to run an attempt, how many resources they control, and how much of that they are ready to throw at attempts.
reply
> Do we, really?

Formal verification to EAL7[0] in theory, as long as your requirements are correct.

In practice I'm not aware of any bugs being discovered in any EAL7 software, but it's so expensive there isn't a lot of it.

[0]https://en.wikipedia.org/wiki/Evaluation_Assurance_Level

reply
>>> often due to factors outside of their control.

That’s the beauty of OSS - the level we could write code is way less than the level the culture / timescale / management allows. I recently saw OSS as akin to (good) journalism for enterprise - asking why is this hidden part of society not doing the minimum (jails, corruption etc).

Free software does sooo much better compared to much in-house software that it is like sunlight.

reply
We do.

The issue is almost always feature management.

Back in the day I was making Flash games, usually a 3-5 week job, with no real QA, and the project was live for 3-5 months. Every time I was ahead of schedule someone came with a brilliant idea to test a few odd things and add a couple of new features that were not discussed prior. Sometimes literally hours before the launch.

Every time I was making the argument that adding one new feature will create two bugs. And almost always I was right about it.

Fast forward and I'm working for BigCo. A few gigs back I was working for a major bank which employed a super efficient and accountable workflow: every release has to be comprised of business-specific commits, and commits that are not backed by explicit tickets are not permitted.

This resulted in the team having to literally cheat and lie to smuggle in refactors and optimizations.

Add to that that most enterprise projects start not because the requirements were gathered but because the budget was secured and you have a recipe for disaster.

reply