upvote
CRDTs should be able to give you better merge and rebase behaviour. They essentially make rebase and merge commits the same thing - just different views on a commit, and potentially different ways to present the conflict. CRDTs also behave better when commits get merged multiple times in complex graphs - you don’t run into the problem of commits conflicting with themselves.

You should also be able to roll back a single commit or chain of commits in a crdt pretty easily. It’s the same as the undo problem in collaborative editors - you just apply the inverse of the operation right after the change. And this would work with conflicts - say commits X and Y+Z conflict, and you’re in a conflicting state, you could just roll back commit Y which is the problem, while keeping X and Z. And at no point do you need to resolve the conflict first.

All this requires good tooling. But in general, CRDTs can store a superset of the data stored by git. And as a result, they can do all the same things and some new tricks.

reply
In theory, maybe. In practice… last write wins (LWW) is a CFDT operator, so replace every mention of CRDT with LWW and issues will more obvious.

Really though, the problem with merges is not conflicts, it’s when the merged code is wrong but was correct on both sides before the merge. At least a conflict draws your attention.

When I had several large (smart but young) teams merging left and right this would come up and they never checked merged code.

Multiply by x100 for AI slop these days. And I see people merge away when the AI altered tests to suit the broken code.

reply
When you say "unit of work", unit of _which_ work are you referring to? The problem with rebasing is that it takes one set of snapshots and replays them on top of another set, so you end up with two "equivalent" units of work. In fact they're _the same_ indeed -- the tree objects are shared, except that if by "work" you mean changes, Git is going to tell you two different histories, obviously.

This is in contrast with [Pijul](https://pijul.org) where changes are patches and are commutative -- you can apply an entire set and the result is supposed to be equivalent regardless of the order the patches are applied in. Now _that_ is unit of work" I understand can be applied and undone in "isolation".

Everything else is messy, in my eyes, but perhaps it's orderly to other people. I mean it would be nice if a software system defined with code could be expressed with a set of independent patches where each patch is "atomic" and a feature or a fix etc, to the degree it is possible. With Git, that's a near-impossibility _in the graph_ -- sure you can cherry-pick or rebase a set of commits that belong to a feature (normally on a feature branch), but _why_?

reply
By "unit of work", I mean the atomic delta which can, on its own, become part of the deployable state of the software. The thing which has a Change-Id in Gerrit.

The delta is the important thing. Git is deficient in this respect; it doesn't model a delta. Git hashes identify the tip of a tree.

When you rebase, you ought to be rebasing the change, the unit of work, a thing with an identity separate and independent of where it is based from.

And this is something that the jujutsu / Gerrit model fixes.

reply
I used to use rebase much more than merge but have grown to be more nuanced over the years:

Merge commits from main into a feature branch are totally fine and easier to do than rebasing. After your feature branch is complete you can do one final main-to-feature-branch merge and then merge the feature branch into main with a squash commit.

When updating any branch from remote, I always do a pull rebase to avoid merge commits from a simple pull. This works well 99.99% of the time since what I have changed vs what the remote has changed is obvious to me.

When I work on a project with a dev branch I treat feature branches as coming off dev instead of main. In this case I merge dev into feature branches, then merge feature branches into dev via a squash commit, and then merge main into dev and dev into main as the final step. This way I have a few merge commits on dev and main but only when there is something like an emergency fix that happens on main.

The problem with always using a rebase is that you have to reconcile conflicts at every commit along the way instead of just the final result. That can be a lot more work for commits that will never actually be used to run the code and can in fact mess up your history. Think of it like this:

1. You create branch foo off main.

2. You make an emergency commit to main called X.

3. You create commits A, B, and C on foo to do your feature work. The feature is now complete.

4. You rebase foo off main and have to resolve the conflict introduced by X happening before A. Let’s say it conflicts with all three of your commits (A, B, and C).

5. You can now merge foo into main with it being a fast forward commit.

Notice that at no point will you want to run the codebase such that it has commits XA or XAB. You only want to run it as XABC. In fact you won’t even test if your code works in the state XA or XAB so there is little point in having those checkpoints. You care about three states: main before any of this happened since it was deployed like that, main + X since it was deployed like that, and main with XABC since you added a feature. git blame is really the only time you will ever possibly look at commits A and B individually and even then the utility of it is so limited it isn’t worth it.

The reality is that if you only want fast forward commits, chances are you are doing very little to go back and extract code out of old versions a of the codebase. You can tell this by asking yourself: “if I deleted all my git history from main and have just the current state + feature branches off it, will anything bad happen to my production system?” If not, you are not really doing most of what git can do (which is a good thing).

reply
I am now wholly bought into the idea of having a feature branch with (A->B->C) commits is an anti-pattern.

Instead, if the feature doesn't work without the full chain of A+B+C, either the code introduced in A+B is orphaned except by tests and C joins it in; or (and preferably for a feature of any significance), A introduces a feature flag which disables it, and a subsequent commit D removes the feature flag, after it is turned on at a time separate to merge and deploy.

reply
I treat each feature branch as my own personal playground. There should be zero reason for anyone to ever look at it. Sometimes they aren’t even pushed upstream. Otherwise, just work on main with linear history and feature flags and avoid all this complexity that way.

Just like you don’t expect someone else’s local codebase to always be in a fully working state since they are actively working on it, why do you expect their working branch to be in a working state?

reply
I think you're somewhat missing the point - if the code from A and B only works if joined with C, then you should squash them all into one commit so that they can't be separated. If you do that then the problem you're describing goes away since you'll only be rebasing a single commit anyway.

Whether this is valuable is up to you, but IMO I'd say it's better practice than not. People do dumb things with the history and it's harder to do dumb things if the commits are self-contained. Additionally if a feature branch includes multiple commits + merges I'd much rather they squash that into a single commit (or a couple logical commits) instead of keeping what's likely a mess of a history anyway.

reply
That is literally what I advocate you do for the main branch. A feature branch is allowed to have WIP commits that make sense for the developer working on the branch just like uncommitted code might not be self contained because it is WIP. Once the feature is complete, squash it into one commit and merge it into main. There is very little value to those WIP commits (rare case being when you implement algorithm X but then change to Y and later want to experiment with X again).
reply
One downside of squash merging is that when you need to split your work across branches, so that they're different PRs, but one depends on the other, then you have to do a rebase after every single one which had dependencies is merged.
reply
People see that CRDTs have no conflicts and proclaim them as the solution to all problems, not seeing that some problems inherently have conflicts and either can't be represented by CRDTs at all, or that the use of CRDTs resolves conflicts in a way that's worse than if you actually thought about conflict resolution. E.g. that multiplayer text editor that interleaved characters from simultaneous edits.
reply