None of these modern editors (Wordgard, ProseMirror, Lexical, Slate) use contenteditable for the document model. Rather, they have their own internal document model and use contenteditable as a kind of input layer where the editor monitors what the browser does, then translates that into actual edits.
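To make that split concrete, here is a rough sketch of the idea. It is not how any of those editors actually implement it, and every name in it (`EditorState`, `InputLike`, `applyInput`, `render`) is made up for illustration. The only real-world pieces are the `inputType` strings, which are taken from the Input Events spec. The model is a plain list of paragraphs, input is translated into edits on that model, and the view is derived from the model afterwards.

```typescript
// Hypothetical minimal model: a document is a list of plain-text paragraphs.
type Doc = { paragraphs: string[] };
type Cursor = { para: number; offset: number };
type EditorState = { doc: Doc; cursor: Cursor };

// Mirrors the `inputType` / `data` fields of a DOM InputEvent, but declared
// locally so the sketch runs without a browser.
type InputLike = { inputType: string; data?: string | null };

// Translate one browser-level input into an edit on the model.
// Returns a new state; the old one is left untouched.
function applyInput(state: EditorState, ev: InputLike): EditorState {
  const { cursor } = state;
  const paras = state.doc.paragraphs.slice();
  const text = paras[cursor.para] ?? "";

  switch (ev.inputType) {
    case "insertText": {
      const data = ev.data ?? "";
      paras[cursor.para] =
        text.slice(0, cursor.offset) + data + text.slice(cursor.offset);
      return {
        doc: { paragraphs: paras },
        cursor: { para: cursor.para, offset: cursor.offset + data.length },
      };
    }
    case "insertParagraph": {
      // Split the current paragraph at the cursor.
      paras.splice(
        cursor.para, 1,
        text.slice(0, cursor.offset), text.slice(cursor.offset),
      );
      return {
        doc: { paragraphs: paras },
        cursor: { para: cursor.para + 1, offset: 0 },
      };
    }
    case "deleteContentBackward": {
      if (cursor.offset > 0) {
        paras[cursor.para] =
          text.slice(0, cursor.offset - 1) + text.slice(cursor.offset);
        return {
          doc: { paragraphs: paras },
          cursor: { para: cursor.para, offset: cursor.offset - 1 },
        };
      }
      if (cursor.para > 0) {
        // At the start of a paragraph: merge it into the previous one.
        const prev = paras[cursor.para - 1] ?? "";
        paras.splice(cursor.para - 1, 2, prev + text);
        return {
          doc: { paragraphs: paras },
          cursor: { para: cursor.para - 1, offset: prev.length },
        };
      }
      return state;
    }
    default:
      // Input types this sketch does not know about are ignored.
      return state;
  }
}

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// The view is computed from the model, never the other way round.
function render(doc: Doc): string {
  return doc.paragraphs.map((p) => `<p>${escapeHtml(p)}</p>`).join("");
}
```

In a browser you would presumably feed this from a `beforeinput` listener on the contenteditable element and re-render after each edit. That wiring is an assumption on my part, and a real editor has to handle far more than this (selections, IME composition, paste, and the cases where the browser changes the DOM without a usable event).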
Early editors like FCKeditor and TinyMCE were only wrappers around contenteditable. They used the DOM as the real document model, then intercepted certain keypresses and events and "fixed" the behavior when it wasn't correct (e.g. pressing enter twice inside a bullet list should switch to paragraph mode).
The result was rife with bugs and inconsistencies, and didn't allow for a proper split between the model and the view (e.g. to represent columns, video embeds, and so on).
"just contenteditable" is really understating things. contenteditable is a god-awful API full of bugs and inconsistencies. Making something reliable on top of it is a very significant amount of work.
For context: my day job for the last 2 years has been building a new browser engine from scratch, and I think the contenteditable-based WYSIWYG editor I wrote 15 years ago might have been a harder project (albeit I was a lot less experienced then).