Imagine you have a to-do list of 100 items. What happens when a remote user drags and drops item 100 to lie between items 1 and 2? In retained mode, that's clearly communicated to the UI toolkit; the widget representing that todo is told to change its position in the list. In immediate mode, you can't just destroy and re-create the a11y tree on each render. You need some kind of tree diffing algorithm to figure out that it's one op (move item) instead of ~200 ops (change the name and checkbox state for all items between 2 and 100).
You don't need to do it each render, just when something structural changes, right?
You don't lose anything.
This is but my opinion, but toolkits tend to be opinionated piles of **. Talking directly with the GPU to paint stuff is often just saner to me.