Updating plane properties, for example to move the cursor plane around or to disable it, would not by itself block on render activity, since the display controller and the render hardware are completely distinct blocks.
The render hardware could be powered down, but I doubt that powering it up and compositing the cursor would take long enough to cause any noticeable lag.
Under the Linux APIs, updates to the display controller are done through KMS atomic commits. One mistake you could make on the display-server side would be to provide a fence in this atomic commit that the scheduler will use to wait on long-running GPU work before using the provided graphics buffers. Under this API, none of the changes in that commit - including mouse movements - would then be applied until that fence is signalled. Changing plane associations can also lead to resource reallocations that can be a bit heavy.
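To make the fence point concrete, here is a toy model in Python. It is not the real DRM/KMS API (on Linux the fence is attached through the plane's IN_FENCE_FD property); `Fence`, `AtomicCommit` and `ToyDisplayController` are names I made up for illustration, and the "apply at vblank only if every fence has signalled" rule is a simplification of what the kernel does.

```python
from dataclasses import dataclass, field


@dataclass
class Fence:
    """Hypothetical stand-in for a sync fence; signalled once GPU work is done."""
    signalled: bool = False


@dataclass
class AtomicCommit:
    """Toy atomic commit: a set of property changes plus optional in-fences."""
    changes: dict
    in_fences: list = field(default_factory=list)


class ToyDisplayController:
    """Assumed model: a pending commit is applied as a whole, or not at all."""

    def __init__(self):
        self.state = {"cursor.x": 0, "cursor.y": 0, "primary.fb": "frame0"}
        self.pending = None

    def submit(self, commit):
        self.pending = commit

    def vblank(self):
        # Apply the pending commit only if all of its fences have signalled.
        commit = self.pending
        if commit is not None and all(f.signalled for f in commit.in_fences):
            self.state.update(commit.changes)
            self.pending = None
        return dict(self.state)


def demo():
    ctrl = ToyDisplayController()
    slow_render = Fence()
    # The cursor move is bundled with a new primary buffer that is still rendering.
    ctrl.submit(AtomicCommit(
        changes={"cursor.x": 100, "cursor.y": 50, "primary.fb": "frame1"},
        in_fences=[slow_render],
    ))
    before = ctrl.vblank()        # fence not signalled: cursor stays put
    slow_render.signalled = True  # the long-running GPU work finishes
    after = ctrl.vblank()         # now the whole commit lands at once
    return before, after
```

In this model the cursor does not move until the slow render finishes, even though moving it needs no rendering at all.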
Not sure if the kernel driver in macOS works in any way remotely similar to this, and the driver could also just be dumb and block on unrelated things ("let's just wait another vblank to see this apply....", "as we only need one plane now let's power down hardware and wait for that to settle..."). It could also just be WindowServer waiting for work to finish on its own and not providing any cursor updates in the meantime.
The reality is that it will take reverse engineering or looking at actual code to know what's going on.
EDIT: Also note that there is nothing new with the Neo here, as all Macs since the M1 have used the same chip architecture as the iPhone.
Desktop GPU designs did not focus on tiny efficiency gains, and often only have a primary plane, a single overlay plane (e.g. for a video), and a dedicated cursor plane. Some even have to share a single overlay plane between all connected displays. It's a recent thing for desktop GPUs to get more flexible in this area, in part to improve laptop battery life in the cases where the laptop is almost entirely idle.
(For those unaware, a "plane" here is the entity in the display controller you configure to show a rendered graphics buffer, in a particular location and with particular transforms. You commonly have one plane that just covers the whole screen, and then sometimes put dynamic content on top in other planes so you can avoid having to redraw the main buffer when smaller bits of it change, like a video player or cursor. You could also, for example, scroll by rendering an entire document in advance and then moving the plane around to reveal parts of it.)
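A rough sketch of that plane model, in case it helps. This is a toy in Python, not any real display controller interface: `Plane`, `scanout` and the `zpos` field are illustrative names, and "pixels" are just characters.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    buffer: list  # rows of "pixels"; here each row is a string
    x: int        # where the plane's top-left corner lands on screen
    y: int
    zpos: int     # higher values are stacked on top


def scanout(planes, width, height, background="."):
    """Combine planes at scanout time without modifying any buffer."""
    out = [[background] * width for _ in range(height)]
    for plane in sorted(planes, key=lambda p: p.zpos):
        for row, line in enumerate(plane.buffer):
            for col, pixel in enumerate(line):
                sx, sy = plane.x + col, plane.y + row
                if 0 <= sx < width and 0 <= sy < height:
                    out[sy][sx] = pixel
    return ["".join(r) for r in out]


primary = Plane(buffer=["aaaa", "aaaa", "aaaa"], x=0, y=0, zpos=0)
cursor = Plane(buffer=["^"], x=1, y=1, zpos=1)

frame1 = scanout([primary, cursor], width=4, height=3)
cursor.x = 3  # "moving the cursor" is just a property update on its plane
frame2 = scanout([primary, cursor], width=4, height=3)
```

The primary buffer is never redrawn between `frame1` and `frame2`; only the cursor plane's position changed.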
I'm not sure they're all that tiny if you can squeeze out 70% of top end performance for 25% of the power draw :)
Long story short, performance was disappointing and we abandoned the approach. It's easy to believe it's a real problem, especially when there are other factors, including the GPU being clocked down to save power.
Same caveat as parent, I have no direct knowledge of MacBook Neo or this specific issue.
Do modern machines still have custom hardware for cursors? That would surprise me, as a GPU can easily blit a small cursor on top of whatever gets drawn.
The cursor could just be another small rectangular texture you position on top of the other surfaces. There is no need to read the framebuffer or write into it; it's just a z-stack of 3D surfaces now.
The problem with rendering the cursor into the primary plane is that, often, only the cursor changes, and you'd have to re-render the whole plane that contains the cursor. That is easily doable for modern hardware, but bad for power consumption and may also be higher latency. (The latency aspect gets interesting when dragging something on the primary plane - I think most compositors temporarily disable the hardware cursor in order to keep cursor and dragged object in sync.)
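Some back-of-the-envelope numbers for the re-rendering cost. The sizes are assumptions picked for illustration (a 2560x1600 primary plane, a 64x64 cursor, 4 bytes per pixel), and this is a worst case: a compositor with damage tracking may redraw only the region around the cursor.

```python
def bytes_touched(width, height, bytes_per_pixel=4):
    """Bytes written when a buffer of this size is fully redrawn."""
    return width * height * bytes_per_pixel


full = bytes_touched(2560, 1600)  # redraw the whole primary plane
cursor = bytes_touched(64, 64)    # redraw only a cursor-sized buffer
ratio = full // cursor

print(full, cursor, ratio)  # 16384000 16384 1000
```

With a dedicated cursor plane, even the small redraw is avoided when the cursor only moves, since only the plane's position changes.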
AFAIK this hasn't been true for a long time on most platforms, certainly on macOS. The desktop image is composited on the GPU by assembling the underlying windows with appropriate effects like shadows and scrolling/scaling. A software cursor is just another overlay which may also have a transparent shadow.
Actually preserving what was under the cursor and putting it back is the sort of thing you wouldn't do anymore, because that's a cache which requires babysitting based on everything that's underneath and around it.
E.g. on macOS there's full-screen zooming for accessibility, and if you wiggle the mouse, the cursor briefly grows in size (maybe even too big for a hardware cursor to support).
If a hardware layer is not being used, the cursor layer will be treated like any other layer in the compositor. Modern compositors don't try to save and restore pixels like that; they just re-render it.
>(which is normally done per mouse interrupt);
It's normally done every frame the compositor makes.
>or it may end up drawing over what software wants to draw
The compositor composites everything that will be shown on the next refresh of the display. Things don't independently step on each other's toes, since it's just the compositor rendering and synchronizing all hardware layers (planes).
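The "every frame" point can be sketched like this. It is a simplified model, not how any particular compositor is written: `cursor_per_frame` is a made-up helper, timestamps are in milliseconds, and the cursor is assumed to start at (0, 0).

```python
def cursor_per_frame(events, frame_times):
    """Return the cursor position the compositor uses for each frame.

    events: list of (timestamp, (x, y)) mouse events, sorted by timestamp.
    frame_times: sorted timestamps at which a frame is composited.
    """
    positions = []
    latest = (0, 0)  # assumed starting position
    i = 0
    for t in frame_times:
        # Consume every event that arrived before this frame; keep only the last.
        while i < len(events) and events[i][0] <= t:
            latest = events[i][1]
            i += 1
        positions.append(latest)
    return positions


events = [(1, (1, 1)), (2, (2, 2)), (3, (3, 3)), (20, (9, 9))]
frames = cursor_per_frame(events, frame_times=[16, 32])
```

Three mouse events arrive before the first frame, but only the most recent position is drawn, so nothing is rendered per interrupt.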