One of the ways to achieve this is to not actually transition between states, but simply animate the "end bounce" of an introduced element, as if it was eased into position. So not actually slid from the left, for example, but rebounding the last few pixels from an imaginary slide. Our eyes just draw their conclusions to inform us of a movement, and in exchange the component is readable and usable immediately.
[1] ~100ms represents optimal reflex time in recent research. [2] Anything that requires user attention to interact after the component appears is very comfortable with a 150ms transition. One important note is that for components you can navigate across (i.e. one key shortcut invokes a modal state, another key runs a command in that modal), experienced users will "type" consecutive shortcuts in one go, and you must have the second behaviour responsive from frame 1.
[2] Some athletes seem to train down to ~80ms on very specific reflexes, which recently lead to race-start controversies when block timers disqualify sub-100ms reactions for runners.
Consider a toolbar with a mix of enabled and disabled buttons. Hover effects (which I would consider animations) convey that something is clickable, and on-click effects confirm an action. These effects convey meaningful information to both beginner users and power users of any software, and are in no way inconvenient to either group.
I generally agree animations tend to get in the way when you want to get shit done, but the idea that animations are only applicable as artistic effects rings untrue to me.