The 32-bit ARM architecture included a barrel shifter as part of its basic design, as in every instruction had a shift field.
If a CPU built in 1985 with a grand total of 26 000 transistors could afford it, I am pretty sure that anything built in this century could afford it too.
IIUC this is a lot less true in the modern era. Even with 24nm transistors (the cheapest transistor last time I checked), modern microcontrollers have a fairly big transistor budget for the core (since 80+% of the transistors are going to sram anyway).
You can save a lot of silicon by doing 8 or 16 bit shifters and then doing the rest at the code generation level. Not having any seems really anemic to me.