Higher-end stuff will use a ton of inputs (visual odometry, binocular vision, lidar, range finding, etc.) fused into some kind of proprietary blended algorithm that you could probably call an MPC.
RL is pretty cutting edge, especially for fast path motor control; there are a lot of university competitions for drone control that lead to a lot of papers and projects in the space (some promising) but most commercial stuff has not adopted this yet, certainly not at the low end.
On top of this (maybe at a few hundred Hz), you can add outer controls to set attitude. This could be an autopilot, or having the controls command attitude instead of rate. Betaflight pilots usually don't bother with this, and use the simple setup where the controls map to rate.
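To make the rate vs. attitude distinction concrete, here is a minimal single-axis sketch in Python. The function names, gains, and limits are made up for illustration and are not Betaflight's actual code; the inner rate PID that would consume the setpoint is left out.

```python
def attitude_to_rate(target_angle, measured_angle, kp=5.0, max_rate=6.0):
    """Outer loop: turn an angle error (rad) into a clamped rate setpoint (rad/s).

    kp and max_rate are placeholder values, not tuned for any real craft.
    """
    rate = kp * (target_angle - measured_angle)
    return max(-max_rate, min(max_rate, rate))


def rate_setpoint(stick, mode, measured_angle, max_angle=0.6, max_rate=6.0):
    """Map a stick position in [-1, 1] to the setpoint for the inner rate loop.

    In "rate" mode the stick maps straight to a rotation rate.
    In "angle" mode the stick commands an attitude, and the outer loop
    works out what rate is needed to get there.
    """
    if mode == "rate":
        return stick * max_rate
    return attitude_to_rate(stick * max_angle, measured_angle, max_rate=max_rate)
```

In rate mode, centering the stick just means "stop rotating", so the craft holds whatever angle it is at. In angle mode, centering the stick commands level, and the outer loop asks for a rate that brings it back.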
I've programmed firmware using a weird hybrid where the controls command a change in the target attitude. So it flies like rate, but has the forced attitude stability of an attitude-based control system. Non-standard, but makes it so you don't need to worry as much about tuning the PID loop. In practice, you can do full aerobatic flight with this like you'd do with a rate-only setup. (Basically, there is a commanded attitude quaternion; controls nudge it; the PIDs update motor power to maintain this commanded quaternion.)
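A rough sketch of that hybrid in Python, under some assumptions: quaternions are (w, x, y, z) tuples, stick input has already been scaled to body rates in rad/s, and only the error term that would be fed to the PIDs is shown. The helper names are hypothetical and this is not the firmware's actual code.

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_normalize(q):
    """Rescale to unit length so rounding errors do not accumulate."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def nudge_target(target, body_rates, dt):
    """Rotate the commanded attitude by the body rates the sticks ask for."""
    wx, wy, wz = body_rates
    rate = math.sqrt(wx * wx + wy * wy + wz * wz)
    if rate == 0.0:
        return target  # sticks centered: keep holding the same attitude
    half = 0.5 * rate * dt
    s = math.sin(half) / rate
    dq = (math.cos(half), wx * s, wy * s, wz * s)
    # Right-multiply because the stick rates are expressed in the body frame.
    return quat_normalize(quat_mul(target, dq))


def attitude_error(current, target):
    """Body-frame error vector; roughly the rotation angle (rad) when small."""
    err = quat_mul(quat_conj(current), target)
    if err[0] < 0.0:
        err = tuple(-c for c in err)  # take the shorter way around
    return (2.0 * err[1], 2.0 * err[2], 2.0 * err[3])
```

Each control cycle would call nudge_target with the current stick rates, then hand attitude_error(current, target) to the per-axis PIDs. With the sticks centered the target stops moving, so the PIDs hold that attitude rather than just zeroing the rate.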
And frankly, as a pilot, I'd rather not see any completely autonomous drones with no oversight in the sky. We're one incident where the blame can't be put solely on the operator away from getting the hobby completely banned.
The delta between what is possible with current autonomous flight missions and manual FPV-style flight comes from having a brain on board that can dynamically adapt to a changing environment. There is only a finite number of PID profiles, one per steady-state solution, that a researcher can prepare in advance. But RL allows an overarching heuristic to transiently alter the PIDs depending on the changing environment.
We use PIDs because analyzing robotics platforms as seeking a steady state dramatically simplifies the math, to the point where it's computationally possible for us to solve for a situation.
We use RL in systems that have continuously changing environments with transient solution spaces, which are easier to model in a high-dimensional space with an RL model.
Take, for example, platforms that have tiltrotors. They ideally have a minimum of 3 PID profiles for flying: one for when it best fits a multirotor profile, a second for when it is transitioning from multirotor to fixed-wing flight, and a third for when fixed-wing flight is established. What happens when the researcher needs to fly in the transition state, or in subconfigurations of the states? How many PID profiles are you looking to think of and tune for? This is where RL pays dividends.
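For illustration, a three-profile setup might look something like this in Python. The tilt thresholds and gain values are placeholders I made up, not numbers from any real tiltrotor, and picking the profile from nacelle tilt alone is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PidGains:
    kp: float
    ki: float
    kd: float


# Placeholder gains, not tuned for any real airframe.
PROFILES = {
    "multirotor": PidGains(kp=4.0, ki=0.5, kd=0.05),
    "transition": PidGains(kp=2.5, ki=0.3, kd=0.03),
    "fixed_wing": PidGains(kp=1.2, ki=0.1, kd=0.01),
}


def select_profile(tilt_deg):
    """Pick a profile name from nacelle tilt.

    0 degrees = rotors pointing up (hover), 90 degrees = rotors pointing forward.
    The 15 and 75 degree thresholds are arbitrary illustration values.
    """
    if tilt_deg < 15.0:
        return "multirotor"
    if tilt_deg < 75.0:
        return "transition"
    return "fixed_wing"


def gains_for(tilt_deg):
    """Look up the gains the PID loop should use at this tilt angle."""
    return PROFILES[select_profile(tilt_deg)]
```

Everything between 15 and 75 degrees shares one set of gains here. Flying well at, say, 30 degrees versus 60 degrees means adding more entries and doing another tuning pass for each, which is the scaling problem described above.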