I don't doubt that it's a very safe system with enough slack allowing for intentional redundancy. But as it is, some of these controllers seem to be limited by their ability to pronounce instructions, leaving absolutely no margin for error and presumably very little room for conscious thought.
Almost all voice transmissions are routine instructions/clearances from ground to air, with the pilots reading them back to reduce the chance of errors. In fact, this already exists and is in wide use in (at least) the US, EU, and in transoceanic airspace.
Of course, now you have two systems that can fail, and reducing reliance on the older one can easily cause automation complacency (which is a well-researched source of errors) and require more frequent refresher courses if the skill is not practiced on a continuous basis.
I suspect that these are the reasons it's not commonly used for approach and tower operations: there's a lot more spontaneous and/or nonstandard stuff happening in those flight phases, and as you say, you don't want a pilot's eyes on a tiny screen/keyboard instead of on their instruments or out the window.