Or you could accept that there's already a large collection of known useful special functions, and work with shallower trees of those instead, e.g. https://arxiv.org/abs/1905.11481
Thats not really necessary, imagine somewhere near the top of the binary tree a leaf ("1" or "x" or ...), with the current brute force method thats a whole binary subtree with parameters going unused.
One could just as well use that whole culled binary subtree as a DAG node.
It does require more complex routing, but selecting input nodes is a sparse task, so those routing parameters can use sparse virtual parameters, say inner products of dense vectors in some vector space, so it doesn't need to take up much memory...