Thats not really necessary, imagine somewhere near the top of the binary tree a leaf ("1" or "x" or ...), with the current brute force method thats a whole binary subtree with parameters going unused.
One could just as well use that whole culled binary subtree as a DAG node.
It does require more complex routing, but selecting input nodes is a sparse task, so those routing parameters can use sparse virtual parameters, say inner products of dense vectors in some vector space, so it doesn't need to take up much memory...