upvote
Yes, metamath uses a large collection of specialized but reusable building blocks, so it doesn't blow up exponentially. However, if you want to "just do gradient descent" on general trees built from a single universal primitive, you now have to rediscover all those building blocks on the fly. And while the final result may have a compact representation as a DAG by merging common subexpressions, you also need to be able to represent potential alternative solutions, and that's where the exponential blowup comes in.

Or you could accept that there's already a large collection of known useful special functions, and work with shallower trees of those instead, e.g. https://arxiv.org/abs/1905.11481

reply
> And while the final result may have a compact representation as a DAG by merging common subexpressions, you also need to be able to represent potential alternative solutions, and that's where the exponential blowup comes in.

Thats not really necessary, imagine somewhere near the top of the binary tree a leaf ("1" or "x" or ...), with the current brute force method thats a whole binary subtree with parameters going unused.

One could just as well use that whole culled binary subtree as a DAG node.

It does require more complex routing, but selecting input nodes is a sparse task, so those routing parameters can use sparse virtual parameters, say inner products of dense vectors in some vector space, so it doesn't need to take up much memory...

reply