upvote
yes, and even this search doesn't actually require trillions of parameters, since the switching parameters will be sparse, which means you can apply a FakeParameter trick: suppose I want a trillion sparse parameters, thats a million by a million. Let's just model those parameters as inner products of a million vectors each of some dimension N. Now its in the regime of megabytes or a GB.

For extreme regularization, one can even go down to 10 arbitrary precision numbers: if we have a single vector of 10 dimensions, we can re-order the components 10! different ways.

10! = 3 628 800

so we can retrieve ~3M vectors from it, and we can form about 10 T inner products.

reply
Ah I see I misunderstood your point, thanks for clarifying.

I think you are right, each node in some other expression tree formed of primitives in some set Prim would be increased by at most max(nodes(expr)) for expr in Prim.

That's essentially what the EML compiler is doing from what I understand.

reply