From my experience working in this problem domain for the last year, I'd say it is pretty powerful, but the "too good to be true" part comes from the fact that EML buys elegance through exponential expression blow-up. Multiplication alone requires depth-8 trees with 41+ leaves, i.e. a minimal operator vocabulary trades off against expression length. There's likely an information-theoretic sweet spot between these extremes.

It's interesting to see his EML approach, whereas mine was more focused on generating a context-sensitive homoiconic grammar.

I've had lots of success combining spectral neural nets (GNNs, FNOs, Neural Tangent Kernels) with symbolic regression, using Operad Theory and Category Theory as my guiding mathematical machinery.

reply
In my experience this exponential expression blow-up is less a result of decomposing into a minimal set of primitives than a result of repetition in expression trees:

To make an analogy with Bertrand Russell's Principia Mathematica: he derived fully expanded expressions, i.e. trees whose leaves may refer only to literals. Everyone claimed this madness underscored how formal verification of natural mathematics was a fool's errand, but nevertheless we see successful projects like metamath (us.metamath.org) where this exponential blow-up does not occur. It is easy to see why: instead of representing proofs as full trees, the proofs are represented as DAGs. The same optimization would be required for EML to prevent exponential blow-up.

Put differently: if we allow extra buttons besides {1, EML}, for example to capture unary functions, the authors mentally add an 'x' button, so now the RPN calculator has {1, EML, x}. But wait: if you want multivariate functions, it becomes an RPN calculator with extra buttons {1, EML, x, y, z}, for example.

But why stop there? In metamath, proofs are compressed: if an expression or wff was proven before in the same proof, its first subproof is given a number, and any subsequent invocation of that Nth subproof refers to the number. Why only recall the input parameters x, y, z, but not recall earlier computed values/functions?

In fact, every proof in metamath's set.mm that uses this DAG compressibility could be split into a main proof plus the repeatedly used substatements, automatically converted into explicitly separate lemma proofs. In that case metamath could dispose of the single-proof DAG compression (but it would force proofs to be split up into lemmas + a main proof).

None of the proven theorems in metamath's set.mm displays the feared exponential blowup.
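
The tree-vs-DAG point can be sketched with a toy hash-consing builder (a minimal Python illustration of the idea; the helpers are mine, not from metamath or the EML paper):

```python
# Toy hash-consing: identical subtrees are built once and shared,
# which is the same optimization metamath's compressed proofs use.
_pool = {}

def node(op, *children):
    """Return a canonical node; structurally identical subtrees share one object."""
    key = (op, *(id(c) for c in children))
    if key not in _pool:
        _pool[key] = (op, children)
    return _pool[key]

def tree_size(n):
    """Node count if the expression is expanded as a full tree (no sharing)."""
    op, children = n
    return 1 + sum(tree_size(c) for c in children)

def dag_size(n, seen=None):
    """Node count when shared subtrees are counted once (as a DAG)."""
    seen = set() if seen is None else seen
    if id(n) in seen:
        return 0
    seen.add(id(n))
    op, children = n
    return 1 + sum(dag_size(c, seen) for c in children)

# Build f(f(x,x), f(x,x))... : tree size doubles per level, DAG size grows by 1.
x = node("x")
t = x
for _ in range(10):
    t = node("f", t, t)

print(tree_size(t))  # 2**11 - 1 = 2047 nodes as a tree
print(dag_size(t))   # 11 nodes as a DAG
```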

reply
Yes, metamath uses a large collection of specialized but reusable building blocks, so it doesn't blow up exponentially. However, if you want to "just do gradient descent" on general trees built from a single universal primitive, you now have to rediscover all those building blocks on the fly. And while the final result may have a compact representation as a DAG by merging common subexpressions, you also need to be able to represent potential alternative solutions, and that's where the exponential blowup comes in.

Or you could accept that there's already a large collection of known useful special functions, and work with shallower trees of those instead, e.g. https://arxiv.org/abs/1905.11481

reply
> And while the final result may have a compact representation as a DAG by merging common subexpressions, you also need to be able to represent potential alternative solutions, and that's where the exponential blowup comes in.

That's not really necessary. Imagine a leaf ("1" or "x" or ...) somewhere near the top of the binary tree: with the current brute-force method that's a whole binary subtree, with parameters, going unused.

One could just as well use that whole culled binary subtree as a DAG node.

It does require more complex routing, but selecting input nodes is a sparse task, so those routing parameters can use sparse virtual parameters, say inner products of dense vectors in some vector space, so it doesn't need to take up much memory...

reply
> Multiplication alone requires depth-8 trees with 41+ leaves, i.e. a minimal operator vocabulary trades off against expression length.

That is sort of comparable to how NAND gates simplify scaling.

Division is hell on gates.

Having a single component was the reason scaling went the way it did.

There was only one gate structure that had to improve to make chips smaller; if a chip had used 3 different kinds, then scaling would have required more than one parallel innovation (sort of like how LED lighting had to wait for blue).

If you need two or more components, then you have to keep switching tools instead of hammer, hammer, hammer.

reply
I'm not sure what you mean by this? It's true that any Boolean operation can be expressed in terms of two-input NAND gates, but that's almost never how real IC designers work. A typical standard cell library has lots of primitives, including all common gates and up to entire flip-flops and RAMs, each individually optimized at a transistor level. Realization with NAND2 and nothing else would be possible, but much less efficient.
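
To make the inefficiency concrete, here is a small sketch (my own illustration, not from any cell library): XOR built from two-input NANDs alone takes four gates, where a standard cell library would provide a single transistor-optimized XOR2 cell.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)

def xor_from_nands(a, b):
    """The classic 4-NAND realization of XOR."""
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

# Exhaustive check against Python's native XOR.
for a in (0, 1):
    for b in (0, 1):
        assert xor_from_nands(a, b) == a ^ b
print("4-NAND XOR matches a ^ b on all inputs")
```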

Efficient numerical libraries likewise contain lots of redundancy. For example, sqrt(x) is mathematically equivalent to pow(x, 0.5), but sqrt(x) is still typically provided separately and is faster. Anyone who thinks that the eml() function is supposed to lead directly to more efficient computation has missed the point of this (interesting) work.

reply
Yeah, what you're going to get is more efficient proofs: you can do induction on one case to get results about elementary functions. Not sure where anyone's getting computational efficiency thoughts from this.
reply
Are you under the impression that CPUs are made exclusively from NAND gates? You can't be serious.
reply
Might’ve gotten mixed up with CMOS dominance, or I’m ignorant.
reply
https://en.wikipedia.org/wiki/Mead%E2%80%93Conway_VLSI_chip_...

I'm guessing that's what they're really talking about. Which is not about NAND gates.

reply
Just to add a bit: modern digital circuits are almost exclusively MOS, but even the "complementary" bit isn't universal in a large IC.
reply
I believe you're not ignorant. But many folks probably lack the process knowledge (CMOS) required to understand why :-)
reply
Link to your work?
reply
Where do you see exponential blow-up? If you replace every function in an expression tree with a tree of eml functions, that is a size increase by a constant factor. And the factor does not seem unreasonable, being somewhere in the range of 10 to 100.
reply
The exponential blowup is in the symbolic regression section, where to search among depth K trees requires 2^K parameters.

As an example, searching for sqrt(x) would require a tree of depth ~40 which is in the trillion-parameter regime.
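
A quick back-of-the-envelope check of those numbers, assuming (as a simplification of mine) one structure parameter per node of a full binary tree:

```python
# A full binary tree of depth K has 2**K leaves and 2**(K+1) - 1 nodes,
# so learnable structure parameters grow exponentially with depth.

def full_tree_nodes(depth):
    """Node count of a full binary tree of the given depth."""
    return 2 ** (depth + 1) - 1

print(full_tree_nodes(8))   # 511 -- roughly the multiplication-sized trees
print(full_tree_nodes(40))  # 2199023255551, the trillion-parameter regime
```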

reply
But that is not an increase in expression size; that is the effort of searching for an expression tree that fits some target function. And that is no different from searching for an expression based on common functions, which is of course also exponential in the expression tree height. The difference is that an eml-based tree will have a larger height, by some constant factor, than a tree based on common functions. On the other hand, each vertex in an eml-tree can only be eml, one, or an input variable, whereas in a tree based on common functions each vertex can be any of the supported basis functions (counting constants and input variables as nullary functions).
reply
Yes, and even this search doesn't actually require trillions of parameters, since the switching parameters will be sparse, which means you can apply a FakeParameter trick: suppose I want a trillion sparse parameters; that's a million by a million. Let's just model those parameters as inner products of a million vectors, each of some dimension N. Now it's in the regime of megabytes or a GB.

For extreme regularization, one can even go down to 10 arbitrary-precision numbers: if we have a single vector of 10 dimensions, we can re-order its components in 10! different ways.

10! = 3 628 800

so we can retrieve ~3.6M vectors from it, and we can form about 10T inner products.
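
A toy sketch of the factorization idea at reduced scale (the `virtual_param` name is mine; the real sizes discussed above would be a million by a million):

```python
import random

# "Virtual parameter" trick: an m-by-m matrix of routing parameters is
# never materialized; entry (i, j) is the inner product of two small
# factor vectors, so memory is O(m*N) instead of O(m**2).
random.seed(0)
m, N = 1000, 16  # scaled down from a million for the demo
U = [[random.gauss(0, 1) for _ in range(N)] for _ in range(m)]
V = [[random.gauss(0, 1) for _ in range(N)] for _ in range(m)]

def virtual_param(i, j):
    """W[i, j] computed on demand; the full matrix never exists."""
    return sum(u * v for u, v in zip(U[i], V[j]))

w = virtual_param(3, 5)        # any entry, materialized only when needed
dense_floats = m * m           # what a stored matrix would cost
factored_floats = 2 * m * N    # what the two factor lists cost
print(dense_floats // factored_floats)  # 31x smaller even at this toy scale
```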

reply
Ah I see I misunderstood your point, thanks for clarifying.

I think you are right: each node in some other expression tree formed of primitives in some set Prim would be expanded by at most max(nodes(expr)) over expr in Prim.

That's essentially what the EML compiler is doing from what I understand.

reply
Yeah, seems like classic representation vs traversal complexity trade-off a la David Marr.
reply
ikrima, Do you have any links to your research or paper titles?
reply
It's not too good to be true...

It's a way to make mathematical formulas completely unreadable. It's a way to spend more time computing functions like log (3 emls required) while using more precision. It's a way to blow the minds of muggles reading Hacker News.

reply
> Why use splines or polynomials or haphazardly chosen basis functions if you can just fit (gradient descent) your data or wave functions to the proper computational EML tree?

Same reason all boolean logic isn't performed with combinations of NAND – it's computationally inefficient. Polynomials are (for their expressivity) very quick to compute.
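
Horner's rule makes that cost explicit: a degree-n polynomial needs only n multiplies and n adds, with no special functions at all. A minimal sketch:

```python
def horner(coeffs, x):
    """Evaluate a polynomial given its coefficients, highest degree first."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

# 3x^2 + 2x + 1 at x = 2 -> 17, using 2 multiplies and 2 adds.
print(horner([3, 2, 1], 2.0))  # 17.0
```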

reply
The Cray 1 was built 100% from NOR gates and SRAM.
reply
50 years ago.
reply
Correct. Your point being? Digital logic didn't change.
reply
They are done with transistors, though. Transistors form an efficient, single-element, universal digital basis.

And they are a much less arbitrary choice than NAND vs. NOR vs. XOR, etc.

Using transistors as conceptual digital logic primitives, where power dissipation isn't a thing, Pass Logic is "The Way".

reply
Single transistors aren't logic gates by themselves; they are amplifiers with a very specific gain function that makes it possible to use them as switches. Logic gates usually consist of at least two transistors. See https://en.wikipedia.org/wiki/CMOS for an example of how it is done in CMOS technology.
reply
That is correct.

Moreover, the amplifying function must exist at least in some gates, to restore the logic levels, but there is no need for it to exist in all gates.

For instance, any logic circuit can be implemented using AND gates and/or OR gates made with diodes, which have no gain, i.e. no amplifying function, together with 1-transistor inverters, which provide both the NOT function and the gain needed to restore the logic levels.

Logic functions such as NAND can be expressed in several ways using simpler components, which correspond directly to the possible hardware implementations.

Nowadays, the most frequent method of logic implementation is by using parallel and series connections of switches, for which the MOS transistors are well suited.

Another way to express the logic functions is by using the minimum and maximum functions, which correspond directly with diode-based circuits.

All the logic functions can also be expressed using the 2 operations of the binary finite field, addition (a.k.a. XOR) and multiplication (a.k.a. AND).

This does not lead to simpler hardware implementations, but it can simplify some theoretical work by making algebraic results available. Actually, this way of expressing logic is the one that should properly have been named Boolean logic, as the contribution of George Boole was precisely to replace "false" and "true" with "0" and "1" and to reinterpret the classic logic operations as arithmetic operations.

It is very weird to see in some programming languages a data type named Boolean whose possible values are "false" and "true", instead of the historically correct Boolean values "0" and "1". The latter can also be much more useful in programming than "false" and "true", by simplifying expressions, especially in array operations (which is why array-oriented languages like APL use "0" and "1", not "false" and "true").
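
Those GF(2) identities are easy to verify exhaustively; a small sketch (the function names are mine):

```python
# Over GF(2), AND is multiplication and XOR is addition; the other gates
# fall out as polynomials: NOT a = 1 + a, a OR b = a + b + a*b (mod 2).

def gf2_not(a):
    return (1 + a) % 2

def gf2_or(a, b):
    return (a + b + a * b) % 2

def gf2_nand(a, b):
    return (1 + a * b) % 2

# Check every input combination against the classic gate definitions.
for a in (0, 1):
    for b in (0, 1):
        assert gf2_not(a) == (not a)
        assert gf2_or(a, b) == (a or b)
        assert gf2_nand(a, b) == (not (a and b))
print("GF(2) polynomial forms match the classic gates")
```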

reply
"Pass transistors" are transistors being operated in pass/impedance switch mode.

Pass logic. Digital. [0]

This is extremely basic digital circuit design. You can create digital circuits as compositions of gates. But you can often implement the same logic, with fewer transistors, using pass logic.

Pass logic is also great for asynchronous digital design.

[0] https://en.wikipedia.org/wiki/Pass_transistor_logic

reply
> Transistors form an efficient, single element, universal digital basis

But transistors can be N or P-channel, so it’s not a single logical primitive, like e.g. NAND-gates.

reply
True.

NAND gates also use both N- and P-channel transistors, and have to use at least the same number, but frequently many more, for anything non-trivial.

But as you point out, as a computational basis, pass transistors have two variants.

reply
While I'm really enjoying this paper, I think you are way overstating the significance here. This is mathematically interesting, and conceptually elegant, but there is nothing in this paper that suggests a competitive regression or optimisation approach.

I might have misunderstood, but from the two "Why do X when you can do just Y with EML" sentences, I think you are describing symbolic regression, which has been around for quite some time and is a serious grown-up technique these days. But even the best symbolic regression tools do not typically "replace" other regression approaches.

reply
> Why use splines or polynomials or haphazardly chosen basis functions if you can just fit (gradient descent) your data or wave functions to the proper computational EML tree?

Because the EML basis makes simple functions (like +) hard to express.

Not to diminish this very cool discovery!

reply
Consider how a wavefunction might be stored for numerically searching the ground state of a molecule or a crystal lattice. Humans appreciate 2D imagery of about 1 kilopixel x 1 kilopixel, so expanding this to 3 dimensions means the wavefunction would have to store 10^9 complex numbers (easily 4 GB at 16-bit precision for the real and imaginary components, i.e. 4 bytes per complex number). Do we really believe that a DAG variant of the EML construction would consume more than that to represent the analytically correct solution? Do we really believe that the 4GB DAG variant of EML would produce a less accurate representation (i.e. less fidelity with the Schroedinger equation)? If the ground-state hydrogen atom is any indication, my money is on EML-style constructions, not naive 3D arrays modelling the wavefunction by brute force.
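
For what it's worth, the storage arithmetic checks out (a trivial spelling-out of the numbers in this comment):

```python
# A 1k x 1k x 1k grid of complex amplitudes, fp16 real + fp16 imaginary.
grid_points = 1000 ** 3        # 10**9 samples of the wavefunction
bytes_per_complex = 2 + 2      # 16-bit real part + 16-bit imaginary part
total_gb = grid_points * bytes_per_complex / 10**9
print(total_gb)                # 4.0, matching the 4 GB estimate
```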

This also re-opens a lot of "party pooper" results in mathematics: impossibility of representing solutions to general quintic (fine print: if we restrict ourselves to arithmetic and roots/radicals). In mathematics and physics there have been a lot of "party pooper" results which later gave way to more profound and interesting positive results once the question was properly rephrased. A negative result for a myopic question isn't very informative on its own.

reply
> This also re-opens a lot of "party pooper" results in mathematics: impossibility of representing solutions to general quintic (fine print: if we restrict ourselves to arithmetic and roots/radicals).

Solving the quintic in terms of transcendental functions has already been done

https://en.wikipedia.org/wiki/Bring_radical#The_Hermite%E2%8...

reply
> I'm still reading this, but if this checks out, this is one of the most significant discoveries in years.

It seems like a neat parlour trick, indeed. But significant discovery?

reply
I can't say I'm surprised at this result at all; in fact, I'm surprised something like this wasn't already known.
reply
The compute, energy, and physical cost of this versus a simple x+y is easily an order of magnitude higher. It will not replace anything in computing, except maybe fringe experiments.
reply
Given this amazing work, an efficient EML operator HW implementation could revolutionize a bunch of things. So the next thing might be an efficient EML HW implementation.
reply
You should mark sarcasm as subtle as this.
reply
deleted
reply
This isn't all that significant to anyone who has done Calculus 2 and knows about Taylor series.

All this really says is that the Taylor expansions of e^x and ln x are sufficient to express the trig functions, which is trivially true from Euler's formula as long as you're in the complex domain.

Arithmetic operations follow from the fact that e^x and ln x are inverses, in particular that e^ln(x) = x.
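
Those inverse-pair identities are easy to check numerically (a small sketch; positive arguments assumed, since ln is real only there):

```python
import math

# Multiplication, division, and powers all reduce to exp and log.
x, y = 3.0, 7.0
assert math.isclose(math.exp(math.log(x) + math.log(y)), x * y)
assert math.isclose(math.exp(math.log(x) - math.log(y)), x / y)
assert math.isclose(math.exp(y * math.log(x)), x ** y)
print("exp/log reproduce *, /, and ** for positive reals")
```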

Taylor series seem a bit like magic when you first see them, but then you get to Real Analysis and find out there are whole classes of functions that they can't express.

This paper is interesting but it's not revolutionary.

reply
There is a huge number of people who understand Taylor series, know how to compute them, and know the things you can do with Taylor (and other kinds of) expansions. Yet none of us identified that this binary operation spans so much of that space. I'm willing to read references to predating observations of the same kind, though; the author does mention alternative systems (incomplete in some specific sense) in the paper.

I did, however, keep thinking that a lot of attention went to trying to include special constants, even though we don't know that much about these constants yet, while comparatively little attention went to, say, elliptic integrals etc.

When aiming for a wide catch, you'd include some of those esoteric functions, or erf() etc...

I also wish they had attempted to find a representation for derivatives and integrals.

reply
What URL should we change it to?
reply
The URL is already pointing at v2, which apparently is the newest one requested by the comment above.

> Submitted on 23 Mar 2026 (v1), last revised 4 Apr 2026 (this version, v2)

reply
Thanks! that's what it looked like to me too.
reply
I agree: I don't understand what happened, but the first "View PDF" resulted in a PDF where the hyperlinks to the figures didn't work. Upon closer inspection it wasn't v1 at all; that's a PNAS article. I am unable to remove the EDIT:... line in my original comment at this time...
reply
No worries! We definitely want the best link so such comments are helpful.
reply
IMO arxiv links should pretty much always be to the abstract (ie .../abs/...) as opposed to particular versions of the html or pdf.
reply