undefined

points

[-]

Integer multiplication x * y can be trivially done in O(k): k = log₂(min(x, y)). This is because we can do addition in constant time, adding all bits in parallel.

By combining many more adding units, we can do (fixed-size) multiplication in constant time, too: https://en.wikipedia.org/wiki/Dadda_multiplier

by 4 hours ago|

prev|

[-]

deleted

by actionfromafar8 hours ago|

prev|

[-]

Doesn't that have to do with how many bits you allow in the actual calculation in physical reality?

by hellohello27 hours ago|

parent|

[-]

Well, for multiplication complexity is defined in terms of on the number of digits/bits digits directly. For attention, complexity is defined on terms of the number of input vectors which are all at fixed precision. I don't understand what happens to the method proposed in the paper at higher precision (since I don't understand the paper), but in reality in doesn't matter since there is no value in anything over float16 for machine learning.

by logicchains4 hours ago|

prev|

[-]

Multiplication has some properties like being cumulative. If we assume the sequence has any specific properties then we no longer have a general sequence model.

by direwolf202 hours ago|

parent|

[-]

I think you meant commutative.

Attention also has some specific properties.

And sometimes results are just unexpected. Did you know that anything a Turing machine can do in t tome steps, a different Turing machine can do in O(sqrt(t log t)) memory cells? https://news.ycombinator.com/item?id=44055347