upvote
MapReduce is nice, but it doesn't, by itself, help you reason about pushdowns, for one. Parquet, for example, can push down select/project/filter, and that's lost if all you have is MapReduce. And a reduce is just a shuffle + map, not very different from a distributed join. MapReduce as an escape hatch over what is fundamentally still relational algebra may be a good intuition.
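To make the "reduce is just a shuffle + map" point concrete, here's a minimal pure-Python sketch (all names here, like shuffle and reduce_phase, are illustrative, not from any framework):

```python
# Sketch: MapReduce's "reduce" phase recast as a shuffle
# (group records by key) followed by a map over the groups.
from collections import defaultdict

def shuffle(pairs):
    """Group (key, value) pairs by key -- the 'shuffle' step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(pairs, combine):
    """A reduce is a shuffle plus a map of `combine` over each group."""
    return {key: combine(values) for key, values in shuffle(pairs).items()}

# Word count: the map phase emits (word, 1); reduce sums per key.
mapped = [("a", 1), ("b", 1), ("a", 1)]
print(reduce_phase(mapped, sum))  # {'a': 2, 'b': 1}
```

The shuffle is exactly the co-location step a distributed hash join also needs, which is why the two don't look so different.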
reply
Performance aside, it seems you could do most, maybe all, of the ops with those three. I say three because your sneaky plus is a union operation. So: map, reduce, and union.

But you are also allowing arbitrary code expressions. So it is less lego-like.

reply
Reductions are painful because they specify a sequence of ordered operations. Runtime is O(N), where N is the sequence length, regardless of the amount of hardware. So you want to work at a higher level, where you can exploit commutativity and the independence of some (or even most) operations.
reply
You can reduce in parallel. That was the whole point of MapReduce. For example, the sum a+b+c+d+e+f+g+h can be found by first computing a+b, c+d, e+f, g+h; then (a+b)+(c+d) and (e+f)+(g+h); then the final result (a+b+c+d)+(e+f+g+h). That's just three steps to compute seven sums.
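The tree-shaped reduction described above can be sketched in a few lines (tree_reduce is a hypothetical helper, not a library function; each round's pairwise combines are independent, so with enough hardware the whole thing takes O(log N) rounds):

```python
import operator

def tree_reduce(op, xs):
    """Pairwise tree reduction: O(log N) rounds of independent combines."""
    xs = list(xs)
    while len(xs) > 1:
        # Every pair in a round is independent and could run in
        # parallel; here the pairs run sequentially for clarity.
        xs = [op(xs[i], xs[i + 1]) if i + 1 < len(xs) else xs[i]
              for i in range(0, len(xs), 2)]
    return xs[0]

# a..h as 1..8: three rounds of pairwise sums, seven additions total.
print(tree_reduce(operator.add, range(1, 9)))  # 36
```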
reply
You're right, it's primarily a runtime + compiler + language issue. I really don't understand why people tried to force functional programming into environments without decent algebraic-reasoning mechanisms.

Modern graph reducers have inherent confluence and aren't reliant on explicit commutation. They can do everything in parallel and out of order (until they have to talk to something extrinsic, like reading input or writing output), including arbitrary side-effectful mutation. We really live in the future.

reply
Reduce is massively parallel for associative operations (commutativity isn't even required, since the pairs keep their order)
reply