Yeah, I get that the actual comparison instruction itself is insignificant; it's everything that goes along with it. It seems like quaternary search fetches more data, however.

For instance, if you have 8 elements, 01234567, and you're looking for 1, with binary search you'd fetch 4, 2, and then 1. With quaternary search, you'd fetch 2, 4, 6, then 1. Obviously, if you only have 8 elements, you'd just delegate to the SIMD instruction, but if this were a much larger array, you'd be doing more work.
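To make the fetch counts above concrete, here is a minimal sketch that records which indices each search touches. The function names and the exact pivot rule (three evenly spaced pivots, falling back to binary search once fewer than 4 elements remain) are assumptions chosen for illustration, not the linked article's implementation.

```python
def binary_probes(a, target):
    """Return (index or None, list of indices fetched) for a plain binary search."""
    probes = []
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        probes.append(mid)
        if a[mid] == target:
            return mid, probes
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return None, probes


def quaternary_probes(a, target):
    """Same, but each round fetches three pivots that do not depend on each other."""
    probes = []
    lo, hi = 0, len(a)
    while hi - lo >= 4:
        step = (hi - lo) // 4
        pivots = [lo + step, lo + 2 * step, lo + 3 * step]
        probes.extend(pivots)  # all three are fetched every round
        new_lo, new_hi = lo, hi
        for p in pivots:
            if a[p] == target:
                return p, probes
            if a[p] < target:
                new_lo = p + 1
            else:
                new_hi = p
                break
        lo, hi = new_lo, new_hi
    # Fewer than 4 elements left: finish with ordinary binary search.
    while lo < hi:
        mid = (lo + hi) // 2
        probes.append(mid)
        if a[mid] == target:
            return mid, probes
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return None, probes


data = list(range(8))
print(binary_probes(data, 1))      # (1, [4, 2, 1])
print(quaternary_probes(data, 1))  # (1, [2, 4, 6, 1])
```

On this input the quaternary version touches four elements instead of three, but the first three fetches can all be issued at once, since none of them depends on the result of another.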

I guess on a modern processor, eliminating the data dependency is worth it because the processor's branch prediction and speculation effectively follow only a single path.

Would be interesting to see this at a machine cycle level on a real processor to understand exactly what is happening.

by LoganDark 20 hours ago

It's not about doing more or less work; it's about doing the work faster. For instance, it's relatively common to discover that some recomputation can be faster than caching or lookup tables. Similarly, fetching more from memory can also be faster if it means you make fewer round trips.
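As a rough back-of-the-envelope sketch of the round-trip argument: the helper below (a hypothetical name, and a pure counting model that ignores caches, prefetching, and SIMD) counts how many dependent narrowing steps each search needs, versus how many elements it fetches in total.

```python
def dependent_rounds(n, fanout):
    """Sequential narrowing steps needed to shrink n candidates to 1
    when each step splits the range into `fanout` parts."""
    rounds = 0
    while n > 1:
        n = -(-n // fanout)  # ceiling division
        rounds += 1
    return rounds


n = 1 << 20  # about a million elements

binary_rounds = dependent_rounds(n, 2)
quaternary_rounds = dependent_rounds(n, 4)

# Binary search fetches 1 pivot per round, quaternary fetches 3.
print(binary_rounds, binary_rounds * 1)          # 20 20
print(quaternary_rounds, quaternary_rounds * 3)  # 10 30
```

Under this model quaternary search fetches 50% more elements but waits on half as many dependent round trips, which is the trade being described.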

by crdrost 19 hours ago

Well, that's where I thought this link was going before it went down the SIMD path... We have a way to beat binary search: it's called B-trees. It has the same basic insight: you can take 64 evenly spaced elements from your data set, compare against all of them rapidly, and instead of bisecting your search space once, you do the equivalent of six bisections. Because you store the 64 elements in an array in memory, they take only one array fetch and you get cache locality... But as you have more elements, you need to repeat this lookup table three or four or five times, so it costs a bit of extra space. So what if we make it not cost space by just storing the data in these lookup tables...
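The sampled-table idea can be sketched as a single-level index over a sorted list. This is only an illustration of the first step toward a B-tree, not a real B-tree: `build_index`, `indexed_search`, and the default `fanout` of 64 are names and values assumed here, and the index is kept as a separate small list rather than storing the data in the nodes.

```python
from bisect import bisect_left, bisect_right


def build_index(a, fanout=64):
    """Sample roughly `fanout` evenly spaced keys from the sorted list `a`."""
    block = max(1, -(-len(a) // fanout))  # elements per block (ceiling division)
    return block, a[::block]


def indexed_search(a, index, block, target):
    """Return an index i with a[i] == target, or None if target is absent."""
    # Step 1: one small, contiguous table narrows the search to a single block.
    b = bisect_right(index, target) - 1
    if b < 0:
        return None  # target is smaller than every element
    lo = b * block
    hi = min(lo + block, len(a))
    # Step 2: ordinary binary search inside that block only.
    i = bisect_left(a, target, lo, hi)
    return i if i < hi and a[i] == target else None


data = list(range(0, 2000, 2))  # 1000 sorted even numbers
block, index = build_index(data)
print(indexed_search(data, index, block, 10))  # 5
print(indexed_search(data, index, block, 11))  # None
```

With more elements, the same trick would be repeated on the index itself, which is where the extra levels (and the extra space) mentioned above come from.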

by LoganDark 5 hours ago

A B-tree is not a search algorithm though; it is a data structure. While it would be nice to be able to somehow instantly materialize a B-tree from a linear array, CPUs aren't quite there yet. It would also be nice not to have to deal with linear arrays where B-trees would be a better fit in the first place, but we are not quite there yet either.