Also, I don't think I optimized for memory usage: my implementation might keep copies of subsets of the data points for each branch. I was mostly focused on the algorithm, not so much on data representation.
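To make the memory point concrete, here is a minimal sketch (in Python rather than Guile, and with made-up names like `split_indices`) of the usual fix: each branch holds only a list of row indices into one shared table, so the row data itself is never copied.

```python
def split_indices(rows, indices, feature, threshold):
    """Return (left, right) index lists; the rows themselves are shared, not copied."""
    left, right = [], []
    for i in indices:
        (left if rows[i][feature] < threshold else right).append(i)
    return left, right

rows = [(2.0, 1.0), (5.0, 3.0), (1.0, 4.0)]
left, right = split_indices(rows, range(len(rows)), feature=0, threshold=3.0)
# left holds the indices of rows whose feature 0 is below 3.0
```

Recursing on `left` and `right` instead of on copied sub-tables keeps memory roughly linear in the data size regardless of tree depth.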
Another point, not really efficiency-related, is that data frames come with lots of functionality for handling non-numeric data. If I recall correctly, they offer things like one-hot encoding. My implementation simply assumes all you have is numbers.
There might also be efficiency left on the table in my implementation, because I use Guile's native number types, which allow for arbitrarily large integers (which one often doesn't need), and I may even have used exact fractions instead of inexact floats.
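The exact-arithmetic cost is easy to demonstrate; here is a small sketch using Python's `fractions.Fraction` as a stand-in for Guile's exact rationals. Averaging exact rationals keeps growing the denominators (each value is a pair of bignums), while floats stay fixed-size but inexact:

```python
from fractions import Fraction

xs = [Fraction(1, 3), Fraction(1, 7), Fraction(2, 11)]

# Exact path: the result is a rational with denominator 693,
# and denominators keep growing as more values are combined.
exact_mean = sum(xs) / len(xs)

# Inexact path: one machine float, constant size, hardware arithmetic.
float_mean = sum(map(float, xs)) / len(xs)
```

Converting to inexact floats up front (in Guile, e.g. via `exact->inexact`) avoids this growth at the cost of rounding.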
I guess, though, that with good, suitable data structures and a bit of reworking, one could get a production-ready thing out of my naive implementation, one that is even trivially parallelizable. The purely functional style should still give the linear speedup, probably only within some bounds, because decision trees usually shouldn't be too deep anyway, to avoid overfitting.
Thanks for the links!
Not so convinced about decision trees though (which process one row at a time).
Yeah, unless you actually had to deal with arbitrarily large integer features, Guile's integers would come with a big efficiency hit.
I think one could parallelize processing rows, at the very least when classifying from learned model. Probably also during learning the model.
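The classification case is indeed trivially parallel, since each row is independent. A minimal sketch (Python, with a hypothetical tree encoded as nested `(feature, threshold, left, right)` tuples; a thread pool is used here for simplicity, though a CPU-bound workload would want processes or native threads):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tiny model: internal nodes are tuples, leaves are labels.
TREE = (0, 3.0, "small", (1, 2.0, "low", "high"))

def classify(row, node=TREE):
    # Walk from the root to a leaf, branching on one feature per node.
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if row[feature] < threshold else right
    return node

def classify_all(rows):
    # Rows are independent, so mapping over them parallelizes trivially.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(classify, rows))
```

Training is also parallelizable, e.g. by evaluating candidate splits concurrently or by building the two child branches in parallel, though the available parallelism there depends on the tree's shape.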
What I had not articulated well is that linear classifiers have the opportunity to use matvecs, which are L1/L2 cache-friendly and branch-free; there, proper memory layout gives an outstanding win. The wins for decision trees are less impressive in comparison, so you needn't feel bad about your code.
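The contrast can be sketched like this (pure Python, so the cache and branch-predictor effects are only suggested, not measured): the linear classifier's kernel is straight-line code sweeping contiguous weights, while the tree walk takes a data-dependent branch at every node.

```python
def linear_score(weights, row):
    # matvec-style kernel: no branches, sequential access over the weights
    return sum(w * x for w, x in zip(weights, row))

def tree_label(row):
    # per-row tree walk: each comparison is an unpredictable branch
    if row[0] < 3.0:
        return "a"
    return "b" if row[1] < 2.0 else "c"

score = linear_score((0.5, -0.25), (1.0, 5.0))  # 0.5*1.0 - 0.25*5.0
```

In a compiled language the `linear_score` loop vectorizes and streams through cache lines, which is exactly where memory layout pays off; the tree's pointer-chasing and branching leave much less for the hardware to exploit.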