That, however, performed quite poorly at compile time and was not really ODR-safe (forceinline was used as a workaround). At least one of the forks moved to a dedicated meta-language and a custom compiler to generate the code instead. Modern C++ now offers better ways to do that.
We also focused on higher-level constructs that capture intent, rather than on abstracting away features that are too low-level; some functionality was explicitly provided as kernels or algorithms instead of plain vector operations.
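To make the "algorithm instead of plain vector operations" idea concrete, here is a minimal sketch. It is not the library's actual API: `pack` and `simd_transform` are hypothetical names, and `pack` is a plain array standing in for a real SIMD register type. The point is only that the caller states the element-wise intent once, while the algorithm owns the chunking and the scalar tail.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-width pack. A real library would map this to a SIMD
// register; a plain array is used here so the sketch stays portable.
template <typename T, std::size_t N>
struct pack {
    std::array<T, N> lanes{};

    static pack load(const T* p) {
        pack r;
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = p[i];
        return r;
    }
    void store(T* p) const {
        for (std::size_t i = 0; i < N; ++i) p[i] = lanes[i];
    }
    friend pack operator+(const pack& a, const pack& b) {
        pack r;
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] + b.lanes[i];
        return r;
    }
    friend pack operator*(const pack& a, const pack& b) {
        pack r;
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] * b.lanes[i];
        return r;
    }
};

// Hypothetical higher-level algorithm: applies f element-wise over two inputs.
// Full chunks go through pack<T, N>; leftover elements go through pack<T, 1>,
// so the same generic callable serves both the wide and the scalar path.
template <std::size_t N, typename T, typename F>
void simd_transform(const T* a, const T* b, T* out, std::size_t n, F f) {
    assert(n == 0 || (a && b && out));
    std::size_t i = 0;
    for (; i + N <= n; i += N) {
        f(pack<T, N>::load(a + i), pack<T, N>::load(b + i)).store(out + i);
    }
    for (; i < n; ++i) {
        f(pack<T, 1>::load(a + i), pack<T, 1>::load(b + i)).store(out + i);
    }
}
```

A call such as `simd_transform<4>(a, b, out, n, [](auto x, auto y) { return x * y + y; })` then expresses the computation once; the user never writes the load/store loop or the tail handling by hand.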