You might like having a go at Lush. It has fallen out of favor of late but is a very interesting language/system.

https://scottlocklin.wordpress.com/2024/11/19/lush-my-favori...

reply
Sounds interesting, but I'm using very sparse, very high-rank tensors, e.g. rank 3 neuron equivalents.

As such pretty much all numerical optimisations are useless for my work. Racket however chugs along happily, if slowly.
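
To make the point concrete, here is a minimal sketch of a sparse rank-3 tensor stored as a dictionary keyed by index triples. It is an illustration only, not the commenter's Racket code: `SparseTensor3`, `set`, and `contract` are made-up names, and the dictionary layout is an assumption. Work scales with the number of stored entries rather than with the dense shape, which is why dense-array optimisations buy little here.

```python
from collections import defaultdict


class SparseTensor3:
    """Rank-3 tensor that stores only its nonzero entries, keyed by (i, j, k)."""

    def __init__(self):
        self.entries = {}

    def set(self, i, j, k, value):
        # Storing a zero means dropping the entry, so the dict stays sparse.
        if value == 0:
            self.entries.pop((i, j, k), None)
        else:
            self.entries[(i, j, k)] = value

    def contract(self, x, y):
        """Return out[i] = sum over j, k of T[i, j, k] * x[j] * y[k].

        x and y are sparse vectors given as {index: value} dicts;
        missing indices count as zero.
        """
        out = defaultdict(float)
        for (i, j, k), value in self.entries.items():
            out[i] += value * x.get(j, 0.0) * y.get(k, 0.0)
        return dict(out)


t = SparseTensor3()
t.set(0, 1, 2, 2.0)
t.set(3, 1, 1, -1.0)
result = t.contract({1: 3.0}, {2: 0.5, 1: 4.0})
print(result)
```

Only the two stored entries are visited, regardless of how large the index ranges are.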

reply
That sounds kind of amazing. But you're not actually doing the machine learning in Racket, are you? Is your Racket code generating other code like PyTorch?
reply
I'm doing the learning in Racket because the bottleneck is human understanding.

That MNIST takes 30 minutes per epoch isn't a worry when I don't even know what vector addition should look like.

reply
This is a complete tangent, but since you mentioned MNIST: I accidentally discovered Tsetlin machines this week when someone on r/Julia asked if anyone with an AMD GPU could run the benchmark in their package called Tsetlin.jl. I've got an AMD GPU so I was happy to oblige. Then I looked at what the benchmark was doing: it was training an MNIST classifier to 98% accuracy in 9 seconds - that seemed like a couple of orders of magnitude too fast. I was flabbergasted and wondered what the heck this thing was and that's when I learned about Tsetlin machines. I went on (with the help of Claude) to implement one in an FPGA and again was flabbergasted when it only took 2k LUTs to implement a Tsetlin machine for MNIST classification in hardware.
reply
Well yes, you have to use one of the newer MNIST variants these days if you want to get anything meaningful. A linear classifier gets something like 87% on the original one.
reply
> I don't even know what vector addition should look like.

I think you're trying to imply you're inventing something new and Racket enables you to explore... But what I read (as someone with a PhD in deep learning who has worked on sparsity) is that you actually don't know the prior art, and you're using Racket as an excuse to reinvent a whole bunch of stuff that already exists in plenty of mature libraries in more mundane languages (including Python/PyTorch). Which is of course fine for personal growth, but please don't oversell Racket as a "superpower" - to wit, I can manipulate any part of my stack too because it's all written in cpp.

reply
I once replaced IEEE 754 floating point numbers in a model by balanced ternary floating point numbers.

It took me 20 minutes.

Tell me how you'd do that in cpp?

reply
lol the same way we implement all of the reduced precision fp8, fp4 types today: by storing them in the corresponding uint:

https://github.com/ggml-org/llama.cpp/discussions/15095
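
As a rough illustration of "storing them in the corresponding uint", the sketch below decodes a 4-bit float held in the nibbles of an ordinary byte. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1); the function names and the low-nibble-first packing order are assumptions made for the example, and this is not the code from the linked discussion.

```python
def decode_e2m1(nibble):
    """Decode a 4-bit value, assumed to be E2M1: sign | 2 exponent bits | 1 mantissa bit."""
    sign = -1.0 if nibble & 0b1000 else 1.0
    exponent = (nibble >> 1) & 0b11
    mantissa = nibble & 0b1
    if exponent == 0:
        # Subnormal range: no implicit leading 1, so the only values are 0 and 0.5.
        magnitude = mantissa * 0.5
    else:
        # Normal range: implicit leading 1, exponent bias of 1.
        magnitude = (1.0 + mantissa * 0.5) * 2.0 ** (exponent - 1)
    return sign * magnitude


def unpack_byte(byte):
    """Split one uint8 into two 4-bit floats, low nibble first (an arbitrary choice)."""
    return decode_e2m1(byte & 0x0F), decode_e2m1(byte >> 4)


# The eight non-negative values representable under this layout.
magnitudes = [decode_e2m1(n) for n in range(8)]
print(magnitudes)
print(unpack_byte(0x72))
```

The float never exists as a hardware type here; the integer is just a container and all the meaning lives in the decode step.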

reply
Balanced ternary fp is not a reduced precision type of binary fp: https://arxiv.org/abs/2512.10964
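
For anyone who hasn't met balanced ternary: digits are -1, 0, and 1 and the base is 3, so it is a different number system rather than a narrower binary one. The sketch below covers integers only, as a minimal illustration; it is not taken from the linked paper and does not attempt the floating-point part (mantissa and exponent). The function names are made up for the example.

```python
def to_balanced_ternary(n):
    """Return the balanced ternary digits of integer n, least significant first."""
    digits = []
    while n != 0:
        r = n % 3  # Python's % yields 0, 1 or 2 even for negative n
        if r == 2:
            r = -1  # write 2 as 3 - 1: digit -1, with the carry absorbed below
        digits.append(r)
        n = (n - r) // 3
    return digits or [0]


def from_balanced_ternary(digits):
    """Inverse of to_balanced_ternary: sum of digit * 3**position."""
    return sum(d * 3 ** i for i, d in enumerate(digits))


five = to_balanced_ternary(5)
minus_five = to_balanced_ternary(-5)
print(five, minus_five)
```

One property that falls out of the symmetric digit set: negating a number is just flipping the sign of every digit, with no separate sign bit.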
reply