Now, there are a bunch of interesting things about this project. Seeing a tiny transformer running on an FPGA is informative, as is the fact that it was apparently a pretty quick project for one person plus robot assistance. There are probably some transferable lessons for anyone else doing robo-FPGA development.
but anyone who can fit QWEN-3.6 35B with a sustained ~30 token/s and ~100k context with cache could print money as a hardware vendor.
You can scroll through r/LocalLLaMA and find tons of people getting usable speeds out of Qwen 35B.
24 tok/s on an ancient 1080 Ti
https://old.reddit.com/r/LocalLLaMA/comments/1tcc7h5/24_toks...
100 tok/s on a 4070
https://old.reddit.com/r/LocalLLaMA/comments/1tjh7az/110_tok...