As a follow-up: I can see there is not a lot of belief in this, which is also why it is hard to find a company to partner with. So how -do- you make money on something like this as an independent researcher? Maybe I release trick one and show how guided window attn (and nn memory, and probably a lot of robotics) can be trained? Thoughts? I can do that pretty quickly. By itself that is pretty great tech (combined with fixed windows of full attn it is pretty amazing). The second trick, I think, is a bit more powerful, although both are general purpose. If I do this, do you think people will believe trick two (and all the real-time multi-modal streaming stuff)?
I'm super curious about those "Two Weird Tricks". I'd love for you to release more. It reminds me of MiniMax Sparse Attention: https://arxiv.org/html/2606.13392v1
Yeah, looks like fun stuff. You still need to preserve the entire kv cache though, right? So even if compute is drastically lower, memory keeps growing. The system I described keeps memory constant (well, if you keep the entire token history you are technically adding one long, i.e. a single integer, per token generated, but I think we can agree that is negligible, and it could be capped at something high like 1B tokens or so with no meaningful impact). I think I will probably release trick one and see if people then believe trick two even without seeing it.
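To put rough numbers on the "memory keeps growing" point, here is a back-of-the-envelope sketch. It is not the system described above: the model dimensions (32 layers, 8 KV heads, head dim 128, fp16) are made-up illustrative values, a plain fixed window of 4096 tokens stands in for any constant-memory scheme, and the token history is assumed to cost 8 bytes (one long) per token.

```python
def kv_cache_bytes(cached_tokens, n_layers=32, n_kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    """Bytes needed to hold keys and values for `cached_tokens` tokens.

    All default dimensions are hypothetical, chosen only for illustration.
    The leading factor of 2 covers one key and one value vector per head.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * cached_tokens


def full_cache_bytes(seq_len, **dims):
    # Full attention: every past token stays cached, so memory grows linearly.
    return kv_cache_bytes(seq_len, **dims)


def windowed_cache_bytes(seq_len, window=4096, **dims):
    # Fixed window: at most `window` tokens are cached, so memory is capped.
    return kv_cache_bytes(min(seq_len, window), **dims)


def token_history_bytes(seq_len, bytes_per_token=8):
    # Assumption: the raw token history is stored as one 64-bit integer per token.
    return bytes_per_token * seq_len


if __name__ == "__main__":
    for n in (10**4, 10**6, 10**9):
        print(f"{n:>13,} tokens | "
              f"full: {full_cache_bytes(n) / 2**30:12.2f} GiB | "
              f"windowed: {windowed_cache_bytes(n) / 2**30:6.2f} GiB | "
              f"history: {token_history_bytes(n) / 2**30:6.2f} GiB")
```

With these assumed dimensions the full cache costs 128 KiB per token, so it scales with context length, while the windowed cache stops growing once the window is full and the token history stays small by comparison even at 1B tokens.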
It's not quite true to say that if you release it you get nothing. If it's worthwhile and picked up by the open-weights labs, you get much bigger and better models implementing it than you would have had access to or been able to train otherwise, quicker than if they had to figure it out de novo.
Yeah. I am just about at the point of releasing it all. I love the tech. It does amazing things. But I want to move on to the next big things I can see doing with it, and building the custom ops to get it to work efficiently is a pain. I am positive others would run with it and make it all way better, which would free me up to do more.
Expensive, and if someone figures out a slightly different way to do it, you aren't really “covered”; it's not a unique umbrella. Plus you would sort of give away the secrets.