Taalas has a running demo here: https://chatjimmy.ai/
It's eye-opening: it generated an AVX-512 optimized Mersenne Twister in C in 0.076s, 13,706 tok/s. Too fast for the tok/s to be terribly accurate.
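A rough back-of-the-envelope sketch of why the tok/s figure is shaky at this timescale. The two numbers are the ones quoted above; the 5 ms timing error is purely a hypothetical assumption for illustration, not something the demo reports.

```python
# Figures quoted in the comment above; everything else is illustrative.
elapsed_s = 0.076
reported_tok_per_s = 13_706

# Token count implied by the two reported figures (about a thousand tokens).
implied_tokens = reported_tok_per_s * elapsed_s


def rate_range(tokens, elapsed, timer_error_s):
    """Return (low, high) tok/s if the true elapsed time is elapsed +/- timer_error_s."""
    return tokens / (elapsed + timer_error_s), tokens / (elapsed - timer_error_s)


# Hypothetical 5 ms of measurement error (network, rounding, timer resolution).
low, high = rate_range(implied_tokens, elapsed_s, 0.005)
print(round(implied_tokens), round(low), round(high))
```

Under that assumed 5 ms error the rate already swings between roughly 12,900 and 14,700 tok/s, so the headline number is best read as an order of magnitude.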
The studies and efforts are ongoing and public, and there are technical hurdles to be faced - but the relevant work goes back quite a long way, and there is heightened interest in it now.
It seems that you simply took the "hyped headlines" for the whole of the work.
Well, yeah, that's what I'm saying. It's odd that there haven't been any major headlines (customer interest, competitors' announcements, etc) other than their initial demo. Good to hear it's being worked on though!
I'd say it pretty consistently starts in the underground.
The real revolution in this context is that it /could/ be done practically - overcoming the hurdles. But as far as interest in the matter is concerned, I'd say there almost cannot be greater interest at this stage: making NNs efficient. This must be absolutely evident, as evident as it is that the separation of memory and processor goes against the idea of NNs, as evident as it is that multiplication can be achieved purely physically.
Of course many have seen that and set about studying it. As soon as it becomes optimally practical...
It has only been four months since they unveiled their first prototype. I don't understand your confusion. Chip development does not happen overnight...?
Their initial blog post laid out a roadmap, so theoretically they should have another thing to demonstrate this summer.
The person I replied to was acting as if Taalas was ancient history. I was pointing out it has only been a few months.
Universities are studying, startups are proposing - the «approach» sits below the big-headline level but is quite lively. Not just Taalas, not just their way - which remains remarkable in the scene as the HW is achieved, working, online, available... and amazing.
If that were the extent of the terms, then what could we call "baking the weights into silicon"? Setting parts of the circuits to fixed values for multiplication is like printing a Read-Only Memory. (And you compute at it: Compute In Memory.)
> CIM where you still need general purpose ALUs all over the place
If that were so, then why do taxonomists present analogue computing as part of CIM? Ohm's Law does not constitute an "ALU" in the sense you mean.
Simply, I used CIM, "Compute In Memory", for lack of a better term - for "store data where you modify it", for "beyond Von Neumann's separation of data storage and processor".
And I do not get your rant about "analog computing", which has everything to do with NNs (otherwise, well, prove it): they started with that - they are basically that in fact. Analogue computing is a very great temptation since it would solve the inefficiency issues of digital NNs. Unfortunately, it has drawbacks which are massive for big NNs. Taalas' approach seems to be the best compromise.
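To make the Ohm's Law remark concrete, here is a minimal sketch of how an idealized resistive crossbar would compute a matrix-vector product with no ALU involved: each cell contributes a current I = G * V, and the currents on a shared column wire simply add up (Kirchhoff's current law). The function name and the values are made up for illustration, noise and other non-idealities are ignored, and this is not a description of Taalas' hardware.

```python
def crossbar_currents(conductances, voltages):
    """Idealized analogue crossbar (hypothetical helper, for illustration only).

    conductances[i][j] is the conductance in siemens of the cell joining
    input row i to output column j; voltages[i] is the voltage on row i.
    Ohm's law gives the per-cell current G * V, and the currents flowing
    into each column wire sum, which is exactly a dot product per column.
    """
    n_rows = len(voltages)
    n_cols = len(conductances[0])
    return [
        sum(conductances[i][j] * voltages[i] for i in range(n_rows))
        for j in range(n_cols)
    ]


# The "weights" are fixed conductances, the "inputs" are applied voltages.
G = [[1e-3, 2e-3],
     [3e-3, 4e-3]]
V = [0.5, 0.25]
I = crossbar_currents(G, V)  # column currents in amperes
print(I)
```

The weights live in the same place where the multiply-accumulate happens, which is the sense of "Compute In Memory" used above.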
I guess that makes sense. Is this feasible, or does the added latency between chips kill any of the performance gains?