
This appeared some time ago (https://taalas.com/), but I'm sure there are others thinking the same thoughts. This would be best for small models, IMO; nothing frontier, because that changes too fast.

by 1e1a, 21 hours ago

you can try it out here: https://chatjimmy.ai/

by Meetvelde, 17 hours ago

that's so fast it feels fake

by the_sleaze_, 17 hours ago

13,789 tok/s

Well, I've gotten one of those "holy fuck, this is the future" deeply unsettled, anxious feelings in my gut again. It's been a week or two; it was time.

by froh, 12 hours ago

i only found one discussion of the tech here on HN

https://news.ycombinator.com/item?id=47103661

by agazso, 11 hours ago

It's indeed super fast, but the output is complete BS hallucination. Not sure what the value of this is.

by runeks, 7 hours ago

It's a proof of concept that it's possible to etch a neural net into a chip and get a massive performance (and efficiency) boost.

by Smaug123, 22 hours ago

By the way, have you seen Cerebras? It hasn't gone as far as what you described (loads of cores and RAM, but you still load the weights onto it as software, and for large models they need to be streamed into the chip), but it is a whole wafer.

by trouve_search, 22 hours ago

Cerebras is a whole lot of SRAM, basically a ton more L1/L2 cache, hence increasing throughput.

They're pretty supply-constrained right now, though, and their production costs seem prohibitive.

The interesting players at the moment are from Toronto: Taalas (print the model onto the silicon) and Tenstorrent (dataflow-programming-based hardware).

by londons_explore, 21 hours ago

There is a huge downside to weights being modifiable: it means you need real multipliers (not simply adders) and SRAM to store those weights.

I suspect for equal performance, that's probably a 5x increase in silicon area (and therefore cost).
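To illustrate why fixed weights can get away with adders: when a weight is a known constant, multiplying by it reduces to a few shifts (just wiring in hardware) plus adds or subtracts. Below is a minimal sketch using canonical signed digit (CSD) recoding, a standard trick for constant multipliers. The function names are made up for this example, and the sketch says nothing about the 5x area figure, which is the commenter's own estimate.

```python
def csd_digits(c: int) -> list[tuple[int, int]]:
    """Recode a non-negative constant into canonical signed digit form.

    Returns (shift, sign) pairs with c == sum(sign * (1 << shift)).
    CSD never has two adjacent non-zero digits, which keeps the number
    of adders in a fixed-constant multiplier small.
    """
    digits = []
    shift = 0
    while c != 0:
        if c & 1:
            # Pick +1 or -1 so the remainder is divisible by 4,
            # which forces the next digit to be zero.
            d = 2 - (c & 3)  # c % 4 == 1 -> +1, c % 4 == 3 -> -1
            digits.append((shift, d))
            c -= d
        c >>= 1
        shift += 1
    return digits


def mul_by_constant(x: int, c: int) -> int:
    """Multiply x by the fixed constant c using only shifts and add/subtract."""
    acc = 0
    for shift, d in csd_digits(abs(c)):
        term = x << shift  # a shift is free wiring in hardware
        acc = acc + term if d > 0 else acc - term  # one adder/subtractor per extra digit
    return acc if c >= 0 else -acc
```

For example, 7 recodes to 8 - 1, so `mul_by_constant(13, 7)` computes `(13 << 3) - 13 == 91` with a single subtractor and no multiplier at all.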

by phkahler, 22 hours ago

>> I wanna see an inference chip where the weights are part of the rom of the chip.

I've been wondering about that for a while now. For a lot of tasks, putting weights in ROM is probably OK. OTOH:

>> There would be 1 multiplier per weight...

I'm not sure that is a good idea. Maybe if it's quantized down to 2 bits... Otherwise, maybe a small ROM near each multiplier (or row of them, or whatever) so the multipliers could handle N distinct matrix operations without having to move the data from far away.

Another fun thought is to put a row of MAC units on DRAM so that a DRAM row would be a vector. Row size might be 64 Kbit, or 8K weights if they're 8-bit. This also keeps the weights and calculations on the same chip. I'm not sure this would put enough multipliers on one chip, though. Systolic arrays can have tens or hundreds of thousands of MAC units, each doing one op per clock cycle.
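The row arithmetic behind that DRAM idea: 64 Kbit / 8 bits = 8,192 weights per row, so one row activation could in principle feed 8,192 MACs in parallel. A functional sketch (not a hardware model), assuming one weight-matrix row per DRAM row and MAC units sitting beside the row buffer; all names are hypothetical.

```python
# Assumed geometry, taken from the comment: 64 Kbit rows, 8-bit weights.
ROW_BITS = 64 * 1024
WEIGHT_BITS = 8
WEIGHTS_PER_ROW = ROW_BITS // WEIGHT_BITS  # 8192 weights per DRAM row


def row_mac(row: list[int], activations: list[int]) -> int:
    """One row activation: each MAC unit next to the row buffer multiplies
    its weight by the matching activation, and the products are summed."""
    if len(row) != len(activations) or len(row) > WEIGHTS_PER_ROW:
        raise ValueError("row and activations must match and fit in one DRAM row")
    return sum(w * a for w, a in zip(row, activations))


def matvec_in_dram(weight_rows: list[list[int]], activations: list[int]) -> list[int]:
    """Matrix-vector product with one weight-matrix row per DRAM row:
    only the activation vector and the results travel, the weights stay put."""
    return [row_mac(row, activations) for row in weight_rows]


print(WEIGHTS_PER_ROW)                           # 8192
print(matvec_in_dram([[1, 2], [3, 4]], [5, 6]))  # [17, 39]
```

A layer wider than 8,192 inputs would have to be split across several rows and the partial sums added, which is where the "enough multipliers on one chip" concern comes in.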

by cyptus, 22 hours ago

Analog chips could also be very interesting, instead of using digital signals and processing them against the weights in the ROM. I have no idea whether that scales to such big models, though.

by mdp2021, 18 hours ago

The drawback is in keeping signal fidelity (e.g. dissipation, temperature etc.) and in the conversion between analogue and digital.

Nonetheless, yes, there are already implemented solutions for small NNs (mostly acting as triggers, as I understand it).

by whazor, 6 hours ago

You don't need a single wafer, you can split the model into many smaller different chips and connect inputs/outputs.

Skip VHDL and go directly to GDSII / OASIS. Try to find similar vectors so you get reusable blocks.

You can dynamically calibrate a chip by fine-tuning its output.
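One simple reading of "calibrate by fine-tuning output", offered only as an assumption about what was meant: run calibration inputs through the chip, compare each output against a reference (e.g. the software model), and fit a per-output gain and offset by ordinary least squares, then apply that correction to every later reading. A stdlib-only sketch with hypothetical names and made-up numbers:

```python
def fit_gain_offset(measured: list[float], reference: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of: reference ~= gain * measured + offset."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_r = sum(reference) / n
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_m == 0:
        raise ValueError("calibration inputs must produce varying outputs")
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(measured, reference))
    gain = cov / var_m
    return gain, mean_r - gain * mean_m


def calibrate(raw: float, gain: float, offset: float) -> float:
    """Apply the fitted correction to a raw chip output."""
    return gain * raw + offset


# Hypothetical chip output that reads 10% low with a +0.2 offset.
reference = [0.0, 1.0, 2.0, 3.0, 4.0]
measured = [0.9 * r + 0.2 for r in reference]
gain, offset = fit_gain_offset(measured, reference)
corrected = [calibrate(m, gain, offset) for m in measured]
```

Because the simulated error here is exactly affine, `corrected` matches `reference` up to floating-point rounding; a real chip's errors would not be this clean.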

by freakynit, 14 hours ago

This may be extreme, or completely stupid, but why aren't we using genetics to "grow" chips in a chemical soup yet? Don't we have some language, similar to Verilog/VHDL, for expressing circuits as gene sequences?

by marcosqanil, 10 hours ago

I've worked for one of Europe's biggest synthetic biology labs and I know lots of biologists are low-key interested, but current players in semiconductors see it as kind of a tarpit.

IBM used to have a program using DNA origami for lithography back in 2009, which makes sense, as lithography masks are a pain to make. I really wish I knew why the program was stopped, but most of the researchers have retired by now.

As to whether you can just "grow" the whole chip from scratch, the answer is probably yes, but it would require lots of non-trivial scientific discoveries. For instance, we can't really make sizable chips using DNA without horrible defect rates. Biology is much better at making redundant Rube Goldberg machines than very precise machines with no tolerance for errors.

I think we'd have a better chance of success if we made very weird kinds of chips that better took advantage of the medium, perhaps even something that we "train" rather than just use out of the box.

I'd love it if anyone here knew more about this!

by freakynit, 9 hours ago

Would it be comparatively easier to make neuromorphic chips instead of traditional chips? I'd expect probabilistic algorithms like those used by LLMs to be more tolerant of defects as well?

by whalee, 13 hours ago

We lack robust frameworks for 'forward engineering' stochastic thermodynamic computation over molecular free-energy landscapes (which is basically what a "chemical soup" is doing), the way we have for analog/optical/digital computing. This is why, as a field, medicine is so heavily empirical and oriented toward reverse engineering.

by freakynit, 11 hours ago

Man... I had to chatgpt your comment just to understand. But I do now.

Basically, unlike the current chip manufacturing process, where every stage is deterministic and precise, the soup world (the chemistry) is not. And we don't have accurate enough models to handle it deterministically, or to model it precisely.

My respect for nature's engineering just went up tenfold.

by AceJohnny2, 13 hours ago

Are you referencing the 1998 short story "Taklamakan" by Bruce Sterling?

by freakynit, 11 hours ago

Thanks, just looked it up. Seems super interesting.

by fallat, 14 hours ago

Do that at scale

by freakynit, 14 hours ago

Bacteria do that at scale, far, far bigger than all chips combined. All it takes is a chemical soup and a few starter DNA seeds.

by fallat, 5 hours ago

Ah, so we're not talking about creating full-on brains after all?

by voidUpdate, 11 hours ago

> "Downside is this chip would be huuuuge - a whole wafer."

Why don't we have chips like that? If a CPU the size of a postage stamp can deliver x amount of performance, imagine how much you could get from an entire wafer of chips running in parallel. Obviously it wouldn't suit every use case (you couldn't fit an entire wafer in a phone), but still.

by ngomez, 10 hours ago

Using the space of an entire wafer for one chip would result in extremely low manufacturing yields. Even with state-of-the-art silicon cleanrooms, there will still be defects in parts of the output.

With CPUs and GPUs, chip makers can disable faulty cores and bin them as lower SKUs to get some yield out of it. But if you're using an entire wafer to embed weights, and a speck of dust causes a printing defect that makes the weights wrong, the entire wafer is worthless.
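A common first-order way to quantify this is the Poisson yield model, Y = exp(-A * D0): the probability that a die of area A has zero killer defects when defects land randomly at density D0. It assumes every defect is fatal, which is exactly the whole-wafer-of-baked-in-weights case described here. The defect density and die areas below are illustrative assumptions, not data for any real process.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Probability that a die of the given area has no killer defects,
    under the Poisson yield model Y = exp(-A * D0)."""
    return math.exp(-area_cm2 * defects_per_cm2)


D0 = 0.1  # assumed killer-defect density in defects/cm^2 (illustrative only)

# Rough, assumed die areas for comparison.
for name, area in [("small CPU die (~1 cm^2)", 1.0),
                   ("large GPU die (~8 cm^2)", 8.0),
                   ("whole-wafer chip (~460 cm^2)", 460.0)]:
    print(f"{name:30s} yield ~ {poisson_yield(area, D0):.3g}")
```

With these numbers the whole-wafer case comes out on the order of 1e-20, i.e. essentially never defect-free, which is why a wafer-sized design only works if it can tolerate or route around defects.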

by voidUpdate, 8 hours ago

Do failed wafers have to go in the trash, or can you recycle them?

by Jyaif, 6 hours ago

What's the difference between disabling faulty cores and disabling the parts of the wafer that have defects?

by RussianCow, 3 hours ago

I'm not an expert, but I think those are the same thing. But for an LLM etched onto a whole wafer, it doesn't make sense to disable part of it since that would remove some weights entirely.

by cactusplant7374, 8 hours ago

Is that defect easy to detect?

by kimsey0, 8 hours ago

We do. The Cerebras line of Wafer Scale Engines is exactly an entire wafer of cores running in parallel with fast memory next to each one. It's intended for very high throughput LLM inference. https://www.cerebras.ai/chip

by WithinReason, 6 hours ago

One token per clock cycle at 1B parameters would imply 2 exaFLOPS (2 FLOPs per parameter per token at a ~1 GHz clock), consuming about 10 kW.
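Working the estimate through, assuming 2 FLOPs (one multiply, one add) per parameter per token and a ~1 GHz clock (the clock rate is an assumption; it is what makes one token per cycle come out to 2 exaFLOPS), the quoted 10 kW corresponds to about 5 fJ per FLOP:

```python
params = 1e9            # 1B parameters
flops_per_param = 2     # one multiply + one add per weight per token
clock_hz = 1e9          # assumed ~1 GHz clock, one token per cycle
power_w = 10e3          # the comment's ~10 kW estimate

flops_per_token = params * flops_per_param   # 2e9 FLOPs per token
throughput = flops_per_token * clock_hz      # 2e18 FLOP/s, i.e. 2 exaFLOPS
joules_per_flop = power_w / throughput       # 5e-15 J, i.e. 5 fJ per FLOP

print(f"{throughput:.1e} FLOP/s at {joules_per_flop * 1e15:.1f} fJ/FLOP")
```

That prints `2.0e+18 FLOP/s at 5.0 fJ/FLOP`; the 10 kW is the commenter's figure, so the 5 fJ is only as good as that estimate.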

by yuriyguts, 22 hours ago

I've also been thinking about this, although the forward pass of a transformer model also involves some heavier operations like normalization, reciprocals, exponentiation, and other non-linearities (GeLU, SiLU), which may (though typically don't) involve learned weights as operands.
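To make concrete which parts of a transformer layer are not plain multiply-accumulates against a weight, here is a minimal pure-Python sketch of RMSNorm (one common normalization choice), softmax, SiLU, and the widely used tanh approximation of GeLU. In this sketch only RMSNorm's `gain` vector is a learned weight; everything else needs square roots, exponentials, or reciprocals, i.e. hardware beyond adders and multipliers.

```python
import math


def rms_norm(x: list[float], gain: list[float], eps: float = 1e-6) -> list[float]:
    # Square root + reciprocal; `gain` is a learned per-channel weight.
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]


def softmax(x: list[float]) -> list[float]:
    # Exponentials + one reciprocal; no learned weights.
    m = max(x)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def silu(v: float) -> float:
    # SiLU(v) = v * sigmoid(v): an exponential + a reciprocal, no weights.
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)  # rewritten for v < 0 so exp() cannot overflow
    return v * e / (1.0 + e)


def gelu_tanh(v: float) -> float:
    # tanh approximation of GeLU: a cube, a square root constant, and a tanh.
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))
```

In a ROM-weight design these would need dedicated function units or lookup tables rather than being folded into the weight-multiplier array.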

by Salgat, 19 hours ago

Supposedly memristors would be ideal for this (and it would be reprogrammable), but then again, memristors seem to be the carbon nanotubes of the computing world.

by mdp2021, 18 hours ago

> weights [as] part of the rom of the chip

Not really that: you are pointing to Compute-In-Memory (CIM), techniques where the data (here, a multiplier value) is part of the processor (here, the multiplying circuit).

The "fetch and process" problem is bypassed completely at the architectural level: the data sits where the processing happens, so it is never moved and there is no fetch latency.
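One common way compute-in-memory is realized for neural nets is an analog crossbar (memristors are one proposed cell type). Below is an idealized sketch: each weight is stored as a conductance, inputs are applied as row voltages, and each column wire sums the cell currents (Ohm's law per cell, Kirchhoff's current law per column), so the multiply-accumulate happens where the weight is stored. It deliberately ignores what makes this hard in practice: noise, wire resistance, DAC/ADC conversion, and negative weights (often handled with a pair of cells).

```python
def crossbar_mvm(conductances: list[list[float]], voltages: list[float]) -> list[float]:
    """Idealized analog crossbar.

    conductances[i][j] is the cell joining input row i to output column j.
    Driving row i at voltages[i] makes column j collect the current
    I[j] = sum_i G[i][j] * V[i], i.e. a matrix-vector product computed
    in place, without ever moving the stored weights.
    """
    n_cols = len(conductances[0])
    return [sum(row[j] * v for row, v in zip(conductances, voltages))
            for j in range(n_cols)]


G = [[1.0, 2.0],
     [3.0, 4.0]]           # stored weights as conductances (arbitrary units)
V = [0.5, 1.0]             # input activations as row voltages
print(crossbar_mvm(G, V))  # [3.5, 5.0]
```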

by zkmon, 21 hours ago

A firmware upgrade would mean flashing a huge BIN file.

by HDThoreaun, 17 hours ago

How would the pipelining work when the next token depends on the last token?

by cruffle_duffle, 22 hours ago

> Wafer level faults probably won't matter though - neural nets are resistant to a few missing or wrong weights.

Brain science people “love” traumatic brain injury cases because it can help explore what happens when bits of the “brain wafer” get damaged. We’ve learned a lot from such things.

I wonder if people are intentionally “destroying” parts of the model weights to learn more about what happens? Like could you strategically wipe a gig of the model so it’s “all zeros” and see what happens?

I have to wonder

by zurfer, 22 hours ago

This is called mechanistic interpretability. There are already lots of fascinating insights, since you can intervene at basically any level, down to individual neurons or weights, thousands of times over. The human brain is many orders of magnitude harder to make sense of.

by sometimelurker, 22 hours ago

Well, it's actually called ablation, and it's one way to do mech interp. Anthropic has a bunch of work on mech interp at https://transformer-circuits.pub/, like SAEs and NLAs.
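A toy illustration of zero-ablation, with hand-picked hypothetical weights rather than a trained model: knock out one hidden unit at a time and measure how much the output moves. Real mech-interp ablations do the same thing to a trained model's neurons, attention heads, or learned features, at far larger scale.

```python
def relu(v: float) -> float:
    return max(0.0, v)


def mlp(x, w1, w2, ablate=frozenset()):
    """Tiny 2-layer MLP. `ablate` holds hidden-unit indices whose
    activations are forced to zero (zero-ablation)."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    hidden = [0.0 if i in ablate else h for i, h in enumerate(hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


# Hypothetical toy weights, chosen by hand for illustration.
W1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W2 = [[1.0, 1.0, 0.1]]
x = [2.0, 3.0]

baseline = mlp(x, W1, W2)[0]
for unit in range(len(W1)):
    effect = baseline - mlp(x, W1, W2, ablate={unit})[0]
    print(f"ablating hidden unit {unit}: output drops by {effect:.2f}")
```

Here the baseline output is 5.25; ablating unit 2, whose outgoing weight is small, only drops it to 5.0, while ablating unit 1 removes more than half of it. That is a miniature version of the "nets tolerate a few missing weights, but not all weights are equal" intuition.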

by Cantinflas, 21 hours ago

by mdp2021, 18 hours ago

Of course tampering with chunks or nodes in the NNs is a way to study the "spawned" (through gradient descent etc.) configuration and "reverse-engineer the black box" to get "AI transparency".

Anthropic published an important work around one year and a half ago.

by mdp2021, 12 hours ago

> Anthropic published an important work around one year and a half ago

> Tracing the thoughts of a large language model

https://www.anthropic.com/research/tracing-thoughts-language...

https://news.ycombinator.com/item?id=43495617 (27 March 2025)

by Computer0, 22 hours ago

Reminds me of Golden Gate Claude (https://www.anthropic.com/news/golden-gate-claude)