OpenAI unveils its first custom chip, built by Broadcom

[-]

Chip CEO here. It really depends on what "design" or "production" means. Does "design" mean that the design was complete? Does "production" mean the beginning of production, i.e. tapeout? If measuring from RTL-freeze to tapeout, this is a fairly typical (even somewhat unimpressive) timeline (accounting for some unexpected issues) for a large, complex 3nm chip. If measuring from concept (no RTL at all, block diagram of architecture) to tapeout, this is an amazing timeline. The truth is probably somewhere in between. A more concrete statement would use actual technical milestones and gates.

by otterdude22 hours ago|

[-]

Not a chip CEO, but I read this article and thought that they're working on some kind of application specific chip only for serving models. Similar to how an FPGA can optimize certain tasks.

Given constant weights / biases of a Transformer / DNN you could use pipelining to feed forward calculations through the array one layer at a time. For DNNs with thousands of layers you might see 1:1 speed up per layer channel.

I doubt they would undergo this process for marginal gains.
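A rough way to put numbers on the pipelining idea above, as a sketch only: it assumes every layer takes exactly one time unit and has its own dedicated hardware stage, which real chips will not match exactly. The function names are made up for illustration.

```python
def sequential_cycles(num_layers: int, num_inputs: int) -> int:
    """Cycles when each input must finish every layer before the next input starts."""
    return num_layers * num_inputs


def pipelined_cycles(num_layers: int, num_inputs: int) -> int:
    """Cycles when each layer is its own pipeline stage.

    The first result needs num_layers cycles to fill the pipeline;
    after that, one result completes per cycle.
    """
    if num_inputs == 0:
        return 0
    return num_layers + num_inputs - 1


def speedup(num_layers: int, num_inputs: int) -> float:
    """Ratio of sequential to pipelined cycle counts."""
    return sequential_cycles(num_layers, num_inputs) / pipelined_cycles(num_layers, num_inputs)
```

With a single input there is no gain at all (`speedup(1000, 1)` is 1.0); the speedup only approaches the number of layers when many independent inputs are kept in flight at once.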

by kmacdough9 hours ago|

[-]

With a striking lack of numbers, I'm not confident. In my experience, everything underspecified in a marketing release is unflattering. They're also not a chip designing company, but they're probably trying to keep up in the eyes of investors. As the article mentions, several of their competitors are chip designers and already have working production inference chips.

by SwellJoe9 hours ago|

[-]

When you have a few billion dollars you can hire chip people and partner with a chip company.

That's not to say I expect they'll ship something competitive with Google's custom AI hardware on the first go, since Google has been at it for quite a while, but there are very few technical problems large sums of money won't solve.

by IX-1036 hours ago|

[-]

Yeah, I'm not sure how competitive it is without any specs. Just from it being "inference only" that puts it on the same level as Google's 2015 TPUv1.

by zgao20 hours ago|

[-]

Yes, my statement was not about the quality or performance of the chip -- simply the tapeout timeline that was stated, by itself.

by xdavidliu22 hours ago|

[-]

I don't understand what the second paragraph is saying.

by nine_k21 hours ago|

[-]

In very crude terms, AFAICT, if you have a bunch of matrix multiplications, but one of the matrices (the one with model weights) doesn't change, you can seriously speed up the computation. One thing is that you don't need to re-fetch the elements of the constant matrix; you can keep it near the ALUs. Then you can maybe detect and ignore sparse / empty blocks by marking them once.

IDK how the custom hardware exploits this; would love to hear any ideas!
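To make the "mark empty blocks once" idea concrete, here is a minimal software sketch. It is not how any particular chip does it; the tile size `BLOCK` and the function names are assumptions for illustration. The one-time scan plays the role of the offline step, and the per-input work only touches tiles that were marked nonzero.

```python
from typing import List, Tuple

Matrix = List[List[float]]
BLOCK = 2  # hypothetical tile size; real hardware would pick its own


def find_nonzero_blocks(weights: Matrix, block: int = BLOCK) -> List[Tuple[int, int]]:
    """One-time pass over the constant weights: record tiles holding any nonzero."""
    rows, cols = len(weights), len(weights[0])
    keep = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            if any(weights[r][c] != 0
                   for r in range(r0, min(r0 + block, rows))
                   for c in range(c0, min(c0 + block, cols))):
                keep.append((r0, c0))
    return keep


def matvec_skip_empty(weights: Matrix, x: List[float],
                      nonzero_blocks: List[Tuple[int, int]],
                      block: int = BLOCK) -> List[float]:
    """Compute weights @ x while visiting only the tiles marked as nonzero."""
    rows, cols = len(weights), len(weights[0])
    y = [0.0] * rows
    for r0, c0 in nonzero_blocks:
        for r in range(r0, min(r0 + block, rows)):
            for c in range(c0, min(c0 + block, cols)):
                y[r] += weights[r][c] * x[c]
    return y


def matvec_dense(weights: Matrix, x: List[float]) -> List[float]:
    """Reference result that visits every element."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
```

The block list is computed once per model, not once per input, which is the point: the cost of finding the empty tiles is paid a single time because the weights never change.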

by guyomes21 hours ago|

[-]

> IDK how the custom hardware exploits this; would love to hear any ideas!

You might like this article [1], titled "FPGA-based CNN Acceleration using Pattern-Aware Pruning". More context and details can be found in the PhD thesis of Léo Pradels [2].

[1]: https://inria.hal.science/hal-04689673/document

[2]: https://theses.hal.science/tel-05021575v1/file/PRADELS_Leo.p...

by fulafel2 hours ago|

[-]

Current accelerators (TPUs, various on-chip NPUs) are something close to this. Systolic array is the established computer architecture term for flowing data from computation to computation without the overhead of a register file or von Neumann bottleneck.
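For anyone who wants to see the data flow, here is a small cycle-by-cycle simulation of a systolic array, offered as a sketch. It uses an output-stationary layout (each cell keeps one output element) because that is the shortest to write down; real accelerators may instead keep the weights stationary. The function name and the skewed-injection scheme are illustrative choices, not taken from any vendor's design.

```python
from typing import List, Tuple


def systolic_matmul(a: List[List[int]], b: List[List[int]]) -> Tuple[List[List[int]], int]:
    """Multiply a (n x k) by b (k x m) on a simulated n x m grid of cells.

    Values of `a` enter from the left and move one cell right per cycle;
    values of `b` enter from the top and move one cell down per cycle.
    Each cell multiplies what passes through it and accumulates locally,
    so operands are never re-read from a shared register file or memory.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    a_reg = [[0] * m for _ in range(n)]  # value each cell forwarded rightward last cycle
    b_reg = [[0] * m for _ in range(n)]  # value each cell forwarded downward last cycle
    total_cycles = k + n + m - 2
    for cycle in range(total_cycles):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    t = cycle - i  # row i is delayed by i cycles (input skew)
                    a_in = a[i][t] if 0 <= t < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    t = cycle - j  # column j is delayed by j cycles
                    b_in = b[t][j] if 0 <= t < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b
    return acc, total_cycles
```

The skew makes sure that `a[i][t]` and `b[t][j]` reach cell `(i, j)` on the same cycle, which is why no cell ever needs to ask for data: it just uses whatever its left and upper neighbours handed over.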

by cm218718 hours ago|

[-]

Random thought. Once models stabilise, could you possibly hardcode the model in gates? Or are they too large for a single chip?

by 8note18 hours ago|

[-]

https://www.anuragk.com/blog/posts/Taalas.html

by lsaferite14 hours ago|

[-]

https://taalas.com/

by jwHollister13 hours ago|

[-]

wow if they can get something like this working, what happens to all this infrastructure? Hyperscalers have to be assuming the lifespan of that stuff wrong considering the next gen will be 1000x more efficient.

by otterley12 hours ago|

[-]

The question isn’t whether it works (it does); the question is whether there are buyers for hardware that is obsolete the day it ships. Models evolve much more quickly than hardware can keep up.

by simondotau12 hours ago|

[-]

Presumably at some point the rapid progress of models will plateau, at least insofar as a model could be frozen in time and remain economically useful for the expected life of hardware. Especially if it comes with compelling benefits e.g. dramatically lower latency and/or dramatically higher performance per watt.

If you can build chips that could run one specific LLM 100x faster than anything else, it would have a use case that nothing else could match.

by lsaferite4 hours ago|

[-]

Those Taalas chips apparently run at 1/10 the power of the current SOTA GPU setups. If they can execute even partially on their plan, it'll be a literal game changer.

by fragmede10 hours ago|

[-]

https://www.cerebras.ai/ is exactly that! Holy shit it's fast.

by otterley6 hours ago|

[-]

Cerebras is not that. Cerebras isn’t tied to a particular model like Taalas is. The latter is even faster than Cerebras.

by wrsh075 hours ago|

[-]

Right, but there exist problems that need to be routinely solved and can be solved on glm 5.2. Is the model state of the art when it is published? No. But when it comes out you could optimize it and let your solver run forever for quite cheap, and that could be useful if the only problems you want it to solve (for cheap) are solvable by that model.

And the high water mark of what can be solved by open models will keep going up.

by indigo94510 hours ago|

[-]

One obvious use case is edge computing, such as in industrial applications that cannot tolerate the risk of a network link or cloud service going down. Even embedded use cases are possible, such as an image classifier model in a security camera.

by cm21878 hours ago|

[-]

In fact any application where the task is stable and the model good enough to address that task. As you suggest, industrial applications where a robot must deal with variants of the same repetitive task. Or a military drone which needs to be jamming proof.

by Someone7 hours ago|

[-]

> Or a military drone which needs to be jamming proof.

That, if used in war, I would think, would need the ability to be updated frequently. For example, your enemy might find out (say by running tests on hardware they captured from you) that painting some red paint in a particular shape (a smiley might even work) on their hardware prevented your drones from attacking them because it confuses that pattern with the Red Cross logo.

by 5423542342351 hours ago|

[-]

Those are really two different things. One is the computer vision that could be “hard coded” and the other is the image library, which would be updated regularly. Look at facial recognition. You can download and run a facial recognition model on your GPU that looks at a library of your personal photos. The model doesn’t change when it scans your photos for faces, it just writes the data associated with a “face” to whatever library. When you add a new picture, it adds that face data and compares it to the library for a match. The actual model never needs to change. It is the same as the one I downloaded and ran on my GPU for my photos. If it was written on chips we both bought and installed, it would work the same way.[1]

[1] Yes, this is a massive simplification

by TeMPOraL4 hours ago|

[-]

You keep the "reasoning core" burned and play the cat-and-mouse game at the I/O edge. Enemy invents a smiley shield, your R&D figures out some filtering step that defeats this effect without compromising general image recognition. Then the enemy figures out a new trick, your R&D invents a countermeasure, and so on - point is, this can happen for a long time in layers on top of the core model. If the enemy invents some robust way to attack the core that cannot be filtered out, it's game over for that hardware, but that is a much more difficult task and might take longer than expected service time of a given batch of drones.

by SoftTalker1 hours ago|

[-]

Sort of mirrors how biological organisms work. E.g. in a bird, the core functionality of knowing how to fly is burned in. Hunting food is probably a combination of experiential learning on top of instinctive behavior, and is somewhat adaptable to local conditions.

by largbae4 hours ago|

[-]

There may be all sorts of stable use case models that this could be interesting for. Imagine permanent voice translation circuits at a tiny fraction of the current price, glasses that subtitle the world with long battery life.

by lsaferite5 hours ago|

[-]

They are betting on fast release cycles coupled with much lower costs (purchase and operations) mixed with the ability to have dynamic fine tunes on top of the static model.

by SwellJoe9 hours ago|

[-]

The models have to run on something or they're useless. They can't run on future hardware today, and people want to use models today. So, if hardware is obsolete the day it ships, we're all using obsolete hardware, and there's no alternative to that.

by otterley6 hours ago|

[-]

Taalas encodes the model into the hardware itself. The two are inextricably coupled. It’s like buying a CNC router that can’t be reprogrammed to build anything other than a specific predetermined kitchen cabinet. And the model used inside is frozen many months before the hardware ships, since the process from tapeout to production takes that long.

In contrast, tomorrow’s models will typically run, although perhaps more slowly, on general-purpose inference hardware that was released today or even years ago.

by otterdude22 hours ago|

[-]

Basically getting around the branch predictor problem with generalized compute architectures https://en.wikipedia.org/wiki/Branch_predictor

by pama16 hours ago|

[-]

If you look at the timelines for the hiring of the hardware team, this was an extremely fast and high risk implementation from concept to tapeout. Amazing it works at all during bringup.

by nonethewiser23 hours ago|

[-]

>If measuring from RTL-freeze to tapeout, this is a fairly typical (even somewhat unimpressive) timeline (accounting for some unexpected issues) for a large, complex 3nm chip.

Even for a company’s first design?

by hailwren23 hours ago|

[-]

I don't think you get the newcomer novelty buff when your val approaches 13 digits.

by RugnirViking21 hours ago|

[-]

Big companies are lumbering behemoths, crude assemblages of barely cobbled-together incentives and principal-agent problems in a trenchcoat. Getting them to change direction, or worse, try something new at scale, is a massive undertaking.

by mlinhares20 hours ago|

[-]

Nah, you just need to get the CEO behind it. Most coordination issues get solved when the CEO is breathing down your neck to get something done. Trouble is that they don't do this enough.

by eru11 hours ago|

[-]

CEOs have limited bandwidth, and can only breathe down so many necks at once.

by NBJack18 hours ago|

[-]

Eh, zero guarantees on that one.

The Fire Phone was Jeff Bezos' personal baby, and we know how that went. Then there was the Apple G4 Cube with Steve Jobs, the Model X's Falcon Wing doors and Elon, and let's not even talk about the Metaverse and Zuck.

by aleph_minus_one18 hours ago|

[-]

> The Fire Phone was Jeff Bezos' personal baby, and we know how that went.

I'd rather guess that Jeff Bezos' opinion on what makes a good phone is/was different from the opinion of many potential buyers.

by AtlasBarfed2 hours ago|

[-]

An Amazon phone with Amazon Video, playing Amazon Music, making phone calls through the Amazon messenger, with an Amazon Browser that overlays ads to Amazon products, and has Amazon Voice Recognition ... blah blah blah

I imagine when you are a billionaire from one company, every time you hear the name of the company you hear your name, so you can't really think about what Joe Schmoe wants in a phone independently of your ego.

I guess this is what Steve Jobs was better at. SOME focus on the customer independent of his ego and Apple Apple Apple. I did say ... SOME.

by kQq9oHeAz6wLLS17 hours ago|

[-]

Actually, you've provided examples that prove the point. None of those were especially good (though everyone wanted the G4 Cube), and yet they made it to market anyway. Why?

Because the CEO was behind it, breathing down their necks.

by NBJack13 hours ago|

[-]

Pretty much every example is considered an abysmal failure that often cost the actual workers their careers while their CEO carried on.

If you consider that outcome a worthwhile endeavor, I don't know what else to say.

by inemesitaffia11 hours ago|

[-]

He's definitely not talking about worthy endeavour.

He's talking about an endeavour reaching the market.

I'm sure if Zuckerberg wants to spend $10B on Nuclear Fusion it will happen.

by DANmode5 hours ago|

[-]

It’s fission, not fusion:

https://www.esgdive.com/news/meta-inks-nuclear-deals-terrapo...

…and if they do all of this, it’ll be closer to $20B than 10!

by TeMPOraL4 hours ago|

[-]

If all it took to get viable fusion power was a FAANG CEO with $10B to burn, I'd be first to petition for it to happen, and even throw whatever money I can spare onto that pyre.

by zgao20 hours ago|

[-]

The typical way a chip effort in a non-chip company works is that the "design" is the RTL (e.g. SystemVerilog that defines the behavior of the chip) and then this is handed off to a third-party "design house" (such as Broadcom) that turns that code into a real image of a chip, which is called a GDS (basically you can think of this as a very big layer by layer photoshop file) that can actually be sent to a fab. This is called "backend design", in contrast to the "frontend design" (the RTL itself).

As another commenter said, Broadcom is very experienced with backend design (as well as the supply chain management, testing, etc. that comes after the chip is taped out) and so this can't be regarded as a "first chip". Richard Ho (the head of hardware at OpenAI) is also extremely experienced and used to be the head of the Google TPU effort -- where he actually worked with Broadcom in a similar tapeout already. So yes, this is not a "first design"!

by surajrmal18 hours ago|

[-]

I wonder if Broadcom borrowed IP between the Google TPU and this design. How would you ever know it didn't happen?

by zgao16 hours ago|

[-]

There is no real way to prevent this, but there are ways to increase the cost of doing so. For example, one level of obfuscation is that OAI could internally run synthesis and adopt a “netlist-in” model in which Broadcom gets a netlist - a description of a huge number of gates and wires and how they connect - instead of the plain Verilog (or other language). It is possible to reverse engineer the netlist, but it adds a certain level of indirection and effort.

A big part of the semiconductor industry also operates on a reputation basis. Broadcom (like TSMC) is a neutral party as a design house, but if they did something like this, it might ruin that reputation.
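To give a feel for what a netlist looks like compared with readable RTL, here is a toy sketch. The tuple format, the gate library, and the net names are all invented for illustration; real netlists are structural Verilog (or similar) against a foundry cell library and run to millions of gates.

```python
from typing import Dict, List, Tuple

# (gate type, input net names, output net name)
Gate = Tuple[str, List[str], str]

GATE_FUNCS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

# Hypothetical flattened netlist for a 1-bit full adder. Internal nets such
# as n1..n3 carry no hint of the designer's intent, which is where the
# obfuscation comes from.
FULL_ADDER: List[Gate] = [
    ("XOR", ["a", "b"], "n1"),
    ("XOR", ["n1", "cin"], "sum"),
    ("AND", ["a", "b"], "n2"),
    ("AND", ["n1", "cin"], "n3"),
    ("OR", ["n2", "n3"], "cout"),
]


def evaluate(netlist: List[Gate], inputs: Dict[str, int]) -> Dict[str, int]:
    """Evaluate a combinational netlist whose gates are listed in topological order."""
    nets = dict(inputs)
    for kind, ins, out in netlist:
        nets[out] = GATE_FUNCS[kind](*(nets[name] for name in ins))
    return nets
```

In RTL this would be a single line along the lines of "sum and carry equal a plus b plus cin". Recovering that intent from five anonymous gates is easy; doing it across an entire accelerator is the effort barrier being described.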

by kQq9oHeAz6wLLS17 hours ago|

[-]

More likely that the AI training set contained the IP of others, and we all know how that turns out.

by formerly_proven22 hours ago|

[-]

This isn't Broadcom's first design.

by swiftcoder21 hours ago|

[-]

Yeah, "first chip" here likely means they contracted Broadcom (or a firm with similar experience) to do all the heavy lifting. Building out your own in-house teams for this sort of thing is a decade-long project - just look how much inside Apple's early chips was licensed ARM / PowerVR cores

by MisterTea21 hours ago|

[-]

Apple didn't have the talent in-house until they bought Intrinsity, who worked with Samsung on Apple's earlier Arm chips as well. https://en.wikipedia.org/wiki/Intrinsity

by donavanm14 hours ago|

[-]

That’s not quite fair. As I recall there were about 1,500 people in that part of the hardware org circa mid 2000s. Before PA Semi there were pretty established teams already doing VLSI/PD/verification/validation, PCB, and of course analog/mixed hardware, in their own work and in conjunction with Samsung, old Broadcom, Qualcomm, etc. Lots of in-house work went into all those bespoke monitors, phones, Apple TV, AirPorts, etc.

My recollection is that PA Semi was very much for the architectural and design talent, even though it was an “asset purchase” and all the existing Power & military chips were hived off.

For Intrinsity I recall a lot of interest was actually in their existing graphics work and EDA. ISTR that those early mobile GPUs were what they focused on.

I was in the Mansfield org circa ‘07-11. I spent a lot of time flying between Cupertino and Austin/Bee Caves that first year.

by selectodude21 hours ago|

[-]

I think the folks at PA Semi had some chops too.

by reinitctxoffset19 hours ago|

[-]

The way I heard it PA Semi was the singular driving force that led to Apple Silicon, but I'm not any kind of insider that's just the chatter I heard.

Whoever it was, whooo, that's hot shit. I remember an M1 MacBook Air just cleaning the clock of an Intel MacBook Pro and thinking "x86_64 has real competition again".

Great silicon. I'm over it with not having root on my own machine, so I've left the ecosystem, but it's really nice hardware, can't dispute that.

by markhahn1 hours ago|

[-]

It would be interesting to know Apple's true/inside attitude towards people putting Linux on their hardware. They don't seem very interested in helping, but I dunno whether they actively sabotage either.

by re-thc16 hours ago|

[-]

> The way I heard it PA Semi was the singular driving force that led to Apple Silicon

And a lot of them are sitting under Qualcomm via the Nuvia acquisition.

by stinkbeetle19 hours ago|

[-]

PA Semi group did the logic designs. I think they're talking about physical design though.

by Aurornis1 days ago|

[-]

The hardware description languages (HDL) used in chip development are like programming languages. The existing models understand them and can do a lot with them. You don’t need to have separate, specialty models designed for this work to use LLMs in chip design workflows.

Design verification also involves a lot of traditional programming which benefits from LLMs.

So it’s not meaningless at all. You could download some of the open source chip design software today and the LLMs could even help you get started on your own tiny chip if you are so interested.

by knicholes23 hours ago|

[-]

I tried making a button using Claude entirely (including the 3D printed enclosure) and it effed up pretty hard with the traces and the header spacing. The project was a big red arcade button that plays the "ah-my-groin.mp3" when pushed (from Simpsons). It did cool work on saving battery life, and the 3d enclosure was awesome, but yeah, I'm convinced I'd have to do another version or two of the custom chip until it came back right. I used a Blender MCP for the 3d modeling. I used a KiCAD MCP server for the chip design/validation.

I think we're not there yet. I've been meaning to look at this flux.ai to see if it has the prompts/workflow worked out better than what I was able to cobble together in a few hours. Maybe Alteryx's MCP server would have been better. I'll try that this weekend for another board I've got.

by Aurornis23 hours ago|

[-]

> I tried making a button using Claude entirely (including the 3D printed enclosure) and it effed up pretty hard with the traces and the header spacing.

PCB design and 3D CAD design are different topics.

Hardware Description Languages are closer to programming languages than CAD. Look at some Verilog to get an idea - https://en.wikipedia.org/wiki/Verilog

by knicholes23 hours ago|

[-]

Right. KiCAD for PCB design. Blender for 3D CAD. Oh, are you saying I should have used something other than the KiCAD MCP server for better results?

by VorpalWay21 hours ago|

[-]

Designing circuit board and 3D models (even using something like OpenSCAD) is a very spatial process today. You are dealing with coordinates one way or another.

This is very unlike how FPGA and (I assume) ASIC design is done. That is more like a traditional programming language, but everything happens all at once (there is no sequence of statements outside tests; if you need that, you have to write a state machine yourself). You define logic expressions between signals, add stateful latches, etc. But you never specify the physical layout.

Instead you feed your description to a tool that acts as a constraint solver/optimiser that computes the layout for you (for FPGAs this is called synthesising IIRC; it is akin to a compiler). It is typically quite slow: even for small circuits like the ones we did at university it took minutes, and for large circuits it might easily take days.

Now, this raises the question, what if you design a PCB net list using AI, but then use traditional autorouting and layout? I believe that can also be done, but I have no experience designing PCBs, so I don't know how well it works.
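The "everything happens all at once" point above is easiest to see with a register swap. This is a Python stand-in for clocked HDL semantics, not real HDL, and the register names are made up: on a clock edge every next value is computed from the old state, and only then do all registers update together.

```python
from typing import Dict


def clock_tick(regs: Dict[str, int]) -> Dict[str, int]:
    """One clock edge, hardware style: read the OLD state everywhere, then update all at once."""
    return {
        "a": regs["b"],                    # a takes the old value of b
        "b": regs["a"],                    # b takes the old value of a
        "count": (regs["count"] + 1) % 4,  # a 2-bit counter running alongside
    }


def sequential_software(regs: Dict[str, int]) -> Dict[str, int]:
    """The same two assignments executed one after another, as ordinary code would."""
    regs = dict(regs)
    regs["a"] = regs["b"]
    regs["b"] = regs["a"]  # sees the already-overwritten a, so the swap is lost
    return regs
```

Starting from `a=1, b=2`, the hardware-style tick swaps the two values, while the sequential version leaves both equal to 2. That difference is most of what trips up people coming to HDLs from software.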

by dubbie998 hours ago|

[-]

Autorouting PCBs doesn’t really give usable results on all but the simplest cases. It seems to be a very difficult problem to solve even though a human doing it is only following a relatively simple bunch of rules and goals in his or her head.

by TeMPOraL4 hours ago|

[-]

Simple bunch of rules and goals backed by extremely sophisticated visual intuition.

Pretty sure someone already tried throwing VLMs and diffusion models at this, wonder how that fared.

by quadrature22 hours ago|

[-]

VHDL is not a language for spatial design. It's more akin to a programming language with circuit semantics.

by y1n020 hours ago|

[-]

For digital chip design, research Verilog and/or SystemVerilog, and for tools, check out verilator and the OSS cad suite: https://github.com/YosysHQ/oss-cad-suite-build

by dcrazy23 hours ago|

[-]

They’re saying that VHDL is an entirely different concept than physical modeling.

by giancarlostoro22 hours ago|

[-]

You're comparing apples and oranges.

by cwillu22 hours ago|

[-]

Meta: can we not downvote people who are clarifying what they're saying and asking questions, even if they're wrong about something, if the content isn't otherwise objectionable?

by baq22 hours ago|

[-]

I didn’t downvote, but the OP is either a troll or someone who doesn’t want to notice he doesn’t know what he’s talking about. Either way we want less of that on HN.

by knicholes22 hours ago|

[-]

I'll acknowledge that I don't know what I'm talking about. I really appreciated the clarity! Surely you find value in knowing that creating your own custom chips is almost doable by someone who doesn't know what they're talking about! (also, I am a troll, but in this case, just clueless)

by Lukas_Skywalker21 hours ago|

[-]

Maybe the confusion stems from the word "chip". Creating a chip usually means designing and producing a microcontroller or a processor, not a printed circuit board that you populate with existing chips.

by knicholes19 hours ago|

[-]

Ohhhhhh! Yes, that's exactly the problem. It all makes sense now. I was just piecing together an existing microcontroller and a mp3 module by printing a custom circuit board.

by tamimio22 hours ago|

[-]

One (KiCad) makes the board, the other (Blender) makes the casing for it. Both are “hardware”, but one is electronics and the other is mechanical. AI can do a good job on the electronic one; I can’t wait for it to fully build the whole circuit for you based on your specs.

by rpcope122 hours ago|

[-]

PCB layout is an art, and doesn't seem to map well to LLMs (I tried for shits and giggles recently). Claude in general, kind of like code, does a lot of redundant belt and suspenders stuff in the schematics it generates (if it can generate them at all). It's one of those things that's really not there yet outside of the simplest designs.

by BioGeek17 hours ago|

[-]

DeepPCB has an AI autorouter [1] that uses reinforcement learning and works really well. Recently they also released an AI agent that analyzes your board, proposes plans and can route your board for you [2]. They have a KiCad plugin [3] and you can try it for free.

Disclaimer: I work at InstaDeep, the company behind DeepPCB, but I don't work on this product.

[1] https://deeppcb.ai/reinforcement-learning-pcb-routing-explai...
[2] https://deeppcb.ai/cooper/
[3] https://deeppcb.ai/deeppcb-kicad-plugin-ai-pcb-routing/

by chamomeal14 hours ago|

[-]

Sounds like a super cool project. Gonna post the design anywhere?

by knicholes49 minutes ago|

[-]

I'll update it this weekend with the updated AI-generated fun (and correct the flat-out ai-generated lies in the README). Meanwhile, you can see the project here. https://github.com/knicholes/ah-my-groin-button

by ses198423 hours ago|

[-]

The question isn’t whether or not they employed a particular tool, the question is how big of an impact did it have.

by nradov22 hours ago|

[-]

Most HDL code is locked up behind corporate firewalls and not available as training data. While LLMs can handle it to an extent there's a lot of room for improvement. I'll bet that OpenAI and their competitors are racing to license this IP from major hardware vendors in order to compete in the chip design vertical.

by tonfa21 hours ago|

[-]

Does it work better when using a compiler-based ecosystem (e.g. https://github.com/llvm/circt)?

by bsder20 hours ago|

[-]

There is quite a lot of Verilog/SystemVerilog and VHDL code in the wild. And hardware description language code is very simple and straightforward relative to programming code.

And the two things that take up VAST amounts of time in ASIC design are testbenches and timing closure.

A LOT of hardware design is testbenches to verify things. AI is REALLY GOOD at generating things like testbenches. And nobody really cares if the quality of your testbench code sucks as long as it validates what it claims to.

I don't know how good AI is at timing closure, but I wouldn't necessarily be surprised if it is pretty good at it up to the physical point. That's lots of textual output which you can put a constraint on.

Everything involving physical design, though, tends to be a disaster waiting to happen if you let AI loose on it.
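For readers who have not seen one, the shape of a testbench is roughly this: drive the design with stimulus and compare against a trusted reference. The sketch below uses Python as a stand-in (real testbenches are usually SystemVerilog or similar), and the "design under test" is a hypothetical bit-level ripple-carry adder rather than anything from an actual chip.

```python
import random
from typing import List, Tuple

WIDTH = 8  # assumed operand width for this example


def dut_ripple_adder(a: int, b: int, width: int = WIDTH) -> Tuple[int, int]:
    """Stand-in for the design under test: a bit-by-bit ripple-carry adder."""
    carry = 0
    result = 0
    for i in range(width):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        s = abit ^ bbit ^ carry
        carry = (abit & bbit) | (carry & (abit ^ bbit))
        result |= s << i
    return result, carry


def reference_adder(a: int, b: int, width: int = WIDTH) -> Tuple[int, int]:
    """Golden model: plain integer addition split into sum and carry-out."""
    total = a + b
    return total & ((1 << width) - 1), total >> width


def run_testbench(num_vectors: int = 1000, seed: int = 0,
                  width: int = WIDTH) -> List[Tuple[int, int]]:
    """Apply corner cases plus random vectors; return the inputs that mismatched."""
    rng = random.Random(seed)
    corner = [0, 1, (1 << width) - 1]
    vectors = [(a, b) for a in corner for b in corner]
    vectors += [(rng.randrange(1 << width), rng.randrange(1 << width))
                for _ in range(num_vectors)]
    return [(a, b) for a, b in vectors
            if dut_ripple_adder(a, b, width) != reference_adder(a, b, width)]
```

An empty list from `run_testbench()` means every vector matched. The code is repetitive and nobody will admire it, which fits the point above: it only has to check what it claims to check.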

by doxeddaily22 hours ago|

[-]

This reminds me of the dude on youtube building a chip fab in his shed.

by einpoklum6 hours ago|

[-]

> The existing models understand them

No they don't.

by holoduke20 hours ago|

[-]

One day we can design our own pcb with chips, hardware and other io. Companies will accept these as files and you can collect your pcb the same day. I think in China they are doing this already

by remexre17 hours ago|

[-]

hasn't pcbway been doing this for years?

by IshKebab23 hours ago|

[-]

> The existing models understand them and can do a lot with them.

In my experience they are not especially good at SystemVerilog. There's a lot of knowledge about it that is locked behind paywalls and it's very niche.

My guess is the "from scratch" here is quite the exaggeration. Otherwise why did they need Broadcom?

by whynotminot23 hours ago|

[-]

Doesn’t Broadcom bring a lot more to bear here than just Verilog? Including relationships with the actual fabricators.

by IshKebab10 hours ago|

[-]

I doubt that is really significant - fabs are happy to work with anybody. What they will likely bring is:

* Physical design team (stupidly known as the "backend"). This is extremely specialised knowledge and most chip companies don't really want to have to deal with it if they can avoid it.

* IP blocks. Especially for annoying things like PHYs, memory controllers, USB controllers, PLLs, power, etc. These things are difficult to do, difficult to test, and often critical (good luck if your clock doesn't work...) I would not at all be surprised if Broadcom supplied CPUs too.

My total guess at what happened is Broadcom supplied most of a SoC and OpenAI added an LLM coprocessor module to it, and probably asked them to add like 10x more DRAM interfaces.

by aseipp21 hours ago|

[-]

Not having a free toolchain that can actually handle the real language has probably been pretty bad on the downstream public knowledgebase. Hopefully Verilator can eventually close that hole, and there can be more high-quality designs and codebases incorporated into future models. Claude is at least good enough to write SV that triggered a compiler crash or two. :)

by cloudengineer948 hours ago|

[-]

Broadcom also has direct allocation with TSMC, which is a big win

by aurareturn21 hours ago|

[-]

Broadcom already has a ton of IP for AI SoCs. I'm guessing the hard parts of this inference chip were already designed by Broadcom and OpenAI simply told Broadcom what it wanted. It's likely very similar to Google's TPU.

  Early testing shows that the first-generation accelerator will deliver performance per watt substantially better than current state-of-the-art

What is substantial here? Vera Rubin is shipping in volume later this year and it is expected to be 10x more power efficient for inference than Blackwell.[0] Even if they've already taped out the chip, getting bugs fixed, getting chips manufactured, getting HBM allocation, getting a rack design, hooking them up together, and putting them in a data center will likely take at least another 12 months, likely more. By the time this chip is in data centers in volume, they're likely competing against Vera Rubin Ultra or maybe even Feynman.

Personally, I don't think OpenAI should have invested in this project. It's too early for them. They should have focused on models, like Anthropic, and won there. When they're profitable, they can take on these projects.

The risk here is very high for OpenAI because AI has a hard cap in energy. If you have a gigawatt, you should only install the best chips. If Nvidia's chips are better, then this is a wasted project and likely wasted billions.

[0] https://developer.nvidia.com/blog/scaling-token-factory-reve...

by cptskippy21 hours ago|

[-]

Why do you assume Broadcom has a ton of IP for AI SoCs but hasn't done any of the other work around data center scale deployments?

by aurareturn21 hours ago|

[-]

They have. That's why OpenAI was able to get a working demo in 9 months. But going from a small scale system to a full-fledged data center deployment is likely much harder.

I don't know how much of the things outside of the chip Broadcom has vs Google's proprietary tech that is not shared with Broadcom.

Nvidia's Vera Rubin has 6 unique chips working together in a single rack.[0]

[0] https://developer-blogs.nvidia.com/wp-content/uploads/2026/0...

by threecheese21 hours ago|

[-]

I’m just happy to see diversity here; sometimes I feel like Nvidia is going to eat the world, with buying other fabs and branching out - or up, I guess - from chips and racks to models, frameworks, and end user stuff.

by surajrmal18 hours ago|

[-]

I thought most of the Google TPU magic is in wiring up these chips into supercomputer-like clusters with specialized interconnects and whatnot. The chips themselves are less interesting in isolation.

by luma16 hours ago|

[-]

I know nothing of what is happening here but Broadcom has a lot of IP in high speed/low latency data transfer from chip to datacenter scales.

by AtlasBarfed2 hours ago|

[-]

"Substantial" seems like a damning word.

So one of my pet theories I haven't seen in general discourse is that AI came from the massive vector processing jump available commercially in GPUs when it left CPU bound processing behind. That's a factor of 100x-1000x of processing power.

AI is not-quite-there, and to get even another leap might take another 10-100x processing power.

Now... what? ASICs probably won't deliver even a 10x? There's only so much you get out of node shrinks.

"Substantial" doesn't even mean twice IMO. "Substantial" almost sounds like ... 15% better?

by dofm1 days ago|

[-]

Right. There are two possible meanings and shades in-between:

1) OpenAI genuinely have AI technologies that can improve chip design (bold, unlikely claim, needs evidence)

2) OpenAI designed test/verification models and kernels that could be run on the simulated hardware to test its performance

As you and others have said, it's hard to trust when they are happy to write something that could easily only mean the latter but sounds like the former.

by lovasoa1 days ago|

[-]

3) The engineers working on the chip used ChatGPT from time to time.

by Catloafdev1 days ago|

[-]

I'd be shocked if it was anything more than this.

by changoplatanero1 days ago|

[-]

Browsing openai's job postings in the past few months is enough to confirm that it's more than this. They are for sure making serious efforts at building ai for chip design.

by xnx23 hours ago|

[-]

Impossible to know. Could be fake/aspirational roles to impress investors with their grand vision.

by NitpickLawyer22 hours ago|

[-]

Jesus. This is tinfoil hat territory now. Why would they fake something like that? ANY company in this field would try to become free from nvda. Goog has done it already, amazon has their own thing, so it can be done. Not saying they'll 0shot this vertical, but ffs, they don't need to fake anything. They are making an effort, and it would be insane to think they aren't. Might work, might not work, but to even think that the effort is fake is going too far.

by kdheiwns8 hours ago|

[-]

Asking "why would a company lie?" is probably the funniest thing I've seen all month. Every big company puts out BS nonstop.

by Planktonne20 hours ago|

[-]

They have a history of lying and making grandiose claims. It's unreasonable to extend them the benefit of the doubt again.

by reinitctxoffset18 hours ago|

[-]

It kinda depends on what your prior is. Some companies do a press release and I immediately pay attention or even take action.

Other companies? Fool me once Altman, let's see the thing at scale making money.

Near frontier AI is clearly relevant to some kinds of logic design, I'm learning some Hardcaml at the moment and yeah, AI is super helpful.

Can it leapfrog a company without hardware experience to near the front of the pack of companies with decades of hardware experience? Less obvious.

Unrelatedly, would OpenAI dramatically overstate something to manipulate the press and public and capital markets?

It's arguably their core competency.

AI is going to matter in logic design and synthesis. How much, how soon, and where are open questions.

by luqtas22 hours ago|

https://antoniocortes.com/en/2026/03/10/ghost-jobs-the-econo...

[-]

by NitpickLawyer21 hours ago|

[-]

I'm not saying this isn't a thing. I'm saying oAI doesn't need to fake trying to make a chip or hiring people to make AI better at chip making, or dogfooding or anything like this. It's obvious they're doing it. They'd have 0 reason to fake something like this "for the investors". Come on!

by signatoremo22 hours ago|

[-]

Do you have inside knowledge?

by fl4regun23 hours ago|

[-]

at the hardware company I work at, people are now using claude code and developing skills for it to do basic stuff like triage or do initial debug on failing tests, search for potential causes in RTL, generate skeleton documentation for designs etc

by dofm23 hours ago|

[-]

But isn't this rather the ordinary product of an LLM, now?

Is it worth the claim that they are making in a press release?

by girvo19 hours ago|

[-]

> Is it worth the claim that they are making in a press release?

Definitely, yes, because being vague about it like they have been lets investors fill-in-the-blanks with whatever they want it to mean.

by reducesuffering23 hours ago|

[-]

From time to time? Lol you must realize, frontier lab eng are using Codex/Claude-Code 99% in loops, on models the public doesn't have access to. Why? Because it works. Just a matter of time before humans are out of the loop and what comes next is a black hole

"The future is here, it's just not evenly distributed"

by wongarsu1 days ago|

[-]

Or OpenAI accelerated the design and optimization process by summarizing emails exchanged during the design and optimization process, or made it possible to ask an AI questions about meeting notes

by Aurornis23 hours ago|

[-]

> 1) OpenAI genuinely have AI technologies that can improve chip design (bold, unlikely claim, needs evidence)

Chip design languages (HDLs like Verilog or VHDL) are well understood by LLMs. They don’t need specialty tools to use GPT-5.5 or other LLMs with them.

You could even try it yourself with open source chip design tooling if you wanted to see it.

by dofm23 hours ago|

[-]

Yes, obviously. But do we think LLMs without access to proprietary information do a better job with them than Broadcom's human experts or existing proprietary tools at this level of operations?

It is still a bold claim and it still needs evidence.

We would obviously get a bit more of the evidence if it were to be more useful for the upcoming IPO than this rather open-ended, reinterpretable phrasing.

by fc417fc80217 hours ago|

[-]

> do a better job with them than Broadcom's human experts or existing proprietary tools

No, obviously. They'd be expected to do a substantially worse job and yet still drastically accelerate the design process.

LLMs make all sorts of dumb mistakes when writing c++ or python yet are nonetheless massively beneficial.

by dpe8223 hours ago|

[-]

I don't understand why you're getting downvoted.

I've used GPT-5.5 and Opus both for FPGA design with good results. We built a lot of tooling around it to help the models, but even without that they're definitely capable of designing digital logic.

by dmitrygr22 hours ago|

[-]

My guess: it is that those who KNOW the subject realize that LLMs suck at it, and those who do not, do not realize it, since their output is plausible, and sometimes even works.

This actually plays out across every field and is well documented. An expert can recognize the hallucinations and bullshit coming out of LLMs, while non-experts see plausible output and do not know enough to know it is BS.

by stevenhuang20 hours ago|

[-]

Wrong. Myself and colleagues know the subject and they are useful in FPGA design. You should stop hallucinating about topics you don't have experience in.

by wmf23 hours ago|

https://dl.acm.org/doi/10.1145/3785362

[-]

https://developer.nvidia.com/culitho

https://www.synopsys.com/blogs/chip-design/analog-layout-syn...

https://arxiv.org/abs/2302.06415

by etempleton22 hours ago|

[-]

I feel like they would be very specific if it was no.1.

by scrollop1 days ago|

[-]

Perhaps they used gpt 5.5 mini to draft emails. Create a coffee schedule.

by oceanplexian23 hours ago|

[-]

> OpenAI genuinely have AI technologies that can improve chip design (bold, unlikely claim, needs evidence)

Why is that a bold and unlikely claim?

Are you saying that AI, which has been proven to cure diseases, solve our hardest math problems, write complex computer code and generate entire worlds and HD video from a simple prompt would somehow be like, my bad, I guess I can't design chips?

by smokel23 hours ago|

https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_m...

[-]

> solve our hardest math problems

We're not quite there yet :)

by dofm23 hours ago|

[-]

> Why is that a bold and unlikely claim?

Because they could have offered even slightly more evidence.

by cess1123 hours ago|

[-]

Because then they'd likely have stfu and outperformed Intel, Nvidia and AMD, or at least one of them.

They're burning more cash than pretty much anyone else and don't have anything public that looks like a matching revenue stream so they probably need one very badly.

by nullsanity23 hours ago|

[-]

[dead]

by nixon_why691 days ago|

[-]

There is a lot of verilog out there, it's pretty feasible that they had AI assistance writing more to design their chip.

It doesn't have to be revolutionary, it could just be AI-assisted design and lined up well enough with their operations for a custom ASIC to be worth it.

by KeplerBoy1 days ago|

[-]

Also there's so much boilerplate around everything. Writing a testbench with codex is extremely feasible. This is the kind of verifiable feedback loop the agents shine at.
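For anyone wondering what a "verifiable feedback loop" looks like in practice, here is a minimal sketch. The `generate` and `run_checks` callables are hypothetical stand-ins (in a real flow they would wrap a model call and a simulator or lint run); nothing here is Codex's or any EDA tool's actual API.

```python
from typing import Callable, Optional, Tuple


def refine_until_passing(
    generate: Callable[[str], str],
    run_checks: Callable[[str], Tuple[bool, str]],
    task: str,
    max_rounds: int = 5,
) -> Optional[str]:
    """Ask for a candidate, verify it, and feed the failure log back in."""
    prompt = task
    for _ in range(max_rounds):
        candidate = generate(prompt)
        ok, log = run_checks(candidate)
        if ok:
            return candidate
        # The checker's output becomes part of the next prompt.
        prompt = f"{task}\n\nPrevious attempt failed with:\n{log}"
    return None  # gave up: no candidate passed within max_rounds


# Stub generator: only produces a passing answer once it has seen a failure log.
def fake_generate(prompt: str) -> str:
    return "good" if "failed" in prompt else "bad"


# Stub checker: stands in for a simulator run that either passes or prints a log.
def fake_checks(candidate: str) -> Tuple[bool, str]:
    return candidate == "good", "assertion failed at t=10ns"


result = refine_until_passing(fake_generate, fake_checks, "write a testbench")
print(result)
```

The loop only works because the checker gives an unambiguous pass/fail plus a log, which is why testbenches are a friendlier target than, say, UX decisions.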

by u1hcw9nx7 hours ago|

[-]

Written with AI is the new written in Rust. Both are nonsensical statements and tell nothing about the quality of the software.

Without context, both are warnings about the quality of the developers.

by blitzar22 hours ago|

[-]

> the use of email, spam filters and spellchecker to accelerate parts of the design and optimization process

honestly you don't realise how much more efficient it is until you are stuck using the wrong flavour of outlook, the spam filter breaks or sloppy spelling, punctuation and grammar force you to clarify details needlessly.

by nickvec22 hours ago|

[-]

I feel like "the use of OpenAI models to accelerate parts of the design and optimization process" just means that engineers were using ChatGPT to sanity check their designs and suggest potential optimizations, though that's just my take (and I'm quite cynical about AI marketing in general!)

by Kiro9 hours ago|

[-]

I think this kind of "hard work" is a perfect fit for AI, and something where the complexity for a human is incorrectly extrapolated to LLMs.

Tirelessly wading through heaps of specifications and documentation with very clear goal definitions is hard for a human but easy for an AI. Meanwhile, taking UX and edge cases into account in a business application is easy for a human but hard for an AI.

by SCUSKU21 hours ago|

[-]

My girlfriend works at Broadcom doing chip design, and based on what she's told me they JUST got claude code like 3 weeks ago, so I really doubt this means anything beyond them vibe coding some scripts or something...

by figassis23 hours ago|

[-]

VHDL, VLSI are well documented languages, with well-built test and verification frameworks and harnesses. Even just by iteration you could get there if you have the money to pay for it.

by FanaHOVA1 days ago|

[-]

NVIDIA already designs most of their chips using AI. Why would you assume it's meaningless marketing?

by fecal_henge23 hours ago|

[-]

Perhaps because they are suggesting what they are doing is novel.

by DoctorOetker22 hours ago|

[-]

novel to whom, the reader or the industry?

something can be non-novel in the industry, yet novel to the reader, at which point it is useful ... for such readers.

by nullsanity1 days ago|

[-]

[dead]

by seydor23 hours ago|

[-]

realistically, how hard are AI accelerators to design?

by WithinReason8 hours ago|

[-]

The hardware? Not too difficult, there are dozens of startups. The software? Only NVIDIA could do it so far sufficiently well.

by sentinalien5 hours ago|

[-]

How many profitable startups are there?

by WithinReason5 hours ago|

[-]

0 because they lack the software, not the HW. The HW works and is relatively easy to make.

by therealcamino14 hours ago|

[-]

Uh, pretty hard?

by HarHarVeryFunny22 hours ago|

[-]

I would assume they've already made as big a deal of it as they can without outright lying too much. Read the rest of the press release.

FWIW, Google is now on their 8th generation TPU, having put out the last 4 generations on a 1-year cadence.

by davidpapermill2 hours ago|

[-]

> Google is now on their 8th generation TPU

Remarkable that the TPU pre-dates the attention paper. Was a solid bet on energy efficient dense matrix multiplication and has stood the test of time.

by vanyaland10 hours ago|

[-]

[dead]

by napierzaza19 hours ago|

[-]

[dead]

by xnx23 hours ago|

https://deepmind.google/blog/how-alphachip-transformed-compu...

[-]

AlphaChip is what a chip design with AI is. I'm very suspicious that OpenAI has anything like this or they would be bragging about it.

by shellcromancer1 days ago|

1. https://www.investing.com/news/stock-market-news/openai-unve...

[-]

Probably obvious but still omitted in the OpenAI post: chips are being made by TSMC [1]. Wasn't sure if Intel got it.

by HarHarVeryFunny1 days ago|

[-]

I just read a claim on Twitter that the reason these companies (Google and Amazon as well as OpenAI) are using Broadcom isn't just for design expertise, but because Broadcom have allocation agreements in place with TSMC and the memory manufacturers.

by alephnerd1 days ago|

[-]

Most design partners have allocation agreements. The thing is Broadcom is an absolute GIANT in the ASIC design space, and its closest competitor Marvell is a fraction of its size.

There are a lot of large tech companies that most of HN has never heard about that completely dominate entire segments.

by yieldcrv1 days ago|

[-]

[flagged]

by ahartmetz1 days ago|

[-]

...and because most hardware sales except AI accelerators are down due to RAM prices, Broadcom probably can't otherwise use their allocation at TSMC.

by NavinF21 hours ago|

[-]

Nope, not down. "total Personal Computing Device (PCD) market — comprising traditional PCs and tablets — posted 2.8% year-over-year growth in Q1 2026, with combined shipments reaching 103.3 million units. PC shipments grew 3% YoY with 65.6 million units" https://www.idc.com/promo/pcdforecast/

Q2 is forecasted to be negative, partly because of RAM prices like you said, but for the most part this is something that only price sensitive nerds care about. Broadcom sells a ton of server chips. Server sales are up 30% vs last year so I highly doubt they're desperate to use their allocation

by ahartmetz19 hours ago|

[-]

I was actually thinking of smartphones first because they seem to be the best-selling "personal computing devices" (different definition from IDC) and come with a lot of RAM (8-16 GB or so? Mine has 12) these days. And there I confused Broadcom with Qualcomm - Qualcomm's biggest end customers seem to be smartphone buyers.

I thought of PCs second since most chip manufacturers make something or other that goes into them (Broadcom probably more than Qualcomm), and yes it's very surprising that PC sales don't seem to be down yet.

by gpm20 hours ago|

[-]

According to your own source

> the full-year 2026 [PCD] outlook has been revised to −10.4% year-over-year

because

> erosion of consumer purchasing power amid regional inflation and currency volatility in many key markets, compounded by memory and storage shortages that are proving more severe than anticipated in the previous forecast cycle.

The positive Q1 YoY growth

> was largely the product of pull-forward demand, as both consumer and commercial buyers accelerated purchases ahead of anticipated price increases and limited product availability.

The idea that only nerds care about the cost of things is... absurd.

by indigo94510 hours ago|

[-]

> The idea that only nerds care about the cost of things is... absurd.

For hardware purchases, laypeople may go about it the other way from what nerds would do: instead of deciding what they need in terms of computing power and memory, and then finding a cheap offer for that, they just decide how much they want to spend, and then buy a device at that price point irrespective of its performance characteristics. If you shop like this, and would have purchased anything but a rock-bottom low-end device two years ago, prices have remained stable.

by gunalx7 hours ago|

[-]

But now you can only afford the rock-bottom low-end device.

by a_conservative1 days ago|

[-]

I recently put 2+2 together.

Broadcom has become wealthy by being Google's TPU hardware partner, including sharing their TSMC capacity with Google, and evidently now they are doing the same thing with OpenAI. What a brilliant way to take advantage of the AI gold rush!

I wish they weren't using their piles of money to extort money out of the software industry like they are with VMWare and Bitnami.

by kccqzy21 hours ago|

https://finance.yahoo.com/sectors/technology/articles/broadc...

[-]

Well Google has reduced reliance on Broadcom already. They found a new hardware partner, MediaTek, that’s probably much, much cheaper than Broadcom.

by mschuster9120 hours ago|

[-]

> Well Google has reduced reliance on Broadcom already. They found a new hardware partner, MediaTek

Oh dear god. I'm actually feeling sorry for Google at that point. Good luck, you'll need it...

by kccqzy19 hours ago|

[-]

My hunch is that this change is driven by bean counters.

by amelius6 hours ago|

[-]

Who says Google isn't doing its own designs mostly?

by kccqzy5 hours ago|

[-]

Oh they definitely are. But as a transitional step, replacing Broadcom with MediaTek is probably mostly about cost.

by rasz17 hours ago|

[-]

MediaTek was spun out of UMC. UMC was a powerhouse of ASIC design.

by alephnerd1 days ago|

[-]

> Broadcom has become wealthy by being Google's TPU hardware partner...

Kinda, but not exactly.

Broadcom cornered the enterprise infra and security market in the late 2010s and early 2020s after acquiring CA Technologies, BMC (EDIT: Did NOT acquire them, they were considering it back in 2018 but decided against it and KKR ended up acquiring them), Symantec (which they bought instead of BMC), and VMWare and were able to make a strong cybersecurity story during the late 2010s cybersecurity and SaaS boom.

That gave them plenty of cashflow that helped subsidize their hardware business when hardware was not viewed as hot as it is today.

Additionally, Broadcom is GCP's marquee customer and has been for a little under a decade, so they were able to make a sweetheart deal where all the software businesses at Broadcom would exclusively use GCP and in return GCP would work with Broadcom to design its silicon and source the infra needed for their DC buildouts.

Ironically, the DoJ blocking Broadcom's acquisition of Qualcomm was the best thing it ever could have done for Broadcom, because it gave Broadcom the dry powder to dominate the Enterprise SaaS and build a strong niche in the cybersecurity space.

> piles of money to extort money out of the software industry

From personal experience, executives and leadership who started off in the electronics and hardware industry are much more vicious and cutthroat than their peers who started in software.

Working in an industry that historically had to deal with high commodification, low margins, and long tail sales leads to leadership that can execute. Additionally, no one climbs the leadership ladder without having spent years as a line-level engineer, but that's true for software as well to an extent.

Edit: can't reply

> Did they acquire also BMC?

Nope.

Broadcom was considering acquiring them in 2018 but decided not to go through with the opportunity and KKR jumped in.

by vb-84481 days ago|

[-]

Did they acquire also BMC?

by a_conservative1 days ago|

[0] https://www.goodreads.com/book/show/66863.Only_the_Paranoid_...

[-]

Good information, Broadcom is a playa, lots and lots of acquisitions! (a quick google search turns up a very eventful history for Broadcom)

> From personal experience, executives and leadership who started off in the electronics and hardware industry are much more vicious and cutthroat than their peers who started in software.

Only The Paranoid Survive is quite a name for a management book. It implies surviving in the world you are speaking about.

by nickpinkston1 days ago|

https://www.reddit.com/r/singularity/comments/1r9frzk/taalas...

[-]

This is very cool to see - seems like soooo much efficiency waiting to be unlocked at the chip level.

What's everyone think of Taalas?

They're actually burning the LLM model into the silicon, with some onboard memory for fine-tuning. They claim huge cost / latency wins.

Super fast demo live at: https://chatjimmy.ai/

https://taalas.com/

by jsenn5 hours ago|

[-]

Their demo is almost unbelievably fast, but as I understand it, the limitation of Taalas's strategy is KV-cache. This grows with context length, so either needs to be stored in SRAM (small) or streamed in (slow). Even for a tiny model like the Llama 8B they have in their demo, the KV cache will be ~64 KB per token at 8-bit quantization, so at a 1,000-token sequence length you are already at ~64 MB of SRAM for a single user. This is probably why their demo only lets you generate 1,000 tokens: they can't go beyond that without slowing down inference.

So I'm curious what their strategy is. It seems to me that the options are: 1. Target smaller usecases that can live with a tiny context window 2. Use huge amounts of SRAM (at which point they look like Groq or Cerebras) 3. Make it up with extreme KV-cache compression/quantization 4. Run linear-attention/sliding window attention models

Other commenters have mentioned robotics as a potential application, which sounds interesting.
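For reference, the KV-cache arithmetic above can be checked with a few lines. The shapes are an assumption on my part: I'm using the published Llama 3 8B configuration (32 layers, 8 KV heads via grouped-query attention, head dimension 128), and I don't know that Taalas's demo model matches it exactly.

```python
def kv_bytes_per_token(n_layers: int = 32, n_kv_heads: int = 8,
                       head_dim: int = 128, bytes_per_value: int = 1) -> int:
    """Bytes of KV cache added per generated token."""
    # Factor of 2: one key vector and one value vector per layer per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


per_token = kv_bytes_per_token()        # 8-bit values, so 1 byte each
per_sequence = per_token * 1000         # one user, 1,000-token context

print(per_token / 1024)                 # 64.0 (KiB per token)
print(per_sequence / 1e6)               # 65.536 (MB per 1,000 tokens)
```

Doubling `bytes_per_value` for FP16, or multiplying by the number of concurrent users, shows how quickly this outgrows on-die SRAM.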

by kccqzy21 hours ago|

[-]

> seems like soooo much efficiency waiting to be unlocked at the chip level

Well if you are exclusively using GPUs that are general purpose, of course you leave so much efficiency on the table. That’s why Google started making TPUs more than a decade ago. I remember that kerfuffle when Google fired Timnit Gebru when Gebru’s paper used GPUs to calculate the environmental impact of LLMs while ignoring the efficiency of TPUs; this basically made Jeff Dean very angry due to that wide efficiency gap.

by redox9916 hours ago|

[-]

These NVIDIA GPUs aren't general purpose in the way that you think. They can't even run games. Nvidia blackwell is probably slightly more efficient than TPUs for training. Do you really expect a 4 trillion company with the majority of its revenue being AI for some years now, not to have built its flagship product fully around AI? The GPU name stuck around, but they are pretty terrible at graphics.

The real efficiency win in these chips is that they are made for inference only. You can throw away the vast majority of a chip if you only need a few ops, a single precision (like INT8 or FP8) and don't need ultra fast interconnects.
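To illustrate the single-precision point, here is a toy sketch of symmetric INT8 quantization in plain Python. It is not how any particular chip does it (the scheme, function names and values are all my own illustrative choices); it just shows that once weights and activations are small integers, the inner loop is integer multiply-accumulate plus one rescale, which is what lets inference-only silicon drop the wide floating-point units.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def int8_dot(q_a, scale_a, q_b, scale_b):
    """Dot product done entirely in integers, rescaled to a float once at the end."""
    acc = sum(x * y for x, y in zip(q_a, q_b))  # integer multiply-accumulate
    return acc * scale_a * scale_b


weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]

qw, sw = quantize_int8(weights)
qa, sa = quantize_int8(activations)

approx = int8_dot(qw, sw, qa, sa)
exact = sum(w * a for w, a in zip(weights, activations))
print(f"exact={exact}, int8 approximation={approx:.4f}")
```

The approximation error is small here, but real deployments need calibration and per-channel scales to keep it that way.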

by jacques_chester21 hours ago|

[-]

That ... wasn't the kerfuffle

by janalsncm17 hours ago|

[-]

She wrote the stochastic parrots paper.

Google’s internal review blocked it from publication. Stated reasons were about paper quality. You can speculate whether that was the real reason.

Gebru issued an ultimatum email and said she would resign if some list of conditions weren’t met.

Google said “thanks, we accept your resignation”.

She claims it is retaliation, but it seems more like an own-goal if you ask me. She basically handed Google the solution to their problem.

Practical lesson: don’t tell your employer you might quit before you’re ok with leaving.

by Herring18 hours ago|

[-]

It kind of was. I really hate gaslighting, but GP is not inaccurate. Google claimed it did not meet their bar for publication because it ignored recent research on how to reduce the environmental and bias-related risks of LLMs. On the other hand, a large org is unlikely to subsidize high-profile research that makes it look bad. And Gebru was critical of Google’s internal culture and diversity efforts…

by qnleigh10 hours ago|

[-]

I haven't read any of these papers, but given the environmental impact of LLMs in 2026, it seems like Timnit Gebru has been thoroughly vindicated...

by Catloafdev1 days ago|

[-]

It'd be cool to see more of this type of thing, but I have to imagine the ability for it to be updated to a brand-new model as new models come out is limited. If that is the case, it's going to be an extremely hard sell.

by NitpickLawyer23 hours ago|

[-]

> extremely hard sell.

It really depends on the pricepoint at which they can get a board. If they can do a ~32B model for 1k$ and a size of an external HDD, I'd buy one now, even knowing that it won't be upgradeable / the model remains fixed. The speeds they've shown are a quality of its own, and there's plenty you can do with such a model and faster than instant responses.

by nemonemo22 hours ago|

[-]

Maybe in 10 years when the tech matures, but IMO now seems a bit too early to have a tech like this. It is like intelligence without evolution or progress.. yes it can be used in some niche markets, but difficult to be generic.

by throwthrowuknow20 hours ago|

[-]

There are plenty of applications that would be useful right now. Specialized models for tool use, like fine tuning for command line tools that are already well established and don’t change often. I’m sure there are many areas where the training data is essentially crystallized and unlikely to change. Think of them more like delegated agents or coprocessors that another model could route to so instead of routing to a quantized or lesser model it could use a full fidelity model that is faster, almost instantaneous.

by runeks6 hours ago|

[-]

If performance per watt is 100x better than GPUs (as GP link claims) then I don't think it's a hard sell at all. That's actually a cost reduction that matters.

by empath7523 hours ago|

[-]

You don't need SOTA models for all tasks, and being able to do more routine tasks at something like 10% of the cost and 70x speed unlocks LLM use for things that are just unthinkable now (bulk classification tasks, real time speech interaction, etc)

by wongarsu23 hours ago|

[-]

A hard sell right now. The rate of change will slow down

by gpm23 hours ago|

[-]

Yes, but with current architectures world knowledge is baked into the weights. We might stop figuring out how to make models better, but the world keeps changing, science is going to keep making progress at understanding the world, etc. This creates a significant minimum rate of change and I'm pretty skeptical that it's worth baking weights into silicon as a result.

by Micrococonut21 hours ago|

[-]

I think it would just be an opportunity to sell another chip a few years down the line. If the utility curve flattens out on the performance of models I can see a future where you are buying an up to date chip every few years to upgrade to the latest and greatest, while providing up to date context as part of the user input. Like if I have a programming task and I supply a copy of up-to-date documentation alongside my input, I would think that I could still get good output out of a dated model.

by Chu4eeno23 hours ago|

[-]

That's why we have reasoning/CoT LLMs that can use tools to get updated information.

by post-it16 hours ago|

[-]

This already isn't the case for the popular models. The knowledge baked into the weights tells the model how to talk and reason, but for world knowledge they do a web search right off the bat most of the time.

by cruffle_duffle22 hours ago|

[-]

I mean it just depends on the price of the chip. You might just replace the chip like you would any other component. Like a video game cartridge or something.

by ianm21822 hours ago|

[-]

What makes you think that? The rate of change seems to have been increasing and there are so many chip and model bets in different directions at the moment.

by cmrdporcupine23 hours ago|

[-]

I think the model they chose is out of date and hard to sell, but there are plenty of use cases where today's dumb small models are fine. A Qwen 3.5/3.6 or Gemma 3 model on silicon at those speeds would be genuinely world changing even if it's only 1-3B params. Such a model at those speeds will remain extremely useful even over a 5-6 year timespan, I think.

If you consider the places you could deploy it -- with no network access, and at those high speeds... very useful .. for adding vague "common sense" fuzzy thinking to all kinds of applications that right now piss consumers off with poor UX. Esp if the model can do voice-to-text and text-to-speech well (some of the smaller models can)

by crote23 hours ago|

[-]

I wouldn't be surprised if "fast, cheap, dumb" ends up being the market for LLMs.

The state-of-the-art models aren't at "can fully replace knowledge worker" levels yet and I doubt they'll get there any time soon, so charging $2000 / month for access isn't going to happen. Right now everyone and their dog is being handed subsidized credits to play with AI, but the actual outcome is rarely good enough to be worth the money they'd need to charge for it. It might very well take another order of magnitude or two to get LLMs to be truly good (if it is even possible at all), and considering how much money is already being pumped into it I just don't see that happening.

On the other hand, the dumb models are more than adequate for simple noncritical tasks, like directing a user to the appropriate FAQ entry, or playing phone decision tree. There's a lot of money in making chatbot assistants actually useful, or in augmenting website search. Turning it into a glorified "language-to-API-call" translator doesn't take a lot of smarts, but as long as it's cheap you can make a killing in volume.

by wwweston22 hours ago|

[-]

> On the other hand, the dumb models are more than adequate for simple noncritical tasks, like directing a user to the appropriate FAQ entry

This is a lane I’ve been experimenting in —- seeing what I can get out of models that work in 16GB VRAM for simple tasks (screen scraping, decision tree navigation, natural language queries). It’s interesting for sure (certainly reveals non-deterministic limits) and promising for low criticality review-opportunity tasks, but I also feel like I need better sources/community for understanding and reflection. Preferably those that aren’t hype channels. Any pointers?

[-]

> I think the model they chose is out of date and hard to sell

I understood it as a proof-of-concept, not a for-mass-production single blueprint - i.e.: "if you need your NN in a CIM form on ASIC, we can do it".

Their next proof-of-concept was said to be meant to be about size: "we showed you we can do it with 8b, now we are working to show you we can do 24b or 32b". Then, "and we plan to go bigger and faster".

> Our second model, still based on Taalas’ first-generation silicon platform (HC1), will be a mid-sized reasoning LLM. It is expected in our labs this spring and will be integrated into our inference service shortly thereafter. // Following this, a frontier LLM will be fabricated using our second-generation silicon platform (HC2). HC2 offers considerably higher density and even faster execution. Deployment is planned for winter (19 Feb 2026)

by martythemaniak23 hours ago|

[-]

In a chatbot, 17k tok/s is a neat but nearly useless showcase. In a coding agent it is a meaningful improvement. In robotics, it could be an absolute revolution.

8B models aren't useful in general, but for specific use cases they can provide an enormous amount of intelligence - nVidia's Tesla/Waymo competitor is a 7B LLM with a 2B diffusion model, and running that at those speeds could be an order of magnitude cheaper than existing solutions.

by hadlock23 hours ago|

[-]

17K tok/s is approaching realtime motor cortex needs for a robot with ~12 actuators (bipedal humanoid) and an IMU. I don't know how many parameters a motor cortex would need but 8B feels like it is within 2 orders of magnitude.
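Rough numbers for that, as a sketch. The token counts per actuator command and per sensor reading are pure assumptions (nobody in this thread has said how such a model would encode them), and it pretends input tokens cost the same as generated ones, which is a simplification.

```python
def control_rate_hz(tokens_per_s: float, n_actuators: int,
                    tokens_per_command: int = 1, sensor_tokens: int = 0) -> float:
    """Control-loop frequency if every step costs a fixed number of tokens."""
    tokens_per_step = sensor_tokens + n_actuators * tokens_per_command
    return tokens_per_s / tokens_per_step


# Best case: one token per actuator and no sensor tokens at all.
best_case = control_rate_hz(17_000, n_actuators=12)

# Heavier, hypothetical encoding: 4 tokens per command plus 52 tokens of IMU/state.
heavier = control_rate_hz(17_000, n_actuators=12, tokens_per_command=4, sensor_tokens=52)

print(f"best case ~{best_case:.0f} Hz, heavier encoding {heavier:.0f} Hz")
```

So the answer swings by almost an order of magnitude depending on the encoding, but even the pessimistic case stays in the hundreds of hertz.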

by nok22kon23 hours ago|

[-]

this is an LLM, not a motor cortex. it will output commands as text (json, ...), so comparing size is not very meaningful, especially considering neurons are highly complex and each likely requires thousands of simple artificial neurons (weight+bias)

by yunwal21 hours ago|

[-]

There's nothing about Taalas that is specific to an LLM

by cruffle_duffle21 hours ago|

[-]

Bumping the speed of these things would be more than meaningful. It would be a massive game changer.

I assert like 80% of this “multi agent parallel workflow” business is simply a workaround to models being soooooo slow. Like as the dude driving these things… you kick it off and twiddle your thumbs waiting minutes to hours sometimes for all the inference and token generation to finish. So you dispatch multiple workstreams in parallel to be more efficient.

I assert that if the model was even 10x faster we’d be using these things radically differently. You’d be doing things that are currently time prohibitive. At 100x, holy shit will software dev get crazy. You’d be kicking off hundreds of parallel workers attacking a problem from every angle and stuff. Who even knows!!!

And the thing is, 10x will absolutely come and probably even 100x. And it will be sold like a video game cartridge or something depending on how the actual model gets “baked” into the hardware. No remote inference at all.

by Imustaskforhelp23 hours ago|

[-]

Could you give me some example how in robotics it can be an absolute revolution?

My understanding is that robotics doesn't really rely much on LLM's in the first place but rather other things.

Is the thing that you are suggesting that it would ingest all real time data and then reason through it at an incredibly fast speed and then act on it and re-iterate? I might imagine some problems with this though I am not a robotics engineer and perhaps someone who deeply understands this topic can give more information.

by nok22kon23 hours ago|

[-]

LLM are very good at looking at images and reasoning about them. much more than just object recognition/segmentation, they can explain the physics in the image, the intents, plan actions, ...

by Chu4eeno23 hours ago|

[-]

That's because of posttraining optimizing for benchmarks that test that.

They tend to collapse into nonsense and hallucinations pretty quickly if you move slightly out of the envelope of the current visual reasoning benchmaxxing.

by martythemaniak23 hours ago|

[-]

Disclaimer: I'm a robotics noob, but I've been working on robotics for a few months now.

I'd say virtually all robots you've seen in the real world today rely on classical approaches - you build a rudimentary map, then use classical algorithms to find paths/do area coverage. The robots do not reason or understand what they're looking for, they're more like in-game units. At most there's some bounded, lightweight image classification going on.

LLMs can understand and reason about the world natively. nVidia has a Tesla FSD/Waymo competitor which is simply their 7B reasoning LLM, but instead of outputting tokens directly, its outputs are fed to a 2B diffusion model that outputs a 1.6-second-long trajectory for the car, and this is enough for an L2 system. But to make this work, they need the model to run at 10Hz, so they use super high-end hardware to do it (Jetson Thor) and the car is still "blind" for 100ms at a time (they have a parallel classical safety system).

With on-chip LLMs you could run this loop at like 100Hz on a chip that costs a few hundred bucks, rather than 10Hz on a board that costs several thousand.
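A rough back-of-the-envelope for the loop rates above, as a sketch only. The 17,000 tok/s figure is the throughput quoted earlier in the thread; the function names are made up for illustration, and it assumes each control cycle has the whole chip to itself.

```python
def cycle_period_ms(rate_hz: float) -> float:
    """Time between control updates, i.e. how long the system is 'blind'."""
    return 1000.0 / rate_hz


def tokens_per_cycle(tokens_per_second: float, rate_hz: float) -> float:
    """Token budget the model gets inside one control cycle."""
    return tokens_per_second / rate_hz


if __name__ == "__main__":
    # 17_000 tok/s is the figure quoted in the thread, not a measured value here.
    for hz in (10, 100):
        print(hz, "Hz:", cycle_period_ms(hz), "ms per cycle,",
              tokens_per_cycle(17_000, hz), "tokens per cycle")
```

So at 100Hz the model would have to decide within a budget of about 170 tokens per cycle, which is tight for anything resembling a reasoning trace.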

by typ15 hours ago|

[-]

Low latency is nice. But it would be more interesting if they could demonstrate the efficiency of energy consumption.

by flumes_whims_4 hours ago|

[-]

Tokens/second and watt-hours seem related?

by rebeccajae21 hours ago|

[-]

It seems technically interesting, but they seem very sparse on details. I don't know if I like the idea of a single unchanging model forever on a chip. How much more expensive would the silicon be if they used rewritable ROM for the weights? Such an arrangement would permit fine-tunes of the model it was designed for, which might minimize concerns about the model becoming outdated.

[-]

The Taalas cards do not store the weights in memory; each weight's multiplier is translated directly into a circuit.

by dcchambers22 hours ago|

[-]

I think hardware like this is the future for LLM-providers once we reach a point where the models aren't advancing much any more. You could argue we're close now.

The hyperscalers like AWS will make great use of these to serve up models that will be relevant for several years. But right now, we're still seeing significant bumps in model quality every couple of months - especially with open-weight models like Deepseek/Kimi/GLM.

Until that point, though, I don't see how this is ever going to be cost effective vs general purpose hardware.

I also think we'll see miniature versions of this baked into mobile hardware for super fast and efficient on-device LLMs.

by WASDx21 hours ago|

[-]

I see only these two possibilities:

1. If LLMs keep improving, burning models onto silicon becomes obsolete too fast and is not worth doing. Outcome: We keep getting better LLMs.

2. If LLM improvements slow down, they will be burned onto silicon. Outcome: We get faster, cheaper and energy-efficient LLMs.

Either way sounds great to me. It will certainly be a mix so we can even get both.

by londons_explore22 hours ago|

[-]

I wanna see an inference chip where the weights are part of the rom of the chip.

There would be 1 multiplier per weight (and since they're constant, the whole thing turns into a bunch of simple adders), and the total pipelined system throughput would be one token per clock cycle.

That means you can probably have millions of users simultaneously using a single bit of silicon, with perhaps 500 million tokens per second coming out the output bus.
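The arithmetic behind "one token per clock", as a hedged sketch. The 500 MHz clock is only what the 500 million tok/s figure implies if the pipeline really emits one token per cycle; the one-million-user count is an illustrative assumption, and the function names are invented.

```python
def chip_tokens_per_second(clock_hz: float, tokens_per_clock: float = 1.0) -> float:
    """Aggregate throughput of a fully pipelined chip."""
    return clock_hz * tokens_per_clock


def per_user_tokens_per_second(chip_rate: float, concurrent_users: int) -> float:
    """Share of that throughput each user sees if it is split evenly."""
    return chip_rate / concurrent_users


if __name__ == "__main__":
    rate = chip_tokens_per_second(500e6)              # assumed 500 MHz clock
    print(rate, "tokens/s for the whole chip")
    print(per_user_tokens_per_second(rate, 1_000_000), "tokens/s per user")
```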

Downside is this chip would be huuuuge - a whole wafer.

Wafer level faults probably won't matter though - neural nets are resistant to a few missing or wrong weights.

Due to the speed the industry moves, you'd want to race from model weights to production super fast, make 50 wafers, use them for a year, then bin them when that model is obsolete.

by sometimelurker22 hours ago|

[-]

this appeared some time ago, https://taalas.com/, but I'm sure there's others thinking these same thoughts. this would be best for small models imo, nothing frontier because that changes too fast

by 1e1a21 hours ago|

[-]

you can try it out here: https://chatjimmy.ai/

by Meetvelde16 hours ago|

[-]

that's so fast it feels fake

by the_sleaze_16 hours ago|

[-]

13,789 tok/s

Well I've gotten one of those "holy fuck this is the future" deeply unsettled anxious feelings in my gut again. It's been a week or 2, it was time.

by froh12 hours ago|

https://news.ycombinator.com/item?id=47103661

[-]

i only found one discussion of the tech here on HN

by agazso11 hours ago|

[-]

It's indeed super fast, but the output is complete BS hallucination. Not sure what's the value of this.

by runeks6 hours ago|

[-]

It's a proof of concept that it's possible to etch a neural net into a chip and get massive performance (and efficiency) boost

by Smaug12322 hours ago|

[-]

By the way, you've seen Cerebras? It's not gone as far as what you described - loads of cores and RAM but you still load up the weights onto it as software and they need to be streamed into the chip for large models - but it is a whole wafer.

by trouve_search21 hours ago|

[-]

Cerebras is a whole lot of SRAM, basically a ton more L1/L2 cache, hence increasing throughput.

They're pretty supply constrained right now though and their production costs seem prohibitive.

The interesting players at the moment are from Toronto: taalas (print the model onto the silicon) and tenstorrent (dataflow programming based hardware)

by londons_explore20 hours ago|

[-]

There is a huge downside to weights being modifiable - it means you need to have multipliers (not simply adders), and SRAM to store those weights.

I suspect for equal performance, that's probably a 5x increase in silicon area (and therefore cost).
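One way to read "multipliers, not simply adders": when a weight is a constant known at design time, multiplying by it reduces to a fixed set of shifts (free wiring in hardware) plus adds. The sketch below shows that general shift-and-add idea for non-negative integer weights. It is an illustration with invented function names, not a description of how any particular chip does it.

```python
from typing import List


def shift_add_terms(weight: int) -> List[int]:
    """Bit positions that are set in a non-negative constant weight."""
    return [bit for bit in range(weight.bit_length()) if (weight >> bit) & 1]


def multiply_by_constant(x: int, weight: int) -> int:
    """Compute x * weight using only shifts and adds.

    In fixed-function hardware the shifts are wiring, so only the adds cost
    gates; a weight with few set bits needs few adders.
    """
    return sum(x << shift for shift in shift_add_terms(weight))


if __name__ == "__main__":
    print(shift_add_terms(10))            # 10 = 0b1010, so shifts by 1 and 3
    print(multiply_by_constant(7, 10))    # (7 << 1) + (7 << 3)
```

A modifiable weight gives none of this away at design time, which is why it needs a general multiplier plus storage.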

by phkahler21 hours ago|

[-]

>> I wanna see an inference chip where the weights are part of the rom of the chip.

I've been wondering about that for a while now. For a lot of tasks putting weights in ROM is probably OK. OTOH:

>> There would be 1 multiplier per weight...

I'm not sure that is a good idea. Maybe if its quantized down to 2 bits... Otherwise maybe a small ROM near each multiplier (or row of them or whatever) so the multipliers could handle N distinct matrix operations without having to move the data from far away.

Another fun thought is to have a row of MAC units on DRAM so a DRAM row would be a vector. Row size might be 64Kbit or 8K weights if they're 8bit. This also keeps the weights and calcs on the same chip. I'm not sure this would put enough multipliers on one chip though. Systolic arrays can have tens or hundreds of thousands each doing one op per clock cycle.
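The row-size arithmetic above, spelled out as a small sketch. It assumes "64Kbit" means 64 × 1024 bits; the helper name is made up.

```python
def weights_per_row(row_bits: int, weight_bits: int) -> int:
    """How many fixed-width weights fit in one DRAM row."""
    return row_bits // weight_bits


if __name__ == "__main__":
    row_bits = 64 * 1024   # assumed meaning of "64Kbit"
    for bits in (8, 4, 2):
        print(bits, "bit weights:", weights_per_row(row_bits, bits), "per row")
```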

by cyptus21 hours ago|

[-]

analog chips could also be very interesting instead of using digital signals and processing them against the weights in the ROM. I have no idea if that scales with such big models though.

[-]

The drawback is in keeping signal fidelity (e.g. dissipation, temperature etc.) and in the conversion between analogue and digital.

Nonetheless, yes, there are already implemented solutions for small NNs (I understand mostly acting as triggers).

by whazor5 hours ago|

[-]

You don't need a single wafer, you can split the model into many smaller different chips and connect inputs/outputs.

Skip VHDL and directly go for GDSII / OASIS. Try to find similar vectors so you get re-usable blocks.

You can dynamically calibrate a chip by fine tuning output.

by freakynit14 hours ago|

[-]

This may be extreme, or, completely stupid, but, why are we not using genetics to "grow" chips in a chemical soup yet? Similar to Verilog/VHDL, don't we have some similar language to express circuits using gene sequences?

by marcosqanil10 hours ago|

[-]

I've worked for one of Europe's biggest synthetic biology labs and I know lots of biologists are low-key interested, but current players in semiconductors see it as kind of a tarpit.

IBM used to have a program using DNA origami for lithography back in 2009, which makes sense as lithography masks are a pain to make. I really wish I know why the program was stopped, but most of the researchers are retired by now.

As to whether you can just "grow" the whole chip from scratch, the answer is probably, but it would require lots of non-trivial scientific discoveries. For instance, we can't really make sizable chips using DNA without horrible defect rates. Biology is much better at making redundant rube goldberg machines, than very precise machines with no tolerance for errors.

I think we'd have a better chance of success if we made very weird kinds of chips that better took advantage of the medium, perhaps even something that we "train" rather than just use out of the box.

I'd love it if anyone here knew more about this !

by freakynit8 hours ago|

[-]

Would it be comparatively easy to make neuromorphic chips instead of traditional chips? I believe probabilistic algorithms like those employed by LLM's must be more tolerable to working with defects as well..?

by whalee12 hours ago|

[-]

We lack robust frameworks for 'forward engineering' stochastic thermodynamic computation over molecular free-energy landscapes (which is basically what a "chemical soup" is doing) like we do for analog/optical/digital computing. This is why, as a field, medicine is so heavily empirical and reverse engineering oriented.

by freakynit10 hours ago|

[-]

Man... I had to chatgpt your comment just to understand. But I do now.

Basically, unlike the current chip manufacturing process, where every stage is deterministic and precise, the soup-world, the chemistry, is not. And we do not have accurate enough models to handle it in a deterministic way, or to model it precisely.

My respect for nature's engineering just shot up by 10 times more.

by AceJohnny213 hours ago|

[-]

Are you referencing the 1998 short story "Taklamakan" by Bruce Sterling?

by freakynit10 hours ago|

[-]

Thanks.. just looked it up. Seems super interesting.

by fallat14 hours ago|

[-]

Do that at scale

by freakynit13 hours ago|

[-]

Bacteria do that at scale, far far bigger than all chips combined. All it takes is chemical soup and a few starter seed dna's.

by fallat4 hours ago|

[-]

Ah, so we're not talking creating full on brains after-all?

by voidUpdate10 hours ago|

[-]

> "Downside is this chip would be huuuuge - a whole wafer."

Why don't we have chips like that? If a CPU the size of a postage stamp can do x amount of performance, imagine how much performance you could get if you used an entire wafer of chips running in parallel. Obviously there would be certain use cases, like you couldn't fit an entire wafer in a phone, but still

by ngomez10 hours ago|

[-]

Using the space of an entire wafer for one chip would result in extremely low manufacturing yields. Even with state of the art silicon cleanrooms, there will still be defects in parts of the output.

With CPUs and GPUs, chip makers can disable faulty cores and bin them as lower SKUs to get some yield out of it. But if you're using an entire wafer to embed weights, and a speck of dust causes a printing defect that makes the weights wrong, the entire wafer is worthless.

by voidUpdate7 hours ago|

[-]

Do failed wafers have to go in the trash, or can you recycle them?

by Jyaif6 hours ago|

[-]

What's the difference between disabling faulty cores and disabling the parts of the wafer that have defects?

by RussianCow2 hours ago|

[-]

I'm not an expert, but I think those are the same thing. But for an LLM etched onto a whole wafer, it doesn't make sense to disable part of it since that would remove some weights entirely.

by cactusplant73747 hours ago|

[-]

Is that defect easy to detect?

by kimsey08 hours ago|

[-]

We do. The Cerebras line of Wafer Scale Engines is exactly an entire wafer of cores running in parallel with fast memory next to each one. It's intended for very high throughput LLM inference. https://www.cerebras.ai/chip

by WithinReason5 hours ago|

[-]

One token per clock cycle at 1B parameters would imply 2 ExaFLOPS, consuming about 10 kW
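For anyone checking the 2 ExaFLOPS figure, here is a sketch of how it can be reproduced. It assumes 2 FLOPs per parameter per token (one multiply, one add) and a 1 GHz clock; neither assumption is stated in the comment, they are just the values that make the number come out. The 10 kW power figure is taken from the comment as-is.

```python
def flops_per_token(params: float) -> float:
    """Dense forward pass: roughly one multiply and one add per parameter."""
    return 2.0 * params


def sustained_flops(params: float, clock_hz: float) -> float:
    """FLOP/s if the chip finishes one token every clock cycle."""
    return flops_per_token(params) * clock_hz


def joules_per_flop(power_watts: float, flops: float) -> float:
    """Energy per operation implied by a power draw and a FLOP/s rate."""
    return power_watts / flops


if __name__ == "__main__":
    rate = sustained_flops(1e9, 1e9)       # 1B params, assumed 1 GHz clock
    print(rate / 1e18, "ExaFLOPS")
    print(joules_per_flop(10e3, rate), "J per FLOP at the quoted 10 kW")
```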

by yuriyguts21 hours ago|

[-]

I've also been thinking about this. Although the forward pass of a transformer model also involves some heavier operations like normalization, reciprocals, exponentiations or other non-linearities (GeLU, SiLU) which may (though typically don't) involve learned weights as operands.

by Salgat18 hours ago|

[-]

Supposedly memristors would be ideal for this (and it would be reprogrammable), but then again, memristors seem to be the carbon nanotubes of the computing world.

[-]

> weights [as] part of the rom of the chip

Not really that: you are pointing to Compute-In-Memory (CIM) - techniques where the data (here, a multiplier value) is part of the processor (here, the multiplying circuit).

The problem of "fetch and process" is bypassed completely architecturally: the data is there where the processing happens - it's not moved, there is no latency.

by zkmon21 hours ago|

[-]

firmware upgrade would mean flashing a huge BIN file.

by HDThoreaun17 hours ago|

[-]

How would the pipelining work when the next token depends on the last token?
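One common answer is that a deep pipeline is kept busy by interleaving many independent sequences, since any single sequence can only have one token in flight. The toy model below sketches that under stated assumptions: a pipeline of `depth` stages, one token accepted per clock, and each sequence blocked until its previous token has left the pipeline. It is not a claim about how any real chip schedules work, and all names are invented.

```python
def simulate(depth: int, sequences: int, cycles: int) -> int:
    """Count tokens that finish within `cycles` clock ticks."""
    ready_at = [0] * sequences      # cycle at which each sequence may issue again
    completed = 0
    for cycle in range(cycles):
        # The pipeline accepts at most one new token per clock:
        # issue from the first sequence that is ready, if any.
        for seq in range(sequences):
            if ready_at[seq] <= cycle:
                ready_at[seq] = cycle + depth   # its token emerges `depth` cycles later
                if cycle + depth <= cycles:
                    completed += 1
                break
    return completed


def steady_state_utilization(depth: int, sequences: int) -> float:
    """Fraction of pipeline slots that can be filled in the long run."""
    return min(sequences, depth) / depth


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "sequences:", simulate(4, n, 40), "tokens in 40 cycles,",
              steady_state_utilization(4, n), "utilization")
```

With a single user the pipeline sits mostly empty; it only approaches one token per clock once there are at least as many concurrent sequences as pipeline stages.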

by cruffle_duffle22 hours ago|

[-]

“ Wafer level faults probably won't matter though - neural nets are resistant to a few missing or wrong weights.”

Brain science people “love” traumatic brain injury cases because it can help explore what happens when bits of the “brain wafer” get damaged. We’ve learned a lot from such things.

I wonder if people are intentionally “destroying” parts of the model weights to learn more about what happens? Like could you strategically wipe a gig of the model so it’s “all zeros” and see what happens?

I have to wonder

by zurfer21 hours ago|

[-]

This is called mechanistic interpretability. There are lots of fascinating insights already since you can do basically everything down to the neuron or weight level thousands of times. The human brain is many orders of magnitude harder to make sense of.

by sometimelurker21 hours ago|

[-]

well it's actually called ablation, and it's one way to do mech interp. anthropic's got a bunch of work on mech interp here https://transformer-circuits.pub/, like SAEs and NLAs
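A toy illustration of the ablation idea, as a sketch: zero out a random fraction of the weights in one small random linear layer and measure how far the output moves. This is a stand-in with invented names, not a real model and not the method used in any published interpretability work.

```python
import random


def make_weights(rows, cols, seed=0):
    """Random weight matrix standing in for one layer of a model."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def forward(weights, x):
    """Plain matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def ablate(weights, fraction, seed=0):
    """Return a copy with a random `fraction` of the weights set to zero."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(len(weights)) for c in range(len(weights[0]))]
    zeroed = set(rng.sample(cells, int(round(fraction * len(cells)))))
    return [[0.0 if (r, c) in zeroed else w for c, w in enumerate(row)]
            for r, row in enumerate(weights)]


def mean_abs_change(a, b):
    """Average absolute difference between two output vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


if __name__ == "__main__":
    w = make_weights(8, 8)
    x = [1.0] * 8
    baseline = forward(w, x)
    for fraction in (0.0, 0.05, 0.25, 0.5):
        damaged = forward(ablate(w, fraction), x)
        print(fraction, mean_abs_change(baseline, damaged))
```

Real ablation studies target specific neurons, heads or features rather than random weights, which is what makes the results interpretable.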

by Cantinflas21 hours ago|

[-]

[-]

Of course tampering with chunks or nodes in the NNs is a way to study the "spawned" (through gradient descent etc.) configuration and "reverse-engineer the black box" to get "AI transparency".

Anthropic published an important work around one year and a half ago.

by mdp202112 hours ago|

https://www.anthropic.com/research/tracing-thoughts-language...

[-]

> Anthropic published an important work around one year and a half ago

> #Tracing the thoughts of a large language model#

https://news.ycombinator.com/item?id=43495617 (27 March 2025)

by Computer021 hours ago|

[-]

Reminds me of Golden Gate Claude (https://www.anthropic.com/news/golden-gate-claude)

by maz1b1 days ago|

[-]

Pretty huge move. Google and their TPUs are looking infinitely more prescient as I think they are on their 7th generation, along with the offshoots it inspired like the LPU and even others, perhaps like Cerebras and their Wafer Scale Engine.

However, based off first impressions, it seems like this is meant for inference side, and not training, which is also an interesting choice.

by skeledrew1 days ago|

[-]

Training is pretty much a 1x cost, and efficiency there is already on the way down with architectural improvements. Inference though is an ongoing cost which over time takes orders of magnitude more resources, so focusing on making that far more efficient means way greater gains over time.

by ggcr8 hours ago|

[-]

With Reinforcement Learning, inference is very present in post-training stages now too

by forrestthewoods1 days ago|

[-]

Inference costs are higher than training now. I think.

Nvidia is king of general purpose training chips. But inferences can be specialized.

by lugu18 hours ago|

[-]

What makes you think this? With wider adoption the ratio shall shift in favor of inference. And API price is becoming more important than SOTA capability.

by forrestthewoods18 hours ago|

[-]

> With wider adoption the ratio shall shift in favor of inference

Yes? That’s why more money will be spent on inference than training?

I’m talking absolute cost. As the number of people using AI and burning tokens goes up the amount of spend on inference goes up.

I am fairly confident that Anthropic has way way more GPUs serving Claude Code to users than they have training models. They’ve got a lot of users!!

> API price is becoming more important than SOTA capability.

Also yes? This is why custom silicon for efficient inference makes sense!

I think we’re in total agreement here :)

by cactusplant737420 hours ago|

[-]

Cerebras's Codex Spark 5.3 has been a huge flop. Small context window and old model. But hopefully they can improve so that we can benefit from 1000 tokens/second with GPT 5.5.

by zer00eyz1 days ago|

[-]

> early testing shows that Jalapeño will deliver performance per watt substantially better than current state-of-the-art

We're starting to see what really matters here, and though this is hand wavy the TPU makes similar claims.

I think Google's memo about having no moat still stands (see: https://newsletter.semianalysis.com/p/google-we-have-no-moat... if you are unaware). It kind of makes sense that all of this is looking more like 60's to 90's IBM, DEC, Cray, Sun and the hardware race that happened then. History doesn't repeat but it often rhymes and I suspect that these efforts will follow the same trajectory.

by granzymes1 days ago|

[-]

To be clear, that is not "Google's memo". It's a memo by a guy who happened to work at Google. There is a diversity of opinions at a company that employs 180,000 people.

by deweywsu23 hours ago|

https://www.computerhistory.org/storageengine/first-commerci...

[-]

With the pace of AI, and with AI helping to pave the way for faster/better AI, I keep wondering if hardware like this will become obsolete well before it has a meaningful ROI. Huge AI models can be run with less resources already through quantization and offloading, but that's just the beginning. One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop. Think that's crazy? Look at the size of the first hard drives. The IBM 350 was a disk with 50 platters, 24 inches in diameter, that held 3.75MB, and was leased for today's equivalent of $35K.

Compare that to a multi-terabyte ssd. Now apply that improvement to how an LLM is architected and run now. With AI assisting, it won't be long before a leap occurs and these data centers with all their current ultra-cutting edge Nvidia cards are nearly obsolete overnight.

by admax88qqq23 hours ago|

[-]

> One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop.

But if you have such a breakthrough could you not also apply it and run 200T models on todays datacenters?

by pennomi23 hours ago|

[-]

That assumes scaling laws still hold up. A bigger model might end up only incrementally more intelligent.

by ACCount3722 hours ago|

[-]

They do. Mythos kicked ass while it lasted. And what we know of the scaling law curves promises us even more gains in the future.

"The future" being "whenever training and inference at increased scale becomes economical". Which is probably bounded by new generations of hardware, but might also be pushed forward by algorithmic advances.

by phkahler21 hours ago|

[-]

I think they're out of training data though...

by ACCount3721 hours ago|

[-]

Synthetics are often used for "data amplification" nowadays. Extra compute covers a multitude of sins.

by ACCount3722 hours ago|

[-]

Not only you could: you would also want to.

The likes of Mythos show that the scaling laws are real, and you can x5/x2 the total/active params and get meaningful gains. If "inference per param" gets cheaper? Up the params and get more intelligence for the same price.

by deweywsu23 hours ago|

[-]

Quite true

by simonebrunozzi23 hours ago|

[-]

Interesting comment, but the comparison with hard disk drives is probably unfair.

The IBM 350 was commercialized 70 years ago; it took 70 years for someone like you to be able to compare that to a multi-TB SSD.

Furthermore, nothing says that Moore's Law will necessarily apply to LLMs, for decades to come.

by deweywsu23 hours ago|

[-]

Very true, and all I am basing my comment on is the improvement in speed AI has demonstrated when applied to software development, and inferring it might enable a similar 10X or 100X improvement in both hardware architecture as well as LLM structure and/or interface methods. If that speed improvement applies to performance of AI, that could mean the 70 years it took for people to improve storage technology might be able to be compressed to achieve a step change in AI performance in a drastically shorter timeframe.

by LZ_Khan23 hours ago|

[-]

I think Jevons Paradox and scaling laws will make this not the case. If bigger models are always better (which it seems they are), then we will always need high-end hardware.

by gdiamos23 hours ago|

[-]

Usually breakthroughs in computing lead to more usage of computing, not less.

by 3abiton23 hours ago|

[-]

> One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop.

I think there will be specialized hardware (beside GPUs) that would be custom made for LLMs. Yes TPUs exist, but mainly for datacenter. GPUs exist, but they are adapted from mainly graphic application. Once all the demand from data center dries up, innovation will kick in.

by andriy_koval23 hours ago|

[-]

> I keep wondering if hardware like this will become obsolete well before it has a meaningful ROI

it will build expertise/infra/know-how foundation for next generation of hardware

by dwa359223 hours ago|

[-]

True but as someone else pointed out; at that time we'd be interested in running 200T parameter model rather than 200B. Why, you might ask? Law of human laziness - a human will become as lazy as the technology allows it to. With the 200T or 20,000 T model - I'd be heavily incentivized to ask it to make the bread for me that I enjoy making now or create a movie for me (featuring myself) which will maximize the dopamine production in my brain.

by zabriel_goss23 hours ago|

[-]

I agree with you. Stepping stones are still a part of getting there, if only to be briefly useful.

by hyhatqtv23 hours ago|

[-]

Looking at the development of memory bandwidth, capacity and prices over the last 10 years there is little indication that’s likely.

by v5v31 days ago|

[-]

>designed for initial deployment by the end of 2026 and expanding in the years ahead,

So after the IPO and will be featured heavily in the IPO sales brochure as a future promise?

I'm sceptical over any pre-IPO announcements.

by estetlinus23 hours ago|

[-]

Yeah, the narrative feels like pre-IPO shenanigans, and it looks like the lid on my laundry basket. I wouldn’t be surprised if this is a con.

by Culonavirus19 hours ago|

[-]

Con or not it is an obvious thing they have to do. Might as well promise.

IIRC their biggest cost they're "hiding" in their financials by doing creative accounting is inference (putting it into marketing and whatnot, in the billions)... if they can't hide it in their S-1 then they have to rationalize it, either by a) increasing the prices (not gonna happen, with token based billing orgs are already watching their codex spends) or b) lowering the inference costs. You can lower that by "soft optimizing" (dumbing down) your models but then you have the other players breathing down your neck (see quick rise of Claude), or actually optimizing, in software and in hardware. We're like 5 years into the rise of LLMs, there's not THAT much left on the table unless you write to the metal you specifically designed for your models (and I'm pretty sure the lack of "nvidia tax" would help with covering most of the r&d costs of a custom solution, at least in the long term).

50% cheaper inference without losses in fidelity would unquestionably be a massive win for OpenAI.

by frandroid1 days ago|

[-]

Whose IPO? Broadcom and Google are already listed, obviously.

by airspresso1 days ago|

[-]

OpenAI's upcoming mega IPO

by awestroke1 days ago|

[-]

OpenAI, the non profit organization, is going to become a publicly traded profit maximizing corporation

by hk__21 days ago|

See https://openai.com/index/evolving-our-structure/

[-]

> OpenAI, the non profit organization

No, the nonprofit org stays nonprofit, while the for-profit org it owns will become publicly traded.

by hoherd23 hours ago|

[-]

> OpenAI was founded as a nonprofit, and is today overseen and controlled by that nonprofit.

Does anybody actually believe that?

by signatoremo20 hours ago|

[0] - https://www.bloomberg.com/news/articles/2026-06-24/openai-an...

[-]

I haven't seen this discussed here:

So far, the accelerator is showing cost savings of roughly 50% compared with typical AI graphics processing units, Broadcom Chief Executive Officer Hock Tan said in an interview. - [0]

50% cost saving. The picture changes so quickly, there are still a lot of low hanging fruits, that I find any discussion about whether a vendor has moats, or if they can recoup investment, is moot and futile.

by wmf20 hours ago|

[-]

If GPUs have 75% margin then 50% cheaper is no surprise.

by epolanski20 hours ago|

[-]

Operational costs far outweigh hardware cost.

by riknos31451 minutes ago|

[-]

Let's use an example of a GW AI deployment.

At $0.07/kWh, that costs $70,000 every hour in just electricity. $1.7 million /day. $613 million /year.

I had claude estimate the GPU cost of such a deployment:

> To get racks per GW: a full NVL72 rack draws roughly 130-132 kW under full load. If a 1 GW facility runs ~715 MW of IT power (after a ~1.4 PUE for cooling), that's on the order of 4,000–4,500 racks. At $3.4M of compute hardware each, the GPU-system cost lands around $14–15 billion.

15 billion / 613 million / year = ~24.5 years til electricity costs catch up to the GPUs. Obviously electricity isn't 100% of OpEx, but I'd expect it to be the majority for AI deployments.

Regardless, if you can cut the $613 million/yr in half that's still massive savings.
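The numbers above can be reproduced with a few lines, sketched here using only the inputs given in the comment (1 GW, $0.07/kWh, $15 billion of GPU systems). The function names are invented, and the model ignores every operating cost other than electricity.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def electricity_cost_per_hour(power_gw: float, usd_per_kwh: float) -> float:
    """Hourly electricity bill; 1 GW is 1,000,000 kW."""
    return power_gw * 1_000_000 * usd_per_kwh


def years_until_opex_matches_capex(capex_usd: float, annual_opex_usd: float) -> float:
    """How long cumulative operating spend takes to equal the hardware spend."""
    return capex_usd / annual_opex_usd


if __name__ == "__main__":
    hourly = electricity_cost_per_hour(1.0, 0.07)
    daily = hourly * HOURS_PER_DAY
    yearly = daily * DAYS_PER_YEAR
    print(round(hourly), round(daily), round(yearly))
    print(round(years_until_opex_matches_capex(15e9, yearly), 1), "years")
```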

by lugu18 hours ago|

[-]

Do they? Genuinely asking.

by epolanski6 hours ago|

[-]

For a small cluster no, but at major data center level yes. Which is why they are building data centers bigger than stadiums.

If you spend 10B on a data center, roughly 30% of that price is going to hardware, so roughly $ 3B.

So for two data centers you're spending 20B.

Now, assume there's hardware that performs twice as fast at the same energy (watt/token): even if it cost you twice as much, you're saving 7B because you don't need the second data center.

You get the same output of $ 20 B out of a $ 13 B initial investment, but you're also halving operational costs: less staff, less lawyers, etc, etc.
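The same arithmetic as a sketch, using the figures from this comment: $10B per data center, 30% of it hardware, and hypothetical hardware that is twice as fast at twice the price. The function name and parameters are invented for illustration.

```python
def build_cost(total_usd: float, hardware_share: float,
               hardware_price_multiplier: float = 1.0) -> float:
    """Cost of one data center when only the hardware line item is repriced."""
    hardware = total_usd * hardware_share
    everything_else = total_usd - hardware
    return everything_else + hardware * hardware_price_multiplier


if __name__ == "__main__":
    two_standard = 2 * build_cost(10e9, 0.30)        # two sites, normal hardware
    one_fast = build_cost(10e9, 0.30, 2.0)           # one site, 2x-price hardware
    print(two_standard / 1e9, one_fast / 1e9, (two_standard - one_fast) / 1e9)
```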

This is the reason why Nvidia is making gargantuan margins: hyper scalers don't really care about hardware cost, if they can get double the output and save themselves 30-40% of total costs and 50% of the headaches they will keep buying at twice the price gen over gen.

by npunt18 hours ago|

[-]

Yep, I was surprised to learn that too.

by Schiendelman20 hours ago|

[-]

"Typical" is doing a lot of work there. That could mean much older chips than Nvidia is currently selling.

by signatoremo15 hours ago|

[-]

"Typical" usually means typical, i.e. median. Also they are claiming cost saving, not performance. The saving would even be more impressive if much older chips are less efficient than the newer ones -- costing more to run.

by chris_money20223 hours ago|

[-]

Microsoft, Google, and Amazon also do this, but they also have the hyperscaler datacenter infrastructure to host the chips. Designing and taping out the chip is one thing, packaging, cooling, deploying, powering, and managing the fleet is another stack entirely. Wonder where that will come from?

by wmf22 hours ago|

[-]

Don't forget Stargate.

Update: Somebody on Twitter said it's going to be hosted 50/50 at Microsoft and Oracle.

by chris_money20222 hours ago|

[-]

I forgot Stargate

by cpldcpu23 hours ago|

[-]

I had Opus 4.5 design an LLM inference engine in verilog, including firmware and automated verification a while ago: https://github.com/cpldcpu/smollm.c

It's of course far from optimal. But lowering the implementation through the abstraction levels turned out to be extremely powerful.

by smetannik22 hours ago|

[-]

Can you suggest some tutorials for Verilog and FPGAs in general?

I have a spare Tang Nano 9k but I don't feel confident about blindly asking Claude to vibecode me a solution and still would like to have at-least a basic level of understanding.

by cpldcpu20 hours ago|

[-]

hm.. has been quite a while for me. The good thing about the Tang Nano is that it is supported by the Yosys open source toolchain. There are quite a few resources on the web when you search for the combination.

by jared0x9021 hours ago|

[-]

the hdlbits course is really good imo

by digitaltrees1 days ago|

[-]

We’ve entered the “if you care about software, build hardware” phase of AI

by some-guy1 days ago|

[-]

I have been eyeing what Taalas is doing [1] by making pure hardware models. The speed is absurd.

[1] https://taalas.com/products/

by mikewarot1 days ago|

[-]

They talk about products, but they don't sell the hardware, thus they don't really have a product, just a service.

I know, it's nick picking, but when people can just reach in and take services away, like Fable/Mythos, hardware is the only thing worth buying.

by LoganDark23 hours ago|

[-]

I'm sure they'll have a product for you if you have millions to invest in a partnership with them.

by arcanemachiner23 hours ago|

[-]

"Nitpicking"

by digitaltrees12 hours ago|

[-]

Underrated. Hits on multiple levels

by jupr1 days ago|

[-]

crazy product. their test chatbot feels like a db query.

https://chatjimmy.ai

by digitaltrees20 hours ago|

[-]

I have and it was wild. Paradoxically it made me realize that I actually like reading the stream as it's generating.

[-]

“People who are really serious about software should make their own hardware.” ― Alan Kay

by zwarag1 days ago|

[-]

What are the other phases. Or what are you referring to in general?

by digitaltrees12 hours ago|

[-]

Mainframe punch card -> PC floppy disk -> cloud SaaS -> AI --> return to the land agrarian

by dadoum1 days ago|

[-]

> May we scale smoothly, exponentially and uneventfully through A[SI]

That sentence sounds weird to me. I can't really put my finger on why, maybe the combination of adverbs, or just the fact of writing the desire of scaling as a company so directly. It feels (to me) like openly claiming their selfish goals. Or maybe I am just misinterpreting and they are referring to the whole of humanity as "We" (but knowing Broadcom's and, to a lesser extent, OpenAI's doings, I am not convinced).

by kilroy1231 days ago|

[-]

I hope to see something like this, but in a small form factor like the NVIDIA spark.

I want a super fast LLM that is Opus 4.6+, like, in ability.

[-]

Memory bandwidth is the bottleneck in the Spark. If you replace the SoC with an optimized ASIC but keep the same 256-bit LPDDR5 the performance will be the same. You can increase performance by using wider memory but that's also more expensive.

by phonon1 days ago|

[-]

M3 Ultra has a 1024 bit memory bus (819 GB/s) and starts at $3,999 (96GB of RAM). It can be done....
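The bus-width argument in this subthread can be put into rough numbers. This is a minimal sketch, not a benchmark: the transfer rates (LPDDR5X-8533 for the 256-bit part, LPDDR5-6400 for the 1024-bit part) and the 70 GB model size are assumptions chosen for illustration, and the decode ceiling assumes every generated token reads all weights from memory once.

```python
def peak_bandwidth_gbs(bus_bits: int, mega_transfers_per_s: float) -> float:
    """Peak DRAM bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_bits / 8 * mega_transfers_per_s * 1e6 / 1e9


def decode_ceiling_tok_s(bandwidth_gbs: float, weight_gb: float) -> float:
    """Upper bound on single-stream decode speed if each token reads all weights once."""
    return bandwidth_gbs / weight_gb


# Assumed transfer rates, not taken from the thread.
narrow_bus = peak_bandwidth_gbs(256, 8533)   # 256-bit LPDDR5X
wide_bus = peak_bandwidth_gbs(1024, 6400)    # 1024-bit LPDDR5

# Hypothetical dense model with 70 GB of weights.
for name, bw in [("256-bit", narrow_bus), ("1024-bit", wide_bus)]:
    print(f"{name}: {bw:.0f} GB/s, at most {decode_ceiling_tok_s(bw, 70):.1f} tok/s")
```

Under these assumptions the compute chip never appears in the formula, which is the point being made: swapping the SoC for an ASIC changes nothing for decode unless the memory system changes too.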

by bigyabai1 days ago|

[-]

The tradeoff is that the M3 Ultra's GPU loses to laptop GPUs in compute benchmarks. All of that bandwidth is wasted idling for token prefill.

For inference workloads, it makes a lot more sense to optimize for prefill/ttft before maxing out memory bandwidth.
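A back-of-the-envelope way to see why prefill is a compute problem rather than a bandwidth problem. This is a sketch under stated assumptions: it uses the common rule of thumb of about 2 FLOPs per parameter per prompt token for a dense transformer, ignores attention cost, and every number below (model size, prompt length, sustained throughput) is hypothetical.

```python
def prefill_seconds(params: float, prompt_tokens: int, sustained_flops: float) -> float:
    """Compute-bound time to first token: ~2 FLOPs per parameter per prompt token."""
    return 2 * params * prompt_tokens / sustained_flops


# Hypothetical 70B-parameter dense model and a 32k-token prompt.
PARAMS = 70e9
PROMPT_TOKENS = 32_000

# Two made-up sustained throughput figures, a factor of ten apart.
slow = prefill_seconds(PARAMS, PROMPT_TOKENS, 30e12)    # 30 TFLOP/s sustained
fast = prefill_seconds(PARAMS, PROMPT_TOKENS, 300e12)   # 300 TFLOP/s sustained

print(f"slow: {slow:.0f} s, fast: {fast:.0f} s")
```

Memory bandwidth does not appear in this estimate at all, so extra bandwidth cannot shorten it; only more sustained compute can.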

by Schiendelman18 hours ago|

[-]

With the M6 theoretically coming later this year, Apple seems to be realizing they need to catch up with more lanes of GPU.

by bigyabai18 hours ago|

[-]

Personally, I doubt it. Apple hamstrung themselves with unified SOC memory, there are cheap dGPUs that smoke the M5's prefill speeds and even have faster decode too. Apple is running up against the limitations of putting a mobile integrated chipset up against the desktop form factor. An SOC stops looking like a smart decision at that scale.

The software side is still pretty sketchy, too. Apple's ecosystem is fractured between NPU, MPS and Accelerate BLAS, with libraries like MLX and CoreML built precariously overtop. Apple has to commit to a full rearchitecture of their GPU to challenge Nvidia, which fractures that ecosystem even further.

by Schiendelman16 hours ago|

[-]

I don't expect them to be AS fast as Nvidia anytime soon. Understood that they need architectural improvements to get there.

Apple's business model will be to pay Google for compute for now, and then as they get better on device, move more and more locally. So they're very well incentivized to get better. The thing they've been best at in the last 19 years has been spinning flywheels they already have, and this is exactly that.

by bigyabai16 hours ago|

[-]

I'm just genuinely convinced that Apple's AI flywheel is going in reverse. They killed their golden goose with OpenCL, which had a genuine shot at dethroning CUDA if Apple took it seriously. It had industry-wide buy-in and multiple implementations before Apple threw in the towel. When they designed Apple Silicon, they could have used the lessons learned from that experience to create a CUDA-like ALU layer instead of focusing on raster efficiency for their GPUs. Nvidia had proven that it was possible with low-power ARM SOCs like Jetson and Tegra which did deliver CUDA in handheld experiences. But Apple chose instead to delegate AI to the NPU, which is now dark silicon on devices that defer to MPS backends for most inference. The architecture is locked in to an expensive and suboptimal raster-first GPU design.

It's not hard to see why Apple made those mistakes, and many of them were made by the rest of the industry too. It's specifically tragic that Apple snatched defeat from the jaws of victory with GPGPU programming, and it makes me think that their future will be more subscription services and less half-ass technical efforts. Or they rip up the foundation and start from scratch, it's never too late to start work on Apple Silicon 2.

by Schiendelman15 hours ago|

[-]

I think it's easy to understand why Apple wouldn't build low level engineering solutions - they'd rather control the platform and just have developers call MLX. I'm not sure, if I was in their shoes, that I'd make the same call. But it's a call, and it's consistent with the rest of their ecosystem decisions.

by wmf17 hours ago|

[-]

I love those 128 GB dGPUs.

by bigyabai16 hours ago|

[-]

Me too! The problem is that people don't love having 128gb of DDR5 held back with a laptop-grade iGPU. It puts up strictly non-interactive speed for LLMs of that size.

When you layer those same models across 128gb of dGPUs, then you can actually fill the KV cache in seconds, instead of minutes. And you get higher memory bandwidth on most professional cards.

by smith70181 days ago|

[-]

Unfortunately Sam Altman won't be the one to deliver us at-home hardware that can run Opus-level models

by blitzar22 hours ago|

[-]

I wonder what is happening with the OpenAI / Jony Ive crossover episode.

by flyinglizard1 days ago|

[-]

Forget about it. Datacenter class hardware is getting farther and farther from desktop use. It’s not PCIe GPUs anymore.

by lifeisstillgood20 hours ago|

[-]

So I’ve been wondering about “one or two levels back” chip design. If I understand it, 28nm chips (pre EUV) is just about suitable to run (not train just inference) frontier models.

And so if I was a mid-level State would it be worth while to take my nascent chip industry and push it out to build a 28nm foundry and supporting eco-system.

The models will come but the real challenge of the future is having enough compute power for every one and every use. Even if LLMs don’t become AGI they will still be incredible tools - and as OpenAI seems to spend 8000 for each 200 monthly subscription building one’s own data centres seems sensible

by paxys18 hours ago|

[-]

You are underestimating how difficult it is even for a large nation state to attract the kind of talent and investment it would take to set up a chip industry. It is out of reach for anyone outside of the 3-5 largest national economies and a few big American/Chinese multinational corporations.

by wmf16 hours ago|

[-]

> 28nm chips is just about suitable to run frontier models

I doubt it. 28 nm is 4-5 generations back so inferencing would need a large number of chips with very high power consumption. Maybe you're thinking more of 7 nm which is what Chinese fabs have; it seems to be OK for companies like Huawei.

> And so if I was a mid-level State would it be worth while to take my nascent chip industry and push it out to build a 28nm foundry and supporting eco-system.

It never reaches breakeven so you'd have to provide billions in subsidies per year forever. The sovereign chip stuff only makes sense for the US and China; even the EU probably isn't large enough to make it work. A single country definitely couldn't.

by eggsome14 hours ago|

[-]

But the energy requirements per token would be orders of magnitude worse than chips made at 3nm. So probably better for your hypothetical state to just pay the extra for more efficient chips so that they don't have (as much) of an energy problem.

[-]

> Even if LLMs don’t become AGI they will still be incredible tools

(Mostly an aside, but: LLMs have paved the way, now the problem is there, it is a challenge and a geopolitically relevant race... AGI is a goal set: not-having-reached-it will be just a stage.)

by bogdiyan1 days ago|

[-]

I am not sure how much of the work is done by OpenAI, or whether it is basically a Broadcom chip specifically built for OpenAI models. It is a necessary step, but building a high-performance chip is not easy. Look at companies like Groq, Amazon, and Google.

by u1hcw9nx1 days ago|

[-]

Both Google and Amazon also codesign heavily with Broadcom (Amazon also with Marvell and Alchip).

Broadcom does stuff like physical design, providing IP blocks, managing the manufacturing process with TSMC, packaging, and testing. Google and Amazon handle system architecture, performance targets, and requirements, with Broadcom acting as a consultant.

by groundzeros201520 hours ago|

[-]

This is starting to sound like startup scope creep. Instead of making the AI model it’s now custom silicon, web browsers, and consumer electronics?

by krick7 hours ago|

[-]

But there never really was a moat in LLM?.. I mean, I don't know where you stand, but my perception is that we all kinda knew that the whole time since 2017, and really knew that since DeepSeek. What they really care about is:

1. Customer acquisition.

2. Cheap(er) electricity/hardware.

So it's really surprising to me that them making their own chip surprises anyone at all. The electricity thing is already kinda being taken care of by earlier strategic alliances with some other evil people, the chip is a natural next step.

by glaslong20 hours ago|

[-]

Definitely has that smell... At the same time though, they NEED inference cost to drop substantially, and even better for them if it only happens for their models on their hardware.

I assume they're doing everything they can to make that happen model-side, but coming at it from the other end makes sense too if they can swing it.

by guywithahat17 hours ago|

[-]

Maybe, but they’re also a massive company. At some point Google stopped being a startup and became a massive company with margins to look after

by groundzeros20155 hours ago|

[-]

After they were wildly profitable.

by brcmthrowaway18 hours ago|

[-]

Nearly all those initiatives have failed though

by theowaway2134561 days ago|

[-]

This seems like more competition for Cerebras? Am I understanding correctly?

by HarHarVeryFunny1 days ago|

[-]

This is just an uncut wafer - I don't think it's intended to be a wafer-scale chip.

Cerebras etch memory onto the wafer alongside the processing elements, but AFAIK OpenAI are going to be using HBM memory and a conventional chiplet design.

by KeplerBoy22 hours ago|

[-]

Still competition for cerebras. Seems quite unlikely they will get an OpenAI deal anytime soon.

by smsx22 hours ago|

[-]

They have an OpenAI deal right now. https://openai.com/index/cerebras-partnership/

by HarHarVeryFunny22 hours ago|

[-]

No - this is OpenAI trying to compete with Google (TPU) and Amazon/Anthropic (Trainium) on cost.

Cerebras are addressing very specific use cases, not general purpose LLM serving, and OpenAI does already partner with them.

by _boffin_4 hours ago|

[-]

My question is: what will this do to Cerebras? It validates them, but did they just have their lunch eaten?

by yiyingzhang1 hours ago|

[-]

This is another Cerebras? fwiw, it took Cerebras many years to finally get a handle on the yield and the cooling problem. Wondering if they just hired a bunch of people from Cerebras.

by jnaina14 hours ago|

[-]

Two turkeys don't make an eagle.

I don't have much confidence in either OpenAI/Sama or Broadcom, given past history. Again this is just pre-IPO shenanigans.

As credible as the "Datacenter in Space" claim by Elmo, before the SPCX IPO.

by olalonde10 hours ago|

[-]

Why even "unveil" it? Seems like giving away competitive intelligence for no reason at all... other than hyping the stock?

by mobile6test14 hours ago|

[-]

„ OpenAI says early results show significantly better performance-per-watt than current state-of-the-art alternatives“

would be very interesting to see any papers/data around this

by MangoCoffee23 hours ago|

[-]

cheap tokens are more important now than ever. Chinese open-weight models are getting pretty good. the real cost of AI adoption will come down to who (China or the US) can provide cheap tokens for consumers and companies. Microsoft considering DeepSeek for their cowork is one example, and now OpenAI has its own AI inference chip.

by SV_BubbleTime19 hours ago|

[-]

I’m not understanding. If cost per token hits the floor that does not mean that you want a model that uses tokens.

If the Chinese are optimizing for token usage, that’s also speed.

Why use more token if few do trick?

by GL269 hours ago|

[-]

OpenAI is going to close the one thing it needs to be profitable : calculation power. Love this website : https://isaiprofitable.com/, shows who wins at the AI revolution. Nvidia wins because it has instant revenue, OpenAI is going to close that gap.

by fennecbutt1 days ago|

[-]

I mean I'd love to be able to buy something like the 17k tps taalas chip as a pcie or m.2.

Imagine when we can roar along at that speed, low power. Can just have the model reason for a while about anything and everything. It reminds me of the "race to idle" for mcus etc.

by ipdashc1 days ago|

[-]

> 17k tps taalas chip

It's odd to me that I haven't heard anything about this approach (baking LLMs/weights into silicon directly) since. It seems almost common-sense that we're going to end up there eventually. And it feels like that point is drawing ever closer now that model capabilities, if not quite plateauing out, are at least getting to a "good enough" point for a LOT of use cases.

I wonder if it's being worked on in secret, if there's something about it that makes it infeasible, or if companies are really too nervous to lock in one model like that because the next one down the line could be a huge improvement. Re. infeasability, I have heard that the Taalas demonstration chip ran Llama 3.1 8B (a pretty horrible model) and that even that took a massive amount of transistors / die area. So it might just be the case that the good models are too big to fit on silicon?

by topspin1 days ago|

[-]

I have also been thinking about this a lot, and share your belief that this is inevitable.

Taalas has a running demo here: https://chatjimmy.ai/

It's eye opening: generated an AVX-512 optimized Mersenne Twister in C in 0.076s, 13,706 tok/s. Too fast for the tok/s to be terribly accurate.

[-]

> It's odd to me that I haven't heard anything about this approach ... I wonder if it's being worked on in secret, if there's something about it that makes it infeasible

The studies and efforts are ongoing and public, and there are technical hurdles to be faced - but the relevant works go back in time quite a lot and there is heightened interest in it now.

It seems that you simply took the "hyped headlines" for the whole of the work.

by ipdashc4 hours ago|

[-]

> It seems that you simply took the "hyped headlines" for the whole of the work.

Well, yeah, that's what I'm saying. It's odd that there haven't been any major headlines (customer interest, competitors' announcements, etc) other than their initial demo. Good to hear it's being worked on though!

by mdp20213 hours ago|

[-]

Did we not play with MNIST and place some calculated bets on NNs well before Yann LeCun started the fire with the explosive success of the Convolutional NNs?

I'd say it pretty consistently starts in the underground.

The real revolution in the context is that it /could/ be done practically - overcoming the hurdles. But for what the interest in the matter is concerned, I'd say there almost cannot be a greater interest at this stage: making NNs efficient. This must be absolutely evident, as evident it is that the separation of memory and processor is against the idea of NNs, as evident as it is that multiplication is achievable just physically.

Of course many have seen that and got on studying it. As soon as it will be optimally practical...

by coder54316 hours ago|

[-]

> It's odd to me that I haven't heard anything about this approach since.

It has only been four months since they unveiled their first prototype. I don't understand your confusion. Chip development does not happen overnight...?

Their initial blog post laid out a roadmap, so theoretically they should have another thing to demonstrate this summer.

by ipdashc4 hours ago|

[-]

In the sense of interested customers, online discussion, other companies doing the same thing, etc. Of course it takes time to get actual results, but from an outsider's perspective it's surprising that it was basically just their initial demo and that's more or less it so far. Excited to see if they come out with something this summer though!

by mdp202115 hours ago|

[-]

You are focusing on Taalas, but (specific) analogue computing, electronic NNs, compute-in-memory etc. - the field including the contextual approach - backdate to Rosenblatt.

by coder54315 hours ago|

[-]

Yes, I’m focused on the topic at hand that the person I replied to was also talking about.

The person I replied to was acting as if Taalas was ancient history. I was pointing out it has only been a few months.

by mdp202112 hours ago|

[-]

I'd say the original remark was more general («this approach (baking LLMs/weights into silicon directly) [... as if] worked on in secret») - which is salient, because when I investigated weeks ago, I found a large number of attempts to CIM and to general branching from Von Neumann architecture for the purpose of optimizing NNs implementations in HW.

Universities are studying, startups are proposing - the «approach» is under the big headlines level but quite lively. Not just Taalas, not just their way - which remains remarkable in the scene as the HW is achieved, working, online, available... and amazing.

by coder5437 hours ago|

[-]

CIM does not bake the weights into silicon. The level of optimization that you can do down to the last transistor when the weights are fixed is on an entirely different level than CIM where you still need general purpose ALUs all over the place.
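One way to picture the kind of specialization that fixed weights allow. This is only an illustrative sketch of constant specialization in general, not a description of how Taalas or any CIM design actually builds its circuits: when a weight is a known constant, a general multiplier can be replaced by just the shifted adds that constant needs. The function names and the integer weight are hypothetical.

```python
from typing import List


def shift_add_terms(weight: int) -> List[int]:
    """Bit positions set in a non-negative constant weight; each is one shifted add."""
    return [bit for bit in range(weight.bit_length()) if (weight >> bit) & 1]


def multiply_by_constant(x: int, weight: int) -> int:
    """Multiply x by a fixed weight using only the shifts and adds that weight needs."""
    return sum(x << bit for bit in shift_add_terms(weight))


# A weight of 10 (binary 1010) needs two shifted adds instead of a full multiplier.
print(shift_add_terms(10), multiply_by_constant(7, 10))
```

A zero weight yields an empty term list, so in a hardwired design that connection would need no logic at all, which a general-purpose ALU cannot exploit.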

by mdp20213 hours ago|

[-]

> CIM does not bake the weights into silicon

If that were the extent of the terms, then what could we call "baking the weights into silicon"? Setting parts of the circuits to determined values for multiplication is like printing a Read-Only Memory. (And you compute at it: Compute In Memory.)

> CIM where you still need general purpose ALUs all over the place

If that were so, then why do taxonomists present analogue computing as part of CIM? Ohm's Law does not constitute an "ALU" the way you intend it.

Simply, I used CIM, "Compute In Memory", for lack of a better term - for "store data there where you modify data", for "beyond Von Neumann's separation of data storage and processor".

by coder5433 hours ago|

[-]

EDIT: It's just not even worth arguing this point, so deleting my original, much longer comment. Abstract taxonomies can claim that Taalas is CIM, but this entirely and utterly misses the point, and misses what makes Taalas' approach special. If you told a room full of chip architects to go build "CIM for AI", they would not build a Taalas-like totally specialized chip, therefore it is not sufficient, and just muddies the conversation from my point of view. People have been doing "CIM" for decades and yet I've never seen anyone build a totally specialized chip at the scale of Taalas. And yes, you can (in theory) build an analog version of any computer, so of course you can build analog CIM, but "analog compute" is not inherently CIM, so conflating the two is just confusing.

by mdp20212 hours ago|

[-]

I can't check everything right now, but for example, the explainer by Rakesh Kumar mentions "Analogue CIM".

And I do not get your rant about "analog computing", which has everything to do with NNs (otherwise, well, prove it): they started with that - they are basically that in fact. Analogue computing is a very great temptation since it would solve the issues of inefficiency in digital NNs. Unfortunately, it has drawbacks which are massive for big NNs. Taalas' seems to be the best compromise.

[-]

Good models will require multiple Taalas chips but Groq and Cerebras also require a lot of chips and that hasn't stopped them.

by ipdashc4 hours ago|

[-]

> Good models will require multiple Taalas chips

I guess that makes sense. Is this feasible, or does the added latency between chips kill any of the performance gains?

by wmf2 hours ago|

[-]

Using multiple chips seems to work fine for Cerebras and Groq so it should also work for Taalas. It does sound challenging to reach >10K tok/s but latency could be below 1 us which is a small part of the token budget.
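The "small part of the token budget" arithmetic, as a quick sketch. The 10K tok/s target and the 1 us per-hop latency come from the comment above; the number of chip-to-chip hops per token is a made-up parameter, since the thread does not say how many chips a larger model would span.

```python
def per_token_budget_us(tokens_per_second: float) -> float:
    """Time available to produce one token, in microseconds."""
    return 1e6 / tokens_per_second


def interconnect_share(tokens_per_second: float, hops: int, hop_latency_us: float) -> float:
    """Fraction of the per-token budget spent on chip-to-chip hops."""
    return hops * hop_latency_us / per_token_budget_us(tokens_per_second)


budget = per_token_budget_us(10_000)
# Hypothetical pipeline of 8 hops per token at 1 us each.
share = interconnect_share(10_000, hops=8, hop_latency_us=1.0)
print(f"budget: {budget:.0f} us per token, interconnect share: {share:.0%}")
```

The share grows linearly with hop count, so the claim holds as long as a token crosses only a handful of chips in sequence.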

by MichaelNolan1 days ago|

[-]

The current taalas chip is for a 3.1B param model. I hope so much that they can get that up to the 30B range. Just imagine Gemma 4 or Qwen 3.6 at 17k tps.

by coder54316 hours ago|

[-]

Taalas' first chip is for a Llama 3.1 8B quant, not a 3.1B parameter model, to clarify.

by OrvalWintermute1 days ago|

[-]

Word of Advice for OpenAI:

Never underestimate Broadcom’s ability to shaft their own customers

- VMware

- CA Technologies

- Symantec Enterprise Security

- Brocade

- LSI Corporation

by SV_BubbleTime19 hours ago|

[-]

I don’t know. I’m kind of glad that two of my least favorite companies are working together.

by antonvs1 days ago|

[-]

CA Technologies was much worse than Broadcom in its heyday.

Three of their top execs - CEO, CFO, and head of sales - went to federal prison on securities fraud, conspiracy, and other charges. The CEO, Sanjay Kumar, who was at least partly the fall guy for co-founder Charles Wang, served 10 years.

Being acquired by Broadcom could only have been an upgrade, as strange as that may sound.

by BLKNSLVR20 hours ago|

[-]

*requires VMWare license.

by paxys20 hours ago|

[-]

Very interested to know the distribution of effort between the two companies. Is this truly a brainchild of OpenAI engineers or did they pay to white label and use a new Broadcom chip?

by satvikpendem1 days ago|

[-]

I'm assuming they used LLMs to (help humans) do custom circuit design. Even pre LLM there were various computer optimizations that didn't require humans like genetic algorithms. It'd be cool to see a paper on how they did it.

by Legend24401 days ago|

[-]

The only surprising thing about this is that they didn't do it three years ago.

[-]

deleted

by imglorp6 hours ago|

[-]

Is broadcom really the best business partner? 100,000 VMware customers might say no.

by skyberrys23 hours ago|

[-]

The new chip sounds like it's custom made to accelerate a few specific models they really need to run fast. The advantage is it's truly an ASIC, not an xPU. There are several new startups targeting EDA tooling automation. Chip Agents is the biggest one I can think of, but there are smaller players too; Silimate is one I recall. These companies are focusing on building fast AI-powered tools to speed up the tape-out cycle.

[-]

deleted

by kazinator23 hours ago|

[-]

There is a never ending torrent of money coming, so why not make custom chips.

Whoo ... party!

[-]

Although, custom HW has to be the focus right now - simply because we are dealing with a technology (big NNs) that is not the best match for Von Neumann architectures.

by Jyaif6 hours ago|

[-]

Broadcom will let the entire industry leverage the decade of research done for TPUs.

The AI business of Nvidia is cooked.

by mangomanai9 hours ago|

[-]

owow...what gonna be next.....thei own robot????

by BobbyTables219 hours ago|

[-]

Why the hell Broadcom of all companies?

by philjohn9 hours ago|

[-]

Because they have the skills necessary to help bring custom designed ASICs to fruition. Google uses them for their TPUs, Meta uses them for their custom ASICs as well.

by shevy-java14 hours ago|

[-]

So this mafia is driving up RAM prices. And now build their own overpriced hardware.

Either RAM prices go down, or that mafia must pay us all compensation money for this cartel build up. Why is the USA protecting this? How much does the orange man profit personally from helping drive up the prices here?

by qsxfthnkp23221 days ago|

[-]

aw shucks nvda has some spicy competition

Make sure you all use that fancy ñ

by boarush1 days ago|

[-]

They don't have true competition; what they lose out on is market share with hyperscalers, since OpenAI would have no plans to share inference hardware with any other company right now. Plus, I don't know how NVIDIA's investment equation pans out long term given OpenAI will be investing in a more purpose-built inference stack for the future.

by ismailmaj1 days ago|

[-]

they're still kings for training, though I've heard Anthropic is training now on JAX+TPU setup, so might not be a monopoly in that segment.

[-]

deleted

by fibonacci1123581 days ago|

[-]

So this is where all the memory they bought is going to.

by babelfish1 days ago|

[-]

that's not really how it works

by jonhohle18 hours ago|

[-]

If it’s really a differentiator, why announce it? Why not keep it secret and make it a competitive advantage?

by bakies18 hours ago|

[-]

Investors, I'm sure everyone's had the idea and they're doing it.

by gravypod1 days ago|

[-]

I wonder how close OpenAI is getting to using the memory they purchased. Are they planning to stack a huge amount of HBM2 into these chips?

[-]

I assume OpenAI has been buying memory and "giving" it to Nvidia in exchange for a discount.

by renoir1 days ago|

[-]

Look at the SIZE of that chip.

Cerebras stock is down nearly 20% today.

Not only is the approach overlapping, OpenAI is also Cerebras's only major customer.

by tantalor1 days ago|

[-]

If you're referring to the big circle of silicon, that's a wafer, generally contains many chips (100-1000s).

by arcanemachiner1 days ago|

[0] https://www.techradar.com/pro/broadcom-and-openai-debut-jala...

[-]

The alt text of the first image describes it as the "Jalapeño inference chip".

As a non-RTFA-er. I'm assuming it's a wafer-scale chip, similar to the ones made by Cerebras.

EDIT: From TechRadar[0]: "The 300mm wafer that both CEOs are holding will generate about 50 to 60 ASICs."

by jupr1 days ago|

[-]

That made me chuckle but I guess if you have never seen one I could see how that assumption could be made.

If this photo is real I wonder what can be revealed about the approach they have taken by analyzing the architecture of what we can see.

by mdp202115 hours ago|

[-]

> That made me chuckle but I guess if you have never seen one I could see how that assumption could be made

It's more like that "wafer as a big-chip" (more formally, "WSE - Wafer Scale Engine") is now a reality (see Cerebras).

But in this case, the wafer will be split into a few dozen chunks.

by thrtythreeforty1 days ago|

[-]

For reticle-limit chips, it's on the order of 100. And less than that once you filter out bad dies.
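A rough way to estimate both numbers in this comment. This is a sketch using a widely quoted dies-per-wafer approximation and a simple Poisson yield model; the 26 mm x 33 mm reticle field and the defect density are assumptions, not figures from the announcement.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Common approximation: wafer area over die area, minus an edge-loss term."""
    area_term = math.pi * wafer_diameter_mm ** 2 / (4 * die_area_mm2)
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(area_term - edge_term)


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson defect model."""
    return math.exp(-(die_area_mm2 / 100) * defects_per_cm2)


RETICLE_LIMIT_MM2 = 26 * 33   # assumed full-field exposure, about 858 mm^2
DEFECTS_PER_CM2 = 0.1         # made-up defect density for illustration

gross = gross_dies_per_wafer(300, RETICLE_LIMIT_MM2)
good = gross * poisson_yield(RETICLE_LIMIT_MM2, DEFECTS_PER_CM2)
print(f"gross dies: {gross}, expected good dies: {good:.0f}")
```

With these assumptions the gross count for a full-reticle die on a 300 mm wafer lands near the "50 to 60 ASICs" figure quoted from TechRadar elsewhere in the thread, and the yield term shows how quickly very large dies lose good candidates.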

by moralestapia1 days ago|

[-]

Everybody here knows that.

What some don't know (including you) is that the industry is doing wafer-sized chips nowadays, of which Cerebras is the flagship company.

That's why the stock movement could be related, and that is why GP wrote that comment.

by AxiomaticSpace1 days ago|

[-]

I think Cerebras stock going down could also be partly caused by the lock-up period ending today for 200k shares (page 73 of their prospectus) - https://www.sec.gov/Archives/edgar/data/2021728/000162828026...

by maxall41 days ago|

[-]

It doesn’t seem like it? Unless I am misunderstanding these Nasdaq insider trading reports: https://www.nasdaq.com/market-activity/stocks/cbrs/insider-a...

by moralestapia1 days ago|

[-]

Dang, I just checked and CBRS is in free-fall since the IPO.

Sucks, I think they're a cool company.

OTOH, I was the only person back then pushing hard during my time at KAUST (back in 2019) to buy one of their systems when they were nobody, eventually resulting in a partnership between the two.

Then I joined their online discourse, very few users, I was semi-active there but they didn't care much.

Then I came to Toronto and heard they were opening an office here, tried to get noticed several times but got mostly ignored. I asked about upcoming events several times, anything to get involved, "yeah man, maybe one day". Then they made an event during Toronto Tech Week and didn't even tell me ... idk.

I don't get schadenfreude as I still think they're a cool company.

My point is they put all the eggs in one basket (AI inference) and neglected everything else. They seem to be on shaky ground now ... sad.

by fl4regun23 hours ago|

[-]

my friend briefly worked there and then got hit by layoffs, as a result, I am enjoying the schadenfreude.

by ksd4821 days ago|

[-]

That's just the wafer disc. Looks like it was presented to Sam Altman for ceremonial purposes.

The wafer disc is what the CPU gets "printed" on.

by delduca23 hours ago|

[-]

NVidia stocks are red now

by dgellow23 hours ago|

[-]

Because of Micron, no? I don't think it's related to OpenAI's announcement

by brcmthrowaway20 hours ago|

[-]

What happened with Micron?

by dgellow10 hours ago|

[-]

The stock went down quite a lot before their latest earnings report. That dragged all semis and memory stocks down

by bluegatty22 hours ago|

[-]

'braodcom' ha ha ... it's not OpenAI's chip then ...

by duendefm1 days ago|

[-]

If this is something that will hurt Nvidia, I'm all for it

by jabedude1 days ago|

[-]

how much does this chip help with inference speed?

[-]

It's probably the same speed but cheaper.

by Buttons84019 hours ago|

[-]

Fucking Broadcom?

The only time I've ever seen that name before is when trying to solve driver issues, on both Linux and Windows.

Are they especially stingy with their IP related to drivers or something?

by m3kw921 hours ago|

[-]

They tested on spark model, i bet it's a mix of that with focus on inference speed. Whatever it is, hopefully it shows up with current models as faster. Token/s is as big a thing as anything else, and that's where they can really gain some edge over the competition.

by tehjoker22 hours ago|

[-]

No information on how significant the reduction in energy per token is. No information on amortized price per request. Increasingly it's clear OpenAI must demonstrate order-of-magnitude reductions in cost to not die; this is investor story time without that information.

by rvz23 hours ago|

[0] https://news.ycombinator.com/item?id=45429514

[-]

No surprise here. [0]

[-]

Actually, I find the idea of using Cerebras etc. for /training/ (not just inference) surprising: I did not stumble in much data and discussion about "super-CPUs" in that area, where NVidia (with the tools focused on it) has that long-built edge...

Edit: contextually,

> Jalapeño is specifically designed for inference

by Imustaskforhelp23 hours ago|

[0] https://techcrunch.com/2025/11/06/sam-altman-says-openai-has...

[-]

Although this seems to be for inference only and not training, inference is a recurring cost and training is a one-time cost. So to me, even if Nvidia keeps its moat on training, I don't think that could ever justify its massive valuation, because, for example, some Chinese models are actually trained on non-Nvidia hardware. The moat there is incredibly thin.

(at the moment), I think that if I were Nvidia, I would be a bit terrified and I imagine the stock to not be doing super great as I can just imagine everyone online might start talking about it for better or for worse.

I am a bit impressed by OpenAI but is this what can be classified as a plan for OAI to salvage itself and all the commitments it has made nearing a 1.4 Trillion dollars from my memory and this article[0] is from 2025

But could OpenAI simply walk out of its commitments when necessary (for example to Nvidia) if this chip works out or what exactly might happen in the future as these commitments are asked to be paid for, its still smart for OAI to diversify with this chip and to have more deeper ways of revenue than just being a simple middleman but I imagine that Nvidia and others have also invested in OpenAI and they must not be happy with this change.

The thing with AI deals are that they have become so complicated that it is hard for me to find the first order impact of things, let alone second or third order impacts and financial accountability seems to be impacted quite heavily because of all of it and there is some sense that it is done so intentionally.

by wilg1 days ago|

[-]

> significantly better performance-per-watt than current state-of-the-art alternatives

An interesting example of how the current market dynamics incentivize low cost and therefore power efficiency and therefore lowering resource use.

by zuzululu22 hours ago|

[-]

I'm very excited that frontier model makers now have so much money and revenue that they are releasing their own chips, which could change the relationships and the bottom line.

by gaigalas23 hours ago|

[-]

But nvidia's moat is software support, isn't it?

by KeplerBoy22 hours ago|

[-]

You don't need a whole lot of software support if you just want to serve a single family of LLMs.

by gaigalas22 hours ago|

[-]

A lot of companies that serve a single family of LLMs seem to prefer nvidia though. Why is that?

It's not just good drivers, which is what moats them for games and ML. It's multi-decade work on making chips that are nice to program for, plus the software infrastructure around them.

Apple and Google have excellent chips, yet they needed to invest a lot in long-tail software projects to make those chips do actual premium work. Still not state of the art for serving LLMs (although Google is strong in that, mostly because it piggybacked on previous chip-related software work for phones and so on).

by SV_BubbleTime19 hours ago|

[-]

> A lot of companies that serve a single family of LLMs seem to prefer nvidia though. Why is that?

If you write your tools for CUDA, you’re going to prefer hardware that runs CUDA.

How is there anything more to it than this?

by gaigalas18 hours ago|

[-]

Cool. That's it.

What people will use to write for Jalapeño matters.

Nvidia has multi-decade heritage. Apple spent almost a decade on MLX. Snapdragon failed partly here. OpenAI announced nothing regarding that, so this big moat that multiple companies have (Nvidia the most prominent) is nil for them.

by xyst14 hours ago|

[-]

> built by Broadcom

AI is cooked bro. Broadcom is the death sentence of anything.

by jauntywundrkind21 hours ago|

[-]

Is there any actual content on what the chips are?

You can't purchase Microsoft or AWS chips, but both of them do pretty good write-ups on what they've done. https://blogs.microsoft.com/blog/2026/01/26/maia-200-the-ai-...

This seems utterly empty of actual substance.

by sehw1 days ago|

[-]

lol

by flyinglizard1 days ago|

[-]

I call BS. It’s probably a white label around existing Broadcom IP; it's impossible to go from zero to this kind of chip in nine months. I doubt OpenAI made any significant contribution.

by zerohp23 hours ago|

[-]

That’s exactly what this is.

9 months to production is completely impossible anyway.

9 months from design to early samples is probably impossible given that TSMC takes 3 months after tape out to produce them. Then it’s up to the customer to qualify and revise for production. TSMC doesn’t do that.

There’s no AI that makes this happen in 9 months.

by Mistletoe1 days ago|

[-]

The similarities between the AI world and the crypto world are so much closer than any AI fanboy would ever admit.

by samrus20 hours ago|

[-]

This is why RAM prices are fucked. Cause Altman doesn't give a shit about normal people as long as OpenAI succeeds.

by Africa-Ai23 hours ago|

[-]

Wow, it sounds tempting to use OpenAI's newest chips.

by nullbio17 hours ago|

[-]

Big tech AI labs will develop LLM accelerators and hardware LLMs that increase frontier model output to tens of thousands or even hundreds of thousands of TPS.

These chips will be used internally for their own business goals, giving them the capability to iterate at such an insane pace that they will be able to clone every software product and software company on Earth. Meanwhile they'll trickle out 100-300 TPS access to the rest of their subscription users to drain them of their cash and keep the beast fed with fresh training data.

How can any individual company building a product, with access to behind-frontier, security-gated, censored, and capability-gated models at 100-300 TPS, expect to compete with a company like Anthropic or OpenAI, which has frontier, unrestricted, unlocked models that can produce 100-1000x the output? Cloning your 500-staff business will likely be easy pickings for 3-5 of their employees.

This should concern everyone.

The only reason they aren't 100% in on the strategy of replacing everyone is because they need us for training material and they needed the bootstrap. But the bootstrap problem is already gone, and they don't need to give us fair access to keep training data rolling.

by jerojero1 days ago|

[-]

One thing I don't like about California-based companies is how cringe the names always are.

"Jalapeño" is such a bad name; having an "ñ" already makes it difficult and annoying to deal with in so many little ways. Good luck with that.

But also, there's the sort of "yes, let's use Mexican-related things because we're California" thought that I just really hate. I don't know, it's like corporate Memphis to me. You see a product like this, you know it's an uppity California-based firm that came up with it.

by thewebguyd1 days ago|

[-]

No worse, I suppose, than the obsession with Lord of the Rings that the authoritarian surveillance companies have. Palantir, Anduril. Then we have the non-defense/surveillance ones: Mithril, Valar, Narya, Erebor

by skeledrew1 days ago|

[-]

What kinds of names would you suggest?

by thewebguyd1 days ago|

[-]

None, probably. Just saying Jalapeño is no worse than any other non-descriptive company name. Although at least Palantir and Anduril are aptly named for what they do. The VC firms less so.

by utopiah1 days ago|

[-]

Strawberry was too complicated as a codename.

by CrzyLngPwd1 days ago|

[-]

Too many Rs.

by smallmancontrov23 hours ago|

[-]

Too many? But there are only two Rs in strawberry, how can that be too many?

by CrzyLngPwd22 hours ago|

[-]

You are correct. I don't know why I thought there were 5 Rs in strawberry, and now I look properly I can count them correctly, there are indeed 6 Rs in strawberry.

I am sorry for initially giving an incorrect answer.

by anthk1 days ago|

[-]

Don't worry, in Europe it's the same, but for insurance/lawyer stuff. Tons of companies have names based on Latin words such as Civitas/Insalus/Legalia/Legalitas or whatever, which look tacky/rancid/old-fashioned from kilometers away.

by qsxfthnkp23221 days ago|