by LarsDu88 1 day ago

by bhouston 1 day ago

> CPU, GPU, and even DRAM sitting on the same "die"

This is actually great. The laptop body stays the same and you swap out a small circuit board that has the CPU + GPU + DRAM on it.

This is the point of the Framework laptops. They are just unfortunately stuck with non-Apple parts and thus are slow / inefficient.

Maybe Qualcomm can make a motherboard for Framework high end laptops with their Snapdragon X2 Elite Extreme ARM-based CPUs that are supposedly competitive with Apple's M4 offerings?

And then offer a cut down Qualcomm mobile phone CPU + GPU + DRAM offering for the Framework 12 so that it can compete on price/performance with the MacBook Neo?

I think you need to compete with Apple with the right equivalents.

by LarsDu88 1 day ago

Funny thing is, the circuit board on the Neo is barely smaller than that of the lowest-end iPhone. The only remaining big-cost swappable item at that point is the display.

The benefits of modularity begin to get outweighed by the costs when 85% of the cost of the machine needs to be swapped out with each upgrade. For consumers, why would they not simply opt to spend the rest of the 15% to get a whole new computer?

by topaz0 18 hours ago

You underestimate how much of the cost is the chassis, hinges, screen, speakers, keyboard, which add up. Sure, the CPU is the single most expensive component, but CPU + mainboard for the fw13 is less than half the price of a new fw13. And of course part of the idea is that you don't know what you'll have to replace first when you're starting out. You might bust the hinge, or get excited about their touchpad upgrade, or decide you need a higher resolution screen, long before you need the new mainboard. The flexibility, in other words.

by bhouston 22 hours ago

Yes, the CPU+GPU+Memory is fused but the rest of it doesn't have to be. These are still separable components and they do cost something:

- NVMe drive (or two)

- Bright, wide gamut, high resolution screen

- Aluminum case

- Great keyboard

- Wifi/ports

- Battery

by idle_zealot 23 hours ago

> spend the rest of the 15% to get a whole new computer?

I can see why the manufacturer would want this. As a user though why would you? If the rest of the body is familiar and works well, why toss it?

Maybe the sentiment springs from the general culture of consumerism and new-is-better thinking, and historically that's been warranted in the consumer electronics space. Most things aren't really like that though. Humans have long built tools, clothing, furniture, and infrastructure designed to last a long time. You commit resources up front to make sure the thing is of high quality and then benefit for anywhere between decades to centuries. Replacement carries the risk of downgrading. Again, rapid technological advancement has blown this way of doing things away, but at some point parts of the tech plateau and this will need to be rediscovered. For things like keyboards, trackpads, and laptop cases, I don't see how "new" will beat "good" from this point on. Even displays are starting to reach limits. This seems like the right time to be working on "here is your reliable human interface device, drop in whatever crazy magic chip fabs have cooked up every X years to keep it capable."

From a humanist perspective there's another reason to move this way. People like to grow attached to objects and tools. Something has been lost in the shuffle of swapping out our most personal objects every few years.

by andrepd 22 hours ago

This resonates with me. I have changed phones twice in the last 12 years. Some people look at me like I'm crazy.

by borgel 1 day ago

Yeah, I think this is the right idea (or the most optimistic path towards M-series power/performance). If you wanted something fully/aggressively open you could do something like build a mainboard compatible with one of MNT's fully open SOMs like [1].

[1] https://shop.mntre.com/products/mnt-reform-rcore-rk3588-proc...

by Danox 10 hours ago

Qualcomm is basically, at heart, a patent troll company. They give nothing away, they double dip on their FRAND patents, and they won’t support anything if they don’t have to. Good luck. If you think Apple is bad, Qualcomm is on a whole different level…

by ben-schaaf 17 hours ago

> The way to manufacture more efficient compute now is do things like put DRAM closer to the chip and even closer integration between CPU and GPU. The fact that Apple can co-design their silicon such that the CPU and GPU can pull from the same pooled RAM is a major advantage over competitors.

> When you have CPU, GPU, and even DRAM sitting on the same "die"

Apple has been really successful convincing people they've done something special here. Given how many people are so horribly misinformed about this I'd go so far as to call it false advertising.

No, the DRAM is not on the same die. It's on package. They're literally standard SK Hynix memory chips.

Yes, technically there's a latency advantage, but comparing M1 to DDR5 desktop chips, Apple actually has worse overall memory latency.

Every integrated graphics chip from Intel and AMD has had unified memory for the last 10+ years.

Compute itself is also not what makes the Apple chips get long battery life. Looking at tests under full load the M1 is significantly worse than the latest Intel or AMD, yet it still gets better battery life under normal usage. The efficiency does not come from compute but from a whole host of idle consumption optimisations Apple brought over from their phone chips.

by veqq 12 hours ago

Indeed, on an HP EliteBook with a Ryzen 8840U I get about 20 hours of battery life on CachyOS (downclocking a bit, with TLP), and the speed tests claim this is comparable to an M2 or M3. For like $500 (before RAM went up...)

by AnthonyMouse 20 hours ago

> The way to manufacture more efficient compute now is do things like put DRAM closer to the chip and even closer integration between CPU and GPU.

People have been hyping things like this for decades, but then it turns out that applications that need to frequently share data between a CPU and GPU faster than PCIe can handle are pretty uncommon. Meanwhile putting them closer together has some pretty significant real disadvantages, because then you're trying to deliver more power and dissipate more heat over a smaller area instead of putting more physical separation between the two largest loads in the machine.

Notice that high end PC GPUs are significantly faster than any of Apple's integrated GPUs, and that's why.

> There are also latency and bandwidth benefits how they setup their RAM just from pure physics.

Soldering RAM has a modest latency advantage over SODIMMs at the most extreme timings and CAMM turns even that into basically nothing.

> And chip manufacturing is moving towards chiplets where you have cores manufactured separately and then wired together at nanoscale level on top of a silicon interposer.

You're describing a move to less integration. They were originally on the same die, and the change has no real effect on modularity. The user doesn't even have to know that some Ryzen CPUs have a separate I/O die or more than one compute die, they all still fit into the same socket and are even interchangeable with the ones that have only a single die.

by LarsDu88 20 hours ago

For high end AI inference chips, DRAM already goes onto the interposer right next to the GPU to bring the bandwidth as high as possible. Apple will eventually do this for the exact same reasons.

It's not just soldering RAM to a PCB. The chiplet technique of putting everything on an interposer is less integrated from the perspective of the chip manufacturer, but for the consumer, the folks who are going to buy Framework laptops, this is a far more integrated package. CPU, GPU and RAM will sit on the same interposer and be purchased together as a unit with no upgrade or swap path for any component. This is not the same as simply soldering everything together on one PCB. The level of integration is far higher.

by AnthonyMouse 19 hours ago

> For high end AI inference chips, DRAM already goes onto the interposer right next to the GPU to bring the bandwidth as high as possible.

The high end AI inference chips use HBM and cost tens of thousands of dollars. HBM uses 1024 data pins instead of 64, which is crazy expensive, which means that to the extent that consumer devices get it at all, it would be in addition to rather than instead of ordinary DRAM, e.g. you might have 12GB of HBM on the CPU package but then 64GB of less expensive DRAM. Increasing the number of cache hierarchy levels is a long-term trend. HBM as L4 cache is pretty plausible for high end CPUs as a supplement rather than replacement for DRAM.

There are already servers that work like this, e.g. Xeon Max has 64GB of HBM but then further supports up to 4TB of DDR5.

Moreover, the AI inference hardware integrates the CPU into the GPU because it's really just a giant GPU. They're not getting some major advantage from that, they just know nobody is going to want to swap out the CPU on a system where the CPU is mostly irrelevant. If you wanted that level of inference performance on a normal PC which is used for other purposes where the CPU actually matters then you would drop the AI accelerator with the HBM or GDDR into a PCIe slot.

by LarsDu88 19 hours ago

I think the long term trend is typically the high end technology of today will be the mid to low tier technology of the future.

If putting 1024 data pins all connected via a nanoscale manufactured silicon interposer right now seems complicated and expensive, that doesn't mean we won't see it in tomorrow's consumer devices. If anything we will be MORE likely to see this one day. Apple and other companies are gradually working towards moving AI models to be more local which means memory bandwidth has a real killer app use case right now. Witness Liquid AI and their partnership with Mercedes Benz to put 8B param LLM models into vehicles.

Both Desktop PCs and the CPU are becoming less and less relevant as we move further in the decade to be honest...

by AnthonyMouse 9 hours ago

> I think the long term trend is typically the high end technology of today will be the mid to low tier technology of the future.

The trend doesn't look like that. The PCI bus from 1992 had 124 pins. PCIe 5.0 x16 has 164 pins; x8 has even fewer pins than the slots from decades ago. Guess how many pins Thunderbolt has. DDR1 DIMMs from the year 2000 had 184 pins; DDR5 has 288. The number of pins goes up very slowly if at all, because it's one of the most expensive ways to increase performance, despite being effective.

Which is why the enterprise hardware has always done it and the consumer hardware hasn't.

> Apple and other companies are gradually working towards moving AI models to be more local which means memory bandwidth has a real killer app use case right now.

The real problem is that ordinary consumers don't want to pay for 128GB of GDDR or HBM, and if they did then you would attach it to the GPU rather than the CPU anyway.

What they might want is the less expensive ordinary DRAM with a wider bus, which is what Apple does, but then you're not using 1024 pins and have no need to solder it instead of using CAMM.

> Witness Liquid AI and their partnership with Mercedes Benz to put 8B param LLM models into vehicles.

8B param models don't need exotic hardware, those run on existing consumer GPUs.
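To put rough numbers on that, here is a back-of-the-envelope sketch of how much memory the weights of an 8B-parameter model take at a few precisions. The precisions (16, 8 and 4 bits per weight) are assumptions chosen for illustration, and the estimate ignores the KV cache, activations and runtime overhead, so real usage will be somewhat higher.

```python
def weight_footprint_gb(params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    bytes_total = params * bits_per_param / 8
    return bytes_total / 1e9


PARAMS_8B = 8e9  # parameter count taken from the comment above

# Bits per weight are illustrative assumptions, not a statement about
# what any particular deployment uses.
footprints = {bits: weight_footprint_gb(PARAMS_8B, bits) for bits in (16, 8, 4)}

for bits, gb in footprints.items():
    print(f"{bits:>2}-bit weights: ~{gb:.0f} GB")
```

Under those assumptions the weights come to roughly 16 GB, 8 GB and 4 GB, which is within reach of the memory sizes found on current consumer GPUs once some quantization is applied.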

> Both Desktop PCs and the CPU are becoming less and less relevant as we move further in the decade to be honest...

Less relevant to what? Making up for the inefficiency of bad JavaScript with fast hardware? Running the less parallelizable parts of PC games? Databases and other branchy server workloads? They're as relevant as ever to the things they've always been relevant to.

by Dylan16807 1 day ago

> The fact that Apple can co-design their silicon such that the CPU and GPU can pull from the same pooled RAM is a major advantage over competitors.

Lots of laptops have integrated graphics. And many recent CPUs have strong integrated graphics. They're not doing anything special there. I don't understand why that gets so much attention.

The special thing they do is having very wide bandwidth on the higher end models, to a CPU with integrated graphics. That doesn't affect the Neo though.

by Danox 9 hours ago

[dead]

by curt15 1 day ago

> There are also latency and bandwidth benefits how they setup their RAM just from pure physics

What sort of physics? Dedicated GPUs achieve massive memory bandwidth without needing to put all of their memory on-die.

by geerlingguy 23 hours ago

Shorter PCB traces because of insane timing requirements for DDR5, GDDR7, and beyond; GPUs put the memory chips as close as possible surrounding the GPU die to reduce the latency and prevent timing/signaling issues.

But even there, the fastest AI accelerator GPUs are putting memory on die, and using chiplet designs, to get the memory closer and closer to the cores.

by LarsDu88 22 hours ago

Simply physically moving the RAM closer to compute can make communication faster.

Ideally, RAM and compute should be combined. That's kind of what our brains do. We'll probably need more mature memristor technology to achieve that one day.

by yread 23 hours ago

SSD is also soldered for little performance advantage.

by AnthonyMouse 20 hours ago

You say "little" but the actual numbers seem to point to none. There are M.2 NVMe SSDs that are faster than Apple's soldered ones.

by codedokode 20 hours ago

It might give great financial performance advantage though.

by jstanley 22 hours ago

> Moore's Law approaching its end.

People have been calling the top on Moore's Law for at least as long as I've been buying computers (~20 years). I'll believe it when I see it.

by GrumpyYoungMan 22 hours ago

We're already seeing it, even if most are incapable of recognizing it. The chip folks aren't doing ridiculously complicated things like https://spectrum.ieee.org/semiconductor-technology-roadmap in $30 billion+ fabs for the fun of it.

by jstanley 9 hours ago

People doing ridiculously complicated things to keep scaling CPUs is why it's not come to an end.

by warmwaffles 23 hours ago

> Moore's Law approaching its end.

No it isn't. We are going more parallel and the transistor counts will continue to rise.

by Dylan16807 16 hours ago

Zen 5 has 8.3 billion transistors in a chiplet, Zen 1 had 4.8 billion per chiplet. If we add on some more to compensate for the separate I/O die then we're looking at basically one doubling over several generations and 7 years.

There are still significant gains to be had, but the exponential growth is really petering out.
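As a rough check on the numbers above, this sketch works out how many doublings the raw per-chiplet transistor counts imply and what doubling period that corresponds to. It uses only the figures quoted in this comment (4.8 billion, 8.3 billion, 7 years) and does not add anything for the separate I/O die, so it slightly understates the growth; the two-year cadence used for comparison is the commonly cited form of Moore's Law and is an assumption here.

```python
import math


def doublings(start_count: float, end_count: float) -> float:
    """Number of doublings needed to go from start_count to end_count."""
    return math.log2(end_count / start_count)


def doubling_time_years(start_count: float, end_count: float, years: float) -> float:
    """Average number of years per doubling over the given span."""
    return years / doublings(start_count, end_count)


ZEN1_TRANSISTORS = 4.8e9  # per chiplet, figure quoted above
ZEN5_TRANSISTORS = 8.3e9  # per chiplet, figure quoted above
SPAN_YEARS = 7

observed = doublings(ZEN1_TRANSISTORS, ZEN5_TRANSISTORS)
period = doubling_time_years(ZEN1_TRANSISTORS, ZEN5_TRANSISTORS, SPAN_YEARS)

# Assumed reference cadence: one doubling every two years.
expected = SPAN_YEARS / 2

print(f"observed doublings: {observed:.2f} (one every {period:.1f} years)")
print(f"doublings at a two-year cadence: {expected:.1f}")
```

On the raw counts that is a bit under one doubling in 7 years, against 3.5 doublings at a two-year cadence.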

by aaa_aaa 21 hours ago

No, it ended long ago.

by smj-edison 10 hours ago

Moore's law or Dennard scaling?

by LarsDu88 19 hours ago

[dead]

by bigyabai 1 day ago

> that the CPU and GPU can pull from the same pooled RAM is a major advantage over competitors

It can be an advantage, but it also has downsides. LPDDR5 is fairly slow as far as GPU memory goes, and on Apple Silicon it splits the bandwidth across the entire chipset. Many recent MacBooks have dGPU-tier hardware constrained by Wintel-laptop memory bandwidth.

And if Apple uses DDR5, why not CAMM? If Apple uses NVMe, why not M.2? Many of the advantages you've listed are marginal compared to the real-world constraints of the hardware, and cover up some boneheaded decisions that don't significantly impact the laptop's efficiency.

by LarsDu88 1 day ago

Right now, at this point in time, for applications like local AI and certain types of gaming, I would argue for most people having more VRAM is more useful than having faster VRAM. I personally now do more AI stuff and gaming on my M5 mac with its 24 GB shared (300 GB/s) RAM pool than my 12 GB 5070 Ti (900 GB/s).

Apple still lives in its walled garden and defends it vociferously, but I would argue they have made the correct design tradeoffs for their business.

by AnthonyMouse 20 hours ago

> Right now, at this point in time, for applications like local AI and certain types of gaming, I would argue for most people having more VRAM is more useful than having faster VRAM. I personally now do more AI stuff and gaming on my M5 mac with its 24 GB shared (300 GB/s) RAM pool than my 12 GB 5070 Ti (900 GB/s).

The issue is that this in no way requires soldered memory. CAMM2 supports speeds up to 9600 MT/s. You can get over 300 GB/s from two CAMM2 sockets.
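The arithmetic behind that figure can be sketched as follows. Peak bandwidth is the transfer rate times the bus width in bytes; the 128-bit bus per module is an assumption made for this example (the comment above only gives the 9600 MT/s rate and the two-socket count), and the result is a theoretical peak rather than a measured number.

```python
def peak_bandwidth_gb_s(transfers_mt_s: float, bus_width_bits: int, modules: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s for a number of identical memory modules."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer * modules / 1e9


SPEED_MT_S = 9600             # transfer rate quoted in the comment above
ASSUMED_BUS_WIDTH_BITS = 128  # assumption: 128-bit data bus per module

one_module = peak_bandwidth_gb_s(SPEED_MT_S, ASSUMED_BUS_WIDTH_BITS)
two_modules = peak_bandwidth_gb_s(SPEED_MT_S, ASSUMED_BUS_WIDTH_BITS, modules=2)

print(f"one module:  {one_module:.1f} GB/s")
print(f"two modules: {two_modules:.1f} GB/s")
```

With those assumptions one module peaks at about 153.6 GB/s and two at about 307.2 GB/s, which matches the "over 300 GB/s" claim.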

by bigyabai 1 day ago

For applications like local AI and the majority of PC video games, you are not expected to have DDR5-level GPU bandwidth. It is a constraint; there is no "good enough" when you're selling a desktop-grade M5 Max that is bandwidth-constrained in practice. Modern gaming at native resolution is pretty much impossible on most MacBook Pros.

It's an acceptable approach for iPad-level stuff, but for professional workstations and desktops it's not competitive.
