Sadly, as you can tell, they have not taken me up on my requests. Awesome that other people got it working!
What exactly do you feel macOS is missing?
Presumably with the right entitlements you can just hit the same (presumably IOKit) syscalls that driverkit does. But that's an extra layer of reverse engineering, and you're not really using driverkit anymore.
i don't think the issues with the project really are specific to driverkit.
> I have been bothering the VM team for years for VM GPU pass through.
Good luck. I'm sure they're keen on giving people access to this so that people can spend their money on NVIDIA GPUs instead of buying more expensive Macs. :)
Would of course be awesome, but I'd be very surprised if it happened.
(Meanwhile, I'm recompiling Wine to see if I can patch it to address an issue that was hotfixed in Proton two weeks ago but isn't in a CrossOver build yet, so yeah, there's maybe some arguments to be made here that I'd be a potential beneficiary. If I weren't too cheap to spring for an eGPU in today's market, anyway.)
The VFIO-style driver made by the author of this also appears generic enough to support all kinds of PCIe devices, not just GPUs. Apple might find a way to weasel out of this ("hey, this is for hardware companies and you don't seem to be affiliated with one", "your driver requests too broad access", etc.) if there really is a conflict of interest, but so far, there's a chance it will just get rubber-stamped.
I can see them rejecting it for legitimate reasons, though, at least as far as "legitimate" with Apple goes. This driver is essentially a thin layer over PCIDriverKit, exposing all functionality that's supposed to be behind the entitlement to arbitrary applications, in similar fashion to WinRing0. They probably didn't come up with all this bureaucracy only to sign something like that in the end. We'll see what happens.
[0] https://github.com/scottjg/qemu-vfio-apple/blob/84ecdcf5db6b...
[1] https://developer.apple.com/documentation/pcidriverkit/creat...
1. Virtualization.framework seems to support some form of GPU passthrough from the host (granted, not eGPU - it's for the integrated GPU). I think the primary use case is having macOS guests get acceleration, while still sharing GPU time with the host. There is also a patch that recently hit QEMU mainline that supports using the "venus server" with virtio-gpu to support a similar functionality for Linux guests under Hypervisor.framework.
2. Apple internally has some kind of PCI Passthrough support available in Virtualization.framework. It seems like the code is shipped to customers in the framework, but it relies on some kind of kext or kernel component that isn't shipped in retail macOS. I can't say if that's intended to ever be released to customers, but clearly someone at Apple has thought about the feature.
Unless there's another method I missed, the internal GPU "pass through" of Virtualization.framework you're thinking of might actually just be paravirtualization, at least that's what the name suggests. It's implemented in the public ParavirtualizedGraphics framework [0], although for PG on Arm macOS the relevant interfaces are private [1]. I haven't looked that deep into it per se, but while fixing bugs around it I've run into a few clues suggesting that it's just a command stream + shared memory being passed around. It also uses its own generic driver on the guest side.
Great job, by the way! Love how authors of pieces like this casually come here to comment :)
[0] https://developer.apple.com/documentation/paravirtualizedgra...
[1] https://github.com/qemu/qemu/blob/edcc429e9e41a8e0e415dcdab6...
There's some randomness around Tahoe with FileVault: it can crash because the Data volume is detected as not encrypted (and that's not OK on bare metal). If you hit that case you might need to enable FileVault inside the VM (and remember to sync the aux storage afterwards if not done already).
there also appears to be a generic pci passthrough path. we were discussing it on the qemu-devel list: https://lore.kernel.org/qemu-devel/C35B5E97-73F2-4A60-951B-B...
Will Apple ever make a computer that makes Siracusa happy? (and do you have the "Believe" shirt?)
Now they've given up on the workstation market, which really enjoys having slots for a myriad of cards.
Having a Thunderbolt cable salad is only for those who miss the external expansions of the 8- and 16-bit home computer days.
Which is clearly where Apple is focused nowadays, if you look back at the vertical integration that existed before the PC clone market took off.
So now if you really need a workstation, it is either Windows, or one of those systems sold with Red Hat Enterprise Linux/Ubuntu from IBM, Dell, HP.
I haven’t seen a non-laughable workstation config from the big vendors since the dot com bubble. Presumably they exist, I guess?
I've been pretty darn happy with the Puget Systems custom workstation I ordered last year before the memory craze started (especially since it has 192GiB of DDR5).
I also ordered another family member a custom "Tiki" system from Falcon Northwest and that has also been quite excellent from what I've seen and they've told me.
Now is obviously not the most economical time to order a new system, but when it is appropriate (and for what it's worth) I think those are two great system builders.
The last I checked, the really big players tended to add value-add gimmicks (water cooling is a common one, custom PSU form factors are another) with reliability/compatibility issues. That's the tier to avoid, not the Puget Systems of the world.
My Puget Systems workstation for example has a simple AIO for cooling with some Noctua fans and a Fractal Design 7 XL full tower case.
The Tiki system I ordered for a family member from Falcon Northwest does have a custom case, but almost everything else is fairly standard inside. The super small form factor was important to them.
Could I have built either of these systems myself? Absolutely -- I've done that for at least the prior 20 years or so, and I've built dozens for employers, but it sure was nice to buy one that just worked this time instead of having to fiddle with memory sticks or find exactly the right BIOS settings for stability, etc.
I'm well aware of the premium I paid but I can honestly say it has been incredibly nice to have a workstation that just works without having to fiddle with BIOS updates or hardware. I also don't really have the time to spare so I was entirely willing to trade funds for time.
It is too inefficient to design a machine which _might_ have two GPUs and a flock of additional drives installed in it. It just makes sense to instead design around independent hardware in its own case, which can meet its own power/cooling needs. This has been a design goal since the trashcan Mac.
Having a PCIe bus increases bandwidth and reduces latency, but once you account for eGPU and for people who would be happy building custom solutions on platforms other than macOS, there's likely not enough identified market for a modular design.
Even if the drivers loaded, they can't talk to the GPU from within Docker (unless one implements PCI passthrough). macOS owns the PCI bus in this scenario.
Anyway, the Mac Pro is dead now. There's only so much sales audio and video professionals can provide.
https://www.reddit.com/r/hardware/comments/1hmgmuf/apples_hi...
Arguably more petty. SJ has been dead for almost 15 years now; I imagine the C-suite might get over it at some point.
I can believe it. IIRC Jobs also snubbed ATI once after they leaked the GPUs going in the next PowerMac model.
Things have moved on since the days where GPUs in Macs were a priority.
But then the AI race has changed things. So who knows - maybe we will one day see official eGPU support from Apple and new drivers from Nvidia. Wouldn't put money on it though...
I don’t know about that. Apple supported some full size GPUs in past product lines and the number of users was very small. Granted, LLMs change that demand but the audience for Mac Pro buyers who would use a full-size GPU that is impossible to obtain is almost nothing compared to their laptop sales.
Part of the reason the new Mac Pro failed to find an audience can definitely be blamed on macOS' hostility to third party hardware. Who knows what Apple would be worth if they beat Nvidia's Grace CPU to the datacenter market. It was certainly their opportunity.
The only ones left were people like John Siracusa, who still hoped until the very last minute that Apple would change their mind.
Admittedly… what’s on my desk? A MacBook M4 Air, a Mac Studio, and there’s an x86 iMac in the corner.
What goes in the travel bag? A MacBook Pro or the Air.
Every time I look at buying something else the math doesn’t add up.
The 5090 sits in a commodity PC chassis. It’s not like I need a model running on my own computer.
It isn't only audio and video.
Maybe it doesn’t matter that much now because they’ve literally exited all the businesses where an external GPU is going to matter. But sticking with AMD all that time out of spite is just wild.
The game benchmarks are fun but the LLM improvements are where this gets really interesting for practical use. I love Apple platforms as an approachable way to run local models with a lot of RAM, but their relatively slow prompt processing speed is often overlooked.
> Here you can see the big issue with Macs: the prompt processing (aka “prefill”) speed. It just gets worse and worse, the longer the prompt gets. At a 4K-token prompt, which doesn’t seem very long, it takes 17 seconds for the M4 MacBook Air to parse before we even start generating a response. Meanwhile, if you strap the eGPU to it, it’ll only take 150ms. It’s 120x faster.
The prefill problem goes unnoticed when you’re playing around with the LLM with small chats. When you start trying to use it for bigger work pieces the compute limit becomes a bottleneck.
The time to first token (TTFT) charts don’t look bad until you notice that they had to be shown on a logarithmic scale because the Mac platforms were so much slower than full GPU compute.
The RTX 5090 has an incredible amount of compute performance for matrix operations and a lot of memory bandwidth. The Apple Silicon parts have unusually high memory bandwidth for general purpose compute chips, which is why they can generate tokens so fast. Their raw matrix compute performance is amazing for their power envelope but not nearly as fast as a dedicated GPU consuming 400-500W.
Apple added tensor cores on the M5 generation which help with those matrix operations, which is why the M5 performs so much better than the M4 Max in that article.
Dedicated GPUs like the RTX 5090 are in another league, though.
You can see the divergence in the high resolution gaming benchmarks, too. Once he starts benchmarking at 4K or 6K where the CPU emulation stops being a bottleneck, the raw compute of the 5090 completely crushes any of the Apple Silicon GPUs.
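To make the compute-bound vs bandwidth-bound distinction concrete, here's a rough back-of-envelope sketch in Python. All the hardware numbers are ballpark figures I'm assuming for illustration, not measurements from the article:

    # Why prefill is compute-bound and decode is bandwidth-bound.
    # All hardware numbers are assumed ballpark figures, not measurements.
    PROMPT_TOKENS = 4096        # prompt length from the article's example
    MODEL_PARAMS = 8e9          # ~8B-parameter model (assumed)
    BYTES_PER_PARAM = 0.5       # ~4-bit quantized weights (assumed)

    # Prefill: ~2 * params FLOPs per prompt token, limited by matmul throughput.
    prefill_flops = 2 * MODEL_PARAMS * PROMPT_TOKENS
    # Decode: each generated token re-reads all the weights, limited by bandwidth.
    bytes_per_token = MODEL_PARAMS * BYTES_PER_PARAM

    for name, tflops, gbs in [
        ("M4-class SoC (assumed ~30 TFLOPS, ~120 GB/s)", 30, 120),
        ("RTX 5090-class GPU (assumed ~800 TFLOPS, ~1700 GB/s)", 800, 1700),
    ]:
        prefill_s = prefill_flops / (tflops * 1e12)
        decode_tps = gbs * 1e9 / bytes_per_token
        print(f"{name}: prefill ~{prefill_s:.2f}s, decode ~{decode_tps:.0f} tok/s")

The exact numbers don't matter; the point is that the prefill gap scales with raw matmul throughput while the decode gap only scales with memory bandwidth, which is why the Macs hold up fine on token generation but fall way behind on time to first token.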
EDIT: since Aurornis beat me by 3 minutes, I’ll add another interesting tidbit instead :)
NVIDIA tensor cores on consumer GPUs are massively less powerful per SM than on their datacenter counterparts (which also makes them easier to drive at peak efficiency on consumer GPUs, because the rest of the pipeline becomes the bottleneck much sooner, as per Amdahl's Law).
This is potentially changing with Vera Rubin CPX which looks an awful lot like a RTX 5090 replacement but with the full-blown datacenter tensor cores (that won’t be available unless you pay for the datacenter SKU) - so it will have very high TFLOPS relative to its bandwidth.
The target market for the CPX is exactly this: prefill and Time To First Token. You can basically just throw compute at the problem for (parts of) prefill performance (but it won’t help anything else past a certain point) and the 5090/M5 are nowhere near that limit.
So the design choice for NVIDIA/Apple/etc of how much silicon to spend for this on consumer GPUs is mostly dictated by economics and how much they can reuse the same chips for the different markets.
Every Blackwell card other than the (G)B100, (G)B200, (G)B300 and Jetson Thor uses the Ampere tensor core instruction (mma.sync), but with fp4/6/8 added on. Beyond that, the DGX Spark (which is advertised as having the same architecture as the B200) has especially weak (not tcgen05) tensor cores with a very narrow operating window and low utilization.
because the GPUs aren't as fantastic as everyone assumes?
> might also be less optimised in MLX?
prefill has gotta be one of the most optimized paths in MLX...
Seeing the author present their results like this gives off the impression that they're biased, which I am sure they aren't.
I understand this is true; it seems that Doom does support Vulkan, but you would need to add VK_NV_glsl_shader to MoltenVK. Probably much less work than what went into hanging an RTX 5090 off of an M4. Still, kudos to Scott, and the local AI inference speeds are pretty cool. What a crazy project! <applause>
(EDIT: Apple agrees with my impression. “To use an eGPU, a Mac with an Intel processor is required.” And, on top of that, the officially supported eGPUs were all AMD not NVIDIA. https://support.apple.com/en-us/102363)
It'd be amazing if Apple would provide better support, and allow more than that 1.5 GB window to make this easier. Arm overall has some quirks with PCIe devices, but at least in Linux, it's gotten so much easier since most modern drivers treat arm64 as a first class citizen.
this is only speculation, but i think the big thing that makes tinygrad slow is that the tinygrad inference engine has not really been optimized much for all these open LLM models. probably most of the work has gone towards optimizing the stack for george's self-driving hardware company. since you can't just run the existing CUDA kernels on their engine, that makes things a lot tougher, engineering-wise.
i am actually curious if my project could share a macos host driver with them. i think it would need some changes, but it seems like there's a lot of overlap
The problem is `max-num-seqs` and `max-model-len` fight each other, and unless you're in the pure single-client mode you'll need multiple slots so to speak.
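Assuming those refer to vLLM's flags, here's a minimal sketch of why they fight: both multiply into the same KV-cache budget. The model name and numbers below are purely illustrative, not a recommended config:

    # Sketch of the max_model_len vs max_num_seqs tradeoff (vLLM Python API).
    # Roughly: kv_cache_bytes >= max_num_seqs * max_model_len * kv_bytes_per_token,
    # so longer contexts leave room for fewer concurrent slots, and vice versa.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model choice
        max_model_len=32768,   # longer context per request...
        max_num_seqs=4,        # ...means fewer concurrent slots fit in the KV cache
        gpu_memory_utilization=0.90,
    )

    out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)

In pure single-client mode you can crank max_model_len and leave max_num_seqs at 1, but as soon as you want multiple slots you're splitting the same KV cache between them.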
Hopefully in 2026, with the Valve Index VR headset being ARM (Qualcomm?), we get what you're talking about here - basically Proton for Win32/64 on Linux ARM64.
Side note: Windows on ARM isn't bad, it's just priced out of its league and the cooling is awful for gaming on current laptops. The only issue I had was OpenGL needing some obscure GL-on-DirectX layer for Maya3D to get games to work.
But Valve's ARM efforts even mean that Android devices can play some (mostly less graphically intensive) Steam games. That makes me very excited about the prospects for the future of gaming handhelds.
Or, more likely, it will tell you something it doesn't know.
Reminds me of yesterday, when I was arguing with ChatGPT that the 5070TI was an actual video card. It kept trying to correct me by saying I must have meant a 4070ti, since no such 5070ti card exists.
I asked Claude to generate an HTML page about PowerShell 7. It gave me a page saying 7.4 was the latest LTS release. I corrected it with links showing 7.6 was released in March and asked it to regenerate with the latest information.
It generated basically the same page with the same claim that 7.4 was the latest release.
People do this too though. At least the AI generally tries to follow instructions that you give it even when you are lacking clarity in the details.
I feel like it's similar to the self-driving car problem. The car could have 99.9999% reliability, drive much better and safer than a human, yet folks will still freak out about a single mistake that's made even though you have actual humans today driving the wrong way down the highway, crashing in to buildings, drunk driving, stealing cars, and all sorts of other just absolutely stupid things.
We need to move away from this idea that because it's an AI system it should give you perfect responses. It's not a deterministic system and it can be wrong, though it should get better over time. Your Google search results are wrong all the time too. The NYT writes things that are factually incorrect. Why do we have such a high standard for these models when we don't apply them elsewhere?
It should be reasonably expected that you can give a source and fix an error in the AI output.
I would even go as far as to say if a human directly told the AI "no, use 7.6 as the latest version", the AI should absolutely follow direct instructions no matter what it thinks is true. What if this human was working on a slide about the upcoming release of 7.6 that has no public documentation?
For me, I ask AI questions about taxes and my health all the time. In the case of taxes, getting a basic handle on the relevant tax law is made 1000 times easier. I can always refer directly to the IRS publications to verify, once I know what I’m looking for.
For health, frankly, it would be impractical for me to ever get as much useful information from doctors as what I can easily get from AI. Four years ago, I would have a bunch of health questions and simply never know the answers to any of them because I would have nobody to ask. Now I get them all answered, and if I were to be suggested to actually do anything that sounded even slightly risky I’d go to the doctor, armed with much more context than I had before, to verify it.
This is also very bad and people complain about these things all the fucking time.
> Why do we have such a high standard for these models
Because Altman and Amodei are defrauding investors out of hundreds of billions of dollars on the promise that they will replace the entire workforce. Of course people are going to point out the emperor has no clothes when half of our society is engaged in mass hysteria worshipping these fucking things as the next industrial revolution, diverting massive amounts of resources to them, and ruining HN with 10 articles on the front page per day about how software engineering is dead.
Even this article, which is theoretically about playing games on a MacBook and not about AI, has devolved into AI discussions. It's honestly kind of tiring.
I suppose the article invites it by putting an AI blurb up top, and I suppose I'm also not helping by adding my own comment, but _still_.
So at worst these AI tools are as bad as the existing system. Worth complaining about? Absolutely. Worth holding to much higher standards? Nah I don't think so. Not at this stage at least. And folks are just disappointing themselves by setting up straw men expectations.
These tools are non-deterministic systems (like humans) which sometimes don't do exactly what you want (like humans) but are also extremely fast, much cheaper (for now), and have domain knowledge generation that is much broader than any single human has. Like anything else, there are pros and cons.
The New York Times publishes a "corrections" section in each issue. Let me know where I can view the 60TB file where ChatGPT fesses up to its daily fails.
People lie all the time too. You're just radicalizing yourself to create a bias for no reason other than concocting a straw man expectation that you made up for yourself. What's the point of that?
"Very deep", "border-line impractical" "in a research-sense" is the perfect summary of this article itself! :)
Previous Empires naively bet their entire future on the words of magicians, or people who claimed they could look into water, the sky and fire and tell you what the future is going to be.
Machine Learning Engineers are the modern day Empire's court magician.
> Important: Codex CLI no longer exists
> OpenAI discontinued the Codex model + CLI a while back. There is no official binary named codex in any current OpenAI npm packages. OpenAI’s current CLI tool is:
npm install -g openai
> which installs the openai command, not codex.

The world knowledge of these models is not necessarily up to date :)
edit: I replayed the same prompt into current ChatGPT and it is less clueless now. Maybe OpenAI noticed that it was utterly dumb that GPT-5.whatever didn't believe that Codex existed and fine-tuned it.
It's amazing how this still needs to be said. Codex was released in April 2025. The initial GPT-5 and 5.1 still had a knowledge cutoff in late 2024. Like, what did you expect? Always beware the knowledge cutoff for LLMs (although recent releases have gotten much better with researching the web for updates before answering modern software topics).
I really hope "The Clown" isn't just a typo'd "The Cloud".
If not, tell us more!
I'm inspired/tempted by this to rename my external "higher grunt offload" machine to "clowntown".
I got Fallout 3 working on my M2 MBP as well as it did on Windows back in the day. Temps were cool, battery was decent. If they sold my college years gaming collection (15-ish years ago) in a way that ran natively through GoG or Steam, I'd buy every single title.
The real question is what happens when they drop Rosetta. They promised they'll keep the APIs related to running 32 bit games but can we trust them?
[1] Not at 8k 240 fps of course.
Not to mention that Mac owners are a minority share of the PC gaming market. Linux has the right idea, if you don't translate the games then you'll never have true preservation.
I'll never pay anyone for a developer licence or fee either. They can sponsor me to port my software to their platform.
Most people don't need that, but most people don't need an eGPU either. The number of gamers who would switch to MacBook+eGPU is negligible. It's just not compelling. For LLMs, hanging a 5090 off the Thunderbolt port makes prompt processing fast, but I will be surprised if the M6 doesn't come with silicon just for that, as it's the current gap. The M5 is quite adequate for token generation for the price, given the RAM quantity and bandwidth. An M6 that accelerates TTFT would make an eGPU irrelevant.
For gaming, the threadripper gets at least +50FPS for windows vs linux, and some games just freeze for periods of time on linux with things like dynamic frame generation. I have an SSD for windows just for gaming.
This. eGPUs fade in and out of relevance every few years, and even back in the Intel Macbook days there were people advocating for eGPU gaming with Bootcamp. It was a terrible solution, there is every reason to avoid macOS with a dGPU when you have something like Linux or even Windows as an alternative.
Ignoring the fact that the Mac OS gets in your way every time you try to do something that Apple doesn't like, with no guarantee that an update won't break anything existing, ignoring the fact that Macs are non repairable, non upgradable, ignoring the fact that they don't support multiple displays flawlessly, I hope you realize that egpu support natively is NEVER coming to Macs, because why the fuck would they enable it when they can just charge you full price for a desktop computer? Apple is built on the sole image that Apple users have money, so buying another Mac Mini or Mac Pro in addition to your laptop is what you are supposed to do.
Android is way ahead of the Mac with Android Desktop mode and Samsung DeX, to the point where you don't even need to own a laptop anymore. I've been using my S24/S25 with a lapdock for over 3 years now as a laptop, and it works flawlessly. Apple can easily do this with the iPhone, but they won't, because that means one less MacBook purchase.
Of course the author probably did that as a joke.
Another part of me is almost annoyed that Apple's complete apathy toward obvious computing use cases like this is rewarded by a project like this. I feel like Macs and macOS should not be rewarded for being so difficult to extend and use outside of Apple's narrow vision of the use case of their hardware.
Apple used to support this use case wholeheartedly, but we can see that it's abandoned on their end: Intel-only, and the newest generation of AMD GPUs supported are the 6000 series: https://support.apple.com/en-us/102363
I got tired of rewarding Apple for refusing to make a computer that makes the most of the technology available. This stuff is all a lot worse than just moving over to Linux or even Windows. With hardware like the Framework 13 Pro coming out, along with a surprisingly good set of premium PC laptops, I really don't think the Mac hardware is worth it anymore. Others have legitimately caught up, especially with Apple's aging MacBook Pro chassis with the horrible notch.
Bingo. This is exactly how I use LLMs. I like getting a gut check, seeing what the first recommendations are or if there is some deep flaw in what I think the approach is, and I almost never copy/paste whatever it spits back or just follow its instructions.
"no - not in any practical sense today, and "maybe" only in a very deep, borderline-impractical research sense."
This is why humans will always rule over crappy LLMs.
Or if you're referring to how the OP still decided to go ahead, I've seen AIs go ahead on impractical courses of action many times, and surprisingly succeed on some of them.
that said, since i was willing to ignore that aspect of it, it did accelerate getting the work done by a lot. it seems like it understands system programming really well, and did a good job navigating the qemu codebase. i have ~20 years of systems programming experience so i already knew what had to be done here. it didn't really guide the project much, but it did write a lot of the code.
Congrats! Each one got what they wanted :).
Unfortunately, I also believe that market forces may push away from this direction, as LLM companies try to capture the value stream.
Never let an AI tell you that you cannot do something practical for your own self for research, discovery or for fun.
The only thing that is close to impractical is expecting your non-technical friends or others to follow you without any incentive or benefit.
It’s these people, not the ones who refuse to use LLMs, who are as they say, “cooked”.