There's also a bunch of work going into the SKILL.md to plan for more complex parts (this is mostly a stopgap while the models don't have amazing spatial reasoning).
I'm using Opus 4.7 w/ the 1M context option on the vibrating mesh nebulizer repo and have hit compacting pretty often, which is a restart-the-conversation flag for me, on relatively small OpenSCAD files like the adapters and enclosures here, which are around 10-40 KB: https://github.com/dmchaledev/VibratingMeshNebulizerControll...
I'm brushing up on robotics after spending the last 10 years working in software land. After being humbled by modern CAD tools like Onshape, I built this harness/skill to help me generate some basic CAD models for a 7-DOF robot arm I'm designing.
It ended up working much better than I expected, particularly on the latest GPT 5.5 and Opus 4.7 models. It's been a lot of fun to work on. I've learned a lot about how STEP files work (OpenCascade, B-reps, etc.) as well as 3D rendering tools like three.js.
I don't have much intention of turning this into a business; it's really just a fun open source tool that I'll continue to maintain as long as I and others find it useful. Very open to ideas and contributions.
P.S. I just pushed a major update that improves the workflow and scripts/tools for the CAD skill. I also added some basic benchmarks to start measuring performance over time.
https://media.githubusercontent.com/media/earthtojake/text-t...
I'm just wondering why anyone would bother describing CAD models in text. Language is imprecise and ambiguous. If you want to create a full part definition, you need to be extremely thorough with your description. At that point it's just easier (less mental load) and faster to construct the thing yourself. Not to mention, the model might still ignore your perfectly good prompt.
I think going from a picture to an initial starting point with a well-"thought"-out structure for CAD purposes could be very useful. Ideally you could just enter the measurements and be done.
Looking at the L-bracket one, the specification is actually instructing the gussets to overlap the holes, so it performed both better and worse than I expected.
And yes, as someone who CADs mechanical parts a reasonable amount, you have to be very precise, hence my wondering how the given prompt could be useful.
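For what it's worth, that kind of self-contradiction is mechanically detectable before anything is rendered. A rough sketch of an interference check in build123d, a Python B-rep CAD library (all dimensions and placements here are made up):

    # hypothetical L-bracket fragment: plate + gusset + one bolt hole
    from build123d import Box, Cylinder, Pos, export_step

    plate = Box(60, 40, 5)
    hole = Pos(20, 10, 0) * Cylinder(radius=3, height=5)
    gusset = Pos(15, 10, 0) * Box(20, 20, 5)  # placed over the hole, as the spec asks

    # if the gusset/hole intersection has volume, the spec contradicts itself
    overlap = gusset & hole
    print(f"interference volume: {overlap.volume:.1f} mm^3")

    export_step(plate + gusset - hole, "bracket.step")

A harness could run a check like this automatically and feed the result back to the model.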
It's not perfect by any stretch, but it is surprisingly strong. It was able to create and debug some pretty complicated geometry by iterating with screenshots, adjusting view angle, zoom, and rendering mode, updating the parametric geometry generation, and working toward fairly complex goals.
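To make "iterating with screenshots" concrete, the loop is roughly the following. This is my own illustration with trimesh rather than the repo's actual scripts, and the file names and angles are invented:

    import trimesh

    mesh = trimesh.load("part.stl")  # geometry exported by the CAD script
    scene = mesh.scene()

    # render the part from a few viewpoints so the model can inspect its own work
    for i, angles in enumerate([(0.8, 0.0, 0.6), (1.2, 0.0, 2.4)]):
        scene.set_camera(angles=angles, distance=mesh.scale * 2)
        png = scene.save_image(resolution=(800, 600))  # needs pyglet/OpenGL
        with open(f"view_{i}.png", "wb") as f:
            f.write(png)

The screenshots go back into the context, the model adjusts the parametric script, and the cycle repeats.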
Obligatory mention of https://zoo.dev/ who went to extreme lengths on this.
I will say I explored this reasonably deeply and came away with the conclusion that even though we have OpenSCAD and all these examples, LLMs are still very weak at spatial reasoning compared to diffusion models.
You can do all sorts of tricks to get around this, like maintaining a parts library and running physics checks, but another inconvenient truth is that whenever you design a complex assembly, every change to a part needs to be aware of the other parts in the design -- thus you need a global, part-aware editing capability from diffusion.
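By "physics checks" I mean something like a pairwise interference test over the whole assembly, e.g. with trimesh's collision manager (file names hypothetical; the collision module needs python-fcl installed):

    import trimesh

    manager = trimesh.collision.CollisionManager()
    for name in ("base.stl", "arm.stl", "bracket.stl"):
        manager.add_object(name, trimesh.load(name))

    # True if any two parts interpenetrate; pairs names the offenders
    hit, pairs = manager.in_collision_internal(return_names=True)
    if hit:
        print("interfering pairs:", pairs)

But a check like this only tells you after the fact that an edit broke the assembly; it doesn't give the model the global awareness needed to avoid breaking it in the first place.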
That's already getting solved in China's leading labs, and it's bottlenecked by the lack of good training data, which China is addressing with mass labor.
This will be solved overseas before it is in the US.
P.S. I am not affiliated with Zoo or any of these other things, FYI; I was just very curious about this whole area.
> That's already getting solved in China's leading labs
Care to drop a bit of info as a follow-up to this claim? Curious!
What work are you referring to here?
Don't know what diffusion models can do, but 100% agree with the "LLMs are very weak at spatial reasoning" comment.
I built a rather complex blueprint-image-to-3D-B-rep model a couple of months back using Codex ... ugh, the damn thing really has no idea where things are in space, something a 3-year-old figures out instinctively.
It did end up saving some time compared to modeling the object myself in a CAD package, but there were so many completely obvious things I had to explain ... very hard to believe compared to what Codex can pull off with code.
Even a handwritten sketch could be a very good starting point for AI image recognition.
I will say that my current harness, https://github.com/cartazio/oh-punkin-pi, is a testbed for a bunch of 2nd-gen harness tech, largely optimized for reasoning LLMs only. The next one after this harness is gonna be epic.
Or is this capable of generating STEP files directly from an LLM (which I doubt)?
[EDIT]: Haha, the answer is hidden in:
    .agents/skills/cad/requirements.txt
TL;DR:
    build123d
    ezdxf
    numpy
    trimesh
    vtk
and the engine is build123d, which, from its home page: "Build123d is a Python-based, parametric (BREP) modeling framework for 2D and 3D CAD. Built on the Open Cascade geometric kernel, it provides a clean, fully Pythonic interface for creating precise models suitable for 3D printing, CNC machining, laser cutting, and other manufacturing processes. Models can be exported to popular CAD tools such as FreeCAD and SolidWorks."
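So the output is real B-rep geometry, not a mesh. A minimal smoke test (my own example, not from the repo):

    from build123d import BuildPart, Box, Hole, export_step

    with BuildPart() as plate:
        Box(40, 20, 10)
        Hole(radius=4)  # through-hole along Z at the origin

    export_step(plate.part, "plate.step")  # exact geometry; opens in FreeCAD etc.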
Probably worth mentioning in the README; I can't be the only one out there wondering.
Also: these things seem to be sprouting all over the place these days (a good thing!) ... CAD modeling using LLMs is clearly an idea whose time has come.
E.g. you want to build a gearbox, so you draw a sketch in the GUI with the positions of all the gears and name each axle where a gear would be attached; then you open a text editor where you specify all the gear parameters and which axle each gear should be attached to. You then go back to the GUI to move the axles around. After assembly, you can start designing a housing for the gearbox.
The assembly could then be loaded directly into any simulation environment of your choice.
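Concretely, the text side might look something like this (format and numbers entirely invented; the axle spacing follows from module × teeth, e.g. stage one needs (18 + 54) / 2 = 36 mm between centers):

    # hypothetical gear spec; axle coordinates (mm) come from the GUI sketch
    gearbox = {
        "axles": {"input": (0, 0), "mid": (36, 0), "output": (36, 60)},
        "gears": [
            {"axle": "input",  "teeth": 12, "module": 1.5, "width": 8},
            {"axle": "mid",    "teeth": 36, "module": 1.5, "width": 8},   # meshes with input
            {"axle": "mid",    "teeth": 15, "module": 2.0, "width": 10},
            {"axle": "output", "teeth": 45, "module": 2.0, "width": 10},  # meshes with mid
        ],
    }

    # overall reduction across the two stages: (36/12) * (45/15) = 9:1
    ratio = (36 / 12) * (45 / 15)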
yup, found it as you were typing this :D