Yeah, I intentionally left space for the computation graph to be included in the GGUF spec in the hopes that this would be picked up by someone. I would have loved to have it in the first version, but I was prioritising getting the MVP spec out and implemented.

I'd still love to see this, but it would need a cheerleader very familiar with the current state of the GGML IR.

I feel like the computation graph could be embedded alongside the weights, similar to how ONNX works. You'd expose some common interfaces that accept common parameters, and additional custom ops could effectively be extensions, much like Wayland protocols. That way you could support not only transformer-ish models like LLaMa, but also RNN-ish models like RWKV, multimodal models, and more. I'm not sure how this would be implemented in practice, but it sounds like a cool idea. My one worry is that if the computation graph is baked into the model file, then improvements to the architecture, or optimizations that don't require changes to the weights, won't reach existing files without a conversion.
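To make the idea concrete, here's a minimal sketch of "graph as data next to the weights": the graph is a flat list of ops referencing tensors by name, serialized into the file's metadata, and the runtime just interprets it. None of this is real GGUF/GGML or ONNX API; the key names, the op registry, and the tiny two-layer MLP are all invented for illustration.

```python
# Hypothetical sketch: a computation graph stored as data alongside the
# weights, ONNX-style, then interpreted at load time. The "graph" key,
# op names, and runner are assumptions, not real GGUF/GGML structures.
import json

# Stand-in for the tensor section of a model file.
weights = {
    "w1": [[0.5, -0.2], [0.1, 0.3]],
    "w2": [[1.0], [-1.0]],
}

# The "baked-in" graph: ops reference tensors by name. Custom ops could
# be namespaced (e.g. "ext.rwkv.wkv"), Wayland-extension style.
graph = json.dumps([
    {"op": "matmul", "inputs": ["input", "w1"], "output": "h"},
    {"op": "relu",   "inputs": ["h"],           "output": "h_act"},
    {"op": "matmul", "inputs": ["h_act", "w2"], "output": "logits"},
])

def matmul(a, b):
    # Naive row-by-column matrix multiply over nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Op registry: the runtime's "common interface" the graph refers to.
OPS = {
    "matmul": matmul,
    "relu": lambda a: [[max(0.0, v) for v in row] for row in a],
}

def run(graph_json, tensors, inputs):
    """Interpret the serialized graph against named tensors."""
    env = dict(tensors, **inputs)
    for node in json.loads(graph_json):
        args = (env[name] for name in node["inputs"])
        env[node["output"]] = OPS[node["op"]](*args)
    return env

out = run(graph, weights, {"input": [[1.0, 2.0]]})
print(out["logits"])
```

This also shows the conversion worry in miniature: a better kernel for `matmul` only needs a change to the `OPS` registry, but fusing the matmul and relu into one op would change the serialized graph, so existing files wouldn't pick it up without rewriting them.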