upvote
I've worked around lower RAM machines with ONNX web models by first separating .onnx from .onnx_data, and second having scripts that split up the "layers" and shards the run (e.g. https://huggingface.co/cretz/Z-Image-Turbo-ONNX-sharded). Then you can have the runtime only run one at a time. I don't understand the details too deep, but Claude is good at writing scripts to shard onnx protos.
reply
It froze up my computer, had to hard-boot lol

(16MB M1 Macbook, Chrome)

reply