I don't know if there's a smaller model with the same capability, but that model size with a 128k context window seems like a sweet spot.
Token speed really isn't a bother for me, since I'm either multitasking or working on filling in the missing details myself.
Regardless, for cost efficiency I'd compare VRAM capacity against your target model first, then look at speed. Plus, keep a healthy skepticism of Mac hardware pricing.
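For that VRAM-first comparison, a rough back-of-envelope helps: quantized weight size plus the KV cache at your target context length. A minimal sketch, where every model number (layer count, KV heads, head dim, 70B at 4-bit) is a hypothetical example, not any specific model's spec:

```python
# Back-of-envelope VRAM estimate: quantized weights + KV cache.
# All model numbers used below are illustrative assumptions.

def estimate_vram_gb(params_b, bits_per_weight, ctx_len,
                     n_layers, n_kv_heads, head_dim, kv_bits=16):
    """Rough VRAM in GiB: quantized weights plus 16-bit KV cache."""
    weights_gb = params_b * 1e9 * bits_per_weight / 8 / 1024**3
    # KV cache: 2 tensors (K and V) per layer, per KV head, per position.
    kv_gb = (2 * n_layers * n_kv_heads * head_dim
             * ctx_len * (kv_bits / 8) / 1024**3)
    return weights_gb + kv_gb

# Hypothetical 70B model at 4-bit, 128k context,
# 80 layers, 8 KV heads (GQA), head_dim 128:
need = estimate_vram_gb(70, 4, 128_000, 80, 8, 128)
print(f"{need:.1f} GiB")  # the KV cache alone dominates at long context
```

The point is that at 128k context the KV cache can rival the weights themselves, so the "VRAM vs. target model" check has to include context, not just parameter count.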