I don't know if there's a smaller model with the same capability, but that model size with a 128k context window seems like a sweet spot.
Token speed really isn't a bother for me, since I'm either multitasking or working on filling in the missing details myself.
Regardless, for cost efficiency I'd compare VRAM capacity against your target model first, then look at speed. Plus, keep a healthy skepticism of Mac hardware pricing.
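For that VRAM-first comparison, a rough back-of-envelope helps: quantized weight size plus the KV cache at your target context length. A minimal sketch, where every model number (layer count, KV heads, head dim, 70B at 4-bit) is a hypothetical example, not any specific model's spec:

```python
# Back-of-envelope VRAM estimate: quantized weights + KV cache.
# All model numbers used below are illustrative assumptions.

def estimate_vram_gb(params_b, bits_per_weight, ctx_len,
                     n_layers, n_kv_heads, head_dim, kv_bits=16):
    """Rough VRAM in GiB: quantized weights plus 16-bit KV cache."""
    weights_gb = params_b * 1e9 * bits_per_weight / 8 / 1024**3
    # KV cache: 2 tensors (K and V) per layer, per KV head, per position.
    kv_gb = (2 * n_layers * n_kv_heads * head_dim
             * ctx_len * (kv_bits / 8) / 1024**3)
    return weights_gb + kv_gb

# Hypothetical 70B model at 4-bit, 128k context,
# 80 layers, 8 KV heads (GQA), head_dim 128:
need = estimate_vram_gb(70, 4, 128_000, 80, 8, 128)
print(f"{need:.1f} GiB")  # the KV cache alone dominates at long context
```

The point is that at 128k context the KV cache can rival the weights themselves, so the "VRAM vs. target model" check has to include context, not just parameter count.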