Does this really mean anything? I, for example, tend to ignore benchmarks focused on agentic tasks because that is not my use case. Instruction following, long-context reasoning, and low hallucination rates carry more weight for me.
IQ4_NL @ 112 GB
Q4_0 @ 113 GB
Which of these would be technically better?
[1] https://huggingface.co/bartowski/stepfun-ai_Step-3.5-Flash-G...