Ehh, you're right, but there's a lot of nuance here. If you have a system that doesn't hallucinate much and is still very "creative", that's great, and probably much better than a hallucinating system, however creative that one is. I'm reminded of theorem-proving LLMs working in Lean, producing millions of slop proofs until one works. But if you have something like that, simple RLVR should fix it (the external oracle can be the judge for the RL).