Yeah. Back when Gemma 2 came out we benchmarked it and looked at other open models. For our use case, though, the tasks are pretty simple but we do need a fairly large context window, and Gemini had a big lead there over the open models for quite a while. I'll probably evaluate the current batch of open models in the near future, though.

by jimbokun, 4 hours ago

What’s interesting about this is that for previous technologies you could define a standard and demonstrate compliance with interfaces and behavior.

But with LLMs, how do you know switching from one to another won’t change some behavior your system was implicitly relying on?
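
One partial answer is to pin down the behavior you rely on as explicit checks and run both models against them before switching. A rough sketch of that idea, not a real eval framework: `Model` is a hypothetical callable wrapping whatever client you use, and `old_model` / `new_model` are stand-ins that return canned strings so the example runs on its own.

```python
import json
from typing import Callable, Dict, List

# Hypothetical type: any function that takes a prompt and returns the model's text.
Model = Callable[[str], str]
Check = Callable[[str], bool]


def find_regressions(old: Model, new: Model, cases: Dict[str, Check]) -> List[str]:
    """Return prompts where the old model passes the check but the new one fails."""
    regressions = []
    for prompt, check in cases.items():
        if check(old(prompt)) and not check(new(prompt)):
            regressions.append(prompt)
    return regressions


def is_bare_json(output: str) -> bool:
    """An implicit assumption made explicit: the reply parses as JSON with no extra text."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False


# Stand-in models (assumptions for illustration, not real API calls).
def old_model(prompt: str) -> str:
    return '{"sentiment": "positive"}'


def new_model(prompt: str) -> str:
    return 'Sure! Here is the JSON: {"sentiment": "positive"}'


cases = {"Classify: 'great product'": is_bare_json}
print(find_regressions(old_model, new_model, cases))
```

This only catches the dependencies you thought to write down, which is exactly the problem: the ones you were relying on implicitly are the ones without a check.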