Posted by nl 15 hours ago
aesthesia 14 hours ago
Thinking shouldn't be too hard to deal with: just let the model generate freely until it hits a </think> token, then do constrained decoding, right?
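A rough sketch of that two-phase idea, in case it helps make it concrete. Everything here is a toy: `next_token_scores` is a hypothetical callback standing in for the model, `allowed_tokens` is a hypothetical grammar function, and tokens are plain strings. None of this is a llama.cpp API; it only illustrates "free until </think>, masked afterwards" with greedy decoding.

```python
from typing import Callable, Dict, List, Set

THINK_END = "</think>"
EOS = "<eos>"


def two_phase_decode(
    next_token_scores: Callable[[List[str]], Dict[str, float]],
    allowed_tokens: Callable[[List[str]], Set[str]],
    max_tokens: int = 64,
) -> List[str]:
    """Greedy decoding: unconstrained until THINK_END, constrained afterwards."""
    out: List[str] = []     # everything generated so far
    answer: List[str] = []  # only the tokens emitted after THINK_END
    thinking = True
    for _ in range(max_tokens):
        scores = next_token_scores(out)
        if not thinking:
            # Constrained phase: drop every token the grammar does not allow.
            allowed = allowed_tokens(answer)
            scores = {t: s for t, s in scores.items() if t in allowed}
            if not scores:
                break  # grammar allows nothing the model can produce
        token = max(scores, key=scores.get)
        out.append(token)
        if token == EOS:
            break
        if thinking:
            if token == THINK_END:
                thinking = False  # switch to constrained decoding
        else:
            answer.append(token)
    return out


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a model: thinks briefly, then wants to ramble."""
    if not prefix:
        return {"hmm": 1.0, THINK_END: 0.5, "yes": 0.1}
    if THINK_END not in prefix:
        return {THINK_END: 1.0, "hmm": 0.5, "yes": 0.1}
    return {"well": 1.0, "yes": 0.8, "no": 0.5, EOS: 0.3}


def yes_no_grammar(answer: List[str]) -> Set[str]:
    """Toy grammar: the answer is exactly one of yes/no, followed by EOS."""
    return {"yes", "no"} if not answer else {EOS}


result = two_phase_decode(toy_model, yes_no_grammar)
print(result)  # ['hmm', '</think>', 'yes', '<eos>']
```

Without the mask the toy model would pick "well" after </think>; with it, the highest-scoring token the grammar allows ("yes") wins. A real implementation would work on token IDs and logits, and would need to handle </think> being split across several tokens.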
stymaar 11 hours ago
Sure, but does llama-cpp support that?