Hacker News
new
past
comments
ask
show
jobs
points
by
irthomasthomas
21 hours ago
|
comments
by
dominotw
21 hours ago
|
[-]
i dont understand the nuances here. what does this mean. 4.8 is trained on same model as previous one then? what does brand new mean.
reply
by
irthomasthomas
21 hours ago
|
parent
|
[-]
It means for 4.7 they trained a new base model with different architecture, different pre-training data (later knowledge cutoff), and a new tokenizer. Vs finetuning an existing model, which was the case for 4.6, and probably for 4.8.
reply
by
dominotw
18 hours ago
|
parent
|
next
[-]
do you mean pre training? so 4.8 is just post training of an old pretrained model?
btw where do they tell you how they trained the model.
reply
by
18 hours ago
|
parent
|
prev
|
[-]
deleted
reply