BERT isn’t an SLM, and the original was released in 2018.
The whole new era kicked off with Attention Is All You Need; we haven’t even reached a single decade of work on it.
Huh? BERT is literally a language model that's small and uses attention.
And we had good language models before BERT too.
They were a royal bitch to train properly, though. Nowadays you can get the same with just 30 minutes of prompt engineering.
Astute readers will note what’s been missed here.
Fascinating, really. Your confidently stated yet factually void comments I’d previously have put down to one of the classic programmer mindsets. Nowadays, though, where do I see that kind of thing most often? Curious.
Also, the irony of your comment, itself confidently stated yet void of any content, was not missed either. Consider dropping the superiority complex next time.
I don’t see a useful definition of LLM that doesn’t include BERT, especially given its historical importance. 340M parameters is only “small” in the sense that a baby whale is small.
While I could’ve written that better and with less attitude, I gotta confess (and thx for pointing out my smugness): the AI stuff of the last few weeks has really gotten under my skin. I think I’m feeling rather fatigued by it all.
We've had very good language models for decades. The problem was that they needed to be trained, which LLMs mostly don't. You can now solve a language model problem with just some system prompt manipulation.
(And honestly, typing in system prompts by hand feels like a task that should definitely be automated. I'm waiting for "soft prompting" to become a thing so we can come full circle and just feed the LLM an example set.)
I’m not astute enough to see what was missed here. Could you explain?
I don’t agree. I would say the entire point of LLMs is to solve a certain class of non-deterministic problems that cannot be solved with deterministic procedural code. LLMs don’t need to be generally useful in order to be useful for specific business use cases. As a programmer, I would be very happy to have a local coding agent like Claude Code that can do nothing but write code in my chosen programming language or framework, instead of using a general model like Opus, if it could be hyper-specialized and optimized for that one task so that it is small enough to run on my MacBook. I don’t need Opus's other general reasoning capabilities.
You are confusing LLMs with more general machine learning here. We've been solving those non-deterministic problems with machine learning for decades (for example, tasks like image recognition). LLMs are specifically about scaling that up and generalising it to solve any problem.