Show HN: AI Timeline – 171 LLMs from Transformer (2017) to GPT-5.3 (2026)


(llm-timeline.com)

119 points

by ai_bot 13 hours ago

by jcims 5 hours ago

I was born in 1973. My grandson was born in 2022. He won't know a world without 'AI' much like my kids didn't know a world without the Internet and I didn't know a world without refrigerators.

One thing I regret to say that I learned very late in my children's development was the value of boredom and difficult challenges. However I think I've successfully passed these lessons on to my kids as they raise their own. I have no idea what to say about 'AI' and the rapid reconfiguration of our relationship with the world that's going to happen as a result. All I can tell them is that we're in this together and we'll try to figure it out as we go.

Good luck everybody!

by tadfisher 2 hours ago

I would think your parents thought about television more than refrigerators. That's one technology that really set the world on a new trajectory. Imagine if Nixon won the presidency in 1960, if we didn't have real-time video of the Apollo landings, or if America stayed in Vietnam for another ten years.

by HPsquared 42 minutes ago

Television and radio set the parameters for the "single-stream culture" that emerged in the 20th century. Mostly a result of the limited bandwidth of early broadcast technology, so everyone had to watch the same few channels.

Web 2.0 broke this into millions of creators. Generative AI produces everything on-demand, but again there is a small number of (polymorphic) models producing the content.

by roegerle 5 hours ago

I feel so old now

by NoOn3 5 hours ago

You know a world without refrigerators? :)

by badsectoracula 5 hours ago

Interesting site, though it does seem to miss some of Mistral's stuff: specifically, Mistral Small 3, which was released under Apache 2.0 (AFAIK the first in the Mistral Small series to use a fully open license; previous Mistral Small releases were under their own non-commercial research license), and its derivatives (e.g. Devstral, aka Devstral Small 1, which is derived from Mistral Small 3.1). It is also missing Devstral 2 (which is not really open source but more of a "MIT unless you have a lot of money") and Devstral Small 2 (which is under Apache 2.0 and the successor to Devstral [Small], and interestingly also derived from Mistral Small 3.1 instead of 3.2).

by ai_bot 5 hours ago

Good catches — just added Devstral Small 1 (May 2025, Apache 2.0), Devstral 2 (Dec 2025, modified MIT), and Devstral Small 2 (Dec 2025, Apache 2.0). Thanks for the feedback!

by Sajarin 3 hours ago

Shameless plug but made a similar tree here: https://sajarin.com/blog/modeltree/

by l-p 27 minutes ago

Thanks, that's way more useful to me.

Allow me to contribute:

> Magistral: Magist(rate) + stral? Mag(nificent) + stral? Nobody knows.

That's just French for "masterful" or a way to describe lectures. There's a sense of greatness in that word that contrasts with the Mini in Ministral, which in turn might be a pun on "ménestrel" (minstrel), "ministre" (minister), or made to sound like Minitel (or all of the above).

by NitpickLawyer 13 hours ago

Misses a few interesting early models: GPT-J (by Eleuther, using gpt2 arch) was the first-ish model runnable on consumer hardware. I actually had a thing running for a while in prod with real users on this. And GPT-NeoX was their attempt to scale to gpt3 levels. It was 20b and was maybe the first glimpse that local models might someday be usable (although local at the time was questionable, quantisation wasn't as widely used, etc).

by pu_pe 12 hours ago

GPT-J was the one that made me really interested in LLMs, as I could run it on a 3090.

Some details on the timeline are not quite precise, and would benefit from linking to a source so that everyone can verify them. For example, HyperCLOVA is listed as 204B parameters, but it seems it used 560B parameters (https://aclanthology.org/2021.emnlp-main.274/).

by ai_bot 12 hours ago

Great idea! Thanks

by ai_bot 13 hours ago

Great catches — just added GPT-Neo (2.7B, Mar 2021), GPT-J (6B, Jun 2021), and GPT-NeoX (20B, Apr 2022). Thanks!

by Maro 10 hours ago

This would be interesting if each of them had a high-level picture of the NN, "to scale", perhaps color coding the components somehow. OnMouseScroll it would scroll through the models, and you could see the networks become deeper, wider, colors change, almost animated. That'd be cool.

by ai_bot 10 hours ago

Thanks! Great idea

by wobblywobbegong 8 hours ago

Calling this "The complete history of AI" seems wrong. LLMs are not all the AI there is, and AI has existed for way longer than people realize.

by ai_bot 8 hours ago

Fair point — updated the tagline to 'The complete history of LLMs'. AI as a field goes back decades; this is specifically tracking the transformer/LLM era from 2017 onward

by nubg 8 hours ago

Most of "AI" before ChatGPT was just researchers wasting public grant money, eg BLOOM.

by gordonhart 8 hours ago

Easy to forget but there was a ton of industry+investor excitement around computer vision from ~2015-2021, to the extent that the "MLops" niche sprang up around it. This was called AI at the time, and mostly went out the window when general-purpose pretrained models arrived.

by stuxnet79 1 hour ago

I would place the beginning of the computer vision hype at 2012 or so when the AlexNet paper came out.

Also an aside, it is mind boggling to me how pre-2021 ML is now ancient history.

by _verandaguy 2 hours ago

This is ignoring ML which has existed for decades.

Neural networks, computer vision, sentiment analysis, all of these and more have provided an unspeakable amount of value over the years.

by bigstrat2003 7 hours ago

And now it's private companies wasting investor money. Not sure there's much difference between the two.

by Panoramix 3 hours ago

Nice overview. Some of the descriptions are quite thin on details, like "new model by x", or "latest model by y". Well of course it was new at the time but that doesn't really add information.

by jvillasante 10 hours ago

Why is it so hard, in times when AI itself can do it, to add a light mode to these black websites!? There are people who just can't read dark mode!

by Lerc 9 hours ago

Visual presentation has been a weak point of AI generation for me. There isn't a lot of support for them seeing how a potential presentation might appear to a human.

Models that take visual input seem more focused on identifying what is in the image than on what a human might perceive in it, and most interfaces lack any form of automated feedback mechanism for them to look at what they have made.

In short, I have made some fun things with AI but I still end up doing CSS by hand.

by ai_bot 10 hours ago

Thank you! Sorry for the inconvenience. I'll add it a bit later

by hmokiguess 10 hours ago

Would be nice to see some charts and perhaps an average of the cycles with a prediction of the next one based on it

by ai_bot 10 hours ago

Thanks! I'll add some charts

by adt 8 hours ago

750+ here:

https://lifearchitect.ai/models-table/

by ai_bot 8 hours ago

Great resource — Dr. Thompson's table is exhaustive. llm-timeline.com takes a different angle: visual timeline format, focused on base/foundation models only, filterable by open/closed source. Different tools for different needs.

by YetAnotherNick 9 hours ago

It misses almost every milestone, and lists Llama 3.1 as a milestone. T5 was a much bigger milestone than almost everything in the list.

by embedding-shape 9 hours ago

> T5 was a much bigger milestone than almost everything in the list.

It's in the timeline though? Or are you saying that one should somehow be highlighted, even though none of the other ones are? Seems it's just chronological order, with none being more or less visible than the others, as far as I can see.

by YetAnotherNick 9 hours ago

Some are highlighted and listed as milestones.

by ai_bot 9 hours ago

Fair point on T5 — just marked it as a milestone. On Llama 3.1: it's there as a milestone because it was the first open model to match GPT-4 at 405B, which felt like a genuine inflection point. Happy to debate the milestone criteria though — what would you add?

by YetAnotherNick 8 hours ago

That was llama 3, which is marked as a milestone already.

Also I would say add apple/DCLM-7B (not as a milestone imo), as it was kind of the first fully open model that was at least somewhat competitive with closed-data models.

by 8 hours ago

deleted

by varispeed 10 hours ago

The models used for apps like Codex, are they designed to mimic human behaviour - as in they deliberately create errors in code that then you have to spend time debugging and fixing - or is it a natural flaw, and the fact that humans also do it just a coincidence?

This keeps bothering me: why do they need several iterations to arrive at a correct solution instead of getting it right the first time? Prompts like "repeat solving it until it is correct" don't help.

by embedding-shape 9 hours ago

> as in they deliberately create errors in code that then you have to spend time debugging and fixing

No, all the models are designed to be "helpful", but different companies see that as different things.

If you're seeing the model deliberately creating errors so you have something to fix, then that sounds like something is fundamentally wrong in your prompt.

Besides that, I'm guessing "repeat solving it until it is correct" is a concise version of your actual prompt, or is that verbatim what you prompt the model with? If so, you need to give it more details for it to actually be able to execute something like that.

by varispeed 5 hours ago

> then that sounds like something is fundamentally wrong in your prompt.

I am holding it wrong?

by embedding-shape 3 hours ago

Some things take a bit of skill to use, yes. Like not everyone can play music with a guitar, you need to train a bit before it sounds OK.

by koakuma-chan 6 hours ago

> If you're seeing the model deliberately creating errors so you have something to fix, then that sounds like something is fundamentally wrong in your prompt.

No, all these models are just bad for anything that they weren't RLed for, and decent for things they were. Decent, because people who evaluate them aren't experts.

by embedding-shape 6 hours ago

> No, all these models are just bad for anything that they weren't RLed for, and decent for things they were

Are you claiming that the models are RLed to intentionally add errors to our programs when we use them, or what's the argument you're trying to make here? Otherwise I don't see how it's relevant to what I said.

by koakuma-chan 6 hours ago

No, I am making the argument that models have poor capabilities outside of tasks they are RLed for, and their capabilities inside those tasks are only as good as the capabilities of the people evaluating their responses, i.e. not great. Even if you instruct the model "don't do X" or "do X this way", you cannot rely on the model following that instruction. This means that there is nothing you can do if the model makes "errors."

Not necessarily relevant, but fun, I had the ChatGPT model correct itself mid-response when checking my math work. It started by saying that I was wrong, then it proceeded to solve the problem and at the end it realized that I was correct.

by embedding-shape 5 hours ago

> Even if you instruct the model "don't do X" or "do X this way", you cannot rely on the model following that instruction.

Why not? I can definitely fire off two prompts to the same model and harness, one including "don't do X" and the other not, and I get what I expect: one didn't try to avoid doing X, and the other did. Is that not your experience using LLMs?

by koakuma-chan 5 hours ago

It depends on the instruction, and how many other instructions there are. Models converge on doing things the way that emerged from their training, and with every turn the model cares less and less about your instructions. In practice, this means that after you had the model plan and execute the plan, you almost always end up having to iterate on the output because during the process of outputting the output the model began to derail and ignore instructions. You get things like "In a real app, we would do X, for now, just return null" or various subtle bugs.

It makes sense if you remember that it just predicts what the next piece of text should probably be.

by embedding-shape 5 hours ago

I understand how they work, as I work with them every day and have been doing so for two years or so. What I don't understand is how what you're saying is in any way related to the whole "deliberately create errors in code" part, which is where I jumped into the discussion.

Maybe I'm missing some bigger picture you're trying to paint here? I understand (and see) them making "mistakes" all the time, and I guess you could argue it's deliberate in some way, because it's simply how they work and adjusting the prompt and redoing usually solves the problem. But I'm afraid I don't see how it's connected, at least yet.

by koakuma-chan 3 hours ago

Nope, no bigger picture. That's all I meant.

by SignalStackDev 4 hours ago

[dead]

by EpicIvo 9 hours ago

Great site! I noticed a minor visual glitch where the tooltips seem to be rendering below their container on the z-axis, possibly getting clipped or hidden.

by ai_bot 9 hours ago

Thanks for the feedback! I'll fix it asap.
