But right now people make it a hobby, and that thing can run on a laptop.
This is just so wild.
This project shares similarities with Minix. Minix is still used at universities as an educational tool for teaching operating system design. Minix is the operating system that taught Linus Torvalds how to design (monolithic) operating systems. Similarly, having students add capabilities to GuppyLM is a good way to learn LLM design.
Absolutely. If you loaded this into an agentic coding harness with a decent model, I can practically guarantee it would be able to help you figure out what's going on.
> there is no more need for writing high level docs?
Absolutely not. That would be like exploring a cave without a flashlight, knowing that you could just feel your way around in the dark instead.
Code is not always self-documenting; it can often tell you how something was written, but not why.
My non-coder but technically savvy boss has been doing this lately to great success. It's nice because I spend less time on it since the model has taken my place for the most part.
Hah, you realize the same thing is going on in your boss's head, right? The pie chart of Things-I-Need-stronglikedan-For just shrank a tiny bit...
Also, large codebases are harder to understand. But projects like these are simple to discuss with an LLM.
Do LLMs not take comments into consideration? (Serious question - I'm just getting into this stuff)
For the readers/learners, it's useful to understand the differences so we know what details matter, and which are just stylistic choices.
This isn't art; it's science & engineering.
No one, including the GP, said it was.
Well, the person who asked the question, for one. I'm sure they're not the only one. Best not to assume why people are asking though, so you can save time by not writing irrelevant comments.
https://dailyai.com/2025/05/create-a-replica-of-this-image-d...
You> hello Guppy> hi. did you bring micro pellets.
You> HELLO Guppy> i don't know what it means but it's mine.
But the character still comes through in response :)
Food (not dying) is the goal of organisms.
A rock is maybe not a good counterexample, but a crystal is, because it can grow over time. So in some sense, it tries not to break. However, a crystal cannot make any choices; its behavior is locked into the chemistry it starts with.
https://en.wikipedia.org/wiki/List_of_countries_by_total_fer...
Now, I ask, have LLMs been demystified to you? :D
I am still impressed how much (for the most part) trivial statistics and a lot of compute can do.
How does it handle unknown queries?
If Guppy doesn't know regular expressions yet, could I teach it just by conversation? It's a fish, so it probably wouldn't understand much of my blabbing, but it would be interesting to give it a try.
Or is there some hard architectural limit in current LLMs that means the training has to be done offline, with a fairly large training set?
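One way to see why chatting alone doesn't teach a model anything permanent: training and inference are separate steps, and only training writes to the parameters. Below is a minimal sketch using a toy word-level bigram counter. It is a deliberate stand-in, not GuppyLM's transformer, and the `BigramToy` class and its methods are hypothetical. `train` updates the counts; `reply` only reads them, so talking to it about regexes leaves the model unchanged, and a word it never saw in training gets no continuation at all.

```python
import random
from collections import defaultdict


class BigramToy:
    """Hypothetical toy word-level bigram 'language model' (not GuppyLM)."""

    def __init__(self):
        # counts[prev][nxt] = how often `nxt` followed `prev` in the training text
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, corpus):
        # Offline step: the only method that changes the model's "parameters".
        for line in corpus:
            words = line.split()
            for prev, nxt in zip(words, words[1:]):
                self.counts[prev][nxt] += 1

    def reply(self, prompt, max_words=5, seed=0):
        # Inference: samples from the counts but never writes to them.
        rng = random.Random(seed)
        words = prompt.split()
        word = words[-1] if words else ""
        out = []
        for _ in range(max_words):
            followers = self.counts.get(word)  # .get avoids creating new entries
            if not followers:
                break  # unseen context: nothing learned to continue with
            choices, weights = zip(*followers.items())
            word = rng.choices(choices, weights=weights)[0]
            out.append(word)
        return " ".join(out)


if __name__ == "__main__":
    fish = BigramToy()
    fish.train(["i like pellets", "i like bubbles", "pellets are food"])
    print(fish.reply("i"))
    print(repr(fish.reply("what is a regex")))  # '' : "regex" never appeared in training
```

Larger LLMs can pick up a pattern from examples placed in the prompt (in-context learning), but that only lasts while the text stays in the context window. Making it stick means running another training pass (fine-tuning) on new data.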
https://github.com/arman-bd/guppylm/blob/main/guppylm/genera...
Uses a sort of mad-libs templatized style to generate all the permutations.
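The mad-libs approach amounts to a Cartesian product over slot values. The templates and slot lists below are made up for illustration and are not taken from the linked generator; only the general technique (fill in every combination of placeholders) is the point.

```python
import itertools

# Hypothetical templates and slot values, for illustration only.
TEMPLATES = [
    "what do you think about {thing}?",
    "do you like {thing} in the {time}?",
]
SLOTS = {
    "thing": ["pellets", "bubbles", "the filter"],
    "time": ["morning", "evening"],
}


def expand(template, slots):
    """Yield every fill-in of the {placeholders} that appear in `template`."""
    names = [n for n in slots if "{" + n + "}" in template]
    for values in itertools.product(*(slots[n] for n in names)):
        yield template.format(**dict(zip(names, values)))


def generate(templates, slots):
    return [s for t in templates for s in expand(t, slots)]


samples = generate(TEMPLATES, SLOTS)
print(len(samples))  # 9 (3 for the first template, 3 * 2 for the second)
```

Each extra slot multiplies the count, which is one way a few hand-written lists can turn into a large synthetic dataset.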
Laughed loudly :-D
Honestly, I never expected this post to become so popular. It was just the outcome of a weekend practice session.
How many parameters would you need for that?
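The thread doesn't say exactly what "that" is, but for any GPT-style decoder the parameter count follows from a handful of shape choices. Here is a rough estimator, assuming a GPT-2-like layout (learned position embeddings, LayerNorm with bias, output head tied to the token embeddings). GuppyLM's actual configuration isn't given here, so plug in its numbers yourself.

```python
def decoder_param_count(vocab, d_model, n_layers, d_ff, context,
                        tied_head=True, bias=True):
    """Estimate parameters of a GPT-2-style decoder-only transformer."""
    d = d_model
    attn = 4 * d * d + (4 * d if bias else 0)        # Q, K, V and output projections
    mlp = 2 * d * d_ff + ((d_ff + d) if bias else 0)  # up and down projections
    norms = 2 * 2 * d                                 # two LayerNorms (scale + shift)
    per_layer = attn + mlp + norms
    embed = vocab * d + context * d                   # token + learned position embeddings
    head = 0 if tied_head else vocab * d              # extra only if not tied to embeddings
    final_norm = 2 * d
    return n_layers * per_layer + embed + head + final_norm


# GPT-2 small's published shape, as a sanity check
print(decoder_param_count(50257, 768, 12, 3072, 1024))  # 124439808
```

The GPT-2 small check reproduces the commonly cited 124M figure for that model.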
There is nothing wrong with using AI tools to write code, but nothing here seems to have taken more than a generic 'write me a small LLM in PyTorch' prompt, or any specific human understanding.
The bar for what constitutes an engineering feat on HN seems to have shifted significantly.
I want to really praise the (unintentional?) nod to Nagel: by limiting its capabilities to those of a fish, the project lets the user immediately understand the constraints. It can only talk like a fish because it’s very simple.
Especially compared to public models, that's a really simple correspondence to grok intuitively (small LLM > only as verbose as a fish, larger LLM > more verbose), so kudos to the author for making that simple and fun.
Nagel's point was quite literally the opposite[1] of this, though. We can't understand what it must "be like to be a bat" because their mental model is so fundamentally different than ours. So using all the human language tokens in the world can't get us to truly understand what it's like to be a bat, or a guppy, or whatever. In fact, Nagel's point is arguably even stronger: there's no possible mental mapping between the experience of a bat and the experience of a human.
[1] https://www.sas.upenn.edu/~cavitch/pdf-library/Nagel_Bat.pdf
In LLM-discussions, obviously-fictional characters can be useful for this, like if someone builds a "Chat with Count Dracula" app. To truly believe that a typical "AI" is some entity that "wants to be helpful" is just as mistaken as believing the same architecture creates an entity that "feels the dark thirst for the blood of the living."
Or, in this case, that it really enjoys food-pellets.
I might for example say a human entered a building, a bat might on the other hand think "some big block with two sticks moved through a hole", but both are experiencing a shared physical observation, and there is some mapping between the two.
It's like when people say that if there are aliens, they would find the same mathematical constants that we do.
I’m not going to argue, other than to say that you need to view the point from a third-party perspective evaluating “fish” vs “more verbose thing,” such that the composition determines the complexity of interaction (which has its own qualia, per Nagel).
Hence it’s an “unintentional nod,” not an instantiation.
* How to train it: in the cloud or in my own dev environment?
* How to create a GGUF?
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/user/gupik/guppylm/guppylm/__main__.py", line 48, in <module>
    main()
  File "/home/user/gupik/guppylm/guppylm/__main__.py", line 29, in main
    engine = GuppyInference("checkpoints/best_model.pt", "data/tokenizer.json")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/gupik/guppylm/guppylm/inference.py", line 17, in __init__
    self.tokenizer = Tokenizer.from_file(tokenizer_path)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: No such file or directory (os error 2)
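`os error 2` is the operating system's "file not found" error, raised here by `Tokenizer.from_file`, so `data/tokenizer.json` did not exist at the path it was given. Both paths in the traceback are relative, so they resolve against the current working directory, not against the package folder. A small pre-flight check makes that explicit; `missing_files` is a hypothetical helper, not part of guppylm.

```python
from pathlib import Path

# Paths copied from the traceback. They are relative, so they are resolved
# against the current working directory of the process.
REQUIRED = ["checkpoints/best_model.pt", "data/tokenizer.json"]


def missing_files(paths, base="."):
    """Return the entries of `paths` that are not existing files under `base`."""
    root = Path(base)
    return [p for p in paths if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_files(REQUIRED)
    if missing:
        print(f"Missing relative to {Path.cwd()}: {missing}")
    else:
        print("All required files found.")
```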
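```python
import torch

# After the model config (mc) and device have been set up in the training script
checkpoint_path = "checkpoints/best_model.pt"
ckpt = torch.load(checkpoint_path, map_location=device, weights_only=False)

model = GuppyLM(mc).to(device)
# A checkpoint may be a dict wrapping the weights or a bare state_dict
if "model_state_dict" in ckpt:
    model.load_state_dict(ckpt["model_state_dict"])
else:
    model.load_state_dict(ckpt)

start_step = ckpt.get("step", 0)  # 0 when the file is a bare state_dict
print(f"Resuming from step {start_step}")
```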
https://huggingface.co/datasets/arman-bd/guppylm-60k-generic
Then, some criticism. I probably don't get it, but I think the HN headline does your project a disservice. Your project does not demystify anything (see below), and the headline diverges from your project's own claim, too. Furthermore, I think you claim too much on your GitHub: "This project exists to show that training your own language model is not magic," and then it just lists a few command-line statements to execute. Yeah, running a mail server is not magic either, just apt-get install exim4. So, the code. I looked at train_guppylm.ipynb and, oh, it's PyTorch again. I'm better off reading [2] if I'm looking into that (I know, it is a published book, but I maintain my point).
So, in short, it does not help the initiated or the uninitiated. For the initiated it needs more detail for it to be useful, the uninitiated more context for it to be understood. Still a fun project, even if oversold.
[1] https://spreadsheets-are-all-you-need.ai/
[2] https://github.com/rasbt/LLMs-from-scratch