> But cutting down to 2 million 3 bit parameters or something like that would definitely be possible.
Sure, but there's no free lunch.

> Hey look what I just found
I've even personally built smaller "L"LMs. The first L is in quotes because it really isn't large (so maybe lLM?), and they aren't anything like what you'd expect, and certainly not what the parent was looking for. The utility of them is really not that high (there are special cases, though).

Can you "do" it? Yeah. You can make a machine learning model of essentially arbitrary size. Will it be useful? Obviously that's not guaranteed. Is it fun? Yes. Is it great for learning? Also yes.

And remember, TinyStories is 1GB of data. Can you train for longer and with more data? Again, certainly, BUT again, there are costs. That Minecraft one is far more powerful than this thing.
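To make the "arbitrary size" point concrete, here's a minimal sketch (assuming PyTorch; every config value is made up for illustration, not from the comment above) of a GPT-style model that lands around the ~2M parameters mentioned in the quote:

```python
# Rough sketch: a deliberately tiny GPT-style model in PyTorch.
# Model size is a free knob you can turn; usefulness is the part
# that isn't guaranteed.
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    def __init__(self, vocab=4096, d=128, n_layers=4, n_heads=4, ctx=256):
        super().__init__()
        self.tok = nn.Embedding(vocab, d)   # token embeddings
        self.pos = nn.Embedding(ctx, d)     # learned position embeddings
        block = nn.TransformerEncoderLayer(
            d_model=d, nhead=n_heads, dim_feedforward=4 * d, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(block, num_layers=n_layers)
        self.head = nn.Linear(d, vocab, bias=False)

    def forward(self, idx):
        x = self.tok(idx) + self.pos(torch.arange(idx.size(1), device=idx.device))
        # NOTE: a real GPT needs a causal attention mask here; omitted for brevity.
        return self.head(self.blocks(x))

model = TinyGPT()
print(sum(p.numel() for p in model.parameters()))  # ~1.9M params at these settings
```

The point is just that the parameter count falls straight out of the config; getting something useful out of those 2M parameters is where the lunch stops being free.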
Also, remember that these models are not RLHF'd, so you really shouldn't expect them to behave like the chat-style LLMs you're used to. They're only at stage 0, the "pre-training" stage, or what Karpathy calls a "babbler".
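If you want to see what a "babbler" looks like in practice, here's a hedged sketch: it assumes the Hugging Face `transformers` library and Ronen Eldan's public `roneneldan/TinyStories-1M` checkpoint (swap in whichever base model you actually have):

```python
# Sampling from a pre-training-only ("babbler") model.
from transformers import pipeline

generate = pipeline("text-generation", model="roneneldan/TinyStories-1M")
print(generate("Once upon a time", max_new_tokens=60)[0]["generated_text"])
# A base model just continues text; it won't follow instructions or chat,
# because no RLHF / instruction tuning has happened yet.
```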