Amazing project. This has the same feel as Karpathy’s classic “The Unreasonable Effectiveness of Recurrent Neural Networks” blog post. I think in 10 years’ time we will look back and say “wow, this is how it started.”
You mentioned it took 100 GPU hours. What GPU did you train on?
Mostly 1xA10 (though I switched to 1xGH200 briefly at the end; Lambda has a sale going). The network used in the post is very tiny, but I had to train for a really long time with a large batch size to get somewhat stable results.