Amazing project. This has the same feel as Karpathy’s classic “The Unreasonable Effectiveness of Recurrent Neural Networks” blog post. I think in 10 years’ time we will look back and say “wow, this is how it started.”
Mostly 1xA10 (though I switched to a 1xGH200 briefly at the end; Lambda has a sale going). The network used in the post is very tiny, but I had to train for a really long time with a large batch size to get somewhat-stable results.