If you want a written resource, I have a blog post about the mathematics behind building a feed-forward network from scratch: https://max-amb.github.io/blog/the_maths_behind_the_mlp/ It mostly focuses on translating the individual per-neuron components into matrix operations.
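A rough sketch of what that translation looks like, assuming a single dense layer with a sigmoid activation. The function names and the example weights are made up for illustration and are not taken from the blog post. The first version loops over neurons one at a time; the second writes the same layer as a = sigmoid(Wx + b).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_per_neuron(weights, biases, inputs):
    """Compute each neuron's output individually: a_i = sigmoid(sum_j w_ij * x_j + b_i)."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = b
        for w, x in zip(w_row, inputs):
            z += w * x
        outputs.append(sigmoid(z))
    return outputs

def matvec(matrix, vector):
    """Matrix-vector product, one dot product per row."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def layer_matrix(weights, biases, inputs):
    """The same layer expressed as a = sigmoid(W x + b)."""
    z = [wx + b for wx, b in zip(matvec(weights, inputs), biases)]
    return [sigmoid(v) for v in z]

# Hypothetical 3-neuron layer with 2 inputs.
W = [[0.5, -1.0], [2.0, 0.25], [0.0, 1.0]]
b = [0.1, -0.2, 0.0]
x = [1.0, 2.0]

per_neuron = layer_per_neuron(W, b, x)
as_matrix = layer_matrix(W, b, x)
```

Both versions should agree up to floating-point rounding. The matrix form is the one that maps onto optimized linear algebra routines.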

by kflansburg 7 hours ago

If you aren't already aware, Karpathy has several videos that could get you there in a few hours: https://www.youtube.com/@AndrejKarpathy

by jdw64 7 hours ago

Thanks very much!

by lancekey 4 hours ago

Also check out his nanochat repo. I used the repo, Claude, and Shadeform to train my own mini model for about $300. It would have been less, but I screwed up and let the cloud GPU rental run for a few hours even though the training run had errored out.

Of course the model was dumber than GPT-2, but it was still a great learning experience.

by glouwbug 7 hours ago

It’s just linear algebra. Work your way from feed-forward networks to CNNs, RNNs, LSTMs, and attention, then maybe a small inference engine. For the latter, Karpathy’s llama2.c is only ~300 lines, and it uses SIMD pragmas, so you don’t need fancy GPUs.
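To make the "just linear algebra" point concrete, here is a rough sketch of single-head scaled dot-product attention in plain Python. It is not taken from llama2.c. The function names are my own, and it assumes no masking and no learned projection matrices, so it only shows the dot products, the softmax, and the weighted sum.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V, written with explicit loops."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        # One scaled dot product per key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Toy example: identical keys give uniform weights,
# so the output is the mean of the value vectors.
Q = [[1.0, 0.0]]
K = [[1.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
result = attention(Q, K, V)
```

Everything here is dot products plus one exponential, which is why a CPU-only inference loop is feasible for small models.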