If you want to be a researcher and come up with the next breakthrough, get ready to go back to school and learn some math.
If you just need to learn how to use it well and build things with it, then you probably only need a high-level understanding.
Same as programming. I’d bet most programmers have no idea about the physics that makes computers work.
What about improving the efficiency of token consumption, etc., basically opportunities for improving cost/performance?
I keep thinking there has to be a better way to share context with models than dumping entire gigantic skill files of raw text or otherwise into them - I'm betting there's a bunch of low-hanging fruit there.
Which sums up HN these days.
I have no idea about careers at this point. I’m still doing fancy IT work as my day job, and I look away from the future with dread. I also haven’t been looking for new roles on the open job market, so who knows, maybe there are multimillion pay packages for anyone who can articulate how attention works in an interview.
https://www.amazon.com/Build-Large-Language-Model-Scratch/dp...
https://www.amazon.com/Build-DeepSeek-Scratch-Abhijit-Dandek...