I have not gone through it yet, but this explanation of how transformers work is in the same class as Ciechanowski's work IMO: https://poloclub.github.io/transformer-explainer/