upvote
A non-autoregressive transformer trained with a classification objective.
reply
These are absurdly effective for this kind of task. Training is fast and straight forward. Packaging for deployment as ONNX is pretty simple as well.
reply
As a follow up to the original article, I added a new experiment using Logistic Regression and the results are very good. It actually improves on the accuracy by a few points.

More details here: https://www.teachmecoolstuff.com/viewarticle/using-logistic-...

reply