undefined

upvote

points

by dash215 hours ago |

upvote

by Tarq0n14 hours ago|

[-]

If it works for predicting the next token in a very long stream of tokens, why not. The question is what architecture and training regimen it needs to generalize.

reply