by wk_end 14 hours ago

by forinti 13 hours ago

I wonder how it compares to a Markov chain generator.

by jll29 12 hours ago

The Transformer is a more powerful model than a Markov chain, but on a machine as weak as the C64, a Markov chain could output text faster. It would surely sound "psychedelic", though, because memory limits it to a first- or second-order model: to predict the next word, only the one or two preceding words are taken into account as context (and there is no attention).
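For comparison, here is a minimal Python sketch of the kind of word-level, second-order Markov chain generator described above. It is not C64 code, and `build_chain` and `generate` are hypothetical names. The table maps each pair of consecutive words to every word observed after it, so a uniform pick from that list is proportional to the counts, and the next word depends only on the last two words.

```python
import random
from collections import defaultdict

def build_chain(words, order=2):
    """Map each tuple of `order` consecutive words to the list of words seen right after it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        chain[context].append(words[i + order])
    return chain

def generate(chain, seed, length, rng=random):
    """Extend `seed` by sampling each next word from the followers of the last len(seed) words only."""
    order = len(seed)
    out = list(seed)
    for _ in range(length):
        followers = chain.get(tuple(out[-order:]))
        if not followers:  # context never seen in training: stop
            break
        # Repeated entries in `followers` make this choice count-weighted.
        out.append(rng.choice(followers))
    return out

if __name__ == "__main__":
    text = "the cat sat on the mat and the cat ate the rat".split()
    chain = build_chain(text, order=2)
    print(" ".join(generate(chain, ("the", "cat"), 8, random.Random(0))))
```

The storage cost is what ties the order to the available memory: with a vocabulary of V words and order k there are up to V^k possible contexts to keep counts for.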

On a plain vanilla C64, the Transformer cannot really show what it's capable of. A vectorized implementation using 2 bits per weight could perhaps do slightly better.
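A sketch of what "2 bits per weight" means in practice. It assumes a hypothetical 4-level codebook {-1.5, -0.5, 0.5, 1.5} scaled by a single per-tensor `scale`, which is an illustrative choice and not necessarily how any C64 port quantizes. Four weights fit in one byte, and a dot product unpacks them with shifts and masks; on a 6502 those would be assembly instructions rather than Python.

```python
LEVELS = (-1.5, -0.5, 0.5, 1.5)  # hypothetical 2-bit codebook, multiplied by `scale`

def quantize(weights, scale):
    """Map each float weight to the index (0..3) of the nearest codebook level."""
    codes = []
    for w in weights:
        x = w / scale
        codes.append(min(range(4), key=lambda i: abs(LEVELS[i] - x)))
    return codes

def pack(codes):
    """Pack 2-bit codes four per byte, first code in the lowest two bits."""
    out = bytearray((len(codes) + 3) // 4)
    for i, c in enumerate(codes):
        out[i // 4] |= (c & 3) << (2 * (i % 4))
    return bytes(out)

def unpack(packed, n):
    """Recover the first n 2-bit codes from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 3 for i in range(n)]

def dot(packed, n, scale, xs):
    """Dot product of the n dequantized weights with the input vector xs."""
    return sum(LEVELS[c] * scale * x for c, x in zip(unpack(packed, n), xs))

if __name__ == "__main__":
    w = [-0.3, 0.1, 0.35, -0.12, 0.05]
    packed = pack(quantize(w, 0.2))
    print(len(w), "weights ->", len(packed), "bytes")
```

Compared with one byte per weight, this packing fits four times as many weights into the same RAM, at the cost of a coarser approximation and extra shift-and-mask work per multiply.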

by yorwba 3 hours ago

You can build an unlimited-order Markov chain without precomputing a table of counts for every possible context: instead, use a substring-search index over the training data to count the possible continuations on the fly (https://arxiv.org/abs/2401.17377). That paper uses suffix arrays, but more compact indices are possible: https://arxiv.org/abs/2506.12229

by pizza234 12 hours ago

[dead]