You can download the books and run them through a tokenizer. I did that half a year ago and got ~2M tokens.
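For anyone who wants to repeat the count, here is a rough sketch of that kind of script. It assumes the books are already saved as plain-text files. `simple_tokenize` is a hypothetical stand-in that splits on words and punctuation, not the tokenizer I used, so swap in your model's real tokenizer through the `tokenize` argument to get a comparable number.

```python
import re
from pathlib import Path
from typing import Callable, Iterable, List


def simple_tokenize(text: str) -> List[str]:
    # Stand-in tokenizer (an assumption for illustration): each word and
    # each punctuation mark counts as one token. Real subword tokenizers
    # will usually report a different, often higher, count.
    return re.findall(r"\w+|[^\w\s]", text)


def count_tokens(
    paths: Iterable[Path],
    tokenize: Callable[[str], list] = simple_tokenize,
) -> int:
    # Sum the token counts over all the given text files.
    total = 0
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        total += len(tokenize(text))
    return total
```

Something like `count_tokens(Path("books").glob("*.txt"))` would then give the total, where `books` is a placeholder for wherever you saved the files.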