If you have a large enough set to test against and a specific person you are looking for, this is totally doable currently.
reply
Of course it's doable. The question is how reliable the results are.
reply
It just needs to find the needles in the haystack. Humans can better verify if they're truly needles.
reply
Not just a test set, but enough of a set to search through and compare against. Several pages of in-depth writing isn't anywhere near sufficient, even when limiting the search space to ~10k people.
reply
this is a well-studied field (stylometry). when combining writing styles, vocabulary, posting times, etc. you absolutely can narrow it down to specific people.

even when people deliberately try to disguise some aspects (e.g. switching writing styles for different pseudonyms), they will almost always slip up and revert to their most comfortable style over time. that is great for whoever is doing the identifying: unless the person also regularly rotates pseudonyms, you only need to catch them slipping once to get the whole history of that pseudonym (and potentially others, once that one is confirmed). pseudonyms themselves are subject to limited stylometry too, so creating them should be somewhat randomized in name, location, etc.
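
to make the "narrow it down" step concrete, here is a rough sketch of one common stylometric baseline: character n-gram profiles compared by cosine similarity. the function names (`char_ngrams`, `cosine`, `rank_candidates`) and the choice of trigrams are my own assumptions for illustration, not a reference implementation, and real systems use far more text and more features than this.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (hypothetical helper)."""
    # Lowercase and collapse whitespace so layout differences don't dominate.
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    # Empty profiles (very short or empty text) get similarity 0.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(unknown, candidates, n=3):
    """Rank known authors by similarity to an unknown text, best first.

    `candidates` maps an author label to a sample of their known writing.
    """
    target = char_ngrams(unknown, n)
    scores = {
        name: cosine(target, char_ngrams(sample, n))
        for name, sample in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

the output is only a ranked shortlist. it finds candidate needles; a human (or other evidence) still has to confirm them.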

reply
People do change over time, I used to write "ha" after every sentence for some reason
reply
You know, i had a particularly cringy period in which i put "la" at the end of sentences.
reply
Don't throw the baby out with the bathwater. "Ooh, la" sounds really unnatural.

But on a serious note, what did "la" mean in your context? I've never seen this.

reply
It’s a common thing for speakers of Singaporean English to end sentences with la/leh. But no idea if that’s what’s going on here.
reply
You left off something.
reply
sure, not denying that. my writing style is fairly different now in my 40s than it was in my late teens/early twenties.

but those changes are usually pretty gradual and relatively small. that's why, when attempting to identify someone via writing, you look at several aspects of the writing (grammar, use of specific slang, sentence length, paragraph structure, punctuation, etc.) and not just word choice. it is highly unlikely that all aspects of someone's writing change at the same time. simply removing "ha" is inconsequential to identification if not much else changed.

additionally, this data is typically combined with other data/patterns (posting times, username themes and length, writing that displays certain types of expertise, and more) to increase the confidence level of correct identification.
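
as a rough illustration of the "several aspects" point, the sketch below computes a handful of coarse style features and a simple distance between two feature sets. the feature list, the names `style_features` and `style_distance`, and the averaging scheme are all assumptions picked for brevity; they stand in for the much richer feature sets used in practice.

```python
import re


def style_features(text):
    """Extract a few coarse style features (illustrative, not exhaustive)."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text)
    n_sent = max(len(sentences), 1)
    n_words = max(len(words), 1)
    return {
        "avg_sentence_len": len(words) / n_sent,
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "commas_per_word": text.count(",") / n_words,
        # Share of sentences that start with a lowercase letter.
        "lowercase_start_rate": sum(s[0].islower() for s in sentences) / n_sent,
    }


def style_distance(a, b):
    """Mean relative difference across features shared by both profiles."""
    diffs = []
    for key in a.keys() & b.keys():
        scale = max(abs(a[key]), abs(b[key]))
        diffs.append(abs(a[key] - b[key]) / scale if scale else 0.0)
    return sum(diffs) / len(diffs) if diffs else 0.0


# Dropping one verbal tic shifts some features a little and leaves others alone.
with_tic = "i went to the shop ha. it was closed, so i came back ha."
without_tic = "i went to the shop. it was closed, so i came back."
print(style_distance(style_features(with_tic), style_features(without_tic)))
```

non-text signals such as posting times would be added as further features alongside these, which is where most of the extra confidence comes from.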

reply
Stylometry is okay if you're trying to deanonymize a large enough sample text. A reddit account would be doable. But individual 4chan posts? You barely have enough content within the text limit.
reply