Thank you, I will try it, although I'd prefer to have entire sentences translated into Thai at random. Perhaps you could add this as an advanced mode. I actually came across your app before while looking for an alternative to Toucan that supported Thai, but at that point you hadn't added Thai support yet. Thanks for doing so.
reply
Okay, I installed it and this is pretty great. That said, I think your extension doesn't work for Thai the way you think it does. Because Thai puts spaces between sentences rather than between words, it's translating entire sentences even with the "words only" setting enabled. This is what I want anyway, but it will be too difficult for most learners. I have written various pieces of Thai learning software, and just so you know, you should use an LLM to do word-splitting, not a software library. If you do use a library, you need to split words by matching the longest possible word at each position, but it won't work well. Basically you can't tell without a brain whether a string is a lot of small words next to one another or a smaller number of compound words. IME only an LLM or a human will do a good job of this.
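
To make the longest-match problem concrete, here is a minimal Python sketch. The `segment` function and the four-word toy dictionary are made up for illustration and are not taken from any real library. The test string ตากลม is a commonly cited ambiguous case: it can be read as ตา กลม ("round eyes") or ตาก ลม ("exposed to the wind"), and a greedy matcher always commits to the second reading regardless of context.

```python
def segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation (illustrative sketch only)."""
    longest = max(map(len, dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until the dictionary hits.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                break
        else:
            # No dictionary word starts here: emit one character to keep moving.
            candidate = text[i]
        words.append(candidate)
        i += len(candidate)
    return words


# Hypothetical toy dictionary, just large enough to show the ambiguity.
toy_dictionary = {"ตา", "กลม", "ตาก", "ลม"}
print(segment("ตากลม", toy_dictionary))  # ['ตาก', 'ลม']
```

The matcher has no way to know which reading the sentence intended, which is the part that needs a human or an LLM.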
reply
Translating entire sentences is the idea - I'm not sure which setting you mean by "words only"? I really ought to make the settings clearer, but that's hard to do when you already know what they "ought" to express.

"Translate Isolated Words" allows it to translate "sentences" of only one word, but it doesn't disable full sentences.

And yeah, atm it splits words on spaces for the dictionary. I hadn't thought to do it with LLMs, though that's a good idea. There's a somewhat related problem when doing Furigana, where it has a hashmap of strings-to-pronunciations, and it starts with a 4-character sliding window looking for matches, then a 3-character one, and so on.
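
Roughly, that Furigana lookup has the following shape. This is a Python sketch rather than the extension's actual code: the `annotate` name, the tiny `readings` table, and the choice to do one full pass per window size are all assumptions made for illustration.

```python
def annotate(text: str, readings: dict[str, str], max_window: int = 4):
    """Match the widest windows first, then progressively narrower ones."""
    covered = [False] * len(text)  # True once a character is part of a match
    matches = []                   # (start, end, reading) triples
    for size in range(max_window, 0, -1):
        for start in range(len(text) - size + 1):
            end = start + size
            if any(covered[start:end]):
                continue  # a wider match already claimed part of this window
            reading = readings.get(text[start:end])
            if reading is not None:
                matches.append((start, end, reading))
                for i in range(start, end):
                    covered[i] = True
    matches.sort()  # restore left-to-right order
    return [(text[s:e], r) for s, e, r in matches]


# Hypothetical readings table; a real one would be far larger.
readings = {"日本語": "にほんご", "日本": "にほん", "語": "ご", "勉強": "べんきょう"}
print(annotate("日本語を勉強", readings))
```

Here the 3-character entry 日本語 wins over the shorter 日本 and 語 entries because the wider window is tried first.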

reply
That's a pretty sick idea. Unfortunately I presume it involves sending your browsing data (e.g. page contents) to the server?
reply
Yeah, though I've added lots of privacy protections to at least partially mitigate that:

- There's a global blacklist of sites, as well as phrases in the title/URL (e.g. "bank")

- You can blacklist sites yourself

- Each sentence is run against filters checking for medical/legal/etc info, as well as checks for addresses, card/social security numbers, etc. All the checks are done client side

- There are also some special implementations, e.g. it looks at the source code of websites to work out if they're an instance of an American health portal that I've forgotten the name of - each doctor's surgery self-hosts it.

- Websites can add `nuenki-ignore=true` on their end, if they'd like to disable it.

And of course it doesn't log anything, though there is an anonymous cache in order to make it economical.
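
As a rough illustration of the kind of client-side checks listed above, here is a Python sketch. It is not the extension's code: the host list, the phrase list, the regular expressions, and the function names are all hypothetical, and real filters for medical or legal content would need much more than this.

```python
import re
from urllib.parse import urlparse

# Hypothetical examples; the real lists are not shown in this thread.
BLOCKED_HOSTS = {"health-portal.example"}
BLOCKED_PHRASES = ("bank",)

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to tell card numbers from other long digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def page_allowed(url: str, title: str) -> bool:
    """Reject blocked hosts and pages whose URL or title has a blocked phrase."""
    host = (urlparse(url).hostname or "").lower()
    if host in BLOCKED_HOSTS:
        return False
    haystack = f"{url} {title}".lower()
    return not any(phrase in haystack for phrase in BLOCKED_PHRASES)


def sentence_allowed(sentence: str) -> bool:
    """Reject sentences containing something shaped like an SSN or card number."""
    if SSN_RE.search(sentence):
        return False
    for match in CARD_RE.finditer(sentence):
        if luhn_ok(re.sub(r"\D", "", match.group())):
            return False
    return True
```

A sentence would only be sent for translation if both `page_allowed` and `sentence_allowed` return `True`.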

reply
What about a whitelist? I might only want certain sites, like this one or Reddit, translated into my target language. That way I can be certain it is only turned on for sites whose browsing history I am OK with sharing, and I don't have to worry that I missed adding something to the blacklist.
reply
That's a good point. At some point I ought to make a uBlock Origin-style list of customisable rules.

At the moment I'm focused on translation quality, but I intend to add that.

reply
Thanks!
reply
This is a great idea. Specifically, I want this enabled when I'm wasting time but not when I'm working, so I'd like it to be enabled only on X.com. This whitelist+blocklist functionality could be a user-side setting, like in ad blockers.
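
Something like the following Python sketch is what I have in mind. The `enabled_for` function and its rule format are hypothetical, not an existing setting: a non-empty whitelist means "only these sites", and the blocklist always wins.

```python
from urllib.parse import urlparse


def host_matches(host: str, pattern: str) -> bool:
    """True for the domain itself or any of its subdomains."""
    return host == pattern or host.endswith("." + pattern)


def enabled_for(url: str, whitelist: set[str], blocklist: set[str]) -> bool:
    host = (urlparse(url).hostname or "").lower()
    if any(host_matches(host, p) for p in blocklist):
        return False  # blocked sites are never translated
    if whitelist:
        # Whitelist mode: translate only on explicitly listed sites.
        return any(host_matches(host, p) for p in whitelist)
    return True  # no whitelist configured: everything not blocked is allowed


print(enabled_for("https://x.com/home", {"x.com"}, set()))
print(enabled_for("https://mail.example.com/inbox", {"x.com"}, set()))
```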