You don't need a neural network. Traditional NLP is far better at this task. The keyword you're looking for is "phonemizer".
I'm surprised that traditional NLP would be better than ML models for this task. Can you point me to a benchmark showing that the non-neural eSpeak NG outperforms ML models?
Also, I asked for a neural model for another reason: I still want semantic knowledge present, not just pronunciation. But before I use myself as a test subject, I want to make sure I get the proper pronunciation, in case the highly speculative "uploading game" works. I don't want to systematically mis-train myself on pronunciation early on.