Thanks!
No, it cannot do this yet, as it's using a pipeline of voice to text to voice. I think models are heading in that direction, and voice to voice models are getting better.
For pitch accent, shadowing is a great way to improve. You can pause and repeat the tutors messages for example, or read out the word when doing flashcard reviews (copying the flashcard audio).