This is a great initiative and I hope to see more come out of this; I am not criticizing, but just want to provide my user experience here so you have data points.
In short, my experience lines up with your native speakers.
I found that it loses track of the phonemes when speaking quickly, and tones don't seem to line up when speaking at normal conversational speed.
For example, if I say 他是我的朋友 at normal conversational speed, it assigns `de` to 我, and sometimes it interprets the `shi` as lacking the retroflex and renders it `si`. I listened back to make sure I said everything; the phonemes are there in the recording, but the UI displays the wrong phonemes and tones.
By contrast, if I speak slowly and really push each tone, the phonemes and tones all register correctly.
Also, is this taking tone sandhi (tone transformation) into account? For example, a third tone (the dipping tone) tends to smoosh into a second tone (rising) when multiple third tones are spoken in a row, the first tone sometimes influences the following tone slightly, and so on.
Again, great initiative, but I think it needs a way to handle speech at conversational speed, which by its nature may even be a bit slurred.
Hoping to see improvements in this area.
I have just added sandhi support; please let me know if it's working better.
Will comment that the shorter phrases (2-4 characters long) were generally accurate at normal speed, but the longer sentences had issues.
Maybe focusing on the accuracy of the smaller phrases and then scaling up from there would be a good way to go, since those are already registering more reliably.
Again, really think this is a great initiative, want to see how it grows. :)
Will check once the TV is off in the house. :)
The classic example is 4/4 不是, which goes bùshì -> búshì.
Or 3/3 becoming 2/3, e.g. 你好 nǐhǎo -> níhǎo.
The 1/4 -> 2/4 transformation is, I think, specific to 一 ("one"): 一个 yīgè becomes yígè.
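The three rules above are simple enough to sketch as code. This is just an illustration of the transformations discussed in this thread, not the tool's actual implementation; the representation (a list of pinyin/tone-number pairs) and the function name are my own invention.

```python
def apply_sandhi(syllables):
    """Apply the sandhi rules from this thread to [(pinyin, tone), ...] pairs.

    Rules covered:
      - 3/3 -> 2/3: a third tone before another third tone rises (ni3 hao3 -> ni2 hao3)
      - bu4 before a fourth tone becomes bu2 (bu4 shi4 -> bu2 shi4)
      - yi1 before a fourth tone becomes yi2 (yi1 ge4 -> yi2 ge4)
    """
    result = list(syllables)
    for i in range(len(result) - 1):
        syl, tone = result[i]
        next_tone = result[i + 1][1]
        if tone == 3 and next_tone == 3:
            result[i] = (syl, 2)
        elif syl == "bu" and tone == 4 and next_tone == 4:
            result[i] = (syl, 2)
        elif syl == "yi" and tone == 1 and next_tone == 4:
            result[i] = (syl, 2)
    return result

print(apply_sandhi([("ni", 3), ("hao", 3)]))   # [('ni', 2), ('hao', 3)]
print(apply_sandhi([("bu", 4), ("shi", 4)]))   # [('bu', 2), ('shi', 4)]
print(apply_sandhi([("yi", 1), ("ge", 4)]))    # [('yi', 2), ('ge', 4)]
```

Note that the 3/3 rule is applied left to right here, so a run of three third tones comes out 2/2/3; real speakers sometimes group longer runs differently, which is exactly the kind of conversational variation discussed above.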