indeed, I'm running to two problems on the analyzer side: 1. align model sliding off (especially w/ chorus/back vocals present) 2. transcript skipping parts of lyrics in lyrics-heavy tracks (I tried a lot of russian rap, lol)
happy for contributions as I'm not that experienced w/ machine learning side of the project, mostly it was emperical "tweak the parameters and look what is changed"