Just tried it with B.E.D - Walk Away[0], unfortunately it lost track of the lyrics after 30 secs (Model is "large-v3"). Will play around a bit more, as it would be great to have a working karaoke generator.

Some quick feedback:

  - Needs a way to skip for-/backwards during playback to validate the result
  - Sentences seem to be recognized (the first letter is capitalized), but periods aren't added
  - Needs an option to edit results from the track analysis

Thanks for keeping it FOSS!

[0]: https://www.youtube.com/watch?v=_MFT4H3VoNE

by djtango11 hours ago

Periods in song lyrics?

by gaudystead5 hours ago

I'm guessing they mean punctuation in general?

by rzzzzru9 hours ago

hey mate! thanks for your feedback.

indeed, I'm running into two problems on the analyzer side: 1. the align model sliding off (especially w/ chorus/backing vocals present) 2. the transcript skipping parts of the lyrics in lyrics-heavy tracks (I tried a lot of Russian rap, lol)

happy for contributions as I'm not that experienced w/ the machine learning side of the project, mostly it was empirical "tweak the parameters and see what changes"

by rzzzzru9 hours ago

also, the model only affects the transcript job (I need to make that clearer in the UI). For the alignment, it's a single model provided by whisperx

by evanjrowley8 hours ago

Amazing work! I am thrilled someone was motivated to approach this problem and develop a creative solution like this. There are very limited options for Karaoke, especially in the FOSS space. Most Karaoke apps are super limited and that's driven many Karaoke enjoyers I know to YouTube in search of the songs they want to sing. This solution would give them the power to do even more songs, even better than what's out there now!

Questions for you:

1. What CUDA capability level is necessary for Nvidia GPU acceleration to work?

3. Are there any plans to support iGPU/NPU acceleration on AMD and Intel? Asking because those chips are most common in the mini computers sold at low cost these days.

My family members who love Karaoke will be happy to try this. Looking forward to it!

by rzzzzru5 hours ago

hi!

1. Maxwell+ should work well

3. I would need to explore, you can join the discord or the mailing list on the website!

cheers!
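
For reference, Maxwell GPUs report CUDA compute capability 5.x, so "Maxwell+" can be read as compute capability 5.0 or higher. Below is a minimal sketch of that check, assuming the capability is already known as a (major, minor) pair. The function name and the threshold constant are hypothetical and not taken from the project.

```python
# Assumption: "Maxwell+" is read as compute capability >= 5.0.
# MIN_CAPABILITY and meets_minimum are illustrative names, not project code.
MIN_CAPABILITY = (5, 0)


def meets_minimum(capability, minimum=MIN_CAPABILITY):
    """Return True if a (major, minor) compute capability is at least `minimum`."""
    # Tuples compare element by element, so (5, 2) >= (5, 0) and (3, 5) < (5, 0).
    return tuple(capability) >= tuple(minimum)


if __name__ == "__main__":
    examples = [("Kepler", (3, 5)), ("Maxwell", (5, 2)), ("Turing", (7, 5))]
    for name, cap in examples:
        print(name, cap, meets_minimum(cap))
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns such a pair for the current device, so its result could be passed straight to `meets_minimum`.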

by solstice11 hours ago

Excited to try this out. How well does WhisperX deal with lyrics in say Mandarin or Cantonese? Does it output Hanzi?

by rzzzzru9 hours ago

I haven't tried Mandarin or Cantonese, but I tried Japanese. at the time, it performed poorly. however, I've tweaked a bunch of settings since then, so maybe it has changed. Hanzi is a supported font and can be output, but the transcript/alignment quality might not be the best

by samtp5 hours ago

I just want to say how much I love that you used Dean Blunt in the example video

by rzzzzru5 hours ago

one of my favorite artists and this one is one of my favorite tracks in general. cheers!

by defrost11 hours ago

Struggled somewhat with Tjamuku Ngurra by the Tjintu Desert Band, absolutely nailed Mariah Carey's Ken Lee.

by philsnow8 hours ago

( https://knowyourmeme.com/memes/ken-lee )

by antihero12 hours ago

This looks like awesome awesome fun! Will let you know how it runs. What a wonderful idea <3

by throwaway7436 hours ago

Just tried No_4mat's 1992... unfortunately it didn't work :(