OMR is so neglected because most people widely underestimate the difficulty of the task. It has a specific blend of the most extreme shapes combined with an extremely complicated graphical grammar...
My first thought was whether MEI format is being added to MuseScore (the sheet music editor I use these days). It looks like it is: https://music-encoding.org/musescore-doc/
As a somewhat related aside, now that the MuseScore people own Hal Leonard and seem to be pushing integration with their cloud subscription service, I wonder if they'll see some of these directions as potentially competing with them. I don't think there's anyone who wouldn't love a transposable clean digital version of their Real Books... and if Hal Leonard is in the business of selling Real Books, I can see where good OMR might be a problem for them. I guess piracy of scanned versions is already rampant, so maybe it's a wash.
AIN'T THAT THE TRUTH.
My girlfriend is studying musicology and she has some physical disabilities that make it difficult for her to write things down sometimes. So I try to help her by writing some AI-powered TTS/OCR/etc. apps here and there. It becomes painfully obvious that music was never considered an important part of any AI training dataset, anywhere.
These days, I'm pleasantly surprised by how well Opus 4.8 understands/explains music theory (as you said). But ask him to transcribe/OCR/OMR some sheet music and he'll confidently give you the MusicXML/Lilypond equivalent of "2 + 2 = horse".
I really hope this ignored area will be swept up with the rest of the rising AI wave, but it's still criminally undervalued.
I always think of the nun character against AI in Mrs Davis:
> "Don't give it a name. No one calls Facebook Doug. No one calls Twitter Mary Lou. No one calls them anything, because no one uses them anymore. They use it, and it's not a person. It's code." - Mrs Davis
I've been thinking about what kind of organization could be self-sustaining and also produce good music AI training data as a natural byproduct. An ideal arrangement would be something that provided some incentive or benefit to musicians in exchange for their recorded interpretation of sheet music. Soundslice, mentioned by another user, seems to do that. They let both teachers and students upload recordings of music that has been turned into MusicXML. The recordings, paired to those snippets of sheet music, have to be a gold mine, assuming they have enough users. If they aren't already working on stem separation and automatic transcription, they probably should be. Still, my hope would be to figure out some kind of sustainable model where that dataset could be created and released for open model development...
As a domain, I see AI in music as a boon to human creativity. I am very much a novice jazz improvisor, and a passable amateur technician on the trombone. Human instructors can do a lot for me, but there's a lot that is "grinding it out" repetition, where I think AI could be a huge aid. I heard Sam Harris on a podcast recently talk about his bullishness on the humanities (paraphrasing: people don't care if a human reads their MRI if detection is good, but people probably do care that a human wrote the novel they're reading).
Music might even be a better example of the irreplaceability of people. While some people might bop along to a tune composed by Suno on the radio, live music is just so much more enjoyable for me. And even better than listening to a live show played by masters, is playing together with friends. To the extent that AI can patiently help us learn the skills to express our own creativity, I'm here for it!
You might like the "iReal Pro" app for the replacement and transposition of jazz standards on your tablet. It's pretty great for that use case versus camera scans.
With the caveat that I'm not terribly fluent in ABC, it seems to me that simple things are simple, but hard things seem to be nearly pathological. And (again, maybe a lapse in my understanding) it seems like there may be a fair number of concepts that are impossible to convey in ABC?
Lastly, if I understand correctly, ABC got its start and is mostly popular as a simplified format for church songbooks. I'd imagine that would, uh, influence the training corpora towards sounding a bit... church songbooky.
EDIT: I may have been overly dismissive of ABC on first glance. It does seem like people have extended it quite a bit, and that it's at least, in theory, capable of encoding most of what I'd expect. And it's human readable, which is a benefit. Though, readability does take a stiff penalty the more richness you add (e.g. dynamics, articulations, stacked notes, etc)
To let LLMs compose music I chose JSON for context efficiency, but this seems like it could be a better choice: simple, efficient, and already a real format.
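For anyone curious how big the gap might be, here is a rough sketch that encodes the same short melody both ways and compares sizes. It assumes the format being compared against is ABC; the melody, the JSON schema, and the helper names (`abc_note`, `to_abc`, `to_json`) are all made up for illustration. It also counts characters, not tokens, so treat the result as a hint rather than a measurement of real context cost.

```python
import json

# Hypothetical melody: (pitch letter, octave, duration in eighth notes).
# Octave 4 is the octave that starts at middle C.
MELODY = [("C", 4, 2), ("D", 4, 2), ("E", 4, 2), ("G", 4, 2), ("C", 5, 4), ("G", 3, 4)]


def abc_note(pitch, octave, duration):
    """Encode one note in ABC: 'C' is middle C, 'c' is an octave higher,
    each comma lowers by an octave, each apostrophe raises by one."""
    if octave >= 5:
        name = pitch.lower() + "'" * (octave - 5)
    else:
        name = pitch.upper() + "," * (4 - octave)
    # A duration of 1 (one unit note length) is implicit in ABC.
    return name + (str(duration) if duration != 1 else "")


def to_abc(melody):
    # Minimal header: reference number, title, unit note length, key.
    header = "X:1\nT:Sketch\nL:1/8\nK:C\n"
    return header + " ".join(abc_note(*note) for note in melody)


def to_json(melody):
    # Made-up schema, only here to have something to compare against.
    return json.dumps([{"pitch": p, "octave": o, "duration": d} for p, o, d in melody])


if __name__ == "__main__":
    abc_text, json_text = to_abc(MELODY), to_json(MELODY)
    print(abc_text)
    print(len(abc_text), "chars as ABC vs", len(json_text), "chars as JSON")
```

A terser JSON schema (arrays instead of objects, short keys) would narrow the gap, so the comparison says more about verbose schemas than about JSON as such.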
It may not be necessary…a lot of the training pairs/data for this could probably be procedurally created via code.
Would be pretty fun to work on and see it come to life.
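A minimal sketch of what "procedurally created" pairs could look like, under heavy assumptions: the input is a sine-wave rendering instead of real audio or a scanned page, the label is a bare ABC-like note string, and every name here (`random_melody`, `render_audio`, `to_label`, `make_pair`) is hypothetical. For OMR the rendering step would be an engraver producing images, which is not shown.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz, kept low so the sketch runs quickly

# Approximate equal-tempered frequencies (Hz) for the octave above middle C.
PITCHES = {"C": 261.63, "D": 293.66, "E": 329.63, "F": 349.23,
           "G": 392.00, "A": 440.00, "B": 493.88}


def random_melody(rng, length=8):
    """Pick random (pitch name, duration in eighth notes) tuples."""
    names = sorted(PITCHES)
    return [(rng.choice(names), rng.choice([1, 2])) for _ in range(length)]


def render_audio(melody, eighth_seconds=0.125):
    """Render the melody as plain sine tones: a crude stand-in for a real recording."""
    samples = []
    for name, duration in melody:
        count = int(SAMPLE_RATE * eighth_seconds * duration)
        freq = PITCHES[name]
        samples.extend(math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(count))
    return samples


def to_label(melody):
    """Write the ground truth as an ABC-like string, e.g. 'C2 D E2'."""
    return " ".join(name + (str(duration) if duration != 1 else "") for name, duration in melody)


def make_pair(seed):
    """One training pair: (audio samples, symbolic label), reproducible from the seed."""
    rng = random.Random(seed)
    melody = random_melody(rng)
    return render_audio(melody), to_label(melody)


if __name__ == "__main__":
    audio, label = make_pair(seed=0)
    print(label, len(audio))
```

The appeal is that the label is correct by construction, since it is generated before the rendering. The obvious weakness is the realism gap between synthetic renderings and real performances or real scans.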
On the topic of stem separation, I've wondered about creating a quasi-synthetic dataset by taking chunks of recordings by real musicians, playing them back in a real space in various combinations, and recording the resulting analog-blended cacophony. Could repeat in various environments like cathedrals, basement bars, etc for realism :-)
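A purely digital stand-in for that idea can be sketched in a few lines, with the caveat that it swaps the physical room for a toy impulse response, which is exactly the realism the real-space version is meant to capture. Everything here is an assumption for illustration: the function names (`convolve`, `toy_room`, `mix_example`), the gain range, and the decaying-noise "room".

```python
import random


def convolve(signal, impulse):
    """Direct-form convolution; fine for short clips, slow for long ones."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out


def toy_room(rng, taps=50, decay=0.9):
    """Fake impulse response: direct sound, then exponentially decaying random echoes."""
    return [1.0] + [rng.uniform(-1.0, 1.0) * decay ** k for k in range(1, taps)]


def mix_example(stems, rng):
    """Blend stems with random gains, pass the blend through the toy room,
    and return (mixture, per-stem targets) as one separation training example."""
    gains = [rng.uniform(0.3, 1.0) for _ in stems]
    targets = [[g * x for x in stem] for g, stem in zip(gains, stems)]
    length = max(len(t) for t in targets)
    dry = [0.0] * length
    for target in targets:
        for i, x in enumerate(target):
            dry[i] += x
    mixture = convolve(dry, toy_room(rng))
    return mixture, targets


if __name__ == "__main__":
    rng = random.Random(1)
    stems = [[1.0, 0.0, 0.0], [0.0, 1.0]]  # two tiny placeholder "recordings"
    mixture, targets = mix_example(stems, rng)
    print(len(mixture), [len(t) for t in targets])
```

Note that the targets here are the dry, gain-scaled stems. Whether a separation model should be trained to recover dry stems or the reverberant ones is a design choice this sketch does not settle.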
A salient extract:
...but why is it so complicated? A novice interpretation of "music" is "a bunch of notes!" ... my amateur interpretation of "music" is "layers of notes".
You can either spam 100 notes in a row, or you effectively end up with:
melody = [ a, b, [c+d], e, ... ]
bassline = [ b, _, b, _, ... ]
music = melody + bassline
score = [
"a bunch of helper text",
+ melody,
+ bassline,
+ page_size, etc...
]
...so Lilypond basically made "Tex4Music", and the format serves a few dual purposes... [snip] You can look at samples of Hal Leonard's Real Book(s) on their website to get a sense of what it looks like. Again, just an aesthetic preference, but one I and many others hold nonetheless.