So, the format used by musicologists and music researchers is MEI (https://music-encoding.org/), and its reference engraver is Verovio (https://www.verovio.org/index.xhtml). Verovio can engrave to SVG while keeping as much information as possible from the original MEI score, which means you can extract enough metadata to build an actual detection dataset for a deep learning model. This is my horribly hacked-up script that creates a COCO dataset from a set of scores engraved with Verovio: https://github.com/kwon-young/music/blob/main/svg2pl.py

I have published a synthetic music score dataset built with it: https://www.kaggle.com/datasets/kwonyoungchoi/trompa-coco/da... If anyone wants to try fitting a detector on top, you're welcome to :)

The reason OMR is so neglected is that most people vastly underestimate the difficulty of the task. It combines an extreme variety of shapes with an extremely complicated graphical grammar...
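
For anyone who hasn't worked with COCO detection annotations, here is a minimal sketch of the target format only. It assumes the symbol bounding boxes have already been pulled out of the Verovio SVG (that geometry step is what the linked script does and is not reproduced here); the `pages` structure, the `to_coco` name and the example class labels are hypothetical.

```python
import json

def to_coco(pages, class_names):
    """Build a COCO-style detection dict.

    `pages` is a hypothetical list of dicts:
      {"file_name": str, "width": int, "height": int,
       "symbols": [(class_name, x, y, w, h), ...]}
    with boxes in pixels (top-left corner, width, height).
    """
    categories = [{"id": i + 1, "name": name} for i, name in enumerate(class_names)]
    cat_id = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []
    ann_id = 1
    for img_id, page in enumerate(pages, start=1):
        images.append({"id": img_id, "file_name": page["file_name"],
                       "width": page["width"], "height": page["height"]})
        for name, x, y, w, h in page["symbols"]:
            annotations.append({
                "id": ann_id, "image_id": img_id, "category_id": cat_id[name],
                "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h, "iscrowd": 0,
            })
            ann_id += 1
    return {"images": images, "annotations": annotations, "categories": categories}

# Illustrative input: one page with two symbols.
pages = [{"file_name": "page_001.png", "width": 2480, "height": 3508,
          "symbols": [("noteheadBlack", 410, 620, 22, 18), ("gClef", 120, 580, 60, 150)]}]
coco = to_coco(pages, ["gClef", "noteheadBlack"])
print(json.dumps(coco, indent=2))
```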

by peatmoss, 4 hours ago

Thank you for this! Both MEI format and the Verovio engraver are news to me. I will check them out.

My first thought was whether MEI format is being added to MuseScore (the sheet music editor I use these days). It looks like it is: https://music-encoding.org/musescore-doc/

As a somewhat related aside, now that the MuseScore people own Hal Leonard and seem to be pushing integration with their cloud subscription service, I wonder if they'll see some of these directions as competing with them. I don't think there's anyone who wouldn't love a transposable, clean digital version of their Real Books... and if Hal Leonard is in the business of selling Real Books, I can see where good OMR might be a problem for them. I guess piracy of scanned versions is already rampant, so maybe it's a wash.

by indiv0, 5 hours ago

> music is basically a greenfield for AI wherever you look

AIN'T THAT THE TRUTH.

My girlfriend is studying musicology and she has some physical disabilities that make it difficult for her to write things down sometimes. So I try to help her by writing some AI-powered TTS/OCR/etc. apps here and there. It becomes painfully obvious that music was never considered an important part of any AI training dataset, anywhere.

These days, I'm pleasantly surprised by how well Opus 4.8 understands/explains music theory (as you said). But ask him to transcribe/OCR/OMR some sheet music and he'll confidently give you the MusicXML/Lilypond equivalent of "2 + 2 = horse".

I really hope this ignored area will be swept up with the rest of the rising AI wave, but it's still criminally undervalued.

by mejutoco, 1 hour ago

> how well Opus 4.8 understands [...] and he'll confidently

I always think of the nun character against AI in Mrs Davis:

> "Don't give it a name. No one calls Facebook Doug. No one calls Twitter Mary Lou. No one calls them anything, because no one uses them anymore. They use it, and it's not a person. It's code." (Mrs. Davis)

by mft_, 9 minutes ago

Eh, humans give names to and/or anthropomorphise lots of things. My partner names all of her cars and bikes; I don't. Isn't it more rational to feel some sort of connection and anthropomorphise a tool with which you can at least have an intelligent conversation, than a simple machine?

by peatmoss, 4 hours ago

I recently left a job where I worked with open data producers and providers across a lot of domains. A lot of data is produced and released for free by governments and nonprofits because releasing it is either directly part of the mission or a natural byproduct of it. Occasionally, really great datasets would come out of industry or commercial organizations, because the data were a byproduct and releasing them didn't open the door to competition.

I've been thinking about what kind of organization could be self-sustaining and also produce good music AI training data as a natural byproduct. An ideal arrangement would be something that gives musicians some incentive or benefit in exchange for their recorded interpretations of sheet music. Soundslice, mentioned by another user, seems to do that. They let both teachers and students upload recordings of music that has been turned into MusicXML. The recordings, paired with those snippets of sheet music, have to be a gold mine, assuming they have enough users. If they aren't already working on stem separation and automatic transcription, they probably should be. Still, my hope would be to figure out some kind of sustainable model where that dataset could be created and released for open model development...

As a domain, I see AI in music as a boon to human creativity. I am very much a novice jazz improvisor, and a passable amateur technician on the trombone. Human instructors can do a lot for me, but there's a lot that is "grinding it out" repetition, where I think AI could be a huge aid. I heard Sam Harris on a podcast recently talk about his bullishness on the humanities (paraphrasing: people don't care if a human reads their MRI if detection is good, but people probably do care that a human wrote the novel they're reading).

Music might even be a better example of the irreplaceability of people. While some people might bop along to a tune composed by Suno on the radio, live music is just so much more enjoyable for me. And even better than listening to a live show played by masters is playing together with friends. To the extent that AI can patiently help us learn the skills to express our own creativity, I'm here for it!

by elasticdog, 3 hours ago

For just chord analysis, there's "Harte notation", which is meant to be an unambiguous representation of the notes in a chord (https://ismir2005.ismir.net/proceedings/1080.pdf). That obviously doesn't get you all of the additional information necessary for engraving and full representation of the music, but there are research datasets that use it, like https://github.com/smashub/choco. I've also used the https://github.com/MarkGotham/When-in-Rome dataset for some analysis work, but again, that's not 100% what you're looking for.
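
To show what "unambiguous" buys you, here is a small sketch that turns a subset of Harte-style labels into pitch classes. It handles only a handful of shorthands plus the degree-list, omission (`*`) and slash-bass syntax as read from the paper linked above; `parse_harte` is a hypothetical helper, not a function from any published library, and real use should be checked against the paper's full grammar.

```python
import re

LETTER_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Semitones above the root for each unaltered degree 1..13.
DEGREE_SEMITONES = {1: 0, 2: 2, 3: 4, 4: 5, 5: 7, 6: 9, 7: 11,
                    8: 12, 9: 14, 10: 16, 11: 17, 12: 19, 13: 21}

# A few of the paper's shorthands; the full table is larger.
SHORTHANDS = {
    "maj": ["1", "3", "5"], "min": ["1", "b3", "5"],
    "dim": ["1", "b3", "b5"], "aug": ["1", "3", "#5"],
    "maj7": ["1", "3", "5", "7"], "min7": ["1", "b3", "5", "b7"],
    "7": ["1", "3", "5", "b7"], "dim7": ["1", "b3", "b5", "bb7"],
    "hdim7": ["1", "b3", "b5", "b7"], "sus4": ["1", "4", "5"],
}

# root [":" shorthand] ["(" degree-list ")"] ["/" bass-degree]
LABEL_RE = re.compile(r"([A-G])([#b]*)(?::([^(/]*))?(?:\(([^)]*)\))?(?:/(.+))?")

def degree_semitones(degree):
    """'b3' -> 3, '#5' -> 8, 'bb7' -> 9."""
    m = re.fullmatch(r"([#b]*)(\d+)", degree)
    if m is None or int(m.group(2)) not in DEGREE_SEMITONES:
        raise ValueError(f"unsupported degree: {degree!r}")
    mods, num = m.groups()
    return DEGREE_SEMITONES[int(num)] + mods.count("#") - mods.count("b")

def parse_harte(label):
    """Return (root_pc, sorted pitch classes, bass_pc), or None for 'N' (no chord)."""
    if label == "N":
        return None
    m = LABEL_RE.fullmatch(label)
    if m is None:
        raise ValueError(f"unsupported label: {label!r}")
    letter, accidentals, shorthand, degree_list, bass = m.groups()
    root = (LETTER_PC[letter] + accidentals.count("#") - accidentals.count("b")) % 12
    if shorthand is None:
        degrees = list(SHORTHANDS["maj"])  # a bare root means a major triad
    elif shorthand == "":
        degrees = []                       # e.g. "C:(1,b3,5)": degrees only
    elif shorthand in SHORTHANDS:
        degrees = list(SHORTHANDS[shorthand])
    else:
        raise ValueError(f"shorthand not covered by this sketch: {shorthand!r}")
    for item in filter(None, (d.strip() for d in (degree_list or "").split(","))):
        if item.startswith("*"):
            degrees = [d for d in degrees if d != item[1:]]  # "*3" omits the 3rd
        else:
            degrees.append(item)
    pcs = sorted({(root + degree_semitones(d)) % 12 for d in degrees})
    bass_pc = root if bass is None else (root + degree_semitones(bass)) % 12
    return root, pcs, bass_pc

print(parse_harte("A:min7"))     # (9, [0, 4, 7, 9], 9)
print(parse_harte("G:7/3"))      # (7, [2, 5, 7, 11], 11)
print(parse_harte("C:maj(*3)"))  # (0, [0, 7], 0)
```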

You might like the "iReal Pro" app for replacing your Real Books and transposing jazz standards on your tablet. It's pretty great for that use case compared to camera scans.
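
iReal Pro handles transposition for you; purely as an illustration of what it involves, here is a sketch that shifts the root and slash-bass of a lead-sheet chord symbol. It assumes simple symbols (root, quality text, optional `/bass` note) and always spells results with flats, which a real tool would not do: it would choose spellings from the target key. `transpose_chord` is a hypothetical name.

```python
import re

NOTE_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
FLAT_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

# root, quality (may itself contain "/", as in "C6/9"), optional slash-bass note
CHORD_RE = re.compile(r"([A-G][#b]?)(.*?)(?:/([A-G][#b]?))?")

def note_pc(name):
    """Pitch class of a note name like 'F#' or 'Bb'."""
    pc = NOTE_PC[name[0]]
    if name[1:] == "#":
        pc += 1
    elif name[1:] == "b":
        pc -= 1
    return pc % 12

def transpose_chord(symbol, semitones):
    """Transpose a chord symbol, keeping the quality text unchanged."""
    m = CHORD_RE.fullmatch(symbol)
    if m is None:
        raise ValueError(f"unrecognized chord symbol: {symbol!r}")
    root, quality, bass = m.groups()
    out = FLAT_NAMES[(note_pc(root) + semitones) % 12] + quality
    if bass is not None:
        out += "/" + FLAT_NAMES[(note_pc(bass) + semitones) % 12]
    return out

print(transpose_chord("Bb7", 2))   # C7
print(transpose_chord("C/E", 3))   # Eb/G
print(transpose_chord("C6/9", 5))  # F6/9
```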

by singpolyma3, 6 hours ago

What about sheet music typesetting formats like https://abcnotation.com/ ?

by peatmoss, 5 hours ago

I forgot to mention ABC. I have seen a few LLM projects use it. There was a model/paper published a couple of years back called ChatMusician that was built around it.

With the caveat that I'm not terribly fluent in ABC, it seems to me that simple things are simple, but hard things seem to be nearly pathological. And (again, maybe a lapse in my understanding) it seems like there may be a fair number of concepts that are impossible to convey in ABC?

Lastly, if I understand correctly, ABC got its start and is mostly popular as a simplified format for church songbooks. I'd imagine that would, uh, influence the training corpora towards sounding a bit... church songbooky.

EDIT: I may have been overly dismissive of ABC at first glance. It does seem like people have extended it quite a bit, and that it's at least in theory capable of encoding most of what I'd expect. And it's human-readable, which is a benefit. Though readability does take a stiff penalty the more richness you add (e.g. dynamics, articulations, stacked notes).

by WhitneyLand, 4 hours ago

The simplicity is really cool.

To let LLMs compose music, I chose JSON for context efficiency, but this seems like it could be a better choice: simple, efficient, and already a real format.

https://github.com/whitneyland/riffmcp
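
To make the context-efficiency comparison concrete, here is the same one-bar melody written as ABC and as a JSON note list. The JSON schema is invented for illustration and is not riffmcp's actual format, and character count is only a rough proxy for token count.

```python
import json

# One bar of 4/4 in C major: C D E F as quarter notes (L:1/4 sets the default length).
abc = "X:1\nM:4/4\nL:1/4\nK:C\nCDEF|"

# Hypothetical JSON encoding of the same bar (not riffmcp's real schema).
notes = [{"pitch": p, "duration": 0.25} for p in ["C4", "D4", "E4", "F4"]]
as_json = json.dumps({"time": "4/4", "key": "C", "notes": notes})

print(len(abc), len(as_json))
```

The gap grows with every note, since each JSON note repeats its field names while ABC spends roughly one character per pitch.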

by genxy, 5 hours ago

Create a benchmark for this problem that researchers can easily run and the problem will solve itself.
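
One thing that makes a benchmark easy to run is a single scalar metric. A metric often used in end-to-end OMR research is symbol error rate: edit distance between predicted and reference symbol sequences, divided by reference length. A sketch, assuming each page or staff has already been serialized to a token sequence (the token names below are illustrative only):

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

def symbol_error_rate(predictions, references):
    """Total edit distance divided by total reference length over a test set."""
    errors = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    return errors / sum(len(r) for r in references)

ref = ["clef-G2", "keySignature-DM", "note-D4_quarter", "note-F#4_quarter"]
pred = ["clef-G2", "note-D4_quarter", "note-F#4_eighth"]
print(symbol_error_rate([pred], [ref]))  # 0.5: a missed key signature plus a wrong duration
```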

by mcbetz, 6 hours ago

I've been following the music OCR space, and the only really good solution so far is Soundslice. You scan, review a few edge cases, and get really good results. It's a paid service from a small company and very worthy of support!

by peatmoss, 4 hours ago

I just signed up for a trial and uploaded a messy Real Book scan. It did very well! It missed the coda markings, but then again, the directive in the Real Book was nonstandard. I guess that's a case where a multimodal model might have been able to read the text ("after solos, D.C. al coda") and do something smarter.
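
"Doing something smarter" with a D.C. al Coda mostly means unrolling the form into a playback order. A simplified sketch under stated assumptions: measures are 0-based, the D.C. is written at the end of the measure just before the coda, the "To Coda" sign sits at the end of measure `to_coda`, and internal repeats, endings and solo choruses are ignored. `dc_al_coda` is a hypothetical helper.

```python
def dc_al_coda(n_measures, to_coda, coda_start):
    """Playback order (0-based measure indices) for a chart ending in D.C. al Coda."""
    first_pass = list(range(coda_start))         # play the form up to the D.C.
    da_capo = list(range(to_coda + 1))           # back to the top, up to "To Coda"
    coda = list(range(coda_start, n_measures))   # jump to the coda and finish
    return first_pass + da_capo + coda

# 6 measures: main form is measures 0-3, "To Coda" after measure 1, coda is 4-5.
print(dc_al_coda(6, 1, 4))  # [0, 1, 2, 3, 0, 1, 4, 5]
```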

by WhitneyLand, 6 hours ago

“there aren't great corpora of training data that would connect a MusicXML representation to sheet music images or to audio”

It may not be necessary... a lot of the training pairs/data for this could probably be created procedurally via code.

Would be pretty fun to work on and see it come to life.
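
As a rough sketch of the procedural idea: generate a random melody, then emit the same notes as minimal MusicXML and as MIDI note numbers. Rendering the MusicXML to an image (e.g. with an engraver) and the MIDI to audio (with a synth) would complete the pair, but is not shown. The generator, its C-major/quarter-note restriction and all function names are illustrative assumptions.

```python
import random
import xml.etree.ElementTree as ET

STEP_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def random_melody(n_measures, seed=None):
    """Hypothetical generator: four random C-major quarter notes per measure, octaves 4-5."""
    rng = random.Random(seed)
    return [(rng.choice("CDEFGAB"), rng.choice([4, 5])) for _ in range(4 * n_measures)]

def to_musicxml(melody):
    """Minimal partwise MusicXML: one part, 4/4, treble clef, quarter notes only."""
    score = ET.Element("score-partwise", version="4.0")
    score_part = ET.SubElement(ET.SubElement(score, "part-list"), "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Melody"
    part = ET.SubElement(score, "part", id="P1")
    for start in range(0, len(melody), 4):
        measure = ET.SubElement(part, "measure", number=str(start // 4 + 1))
        if start == 0:
            attrs = ET.SubElement(measure, "attributes")
            ET.SubElement(attrs, "divisions").text = "1"  # one division per quarter note
            ET.SubElement(ET.SubElement(attrs, "key"), "fifths").text = "0"
            time_sig = ET.SubElement(attrs, "time")
            ET.SubElement(time_sig, "beats").text = "4"
            ET.SubElement(time_sig, "beat-type").text = "4"
            clef = ET.SubElement(attrs, "clef")
            ET.SubElement(clef, "sign").text = "G"
            ET.SubElement(clef, "line").text = "2"
        for step, octave in melody[start:start + 4]:
            note = ET.SubElement(measure, "note")
            pitch = ET.SubElement(note, "pitch")
            ET.SubElement(pitch, "step").text = step
            ET.SubElement(pitch, "octave").text = str(octave)
            ET.SubElement(note, "duration").text = "1"
            ET.SubElement(note, "type").text = "quarter"
    return ET.tostring(score, encoding="unicode")

def to_midi(melody):
    """MIDI note numbers for the same notes (C4 = 60)."""
    return [12 * (octave + 1) + STEP_PC[step] for step, octave in melody]

melody = random_melody(2, seed=1)
xml_text, midi = to_musicxml(melody), to_midi(melody)
```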

by peatmoss, 5 hours ago

I'd imagine that rendered audio that just used midi voices (even high quality "Real Instruments" midi voices) would be pretty brittle for e.g. stem separation or automatic transcription. In a best case, I think you'd start with a clean digital representation, render sheet music imagery, and then have lots of recordings by a bunch of real instrumentalists playing the same music.

On the topic of stem separation, I've wondered about creating a quasi-synthetic dataset by taking chunks of recordings by real musicians, playing them back in a real space in various combinations, and recording the resulting analog-blended cacophony. You could repeat this in various environments like cathedrals, basement bars, etc. for realism :-)
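
The physical re-recording can't be done in code, but its digital cousin, a common augmentation for source separation, can: mix a random subset of isolated stems with random gains and optionally convolve each with a room impulse response. A plain-Python sketch; the gain range and function names are arbitrary assumptions, and real pipelines would use NumPy/SciPy and measured impulse responses.

```python
import random

def convolve(signal, impulse):
    """Direct-form convolution; output length is len(signal) + len(impulse) - 1."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out

def make_mixture(stems, impulse=None, seed=None):
    """Mix a random subset of equal-length stems with random gains.

    Returns (mixture, targets); each target is a chosen stem after the same
    gain and room response, so the targets sum exactly to the mixture.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(stems), k=rng.randint(1, len(stems)))
    targets = {}
    for name in chosen:
        gain = rng.uniform(0.3, 1.0)
        wet = [gain * x for x in stems[name]]
        targets[name] = convolve(wet, impulse) if impulse else wet
    length = len(next(iter(targets.values())))
    mixture = [sum(t[i] for t in targets.values()) for i in range(length)]
    return mixture, targets

stems = {"bass": [1.0, 0.0, -1.0, 0.0], "drums": [0.5, 0.5, 0.5, 0.5],
         "piano": [0.0, 1.0, 0.0, -1.0]}
mix, targets = make_mixture(stems, impulse=[1.0, 0.4, 0.1], seed=0)
```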

by ramses0, 4 hours ago

So I made a comment a while back about lilypond: https://news.ycombinator.com/item?id=46148831

A salient extract:

...but why is it so complicated? A novice interpretation of "music" is "a bunch of notes!" ... my amateur interpretation of "music" is "layers of notes".

You can either spam 100 notes in a row, or you effectively end up with:

    melody   = [ a, b, [c+d], e, ... ]
    bassline = [ b, _, b,     _, ... ]
    music = melody + bassline
    score = [
       "a bunch of helper text",
       + melody,
       + bassline,
       + page_size, etc...
    ]

...so Lilypond basically made "Tex4Music", and the format serves a few dual purposes...[snip]
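
The "layers of notes" idea corresponds to simultaneous voices, which LilyPond writes with `<< ... >>`. A small runnable take on the pseudocode above: each layer is a list of (pitches, duration) events, and stacking layers collects what starts at each onset. The data layout is an illustrative assumption, not LilyPond's internal model.

```python
from collections import defaultdict
from fractions import Fraction

def onsets(layer):
    """Map a layer of (pitches, duration) events to {onset: pitches}; rests are ()."""
    t, result = Fraction(0), {}
    for pitches, duration in layer:
        if pitches:
            result[t] = tuple(pitches)
        t += Fraction(duration)
    return result

def combine(*layers):
    """Stack layers: collect every pitch that starts at each onset, in time order."""
    merged = defaultdict(list)
    for layer in layers:
        for t, pitches in onsets(layer).items():
            merged[t].extend(pitches)
    return dict(sorted(merged.items()))

q = Fraction(1, 4)
melody = [(("a",), q), (("b",), q), (("c", "d"), q), (("e",), q)]  # ("c", "d") is a chord
bassline = [(("b",), 2 * q), (("b",), 2 * q)]                      # two half notes
music = combine(melody, bassline)
```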

by aidenn0, 4 hours ago

As someone who has never looked at a jazz score, can you share an example of how jazz sheet music would benefit from different fonts?

by peatmoss, 4 hours ago

It's just an entrenched aesthetic preference. Jazz fonts (fonts in this context refers both to the words and the music symbols) tend to be quite heavy with thick lines. I've heard that the thick hand-written style was originally to make charts more readable in dimly lit clubs, but with tablets and such, that's an anachronism now.

You can look at samples of Hal Leonard's Real Book(s) on their website to get a sense of what it looks like. Again, just an aesthetic preference, but one I and many others hold nonetheless.

by elasticdog, 3 hours ago

I also don't love the conventional handwritten aesthetic you often see for jazz fonts. For a project I've been working on, I ended up pulling the handful of chord symbol glyphs out of MuseScore's Leland Text font and adjusting them for use in the UI since I couldn't find a suitable option out there.