I'm interested to know about the ML approaches that you tried and then decided not to use. In practice, there are so many options. How did you come up with the final approach, and was there a systematic way to decide which options to go for?
reply
Outstanding work! I participated in the challenge, but didn't get far. One of the questions I had at the time was: if I'm going to use ML to detect ink, could it hallucinate letters, or even parts of the text, and how would you prevent that?
reply
Yes, it's quite possible for ML to hallucinate ink, though it happens on a much more local scale, like predicting a slightly longer stroke or filling in more of a character than is actually in the data. That is perhaps enough to change the reading of a character or to show ink where there isn't any. It is difficult for ink detection to hallucinate grammatical and idiomatic Greek and Latin.
reply
What is the input to the ML algorithm? Does it know the surrounding context so that it has a chance to deduce "if this stroke is slightly longer then the end result will be idiomatic Greek and Latin"?
reply
The input is 3D chunks of reconstructed CT data from our scans. I can't remember the specifics, but maybe enough voxels for 0.5 mm^3 at a time or so? They're all available for free from https://registry.opendata.aws/vesuvius-challenge-herculaneum... . Our trained models are all available at https://huggingface.co/scrollprize
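
For anyone wondering what "3D chunks" means in practice, here is a minimal sketch of cutting a volume into cubes with NumPy. It is not our actual pipeline: the function name `iter_chunks`, the synthetic volume, the 8 micrometre voxel pitch, and the 64-voxel cube size are all assumptions chosen for illustration.

```python
import numpy as np


def iter_chunks(volume, size):
    """Yield (origin, chunk) pairs of non-overlapping size^3 cubes from a 3D array."""
    depth, height, width = volume.shape
    for z in range(0, depth - size + 1, size):
        for y in range(0, height - size + 1, size):
            for x in range(0, width - size + 1, size):
                yield (z, y, x), volume[z:z + size, y:y + size, x:x + size]


# Synthetic stand-in for a reconstructed CT volume (real scans are far larger).
rng = np.random.default_rng(0)
volume = rng.integers(0, 65536, size=(128, 128, 128), dtype=np.uint16)

# Assumption: at a hypothetical 8 micrometre voxel pitch, 0.5 mm spans
# roughly 62 voxels, rounded up to 64 here for convenience.
chunks = list(iter_chunks(volume, 64))
print(len(chunks), chunks[0][1].shape)  # 8 (64, 64, 64)
```

Each chunk is a view into the original array, so nothing is copied until a model actually consumes it.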
reply
Not all machine learning is generative AI.
reply
True, but as with regular document scanning software, there can be errors in detection.
reply
Just as with redacted documents (consistently blocked terms) or bad OCR jobs (wrong or missing characters), even if only a certain percentage comes out unmangled it is more readable than having no data at all.

A stable base corpus and some dynamic programming will allow you to clean up the remainder[0].

[0]: http://stackoverflow.com/a/11642687/2449774
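
To make the dynamic programming part concrete, here is a toy sketch of segmenting an unspaced string against a known vocabulary, which is how I read the idea behind [0]. Everything in it is an assumption for illustration: the tiny English vocabulary, the rank-based word costs, and the flat penalty of 10 per character for pieces that are not in the vocabulary.

```python
import math


def segment(text, word_cost, max_len):
    """Split `text` into the lowest-cost sequence of pieces.

    Known words cost `word_cost[word]`; any other piece costs 10 per
    character, so damaged stretches survive as short unknown pieces.
    """
    # best[i] holds (total cost, start index of last piece) for text[:i].
    best = [(0.0, 0)]
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            cost = word_cost.get(piece, 10.0 * len(piece))
            candidates.append((best[j][0] + cost, j))
        best.append(min(candidates))

    # Walk the stored start indices backwards to recover the pieces.
    words = []
    i = len(text)
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return words[::-1]


# Hypothetical vocabulary, most frequent first; rarer words cost more.
vocab = ["the", "scroll", "is", "read", "able", "readable"]
word_cost = {word: math.log(rank + 2) for rank, word in enumerate(vocab)}
max_len = max(len(word) for word in vocab)

print(segment("thescrollisreadable", word_cost, max_len))
# ['the', 'scroll', 'is', 'readable']
```

A real corpus would supply the vocabulary and frequencies; the recurrence stays the same.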

reply
Absolutely incredible work. This is one of the most amazing news articles I’ve encountered in decades. Congratulations team!
reply
What are the wildest, most exciting but plausible things that might be discovered in these documents?
reply
I am not a papyrologist or a classicist but a computer scientist, so my expertise is unfortunately not in _what_ the scrolls say but in how we get there. That being said, I think and hope that there will be a trove of things that have no known provenance at all, completely lost works that elude the public memory.
reply
Your response reminds me of Nigel Richards :)

https://en.wikipedia.org/wiki/Nigel_Richards

Congratulations, and thank-you!

reply
Well what were your first thoughts when you decoded the script, besides the obvious Eureka, after making some sense of the texts?
reply
Probably a lot more texts of Epicurean philosophy and not a whole lot else, unfortunately, according to my papyrologist friend.
reply
That's what was thought, but maybe not -- only one of the three so far looks Epicurean, which is not what was expected. Maybe it's a fluke, but historians are buzzing a bit about whether it might be broader than expected.
reply
Why would Epicurean philosophy be unfortunate?

I was under the impression that there was almost nothing left of that school of thought, and that its writings had been destroyed.

What would you like to have instead?

reply
The unfortunate part is the lack of anything else therein, not that it's Epicurean philosophy.
reply
The Jewish Talmud uses Epicurus's name as a term meaning "heretic".
reply
Here's a list. The scrolls are from a library that burned in 79 AD.

https://en.wikipedia.org/wiki/List_of_lost_literary_works

reply
Woah there was a lost Homer epic comedy about a bumbling fool named Margites?
reply
How do you get to do that? As in, what did you study to get the prerequisite knowledge, and how did you find this particular job? When I see interesting jobs I'm always curious what path led there.
reply
I am a computer scientist. I studied CS in university, worked in the semiconductor industry for a while, and got started as a participant in the challenge aspect of the Vesuvius Challenge. They were hiring, I sent in an application, interviewed, and was offered the job.
reply
That last sentence is so perfect, like my dad answering the question of how he lost weight. "I ate less and exercised more."
reply
Amazing work, fantastic!
reply
No questions, but I just want to say this is really exciting work!
reply
Given the current rate of progress, how long do you think it will take to decipher the entire collection?
reply
That's a tough one to give a strong estimate of. Some scrolls are easier or harder to unwrap and read for a multitude of different reasons, mostly due to how damaged the scroll was in the eruption, and how easy or not the ink is to read. IIRC from what we've scanned of the Herculaneum collection, none of the ink is easily visible via spectrum alone, so we have to use a lot of ML and physically based rendering techniques to be able to find ink. That also requires unwrapping and segmentation _before_ any ink detection.

For iron gall ink with a high enough iron concentration, the ink stands out in the X-ray volume simply by masking off low values, as was shown in our campfire scroll experiment a few years ago. No Herculaneum scrolls show similar ink.
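
As a rough illustration of "masking off low values", here is a minimal NumPy sketch on a synthetic volume. The intensities, the threshold of 200, and the helper name `mask_low_values` are made up for the example and are not taken from the campfire scroll data.

```python
import numpy as np


def mask_low_values(volume, threshold):
    """Zero out every voxel below `threshold`, keeping only the dense ones."""
    return np.where(volume >= threshold, volume, 0)


# Synthetic volume: faint background standing in for papyrus.
rng = np.random.default_rng(0)
volume = rng.integers(0, 100, size=(16, 16, 16)).astype(np.uint16)

# Hypothetical dense ink stroke: an 8 x 8 patch on one layer.
volume[8, 4:12, 4:12] = 400

masked = mask_low_values(volume, threshold=200)
print(np.count_nonzero(masked))  # 64
```

With carbon-based ink there is no such intensity gap, so a single threshold like this would not separate ink from papyrus.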

reply
Do you think this particular scroll is easier or harder to read than the others will be? Or about average?
reply
Pherc1667 was quite small and just so happened to have readable ink, so it was easier than I expect most others to be.
reply
Do we know what ink was used?
reply
Most of the evidence so far points towards carbon-based ink. I am not sure if any of the scrolls we have scanned show strong evidence of iron-gall-based ink. I know that there are different types and preparation methods for different carbon-based inks, but I do not know if it is possible to determine which kind(s) were used solely from inspecting the X-rays.

I am, though, not a papyrologist, so historical ink making, preparation, and usage are not my field.

reply
This is outstanding. By many means!
reply
Did anyone on the team come from a non-science, non-math, non-academia background? Did anyone working on this just teach themselves and start contributing?
reply
Yes. Sean, who was a co-winner of the 2024 prize, IIRC has no formal background in ML, computer science, AI, etc. He is one of our core researchers and the most productive team member.
reply
I've been on the Discord for a couple of years now, and poking around with submissions as well. Sean and the entire team deserve so much praise for all of this work.

It's easy to just read about the breakthrough and see it as one neat, linear line to get there, and hard to comprehend the hours, months and years that so many spent to get there. Big congrats to you, Sean, Nat and the entire team!

reply
Are the fragments destroyed in ‘69 and ‘80 available to be read similarly? Or were they disposed of?
reply
I am unaware of those fragments in particular. We have, though, scanned a dozen or so fragments, mostly to help guide ink detection, since the ink in them is often more visible in visible and/or near-IR light but can be hard to impossible to detect in the X-ray spectrum.
reply
I don't have any questions, just a comment.

You have the potential to rewrite the history of European Antiquity quite substantially. The Herculaneum set of scrolls is enormous and must contain a lot of hitherto unknown material.

That comes with a set of peculiar risks. Once your work starts producing something that contradicts previous work of Very Important People, they will lobby to stop you. Be prepared for that.

Science should be neutral and always value new evidence. Scientists as humans are unfortunately subject to all sorts of passions.

reply