[-]

Outstanding work! I've participated in the challenge, but didn't get far. One of the questions I had at the time was - if I'm going to use ML to detect ink, could it invent hallucinated letters, or even parts of text, and how to prevent that?

[-]

Yes, it's quite possible for ML to hallucinate ink, though on a much more local scale: predicting a slightly longer stroke, filling in more of a character than is actually in the data, etc. That is perhaps enough to change the reading of a character or to show ink where there isn't any. It is difficult for ink detection to hallucinate grammatical and idiomatic Greek and Latin.

by im3w1l56 minutes ago|

[-]

What is the input to the ML algorithm? Does it know the surrounding context so that it has a chance to deduce "if this stroke is slightly longer then the end result will be idiomatic Greek and Latin"?

by verditelabs52 minutes ago|

[-]

The input is 3D chunks of reconstructed CT data from our scans. I can't remember the specifics, but maybe enough voxels for 0.5 mm^3 at a time or so? They're all available for free from https://registry.opendata.aws/vesuvius-challenge-herculaneum... . Our trained models are all available at https://huggingface.co/scrollprize
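
To make "3D chunks" concrete, here is a minimal sketch of cutting a volume into fixed-size cubes. The chunk size, stride, and the tiny synthetic volume are assumptions for illustration only; `chunk_origins` and `extract_chunk` are hypothetical helper names, not part of the actual pipeline.

```python
from itertools import product


def chunk_origins(shape, size, stride):
    """Return the (z, y, x) corner of every full cubic chunk that fits in `shape`."""
    axes = [range(0, dim - size + 1, stride) for dim in shape]
    return list(product(*axes))


def extract_chunk(volume, origin, size):
    """Slice a size x size x size cube out of a nested-list volume."""
    z, y, x = origin
    return [
        [row[x:x + size] for row in plane[y:y + size]]
        for plane in volume[z:z + size]
    ]


# Tiny synthetic stand-in for a reconstructed CT volume (not real scan data).
volume = [[[z * 100 + y * 10 + x for x in range(4)] for y in range(4)] for z in range(4)]

origins = chunk_origins((4, 4, 4), size=2, stride=2)
first = extract_chunk(volume, origins[0], size=2)
```

Real volumes would be handled with an array library rather than nested lists; the index arithmetic stays the same.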

by cwnyth46 minutes ago|

[-]

Not all machine learning is generative AI.

by mc3244 minutes ago|

[-]

True, but as with regular document scanning software, there can be errors in detection.

by dleeftink26 minutes ago|

[-]

Just as with redacted documents (consistently blocked terms) or bad OCR jobs (wrong or missing characters), even if only a certain percentage comes out unmangled it is more readable than having no data at all.

A stable base corpus and some dynamic programming will allow you to clean up the remainder[0].

[0]: http://stackoverflow.com/a/11642687/2449774
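
As a rough illustration of the dynamic-programming idea, here is a minimal sketch that restores word boundaries in a run of characters using a ranked word list. The vocabulary, the `log(rank + 1)` cost, and the function name `segment` are assumptions made for this example, not taken from the scroll work.

```python
import math


def segment(text, word_cost):
    """Split `text` into the cheapest sequence of known words (Viterbi-style DP)."""
    max_len = max(len(word) for word in word_cost)
    # best[i] holds (total cost, start index of the last word) for text[:i].
    best = [(0.0, 0)]
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(max(0, i - max_len), i):
            cost = word_cost.get(text[j:i], math.inf)
            candidates.append((best[j][0] + cost, j))
        best.append(min(candidates))
    if math.isinf(best[-1][0]):
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end to recover the words.
    words = []
    i = len(text)
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return words[::-1]


# Hypothetical base corpus, ordered from most to least frequent.
vocabulary = ["the", "of", "in", "is", "ink", "scroll"]
word_cost = {word: math.log(rank + 1) for rank, word in enumerate(vocabulary, start=1)}

result = segment("theinkisinthescroll", word_cost)
```

With a larger corpus, the same table can score partially damaged text by assigning a finite penalty to unknown pieces instead of infinity.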

by tomcam29 minutes ago|

[-]

Absolutely incredible work. This is one of the most amazing news articles I’ve encountered in decades. Congratulations team!

by adriand1 hours ago|

[-]

What are the wildest, most exciting but plausible things that might be discovered in these documents?

[-]

I am not a papyrologist or a classicist but a computer scientist, so my expertise is unfortunately not in _what_ the scrolls say but in how we get there. That being said, I think and hope that there will be a trove of things with no known provenance at all, completely lost works that elude the public memory.

by readthenotes110 minutes ago|

https://en.wikipedia.org/wiki/Nigel_Richards

[-]

Your response reminds me of Nigel Richards :)

Congratulations, and thank-you!

by arikrahman29 minutes ago|

[-]

Well what were your first thoughts when you decoded the script, besides the obvious Eureka, after making some sense of the texts?

by suddenlybananas1 hours ago|

[-]

Probably a lot more texts of Epicurean philosophy and not a whole lot else, unfortunately, according to my papyrologist friend.

by Matticus_Rex11 minutes ago|

[-]

That's what was thought, but maybe not -- only one of the three so far looks Epicurean, which is not what was expected. Maybe it's a fluke, but historians are buzzing a bit about whether it might be broader than expected.

by cwmoore58 minutes ago|

[-]

Why would Epicurean philosophy be unfortunate?

I was under the impression that there was almost nothing left of that school of thought, and that its writings had been destroyed.

What would you like to have instead?

by cwnyth45 minutes ago|

[-]

The unfortunate part is the lack of anything else therein, not that it's Epicurean philosophy.

by ogogmad9 minutes ago|

[-]

The Jewish Talmud uses Epicurus's name as a term meaning "heretic".

by colechristensen1 hours ago|

https://en.wikipedia.org/wiki/List_of_lost_literary_works

[-]

Here's a list. The scrolls are from a library that burned in 79 AD.

by kouru22530 minutes ago|

[-]

Woah there was a lost Homer epic comedy about a bumbling fool named Margites?

by tsol1 hours ago|

[-]

How do you get to do that? As in, what did you study to get the prerequisite knowledge, and how did you find this particular job? When I see interesting jobs I'm always curious what path led there.

[-]

I am a computer scientist. I studied CS in university, worked in the semiconductor industry for a while, and got started as a participant in the challenge aspect of the Vesuvius Challenge. They were hiring, I sent in an application, interviewed, and was offered the job.

by matneyx19 minutes ago|

[-]

That last sentence is so perfect, like my dad answering the question of how he lost weight. "I ate less and exercised more."

by BiraIgnacio40 minutes ago|

[-]

Amazing work, fantastic!

by TheOtherHobbes1 hours ago|

[-]

No questions, but I just want to say this is really exciting work!

by helterskelter1 hours ago|

[-]

Given the current rate of progress, how long do you think it will take to decipher the entire collection?

[-]

That's a tough one to give a strong estimate of. Some scrolls are easier or harder to unwrap and read for a multitude of reasons, mostly how damaged the scroll was in the eruption and how easy the ink is to read. IIRC, in what we've scanned of the Herculaneum collection, none of the ink is easily visible via spectrum alone, so we have to use a lot of ML and physically based rendering techniques to find ink. That also requires unwrapping and segmentation _before_ any ink detection.

For iron gall ink with a high enough iron concentration, the ink stands out in the X-ray volume through simply masking off low values, as was shown in our campfire scroll experiment a few years ago. No Herculaneum scrolls show similar ink.
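
"Masking off low values" amounts to a plain intensity threshold. A minimal sketch follows; the threshold and the toy voxel values are made up for illustration, and `mask_low_values` is a hypothetical name.

```python
def mask_low_values(volume, threshold):
    """Zero every voxel below `threshold`, keeping the brighter (denser) ones."""
    return [
        [[value if value >= threshold else 0 for value in row] for row in plane]
        for plane in volume
    ]


# Toy 2x2x2 volume: high values stand in for dense, iron-rich ink.
volume = [[[10, 200], [30, 250]], [[5, 40], [220, 15]]]

masked = mask_low_values(volume, threshold=100)
```

This only works when the ink is much denser than the papyrus, which is why it does not carry over to the Herculaneum scrolls.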

by pimlottc1 hours ago|

[-]

Do you think this particular scroll is easier or harder to read than the others will be? Or about average?

[-]

Pherc1667 was quite small and just so happened to have readable ink, so it was easier than I expect most others to be.

by superjan58 minutes ago|

[-]

Do we know what ink is used?

by verditelabs44 minutes ago|

[-]

Most of the evidence so far points towards carbon-based ink. I am not sure if any of the scrolls we have scanned show strong evidence of iron-gall-based ink. I know that there are different types and preparation methods for different carbon-based inks, but I do not know if it is possible to determine which kind(s) were used solely from inspecting the X-rays.

I am, though, not a papyrologist, so historical ink making, preparation, and usage are not my field.

by helterskelter1 hours ago|

[-]

Thanks!

by temp98726 minutes ago|

[-]

This is outstanding. By many means!

by echelon1 hours ago|

[-]

Did anyone on the team come from a non-science, non-math, non-academia background? Did anyone working on this just teach themselves and start contributing?

[-]

Yes. Sean, who was a co-winner of the 2024 prize, IIRC has no formal background in ML, computer science, AI, etc. He is one of our core researchers and the most productive team member.

by fintechjock1 hours ago|

[-]

I've been on the Discord for a couple of years now, and poking around with submissions as well. Sean and the entire team deserve so much praise for all of this work.

It's easy to just read about the breakthrough and see it as one neat, linear line to get there, and hard to comprehend the hours, months and years that so many spent to get there. Big congrats to you, Sean, Nat and the entire team!

by jimbob451 hours ago|

[-]

Are the fragments destroyed in ‘69 and ‘80 available to be read similarly? Or were they disposed of?

[-]

I am unaware of those fragments in particular, though we have scanned a dozen or so fragments, mostly to help guide ink detection, since their ink is often more visible in visible and/or near-IR light but can be hard or impossible to detect in the X-ray spectrum.

by inglor_cz1 hours ago|