Amateur may have cracked Linear A, a 120-year-old puzzle

(aiclambake.com)

158 points

by Kosturdistan 2 hours ago

by stratocumulus0 37 minutes ago

As an amateur who's been fascinated by this puzzle himself, I will add some context that might be relevant in assessing the plausibility of this claim:

- The "Libation Formula", which the author used as the base for his translations, is the most studied piece of writing in Linear A, because it's the only recurring phrase (with grammatical variation) that we have. The corpus is extremely fragmentary, with just a handful of instances of longer text (and even then, the texts are the length of an average sentence in English). The majority of documents available to us are lists (of inventory, personnel, offerings or something of this sort). The longer texts make use of punctuation marks, likely put in between words. This gives us a non-trivial vocabulary, which still does not match that of any known language.

- With such fragmentary remaining material, we cannot be sure that a) all the texts we call "Linear A" are written in the same language, and b) the recognizable words are not abbreviations, for example.

- The author made an assumption that Linear A symbols which have counterparts in Linear B should have the same phonetic values. This gives us an already known glyph that represented "NA". "Duplicate" glyphs are only found in the P-series, and are assumed to represent syllables which were distinguished by the Linear A language, but not by Greek - such as aspirated/unaspirated P. There is a glyph that stands for "NWA" in Linear B, but instances of it have been found in Linear A as well.

- There are countless words with no known etymology in Ancient Greek, assumed to originate from a substrate language or languages spoken in the area at the time Greeks migrated to their present-day homeland. The language of Linear A would be a likely candidate for such a substrate. If Linear A were a Semitic language, then we should already be able to establish Semitic etymologies for those words as they were in Greek. Of course it could also be the case that these words came from another language which did not adopt writing or whose writing did not survive to our times.

by vb-84481 minutes ago|

I wonder if LLMs trained specifically for this purpose can perform well with "forgotten languages".

I know I'm simplifying a lot, but isn't all this deciphering just some kind of pattern matching?

by Tuna-Fish 42 minutes ago

The reason Linear A is so difficult is that the total remaining corpus of Linear A text is ~7500 characters, spread out over ~1500 inscriptions.

If you have a 4k screen, you can fit all remaining Linear A text on your screen at once, in 14pt high font.
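
To put those figures in perspective, here is a back-of-the-envelope calculation. It only uses the two approximate numbers quoted above; nothing else is assumed.

```python
# Figures quoted in the comment above; both are approximate.
total_characters = 7500
inscriptions = 1500

# Average number of characters available per inscription.
average_length = total_characters / inscriptions
print(f"average characters per inscription: {average_length:.1f}")  # 5.0
```

An average of about five characters per inscription leaves very little context to work with in any single text.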

by stratocumulus0 19 minutes ago

And in addition to that, the vast majority of documents are lists which consist of a "header" (1 to 3 words) and word-number pairs afterwards. Another common class is small clay seals with 1 or 2 characters carved into them. It's likely that in both cases, we may be dealing with abbreviations.

Some of the lists end with "ku-ro" and a number that's the sum of all the previous numbers, oddly frequently off by one.
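
The totaling check described above could be sketched roughly as follows. This is only an illustration: `check_total` is a hypothetical helper name, and the words and numbers are made-up placeholders, not readings from any real tablet.

```python
from typing import List, Tuple


def check_total(entries: List[Tuple[str, int]], stated_total: int) -> int:
    """Return stated_total minus the sum of the entry numbers.

    0 means the stated total matches; +1 or -1 is the "off by one" case.
    """
    computed = sum(number for _word, number in entries)
    return stated_total - computed


# Hypothetical list: placeholder words and numbers, not real tablet data.
entries = [("word-1", 5), ("word-2", 12), ("word-3", 3)]
print(check_total(entries, 20))  # 0: the stated total matches
print(check_total(entries, 21))  # 1: the stated total is off by one
```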

by _kst_ 4 minutes ago

They hadn't yet decided whether to count from 0 or from 1.

by dehrmann 34 minutes ago

Very vaguely, it makes it like a one-time pad where it can be anything you want it to be. Not quite, but so little text leaves a lot of options open.

by WithinReason 35 minutes ago

As observed by archaeologist John Younger, the entire Linear A corpus takes up only 1.84 pages of letter paper when typeset in 12 point font and 1-inch margins.

by stringfood 34 minutes ago

when I first read the title thought he was talking about linear algebra and I was like damn it's not that hard

by Kosturdistan 2 hours ago

A lot of loonies make this claim, but Tom's work is credible enough that it's being reviewed by linguistics experts at Rutgers and Cambridge. Additional validation: his approach produces results. He's translated over 300 words, and that's never been done before, and his solution actually solves some problems in Linear B. Tom is an AI engineer, and Claude Code was key to his work. Disclosures: I know Tom socially, and I wrote the post at the link.

by kubb 1 hour ago

Let's wait until it's been verified.

by mikestorrent 54 minutes ago

You're absolutely right! We've opened a ticket with the Linear A folks, hopefully they'll get back to us soon with an update as to whether we've got it correct or not. Hang tight!

by kridsdale134 minutes ago|

This comment sure is load bearing.

by TeMPOraL 21 minutes ago

Regardless, we should stand ready, loaded for bear.

by mikestorrent 22 minutes ago

It's the veritable smoking gun

by saagarjha 42 minutes ago

A Linear ticket, hopefully

by yorwba 1 hour ago

Then why is there no link to the actual write-up?

by GavinMcG 1 hour ago

Presumably because it hasn’t yet been published?

by m0llusk 1 hour ago

It seems this is still extremely early in the process. There is an apparent finding that was shared. Evidence which would be the basis for a paper is "being reviewed by linguistics experts at Rutgers and Cambridge". So they are trying to do the right thing by talking about what they believe they have done but holding off publication and serious claims until later. The general idea that written forms can be categorized by systems built with Claude could be applied to other as yet undecipherable languages could be used by other interested investigators just with what is discussed here.

by sillysaurusx 1 hour ago

> The general idea that written forms can be categorized by systems built with Claude could be applied to other as yet undecipherable languages could be used by other interested investigators just with what is discussed here.

Could you rephrase this or explain it more thoroughly? I don’t follow. What does it mean to categorize a written form by systems built with Claude?

by tyingq 1 hour ago

The same pattern/tech is generic enough that it might be able to solve other unrelated, and so-far undecipherable, written languages.

by kelseyfrog 1 hour ago

You can use Claude, like the author, to reproduce the result.

by _verandaguy 1 hour ago

This isn't really a reasonable approach, is it?

The original prompts aren't provided, nor is the original context; even then, you can't really treat a stochastic system like an LLM as a major component in reproducibility.

by TeMPOraL 8 minutes ago

> stochastic system

Every day when you lower your butt onto your chair, you trust a stochastic system enough to assume you'll rest on the chair safely and not spontaneously phase through, which would lead to a rather gory and painful terminal experience.

Physics at macro scale is stochastic, which is a good reminder that stochastic != uniformly random. Expected distributions matter.

by ben_w 44 minutes ago

> even then, you can't really treat a stochastic system like an LLM as a major component in reproducibility.

If you had the other things, being "stochastic" is not even remotely a show-stopper. Stochastic processes abound and are the reason the mathematics of statistics was developed in the first place, ultimately allowing us to create such things as LLMs.

When all the relevant steps get published, I absolutely expect a lot of people to (attempt to) reproduce this work even though LLMs are stochastic.

by _verandaguy 19 minutes ago

My issue with this is that it's a form of "soft" reproducibility, where it'll work for many (maybe even most!) people, but that depends on the way the original prompt was formulated (read on) and the state of the random noise in the system.

On the prompt formulation: prompts with very similar formulations (in terms of semantics, Hamming distance, or both) can lead to _wildly divergent_ outputs in my experience. It's not rigorous, and when that divergence happens, it's extremely difficult (arguably impossible, by nature of the architecture of transformers) to identify why the divergence happened and where.
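
For readers unfamiliar with the term, Hamming distance counts the positions at which two equal-length strings differ. A minimal sketch follows; the two prompts are hypothetical examples, not the prompts used in the work under discussion.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


# Two hypothetical prompts that differ in a single character.
prompt_a = "Translate this text."
prompt_b = "Translate this text!"
print(hamming_distance(prompt_a, prompt_b))  # 1
```

A distance of 1 here says nothing about how far apart the model outputs would be, which is the point being made above.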

by iwontberude 43 minutes ago

Actually it is because Claude did the work and being a lay person isn’t really that high of a bar.

by fragmede 44 minutes ago

Sure it is. We're humans, not robots (well, I think I am, and I presume you are as well, but for all we know, we could be living in a simulation), so if the non-deterministic system decides to generate code that calls the variable foo one day and bar the next, as long as the code still does what's being asked of it, why do I care that the non-deterministic system chose to call the variable something different when run on Tuesday? There's the computer science definition of determinism and the engineering result of "does it work", which are at odds. It's like the halting problem. We haven't solved the computer science definition of the halting problem, but give some C code with a loop that won't terminate to Claude, and it'll call that out as not halting.

by _verandaguy 24 minutes ago

All things aside, I think this misses the forest for the trees on the halting problem.

It's not about being able to throw claude or codex at a loop and having it evaluate it for halting, it's about being able to do this for arbitrary code. Computer science rigorously defines the halting problem as not computable and undecidable, within the framework of using something akin to static analysis using any deterministic Turing machine.

There's not really a question of "solving" the halting problem like there's some as-yet unknown way of generally figuring out if arbitrary code halts. Turing proposed a proof in 1937 in favour of undecidability of what we now know as the halting problem, building on ideas first articulated by Church a few years prior.

Frankly, if anything, it's reasonable to say that the halting problem's been solved, just in the direction of undecidability rather than decidability.

Anyway, back to LLMs; as code gets more complex, the robot will need a bigger context window, more hardware resources, and more time, all of which will be variable due to the noise inherent in the system. It'll be difficult to put a useful upper and lower bound on how much computing power and time it'll take to figure out if a program ever halts. Which is all a bit moot, frankly, in the context of halting, but useful to keep in mind in the more general context of using these things as analysis tools.
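
To illustrate the point about bounds, here is a minimal sketch of what any practical checker can do: run a program under a step budget and report either "halted" or "unknown". The names `halts_within`, `counts_to_three`, and `loops_forever` are hypothetical, and modeling a program as a Python generator that yields once per step is an assumption made purely for illustration.

```python
from typing import Callable, Iterator


def halts_within(program: Callable[[], Iterator[None]], max_steps: int) -> str:
    """Run a step-yielding program for at most max_steps steps.

    Returns "halted" if the program finished within the budget, otherwise
    "unknown". It can never answer "does not halt": that is the undecidable part.
    """
    steps = program()
    for _ in range(max_steps):
        try:
            next(steps)
        except StopIteration:
            return "halted"
    return "unknown"


def counts_to_three() -> Iterator[None]:
    # Takes three steps, then finishes.
    for _ in range(3):
        yield


def loops_forever() -> Iterator[None]:
    # Never finishes; each loop iteration is one step.
    while True:
        yield


print(halts_within(counts_to_three, 10))    # halted
print(halts_within(loops_forever, 1000))    # unknown
```

Raising the budget turns more "unknown" answers into "halted", but no finite budget settles the non-halting case for arbitrary programs.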

by atrus 1 hour ago

somehow I suspect it was a bit more involved than: Claude, please solve Linear A.

by fragmede 12 minutes ago

A little bit more. If you ask ChatGPT to "solve linear a" it thinks you mean linear algebra. If you specify that it's the Minoan translation problem, you get a table similar to the one that we get a glimpse of in the article. Without access to the paper, we can't say how much more work the paper has than my gist.

https://gist.github.com/fragmede/bbf277d36a2398065f109484f34...

by smsm4254 minutes ago|

You also have to add "make no mistakes"!

by justin_dash 1 hour ago

Unless it was done by Fable!

by kelseyfrog 1 hour ago

The 'major insight' described in the article predates Fable's release by two weeks and four days. It would be a complicated timeline.

by grey-area 1 hour ago

Amazing work and refreshing to see a well written and cogent post to summarise it. Would love to hear more about how he used Claude to help solve the puzzle.

by dwroberts 47 minutes ago

You know him socially but is there a reason you’re writing this rather than him? It looks like he has his own web presence.

A cynical read would be that you’re stealing his thunder a bit by prematurely announcing this before it’s fully confirmed.

by jstanley 4 minutes ago

Promoting your friends' work is hardly stealing their thunder. It's increasing their thunder!

by Conscat 45 minutes ago

Isn't it customary for the author of a post shared on HN to leave a comment on the thread?

by dwroberts 43 minutes ago

I’m not referring to the parent comment: The post is not written by the author of the claimed breakthrough.

by iwontberude 44 minutes ago

What thunder? Claude did the work and used a human to interface with experience and causality better.

by ben_w 38 minutes ago

The thunder is as per the headline. Assuming it passes review.

One of the things I find weird with AI is how the dismissals of work that involve AI split into two camps: like yours, saying the AI did the work while the human played no role and deserves no credit; and those saying the AI rips off its training data while the human using it played no role and deserves no credit.

by iwontberude 35 minutes ago

I exist in both camps. Claude can’t launder human achievement into a different person. Claude stole it, but it’s still in Claude’s possession and is not transferable in any durable sense.

by ben_w 32 minutes ago

> Claude stole it, but it’s still in Claude’s possession and is not transferable in any durable sense.

No human, individually or as a team, has been able to solve this to date.

To the extent this was Claude solving it itself and thus denying Di Mino any thunder, there was nobody to have stolen anything from. To the extent he has thunder to be stolen, it wasn't ever in Claude's possession.

by loudmax 31 minutes ago

This is very exciting. Congrats to Tom on the accomplishment.

To be clear, this is an attempt at a decipherment. This is not proven, and we shouldn't consider Linear A to be "solved" until experts in the field have reviewed the work. In fact, it probably shouldn't be considered "proof" unless some more Linear A writings are uncovered and these are congruent with the method proposed. All that can be said for certain at this point is that this is an interesting conjecture.

But this is a story worth following. This could be the real deal. More research and validation should follow and we should have a better idea in the next few weeks or months whether Linear A has really been solved. At the very least, this is an interesting attempt, and optimistically, it could yield real insight into Minoan culture. Kudos.

by WhitneyLand 16 minutes ago

If confirmed this is really cool and impressive work.

Honestly curious how many years before it can be one shotted in a coding harness with Fable.next by someone who’s not a linguistics expert.

“Develop, test, and rank hypotheses about the phonetic values, morphology, grammar, and possible language family of Linear A using the full available corpus. Do not assume any decipherment is correct. Treat all candidate readings as hypotheses to be scored…”

by mNovak 55 minutes ago

Interesting writeup. Would be nice to have a couple images of Linear A/B scripts to visualize. Looking on google, they're very daunting!

by indiv054 minutes ago|

Can I get his decipher-forgotten-ancient-text skill? I want to try my hand at the Voynich Manuscript

by rw_panic0_011 minutes ago|

would like to hear more about Tom's learning/education path in ML/AI.

by doubleorseven 39 minutes ago

crossing my fingers for this guy.

however, nawaya or whatever examples around it are not part of the Hebrew language.

by NooneAtAll352 minutes ago|

relevant xkcd: https://xkcd.com/2151/

by OutOfHere 1 hour ago

Is this extendible to a generalizable approach to translate any language pair (without a translation map)?

by retrac 36 minutes ago

I think it is an open question: can an unknown language be cracked -- without any dictionary or grammar or understanding of the language? Just lots and lots of texts, maybe some of it bilingual.

It's a common misconception that this is what happened with Ancient Egyptian with the Rosetta Stone. The Rosetta Stone was just one of the big pieces of the puzzle. The decoding came when people realized that Coptic (a language written alphabetically and still in use in the Coptic Church today) is actually descended from Ancient Egyptian; as Spanish is to Latin, Coptic is to Ancient Egyptian.

Similarly the attempts to decode classical Maya were all dead ends. Until Yuri Knorozov realized that it encoded the ancestor of the Maya languages which are still spoken to this day. (Knorozov's Wikipedia article is worth checking out just for his photo with his cat. [0] IMHO.)

I have written before about the La Mojarra 1 stele in Mexico [1]. It looks a lot like Maya. But it isn't Maya. Maybe the difference is like that between Russian and Latin?

No one can read it. It's undecipherable. There are some attempts to identify it with a proposed ancient language that would have been related to the modern Mixe-Zoque languages: some of the glyphs that are shared with Maya, when read phonetically, start sounding like a Mixe-Zoque language. But no one has proposed a confident decipherment. There probably isn't enough text. La Mojarra 1 is the only long example of the Isthmian script.

Deciphering Akkadian was very difficult, at first. The process started with Persian; Old Persian was written in a simplified adapted form of the Mesopotamian cuneiform (wedges on clay). It was a kind of alphabet. And Old Persian was already understood. And there was a bilingual text on a monument carved by Darius I. But even then -- decoding relies so heavily on the fact that Akkadian is a Semitic language distantly related to Hebrew and, more distantly, to Ancient Egyptian. So again, we sort of knew what we were looking for.

That is all to say: even if the Voynich manuscript (for example) contains real text I'm not sure it is possible even theoretically to translate it.

[0] https://en.wikipedia.org/wiki/Yuri_Knorozov

[1] https://en.wikipedia.org/wiki/La_Mojarra_Stela_1

by SoftTalker 44 minutes ago

Towards the Star Trek universal translator.....

by fooster 33 minutes ago

A lot of the comments in this thread are disappointing. Rather than celebrating an achievement (whether or not it is validated yet), many of you seem to want to put him down, or make it seem like Claude did all the work.

Claiming that Claude did all the work is patently ridiculous. Claude is a tool, like any other. The corpus of Linear A is ~7500 characters across ~1500 inscriptions and Claude, no matter how smart, doesn't just solve that on its own.

What a shame.

by iwontberude 44 minutes ago

Sorry but I don’t recognize this as being an achievement by an amateur. This dude had no chance in hell until we trained a model to use his time to suss it out.

by jonahx 37 minutes ago

Assuming this pans out, every other professional linguist in the world has had the option to use Claude or other LLMs, but has not solved this problem, despite the incentives for doing so. It stands to reason the human is adding crucial value.
