You might want to add The Reader's Guide to the Encyclopaedia Britannica, PD text available at https://www.gutenberg.org/ebooks/74039 and scans at https://archive.org/details/readersguidetoen00londuoft - It would fit naturally with the Ancillary material that includes the topic-based index.

by ahaspel 17 hours ago

It would indeed. I will see about working this in; it's highly pertinent.

by bentley 3 hours ago

There’s an escaping issue in tables of contents. See, e.g., “Roosevelt's” in the “United States” article. https://britannica11.org/article/27-0635-united-states-the/u...

by huijzer 7 hours ago

Really nice. Well done.

As a feature request, would it be possible for your pipeline to also create an EPUB? Then people could easily access and search through the document even if your site goes down. EPUB uses compression by default, so the file size might not even be too bad for the full encyclopedia.

by nyc_pizzadev 16 hours ago

Very nice. I actually spent a bit of time browsing a few topics, which is something I rarely do these days!

A few things... First, when I click an article and try to jump to a new topic, the top search box (labeled "Search titles and full text...") doesn't work. Second, when I first came to the site, I was a bit stuck. It took a bit of time to realize I needed to click on "Articles" or even "Topics" to start browsing. Not sure why, maybe I expected the image to let me enter the site somehow...?

by ks2048 15 hours ago

Nice job. How about Wikipedia-style links to other articles for topics mentioned within an article?

by logicallee 20 hours ago

Thanks so much for sharing this. It looks fantastic. A couple of questions, if you don't mind: what license are you releasing this under, if any? Is there any way to download it? The reason someone might want to download it is for use as training data.

by zozbot234 18 hours ago

Wikisource has the original scans available in the public domain, and their enriched text under CC-BY-SA: https://en.wikisource.org/wiki/EB1911

by realityfactchex 19 hours ago

> Is there any way to download it? The reason someone might want to download it is for use as training data.

Another reason would be to be able to keep running and using it even if the main site eventually went down for whatever reason, or to operate a mirror of it for redundancy (linking back to the original, of course).

by ahaspel 20 hours ago

Thanks!

The underlying text (1911 edition) is public domain, but the structured version here — the parsing, reconstruction, and linking — is something I put together for this site. Right now there isn’t a bulk download available. I’m considering exposing structured access (API or dataset) in some form, but haven’t decided exactly how that will work yet.

If you have a specific use case in mind (especially for training), I’d be interested to hear more.

by hallole 18 hours ago

I've wanted to do something like this for the Encyclopédie, a text hugely relevant to the Enlightenment. If you ever get around to adding a rough "How I (generally) Made This" section, that'd be appreciated! Site looks great :)

by logicallee 19 hours ago

Regarding the specific use case, I was thinking this: I had Gemma 4 (a small but highly capable offline model released by Google) make a public-domain CC0 encyclopedia of some core science and technology concepts[1]. I thought it was pretty good.

Separately, I've fine-tuned the Gemma 4 model[2]. It was very quick (just 90 seconds), so I think it could be interesting to train it to talk like the 1911 Encyclopedia Britannica.

I would use the entries as training data and train it to talk in the same style. There isn't a specific use case for why, I just think it would be interesting. For example, I could see how it writes about modern concepts in the style of 1911 Britannica.

[1] https://stateofutopia.com/encyclopedia/

[2] To talk like a pirate! https://www.youtube.com/live/WuCxWJhrkIM

by ahaspel 19 hours ago

That’s a fun idea — I can see the appeal of that style.

The underlying text is public domain, but the structured version here is something I put together for the site. I haven’t released a bulk dataset yet.

If you end up experimenting with it, I’d love to hear how it turns out — and I’m still figuring out what structured access might look like.

by gnerd00 19 hours ago

A legal-terms question here as well: several major world economies are operating under very different rules regarding datasets and publication rights. I am in the USA (California). Will there be terms for me, given that I am not a giant deep-pockets FAANG, just a book person? Commercial-use terms at "small business" scale?

by ahaspel 19 hours ago

The 1911 text itself is public domain, so anyone is free to use it.

What I’ve built here is a structured edition — the parsing, reconstruction, linking, indexing, etc. I haven’t published a formal license for that yet.

For casual or small-scale use there’s no issue at all. For bulk use (e.g. dataset / training / redistribution), I’d prefer people get in touch so I can figure out a sensible way to support that.

by Kerrick 10 hours ago

> What I’ve built here is a structured edition — the parsing, reconstruction, linking, indexing, etc. I haven’t published a formal license for that yet.

If you live in the U.S., I recommend you read No Sweat of the Brow Copyright: https://www.gutenberg.org/help/no_sweat_copyright.html

by dessimus 16 hours ago

It's been on Project Gutenberg for over 20 years: https://www.gutenberg.org/ebooks/13600

They only release books that are in the public domain.

by bentley 3 hours ago

> They only release books that are in the public domain.

Not necessarily. Project Gutenberg does provide some works still under US copyright, such as F. P. Walter’s 1999 translation of Twenty Thousand Leagues Under the Seas: https://gutenberg.org/ebooks/2488

by TremendousJudge 19 hours ago

I guess such an old edition is in the public domain.

by Soluod 17 hours ago

[dead]