> The question is how do you take it to a million?

Do you need to take it to a million in the same place? Is that still "small"?

Why not have 2000 hand curated directories instead?

reply
I mainly use Kagi Small Web as the starting point of my day, with my morning coffee. Especially now that categories have been added, I always find something worth reading. The size isn't a problem, as I usually browse 20-30 sites this way.
reply
Right, but that basically works as a retro alternative to scrolling through social media. If you're looking for something specific, it can simultaneously be true that there's a small web page that answers your question and that it's not on any "small web" list, because the owner of the webpage never submitted it there or it didn't meet the criteria for inclusion.

For example, I have several non-commercial, personal websites that I think anyone would agree are "small web", but each of them fails the Kagi inclusion criteria for a different reason. One is not a blog, another is a blog but with the wrong cadence of posts, etc.

reply
Feel free to suggest changes to criteria for inclusion. It is mostly the way it is now as the entire project is maintained by one person - me :)
reply
Looking at the criteria again, I can think of at least three things that arbitrarily exclude large swathes of the small web:

1) The requirement that it needs to be a blog. There's plenty of small-web sites of people who obsess over really wonderful and wacky stuff (e.g., https://www.fleacircus.co.uk/History.htm) but don't qualify here.

2) The requirement that it needs to be updated regularly. Same as above - I get that infrequently updated websites don't generate a "daily morning" feed, but admitting them wouldn't harm in any way.

3) Blanket ban on Substack-like platforms while allowing Blogspot, WordPress.com, YouTube, etc. Bloggers follow trends, so you're effectively excluding a significant proportion of personal blogs created in the last six years, including the stuff that isn't monetized or behind interstitials. The outcomes are pretty weird: for example, noahpinionblog.blogspot.com is on your list, but noahpinion.blog is apparently no longer small web.

reply
1) It has to have a feed (we don't want to overcrawl), hence 'blog'. More accurately, any site with an RSS/Atom feed would do

2) 'Regularly' means posted in the last 2 years to be included

3) Substack has an annoying subscribe popup, and ads/popups are against the spirit of what this represents
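
Roughly, points 1 and 2 amount to a check like the one below. This is only an illustrative sketch, not the project's actual code: `is_eligible`, its parameters, and the 730-day approximation of "2 years" are all assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# "Posted in the last 2 years", approximated as 730 days (an assumption).
MAX_AGE = timedelta(days=2 * 365)


def is_eligible(feed_url: Optional[str],
                last_post: Optional[datetime],
                now: datetime) -> bool:
    """Hypothetical inclusion check: the site has a feed and posted recently."""
    if not feed_url:
        # No RSS/Atom feed: nothing to poll without crawling the whole site.
        return False
    if last_post is None:
        # A feed with no dated entries cannot show recent activity.
        return False
    return now - last_post <= MAX_AGE


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
recent = datetime(2024, 6, 1, tzinfo=timezone.utc)
stale = datetime(2020, 6, 1, tzinfo=timezone.utc)

print(is_eligible("https://example.com/feed.xml", recent, now))  # feed + recent post
print(is_eligible(None, recent, now))                            # no feed
print(is_eligible("https://example.com/feed.xml", stale, now))   # last post too old
```

Point 3 is a judgment about popups and ads, so it is not something a check like this captures.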

reply
My approach operates under the assumption that good, non-commercial webpages will be similar to other good webpages. Slop, SEO spam, and affiliate content will resemble other such content.

So a similarity-based graph/network of webpages should cluster good with good, bad with bad. That is what I've seen so far, anyway.

With that, you just need to enter the graph in the right place, something that is fairly trivial.
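
To make the idea concrete, here is a toy sketch. It assumes Jaccard similarity over word sets and a fixed threshold, which are stand-ins chosen for illustration and not necessarily the similarity measure I'd use at scale; the page names and word sets are invented.

```python
from collections import deque


def jaccard(a: set, b: set) -> float:
    """Share of words two pages have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_graph(pages: dict, threshold: float = 0.3) -> dict:
    """Link every pair of pages whose similarity meets the threshold."""
    names = list(pages)
    graph = {name: set() for name in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if jaccard(pages[u], pages[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def reachable(graph: dict, seed: str) -> set:
    """Breadth-first walk: everything connected to the page you entered at."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Invented example data: two hobby pages and two affiliate-style pages.
pages = {
    "flea-circus-history": {"flea", "circus", "history", "victorian", "hobby"},
    "clock-repair-notes": {"clock", "repair", "hobby", "history", "victorian"},
    "best-vpn-deals": {"best", "vpn", "deals", "coupon", "review"},
    "top-10-vpn-coupons": {"top", "vpn", "coupon", "deals", "best"},
}

graph = build_graph(pages)
print(sorted(reachable(graph, "flea-circus-history")))
```

Entering at a known-good page only ever reaches its own cluster here; the two affiliate-style pages link to each other but not to the hobby pages.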

reply