I'd been pretty sceptical about Kagi, feeling that it was a bit too expensive and perhaps relying too much on other companies' indexes, and I spent too much time looking at how many searches I had left. After getting the subscription I just don't want to go back; the price is perfectly reasonable for the value. Being able to just search again, without sorting through junk, spam and ads, and just getting the pages I want and need is amazing.
Honestly, it's a slightly weird feeling to look at the results from Kagi and notice it found exactly what you were looking for.
Once my gifted credits run out, that is going to be an easy renewal for me. I do not want to go back, even if I think Ecosia is a good option.
Even after the recent AI run-up, disk prices are about $20/TB for a 20TB drive, so you can store this index on 3-5 hard disks that will cost you about $1200-2000 in total. For self-hosted use you don't need to serve results in 50ms, so you don't need to put the whole thing in RAM like Google did; you can serve off of disk.
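For anyone who wants to redo that arithmetic with their own prices, here's a rough sketch. The function names are made up for illustration, and the defaults are just the figures quoted above (about $20/TB, 20TB drives), not current market data.

```python
def index_capacity_tb(num_disks, tb_per_disk=20):
    """Raw capacity in TB, ignoring redundancy and filesystem overhead."""
    return num_disks * tb_per_disk


def disk_cost_usd(num_disks, tb_per_disk=20, usd_per_tb=20):
    """Total drive cost in USD at a flat price per TB (assumed figure)."""
    return index_capacity_tb(num_disks, tb_per_disk) * usd_per_tb


if __name__ == "__main__":
    for n in (3, 5):
        print(f"{n} disks: {index_capacity_tb(n)} TB, ${disk_cost_usd(n)}")
```

So 3 disks is 60TB for $1200 and 5 disks is 100TB for $2000, which is where the range comes from. Any RAID or backup copies would multiply that.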
ElasticSearch uses basically the same data structures and gives you the same infrastructure that Google's ~late-00s search stack did, and is actually more advanced in some respects (like ad-hoc queries, debuggability, and updateability), so software isn't much of an issue.
The big part missing that can't really be replicated today is the huge web of authentic hyperlinks. The reason Google was so good at search was because many humans effectively "tagged" a given webpage with a series of short, descriptive words and phrases. When they went to search for a page, Google could mine this huge treasure trove of backlinks to identify exactly what the page was good for, even if those search terms never appeared on the page. SEO and link farms kinda killed this, as did the rise of social media walled gardens, and so the Google of 2009 basically wouldn't work today anyway. Maybe if you pulled old versions of Common Crawl or archive.org you could reconstruct it, but the relevant pages are often offline anyway today.
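To make the "tagging" idea concrete, here's a toy sketch of indexing pages by the anchor text of links pointing at them. This is not how Google actually implemented it; the function names, the scoring (a plain count of matching anchor terms), and the example URLs are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """Map each anchor-text term to a count of the pages it points at.

    links: iterable of (source_url, target_url, anchor_text) tuples.
    """
    index = defaultdict(Counter)
    for _source, target, anchor in links:
        for term in anchor.lower().split():
            index[term][target] += 1
    return index


def search(index, query):
    """Rank target pages by how many anchor terms match the query."""
    scores = Counter()
    for term in query.lower().split():
        scores.update(index.get(term, {}))
    return [url for url, _count in scores.most_common()]


if __name__ == "__main__":
    # Hypothetical links; none of the target pages need contain these words.
    links = [
        ("a.example", "travel.example", "cheap flights"),
        ("b.example", "travel.example", "flights comparison"),
        ("c.example", "other.example", "cheap hosting"),
    ]
    print(search(build_anchor_index(links), "cheap flights"))
```

Note that the target page's own text is never consulted, which is the point: the ranking comes entirely from what other people wrote when linking to it. It's also why link farms break it, since anyone can mint fake (source, target, anchor) tuples.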
At least if we're speaking of a more generalist web search, it requires dedicated hardware that's pretty costly. Marginalia's production server cost about $20k back when RAM and SSDs were cheap. It used to run on $5k of PC hardware before, but that was very limiting.
So no data center, but at the same time, not everyone has that sort of cash to throw around.