SearXNG: A free internet metasearch engine

(github.com)

259 points

by theanonymousone22 hours ago |

by asciimoo19 hours ago|

Ohi, I'm the original creator of Searx, but due to the limitations of the metasearch concept I'm not involved in the development anymore. My new search project is https://github.com/asciimoo/hister (https://hister.org/).

Hister is a full text indexer for websites and local files which automatically saves all the visited pages rendered by your browser. Storing full page content allows serving offline result previews and the full page content via MCP.

Take a look at how the MCP can be utilized: https://hister.org/posts/give-your-ai-assistant-a-private-me...

by jodoherty1 hours ago|

Beautiful! Thank you for making this.

I've been trying to find something to use for enriching my own self-hosted LLMs and agentic tools with information I find useful. Metasearch tools like SearXNG make it less likely you'll get blocked by bot detection tools when finding information, but usually it's something I've already found, read, or seen that I want to incorporate into my tooling.

I came to the conclusion that a self-hosted content storage system with a search engine and a browser extension that can extract and save web page content and metadata was the ideal setup for me. Preferably with some sort of federated content sharing ability and the ability to import creative commons content like Wikipedia and Gutenberg.

This looks almost exactly like what I wanted.

It'll be a few weeks before I have time to audit the code and deploy it, but I'm really looking forward to trying it out.

by zeroq18 hours ago|

I'm sorry for not taking the time to read the docs, but I have a question.

Some 20 years ago a friend of mine set up a local proxy (Python, if I'm not mistaken) that gathered all his web traffic and served as his long-term memory. The proxy had a web interface and allowed him to quickly find something he saw ca. 10 days ago, or that specific algorithm he recalls but whose name he can't remember.

For years I've been collecting links to different work-related trivia, which I use on a daily basis as a rabbit-from-a-hat solution to answer random questions from friends and coworkers. For example, someone randomly asked me for an idea for a color palette for data charts and I could immediately give them scientific research on color palettes. Or an obscure algorithm.

But with time the collection has grown substantially and it's really cumbersome to find the proper things.

Would your project be a good fit for my problem?

by asciimoo18 hours ago|

Absolutely, this is a great example where Hister can shine.

I started Hister as a proxy as well, but quickly switched to the current extension-based approach, because intercepting HTTPS traffic requires a MiTM proxy, which is much more painful to set up than installing a browser extension.

by zeroq17 hours ago|

would it be possible to gdrive/rsync/git the data between machines and then use the data on an online server for retrieval (given that I would handle data sync myself)?

also what exactly are you using for search? does it support trigrams? how do you sort results?

by sunshine-o9 hours ago|

I found Hister a few months ago and was amazed by it.

Now for many of us the browser extension approach is not possible (mobile usage, security, etc.)

My feeling is for a lot of users there is really a third way apart from the MiTM proxy or Browser extension approach. I actually do not want my "personal" / "logged in" pages to be indexed. This is a bit like the MS recall nightmare (self hosted version).

Any way to get the list of visited URLs (with something like Privoxy [0], or maybe one of those popular ad blockers like Pi-hole, though I guess they only see DNS queries?) and then import it with some filtering rules in a nightly batch job would be good enough for a lot of people.

The browser import [1] is great but I guess hard to use with mobile...

- [0] https://www.privoxy.org/

- [1] https://hister.org/docs/importing-browser-history
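The "list of visited URLs plus filtering rules" idea can be sketched in a few lines. This is only an illustration, not part of Hister: the rule lists and the function names `should_index` and `filter_urls` are hypothetical, and how the URL list is captured or handed to an importer is left open.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

# Hypothetical exclusion rules: host patterns and path prefixes to skip.
EXCLUDED_HOSTS = ["*.bank.example", "mail.example.com", "localhost"]
EXCLUDED_PATH_PREFIXES = ["/login", "/account", "/settings"]


def should_index(url: str) -> bool:
    """Return True if the URL passes the (assumed) privacy rules."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname or ""
    if any(fnmatch(host, pattern) for pattern in EXCLUDED_HOSTS):
        return False
    return not any(parts.path.startswith(p) for p in EXCLUDED_PATH_PREFIXES)


def filter_urls(urls: list[str]) -> list[str]:
    """Drop duplicates (keeping first-seen order), then apply the rules."""
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        if url not in seen and should_index(url):
            kept.append(url)
        seen.add(url)
    return kept
```

A nightly job would run `filter_urls` over the day's collected URLs and pass the result to whatever import mechanism is in use.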

by asciimoo7 hours ago|

Thanks for the kind words =]

There is already an ongoing discussion about the topic: https://github.com/asciimoo/hister/issues/387

The currently discussed solution relies on the browser extension, but mobile Firefox has extension support.

by justusthane18 hours ago|

Also very interested in this. I was playing around with doing the same thing with YaCY. I want the proxy aspect so that I can proxy my phone traffic through it as well.

by asciimoo17 hours ago|

Unfortunately mobile Chrome browsers don't support browser extensions, but our extension works well on mobile Firefox.

by left-struck15 hours ago|

Would you mind sharing these links? Or a subset? I want to grow my collection which is tiny because I started way too late

by ydj14 hours ago|

Hister sounds like something I wanted for a while, but never got around to building. Searching stuff I’ve seen before is most of what I do with a search engine, so having it local and fast would be amazing. Eager to give it a try.

by phrotoma5 hours ago|

And the number of times I've searched for something that I saw a while ago but is now gone is way too damned high.

by Leonard_of_Q4 hours ago|

Interesting, a local search option. I made the recoll engine for SearX and now SearXNG and still use this daily over a rather large archive of journal articles and other non-fiction texts. Recoll's indexer can extract text from just about anything I throw at it, it also extracts and indexes metadata. Would Hister serve the same purpose and if so is there a SearXNG engine to integrate it into the result stream?

by exiguus9 hours ago|

YaCy has a proxy mode that automatically indexes your web surfing. In my experience, the index grows very fast and reaches ~100GB or more. How does the index size of Hister compare to that?

by asciimoo8 hours ago|

Hister stores only the text content of HTML/pdf pages. 1000 documents require around 80-100MB of storage and there is still plenty of room to optimize for storage space.

I've been using it for 6-7 months and my index size is below 1GB with almost 10k pages.
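As a back-of-the-envelope check of those figures, here is a tiny sketch that assumes storage grows linearly with document count. That linearity is an assumption made for illustration, and `estimate_index_mb` is a hypothetical helper, not anything Hister provides.

```python
# Rough index-size estimate from the figures quoted above:
# about 80-100 MB of storage per 1000 indexed documents.
MB_PER_1000_DOCS = (80, 100)  # (low, high)


def estimate_index_mb(doc_count: int) -> tuple[float, float]:
    """Return a (low, high) estimate in MB, assuming linear scaling."""
    low, high = MB_PER_1000_DOCS
    return (doc_count * low / 1000, doc_count * high / 1000)


low, high = estimate_index_mb(10_000)
print(f"10k pages: roughly {low:.0f}-{high:.0f} MB")
```

For 10k pages this gives roughly 800-1000 MB, which lines up with the reported "below 1GB".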

Also, a downside of the proxy approach: it does not properly handle JS-based websites and cannot identify dynamic content changes. Our extension periodically checks whether the content of the browser tabs has changed and automatically updates the index when a change is detected.
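One common way to implement this kind of periodic change check is to compare content hashes. The sketch below illustrates that general technique only; it is not taken from Hister's extension, and `ChangeTracker` and `needs_reindex` are hypothetical names.

```python
import hashlib


class ChangeTracker:
    """Remembers a content hash per URL and reports when it changes."""

    def __init__(self) -> None:
        self._hashes: dict[str, str] = {}

    def needs_reindex(self, url: str, text: str) -> bool:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self._hashes.get(url) == digest:
            return False  # same content as at the last check
        self._hashes[url] = digest
        return True  # first sighting, or the content changed
```

A caller would invoke `needs_reindex` on a timer with the tab's current text and only send the page to the indexer when it returns `True`.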

by BrunoBernardino7 hours ago|

Hister is a great idea and the creator is a really nice person, please give it an honest look and consider supporting them (I'm Uruky's co-founder and we sponsored them)!

by scritty-dev3 hours ago|

this is really cool, first time hearing about this, is there any org level model for this so you can promote individual's indexed websites into an organization/team owned model?

by asciimoo3 hours ago|

Multiple users can use a shared instance and collect their indexed content in a central place. Hister has user handling and a "public mode" as well: https://hister.org/posts/public-search

by MrDrMcCoy17 hours ago|

Always excited to see new things like Hister in the search space. What are the scaling limits, as far as you can tell in terms of how much can it hold before queries start breaking down or become too slow to be useful? Could it evolve into a general internet search engine if, say, enough trusted members of a geo-distributed YugabyteDB cluster and an army of crawlers built a sufficient index?

by asciimoo17 hours ago|

> What are the scaling limits, as far as you can tell in terms of how much can it hold before queries start breaking down or become too slow to be useful?

There have been no stress tests in this regard. The indexer lib Bleve [1] can handle millions of documents according to its documentation.

> Could it evolve into a general internet search engine if, say, enough trusted members of a geo-distributed YugabyteDB cluster and an army of crawlers built a sufficient index?

My long-term goal is exactly this. I'd like to add a federation/P2P feature [2][3] to evolve beyond being a private search companion. I'd appreciate any help designing the system.

[1] https://blevesearch.com/docs/Home/ [2] https://github.com/asciimoo/hister/discussions/432 [3] https://hister.org/posts/public-search

by Abishek_Muthian14 hours ago|

This is great, like many others I've been thinking of something like hister but only for bookmarked web pages. I presume it should be straightforward with hister to do that?

All the best!

by asciimoo11 hours ago|

It is possible. Automatic website indexing can be turned off in the extension, and manual indexing can be triggered via the command line tool, the extension popup or hotkeys.

by derrida17 hours ago|

Wow! That looks like a bit of software I have been dreaming about for a while - will definitely check it out! You're at least doing something right in communicating the reasons why and the appeal, for starters! All the best!

by chrisss39518 hours ago|

I love your idea and wondered why saving and indexing browser visited pages was not being done. Does this handle large amounts of local files, for example 10-20TB across file types like Powerpoint, Excel, Word, and PDF?

by asciimoo18 hours ago|

In its current form it cannot handle this amount of data efficiently (and doesn't support powerpoint/excel/word yet), but this is a valid use-case, I've added a TODO item to experiment with it.

by blackqueeriroh17 hours ago|

Oh thank god there used to be several tools like this and they slowly went away. I’ve been wanting this to return.

by kristianpaul19 hours ago|

Is this similar to fastcrw ?

by asciimoo19 hours ago|

Both are search engines, but that's all the similarity. Hister has a traditional crawler, but its biggest strength is automatically indexing browser tabs as those are rendered. This way it bypasses authentication, CloudFlare, captchas and most of the annoying limitations of traditional crawlers. Hister also provides full offline result previews. Check out the small read-only demo: https://demo.hister.org/

by nickpsecurity14 hours ago|

I was considering paying someone to build something like this at some point. With two jobs, I eventually had no time to even organize what I find. It's just piles of links in text files.

Can I give your software a huge list of URLs to index? Or do I need to use browser automation to open them a few at a time, with it caching and indexing them?

by asciimoo8 hours ago|

I accept donations ;)

Hister has a built in crawler with standard HTTP lib and browser based backends, you can feed your link collection to it. Also, Hister supports importing your existing browser history automatically using either of the mentioned backends.

by operatingthetan19 hours ago|

I installed this a while back and honestly I almost never touch it. It turns out that for me searching my history doesn't really replace a search engine at all. The built in extractor list is pretty limited and adding them seems like too much of an ordeal for me to bother.

by asciimoo17 hours ago|

Sure, it cannot fully replace web search engines (yet), but it can reduce the dependence on these services more and more as your index grows. Hister is designed to support quickly falling back to traditional search engines with a single hotkey if no results are found.

I agree, we should add more extractors [1]. Can you recommend extractors you missed?

[1] https://github.com/asciimoo/hister/issues/305

by exiguus19 hours ago|

SearXNG is my daily internet search now +5 years; with YaCY Backends and else as fallback. I also build internal document search or RAG applications with this setup (SearXNG also supports JSON results). However, there are some downsides I accept for the sake of privacy: 1. It's slower and the results are not as good as with other engines, but fast and good enough for most of my queries. 2. From time to time you get blocked by DuckDuckGo, Brave or whichever search engine, and you have to solve some captchas. You can prevent this by getting and using API keys from them.

The nice thing about using your own backend is that you can prioritize it in the results. For example, if I crawl the smallweb and other sites important to me, these sites come up first in the results.

by sunshine-o9 hours ago|

> SearXNG is my daily internet search now +5 years

Same here

> with YaCY Backends and else as fallback.

Do you run your own "super fast" YaCy instance? or with specific settings?

My experience with YaCy is that it doesn't fit as a backend for SearX, since YaCy kind of slowly streams results for about 30 seconds...

I also have a local `kiwix-serve` serving ZIM files of Wikipedia, Wiktionary, Gutenberg, ArchWiki, etc., but it has the same problem: the Kiwix search engine [0] doesn't really fit as a backend for SearX, as it returns too many results and pollutes the SearX result page.

What I haven't done yet is try plugging SearX into a local Recoll instance [1]. Recoll doesn't support indexing ZIM files, but it could be useful for other archived HTML documents.

I would be curious to know more about a working setup since search is hard to get right.

- [0] https://kiwix-tools.readthedocs.io/en/latest/kiwix-serve.htm...

- [1] https://docs.searxng.org/dev/engines/online/recoll.html

by exiguus7 hours ago|

I run my own YaCy instances. Three of them, to be specific, because they are "super fast" and "reboot" often. With them I crawl the smallweb, smallcomic and smallyt sites, plus all feeds from my Miniflux instance, which I fetch via the Miniflux API. Besides that, I have other static entries that I crawl. For Wikibooks and Wikipedia I also tried YaCy and still use it, but it uses a lot of resources, so that runs in only one instance. I suggest >16GB RAM and 300GB+ HDD if you want to do this. To access Wikimedia, Gutenberg, ArchWiki or media.ccc.de directly, I also use SearXNG. Usually it takes 1-3 seconds to get search results from YaCy in my setup. I run the instances in Docker on aarch64 with ~6GB of RAM and 200GB HDD. The VPS itself has 8GB RAM, 6 ARM cores and 250GB HDD. If YaCy hangs, I just restart it. These are the Docker deploy and Java settings I currently use, which work pretty well:

    environment:
      JAVA_OPTS: >-
        -XX:+UseG1GC
        -XX:MaxGCPauseMillis=200
        -XX:+ParallelRefProcEnabled
        -XX:+UseStringDeduplication
        -XX:InitiatingHeapOccupancyPercent=45
        -XX:G1ReservePercent=15
        -Xms1024m
        -Xmx3072m
        -XX:MaxMetaspaceSize=256m
        -XX:MaxDirectMemorySize=256m
        -XX:+ExitOnOutOfMemoryError
        -XX:G1HeapWastePercent=10
        -XX:G1MixedGCCountTarget=4
    deploy:
      resources:
        limits:
          cpus: "4.2"
          memory: 5.2G
        reservations:
          cpus: "2"
          memory: 2.5G
    healthcheck:
      test: |
        /bin/bash -c '
        if ! timeout 55s wget --spider --no-verbose http://127.0.0.1:8090/yacysearch.html?query=exiguus; then
          exit 1
        fi
        if ! timeout 55s yacy_search_server/bin/checkalive.sh; then
          exit 1
        fi
        exit 0
        '
      interval: 120s
      timeout: 60s
      retries: 3
      start_period: 240s

That's the smallest configuration I got running mostly stable and self-healing with an index size of 100GB+. I also avoid crawling via the built-in tasks and instead use the API and cron jobs for weekly feed importing, because I found that this kind of crawling eats up fewer resources than the usual kind. Overall, too many running crawlers make retrieving search results slow. For production use, I suggest at least doubling the resources. If you do this, it becomes very stable.

Thanks to pointing out kiwix. I'll give it a try.

by sunshine-o2 hours ago|

Thank you so much, this is very valuable information.

> Thanks to pointing out kiwix. I'll give it a try.

I see YaCy works with ZIM files [0] packaged by Kiwix so this is great.

In theory, if you run YaCy, Kiwix is not necessary, but they already package valuable sites like Wikipedia, iFixit, ArchWiki, etc. [0], so you do not have to worry about your crawler being blocked and you have a local copy anyway [1]. So a lot of bandwidth and headache saved.

- [0] https://github.com/yacy/yacy_search_server/tree/master/sourc...

- [1] https://browse.library.kiwix.org/#lang=eng

by goodroot19 hours ago|

This appears to be a key tool for providing search to local models.

I'm curious what setups folks use to provide this functionality.

Since the quantized 24B parameter Gemma model came out, I've had good luck with tool calling on a 4070 Ti Super.

Successful tool calling is what finally made the local experience useful.

I should note this is for the general and not coding specific context.

by gardnr19 hours ago|

It has a JSON mode that you need to enable in settings and then you can create a simple python script to interact with it or have the agent use `curl` and `jq` to interact with it.

It's at the bottom of this page: https://docs.searxng.org/admin/settings/settings_search.html
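A minimal sketch of such a Python script, using only the standard library. The base URL is an assumption (a local instance on port 8888 with the `json` format enabled in settings), and the helper names are hypothetical; the `q` and `format` query parameters and the `results` list with `title`, `url` and `content` fields reflect the JSON output as I understand it.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local instance at this address with the JSON format enabled.
SEARXNG_URL = "http://localhost:8888"


def build_search_url(query: str, base_url: str = SEARXNG_URL) -> str:
    """Build a /search URL that asks for JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url}/search?{params}"


def parse_results(payload: str, limit: int = 5) -> list[dict]:
    """Keep only title, URL and snippet from a JSON response body."""
    data = json.loads(payload)
    return [
        {
            "title": r.get("title", ""),
            "url": r.get("url", ""),
            "snippet": r.get("content", ""),
        }
        for r in data.get("results", [])[:limit]
    ]


def search(query: str) -> list[dict]:
    """Run one query against the instance and return trimmed results."""
    with urllib.request.urlopen(build_search_url(query), timeout=10) as resp:
        return parse_results(resp.read().decode("utf-8"))
```

Calling `search("metasearch engine")` against a running instance returns a short list of dicts that an agent or script can consume directly. If the JSON format is not enabled, the instance is expected to reject the request rather than return results.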

by drnick119 hours ago|

I am also interested in what a full local AI stack with web search and other tools looks like. As far as I can tell, SearX does not embed an MCP server, so it can't be directly called from llama-server for example. Open WebUI does have an integration for SearX and other providers, but the results I obtained weren't particularly impressive.

by c-hendricks16 hours ago|

I use Searxng through Onyx, both as regular search and Onyx's Deep Research mode. I also have https://github.com/ihor-sokoliuk/mcp-searxng to add search to coding agents. Haven't really had many issues with it.

by jared0x9016 hours ago|

are you running a quant?

i have a friend with a 4080 that is wanting to experiment with local models and those cards should be similar enough. can you give any more detail about your setup? ty!

by goodroot15 hours ago|

Yep -

`gemma4-26b-a4b-it-qat.gguf`

https://huggingface.co/lmstudio-community/gemma-4-26B-A4B-it...

It is really great to use. As the poster above mentioned, my setup with SearXNG is the following, all through `llama.cpp`, which has a built-in webui with an MCP client:

* SearXNG in Docker — enable the JSON API (`search.formats: [html, json]`; off by default).

* `searxng-mcp` (FastMCP, native streamable-HTTP): `TRANSPORT=streamable-http HOST=127.0.0.1 PORT=8100` `SEARXNG_URL=http://localhost:8888 uvx --from searxng-mcp --with fastmcp searxng-mcp`

* `llama-server` with `--webui-mcp-proxy`, then add the server in the webui.

Some gotchas:

* `searxng-mcp` forgets to declare its own dep → `--with fastmcp`.

* Endpoint is `/mcp`, not the `/searxng-mcp/mcp` the docs claim.

* `--webui-mcp-proxy` only enables the CORS proxy; each MCP server entry still needs its "Use llama-server proxy" checkbox ticked, or the browser fetches direct and CORS-fails.

* Terminal clients (OpenCode etc.) skip the proxy — point them straight at `:8100/mcp`.

A couple interesting tidbits:

* There are temporal issues with search-related tool calls. The model trips out. 2026 results read to it as a "future-dated hallucination" because it doesn't know the date. There's an additional `--tools get_datetime` function that will allow it to ground via the real date.

* Snippets-only is enough for most "what's current" questions and keeps context tiny.
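The two tidbits above (date grounding and snippets-only context) can be illustrated together. This is a hedged sketch of the idea, not how `llama-server` or `searxng-mcp` do it: `build_context` is a hypothetical helper, and it assumes each result is a dict with `title`, `url` and `content` keys.

```python
from datetime import date


def build_context(results: list[dict], today: date, max_results: int = 5,
                  max_snippet_chars: int = 200) -> str:
    """Format search hits as a compact, date-stamped block of text."""
    # Stating the date up front lets the model judge how recent results are.
    lines = [f"Today's date: {today.isoformat()}"]
    for i, r in enumerate(results[:max_results], start=1):
        # Truncate snippets so the context stays small.
        snippet = r.get("content", "")[:max_snippet_chars]
        lines.append(f"{i}. {r.get('title', '')} ({r.get('url', '')}): {snippet}")
    return "\n".join(lines)
```

Passing `date.today()` as `today` keeps the function deterministic for testing while still grounding the model in the real date at call time.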

Let me know if you have any questions!

by zarldev6 hours ago|

https://www.zarl.dev/posts/hal-by-any-other-name Here is my write-up on my local model setup. I also have https://zarldev.github.io/zarlmono/ as my local-first coding agent.

by dexterdog21 hours ago|

I've been self hosting this as my default engine across all of my searches for a few years now. I can't recommend it more highly.

by viviansolide21 hours ago|

Same experience

by ProofHouse21 hours ago|

I’ll have to try. I’ve only recently learned that Exa pricing is a bit crazy (especially on searches where you pull 30-40 sources). I just used it by default and then was like "oh damn" when I got hit.

by RandyOrion10 hours ago|

I've been using searxng for several years now. I don't run my own instance because of the inhumane network censorship imposed by the GFW and the proxy detection enforced by search engines. Instead, I rely on public instances from the list [1] and libredirect [2]. Note that service from a single instance is not guaranteed, but you can always switch to another available instance at little cost within a minute.

I won't say searxng can help you degoogle because metasearch engine calls other search engines, e.g., google, to collect results. However, if you try searxng, you can at least get rid of things like ai reviews in no time.

In the end, thank you to the people behind the searxng project and the public instances.

[1] https://searx.space

[2] https://github.com/libredirect/browser_extension

by baranul1 hours ago|

The thing about the public instances is that now you often have to go through a lot of them to verify they work properly. SearXNG needs better quality control.

Often have to go through the preferences to deselect search engines that don't work (often because of the instance being blocked) or select those that do work, because of reliability problems. Which engines are working, can be different for each public instance, so that even saving a preference hash doesn't always work.

Would be great if SearXNG did automatic adjustment of presented search engines (or offered the option) based on reliability.
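To make the wish concrete, here is a sketch of what reliability-based engine selection could look like. It is purely illustrative and does not describe existing SearXNG behavior; `EngineHealth`, its thresholds and its method names are all hypothetical.

```python
from collections import defaultdict


class EngineHealth:
    """Tracks per-engine success rates and suggests which engines to keep."""

    def __init__(self, min_success_rate: float = 0.5, min_samples: int = 5) -> None:
        self.min_success_rate = min_success_rate
        self.min_samples = min_samples
        self._ok: dict[str, int] = defaultdict(int)
        self._total: dict[str, int] = defaultdict(int)

    def record(self, engine: str, success: bool) -> None:
        """Record the outcome of one request to an engine."""
        self._total[engine] += 1
        if success:
            self._ok[engine] += 1

    def is_enabled(self, engine: str) -> bool:
        total = self._total.get(engine, 0)
        if total < self.min_samples:
            return True  # not enough data yet, keep the engine
        return self._ok.get(engine, 0) / total >= self.min_success_rate
```

An instance (or a client-side wrapper) would call `record` after each engine response and consult `is_enabled` before presenting or querying an engine.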

by satvikpendem21 hours ago|

TinySearch wraps this and works well for agents. It's better than the native SearXNG MCP because it optimizes the context before it even gets to the agent so as to not waste tokens.

https://github.com/MarcellM01/TinySearch

by drnick120 hours ago|

SearXNG did not include a built-in MCP server, last time I checked.

by Havoc17 hours ago|

It does have a JSON response though, so it's pretty trivial to get an LLM to make you an MCP

by ProofHouse21 hours ago|

Props

by brucejackson3 hours ago|

Have used it in my homelab for the past 2 years, can recommend it. Easy to run in docker and helps both get better search results and keeps your search history local and in your control.

by artooro20 hours ago|

It works well if you connect it to the Brave Search API, but using it as a scraper is fairly unreliable. Google stopped working a few days ago.

by denysvitali10 hours ago|

I've built https://github.com/denysvitali/searxng-mcp to use this as an MCP for coding agents. Works very well, until you get rate limited by the providers (e.g: DDG).

It also needs a SearXNG server to run, so I recently pivoted towards a self-contained solution: https://github.com/denysvitali/search-mcp

by nikvdp7 hours ago|

I built something similar ([1]) that you might find interesting. Similar to your project, but with the fun tweak that it bundles searxng inside itself, so you don't need to run or find a searxng instance to use it.

[1]: https://github.com/nikvdp/searxng-ai-kit

by ninjahawk115 hours ago|

I’ve been a big fan of SearXNG for a while now. My disdain for Google has only grown, so having the ability to search and avoid things like yk, small AI models being installed on my PC without my consent, is awesome.

by chatmasta18 hours ago|

I’ve always liked this tool, but I’m of two minds regarding the privacy gained by sending my searches to 280 companies instead of just one.

by fishgoesblub20 hours ago|

I've been using SearXNG for a few years now, however I've been trying out Degoog as a SearXNG alternative since I've had issues with engines constantly failing or being slow since day 1 of using SearXNG, but Degoog has worse results with the same engines. It's a shame since I'm having to pick between slower but better results, or very fast but worse results.

by ManWith2Plans21 hours ago|

I've been using this for some projects. It's exceptional and I recommend it highly.

I actually included a recipe to deploy it to kubernetes in typekro, my TypeScript infrastructure-as-code project for kubernetes: https://typekro.run/api/searxng/

by rcarmo20 hours ago|

Years of regular use here, has been great even before I started using it as an agent tool.

by arikrahman21 hours ago|

I have used SearXNG hosts like https://searx.be/ but stick with Brave search for the most part. Are there other good hosts people tend to use?

by vimredo20 hours ago|

Personally, I self-host it myself. All the hosts I tried either errored often, or gave search results that were complete garbage.

by another_twist20 hours ago|

Been a fan of searX for a while. Not sure if this is the same thing but there were plenty of hosted versions too.

by lucasrufkahr20 hours ago|

Yeah, I find that searx results are way more relevant to what I’m actually looking for than a single engine. There’s so much manipulation going on that if you don’t aggregate multiple engines, it’s near impossible to get what you want.

by jaygreat202013 hours ago|

This is really great!

by MrDrMcCoy17 hours ago|

Friendly reminder that if your user and traffic count is low, your traffic is still unique and able to be profiled. Love and use this project, though.

by grigio11 hours ago|

searxng is fantastic for AI agents

by queenkjuul13 hours ago|

Highly recommend this. I set up a self-hosted instance and have been using it exclusively for months. It's better than DDG and I don't miss Google whatsoever.

Image search is worse i guess mostly for lack of CDN so it's slow but whatever.

by salmonik20 hours ago|

I prefer 4get.

by noobcoder20 hours ago|

how do i configure which specific search engines SearXNG pulls its results from? Can we extend it to only search Stack Overflow and GitHub?
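One per-request option, as far as I understand the SearXNG search API, is the `engines` query parameter, which takes a comma-separated list of engine names. The sketch below only builds such a URL. The engine names `github` and `stackoverflow` and the base URL are assumptions that depend on how the instance's engines are named and enabled, and `format=json` requires the JSON format to be enabled in settings.

```python
import urllib.parse


def build_scoped_url(base_url: str, query: str, engines: list[str]) -> str:
    """Build a search URL restricted to the given engine names."""
    params = urllib.parse.urlencode(
        {"q": query, "format": "json", "engines": ",".join(engines)}
    )
    return f"{base_url}/search?{params}"


# Assumed engine names and instance address, for illustration only.
print(build_scoped_url("http://localhost:8888", "python asyncio",
                       ["github", "stackoverflow"]))
```

For a permanent restriction, the usual route is the instance's engine configuration or the per-user preferences page rather than a per-request parameter.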
