`gemma4-26b-a4b-it-qat.gguf`
https://huggingface.co/lmstudio-community/gemma-4-26B-A4B-it...
It's really great to use. Following up on what the poster above mentioned, here is my SearXNG setup, all running through `llama.cpp`, which has a built-in webui with an MCP client:
* SearXNG in Docker — enable the JSON API (`search.formats: [html, json]`; off by default).
* `searxng-mcp` (FastMCP, native streamable-HTTP): `TRANSPORT=streamable-http HOST=127.0.0.1 PORT=8100 SEARXNG_URL=http://localhost:8888 uvx --from searxng-mcp --with fastmcp searxng-mcp`
* `llama-server` with `--webui-mcp-proxy`, then add the server in the webui.
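If it helps, here is a rough Python sketch for checking that the JSON API from the first step is actually on before wiring up the MCP layer. It assumes SearXNG is reachable at `http://localhost:8888` (same as `SEARXNG_URL` above); the helper names `build_search_url` and `json_api_enabled` are made up for illustration, and the exact error status a given SearXNG version returns when JSON is disabled may vary.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: SearXNG is published on localhost:8888, as in the setup above.
SEARXNG_URL = "http://localhost:8888"


def build_search_url(base_url: str, query: str) -> str:
    """Return a SearXNG search URL that asks for JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def json_api_enabled(base_url: str = SEARXNG_URL, timeout: float = 5.0) -> bool:
    """Return True if a format=json query comes back as a JSON object with results."""
    url = build_search_url(base_url, "test")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError:
        # An HTTP error here usually means json is missing from search.formats.
        return False
    except (OSError, ValueError):
        # Instance unreachable, request timed out, or the body was not JSON.
        return False
    return isinstance(payload, dict) and "results" in payload
```

Calling `json_api_enabled()` after restarting the container should return `True` once `json` is listed under `search.formats`.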
Some gotchas:
* `searxng-mcp` forgets to declare its own dep → `--with fastmcp`.
* Endpoint is `/mcp`, not the `/searxng-mcp/mcp` the docs claim.
* `--webui-mcp-proxy` only enables the CORS proxy; each MCP server entry still needs its "Use llama-server proxy" checkbox ticked, or the browser fetches direct and CORS-fails.
* Terminal clients (OpenCode etc.) skip the proxy — point them straight at `:8100/mcp`.
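To confirm which path the MCP server really answers on, something like the following Python sketch can help. It sends a plain MCP `initialize` request over streamable HTTP and reports the HTTP status. Assumptions: the server is on `127.0.0.1:8100` as above, it accepts protocol version `2025-03-26`, and the client name `mcp-probe` is just a placeholder. This bypasses the llama-server proxy entirely, the same way a terminal client would.

```python
import json
import urllib.error
import urllib.request

# Assumption: searxng-mcp listens on 127.0.0.1:8100 and serves MCP at /mcp.
MCP_URL = "http://127.0.0.1:8100/mcp"


def build_initialize_request(url: str = MCP_URL) -> urllib.request.Request:
    """Build a JSON-RPC initialize request for a streamable-HTTP MCP endpoint."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed; adjust to what the server supports
            "capabilities": {},
            "clientInfo": {"name": "mcp-probe", "version": "0.1"},  # placeholder name
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients must accept both plain JSON and SSE replies.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def probe(url: str = MCP_URL, timeout: float = 5.0) -> int:
    """Send the initialize request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(build_initialize_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 404 here suggests the path is wrong.
        return err.code
```

Comparing `probe("http://127.0.0.1:8100/mcp")` against `probe("http://127.0.0.1:8100/searxng-mcp/mcp")` should show which of the two paths is live.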
A couple of interesting tidbits:
* Search-related tool calls have temporal issues, and the model trips out. 2026 results read to it as a "future-dated hallucination" because it doesn't know the current date. There's an additional `--tools get_datetime` function that lets it ground itself in the real date.
* Snippets-only is enough for most "what's current" questions and keeps context tiny.
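To illustrate both points, here is a small Python sketch of what a snippets-only, date-grounded tool result could look like. This is not how `searxng-mcp` or `get_datetime` are implemented; `format_snippets` is a hypothetical helper, and it assumes result items shaped like SearXNG's JSON output (`title`, `url`, `content` keys).

```python
from datetime import date


def format_snippets(results, today, limit=5, max_chars=300):
    """Render search results as a compact, date-stamped text block.

    results: list of dicts with "title", "url", "content" keys
             (assumed to match SearXNG's JSON result items).
    today:   a datetime.date, placed first so the model can tell
             that recent results are not "from the future".
    """
    lines = [f"Today's date: {today.isoformat()}"]
    for item in results[:limit]:
        title = (item.get("title") or "").strip()
        url = item.get("url") or ""
        # Keep only the snippet, truncated, instead of fetching full pages.
        snippet = (item.get("content") or "").strip()[:max_chars]
        lines.append(f"- {title} ({url}): {snippet}")
    return "\n".join(lines)
```

With `limit=5` and `max_chars=300`, the whole tool result stays under roughly 2,000 characters, which is the "tiny context" benefit; the leading date line does the grounding.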
Let me know if you have any questions!