Strongly agree that the knowledge density is impressive for the being a 1-bit model with such a small size and blazing fast response
I should note this is running on an RTX 6000 pro, so it's probably at the max speed you'll get for "consumer" hardware.
That... pft. Nevermind, I'm just jealous
The question is about the best mode of transportation to a car wash located 100 meters away. Since the user is asking for a recommendation, it's important to consider practical factors like distance, time, and convenience.
Walking is the most convenient and eco-friendly option, especially if the car wash is within a short distance. It avoids the need for any transportation and is ideal for quick errands.
Driving is also an option, but it involves the time and effort of starting and stopping the car, parking, and navigating to the location.
Given the proximity of the car wash (100 meters), walking is the most practical and efficient choice. If the user has a preference or if the distance is longer, they can adjust accordingly.We're only a couple of years into optimization tech for LLMs. How many other optimizations are we yet to find? Just how small can you make a working LLM that doesn't emit nonsense? With the right math could we have been running LLMs in the 1990s?
Sample output below (the model's response to "hi" in the forked llama-cli):
X ( Altern as the from (.. Each. ( the or,./, and, can the Altern for few the as ( (. . ( the You theb,’s, Switch, You entire as other, You can the similar is the, can the You other on, and. Altern. . That, on, and similar, and, similar,, and, or in
1. Clone their forked repo: `git clone https://github.com/PrismML-Eng/llama.cpp.git`
2. Then (assuming you already have xcode build tools installed):
cd llama.cpp
cmake -B build -DGGML_METAL=ON
cmake --build build --config Release -j$(sysctl -n hw.logicalcpu)
3. Finally, run it with (you can adjust arguments): ./build/bin/llama-server -m ~/Downloads/Bonsai-8B.gguf --port 80 --host 0.0.0.0 --ctx-size 0 --parallel 4 --flash-attn on --no-perf --log-colors on --api-key some_api_key_string
Model was first downloaded from: https://huggingface.co/prism-ml/Bonsai-8B-gguf/tree/mainAnd this is when Im serving zero prompts.. just loaded the model (using llama-server).
Literally just downloaded the model into a folder, opened cursor in that folder, and told it to get it running.
Prompt: The gguf for bonsai 8b are in this local project. Get it up and running so I can chat with it. I don't care through what interface. Just get things going quickly. Run it locally - I have plenty of vram. https://huggingface.co/prism-ml/Bonsai-8B-gguf/tree/main
I had to ask it to increase the context window size to 64k, but other than that it got it running just fine. After that I just told ngrok the port I was serving it on and voila.