So the full solution would be models trained in an open, verifiable way and running locally.
You can trigger the service's ToS enforcement or worse, get reported to law enforcement for something you didn't even write.
In HN circles perhaps. Average Joes don’t care.
anthropic, google, openai etc. decided that their consumer ai plans would not be private: partly to collect training data, partly to employ moderators who review user activity for safety.
we trust that human moderators will not review and flag our icloud docs, onedrive files, or gmail, or aggregate such documents into training data for llms. yet it became the norm that an llm is somehow not private. it became the norm that you can't opt out of training, even on paid plans (see meta and google); or, if you can opt out of training, you can't opt out of moderation.
cloud models with a zero-retention privacy policy are private enough for almost everyone. the subscriptions, google search, and ai search engines are either 'buying' your digital life or covering themselves for legal reasons.
you can and should have private cloud services, and if legal agreement is not enough, cryptographic attestation is already used in compute, with AWS nitro enclaves and other providers.
I personally think everyone should default to using local resources. Cloud resources should only be used for expansion and be relatively bursty rather than the default.
That's two halves of "why", sure.
Another interesting half would be that those companies have US military officers on their boards, and LLMs are the ultimate voluntary data collection platform, even better trojan horses than smartphones.
Yet another "half" could be how much enterprise value might be found by datamining for a minute or two... may I suggest reading a couple of Martha Wells books.
As an enthusiastic reader of books like Privacy is Power and Surveillance Capitalism, it feels good to have a private tool that is ready at hand.
I saw a service named Phala, which claims to be actually zero-knowledge on the server side (I think). It was significantly more expensive, but it's interesting to see it's out there. My thought was that escaping the data-collection-hungry consumer models was a big win.
cryptographic confirmation of zero knowledge: yes.
the latter, based on trust in the hardware manufacturer and their root ca. so, encrypted if you trust intel/nvidia to sign it.
there are a few such services: phala, tinfoil, near ai; redpill is an aggregator of those.
if you are happy with off-prem, then a cloud llm is ok too; if you need on-prem, that is when you will need local.
The private thing is the prompt.
But also, a local LLM opens up the possibility of agentic workflows that don't have to touch the Internet.
AFAIK the current model is on par with Qwen-3-4B, which is from a year ago [0]. There's a big leap going from last year's Qwen-3-4B to Qwen-3.5-4B or to Gemma 4.
The Apple model is nice since you don't need to download anything else, but I'd rather use the latest model than a model from a year ago.
https://machinelearning.apple.com/research/apple-foundation-...
Of course I imagine Apple is not going to be the fastest mover in this regard. I’m not even sure they believe the product will be widely impactful anymore, and they may keep it relegated to a small list of popular use cases like photo touch-ups and quick questions to Siri. For me the most useful parts of Apple’s AI don’t even require me to enable Apple Intelligence.
With the so-called Claude bug burning through tokens at record speed, I gave alternative models a try, and they're mostly ... interchangeable. I don't know how easy switching, low brand loyalty, and fast-moving markets will play out. I hope that local LLMs will become very viable very soon.
I have a new prompt to test LLMs much like simonw's pelican test.
"What is 9:30am Taiwan time in US, Pacific?" For some reason, the answers are quite inconsistent but all wrong.
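For reference, the correct answer is easy to check with `zoneinfo` (Taiwan observes no DST, so 9:30 AM Taipei is 5:30 PM or 6:30 PM the previous day Pacific, depending on the season):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Taiwan has no DST; Pacific is UTC-8 (PST) or UTC-7 (PDT).
for day in ("2025-01-15", "2025-07-15"):
    taipei = datetime.fromisoformat(f"{day}T09:30").replace(tzinfo=ZoneInfo("Asia/Taipei"))
    print(taipei.astimezone(ZoneInfo("America/Los_Angeles")).strftime("%Y-%m-%d %H:%M %Z"))
# 2025-01-14 17:30 PST
# 2025-07-14 18:30 PDT
```

None of the offsets in the transcript (11, 12, or 13 hours) is right; the real gap is 15 or 16 hours, and the day rolls back.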
./apfel "what is 9:30am Taiwan time in US, Pacific?"
Taiwan is 12 hours ahead of the Pacific Time Zone. Therefore, 9:30 AM Taiwan time would be 9:30 PM Pacific Time.
Taiwan is 13 hours ahead of the Pacific Time Zone. Therefore, 9:30 AM in Taiwan is 10:30 PM in the Pacific Time Zone.
Taiwan is in the China Standard Time (CST) zone, which is 12 hours ahead of the Pacific Standard Time (PST) zone. Therefore, 9:30 AM in Taiwan is 9:30 PM in the Pacific.
Taiwan is typically 11 hours ahead of the Pacific Time Zone. Therefore, 9:30 AM in Taiwan is 8:30 PM in the Pacific Time Zone.
Taiwan is 13 hours ahead of the Pacific Time Zone. Therefore, 9:30 AM in Taiwan is 10:30 PM the previous day in the Pacific Time Zone.

Some such projects use CORS to allow read back as well. I haven’t read Apfel’s code yet, but I’m registering the experiment before performing it.
This is partially in response to https://localmess.github.io/ where Meta and Yandex pixel JS in websites would ping a localhost server run by their Android apps as a workaround to third-party cookie limits.
Chrome 142 launched a permission dialog: https://developer.chrome.com/blog/local-network-access
Edge 140 followed suit: https://support.microsoft.com/en-us/topic/control-a-website-...
And Firefox is in progress as well, though I couldn't find a clear announcement about rollout status: https://fosdem.org/2026/schedule/event/QCSKWL-firefox-local-...
So things are getting better! But there was a scarily long time where a rogue JS script could try to blindly poke at localhost servers with crafty payloads, hoping to find a common vulnerability and gain RCE or trigger exfiltration of data via other channels. I wouldn't be surprised if this had been used in the wild.
The default scenario should be secure. If the local site sends permissive CORS headers bets may be off. I would need to check but https->http may be a blocker too even in that case. Unless the attack site is http.
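As a sketch of that secure default: a localhost tool server can also refuse browser-originated requests outright, rather than relying only on the absence of CORS headers. This is a generic illustration, not code from any project mentioned in the thread:

```python
# A localhost tool server that rejects requests carrying an Origin header.
# Browsers attach Origin to cross-origin fetches; a native local client
# normally sends none.
from http.server import BaseHTTPRequestHandler, HTTPServer

class LocalOnlyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Origin"):
            # Request came from a web page: refuse it.
            self.send_response(403)
            self.end_headers()
            return
        body = b"ok"
        self.send_response(200)
        # Deliberately no Access-Control-Allow-Origin header: even if a rogue
        # page triggers a request, the browser won't let the page read the reply.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

def serve(port: int = 8765) -> HTTPServer:
    return HTTPServer(("127.0.0.1", port), LocalOnlyHandler)

# serve().serve_forever()  # blocks; run in a thread for real use
```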
The task is basically predicting pricing and costs.
Apple’s model came out on top—best accuracy in 6 out of 10 cases in the backtest. That surprised me.
It also looks like it might be fast enough to take over the whole job. If I ran this on Sonnet, we’d be talking thousands of dollars per month. With DeepSeek, it’s more like hundreds.
So far, the other local models I’ve tried on my 64GB M4 Max Studio haven’t been viable - either far too slow or not accurate enough. That said, I haven’t tested a huge range yet.
This doesn't feel truthful; it sounds like this tool is a hack that unlocks something. If I understand it correctly, it's using the same FoundationModels framework that powers Apple Intelligence, just exposed as a CLI and an OpenAI-compatible REST endpoint. Which is fine, it's just that the marketing goes a bit hard.
> Runs on Neural Engine
Also unsure if this runs on ANE, when I tried Apple Intelligence I saw that it ran on the GPU (Metal).
> Also unsure…
Thank you for sharing your feelings and uncertainty.
Perhaps resist the urge to post until you have something to contribute.
You on the other hand contributed literally nothing to the topic
The poster said:
> Also unsure if this runs on ANE, when I tried Apple Intelligence I saw that it ran on the GPU (Metal).
They added something of some substance here.
Your post expressing your feelings did not.
Submitted a PR to prevent its installation on macOS versions older than Tahoe (26), since I was able to install it on my older macOS 15, but it aborted on execution.
Already in Chrome as an origin trial: https://developer.chrome.com/docs/ai/prompt-api
apfel -o json "Translate to German: apple" | jq .content
Unfortunately, I found the small context window makes the utility pretty limited.
Then save the heavy lifting for the big boys.
Both are easily extensible, so you could certainly wire either up to the local (or hosted) LLM of your choice.
Side note: I just noticed the extension example in the Command Palette demo reel searches HN.
[1] https://www.alfredapp.com/
[2] https://learn.microsoft.com/en-us/windows/powertoys/command-...
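Wiring a launcher action up to a local model could be as small as one HTTP call. A sketch against an OpenAI-compatible endpoint; the base URL, port, and model name here are assumptions, not documented values, so point it at whatever server you actually run:

```python
# Minimal chat call against an OpenAI-compatible /v1/chat/completions endpoint.
import json
import urllib.request

def chat(prompt: str, base_url: str = "http://127.0.0.1:8080") -> str:
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps({
            "model": "local-model",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Standard OpenAI-style response shape.
        return json.load(resp)["choices"][0]["message"]["content"]
```

The launcher extension would then just call `chat(...)` with the selected text or query.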
https://github.com/ehamiter/afm
It's really handy for quick things like "what's the capital of country x" but for coding, I feel that it is severely limited. With such a small context it's (currently) not great for complicated things.
Can you share a working example?
trying to run openclaw with it in ultra token-saving mode totally did not work.
great for shell scripts though (my major use case now)
apple does have an on-device rag pipeline called the semantic index that feeds personal data like contacts, emails, calendar, and photos into the model context, but this is only available to apple's own first-party features like siri and system summaries.
it is not exposed through the foundationmodels api.
The mic button requires clicking to transcribe and start listening again, and the default voice is low quality (I assume it can be configured).
In general I'm looking for a way to try the on-device hands-free voice mode.
They released an official python SDK in March 2026:
https://developer.apple.com/documentation/Updates/Foundation...
Imagine they baked Qwen 3.5 level stuff into the OS. Wow that’d be cool.
https://www.linkedin.com/posts/nathangathright_marco-arment-...
and yet... successful people have used it to build really successful things: Facebook, Tumblr (I think), the things Marco's been involved with.
I just dunno, outside of Meta, should we really be pushing PHP with all its flaws? Or is it no longer so flawed and I need to update my priors?
And living with decisions made 15 years ago may be much more successful than trying to change horses mid-stream.
% apfel --model-info
apfel v0.6.25 — model info
├ model: apple-foundationmodel
├ on-device: true (always)
├ available: yes
├ context: 4096 tokens
├ languages: zh, en, nl, es, ja, pt, da, fr, it, nb, vi, tr, de, ko, sv
└ framework: FoundationModels (macOS 26+)
Just use the language you want when prompting it, like other LLMs?

% apfel "Gib mir ein Rezept für Currywurst."
Natürlich! Hier ist ein einfaches Rezept für Currywurst:
### Zutaten:
- **Für die Würste:**
- 500 g Bratwürste (z. B. Frankfurter Würste)
(note: clipped most of the reply, since I assume most of us here don't actually need an LLM-generated recipe)

I guess LLMs have made it too simple to instantly build startup landing-page slop, which causes this? Like, do we need to see the github star count chart? Do we need all the buzzwords and stuff? You'd think this was a startup trying to get a billion-dollar valuation. It feels disingenuous.
Maybe I'm just being a hater.
parsing logfiles line by line, sure
parsing a whole logfile, well, it must be tiny, and logfiles hardly ever are
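A crude workaround for whole files is chunking them so each prompt fits the small window. A minimal sketch, assuming a rough ~4 characters per token (not a real tokenizer) and leaving the actual model call out:

```python
# Split a logfile into chunks small enough for a tight context window.
def chunk_lines(lines, max_tokens=3000, chars_per_token=4):
    budget = max_tokens * chars_per_token  # approximate character budget
    chunk, size = [], 0
    for line in lines:
        if size + len(line) > budget and chunk:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield "".join(chunk)

# Each chunk then goes to the model in its own prompt (e.g. "summarize the
# errors in this excerpt"), and the per-chunk answers get merged at the end.
```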
If all LLMs did this, people would trust them more.
I too would love to try this for simple prompts but won’t be updating past Sequoia for the foreseeable future.
Hard to know what to do with this. I'm interested in the project and know others who would be, but I feel like shit after being slopped on by a landing page, and I don't wish to slop on my friends by sharing it with them. I suppose the github link is indeed significantly better; I'll share that.
It's just these flashy LLM-generated webpages are really misleading.
It's filled with attention-grabbing LLM filler. It presents itself as high-effort content, but in reality it's unclear if it's accurate, unclear how much the author even put into READING the content they're asking me to read, and overall just really unclear which parts of the page matter to the author and which are just stuff the LLM inserted into a marketing card to make it look like a SaaS sales pitch.
dyld[71398]: Library not loaded: /System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels
Referenced from: <32818E2F-CB45-3506-A35B-AAF8BDDFFFCE> /opt/homebrew/Cellar/apfel/0.6.25/bin/apfel (built for macOS 26.0 which is newer than running OS)
Reason: tried: '/System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels' (no such file), '/System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels' (no such file, not in dyld cache)

apfel "How many shopping days until Christmas?"
To determine how many shopping days until Christmas, you'll need to know the current date and the date of Christmas. Christmas is on December 25th, so you can subtract the current date from this date to find the number of shopping days. If you provide the current date, I can help you calculate that.

apfel "How many shopping days until Christmas? Today is April 3rd."
To calculate the number of shopping days until Christmas, we first need to determine the date of Christmas this year. Christmas in 2023 is on December 25th.
Starting from April 3rd, we calculate the days:
1. April: 30 days (from April 3rd to April 30th)
2. May: 31 days
3. June: 30 days
4. July: 31 days
5. August: 31 days
6. September: 30 days
7. October: 31 days
8. November: 30 days
Adding these up gives us:
30 + 31 + 30 + 31 + 31 + 30 + 31 + 30 = 252 days
Christmas is on December 25th, so we subtract 252 days from today, April 3rd:
252 - 252 = 0 days
So, there are 0 shopping days left until Christmas, but since Christmas is already past, there are no shopping days left.
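For the record, the correct count under the model's own premises is a one-liner (and note the model's month figures above actually sum to 244, not 252):

```python
from datetime import date

# The model's premises: today is April 3rd, Christmas is December 25th,
# in 2023 (the year the model picked).
print((date(2023, 12, 25) - date(2023, 4, 3)).days)  # 266
```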
Yep, feels like Siri.
"Text Carol bring me a glass of water please"
"I'm sorry, I don't see a 'Carol Bring' in your contacts"
It’s a nice LLM because it seems fairly decent, it loads instantly, and it uses the Neural Engine. The GPU is faster, but when I run bigger LLMs on the GPU the normally very cool M-series Mac becomes a lap roaster.
It’s a small LLM though. Seems decent but it’s also been safety trained to a somewhat comical degree. It will balk over safety at requests that are in fact quite banal.
> $0 cost
No kidding.
Why not just link the GitHub: https://github.com/Arthur-Ficial/apfel
So you have to put up with the low-contrast, buggy UI to use that.
This actually looks really neat. I'll have to bookmark this for whenever I'm dragged kicking and screaming into the abomination that is "Tahoe."