Disclosure: AgentPlix may earn a commission when you sign up through our affiliate links. This never influences our recommendations — we only cover tools we'd use ourselves.
Apfel: The Free AI Already Living on Your Mac
There’s a capable AI assistant sitting idle inside your Mac right now, and most people have no idea it exists. Apfel, a free and open-source project that surfaced on Hacker News earlier this year, wraps Apple’s own on-device AI infrastructure into a clean, privacy-first assistant that runs entirely on your machine. No API key. No $20/month subscription. No data streaming to a server farm in Oregon.
The timing is sharp. Apple Intelligence arrived with macOS Sequoia 15.1 and iOS 18.1, but its public APIs have been frustratingly restricted for developers. Apfel sidesteps those limitations entirely by targeting a lower layer of the stack: MLX, Apple’s open-source machine learning framework, which runs models on the GPU and CPU of every M-series chip. The result is an AI that feels surprisingly capable, costs nothing, and never leaves your Mac.
What Apfel Actually Is (and What It Isn’t)
Before the hype runs too far ahead of reality, let’s be clear about what Apfel is doing under the hood.
Apfel is not a wrapper around ChatGPT or Claude. It is not streaming tokens from any external API. Instead, it ships with quantized language models (defaulting to a 4-bit quantized Mistral 7B and, on machines with 16GB+ unified memory, an optional Llama 3 8B variant) that are downloaded once to your local drive and then executed entirely on-device using MLX, Apple’s open-source machine learning framework optimized for Apple Silicon.
When you type a prompt into Apfel, inference runs on your Mac’s GPU through MLX, drawing on the unified memory the GPU shares with the CPU. Token generation on an M3 Pro sits around 35–45 tokens per second for the 7B model, which is fast enough to feel conversational. On M4 Max machines, early benchmarks push past 80 tokens per second, which is competitive with the streaming speed you get from cloud APIs on a decent connection.
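To make those throughput numbers concrete, here is a minimal sketch that converts tokens per second into wait time. The throughput figures are the ones quoted above; the 150-token reply length is a made-up example, and the calculation ignores the time spent processing the prompt before the first token appears.

```python
def seconds_to_generate(reply_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a reply once generation has started (prompt processing not included)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return reply_tokens / tokens_per_second


# Throughput figures from the article; the reply length is an illustrative assumption.
REPLY_TOKENS = 150
for label, tps in [("M3 Pro (low)", 35), ("M3 Pro (high)", 45), ("M4 Max", 80)]:
    print(f"{label}: {seconds_to_generate(REPLY_TOKENS, tps):.1f} s for {REPLY_TOKENS} tokens")
```

A short paragraph-length answer lands in a few seconds on an M3 Pro and in roughly half that on an M4 Max, which is why the 7B model feels conversational rather than batch-like.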
Apfel isn't magic — it's disciplined engineering. By targeting MLX instead of Apple Intelligence's restricted APIs, the project achieves something Apple's own apps can't: full developer access to on-device inference with no network dependency whatsoever.
Apfel is not a replacement for frontier models, though. It won’t reason through complex multi-step coding problems the way Claude 3.7 Sonnet or GPT-4o will. For nuanced creative writing, deep research synthesis, or long-context analysis, you’ll still want a cloud model. But the dozens of smaller daily tasks (summarizing a document, drafting a quick reply, explaining a terminal error, generating a shell command) are ones Apfel handles with zero friction and zero cost.
Getting Apfel Running in Under 10 Minutes
Setup is genuinely straightforward. Here’s how to go from zero to running inference locally.
System Requirements
- macOS Sonoma 14.0 or later (Sequoia 15+ recommended)
- Apple Silicon Mac (M1 or later); Intel Macs are not supported
- 8GB unified memory minimum (16GB for larger models)
- ~6GB free disk space for the default 7B model weights
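The memory and disk numbers above follow from simple arithmetic on the quantized weights. The sketch below is a back-of-the-envelope estimate, not Apfel's own accounting. It assumes grouped 4-bit quantization with a 16-bit scale and a 16-bit bias stored per group of 64 weights (a common MLX setting, assumed here rather than confirmed for Apfel's checkpoints) and a parameter count of roughly 7.24 billion for Mistral 7B.

```python
def quantized_size_gb(
    params: float,
    bits: int = 4,
    group_size: int = 64,
    overhead_bits_per_group: int = 32,  # assumed: fp16 scale + fp16 bias per group
) -> float:
    """Approximate size of quantized weights in gigabytes (1 GB = 1e9 bytes)."""
    bits_per_weight = bits + overhead_bits_per_group / group_size
    return params * bits_per_weight / 8 / 1e9


# Approximate parameter count for Mistral 7B (assumption for illustration).
MISTRAL_7B_PARAMS = 7.24e9
print(f"Mistral 7B at 4-bit: ~{quantized_size_gb(MISTRAL_7B_PARAMS):.1f} GB")
```

Under those assumptions the estimate comes out close to 4 GB of weights, which leaves limited headroom on an 8GB machine once macOS and your other apps take their share. That is the practical reason 8GB is a floor, not a comfortable target.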
Step 1: Install via Homebrew
Apfel distributes a prebuilt Homebrew cask, which handles the MLX runtime and Python dependencies automatically:
brew tap apfel-ai/apfel
brew install --cask apfel
If you prefer to build from source (recommended if you want to swap in custom model weights):
git clone https://github.com/apfel-ai/apfel
cd apfel
pip install -e ".[mlx]"
Step 2: Pull Model Weights
On first launch, Apfel prompts you to download model weights. You can also trigger this from the CLI:
apfel pull mistral-7b-4bit # ~4.1GB, fast on M-series
apfel pull llama3-8b-4bit # ~5.3GB, better reasoning (16GB RAM recommended)
Weights are cached in ~/.apfel/models/. You can point Apfel at any MLX-compatible GGUF model if you want to experiment with alternatives — the config file at ~/.apfel/config.yaml accepts a model_path override.
Step 3: Launch the Interface
Apfel ships with both a native macOS menu bar app and a local web UI. The menu bar integration is the killer feature: a keyboard shortcut (default: ⌘ + Shift + Space) drops a floating input overlay directly over whatever you’re working in, similar to Raycast AI but without the subscription.
apfel serve # Starts local web UI at http://localhost:7432
apfel launch # Launches the menu bar app
In ~/.apfel/config.yaml, set context_window: 8192 to enable longer conversations. The default is 4096 to conserve memory, but if you're on a 32GB+ machine, bumping this up meaningfully improves multi-turn coherence.
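Why does a longer context window cost memory? Each token in the conversation keeps a key and a value vector per layer in the attention cache. Here is a rough sketch of that cost, using Mistral 7B's published shape (32 layers, 8 key/value heads, head dimension 128) and assuming the cache is stored in 16-bit floats. How Apfel actually stores its cache is not documented here, so treat the precision as an assumption.

```python
def kv_cache_bytes(
    context_tokens: int,
    n_layers: int = 32,        # Mistral 7B
    n_kv_heads: int = 8,       # Mistral 7B uses grouped-query attention
    head_dim: int = 128,       # Mistral 7B
    bytes_per_value: int = 2,  # assumed fp16 cache
) -> int:
    """Bytes needed to hold keys and values for a full context window."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value  # 2 = keys + values
    return per_token * context_tokens


for window in (4096, 8192):
    print(f"context_window {window}: {kv_cache_bytes(window) / 1024**2:.0f} MiB of cache")
```

Doubling the window doubles the cache, and that memory sits on top of the model weights, so the conservative default makes sense on smaller machines.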
The Privacy Angle Is Real, Not Marketing
Here’s the claim that deserves scrutiny: Apfel says all processing stays on-device. Is that actually true?
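Yes, with one caveat. The base inference loop is fully local. Your prompts never leave the machine. There’s no telemetry, no usage tracking, no call home. You can verify this yourself by running lsof -i while a query is in flight: the Apfel process should show no connections to remote hosts. If apfel serve is running, its listener on localhost:7432 will appear in the output, and that is expected.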
The caveat: Apfel’s optional “Smart Suggestions” feature (disabled by default) pings Apple’s on-device Siri suggestion API to augment context-aware shortcuts. This is the same local API that powers Siri’s typing predictions — it doesn’t send data externally, but it’s worth knowing it exists if you’re in a high-security environment.
For journalists, lawyers, developers working in sensitive codebases, or anyone who’s had second thoughts before pasting something into ChatGPT: Apfel solves that problem cleanly. The model never knows your network password because the model never touches the network.
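If you want to make the lsof check described above repeatable, a small filter helps separate loopback traffic (the local web UI) from anything that actually leaves the machine. This is a minimal sketch: it expects the text of `lsof -i -n -P` (numeric addresses and ports), and it assumes the process shows up under a name starting with "apfel", which may not hold if you installed from source and it runs as a Python process. The sample text is made up for illustration and is not real output from Apfel.

```python
import ipaddress


def outbound_connections(lsof_output: str, command_prefix: str = "apfel") -> list[str]:
    """Return remote endpoints of non-loopback connections for matching processes."""
    flagged = []
    for line in lsof_output.splitlines():
        fields = line.split()
        # The header row starts with "COMMAND", so it is skipped by this check too.
        if not fields or not fields[0].lower().startswith(command_prefix):
            continue
        for field in fields:
            if "->" not in field:
                continue  # listening sockets have no remote side
            remote = field.split("->", 1)[1]
            host = remote.rsplit(":", 1)[0].strip("[]")
            try:
                is_loopback = ipaddress.ip_address(host).is_loopback
            except ValueError:
                is_loopback = False  # unparseable host: flag it to be safe
            if not is_loopback:
                flagged.append(remote)
    return flagged


# Made-up sample in the shape of `lsof -i -n -P` output.
SAMPLE = """COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
apfel 1234 me 10u IPv4 0xabc 0t0 TCP 127.0.0.1:7432 (LISTEN)
apfel 1234 me 11u IPv4 0xabd 0t0 TCP 127.0.0.1:7432->127.0.0.1:51234 (ESTABLISHED)
apfel 1234 me 12u IPv4 0xabe 0t0 TCP 192.168.1.5:51000->203.0.113.10:443 (ESTABLISHED)
"""
print(outbound_connections(SAMPLE))
```

An empty list while a query is running is the result you are hoping for. Anything else is worth investigating before you trust the tool with sensitive material.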
What Apfel Is Actually Good At
After two weeks of daily use across a mix of writing tasks, coding assistance, and terminal work, here’s an honest picture of where it earns its place in the workflow.
Writing and Editing
This is Apfel’s strongest lane. Ask it to tighten a paragraph, rewrite a sentence in a more direct register, or generate three variations of an email subject line — it handles all of these cleanly. The 7B model has enough stylistic range to match tone reasonably well. It won’t win a Pulitzer, but it’s a fast, private first-pass editor that doesn’t require opening a browser tab.
Terminal and Shell Assistance
The menu bar overlay shines here. Forgotten the exact rsync flags for preserving permissions? Stuck on a sed pattern? Copy the error output from your terminal, trigger the overlay, and paste it in. The model usually returns a clear explanation and a corrected command in under three seconds. This alone made the install worthwhile.
Code Explanation (Not Code Generation)
A 7B model will generate code — but for non-trivial tasks, the output quality drops off quickly compared to Claude or GPT-4o. Where Apfel holds its own is explaining code you paste in. Walking through an unfamiliar codebase, understanding what a regex actually matches, or decoding a cryptic error message — these are the sweet spots.
Summarization
Feed Apfel a pasted article, Slack thread export, or meeting transcript. Ask for a five-bullet summary. It does this well, and doing it locally means you’re not feeding confidential internal documents to a cloud provider. This is probably the highest-value use case for enterprise developers working on sensitive projects.
Pros
- Completely free — no subscription, no API key needed
- Fully offline: zero data leaves your machine
- Fast inference on Apple Silicon (35–80+ tokens/sec depending on chip)
- Menu bar overlay integrates seamlessly into any workflow
- Open source — auditable, forkable, customizable
- Supports custom model weights via GGUF/MLX format
- No account creation required
Cons
- 7B model can't match frontier models on complex reasoning tasks
- Apple Silicon only — no Intel Mac support
- Initial model download requires 4–6GB of disk space
- Context window limited to 8K tokens even at max config
- No image/vision support in current release
- Setup requires basic CLI familiarity (Homebrew)
How Apfel Compares to the Alternatives
If you’re evaluating whether Apfel fits your stack, here’s how it compares with the most common alternatives:
| Feature | Apfel | Ollama | LM Studio | ChatGPT (web) |
|---|---|---|---|---|
| Price | Free | Free | Free | Free tier; $20/mo for Plus |
| On-device inference | ✅ Yes | ✅ Yes | ✅ Yes | ❌ Cloud only |
| macOS menu bar app | ✅ Native | ❌ No | ❌ No | ❌ No |
| Apple MLX optimized | ✅ Yes | ⚠️ Partial | ✅ Yes (MLX engine) | N/A |
| Model variety | ⚠️ Limited | ✅ Broad | ✅ Broad | ⚠️ GPT-only |
| Setup difficulty | Easy | Easy | Easy | Trivial |
| Vision/image support | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| Privacy (zero telemetry) | ✅ Verified | ✅ Yes | ✅ Yes | ❌ No |
The strongest competition is Ollama, which has a much broader model library and active community. Ollama also now has a macOS menu bar integration via third-party clients. If model variety is your primary concern, Ollama wins.
Where Apfel differentiates: it’s built specifically for the Mac experience. The overlay integration, the MLX-first architecture for maximum efficiency on Apple Silicon, and the tighter, opinionated UX make it the better daily driver for users who want something that “just works” without configuring a model runner and separate front end.
If you’re already using Cursor for AI-assisted coding, think of Apfel as the complement for everything outside your editor — writing, terminal, general Q&A — where you want local processing and don’t want another cloud dependency.
Advanced Usage: Swapping Models and Customizing Behavior
For users comfortable with the CLI, Apfel exposes enough configuration to get interesting.
Using Custom Models
Any model available in MLX format from Hugging Face can be dropped into Apfel. The community has been converting Phi-3 Mini, Gemma 3 2B, and Qwen 2.5 7B weights into MLX-compatible checkpoints with solid results:
# ~/.apfel/config.yaml
model_path: ~/.apfel/models/mlx-community/Qwen2.5-7B-Instruct-4bit
context_window: 8192
temperature: 0.7
system_prompt: "You are a concise technical assistant. Prefer short answers."
Gemma 3 2B is worth a mention specifically: it runs at well over 100 tokens/second on M3 Pro and M4 chips, and for simple tasks (shell commands, quick explanations, summarization) it’s indistinguishable from larger models in practice. The reduced memory footprint means you can run it alongside demanding apps without the fan spinning up.
System Prompt Customization
The system_prompt field in config.yaml controls how the model behaves globally. A few prompts worth trying:
- For writing: "You are a direct editor. Improve clarity and brevity. Do not add length."
- For code: "You are a senior engineer. Explain bugs concisely and fix them with minimal changes."
- For research: "You are a research assistant. Cite uncertainty clearly. Prefer specific answers over hedged generalities."
Piping Output in Terminal
Apfel exposes a clean CLI pipe interface:
cat error.log | apfel ask "What is causing this error and how do I fix it?"
pbpaste | apfel ask "Summarize this in three bullet points" | pbcopy
git diff HEAD~1 | apfel ask "Write a concise git commit message for these changes"
The git diff pattern is one of the best uses of the tool. Piping a diff and getting a commit message back locally, in half a second, without touching any external service, is the kind of quality-of-life improvement that quietly saves real time across a workday.
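One practical wrinkle with piping: a large diff or log can blow past the context window. Here is a minimal sketch of a wrapper that trims the input to a rough token budget before handing it to the CLI. The `apfel ask` invocation mirrors the pipe examples above and assumes the command reads its input from stdin; the four-characters-per-token ratio is a rule of thumb for English text, not a real tokenizer, and both helper names are hypothetical.

```python
import subprocess


def trim_to_token_budget(text: str, max_tokens: int, chars_per_token: float = 4.0) -> str:
    """Keep the start of `text`, cutting at a line boundary, to fit a rough token budget."""
    max_chars = int(max_tokens * chars_per_token)
    if len(text) <= max_chars:
        return text
    cut = text.rfind("\n", 0, max_chars)
    return text[: cut if cut > 0 else max_chars]


def ask(question: str, piped_text: str, command: tuple[str, ...] = ("apfel", "ask")) -> str:
    """Send text on stdin to the CLI and return its reply (assumes the CLI reads stdin)."""
    result = subprocess.run(
        [*command, question],
        input=piped_text,
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with an error
    )
    return result.stdout
```

With an 8192-token window you might call `ask("Write a concise git commit message for these changes", trim_to_token_budget(diff, 6000))`, leaving the remainder of the window for the question and the reply.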
Should You Replace Your Paid AI Tools?
Probably not entirely — but Apfel earns a place in the stack.
The honest mental model is this: use Apfel for tasks where the data is sensitive, where the task is simple enough that a 7B model handles it well, or where you’re doing something repetitive enough that paying per-token or per-month adds up. Use frontier models (Claude, GPT-4o) for complex reasoning, long-context analysis, advanced code generation, and anything where output quality is worth the cost.
The combination is more powerful than either alone. Apfel handles the high-frequency, low-complexity work locally and for free. Your paid subscription stays reserved for the tasks that actually need the big model. In practice, this split can reduce AI spend by 40–60% for users who were previously routing everything through a pay-per-token cloud API; a flat monthly subscription costs the same however little you use it.
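To estimate what the split is worth for your own usage, the arithmetic is straightforward. The sketch below uses entirely hypothetical numbers (request volume, tokens per request, and price are placeholders, not quotes from any provider) and assumes the requests you move to Apfel are the same size as the ones that stay in the cloud.

```python
def monthly_api_spend(
    requests: int,
    avg_tokens: int,
    price_per_million_tokens: float,
    local_share: float = 0.0,
) -> float:
    """Monthly pay-per-token spend when `local_share` of requests run locally instead."""
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    cloud_requests = requests * (1 - local_share)
    return cloud_requests * avg_tokens * price_per_million_tokens / 1_000_000


# Hypothetical workload: 3,000 requests a month, 2,000 tokens each, $10 per million tokens.
before = monthly_api_spend(3000, 2000, 10.0)
after = monthly_api_spend(3000, 2000, 10.0, local_share=0.5)
print(f"before: ${before:.2f}, after: ${after:.2f}")
```

In reality the requests you offload tend to be the small ones, so the savings in dollars will usually be lower than the share of requests you move. Plug in your own billing data before counting on any particular percentage.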
Set Apfel's hotkey to something you'd naturally reach for mid-thought; ⌘ + Shift + Space works well. The goal is zero friction: you think of a question, trigger the overlay, and get an answer without breaking context. The faster that loop, the more value you extract from local AI.
Conclusion: Local AI Has Crossed the Usability Threshold
A year ago, running a language model locally on a Mac meant wrestling with Python environments, waiting 10 seconds per token, and accepting output that read like it was generated by an overconfident autocomplete. That era is over.
Apfel is the cleanest demonstration yet that local AI on Apple Silicon is genuinely usable for everyday work. The inference is fast, the privacy guarantees are real, the integration with macOS feels intentional, and the price is hard to argue with. It’s not a replacement for the frontier — but it doesn’t need to be.
If you have an Apple Silicon Mac, there’s no good reason not to have Apfel running. The install takes ten minutes, the disk space cost is a rounding error, and the upside is a capable AI assistant that works offline, respects your data, and costs nothing.
Download Apfel at github.com/apfel-ai/apfel and run your first inference locally today. And if you’re looking to go deeper on local AI tools and automation, explore our coverage of MLX and the open-source AI stack for Apple Silicon — the ecosystem is moving fast.
Apfel is the best free, privacy-first AI assistant for Mac users — fast enough to be useful daily, local enough to be trusted with sensitive work, and polished enough to actually replace browser-based AI for dozens of routine tasks.