The best on-device AI apps for Mac (2026)
Your M-series Mac runs AI that was frontier-class two years ago. Eight jobs, eight picks, and the four categories nobody has solved yet.
Picture the menu bar of a fully-stocked indie-AI Mac in 2026. Five icons. A chat client running a 30-billion-parameter model on the GPU. A dictation app converting your voice to text in faster-than-real-time. An image generator that does FLUX in a minute and SDXL in fifteen seconds. A notes vault with semantic search. A code editor talking to a local server on port 11434. Network off, lid open, plane window. Every answer comes from the laptop on the tray table.
None of those apps existed in a usable form three years ago. Half of them are free. The other half cost less than one month of a frontier cloud subscription. The reason this list is worth writing now — and the reason most Mac users still don’t know what to install — is that the local AI category quietly graduated. Apple Silicon finally has GPU Neural Accelerators in the M5, Apple’s MLX framework is on its 0.31 release with 4,000+ pre-converted models on Hugging Face, and an open-weight 30B model can now beat last year’s GPT-class flagship on most everyday tasks. The hardware is here. The models are here. This post is the install list.

One framing note before the picks. This is the local-first list, not the “AI on Mac in general” list. If you want a chat that reaches the absolute frontier of reasoning, the answer is still a cloud product — for now. What changed in the last twelve months is that the 80% of daily AI work most people do — drafting, summarizing, transcribing, sketching, coding — can be done on a 16-to-32 GB M-series Mac with no cloud in the loop, no monthly fee, and no connection to a vendor that might shut down or train on your inputs. That is the prize the picks below are aimed at.
The rubric, before the picks
Every “best AI app” roundup smuggles its rubric. Naming the rubric first is the only way to read the list. Six tests, in order of how often they decide between two near-identical apps.
The two tests that catch most pretenders are the first and the last. A lot of apps marketed as “private AI” quietly call a cloud endpoint for the actually-hard work, then ship a local model for the easy bits to keep the marketing copy honest. Lulu, Little Snitch, or macOS’s own Privacy & Security pane in System Settings will tell you the truth in thirty seconds. Anything that asks you to sign in before you can prompt has flunked the airplane test before the engines start.
Chat: LM Studio if you tinker, Ollama plus Enchanted if you ship
The chat category had the loudest year. The headline change is LM Studio dropping its commercial-use restriction in July 2025 — it is now free for personal and at-work use, ships a polished Apple-Silicon-native build with MLX baked in, and exposes an OpenAI-compatible local server on port 1234 so every other tool on your system can talk to it. RAM footprint is whatever the model demands — a 7B Q4 idles around 5 GB, a Qwen3 30B-A3B MoE wants 24 GB loaded. On a 16 GB Mac it is still useful; on 32 GB it is excellent; on 64 GB+ it is the only chat app a power user needs.
The honourable runner-up is Ollama’s native macOS app from July 2025, paired with the open-source Enchanted SwiftUI client on top of it. Ollama added MLX support in March 2026 and now claims a 1.6× prompt-processing speedup and roughly 2× generation speed on Apple Silicon, with the biggest wins on M5 thanks to the new GPU Neural Accelerators. Pick this stack if you prefer the Mac App Store aesthetic, or if you want a CLI for scripting. Pick LM Studio if you want one window that handles model discovery, chat, and a server — for most readers, that is the right pick.
If you genuinely want the Apple-everything answer, the Foundation Models framework shipped at WWDC 2025 and now lets any third-party app call Apple’s on-device ~3B foundation model in three lines of Swift. It is free, always offline, and integrated with Writing Tools across the system. Quality and context length both cap below what a 7B open-weight gives you. It is the right answer for “summarize this paragraph” in a native app. It is not the right answer for a daily-driver chat.
Skip: Msty is fine, but every feature it has lives in one of the two picks above for free. The lifetime tier is real but the gap with LM Studio narrowed to nothing in the last six months.
Image: Draw Things, then a long way down
Image generation is the category with the clearest winner on the Mac. Draw Things is free, in the Mac App Store, written in Swift on a custom inference engine, and uses Metal FlashAttention on the GPU plus Core ML on the Neural Engine. It runs SDXL, FLUX.1, Wan 2.2 video, ControlNet, LoRA, and trains a LoRA on your own photos without leaving the app. The developer ships an update every couple of weeks. There is no subscription. It is, by a wide margin, the first app to install on a new Mac if you make images.
Two runners-up worth knowing. Mochi Diffusion is the Core-ML-first option — smaller memory footprint, slower first run, faster subsequent runs once the model is compiled to .mlmodelc. Use it if you only run Stable Diffusion variants and want the leanest setup. InvokeAI is the cross-platform power tool with the deepest ControlNet workflows; Mac support is real but the BitsAndBytes-quantized FLUX path breaks on Apple Silicon as of May 2026, which makes it the second choice for anyone whose Mac is their only machine.
For RAM planning: SDXL is comfortable on 16 GB. FLUX is comfortable on 24 GB+, possible on 16 GB with Q4 quantization and patience. Real-time interactive video generation on a local Mac is not yet a consumer category — the closest is Wan 2.2 inside Draw Things, and it is closer to a render than a generation.

Transcription: MacWhisper is the floor
Transcription is the category where on-device beats every cloud product on every axis except marketing budget. The Whisper family of open-weight models is small, fast on Apple Silicon, and accurate enough for legal depositions, medical dictation, and journalism. The polished apps are cheap, the open-source apps are competent, and there is no reason to send audio to anyone’s server unless you genuinely need a platform-grade speaker-diarization pipeline.
Pick: MacWhisper Pro at €59 lifetime on Gumroad (the App Store edition, Whisper Transcription, runs $6.99/month, $29.99/year, or $99.99 lifetime — same engine, sandboxed). Built on whisper.cpp with the Apple Neural Engine path, runs Whisper large-v3 (~3 GB) comfortably on any 16 GB Mac, and finishes a one-hour interview in a few minutes on M4 or later.
Runners-up: Aiko by Sindre Sorhus, a clean App Store purchase that handles large-v3 on macOS and smaller models on iOS; and VoiceInk, an open-source GPL system-wide dictation tool with app-specific config — free if you build from source, $25 for the Solo plan. Underneath the consumer apps, WhisperKit is the Swift package half of them ship on top of.
Skip: Wispr Flow. The product is polished and the dictation quality is excellent, but there is no offline mode — audio and screen context both leave your machine. If you wanted on-device transcription, that is not it.

Coding with local models: Continue.dev or Zed
Coding has two viable on-device paths in 2026, both free, both real. The first is Continue.dev as a VS Code extension pointed at LM Studio or Ollama — the official Ollama guide walks through a five-minute setup. The second is Zed’s native Ollama provider: point it at http://localhost:11434, pick a model with the supports_tools capability, and the built-in agent runs against your local stack with no cloud round-trip.
The model picks for May 2026, in rough order: Qwen3-Coder for general completion, DeepSeek-Coder-V2 for explanation-heavy tasks, Codestral 22B if your laptop has the headroom, and Phi-4 14B for autocomplete on 16 GB machines that need the latency. Aider is the CLI option for the same workflow with a different ergonomic; its Ollama documentation flags the default 2K context window as a footgun and tells you to set num_ctx: 32768 in .aider.model.settings.yml on the way in. That is the single most useful five lines of YAML in this post.
A warning that catches people. Cursor advertises support for local models. It is true on a technicality: the local model is the inference backend, but Cursor cannot connect to localhost directly and requires a public HTTPS endpoint, with Cursor in the request path. If you wanted end-to-end on-device, Cursor is not it. Continue.dev and Zed both are.
Notes and RAG: Reor, Obsidian, AnythingLLM
This is the category where the local-first answer beats the cloud answer on outcomes, not just on principle. Your notes are the highest- stakes RAG corpus you own — client work, salary numbers, medical history, half-written essays. The serious picks are all local.
Reor is the closest thing to a turn-key local notes app with RAG built in: llama.cpp inside, Transformers.js for embeddings, LanceDB as the vector store, vault stored as plain Markdown so you keep the files if you ever leave. AGPL, free. Best fit for “I want to write notes and ask them questions, and I want all of that on my disk.”
If you already live in Obsidian, the two plugins to install are Smart Connections for semantic search (the embeddings run locally, no API key required) and Copilot for Obsidian for a chat panel that talks to your local LM Studio or Ollama. The combination is the best second-brain workflow on the Mac and stays offline end-to-end.
For document chat with PDFs, spreadsheets, and the long tail of knowledge-worker files, the pick is AnythingLLM Desktop with a local model backend. Free, MCP-capable, workspace-based, and ships with thirty-plus provider plug-ins so you can fall back to a cloud model on the rare doc that exceeds your local context. The runner-up is GPT4All, whose LocalDocs feature is the simpler entry point if you want one binary, one button, and an answer.
Categories without a great local answer yet
Naming the gaps is how a buyer’s guide earns the picks. Four categories are not yet ready on a local Mac, and a list that pretended otherwise would burn the rest of the recommendations.
Voice and TTS. There is no MacWhisper-equivalent for text-to-speech. The serious work happens in libraries: mlx-audio runs Kokoro (an 82M-parameter Apache-licensed TTS model) on Apple Silicon, and Piper TTS is the ONNX option. Both produce excellent audio if you are willing to script. Neither has a polished consumer Mac app. Apple’s own AVSpeechSynthesizer is the system fallback, and Personal Voice clones a user-supplied voice locally on macOS, but that’s an OS feature, not a third-party app.
Real-time video generation. Wan 2.2 inside Draw Things is the best you can do on a Mac today, and it is closer to an overnight render than a Sora-style interactive experience. The consumer-grade local video category is two hardware generations away.
Browser-driving agents.The polished agents — Claude with computer use, OpenAI’s desktop agent, Perplexity — all run inference in the cloud. The local options are developer libraries: Agent Browser, browser-use, LaVague, each of which will accept an Ollama backend. The category will arrive; it is not here.
Frontier reasoning.A 128 GB M5 Max can run Llama-4 70B Q4 or Qwen3-class MoE models at acceptable speed, and they are excellent. They are not GPT-5 or Claude Opus on the hardest reasoning benchmarks. If your job depends on the absolute ceiling of AI reasoning, keep a cloud subscription for that one task and run everything else local.
How to choose, in four questions
If you wanted one answer and only one answer, the default Mac stack in May 2026 is: LM Studio for chat, Draw Things for images, MacWhisper for transcription, Continue.dev for coding, Reor or Obsidian with Smart Connections for notes. Four are free. One is €59 once. Total cost to assemble: the price of a single year of a frontier cloud subscription, paid once, with everything running on your laptop. That is the news.
The picks above will move. We will refresh this list quarterly — new models, new pricing, new MLX adoption. The frame underneath the picks is the part to keep: name the rubric, run the airplane test, prefer one-time purchases over subscriptions, choose the app that treats your unified memory like a feature rather than a workaround. Serious software has always been local on the Mac. AI is the last category to remember that.


