ListicleLocal AIMacMay 25, 202612 min read

The best on-device AI apps for Mac (2026)

Your M-series Mac runs AI that was frontier-class two years ago. Eight jobs, eight picks, and the four categories nobody has solved yet.

By Atul

On-device AI · macOS 26 Tahoe

Eight jobs · eight picks

Your M-series Mac runs AI that was frontier-class two years ago. Most people don’t know which app to install.

Chat

LM Studio

Free, MLX-native, GGUF + MLX

Image

Draw Things

Free, AppKit, Metal FlashAttention

Transcription

MacWhisper

$59 lifetime, Whisper large-v3

Coding

Continue.dev + Zed

Free, points at LM Studio or Ollama

Notes & RAG

Reor

AGPL, local embeddings, local LLM

Document chat

AnythingLLM

Free, workspace RAG, MCP support

Voice / TTS

MLX-Audio + Kokoro

Library-grade, no polished app yet

Agents

Browser-Use + local LLM

Developer toolkit, no consumer app

Mature pick Usable, with caveats Category isn’t ready

Picture the menu bar of a fully-stocked indie-AI Mac in 2026. Five icons. A chat client running a 30-billion-parameter model on the GPU. A dictation app converting your voice to text in faster-than-real-time. An image generator that does FLUX in a minute and SDXL in fifteen seconds. A notes vault with semantic search. A code editor talking to a local server on port 11434. Network off, lid open, plane window. Every answer comes from the laptop on the tray table.

None of those apps existed in a usable form three years ago. Half of them are free. The other half cost less than one month of a frontier cloud subscription. The reason this list is worth writing now, and the reason most Mac users still don’t know what to install, is that the local AI category quietly graduated. Apple Silicon finally has GPU Neural Accelerators in the M5, Apple’s MLX framework is on its 0.31 release with 4,000+ pre-converted models on Hugging Face, and an open-weight 30B model can now beat last year’s GPT-class flagship on most everyday tasks. The hardware is here. The models are here. This post is the install list.

A MacBook Pro open on a clean wooden desk in a sunlit room. — One laptop, eight icons, no proxy. Photo by Bram Van Oost on Unsplash.

One framing note before the picks. This is the local-first list, not the “AI on Mac in general” list. If you want a chat that reaches the absolute frontier of reasoning, the answer is still a cloud product, for now. What changed in the last twelve months is that the 80% of daily AI work most people do (drafting, summarizing, transcribing, sketching, coding) can be done on a 16-to-32 GB M-series Mac with no cloud in the loop, no monthly fee, and no connection to a vendor that might shut down or train on your inputs. That is the prize the picks below are aimed at.

The rubric, before the picks

Every “best AI app” roundup smuggles its rubric. Naming the rubric first is the only way to read the list. Six tests, in order of how often they decide between two near-identical apps.

The rubric · six tests an app has to pass

Test

Why it matters

What passing looks like

Silicon-native

MLX, Core ML, or Metal, not a Python wrapper

Memory and battery both behave

Inference on-device

Network off, model still answers

Verify with Little Snitch or Lulu

RAM at load

7B Q4 needs ~5 GB, 70B Q4 needs ~45 GB

Fits in 16 GB for the everyday model

Model breadth

Multi-provider, multi-modal, bring-your-own weights

Open weights swap in without a re-download dance

Pricing model

One-time, free, or subscription: pick honestly

No surprise tier that re-routes you to a cloud

Offline behavior

Airplane mode is the test, not the marketing copy

Works at altitude, no “sign in to continue”

The two tests that catch most pretenders are the first and the last. A lot of apps marketed as “private AI” quietly call a cloud endpoint for the actually-hard work, then ship a local model for the easy bits to keep the marketing copy honest. Lulu, Little Snitch, or macOS’s own Privacy & Security pane in System Settings will tell you the truth in thirty seconds. Anything that asks you to sign in before you can prompt has flunked the airplane test before the engines start.

Chat: LM Studio if you tinker, Ollama plus Enchanted if you ship

The chat category had the loudest year. The headline change is LM Studio dropping its commercial-use restriction in July 2025: it is now free for personal and at-work use, ships a polished Apple-Silicon-native build with MLX baked in, and exposes an OpenAI-compatible local server on port 1234 so every other tool on your system can talk to it. RAM footprint is whatever the model demands: a 7B Q4 idles around 5 GB, a Qwen3 30B-A3B MoE wants 24 GB loaded. On a 16 GB Mac it is still useful; on 32 GB it is excellent; on 64 GB+ it is the only chat app a power user needs.

The honourable runner-up is Ollama’s native macOS app from July 2025, paired with the open-source Enchanted SwiftUI client on top of it. Ollama added MLX support in March 2026 and now claims a 1.6× prompt-processing speedup and roughly 2× generation speed on Apple Silicon, with the biggest wins on M5 thanks to the new GPU Neural Accelerators. Pick this stack if you prefer the Mac App Store aesthetic, or if you want a CLI for scripting. Pick LM Studio if you want one window that handles model discovery, chat, and a server. For most readers, that is the right pick.

If you genuinely want the Apple-everything answer, the Foundation Models framework shipped at WWDC 2025 and now lets any third-party app call Apple’s on-device ~3B foundation model in three lines of Swift. It is free, always offline, and integrated with Writing Tools across the system. Quality and context length both cap below what a 7B open-weight gives you. It is the right answer for “summarize this paragraph” in a native app. It is not the right answer for a daily-driver chat.

Skip: Msty is fine, but every feature it has lives in one of the two picks above for free. The lifetime tier is real but the gap with LM Studio narrowed to nothing in the last six months.

Image: Draw Things, then a long way down

Image generation is the category with the clearest winner on the Mac. Draw Things is free, in the Mac App Store, written in Swift on a custom inference engine, and uses Metal FlashAttention on the GPU plus Core ML on the Neural Engine. It runs SDXL, FLUX.1, Wan 2.2 video, ControlNet, LoRA, and trains a LoRA on your own photos without leaving the app. The developer ships an update every couple of weeks. There is no subscription. It is, by a wide margin, the first app to install on a new Mac if you make images.

Two runners-up worth knowing. Mochi Diffusion is the Core-ML-first option: smaller memory footprint, slower first run, faster subsequent runs once the model is compiled to .mlmodelc. Use it if you only run Stable Diffusion variants and want the leanest setup. InvokeAI is the cross-platform power tool with the deepest ControlNet workflows; Mac support is real but the BitsAndBytes-quantized FLUX path breaks on Apple Silicon as of May 2026, which makes it the second choice for anyone whose Mac is their only machine.

For RAM planning: SDXL is comfortable on 16 GB. FLUX is comfortable on 24 GB+, possible on 16 GB with Q4 quantization and patience. Real-time interactive video generation on a local Mac is not yet a consumer category. The closest is Wan 2.2 inside Draw Things, and it is closer to a render than a generation.

A Mac mini on a desk next to a Magic Keyboard and a small plant. — Unified memory is the only spec that matters for local AI. Photo by BoliviaInteligente on Unsplash.

Transcription: MacWhisper is the floor

Transcription is the category where on-device beats every cloud product on every axis except marketing budget. The Whisper family of open-weight models is small, fast on Apple Silicon, and accurate enough for legal depositions, medical dictation, and journalism. The polished apps are cheap, the open-source apps are competent, and there is no reason to send audio to anyone’s server unless you genuinely need a platform-grade speaker-diarization pipeline.

Pick: MacWhisper Pro at €59 lifetime on Gumroad (the App Store edition, Whisper Transcription, runs $6.99/month, $29.99/year, or $99.99 lifetime, same engine, sandboxed). Built on whisper.cpp with the Apple Neural Engine path, runs Whisper large-v3 (~3 GB) comfortably on any 16 GB Mac, and finishes a one-hour interview in a few minutes on M4 or later.

Runners-up: Aiko by Sindre Sorhus, a clean App Store purchase that handles large-v3 on macOS and smaller models on iOS; and VoiceInk, an open-source GPL system-wide dictation tool with app-specific config: free if you build from source, $25 for the Solo plan. Underneath the consumer apps, WhisperKit is the Swift package half of them ship on top of.

Skip: Wispr Flow. The product is polished and the dictation quality is excellent, but there is no offline mode: audio and screen context both leave your machine. If you wanted on-device transcription, that is not it.

Over-ear headphones resting next to a laptop and a notebook. — Transcription is the local category with the cleanest answer. Photo by Dan Farrell on Unsplash.

Coding with local models: Continue.dev or Zed

Coding has two viable on-device paths in 2026, both free, both real. The first is Continue.dev as a VS Code extension pointed at LM Studio or Ollama: the official Ollama guide walks through a five-minute setup. The second is Zed’s native Ollama provider: point it at http://localhost:11434, pick a model with the supports_tools capability, and the built-in agent runs against your local stack with no cloud round-trip.

The model picks for May 2026, in rough order: Qwen3-Coder for general completion, DeepSeek-Coder-V2 for explanation-heavy tasks, Codestral 22B if your laptop has the headroom, and Phi-4 14B for autocomplete on 16 GB machines that need the latency. Aider is the CLI option for the same workflow with a different ergonomic; its Ollama documentation flags the default 2K context window as a footgun and tells you to set num_ctx: 32768 in .aider.model.settings.yml on the way in. That is the single most useful five lines of YAML in this post.

A warning that catches people. Cursor advertises support for local models. It is true on a technicality: the local model is the inference backend, but Cursor cannot connect to localhost directly and requires a public HTTPS endpoint, with Cursor in the request path. If you wanted end-to-end on-device, Cursor is not it. Continue.dev and Zed both are.

Notes and RAG: Reor, Obsidian, AnythingLLM

This is the category where the local-first answer beats the cloud answer on outcomes, not just on principle. Your notes are the highest- stakes RAG corpus you own: client work, salary numbers, medical history, half-written essays. The serious picks are all local.

Reor is the closest thing to a turn-key local notes app with RAG built in: llama.cpp inside, Transformers.js for embeddings, LanceDB as the vector store, vault stored as plain Markdown so you keep the files if you ever leave. AGPL, free. Best fit for “I want to write notes and ask them questions, and I want all of that on my disk.”

If you already live in Obsidian, the two plugins to install are Smart Connections for semantic search (the embeddings run locally, no API key required) and Copilot for Obsidian for a chat panel that talks to your local LM Studio or Ollama. The combination is the best second-brain workflow on the Mac and stays offline end-to-end.

For document chat with PDFs, spreadsheets, and the long tail of knowledge-worker files, the pick is AnythingLLM Desktop with a local model backend. Free, MCP-capable, workspace-based, and ships with thirty-plus provider plug-ins so you can fall back to a cloud model on the rare doc that exceeds your local context. The runner-up is GPT4All, whose LocalDocs feature is the simpler entry point if you want one binary, one button, and an answer.

Categories without a great local answer yet

Naming the gaps is how a buyer’s guide earns the picks. Four categories are not yet ready on a local Mac, and a list that pretended otherwise would burn the rest of the recommendations.

Voice and TTS. There is no MacWhisper-equivalent for text-to-speech. The serious work happens in libraries: mlx-audio runs Kokoro (an 82M-parameter Apache-licensed TTS model) on Apple Silicon, and Piper TTS is the ONNX option. Both produce excellent audio if you are willing to script. Neither has a polished consumer Mac app. Apple’s own AVSpeechSynthesizer is the system fallback, and Personal Voice clones a user-supplied voice locally on macOS, but that’s an OS feature, not a third-party app.

Real-time video generation. Wan 2.2 inside Draw Things is the best you can do on a Mac today, and it is closer to an overnight render than a Sora-style interactive experience. The consumer-grade local video category is two hardware generations away.

Browser-driving agents. The polished agents (Claude with computer use, OpenAI’s desktop agent, Perplexity) all run inference in the cloud. The local options are developer libraries: Agent Browser, browser-use, LaVague, each of which will accept an Ollama backend. The category will arrive; it is not here.

Frontier reasoning. A 128 GB M5 Max can run Llama-4 70B Q4 or Qwen3-class MoE models at acceptable speed, and they are excellent. They are not GPT-5 or Claude Opus on the hardest reasoning benchmarks. If your job depends on the absolute ceiling of AI reasoning, keep a cloud subscription for that one task and run everything else local.

How to choose, in four questions

Four questions · a quick decision tree

The question

If yes

If no

Do you want one app for everything, or one app per job?

LM Studio + Draw Things + MacWhisper covers 90% of the day

Pick the dedicated tool in each category

Do you ever ship work that depends on the answer?

Pay the App Store lifetime: MacWhisper, Aiko, or Draw Things

Free first: LM Studio, Ollama, Reor

Is your machine 16 GB or 32 GB+?

32 GB unlocks the 30B-class MoE tier and FLUX comfort

Stay on 7B-to-14B Q4 and SDXL: perfectly capable

Do you need IT-approved, App-Store-sandboxed apps?

Draw Things, MacWhisper, Aiko: all in the App Store

Notarized direct download unlocks LM Studio, Jan, Ollama

If you wanted one answer and only one answer, the default Mac stack in May 2026 is: LM Studio for chat, Draw Things for images, MacWhisper for transcription, Continue.dev for coding, Reor or Obsidian with Smart Connections for notes. Four are free. One is €59 once. Total cost to assemble: the price of a single year of a frontier cloud subscription, paid once, with everything running on your laptop. That is the news.

One disclosure, since it’s our own tool: CSuite is a native app for all three desktops, the Mac included, built on this same idea. It runs open models on your M-series machine and lets you bring your own keys for the cloud frontier, with every file staying on your disk. It ships a curated catalog rather than the whole open field above, so treat it as the polished front door to this workflow, not a replacement for the raw stack. More in the CSuite intro.

The picks above will move. We will refresh this list quarterly: new models, new pricing, new MLX adoption. The frame underneath the picks is the part to keep: name the rubric, run the airplane test, prefer one-time purchases over subscriptions, choose the app that treats your unified memory like a feature rather than a workaround. Serious software has always been local on the Mac. AI is the last category to remember that.

The best on-device AI apps for Mac (2026)

The rubric, before the picks

Chat: LM Studio if you tinker, Ollama plus Enchanted if you ship

Image: Draw Things, then a long way down

Transcription: MacWhisper is the floor

Coding with local models: Continue.dev or Zed

Notes and RAG: Reor, Obsidian, AnythingLLM

Categories without a great local answer yet

How to choose, in four questions

Sora vs Veo vs Kling in 2026: one shutdown, one successor, one survivor

ByteDance models with real examples: Seedream and Seedance

Most AI apps are wrappers, and you're paying the markup

One-time payment. Yours forever.