The best on-device AI apps for Windows (2026)
Your PC hides an idle NPU or a gaming GPU. Which one you have decides your best local AI app. The picks, split by silicon.
Open Task Manager on a laptop bought in the last year, click the Performance tab, and look for a row that did not exist three years ago: NPU. On a gaming tower, the interesting hardware is the graphics card, with 8 to 24 GB of memory that spends most of its life idle between frames. Both chips can run capable AI models without touching the internet. Almost nobody uses them for it.
That is the gap this post is aimed at. There is a good, growing set of Windows apps that run AI on your own machine: private by default, free or close to it, and available on a plane or in a hospital with the Wi-Fi off. The catch is that Windows is not one machine. Its local-AI story splits three ways, and the app that is perfect on one setup is useless on another.
This is the Windows companion to the Mac list. Same job-by-job format, but tuned to a hardware reality the Mac simply does not have: a thin Copilot+ laptop with a neural chip, an NVIDIA tower where CUDA rules, a pile of AMD and Intel and CPU-only machines in between, and a quiet x86-versus-ARM split that decides whether an app will even install. Every pick below names the hardware it assumes.

Your hardware picks the app before you do
On a Mac, the only spec that matters for local AI is how much unified memory you bought. On Windows, you have to answer a harder question first: what is doing the math? There are four possible answers, and they rank cleanly by how much of the ecosystem targets them.
The NVIDIA RTX GPUis the throne. CUDA has been the default target for AI code for a decade, so nearly every local tool runs best, or only, on NVIDIA. If your machine has an RTX card with 8 GB+ of VRAM, you have the fastest local-AI setup Windows offers, and the widest choice of apps.
The NPUis the newcomer. Microsoft’s Copilot+ PC badge requires a neural chip that hits 40 trillion operations per second, plus 16 GB of RAM. Three silicon lines clear that bar: Qualcomm’s Snapdragon X Elite at 45 TOPS, Intel’s Core Ultra 200V at 48, and AMD’s Ryzen AI 300 with an XDNA 2 engine at 50 TOPS. The NPU is superb at sipping power through the AI features baked into Windows. It is not yet where you run your own chat model: almost no third-party app targets it directly.
Then come AMD Radeon GPUs (compute through ROCm, Vulkan, or DirectML, depending on the card and the app) and, at the floor, CPU-only machines, where the llama.cpp engine inside every local app still gives you a usable 7B model. Here is how each chip actually gets used.
One more Windows-only wrinkle: the x86-versus-ARM64 split. A Snapdragon Copilot+ laptop runs a different instruction set than a normal PC. Windows papers over it with an emulator called Prism, which is genuinely good now and even exposes modern instructions like AVX2 to emulated apps. But emulation costs battery and speed, and it cannot load the kernel-mode drivers some GPU tools need. If you own a Snapdragon machine, an ARM64-native build always beats an emulated x64 one. Watch for it.
Chat: LM Studio runs on everything, and that wins
The everyday chat assistant is the category most people want first, and the pick is easy because one app covers every Windows machine. LM Studio ships x64 and ARM64 builds, runs on Snapdragon, Intel, AMD, and NVIDIA alike, and dropped its commercial-use restriction in July 2025, so it is free at work too. It handles model discovery, chat, and an OpenAI-compatible local server in one window. On an RTX card it uses the GPU; on other hardware it falls back to Vulkan or the CPU. A 7B model at 4-bit idles around 5 GB of memory; a 30B mixture-of-experts model wants closer to 24 GB. Start there.
The runner-up is Ollama’s native Windows app, which now runs as a real desktop program with NVIDIA and AMD Radeon support built in. Its edge is the local server on port 11434 that every other tool in this post can point at. Pair it with Jan or Open WebUI for a friendlier chat window, or with NVIDIA’s ChatRTX if you have an RTX card and want a TensorRT-accelerated demo to chat with your own documents.
The honest caveat is for Snapdragon owners. LM Studio installs and runs, but its GPU and NPU acceleration on those chips is still a work in progress: for now you are mostly running on the ARM CPU cores, which is fine for a small model and slow for a large one. The NPU that makes your laptop a Copilot+ PC is barely used by any chat app yet. That is the single biggest gap in the Windows local-AI story, and it is worth knowing before you buy a machine for this.
Image generation: your GPU brand decides the winner
This is the category where the answer changes most with your hardware. On NVIDIA, the serious tool is ComfyUI, a node-based canvas that runs SDXL, FLUX, and video models on CUDA with more control than anything else. If a graph is too much, the friendliest front end is Krita AI Diffusion, which drives a ComfyUI backend from inside a real painting app and wants a card with 6 GB+ of VRAM. Both are free.
On AMD, the standout is Amuse, a free .NET app built with AMD that runs Stable Diffusion through DirectML with no Python setup at all. It leans on your Radeon GPU (the top FLUX model wants a 24 GB card like the 7900 XTX) and is the least painful way for an AMD owner to make images locally. ComfyUI works on Radeon too, through DirectML or a ZLUDA shim, but expect setup and a speed penalty versus the same card’s CUDA-equipped rival.
Be honest about the gap: for the same dollar spent on a GPU, the CUDA path is faster and better supported than the DirectML one, and most new techniques land on NVIDIA first. AMD image generation is real and improving, not yet effortless. Real-time video generation on a local Windows box remains a render, not a live experience, on any brand.

Transcription: the one job solved on every PC
If your NPU makes you nervous and your GPU is modest, here is the category that just works. OpenAI’s Whisper models are small, open, and accurate enough for interviews, lectures, and legal audio, and they run happily on a plain CPU. There is no reason to upload a recording to anyone.
The pick is Buzz, a free open-source app that runs Whisper locally with three engines to choose from, live microphone capture, and export to TXT, SRT, or VTT. Nothing leaves your machine unless you export it. Vibe is the polished alternative: 90-plus languages, a clean interface, and recent Vulkan GPU support that makes even large models run near real-time on a modest laptop.
Copilot+ owners get a bonus that needs no install. Windows’s Live Captions translate audio from 40-plus languages into English captions in any app, on-device and even offline. It is one of the few Windows AI features that runs entirely local and genuinely earns the NPU.
Coding and notes: aim your tools at a local server
Two of the most valuable local-AI jobs share one trick: point a familiar app at the Ollama or LM Studio server already running on your PC, and no keystroke leaves the machine.
For coding, install Continue in VS Code for inline autocomplete and chat, or Cline for an agent that edits files and runs commands with your approval. Both detect your local models automatically; a 7B coder model such as Qwen2.5-Coder is the sweet spot on most hardware. The power move for Windows: run the server on the RTX desktop in the closet, then code from a thin laptop by pointing the extension at that box over your LAN. One GPU, every device.
For notes and documents, the private-first picks are AnythingLLM, which chats across PDFs, Word files, and whole codebases with a local-model backend, and GPT4All, whose LocalDocs feature is the simplest way to ask questions of a folder of files. Both build their search index on your disk. For a heavier corpus, these are the tools that keep your most sensitive material (client work, contracts, medical notes) off every server but your own.

What Windows ships (and where “on-device” bends)
Some of the best local AI on Windows is already installed. Microsoft put real on-device models into the operating system, and on a Copilot+ PC a few of them are excellent. A few others quietly reach for the cloud, and the marketing does not always make the difference obvious.
The genuinely local ones: Live Captions, covered above; the Click to Do screen actions that run on the NPU; and Windows Recall, the searchable timeline of your screen. Recall drew heavy fire at launch and Microsoft rebuilt it: it is now opt-in, its snapshot database is encrypted in a secure enclave, and it unlocks only with Windows Hello. It is local, but filtering of passwords and card numbers is still imperfect, so treat it as powerful and slightly leaky.
The one that bends the label: Cocreator in Paint, the text-to-image feature, is marketed as an on-device NPU experience, yet it requires a Microsoft account and an internet connection for content moderation in the cloud. Useful, but not the private, offline tool the framing implies. If you want a fully local image generator, the picks above still stand.
For the technically curious, Microsoft’s Windows AI Foundry (announced at Build 2025) is the plumbing underneath all of this: awinget-installable runtime called Foundry Local, an inbox small model named Phi Silica you can even fine-tune with LoRA, and Windows ML as the layer that steers a model to the NPU, GPU, or CPU without the app having to care. It is a developer story today, but it is why the built-in features work across four vendors’ chips.
Where it’s not ready, and how to choose
Three gaps keep this from being a solved problem. Frontier reasoning still belongs to the cloud; a local 30B model is excellent but not GPT-5. The NPU, for all its battery magic, is not yet a place you run your own chat model, so a Copilot+ laptop with no discrete GPU is the weakest machine here for heavy local AI. And Snapdragon owners face the thinnest app catalog, since some tools ship x64-only and lean on emulation. None of that is permanent, but all of it is true today.
With that said, the choice comes down to your silicon. Answer four questions and the picks fall out.
The short version: if you have an RTX card, ComfyUI and Ollama make your machine the best local-AI box in this guide. If you have a Copilot+ laptop, LM Studio for chat and Buzz for transcription cover the daily work, and the built-in features earn the NPU while the third-party world catches up. On AMD or a plain CPU, Amuse and LM Studio still get you there with a little more homework. Most of the list is free; the rest costs less than a single year of a cloud subscription.
The frame outlasts the picks, which will move as fast as the hardware: name the accelerator you actually have, prefer the app with a native build for it, and verify that anything sold as “private” stays offline before you trust it. If you want to go deeper on which weights to download, the roundup of open models worth running is the next stop, and the case for personal compute explains why this whole category is moving back onto machines you own.


