Run Qwen locally: one open family for chat, code, vision, and audio
Most labs hand you one good model. Alibaba's Qwen hands you a toolbox — chat, code, vision, audio, search — and almost all of it is a free download.
Pick almost any other open-weight AI lab and you get one thing done well. Meta gives you a strong chat model. DeepSeek gives you a reasoner. Mistral gives you an efficient European workhorse. Alibaba’s Qwen gives you a chat model, a coding model, a vision model, an audio model, and an embedding model — all under one name, one prompt format, and one mostly-permissive license. It is less a model than a toolbox.
That breadth is the case for learning Qwen before any other open family: cover five jobs without mixing five vendors. The catch is that “Qwen” names dozens of models, and typing it into your runtime returns a wall of them. This post is the map — which Qwen variant does which job, how to run each on the machine you already own, the one license line to watch, and the two places the family honestly loses.
Qwen quietly became the most complete open family
Qwen started in 2023 as Alibaba Cloud’s in-house model and spent two years climbing. By 2026 it had become, by several measures, the center of gravity in open AI. The consumer Qwen app passed 234 million users by May 2026, and the open weights spawned over 200,000 community variants on Hugging Face — the largest derivative ecosystem of any non-Llama family.
Numbers aside, the reason to care is coverage. When Alibaba released Qwen3, the base family alone spanned six dense sizes — 0.6B, 1.7B, 4B, 8B, 14B, 32B — plus two mixture-of-experts models, and it spoke 119 languages and dialects. Then the specialists arrived: a coder, a vision model, an omni model that hears and speaks, and an embedding model for search. No other open lab ships that full a set under one roof. If you only have room in your head for one open family, this is the one that pays back the most.

One family covers five jobs other labs split up
Here is the whole point of Qwen in a single table. Each row is a different job; each variant is a member of the same family, so they share a prompt style and behave consistently. Cover these five with any other vendor and you’re gluing together four separate model lines.
Start with chat. The base Qwen3 models are the everyday workhorse — and they ship a clever trick: a hybrid “thinking” switch. Add /think to a prompt and the model reasons step by step before answering; add /no_think and it replies fast. One model, two speeds, no model-swap. The flagship 235B-A22B trades blows with DeepSeek-R1 and Gemini 2.5 Pro on Alibaba’s own benchmark tables — though see the case for writing your own eval before trusting any vendor’s scoreboard.
Code is where Qwen made its loudest claim. Qwen3-Coder was trained on 7.5 trillion tokens, 70% of them code, and its top 480B-A35B build set state-of-the-art results among open models on agentic coding — Alibaba puts it level with Claude Sonnet 4 on tool-use benchmarks. It handles 256K tokens of context natively, enough to hold a real repository in view. There’s also a 30B version that fits a laptop, which matters more for most readers than the giant. See the AI coding tool map for where a local coder sits next to Cursor and Claude Code.
Most of it is Apache 2.0 — and here’s the catch
License is where the open-model world hides its asterisks, so be precise. The open-weight Qwen models — the entire Qwen3 base family, plus Coder, VL, Omni, and Embedding — ship under Apache 2.0. That is the real thing: no 700-million-user ceiling like Llama’s Community License, no “Built with” attribution requirement, no acceptable-use contract bolted on. Download it, fine-tune it, ship it in a paid product, redistribute it — for free.
- Qwen3 dense (0.6B–32B) and MoE (30B / 235B)
- Qwen3-Coder, Qwen3-VL, Qwen3-Omni, Qwen3-Embedding
- Use commercially, fine-tune, self-host, redistribute
- No user cap, no attribution clause, no usage policy
- Qwen-Max / Qwen3-Max / the “-Plus” builds
- Hosted API only — weights are not released
- You rent it; you can’t download or self-host it
- Not what this post is about — skip it for local work
The catch is a tier you can’t run at all. Alibaba keeps its very top models — the Qwen-Max and “-Plus” flagships — as a hosted API, with the weights never released. They’re proprietary, and renting them is the opposite of the independence this post is about. The rule of thumb: if a Qwen model has a parameter count and a Hugging Face page, it’s yours to keep; if it’s only reachable through a hosted endpoint, it isn’t. For local work you simply ignore the Max tier — the open lineup already covers every job in that table. It’s the cleanest license story of any major family, with one clearly fenced exception.

Match the variant — and its size — to your machine
The wall of models is intimidating until you sort it by memory. Almost everything here installs with one Ollama command, and the download size is a fair proxy for the RAM you’ll need. The pattern is the same as the broader local-models field guide: pick the biggest model that fits, leave headroom for context.
On a normal 8–16 GB laptop, qwen3:8b is the daily driver — it answers offline forever after one download. Step up to 24–32 GB of memory and the qwen3:30b mixture-of-experts model is the standout: it carries 30B of weights but activates only about 3B per token, so it reasons like a big model at the speed of a small one. Pair it with qwen3-vl:8b for images and you have a private, multimodal assistant on a single Mac.
Two variants break the one-liner pattern, and it’s worth being honest about both. Qwen3-Coder’s 480B build is a 290 GB download — genuinely server hardware; reach for the 30B version on a laptop. And Qwen3-Omni, the audio model, isn’t a clean Ollama pull yet — you run it through Hugging Face Transformers or vLLM in Python. That’s the one rough edge in an otherwise frictionless family.

On vision: Qwen3-VL runs from a 1.9 GB 2B model up to a 143 GB giant, all with a 256K context window. It reads screenshots, converts design mockups to HTML, does OCR in 32 languages, and can follow video up to two hours long. On audio, Qwen3-Omni transcribes and converses across 19 spoken-input languages and replies in real-time speech, with quality Alibaba benchmarks against Gemini 2.5 Pro. On search, Qwen3-Embedding topped the MTEB multilingual leaderboard at launch and comes in a 639 MB 0.6B size — small enough to run alongside your main model for retrieval.
Where Qwen actually loses
Enthusiasm needs a counterweight, so here are the two honest gaps. The first is ecosystem depth. Llama is still the model every tutorial assumes, every fine-tuning script defaults to, and every cloud wires up first. When a deployment breaks at 2 a.m., the Llama answer is already on a forum; the Qwen one might not be. Qwen’s community is huge and growing, but Llama’s gravity — the reason it remains the safe default — is real, and Qwen hasn’t fully matched it.
The second is the reasoning crown. For the hardest chains of math and logic, DeepSeek’s R-series and its distilled small reasoners are often the sharper open pick, and they ship under an even cleaner MIT license. Qwen3’s thinking mode is strong and far more convenient — one switch, no second model — but “strong and convenient” isn’t “best at the frontier of reasoning.” If a single hard task is your whole job, benchmark Qwen against a dedicated reasoner before committing.
One more thing to set expectations: the family moves fast. Qwen3 gave way to a 3.5 line in early 2026 and a 3.6 line that spring, each still Apache 2.0. That cadence is mostly a gift — your ollama run qwen3 habit keeps working as the weights underneath improve — but it means any specific size or score in this post is a snapshot. Check the current roster before you build something load-bearing on it.
So should you standardize on Qwen?
For most people building with open models in 2026, yes — make Qwen the family you learn first. The logic is simple: one name covers chat, code, vision, audio, and search; almost all of it is genuinely Apache-licensed; and four of those five jobs are a single install command. That combination of breadth and permissiveness is unmatched, and it spares you the tax of learning four vendors’ quirks.
Concretely: put qwen3:8b on your laptop today, step up to the qwen3:30b MoE if you have the memory, add qwen3-vl when you need to read images and qwen3-embedding when you build search. Keep DeepSeek bookmarked for the hardest reasoning and Llama in mind for when ecosystem depth matters more than raw quality. Then notice what you’ve actually done: assembled a private, offline, multi-skill AI stack you own outright — the kind of independence the “AI products are mortal” argument keeps insisting you’ll be glad to have. Qwen just makes it the easy choice instead of the principled one.


