Run DeepSeek locally: the reasoning model in a laptop-sized package
It wiped $600B off Nvidia in a day and you still can't run it. But DeepSeek bottled that reasoning into a 9 GB download — here's which one your laptop handles.
- Full DeepSeek-R1: 404 GB download, needs a multi-GPU server, not happening on your machine
- DeepSeek-R1 14B distill: 9 GB download, runs on a 16 GB laptop, one command, fully offline
In January 2025 a Chinese lab most people had never heard of released a free AI model and wiped roughly $600 billion off Nvidia’s market value in a single day — the largest one-day loss for any company in U.S. history. The model was DeepSeek-R1. It reasoned like the best paid models, it was open, and it apparently cost a rounding error to train. Wall Street panicked.
Here is the part the headlines skipped: you cannot run that model. The full R1 is a 671-billion-parameter giant that needs a rack of GPUs. But DeepSeek did something quietly generous alongside it: it distilled R1’s reasoning into a set of small models that run on an ordinary laptop, all under the most permissive license in open-weight AI. This post is the map: what DeepSeek actually ships, which piece your hardware can handle, and how to run it tonight.
The model that crashed the market is a 404 GB download
DeepSeek isn’t a startup in the usual sense. It grew out of High-Flyer, a Hangzhou hedge fund that had stockpiled GPUs for quantitative trading, and spun the model work into a sibling lab in 2023. When R1 landed on January 20, 2025, two claims traveled together: it matched OpenAI’s reasoning models, and it had cost a fraction of what U.S. labs spend. DeepSeek later put a number on the reasoning-training run, about $294,000, in a peer-reviewed Nature paper. Both claims rattled an industry that assumed frontier AI required billions.
Treat the cost story with care. That $294,000 covers only the final reinforcement-learning step — 512 Nvidia H800 chips for under 280 hours. It excludes the underlying V3 base model, which The Register pegs at roughly $5.58 million, bringing the honest total near $5.87 million. And the analyst firm SemiAnalysis argued the lab’s all-in hardware spend ran to $1.6 billion across 50,000 GPUs. The truth sits somewhere between “dirt cheap” and “normal but efficient.” Either way, the model was real, and it was free to download.
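The arithmetic behind the honest total is easy to check. A minimal sketch that uses only the figures quoted above; the constant names are illustrative, and the GPU-hour number is a ceiling rather than an exact count, because the run took under 280 hours.

```python
# Figures as quoted in the paragraph above (USD).
RL_STEP_USD = 294_000      # final reinforcement-learning run
V3_BASE_USD = 5_580_000    # V3 base model, per The Register

total_usd = RL_STEP_USD + V3_BASE_USD

# 512 H800 chips for *under* 280 hours, so this is an upper bound.
max_gpu_hours = 512 * 280

print(f"Training total: ${total_usd:,}")          # Training total: $5,874,000
print(f"H800-hours, at most: {max_gpu_hours:,}")  # H800-hours, at most: 143,360
```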
Which brings us to the catch. R1 is a mixture-of-experts model: 671 billion total parameters, of which about 37 billion fire on any given token. Efficient for its class — and still a 404 GB download that wants server-grade hardware. The model that made the news is not the model you install.
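Why so big if only 37 billion parameters fire per token? Every expert still has to sit in memory, because the router can pick any of them for the next token. A back-of-the-envelope sketch using the figures above, assuming the 404 GB counts decimal gigabytes and ignoring file metadata: the download works out to under 5 bits per weight, so it is already heavily compressed compared with a plain 16-bit file.

```python
TOTAL_PARAMS = 671e9    # every expert; all of them must be loaded
ACTIVE_PARAMS = 37e9    # parameters that fire on any single token
DOWNLOAD_BYTES = 404e9  # 404 GB, assuming decimal gigabytes

active_share = ACTIVE_PARAMS / TOTAL_PARAMS
bits_per_param = DOWNLOAD_BYTES * 8 / TOTAL_PARAMS
fp16_gb = TOTAL_PARAMS * 2 / 1e9  # size if every weight were stored in 16 bits

print(f"Active per token: {active_share:.1%}")           # Active per token: 5.5%
print(f"Implied bits per weight: {bits_per_param:.1f}")  # Implied bits per weight: 4.8
print(f"16-bit size: {fp16_gb:,.0f} GB")                 # 16-bit size: 1,342 GB
```

The takeaway: a mixture-of-experts model saves compute per token, not memory, which is why the active-parameter count tells you little about what your machine must hold.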

DeepSeek bottled its reasoning into models you can run
The real gift to anyone with a laptop is distillation. The idea is simple: let the giant R1 work through 800,000 hard problems, record every step of its reasoning, then train a much smaller model to imitate that reasoning. The small student never matches the teacher outright, but it inherits a surprising amount of the thinking style — the habit of working a problem step by step instead of blurting an answer.
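The data side of that recipe can be sketched in a few lines. This is a toy illustration, not DeepSeek’s pipeline: a hypothetical toy_teacher stands in for R1 and multiplies two numbers step by step, and a hypothetical build_distillation_set records each worked solution, reasoning included, as a prompt/target pair. A real run would then fine-tune the student model on those pairs, which is omitted here. The think-tag wrapping mirrors the format the distills emit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    prompt: str
    target: str  # the full text the student learns to reproduce


def toy_teacher(a: int, b: int) -> tuple[list[str], int]:
    """Hypothetical stand-in for R1: multiplies a by b, showing each step."""
    tens, ones = divmod(b, 10)
    part_tens = a * tens * 10
    part_ones = a * ones
    total = part_tens + part_ones
    steps = [
        f"Split {b} into {tens * 10} + {ones}.",
        f"{a} x {tens * 10} = {part_tens}",
        f"{a} x {ones} = {part_ones}",
        f"{part_tens} + {part_ones} = {total}",
    ]
    return steps, total


def build_distillation_set(problems: list[tuple[int, int]]) -> list[Example]:
    """Record the teacher's reasoning, not just its answers, as training targets."""
    dataset = []
    for a, b in problems:
        steps, answer = toy_teacher(a, b)
        reasoning = "\n".join(steps)
        dataset.append(
            Example(
                prompt=f"What is {a} x {b}?",
                target=f"<think>\n{reasoning}\n</think>\n{answer}",
            )
        )
    return dataset


for ex in build_distillation_set([(12, 34)]):
    print(ex.prompt)
    print(ex.target)
```

The design point is in the target field: the student is trained on the whole worked solution, so it learns the habit of reasoning step by step, not just the final number.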
DeepSeek released six of these distilled reasoners, built on top of existing open models from two other families. Four are based on Alibaba’s Qwen (the 1.5B, 7B, 14B, and 32B) and two on Meta’s Llama (the 8B and 70B). The lab kept iterating, too: the mid-2025 R1-0528 refresh shipped an 8B distill built on the newer Qwen3, now the default you get from ollama run deepseek-r1.
These are not toys. On DeepSeek’s own model card, the distilled 32B beats OpenAI’s o1-mini on math and coding benchmarks — a model that fits a 32 GB machine outscoring a frontier product on the tasks reasoning models exist for. Benchmarks aren’t gospel, so test it on your own work before you trust it — but the gap with the paid tier is far smaller than the price gap suggests.
The R-series thinks out loud; the V-series just answers
DeepSeek runs two main model lines, and the difference matters when you pick one. The V-series — V3 and its successors — is the general-purpose workhorse: fast chat, coding, agent tasks. The R-series is the reasoning specialist. Ask R1 a hard question and it produces a visible chain of thought, a running monologue where it tries approaches, catches mistakes, and corrects course before committing to an answer.
That visible reasoning was R1’s headline novelty. OpenAI’s o1 also reasons at length, but it keeps its raw chain of thought hidden behind a paywall; R1 was the first widely available open model to show its work. It comes at a cost: the model spends a lot of tokens thinking before it answers, so replies are slower and, on a hosted API, pricier per question. For a factual lookup that’s waste. For a genuinely hard math, logic, or planning problem, that deliberation is the whole point.
The distills inherit the R-series temperament. They think out loud too, wrapping their reasoning in <think> tags before the final answer. Match that to the job: reach for a distill when the problem is hard and you can wait a few seconds; reach for a plain chat model like Qwen or Llama when you just need a quick reply.
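If you call a distill from your own code, you will usually want to separate the monologue from the answer. A minimal sketch, assuming the reply arrives as raw text with the opening and closing think tags intact (some front ends strip the tags or return the reasoning in a separate field); split_reasoning is a hypothetical helper name.

```python
from __future__ import annotations

import re

# DOTALL lets the match span newlines; the non-greedy group stops at the first closing tag.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(reply: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a reply that may contain a think block."""
    match = THINK_BLOCK.search(reply)
    if match is None:
        return "", reply.strip()  # no visible reasoning: treat it all as the answer
    reasoning = match.group(1).strip()
    answer = (reply[: match.start()] + reply[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_reasoning(
    "<think>\n17 is not divisible by 2 or 3, so it is prime.\n</think>\n\nYes, 17 is prime."
)
print(answer)  # Yes, 17 is prime.
```

Logging the reasoning separately is handy: you can show users only the answer while keeping the chain of thought for debugging.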

MIT is as open as a model license gets
License is where the open-model world hides its fine print, and here DeepSeek has the cleanest story of any major family. Both the weights and the code ship under the MIT License — the same license that covers countless everyday software libraries. You can run it commercially, fine-tune it, serve it to paying customers, and redistribute it, with no royalty and no revenue share.
Two clauses stand out. First, there is no user ceiling, unlike Llama’s Community License, which adds a 700-million-user limit and a “Built with Llama” attribution rule, and unlike the usage-policy strings attached to Gemma. Second, the license explicitly permits distillation, meaning using R1’s outputs to train other models. That permission is exactly why the six small reasoners can exist and why you can build on them freely. One wrinkle: each distill is also derived from its base model, so the Qwen-based ones carry Qwen’s Apache 2.0 terms and the Llama-based 8B and 70B still carry Llama’s license. For a regulated team or a commercial product, MIT removes the lawyer’s veto that other “open” models invite.
Match the distill to your RAM, then run one command
The practical question is which distill fits your machine. The download size is a fair proxy for the memory you’ll need, and the rule from the broader local-models guide holds: pick the largest one that fits, then leave headroom for context. Reasoning models are especially hungry — all that thinking is tokens, and tokens are memory.
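That rule fits in a few lines of code. A minimal sketch: the sizes are approximate download sizes for the deepseek-r1 tags in the Ollama library at the time of writing (assumptions, so check the current listing), and the fixed 6 GB reserve for the operating system, other apps, and context is a hypothetical rule of thumb, not an Ollama requirement. pick_distill is a hypothetical helper name.

```python
from __future__ import annotations

# Approximate download sizes in GB (assumed; verify against the Ollama library).
DISTILL_SIZES_GB = {
    "deepseek-r1:1.5b": 1.1,
    "deepseek-r1:7b": 4.7,
    "deepseek-r1:8b": 5.2,
    "deepseek-r1:14b": 9.0,
    "deepseek-r1:32b": 20.0,
    "deepseek-r1:70b": 43.0,
}


def pick_distill(ram_gb: float, reserve_gb: float = 6.0) -> str | None:
    """Return the largest tag whose download fits in RAM minus a headroom reserve."""
    budget = ram_gb - reserve_gb  # hypothetical headroom for OS, apps, and context
    fitting = [(size, tag) for tag, size in DISTILL_SIZES_GB.items() if size <= budget]
    if not fitting:
        return None  # nothing fits: use a hosted API or a bigger machine
    return max(fitting)[1]  # largest size wins


for ram in (8, 16, 32, 64):
    print(ram, "GB ->", pick_distill(ram))
```

With these assumptions a 16 GB machine lands on deepseek-r1:14b and a 32 GB machine on deepseek-r1:32b. Raise reserve_gb if you plan to feed the model long documents, since a bigger context needs more memory.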
Install Ollama, run one line (ollama run deepseek-r1:14b), and you have a private reasoner that works on a plane. On a typical 16 GB laptop, deepseek-r1:14b (9 GB) is the sweet spot: strong enough for real math and logic, small enough to leave room for your other apps. With 32 GB, step up to deepseek-r1:32b, the one that out-benchmarks o1-mini. The 1.5B and 7B exist for weak hardware, but their reasoning is noticeably shakier; treat them as a last resort, not a first choice.
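The same local model is also reachable from code. Ollama serves an HTTP API on localhost port 11434 by default, and its /api/generate endpoint accepts a model name and a prompt. A minimal standard-library sketch, assuming Ollama is running and deepseek-r1:14b is already pulled; build_request and ask are hypothetical helper names, and depending on your Ollama version the reasoning may appear inside the response text or be split out separately.

```python
from __future__ import annotations

import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "deepseek-r1:14b") -> urllib.request.Request:
    """Build a non-streaming generate request aimed at the local server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "deepseek-r1:14b") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling ask("Is 221 prime?") returns the model’s reply as a string. With "stream" set to False, Ollama sends the whole reply as one JSON object instead of a stream of chunks, which keeps the client simple at the cost of waiting for the full answer.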
Prefer not to run anything locally? DeepSeek’s hosted API is dirt cheap too. Its current flagship bills around $0.14 per million input tokens and $0.28 per million output, with a ~90% discount on cached input, a fraction of what the big U.S. labs charge. But the whole appeal of the distills is that you don’t need the API at all.
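Because a reasoning model’s thinking is billed as output, it is worth estimating per-question cost before choosing the API. A minimal sketch using the prices quoted above (check DeepSeek’s current price list), assuming the cache discount applies only to input tokens; question_cost and the token counts in the example are hypothetical.

```python
# Prices in USD per million tokens, as quoted in this article.
INPUT_PER_M = 0.14
OUTPUT_PER_M = 0.28
CACHE_DISCOUNT = 0.90  # ~90% off input tokens that hit the cache


def question_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one API call; thinking tokens count as output."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh = input_tokens - cached_input_tokens
    cost = (
        fresh * INPUT_PER_M
        + cached_input_tokens * INPUT_PER_M * (1 - CACHE_DISCOUNT)
        + output_tokens * OUTPUT_PER_M
    )
    return cost / 1_000_000


# Hypothetical question: 2,000-token prompt, 6,000 tokens of thinking plus answer.
print(f"${question_cost(2_000, 6_000):.5f}")  # $0.00196
```

Thinking tokens dominate the bill here: the 6,000 output tokens cost six times as much as the 2,000-token prompt.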

ollama run, and a frontier-grade reasoner answers entirely on your machine: no key, no meter, no network. Photo by Mohammad Rahmani on Unsplash.
Where DeepSeek loses
Three honest caveats. First, a distill is not the full R1. It inherits the reasoning style, not the breadth of knowledge: ask the 14B an obscure factual question and it’ll stumble where the 671B would not. The distills are reasoning specialists, narrow and deep, not all-purpose assistants.
Second, DeepSeek is a one-trick family compared with its rivals. Apart from research releases such as the DeepSeek-VL2 vision-language models, it offers little to run locally beyond text: no audio model and no embedding model, areas where Qwen ships a whole toolbox. DeepSeek does one thing, reasoning, better than almost anyone, and leaves the rest to others.
Third, the family moves fast and the lines blur. The V-series has absorbed a “thinking mode,” a long-rumored R2 hasn’t shipped as of mid-2026, and the newest V4 models target data centers, not laptops. Any specific size or score here is a snapshot — check the current Ollama library before you build something load-bearing on it.
So which one should you install?
If you want a private reasoner on the machine you already own, the answer is short. On a 16 GB laptop, run deepseek-r1:14b. On a 32 GB machine, run deepseek-r1:32b and enjoy beating a paid frontier model on math. Keep a plain chat model — Qwen or Llama — around for quick, non-reasoning tasks, and let DeepSeek handle the hard ones.
The larger point is the one the market missed in its panic. The headline-grabbing 671B model was never the gift to ordinary users — the distills were. Under an MIT license, DeepSeek handed anyone with a laptop a reasoning engine that rivals what others charge for, with no key to revoke and no subscription to cancel. The model that crashed the market is out of reach. Its brain, in a 9 GB package, is one command away.


