Local AI · Open Source · Tutorial · June 26, 2026 · 9 min read

Run DeepSeek locally: the reasoning model in a laptop-sized package

It wiped $600B off Nvidia in a day and you still can't run it. But DeepSeek bottled that reasoning into a 9 GB download — here's which one your laptop handles.

By Atul

Same brain. Two very different downloads.

MIT licensed

You can’t run the model that crashed the market. You can run its reasoning on a laptop.

What broke Nvidia’s stock: DeepSeek-R1 · 671B. A 404 GB download that needs a multi-GPU server. Not happening on your machine.

What fits on your machine: deepseek-r1:14b. A 9 GB download that runs on a 16 GB laptop. One command, fully offline.

DeepSeek distilled the reasoning of its headline model into small ones you can actually install. This post is the map of which to grab.

In January 2025 a Chinese lab most people had never heard of released a free AI model and wiped roughly $600 billion off Nvidia’s market value in a single day — the largest one-day loss for any company in U.S. history. The model was DeepSeek-R1. It reasoned like the best paid models, it was open, and it apparently cost a rounding error to train. Wall Street panicked.

Here is the part the headlines skipped: you cannot run that model. The full R1 is a 671-billion-parameter giant that needs a rack of GPUs. But DeepSeek did something quietly generous alongside it: it distilled R1’s reasoning into a set of small models that run on an ordinary laptop, all under the most permissive license in open AI. This post is the map: what DeepSeek actually ships, which piece your hardware can handle, and how to run it tonight.

The model that crashed the market is a 404 GB download

DeepSeek isn’t a startup in the usual sense. It grew out of High-Flyer, a Hangzhou hedge fund that had stockpiled GPUs for quantitative trading, and spun the model work into a sibling lab in 2023. When R1 landed on January 20, 2025, two claims traveled together: it matched OpenAI’s reasoning models, and DeepSeek said the reasoning-training run cost about $294,000 — a figure later detailed in a peer-reviewed Nature paper. Both claims rattled an industry that assumed frontier AI required billions.

Treat the cost story with care. That $294,000 covers only the final reinforcement-learning step — 512 Nvidia H800 chips for under 280 hours. It excludes the underlying V3 base model, which The Register pegs at roughly $5.58 million, bringing the honest total near $5.87 million. And the analyst firm SemiAnalysis argued the lab’s all-in hardware spend ran to $1.6 billion across 50,000 GPUs. The truth sits somewhere between “dirt cheap” and “normal but efficient.” Either way, the model was real, and it was free to download.
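As a rough check on those figures, the arithmetic fits in a few lines. This sketch uses only the numbers quoted above. Treating the run as a full 280 hours is an assumption (the text says "under 280"), so the implied hourly rate is a lower bound, not a reported price.

```python
# Back-of-envelope check of the training-cost figures quoted above.
# Assumption: the run is treated as a full 280 hours. The article says
# "under 280", so the true GPU-hour count is somewhat lower.

RL_COST_USD = 294_000          # reported reasoning-training cost
GPUS = 512                     # Nvidia H800 chips
MAX_HOURS = 280                # upper bound on wall-clock hours
V3_BASE_COST_USD = 5_580_000   # The Register's estimate for the V3 base model

gpu_hours_upper = GPUS * MAX_HOURS
implied_rate_lower = RL_COST_USD / gpu_hours_upper
combined_usd = RL_COST_USD + V3_BASE_COST_USD

print(f"GPU-hours (upper bound): {gpu_hours_upper:,}")
print(f"Implied rate (lower bound): ${implied_rate_lower:.2f} per GPU-hour")
print(f"RL step + V3 base: ${combined_usd / 1e6:.2f} million")
```

Neither figure touches the SemiAnalysis hardware estimate, which counts owning the GPUs rather than renting hours on them.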

Which brings us to the catch. R1 is a mixture-of-experts model: 671 billion total parameters, of which about 37 billion fire on any given token. Efficient for its class — and still a 404 GB download that wants server-grade hardware. The model that made the news is not the model you install.
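To see why "efficient" and "404 GB" can both be true, here is a small sketch of the arithmetic. It assumes 1 GB means 10**9 bytes and that the download is essentially all weights; both are simplifications made for illustration.

```python
# Sparse activation vs. storage size for the full R1, using the figures above.
# Assumptions: 1 GB = 10**9 bytes, and the 404 GB download is all weights.

TOTAL_PARAMS = 671e9     # total parameters
ACTIVE_PARAMS = 37e9     # parameters used per token
DOWNLOAD_BYTES = 404e9   # download size

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
bits_per_param = DOWNLOAD_BYTES * 8 / TOTAL_PARAMS

print(f"Active per token: {active_fraction:.1%} of the model")
print(f"Implied storage: {bits_per_param:.2f} bits per parameter")
```

Only about one parameter in eighteen does work on a given token, which cuts the compute per token. It does not cut storage: which experts fire changes from token to token, so all of the weights still have to be loaded.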

Rows of server racks in a data center. — The full 671B R1 lives here, not on your desk. The useful trick was shrinking its reasoning to fit elsewhere. Photo by Eric Stoynov on Unsplash.

DeepSeek bottled its reasoning into models you can run

The real gift to anyone with a laptop is distillation. The idea is simple: let the giant R1 work through 800,000 hard problems, record every step of its reasoning, then train a much smaller model to imitate that reasoning. The small student never matches the teacher outright, but it inherits a surprising amount of the thinking style — the habit of working a problem step by step instead of blurting an answer.
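The data-collection half of that recipe can be sketched in plain Python. Everything named here is hypothetical: `Trace`, `toy_teacher`, and the prompt/completion record layout are illustrative stand-ins, not DeepSeek's actual pipeline or data format. A real run would call the large model as the teacher and then fine-tune a student on the resulting records.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Trace:
    """One teacher response: the worked reasoning plus the final answer."""
    reasoning: str
    answer: str


def build_distillation_set(
    problems: Iterable[str],
    teacher: Callable[[str], Trace],
) -> list[dict[str, str]]:
    """Turn teacher traces into supervised examples for a smaller student."""
    records = []
    for problem in problems:
        trace = teacher(problem)
        # The student is trained to reproduce the reasoning, not just the answer.
        target = f"<think>\n{trace.reasoning}\n</think>\n{trace.answer}"
        records.append({"prompt": problem, "completion": target})
    return records


def toy_teacher(problem: str) -> Trace:
    # Hypothetical stand-in for the large model; returns a canned trace.
    return Trace(reasoning=f"Work through: {problem}", answer="42")


dataset = build_distillation_set(["What is 6 * 7?"], toy_teacher)
print(dataset[0]["completion"])
```

The design point is in the target string: because the reasoning is part of what the student learns to emit, the step-by-step habit transfers along with the answers.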

DeepSeek released six of these distilled reasoners, built on top of existing open models from two other families. Four are based on Alibaba's Qwen(the 1.5B, 7B, 14B, and 32B), and two on Meta’s Llama (the 8B and 70B). The lab kept iterating, too: the mid-2025 R1-0528 refresh shipped an 8B distill built on the newer Qwen3, now the default you get from ollama run deepseek-r1.


These are not toys. On DeepSeek’s own model card, the distilled 32B beats OpenAI’s o1-mini on math and coding benchmarks — a model that fits a 32 GB machine outscoring a frontier product on the tasks reasoning models exist for. Benchmarks aren’t gospel, so test it on your own work before you trust it — but the gap with the paid tier is far smaller than the price gap suggests.

The 20 GB distill vs. a frontier mini model

| Benchmark | R1-Distill-Qwen-32B | OpenAI o1-mini |
| --- | --- | --- |
| AIME 2024 (math) | 72.6 | 63.6 |
| MATH-500 | 94.3 | 90 |
| GPQA Diamond | 62.1 | 60 |
| LiveCodeBench | 57.2 | 53.8 |

Pass@1 scores from DeepSeek’s R1 model card. The distilled 32B — small enough for a 32 GB machine — edges out OpenAI’s o1-mini on every one of these.

The R-series thinks out loud; the V-series just answers

DeepSeek runs two main model lines, and the difference matters when you pick one. The V-series — V3 and its successors — is the general-purpose workhorse: fast chat, coding, agent tasks. The R-series is the reasoning specialist. Ask R1 a hard question and it produces a visible chain of thought, a running monologue where it tries approaches, catches mistakes, and corrects course before committing to an answer.

That visible reasoning was R1’s headline novelty: the first widely available open model to show its work the way OpenAI’s o1 did behind a paywall. It comes at a cost: the model spends a lot of tokens thinking before it answers, so replies are slower and, on a hosted API, pricier per question. For a factual lookup that’s waste. For a genuinely hard math, logic, or planning problem, that deliberation is the whole point.

The distills inherit the R-series temperament. They think out loud too, wrapping their reasoning in <think> tags before the final answer. Match that to the job: reach for a distill when the problem is hard and you can wait a few seconds; reach for a plain chat model like Qwen or Llama when you just need a quick reply.
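If you pipe a distill's output into another program, you will usually want the answer without the monologue. A minimal sketch, assuming the output contains at most one well-formed <think>...</think> block; `split_reasoning` is a hypothetical helper name, not part of any library.

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no block is found."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the block is treated as the final answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "<think>\n2 + 2 is 4, then double it.\n</think>\nThe answer is 8."
reasoning, answer = split_reasoning(sample)
print(answer)  # The answer is 8.
```

Keeping the reasoning around rather than discarding it is often worth it: when an answer looks wrong, the trace is the first place to check.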

A person deep in thought over a chess board. — A reasoning model is the one that stops to think before it moves. That deliberation is worth the wait on hard problems — and pure overhead on easy ones. Photo by JESHOOTS.COM on Unsplash.

Three lines, one family

| Line | What it's for | Examples | Where it runs |
| --- | --- | --- | --- |
| V-series | General chat, coding, agents | V3 · V3.1 · V3.2 · V4 Flash / Pro | Server / hosted API |
| R-series | Reasoning, math, planning; thinks out loud | R1 · R1-0528 | Server / hosted API |
| Distills | R-series reasoning, shrunk to run local | 1.5B · 7B · 8B · 14B · 32B · 70B | Laptop / desktop GPU |

The top two lines are big mixture-of-experts models built for data centers. The bottom line is the one this post is about — and the one most readers can actually run.

MIT is as open as a model license gets

License is where the open-model world hides its fine print, and here DeepSeek has the cleanest story of any major family. Both the weights and the code ship under the MIT License — the same license that covers countless everyday software libraries. You can run it commercially, fine-tune it, serve it to paying customers, and redistribute it, with no royalty and no revenue share.

Two clauses stand out. First, there is no user ceiling, unlike Llama’s Community License, which adds a 700-million-user limit and a “Built with Llama” attribution rule, and unlike Gemma’s usage-policy strings. Second, the license explicitly permits distillation, meaning you may use R1’s outputs to train other models. That permission is exactly why the six small reasoners can exist and why you can build on them freely. For a regulated team or a commercial product, MIT removes the lawyer’s veto that other “open” models invite.

Match the distill to your RAM, then run one command

The practical question is which distill fits your machine. The download size is a fair proxy for the memory you’ll need, and the rule from the broader local-models guide holds: pick the largest one that fits, then leave headroom for context. Reasoning models are especially hungry — all that thinking is tokens, and tokens are memory.

The distill ladder, by memory

| RAM | Command | Size | What you get |
| --- | --- | --- | --- |
| 8 GB | `ollama run deepseek-r1:7b` | 4.7 GB | Entry reasoner: light math, logic, drafts |
| 16 GB | `ollama run deepseek-r1:14b` | 9 GB | The sweet spot: strong reasoning, fits most laptops |
| 32 GB | `ollama run deepseek-r1:32b` | 20 GB | Beats o1-mini on math benchmarks |
| 48 GB+ | `ollama run deepseek-r1:70b` | 43 GB | The heavy local pick; needs real headroom |

Sizes are the Ollama 4-bit builds. Leave a few gigabytes of headroom for context — reasoning models burn through it fast.
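The "largest one that fits, minus headroom" rule is easy to encode. In this sketch the sizes come from the ladder above, while `pick_distill` and its 3 GB default headroom are assumptions chosen for illustration. Real memory use grows with context length, so treat the result as a starting point.

```python
from typing import Optional

# (Ollama tag, download size in GB), smallest to largest, from the ladder above.
DISTILLS = [
    ("deepseek-r1:7b", 4.7),
    ("deepseek-r1:14b", 9.0),
    ("deepseek-r1:32b", 20.0),
    ("deepseek-r1:70b", 43.0),
]


def pick_distill(ram_gb: float, headroom_gb: float = 3.0) -> Optional[str]:
    """Return the largest distill whose download fits in RAM minus headroom."""
    budget = ram_gb - headroom_gb
    fitting = [tag for tag, size_gb in DISTILLS if size_gb <= budget]
    # The list is ordered by size, so the last fitting entry is the largest.
    return fitting[-1] if fitting else None


for ram in (8, 16, 32, 48):
    print(ram, "GB ->", pick_distill(ram))
```

If you keep a browser and an editor open while the model runs, raise `headroom_gb` to match; the default is deliberately optimistic.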

Install Ollama, run one line, and you have a private reasoner that works on a plane. On a typical 16 GB laptop, deepseek-r1:14b (9 GB) is the sweet spot — strong enough for real math and logic, small enough to leave room for your other apps. With 32 GB, step up to deepseek-r1:32b, the one that out-benchmarks o1-mini. The 1.5B and 7B exist for weak hardware, but their reasoning is noticeably shakier; treat them as a last resort, not a first choice.
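Once a model is pulled, you can also call it from a script through Ollama's local HTTP API instead of the terminal. A minimal sketch using only the standard library, assuming the Ollama server is running on its default port (11434) and the model tag is already downloaded; `build_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "deepseek-r1:14b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "deepseek-r1:14b") -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as reply:
        return json.loads(reply.read())["response"]


# Example, with Ollama running:
# print(ask("How many primes are there below 30?"))
```

Whether the reasoning appears inline in the returned text can vary by Ollama version, so inspect the raw JSON once before relying on its shape.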

Prefer not to run anything locally? DeepSeek is also dirt cheap as a hosted API. Its current flagship bills around $0.14 per million input tokens and $0.28 per million output, with a ~90% discount on cached input, a fraction of what the big U.S. labs charge. But the whole appeal of the distills is that you don’t need the API at all.
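To put those rates in per-question terms, here is a small cost sketch. The prices are the ones quoted above; treating the "~90%" cache discount as exactly 90% and counting reasoning tokens as ordinary output tokens are both assumptions, so check the current price sheet before budgeting on it.

```python
# Per-request cost at the hosted-API rates quoted above.
INPUT_USD_PER_M = 0.14    # dollars per million input tokens
OUTPUT_USD_PER_M = 0.28   # dollars per million output tokens
CACHED_DISCOUNT = 0.90    # "~90%" in the text; treated as exactly 90% here


def api_cost_usd(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one request in US dollars."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_input = input_tokens - cached_input_tokens
    cached_rate = INPUT_USD_PER_M * (1 - CACHED_DISCOUNT)
    total = (
        fresh_input * INPUT_USD_PER_M
        + cached_input_tokens * cached_rate
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total / 1_000_000


# A reasoning-heavy question: short prompt, long thought-out reply.
print(f"${api_cost_usd(2_000, 6_000):.5f}")
```

Output tokens dominate the bill for reasoning workloads, which is the pricing side of the "thinking costs tokens" trade-off described earlier.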

A laptop on a desk at night displaying code on screen. — One `ollama run` and a frontier-grade reasoner answers entirely on your machine — no key, no meter, no network. Photo by Mohammad Rahmani on Unsplash.

Where DeepSeek loses

Three honest caveats. First, a distill is not the full R1. It inherits the reasoning style, not the breadth of knowledge: ask the 14B an obscure factual question and it’ll stumble where the 671B would not. The distills are reasoning specialists, narrow and deep, not all-purpose assistants.

Second, DeepSeek is a one-trick family compared with its rivals. There is no DeepSeek vision model, no audio model, no embedding model to run locally — areas where Qwen ships a whole toolbox. DeepSeek does one thing, reasoning, better than almost anyone, and leaves the rest to others.

Third, the family moves fast and the lines blur. The V-series has absorbed a “thinking mode,” a long-rumored R2 hasn’t shipped as of mid-2026, and the newest V4 models target data centers, not laptops. Any specific size or score here is a snapshot — check the current Ollama library before you build something load-bearing on it.

So which one should you install?

If you want a private reasoner on the machine you already own, the answer is short. On a 16 GB laptop, run deepseek-r1:14b. On a 32 GB machine, run deepseek-r1:32b and enjoy beating a paid frontier model on math. Keep a plain chat model — Qwen or Llama — around for quick, non-reasoning tasks, and let DeepSeek handle the hard ones.

The larger point is the one the market missed in its panic. The headline-grabbing 671B model was never the gift to ordinary users — the distills were. Under an MIT license, DeepSeek handed anyone with a laptop a reasoning engine that rivals what others charge for, with no key to revoke and no subscription to cancel. The model that crashed the market is out of reach. Its brain, in a 9 GB package, is one command away.
