Local AI · Open Source · Tutorial · June 30, 2026 · 9 min read

Run OLMo locally: the only AI model that's open all the way down

Every 'open' AI model hands you a sealed engine. OLMo hands you the factory: the data, the code, the recipe, the weights.

By Atul

What each “open” family actually hands you

Apache 2.0

Everyone ships the weights. One family ships everything else.

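| Family | Weights | Training data | Training code | Recipe |
|---|---|---|---|---|
| OLMo 3 | ✓ | ✓ | ✓ | ✓ |
| Llama | ✓ | — | — | — |
| Qwen | ✓ | — | — | — |
| Gemma | ✓ | — | — | — |
| Mistral | ✓ | — | — | — |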

Most “open” models open one column. OLMo 3 opens all four: the weights, the 9-trillion-token dataset, the training code, and the recipe that turns one into the other. It is the only row filled in all the way across.

Almost every AI model that calls itself “open” hands you one thing: a file of weights. That file is the finished engine. It runs, but it tells you nothing about the fuel it burned or the factory that built it. You cannot see what it was trained on, you cannot rebuild it, and you cannot check why it answers the way it does. You get to drive. You do not get to look under the hood.

OLMo, from the non-profit Allen Institute for AI (Ai2), is the family that hands you the whole factory. Not just the weights, but the 9-trillion-token dataset they were trained on, the code that did the training, the recipe at every step, and a checkpoint saved at each milestone along the way. It is the answer to a quiet objection that has followed open-weight AI for years: that “open weights” was never really open source. This is the map of that family: what makes it different, what it costs you in raw capability, and the one command that runs it.

“Open weights” isn’t open source

Start with a distinction most coverage skips. When Meta, Alibaba, or Google release a model you can download, they release the weights: the billions of numbers the model learned. That is genuinely useful. You can run it offline, fine-tune it, and ship it. But the weights are the output of a process, and the process stays hidden. What text went in? In what proportions? What was filtered out, and by what rule? You are not told.

The Open Source Initiative, the body that has defined “open source” for software since 1998, spells out the gap. Real open source grants four freedoms: to use, study, modify, and share. Open weights, the OSI argues, cover only two of them. You can use the model and share it, but you cannot truly study or modify it, because the training code and data that would let you are missing. A car you can drive but never open is not a car you understand.

The hidden column is the training data, and its absence hides real questions. Was a benchmark accidentally included in the training set, inflating the scores? Is there copyrighted or private text in there? Why does the model believe a particular wrong thing? With a sealed model you cannot check any of it. You are trusting a vendor’s summary of a process you are not allowed to see.

This is not pedantry. The Llama license requires a separate agreement with Meta once a product passes 700 million monthly active users, and Meta has never released the training data. Parts of Mistral’s catalog are research-only. Across the field, “open” has quietly come to mean “you can download it,” which is a far smaller promise than the word implies. OLMo exists to make the full promise.

[Image: An open notebook filled with handwritten working, evoking a model that shows its full method.] Open weights show the answer. OLMo shows the working. Photo by Kelly Sikkema on Unsplash.

OLMo opens what every other family seals

Walk the columns of the table at the top, one at a time, and you see what “fully open” actually means in practice. The weights are public, like everyone else’s. The training data is not a vague description but a real download: Dolma 3, a corpus of roughly 9.3 trillion tokens of web pages, books, code, and academic papers, released, in Ai2’s words, “without any license restrictions.” The first Dolma release was already 3 trillion tokens; the third generation more than tripled it.

The training code is open too, and not as a token gesture. Ai2 ships the actual machinery: a distributed training framework (Olmo-core), the post-training pipeline (Open Instruct), the evaluation harness (OLMES), and data tools for cleaning and deduplication. The recipe is the technical report and training logs that document every decision. And then the part nobody else offers: intermediate checkpoints saved at each stage, so you can fork the model not just at the end but partway through its education.

Ai2 has a name for this: the “model flow”, the full lifecycle of a model rather than its frozen end state. Instead of one set of final weights, OLMo 3 gives you every checkpoint, dataset, and dependency needed to recreate or redirect it. It is the difference between being handed a finished cake and being handed the recipe, the ingredients, and photos of the kitchen at every step.

[Image: A long row of books in a library, standing in for an openly published training corpus.] Dolma 3 is the training corpus published in full: roughly 9.3 trillion tokens you can actually read. Photo by Zetong Li on Unsplash.

The OLMo 3 lineup: same family, four jobs

| Model | Sizes | What it's for |
|---|---|---|
| OLMo 3-Base | 7B / 32B | The raw pretrained model: a clean slate to fine-tune or study |
| OLMo 3-Instruct | 7B / 32B | Everyday chat assistant: follows instructions, answers questions |
| OLMo 3-Think | 7B / 32B | Shows its working: long chain-of-thought for math, code, logic |
| OLMo 3-RL Zero | 7B | Research kit: reinforcement learning straight from the base model |

Every variant ships at 7B and 32B (RL Zero at 7B), all under Apache 2.0, all with a 64K-token context window. Sizes and roster from the OLMo 3 release and the Ollama library.

OLMo 3 caught up enough that openness is the deciding factor

For a long time the honest knock on OLMo was that full transparency cost too much capability to be worth it. You could inspect everything, but the model trailed the open-weight leaders by a wide margin. The version released in November 2025, OLMo 3, is where that excuse runs thin.

OLMo 3-Think 32B is the first fully open model of its size to reason in explicit, visible chains of thought, the same step-by-step style behind DeepSeek-R1 and the frontier reasoning models. On the MATH competition benchmark it scores 96.1%, and on the HumanEvalPlus code-generation test 91.4%. According to an independent analysis by researcher Nathan Lambert, the Think models land within one or two points of Alibaba’s Qwen3 at the same sizes. That is a remarkable place to be for a model that hides nothing.

OLMo 3-Think 32B, scored on its own merits (% correct)

| Benchmark | Score (%) |
|---|---|
| MATH | 96.1 |
| HumanEvalPlus | 91.4 |
| IFEval | |
| AIME 2024 | 76.8 |

Scores from the OLMo 3 technical release. The reasoning model lands within one to two points of Qwen3 32B overall, per an independent analysis by Nathan Lambert — close enough that, for most work, the thing that sets it apart is no longer the benchmark.

The base model tells the same story before any reasoning tricks. OLMo 3-Base 32B is, by Ai2’s evaluations, the strongest fully open base model available, scoring 66.5% on the HumanEval coding test and 80.5% on grade-school math, comfortably ahead of earlier open efforts and within range of the open-weight leaders. The prior generation, OLMo 2 32B, was already the first fully open model to beat OpenAI’s GPT-3.5-Turbo and GPT-4o mini on a suite of academic benchmarks. Each release has narrowed the distance.

Two more numbers matter. OLMo 3 was trained on up to 1,024 H100 GPUs, the kind of run that used to be a trade secret, and its context window jumped to 64K tokens, sixteen times larger than OLMo 2’s. The point is not that OLMo dethrones the leaderboard. It is that the gap has closed far enough that, for most everyday work, you no longer trade away much by choosing the transparent option. The benchmark stops being the reason to skip it.

What full openness actually buys you

Transparency sounds like a virtue for its own sake. It is not. It cashes out in concrete things you can do that an open-weights model will not let you.

You can audit it. If a model refuses a question, leans a certain way, or repeats a falsehood, OLMo lets you trace the behavior back toward the data that produced it; Ai2 ships a tool, OlmoTrace, built for exactly that. With a sealed model you can only guess. For anyone in a regulated field, the ability to show why a system answered as it did is not a nicety, it is the job.

You can reproduce it. A result you cannot rebuild is a claim, not a finding. Because the data, code, and checkpoints are public, a university lab can retrain OLMo from scratch, change one ingredient, and measure what moved. Ai2 even ships the unglamorous plumbing that makes the claims trustworthy: a deduplication tool and a decontamination tool, decon, that strips test questions out of the training set so the scores are not quietly inflated. That is ordinary science, and until OLMo it was nearly impossible to do on a modern language model.
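The idea behind decontamination can be sketched with nothing more than standard `grep`. This is a minimal illustration, not Ai2's decon tool: it only catches benchmark questions that appear in the corpus as exact, whole-line copies, and the function names and file arguments are hypothetical. A real tool would presumably also catch near-duplicates and partial overlaps.

```shell
# Sketch only: exact-line decontamination with standard grep.
# NOT Ai2's decon tool; function names and file arguments are hypothetical.
# grep flags: -F fixed strings, -x whole-line match, -f read patterns from a file.

# Count corpus lines that are exact copies of a benchmark line.
# Arguments: $1 = benchmark file (one question per line), $2 = corpus file.
count_contaminated_lines() {
  grep -c -F -x -f "$1" "$2" || true   # grep exits 1 when the count is 0
}

# Print the corpus with those exact copies removed (-v inverts the match).
strip_benchmark_lines() {
  grep -v -F -x -f "$1" "$2" || true   # grep exits 1 when nothing is left
}
```

Run as `count_contaminated_lines bench.txt corpus.txt` to see how many lines leaked, then `strip_benchmark_lines bench.txt corpus.txt > clean.txt` to write a cleaned copy. The point of the sketch is that the check is only possible when the corpus is available to read in the first place.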

You can learn from it. OLMo is the only major family where a student can read the whole pipeline, from raw corpus to finished chatbot, and understand how a real model is built. The checkpoints turn it into a time-lapse of an education. No other family lets you watch the model learn.

[Image: Rows of labelled archive boxes on shelves, evoking saved checkpoints and an auditable record.] Every training milestone is saved and downloadable, an auditable record instead of a single sealed file. Photo by Luke Caunt on Unsplash.

There is a trust angle too. When a model and its data are both public, you are not taking a vendor’s word for what went in. That matters in a world where models confidently invent things, and where the provenance of an answer can decide whether you can use it at all.

Pick by your RAM, then run one command

Running OLMo is no harder than running any other local model. The fastest route is Ollama, the same one-command runner the rest of the local-models guide uses. Install it, then match the model to the memory you have. Download size is a fair proxy for the RAM you will need.

Match the model to the memory you have

| RAM | Command | Size | What you get |
|---|---|---|---|
| 8 GB | ollama run olmo-3:7b | 4.5 GB | The everyday pick: chat and reasoning on a modest laptop |
| 16 GB | ollama run olmo-3:7b-think | 4.5 GB | Same size, but it thinks step by step before answering |
| 32 GB+ | ollama run olmo-3:32b | 19 GB | The flagship: frontier-class fully open reasoning |

Download sizes are Ollama’s 4-bit builds. The 32B wants a machine with comfortably more than 19 GB free; when in doubt, drop to the 7B and leave headroom for context.
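A quick back-of-the-envelope check shows where those download sizes come from. The sketch below computes only the raw weight size at a given bit width, using nominal parameter counts and decimal gigabytes; the helper name `raw_weight_gb` is hypothetical. The listed downloads (4.5 GB and 19 GB) come out somewhat larger than the raw figures, presumably because of quantization metadata and tensors kept at higher precision, which is an assumption about the builds rather than something the release notes confirm.

```shell
# Sketch: raw weight size in decimal GB for a quantized model.
# params (in billions) * bits per weight / 8 bits per byte = gigabytes.
# Hypothetical helper; parameter counts are the nominal 7B and 32B.
raw_weight_gb() {
  LC_ALL=C awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

raw_weight_gb 7 4    # 7B parameters at 4 bits each  -> prints 3.5
raw_weight_gb 32 4   # 32B parameters at 4 bits each -> prints 16.0
```

Either way, the weights are only the floor: the context you feed the model needs memory on top, which is why leaving headroom matters.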

For most people on a normal laptop, olmo-3:7b is the answer: a 4.5 GB download that chats and reasons comfortably on 8 GB of memory. Want it to think out loud through a hard problem? The think variant does step-by-step reasoning at the same size. If you have a workstation with 32 GB or more, olmo-3:32b is the fully open flagship. Prefer a graphical app? LM Studio pulls the same models, and on a Mac, Apple’s MLX runs them fastest. Every one is Apache 2.0, so whatever you build on top is yours to keep, sell, and ship, with no user ceiling and no fine print.
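If you would rather script the choice, the RAM tiers from the table above can be written as a small shell function. This is a sketch under stated assumptions: `pick_olmo_tag` is a hypothetical helper, it takes installed RAM as a whole number of gigabytes that you supply yourself, and the thresholds simply mirror the table rather than any official guidance.

```shell
# Hypothetical helper: map installed RAM (whole GB) to the tags listed above.
# Thresholds mirror the article's table; they are not official requirements.
pick_olmo_tag() {
  ram_gb="$1"
  if [ "$ram_gb" -ge 32 ]; then
    echo "olmo-3:32b"        # flagship, 19 GB download
  elif [ "$ram_gb" -ge 16 ]; then
    echo "olmo-3:7b-think"   # step-by-step reasoning, 4.5 GB download
  elif [ "$ram_gb" -ge 8 ]; then
    echo "olmo-3:7b"         # everyday pick, 4.5 GB download
  else
    echo "less than 8 GB: none of the listed builds is a comfortable fit" >&2
    return 1
  fi
}
```

With Ollama installed, `ollama run "$(pick_olmo_tag 16)"` would then start the think variant. Because the 7B builds are the same size, nothing stops you from running olmo-3:7b-think on an 8 GB machine; the function just follows the table's tiers.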

The family has MoE and multimodal cousins

OLMo is the text core, but the same open-everything philosophy runs through Ai2’s other lines. OLMoE is the mixture-of-experts entry: 7 billion total parameters but only 1 billion active at a time, which makes it fast and light while keeping the full open-data, open-code, open-checkpoint treatment. It is the on-device option when you want speed without giving up transparency.

Molmo is the multimodal side, models that see as well as read. The December 2025 release, Molmo 2, ships at 4B and 8B, with one variant built on the open OLMo backbone, and its 8B model reportedly beats Google’s Gemini 3 on certain video-tracking tasks. All of it is Apache 2.0 with the data and code released alongside. If you have read the other family explainers and wondered whether anything in AI is open the way open-source software is, this is the corner of the field where the answer is yes, across text, sparse models, and vision alike.

Where it loses, and why you’d still run it

Be honest about the ceiling. OLMo is not trying to be the single best model in the world, and it is not. For the hardest frontier problems, the broadest world knowledge, or raw benchmark supremacy, a closed flagship or a top open-weight model like Qwen will still edge ahead. If your only question is “which model scores highest,” OLMo is not your answer, and Ai2 does not pretend otherwise.

But that is the wrong question for a growing set of people. If you need to audit a model, reproduce a result, teach how one works, or stand behind an answer in a regulated setting, OLMo is the only family that makes those things possible, and it is now good enough that you give up little to get them. Install olmo-3:7b if your laptop is modest, olmo-3:32b if you have the memory, and olmo-3:7b-think if you want it to reason out loud. The other families hand you a sealed engine. OLMo hands you the factory, and these days the factory runs nearly as fast.
