TutorialLocal AIOpen SourceJuly 4, 20269 min read

Run Poolside Laguna locally: a coding model your source never leaves

It scores 70.9% on a hard coding benchmark from a 20 GB download that runs on your laptop, so your source code never leaves the building.

By Atul

Poolside Laguna · code-native, local

Open weights

A coding model that scores like the cloud, and never leaves the building.

Your machine · offline OK

Laguna XS 2.1

70.9% SWE-bench Verified20 GB downloadruns on a Mac

code stays

Cloud coding API

Every keystroke ships to someone else’s servers. For a lot of teams, that’s not allowed.

Poolside released Laguna as open weights. The small one downloads in 20 GB and answers on a laptop, with your source never crossing the network.

Picture a developer at a bank. She has a bug to fix, and the smartest tool for the job is an AI coding assistant that reads her whole repository and proposes the patch. She can’t use it. The moment her source code leaves the network, it becomes a compliance problem: customer logic, security controls, and internal systems, all pasted into a server her employer doesn’t control. So she does it the slow way, by hand, while everyone at a startup down the road codes twice as fast.

That trade, speed versus control, was the price of admission for AI coding. Then Poolside, a foundation-model lab, released Laguna as open weights. The small model in the family scores 70.9% on SWE-bench Verified, a hard benchmark of real GitHub bug fixes, and downloads in 20 GB. It runs on a MacBook with 36 GB of memory, offline, with nothing sent anywhere. For the first time, the developer at the bank has an answer that her legal team can also sign off on. This is the lineup, how it was built, and how to run it.

A cloud assistant is a non-starter when your code can’t leave

Most AI coding tools work by shipping your code to a data center. Cursor, GitHub Copilot, the cloud version of every popular assistant: they read your files by uploading them, run the model on rented GPUs, and stream the answer back. For a solo developer on a side project, that’s fine. For a hospital, a defense contractor, or a trading desk, it’s a line they are not allowed to cross. Their source code is regulated intellectual property, and “send it to a third-party API” is the exact sentence their security review exists to catch.

This is not a hypothetical hang-up. It is the same wall that has been quietly closing off cloud AI for regulated work, only sharper, because source code is the crown jewels. A leaked customer record is a bad day. A leaked codebase is your competitor’s head start and your auditor’s nightmare at once. The structural problem is that a cloud-only assistant can never satisfy a “nothing leaves the building” policy, no matter how good its privacy promises sound. The data still moves.

A local model removes the question entirely. If the weights live on your machine and the inference happens on your CPU or GPU, there is no outbound request to audit, because there is no outbound request. The code the model reads and the patch it writes never touch a network. That is the whole reason a team in a locked-down shop would pick a local coding model over a more capable cloud one: not because local is trendy, but because it is the only option that is allowed.

Privacy is the headline, but running the model yourself buys three more things a subscription can’t. It works with the network cable unplugged, which is the actual state of an air-gapped defense box or a classified environment. It has no per-token bill, so a developer who runs the model all day costs the same as one who runs it twice, and no finance team has to model “what if everyone starts using it.” And it has no rate limit and no vendor on the other end who can change the price, deprecate the model, or read the prompts. For a bank, a defense contractor, or a hospital, those are not luxuries. They are the checkboxes that decide whether a tool ever gets approved.

A rack of servers in a private server room, standing in for on-premise infrastructure. — For a regulated team, the point isn’t the rack. It’s that the code stays inside the walls they already control. Photo by Kevin Ache on Unsplash.

Laguna was trained on whether the code runs

Poolside was founded in 2023 by Jason Warner, the former CTO of GitHub, and Eiso Kant. The pitch was narrow on purpose: build a foundation model for software engineering, not a general chatbot that also codes. In October 2025, Nvidia said it would invest up to $1 billion in the company, valuing it around $12 billion, per Tech Startups. The lab’s calling card is how it trains.

Most models learn to code by reading code. Laguna also learns by running it. Poolside describes a reinforcement-learning system where the training loop spins up sandboxed containers, drops its own coding agent into each one, and sets it loose on a real task: read the files, run the tests, edit a patch, run the tests again. The model is scored on the outcome, did the suite pass, not on whether the code merely looks plausible. Those graded attempts, called trajectories, feed back into training over days of continuous runs, using an approach Poolside details in its technical write-up.

This is where being a code-only lab pays off. A general model that also writes code has to spend its training budget on poetry, trivia, and small talk it will rarely use in an editor. Laguna spends all of it on the one job, which is how a 33-billion-parameter model keeps pace with coders many times its size. The trade is that Laguna is not a chatbot; ask it for a dinner recipe and a bigger general model will do better. Point it at a failing test suite and the specialization shows.

That is worth pausing on, because it explains the scores later. SWE-bench Verified, the benchmark Laguna is measured against, is a set of real bugs from open-source projects; a model passes an item only when its patch makes the project’s own test suite go green. Terminal-Bench, another number Poolside reports, checks whether a model can drive a command line to finish a task. Both reward doing over describing, which is exactly what execution-feedback training practices.

The difference matters for the kind of work developers actually hand an assistant. A model trained on execution feedback has, in effect, practiced the loop of trying a fix and checking it, which is most of what “agentic” coding is. It fits neatly into the agent-shaped tools taking over the category, where the model isn’t just autocompleting a line but planning a multi-step change across a repository. Laguna carries a 256K-token context window, roughly a small codebase at once, so it has room to read before it acts.

Source code displayed on a dark screen, evoking a model trained on running and testing code. — Laguna’s edge is that it was graded on tests passing, not on how the code reads. Photo by ANOOF C on Unsplash.

The family is a laptop model and a server model

Laguna comes in two weights, and the split maps cleanly onto who runs it. The XS line is the local one. Laguna XS 2.1, released July 2, 2026, is a 33-billion-parameter Mixture-of-Experts model with only 3 billion parameters active per token. That architecture is the trick: the model holds a lot of knowledge, but fires a small slice of it each step, so it runs at the speed and memory cost of a much smaller model. Poolside says it fits on a Mac with 36 GB of RAM. Its April predecessor, XS.2, is the same shape.

Laguna M.1 is the heavy one: 225 billion parameters, 23 billion active, trained in-house on 30 trillion tokens across 6,144 Nvidia H200 GPUs. It is not a laptop model. It wants a multi-GPU server and a serving stack like vLLM or SGLang. The intended use is a shared endpoint, one machine in the building that the whole engineering team points its tools at, with the code still never leaving the network.

Two sizes, both agentic, both open

Model

Params

Ctx

Min hardware

What it's for

Laguna XS 2.1

33B / 3B active

256K

Mac, 36 GB RAM

The local default: agentic coding on your own laptop

Laguna XS.2

33B / 3B active

256K

Mac, 36 GB RAM

The April predecessor, Apache 2.0 licensed

Laguna M.1

225B / 23B active

256K

Multi-GPU server

The heavy model: a shared endpoint for a whole team

Parameters, context, and hardware notes from the Poolside blog and the Hugging Face model cards. Both are Mixture-of-Experts models: the second number is how many parameters actually fire per token, which is why a 33B model runs on a laptop.

For most readers the decision is simple. If you want a coding model on your own machine, XS 2.1 is the one you install. If you are standing up a private coding server for a team, M.1 is the model that justifies the hardware. Both speak the same agentic dialect and carry the same long context; the only real variable is how much silicon you can point at it.

The scores hold up, and the footprint is the story

Benchmarks first, honestly. On SWE-bench Verified, XS 2.1 scores 70.9% and the big M.1 reaches 74.6%. Those are strong numbers for open weights, but they are not the top of the table. The largest open coders, DeepSeek V4-Pro, Qwen, GLM, land around 80%, and the best closed frontier models sit higher still. If you rank purely on the benchmark, Laguna is very good, not best in class. The queue of leaders is real and it is worth being clear about it.

The number that reframes the comparison is what it costs to run each one. DeepSeek V4-Pro’s 80.6% comes from a 1.6-trillion-parameter model that only exists on a cloud cluster. Qwen3-Coder-Next hits 71.3%, a hair above Laguna, but wants roughly 46 GB of GPU memory, meaning a server-class card most people don’t own. Laguna XS 2.1 posts nearly the same score from a 20 GB file on a laptop you already have. That is the argument: not that it wins the benchmark, but that it comes within a rounding error of models several times its size while fitting where your code is allowed to stay.

SWE-bench Verified · % of real GitHub issues resolved

DeepSeek V4-Pro

80.6

1.6T params · cloud clusterdatacenter

Devstral 2

72.2

123B · ~62 GB VRAMbig GPU

Qwen3-Coder-Next

71.3

80B · ~46 GB VRAMbig GPU

Laguna XS 2.1

70.9

33B · 20 GB · runs on a Maclaptop

Devstral Small 2

24B · ~12 GBlaptop

Laguna XS 2.1 lands within a couple of points of models that need a server-class GPU, from a file a fifth their size. The only other laptop-class coder near it, Devstral Small, is a point and a half back. Scores from the Poolside model card and a 2026 self-hosting roundup.

Be fair to the alternatives, because this is a real field. Qwen’s coder lineand Mistral’s Devstral are excellent open models, and Devstral Small in particular is another genuinely laptop-class coder. Laguna’s distinct pitch is the pairing: a model built specifically for agentic, run-the-tests coding, tuned to sit small enough for local hardware, with a bigger sibling for teams that need more. If your priority is the single highest benchmark score and you have a datacenter, look at the giants. If your priority is a capable coder that runs where the network can’t reach, Laguna is aimed straight at you. The broader map of what runs where lives in the local-models roundup.

The big model earns its keep on the same axis. M.1’s 74.6% on SWE-bench Verified and 45.8% on Terminal-Bench put it among the stronger open coders, and unlike the trillion-parameter leaders it is small enough for a single well-equipped server rather than a rented cluster. If you want to size the models before buying hardware, Poolside runs a hosted API at $0.10 per million input tokens and $0.20 per million output, cheap enough to trial both against your own tasks, then move the winner in-house once you know it earns the machine.

Close-up of server hardware components, standing in for a private on-premise coding server. — M.1 is the model you put on hardware like this: one private box, a shared endpoint, no code leaving the room. Photo by Đào Hiếu on Unsplash.

Running it takes one command

The fastest path is Ollama, the same one-command runner the rest of the local-model guides use. Install it, then pull Laguna. The default 4-bit build is a 20 GB download; higher-precision tags trade disk and memory for sharper answers.

Pick the precision your memory allows

Command

Size

What you get

ollama run laguna-xs-2.1

20 GB

4-bit default: the everyday pick for a 32–36 GB Mac

ollama run laguna-xs-2.1:q8_0

36 GB

8-bit: sharper answers if you have the memory

ollama run laguna-xs-2.1:bf16

67 GB

Full precision: workstation or Mac Studio territory

Download sizes and tags from Ollama’s Laguna page. Every variant carries the full 256K context. Pick the largest that leaves a few gigabytes free for the code you feed it.

Ollama serves the model on a local port with an OpenAI-compatible API, which is the detail that makes it useful rather than a toy. Point an editor extension at that endpoint and you have a private coding assistant inside your IDE. Continue.dev and Zed both accept a local Ollama backend; Tabby is a self-hosted option built for exactly this, a coding server your whole team connects to. In every case the request goes to your machine, or your team’s machine, and stops there. If you prefer to skip Ollama, Poolside ships GGUF builds for llama.cpp and the raw weights for vLLM and SGLang, which is the route you take to serve the big M.1 to a team.

There is a second move the open weights unlock that a cloud API never will: fine-tuning on your own codebase. Because you hold the raw weights, a team can further train Laguna on its private repositories, its internal conventions, its house style, and keep the resulting model on the same locked-down hardware. The training data, the most sensitive code you own, stays put the whole time. That is the enterprise version of the pitch, and it is only possible when the model lives on your side of the wall.

One practical note on memory: the download size is a fair proxy for the RAM the weights will occupy, and you want headroom on top for the context you feed it. A 20 GB model on a 36 GB Mac leaves comfortable room for a long file and the model’s working memory. If your machine is tighter, the 4-bit build is the one to start with, and you can always step up later.

The license lets you keep what it writes

Open weights are worthless if the license won’t let you ship what you build, so read this part before you commit a team to it. The two XS releases split on license. Laguna XS.2, the April model, is Apache 2.0, one of the most permissive licenses in software: use it, modify it, fine-tune it, run it commercially, with no user cap and no royalty. The newer XS 2.1 moved to OpenMDW-1.1, a permissive license written specifically for model weights, which likewise allows commercial and non-commercial use. M.1 is Apache 2.0.

In plain English: whatever Laguna writes for you is yours, and nothing in the license forces your code back out to Poolside. That is the clean version of “open” that not every model offers. It also lines up with the deeper reason to run AI on hardware you control, which is that a tool you own outright can’t be priced up, deprecated, or switched off from someone else’s billing dashboard.

The takeaway is short. If your code can leave your network and you just want the best benchmark score, a large cloud coder still wins, and you should use one. If it can’t, or you’d rather it didn’t, Laguna is the first code-native model that scores in the same neighborhood while running entirely on your side of the wall. Run ollama run laguna-xs-2.1 on a capable Mac to try the local model, and reach for M.1 on a server when a whole team needs one. Either way, the source never leaves the building.

Run Poolside Laguna locally: a coding model your source never leaves

A cloud assistant is a non-starter when your code can’t leave

Laguna was trained on whether the code runs

The family is a laptop model and a server model

The scores hold up, and the footprint is the story

Running it takes one command

The license lets you keep what it writes

The best on-device AI apps for Linux (2026)

The best on-device AI apps for Windows (2026)

What is quantization? How a giant AI model fits on your laptop

One-time payment. Yours forever.