Explainer · Embeddings · Search · June 23, 2026 · 10 min read

What is an embedding? How AI turns meaning into numbers

Spotify hands you thirty unheard songs and a third are keepers. The trick: it turns meaning into coordinates, then measures the distance.

By Atul

Figure: Meaning, plotted as a map. Similar words land near each other. An embedding turns each word into a point. The model never read a dictionary; it learned positions where neighbors share meaning. Real embeddings have hundreds of dimensions; this is two of them.

Every Monday, Spotify hands you a playlist of thirty songs you’ve never heard, and somehow a third of them are keepers. No human picked them for you. The app didn’t match the genre tag or the artist name. It did something stranger: it turned your taste into a set of coordinates, turned millions of songs into coordinates too, and handed you the closest ones.

That move — turning meaning into numbers you can measure distance between — is the single most useful trick in modern AI, and almost nobody outside the field has a name for it. The name is embedding. It quietly powers search, recommendations, the “chat with your documents” feature you’ve tried, spam filters, and the part of every AI assistant that decides what’s relevant.

An embedding turns a piece of text — a word, a sentence, a whole file — into a list of numbers that captures its meaning, so that things which mean similar things end up close together. Once meaning is a location, “find related” becomes “find nearby,” and a computer can do that in milliseconds. Here’s how it works, and why it’s under so much of what you use.

An embedding is a location, not a definition

Start with a map. Every town on a map is a pair of numbers — latitude and longitude. You don’t need a description of a town to know it’s near another; you just compare coordinates. Towns that are close on the map tend to share weather, accents, sports teams. The position carries information.

An embedding does the same thing for meaning. An embedding model reads a chunk of text and outputs its coordinates — not two numbers, but hundreds or thousands of them. Each is a position along some learned axis of meaning. You’ll never know exactly what axis number 419 stands for, but together the numbers place “dog” near “puppy,” far from “invoice,” and somewhere in between for “wolf.”

Nobody hand-writes those positions. The model learns them by reading an enormous amount of text and noticing which words keep similar company. “Coffee” and “tea” show up in the same kinds of sentences, so they drift together. The output is a vector — the technical word for an ordered list of numbers — and that vector is the embedding.
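
To make "keeping similar company" concrete, here is a toy sketch that counts which words share a sentence with a target word. It is a deliberate simplification: real embedding models learn dense vectors with neural networks rather than raw counts, and the six-sentence corpus and the helper names `context_vector` and `shared_contexts` are invented for illustration.

```python
from collections import Counter

# A tiny made-up corpus; real models read billions of sentences.
SENTENCES = [
    "i drink hot coffee every morning",
    "i drink hot tea every evening",
    "she poured a cup of coffee",
    "she poured a cup of tea",
    "please pay the invoice by friday",
    "the invoice lists the amount due",
]

def context_vector(word, sentences):
    """Count the other words that share a sentence with `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return counts

def shared_contexts(a, b):
    """Context words that two count vectors have in common."""
    return sorted(set(a) & set(b))

coffee = context_vector("coffee", SENTENCES)
tea = context_vector("tea", SENTENCES)
invoice = context_vector("invoice", SENTENCES)

print(shared_contexts(coffee, tea))      # nine shared context words
print(shared_contexts(coffee, invoice))  # none in this corpus
```

In this corpus "coffee" and "tea" share nine context words while "coffee" and "invoice" share none. That overlap is the signal a learned model compresses into a few hundred numbers.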

A dense field of stars scattered across a dark night sky. — Picture every word as a star. The embedding is its position; the constellations are meanings that drifted together. Photo by Astroby krishna on Unsplash.

Closeness in that space means closeness in meaning

Here’s the payoff that makes the whole idea worth the trouble: once every text is a point, “how similar are these two things?” becomes “how far apart are these two points?” That’s a question arithmetic can answer instantly, at any scale.

The usual measure is cosine similarity: it compares the direction two vectors point, ignoring how long they are. Point the same way, score near 1, nearly identical meaning. Point at right angles, score near 0, unrelated. The core of the calculation is multiplying matching numbers and adding up the results, so a machine can rank a query against millions of stored vectors and return the closest in a blink.
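
The measure itself fits in a few lines. This is a minimal sketch in plain Python using the standard formula (dot product divided by the product of the two lengths); the example vectors are made up, and production systems would use an optimized numeric library instead.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

# Same direction, different length: the score ignores the length.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors: nothing in common.
right_angle = cosine_similarity([1.0, 0.0], [0.0, 3.0])
# Opposite directions: the lowest possible score.
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])

print(same_direction, right_angle, opposite)
```

The three cases come out at roughly 1, exactly 0, and roughly -1, which is the full range the score can take.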

This is why search by meaning beats search by keyword. Type “how do I get my money back” into an old keyword system and it hunts for the words “money” and “back.” A document that says “refunds are processed within 14 days” shares none of those words, so the keyword system misses it. Embed both, though, and the question lands right next to the answer, because they mean the same thing. No shared vocabulary required.
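
The refund example can be sketched end to end. The word-overlap check below is real; the three-number vectors are invented stand-ins for what an embedding model would return, so treat the scores as illustrative only.

```python
import math

def keyword_overlap(query, document):
    """Words shared by query and document: all a keyword system can see."""
    return set(query.lower().split()) & set(document.lower().split())

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

QUERY = "how do I get my money back"
QUERY_VECTOR = [0.8, 0.2, 0.3]  # invented; a model would produce this

# Invented document vectors, hand-placed so the refund text sits near the query.
DOCUMENTS = {
    "refunds are processed within 14 days": [0.9, 0.1, 0.2],
    "our office is closed on public holidays": [0.1, 0.9, 0.1],
}

overlaps = {doc: keyword_overlap(QUERY, doc) for doc in DOCUMENTS}
best_match = max(
    DOCUMENTS,
    key=lambda doc: cosine_similarity(QUERY_VECTOR, DOCUMENTS[doc]),
)

print(overlaps)    # both sets are empty: no shared words at all
print(best_match)  # the refund document wins on direction
```

Keyword matching has nothing to rank with here, because neither document shares a word with the question. The vector comparison still has a clear winner.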

You can do arithmetic on meaning

The result that first made researchers sit up came in 2013, from a team led by Tomáš Mikolov at Google. Their system, word2vec, produced embeddings so well-structured you could do algebra on them. Take the vector for “king,” subtract “man,” add “woman,” and the nearest point in the whole space is “queen.”

Sit with that. The model was never told what a king is, or that gender exists. It just read text. Yet the direction you travel to get from “man” to “woman” turned out to be roughly the same direction that gets you from “king” to “queen”, and from “uncle” to “aunt.” Relationships became geometry. That regularity, documented by Mikolov and colleagues in a companion paper on linguistic regularities, showed that meaning had shape. It was the moment embeddings stopped being a curiosity.

Meaning you can add and subtract

king − man + woman = queen

The 2013 result that put embeddings on the map: take the vector for king, subtract man, add woman, and the nearest point is queen. Direction in the space encodes relationships — here, royalty held constant while gender flips.

The cleaner versions of this trick are a little stage-managed — the original words are excluded from the answer, or the result lands close but not exactly on “queen.” The point isn’t that the algebra is perfect. It’s that meaning, encoded as position, has enough structure that a straight line through the space means something. Today’s sentence and document embeddings inherit that structure at a far larger scale.
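
Here is the analogy as a toy sketch, stage management included. The two-dimensional coordinates are invented (one axis loosely "royalty", the other loosely "femininity"); real word2vec vectors have hundreds of dimensions and no labelled axes. The function name `analogy` is hypothetical.

```python
import math

# Invented 2-D coordinates: (royalty, femininity). Real axes are not this tidy.
WORDS = {
    "king": [0.9, 0.2],
    "queen": [0.8, 0.9],
    "man": [0.1, 0.1],
    "woman": [0.2, 0.9],
    "prince": [0.7, 0.1],
    "princess": [0.6, 0.9],
}

def analogy(a, b, c, words):
    """Return the word nearest to a - b + c, skipping the three inputs."""
    target = [x - y + z for x, y, z in zip(words[a], words[b], words[c])]
    # The stage-managed part: the input words are not allowed as answers.
    choices = [w for w in words if w not in (a, b, c)]
    nearest = min(choices, key=lambda w: math.dist(target, words[w]))
    return nearest, target

answer, target = analogy("king", "man", "woman", WORDS)
print(answer, target)
```

The computed point lands near "queen" without sitting exactly on it, so the answer comes from a nearest-neighbor lookup rather than an exact match.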

Modern embeddings pack meaning into ~1,500 numbers

Word2vec embedded single words. The models you’d use now embed whole sentences and documents, and they’re a commodity. OpenAI’s text-embedding-3-small turns any passage into 1,536 numbers for about $0.02 per million tokens — pennies to index an entire knowledge base. Its larger sibling uses 3,072 numbers for finer distinctions, at $0.13 per million.

Google’s Gemini Embedding 001 also produces 3,072-dimensional vectors and topped the multilingual Massive Text Embedding Benchmark — MTEB, the standard scoreboard — at launch. The number of dimensions is the dial that matters: more numbers can capture finer shades of meaning, but every number is storage you pay for and time you spend comparing. Most teams don’t need the biggest one.
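
A back-of-envelope calculation shows how the two dials trade off. The prices and dimension counts come from the figures above; the corpus size (50 million tokens, 1 million chunks) is hypothetical, and the 4 bytes per number assumes vectors stored as 32-bit floats with no index overhead.

```python
def index_cost_usd(total_tokens, price_per_million_tokens):
    """One-time cost to embed a corpus at a per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million_tokens

def storage_bytes(num_vectors, dimensions, bytes_per_number=4):
    """Raw vector storage, assuming 32-bit floats (4 bytes per number)."""
    return num_vectors * dimensions * bytes_per_number

TOKENS = 50_000_000   # hypothetical knowledge base
CHUNKS = 1_000_000    # hypothetical number of stored vectors

small_cost = index_cost_usd(TOKENS, 0.02)
large_cost = index_cost_usd(TOKENS, 0.13)
small_gb = storage_bytes(CHUNKS, 1536) / 1e9
large_gb = storage_bytes(CHUNKS, 3072) / 1e9

print(f"small: ${small_cost:.2f}, {small_gb:.3f} GB")
print(f"large: ${large_cost:.2f}, {large_gb:.3f} GB")
```

Under these assumptions the one-time embedding bill is small either way. The doubled storage and comparison time are what you keep paying for.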

Four embedding models, 2026

“Dimensions” is how many numbers each vector holds. More can mean finer meaning — and more storage to pay for.

OpenAI text-embedding-3-small
Dims: 1,536 · Price: $0.02 / M tokens
The cheap workhorse; pennies to index a knowledge base.

OpenAI text-embedding-3-large
Dims: 3,072 · Price: $0.13 / M tokens
Higher quality, 6.5× the price of small.

Gemini Embedding 001
Dims: 3,072 · Price: API, hosted
Topped the multilingual MTEB leaderboard at launch.

EmbeddingGemma (local)
Dims: 768 → 128 · Price: Free, open weights
308M params, runs in under 200 MB of RAM. On your machine.

Prices from OpenAI; on-device specs from Google.

Almost every “smart” feature is embeddings underneath

Once you can measure distance between meanings, a surprising number of products turn out to be the same machine wearing different labels. Semantic search is “find the nearest documents to this query.” RAG (retrieval-augmented generation) is that search, with the results handed to a model to answer from. Recommendations are “find the nearest items to what you liked.”

Spotify’s recommendations literally work this way. The company built and open-sourced Annoy, a tool for finding a vector’s nearest neighbors among millions, and later replaced it with a faster one, Voyager. Every song and every listener is a point; the next track is a nearby point you haven’t played. The same trick deduplicates support tickets, groups thousands of reviews into themes, and routes a message to the right team — all by distance, no rules written by hand.
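
The "nearby point you haven't played" idea can be sketched in a few lines. This is not Spotify's pipeline: the track names and two-number vectors are invented, and averaging the played tracks to get a taste vector is one simple assumption among many possible ones.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def mean_vector(vectors):
    """Average a list of vectors, one dimension at a time."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend(track_vectors, played, k=1):
    """Rank unplayed tracks by closeness to the average of played ones."""
    taste = mean_vector([track_vectors[name] for name in played])
    unplayed = [name for name in track_vectors if name not in played]
    unplayed.sort(
        key=lambda name: cosine_similarity(taste, track_vectors[name]),
        reverse=True,
    )
    return unplayed[:k]

# Invented tracks and coordinates, for illustration only.
TRACKS = {
    "mellow_jazz_a": [0.9, 0.1],
    "mellow_jazz_b": [0.8, 0.2],
    "mellow_jazz_c": [0.85, 0.15],
    "loud_metal_a": [0.1, 0.9],
    "loud_metal_b": [0.2, 0.8],
}

picks = recommend(TRACKS, played={"mellow_jazz_a", "mellow_jazz_b"}, k=1)
print(picks)
```

A listener who played two mellow jazz tracks gets the third one back, with no genre tag consulted anywhere.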

A record spinning on a turntable, the needle tracking the groove. — Your “more like this” playlist is nearest-neighbor search: each track a point, the next pick the closest one you haven’t heard. Photo by János Venczák on Unsplash.

Six features, one trick underneath

Semantic search

Find passages that mean the same thing, even with no shared words.

RAG

Fetch the right documents for a model to answer from, by meaning.

Recommendations

Nearest neighbors to what you liked — songs, products, articles.

Deduplication

Two support tickets, different words, same problem — flagged as one.

Classification

Route a message by which labelled examples it sits closest to.

Clustering

Group thousands of reviews into themes nobody had to name first.

Every one is the same move: embed everything, then measure distance. Change what you embed and the same machinery serves a new feature.
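
One lookup function can serve two of these features at once. In this sketch the vectors are invented and already scaled to length 1, so a plain dot product equals cosine similarity; the team names, the ticket names, and the 0.95 duplicate threshold are all assumptions for illustration.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def nearest(query, stored):
    """Return (name, score) for the stored vector most similar to query."""
    return max(
        ((name, dot(query, vector)) for name, vector in stored.items()),
        key=lambda pair: pair[1],
    )

# Classification: one labelled example vector per team (invented).
TEAM_EXAMPLES = {"billing": [1.0, 0.0], "tech_support": [0.0, 1.0]}

def route(message_vector):
    """Send a message to the team whose example it sits closest to."""
    team, _score = nearest(message_vector, TEAM_EXAMPLES)
    return team

# Deduplication: same lookup, plus a similarity cutoff (an assumed value).
OPEN_TICKETS = {"ticket_1": [0.0, 1.0], "ticket_2": [0.8, 0.6]}

def is_duplicate(ticket_vector, existing, threshold=0.95):
    """Flag a ticket whose closest existing ticket clears the threshold."""
    _name, score = nearest(ticket_vector, existing)
    return score >= threshold

print(route([0.8, 0.6]))
print(is_duplicate([0.6, 0.8], OPEN_TICKETS))
print(is_duplicate([1.0, 0.0], OPEN_TICKETS))
```

Only the stored vectors and the decision applied to the score change between the two features; `nearest` stays the same.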

At true scale, systems don’t check every point; that’s too slow across billions of vectors. They use approximate nearest-neighbor search, with index structures (HNSW is the popular one) that find almost-certainly-the-closest matches while skipping most of the space. That’s the engine inside a vector database: store millions of embeddings, return the nearest to any query in milliseconds.
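
HNSW is too involved for a short example, so the sketch below swaps in a simpler approximate scheme, random-hyperplane hashing, to show the core idea of skipping most of the space. Vectors that fall on the same side of every random hyperplane share a bucket, and a query is compared only against its own bucket. The data is random and every name here is hypothetical.

```python
import random
from collections import defaultdict

def make_hyperplanes(num_planes, dimensions, seed=0):
    """Random directions; each one splits the space into two halves."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dimensions)]
            for _ in range(num_planes)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of it the vector falls on."""
    return tuple(
        sum(v * h for v, h in zip(vector, plane)) >= 0.0
        for plane in hyperplanes
    )

def build_index(vectors, hyperplanes):
    """Group vector names into buckets keyed by their signature."""
    buckets = defaultdict(list)
    for name, vector in vectors.items():
        buckets[signature(vector, hyperplanes)].append(name)
    return buckets

def candidates(query, buckets, hyperplanes):
    """Only names in the query's bucket get compared; the rest are skipped."""
    return buckets.get(signature(query, hyperplanes), [])

# 1,000 random 16-dimensional vectors standing in for stored embeddings.
data_rng = random.Random(1)
VECTORS = {
    f"doc_{i}": [data_rng.gauss(0.0, 1.0) for _ in range(16)]
    for i in range(1000)
}
PLANES = make_hyperplanes(num_planes=8, dimensions=16)
INDEX = build_index(VECTORS, PLANES)

# A query pointing the same way as doc_42 (twice as long) shares its bucket.
query = [2.0 * x for x in VECTORS["doc_42"]]
shortlist = candidates(query, INDEX, PLANES)
print(len(shortlist), "candidates out of", len(VECTORS))
```

The catch is the "approximate" part: a true neighbor that sits just across one hyperplane lands in a different bucket and is missed. Graph indexes such as HNSW exist to reduce those misses.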

Shorter vectors, nearly the same meaning

A 3,072-number vector for every paragraph in a large archive adds up fast — in storage, in memory, in the time each comparison takes. So a clever training trick has become standard: build the embedding so its most important information sits in the early numbers, and you can chop off the rest with little loss.

It’s called Matryoshka representation learning, after the nesting dolls. One model gives you a full-size vector, but you can keep just the first 768 or 256 numbers and still get most of the accuracy. In testing, a Matryoshka embedding truncated to 128 numbers often matches an ordinary one trained at 512 — a 4× cut in storage for roughly the same quality. OpenAI’s and Google’s newest models both expose this as a simple dimensions setting.

The practical upshot: you’re no longer forced to choose precision or thrift up front. Index at full size, serve at a smaller one, and dial the trade-off to the job. For most search and RAG setups, a shortened vector is indistinguishable from the full one and a fraction of the cost.
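
Shortening a vector on your side is a slice plus a rescale. This is a minimal sketch with a made-up four-number vector and a hypothetical helper name; it assumes the model was trained Matryoshka-style, because chopping an ordinary embedding this way can lose far more accuracy.

```python
import math

def truncate_embedding(vector, dimensions):
    """Keep the first `dimensions` numbers and rescale to length 1."""
    head = vector[:dimensions]
    length = math.sqrt(sum(x * x for x in head))
    if length == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / length for x in head]

full = [0.5, 0.5, 0.5, 0.5]          # made-up unit-length "full-size" vector
short = truncate_embedding(full, 2)  # keep the first half

print(short)
print(len(full) / len(short), "x less storage")
```

Rescaling matters because a truncated vector is no longer length 1, and similarity code that assumes unit-length vectors would otherwise return skewed scores.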

The most private embeddings run on your machine

For years, “use embeddings” meant “send your text to an API.” Indexing your own files — the contracts, the notes, the years of email — meant uploading exactly the material you have the strongest reasons to keep private. That trade-off is now optional.

Google’s EmbeddingGemma, released in September 2025, is a 308-million-parameter embedding model that runs in under 200 MB of RAM — small enough for a laptop or even a phone — while ranking as the top open multilingual embedder under 500M on MTEB. It produces vectors entirely on-device, so the documents you index never touch a server you don’t own.

A macro photograph of a dense black circuit board. — Indexing your files no longer needs a data center. A 308M-parameter embedder fits in 200 MB of RAM and runs where the documents already live. Photo by Alexandre Debiève on Unsplash.

That matters because the most valuable embeddings are over your stuff, not the public web. A local embedder plus a local model is a complete “search and answer over my files” system that works offline and leaks nothing, often the only architecture you’re allowed to ship in regulated work. It’s the bet CSuite is built on.

Strip away the math and an embedding is a simple idea with deep reach: give every piece of meaning an address, and “what’s related” turns into “what’s nearby.” The model didn’t learn definitions. It learned where things sit. Once you see that meaning has a map, half of what AI does — the searching, the recommending, the looking-things-up — stops looking like magic and starts looking like measuring distance.

Embeddings: quick answers

Is an embedding the same as a token?

No, though they’re cousins. A token is a chunk of text the model reads. An embedding is the list of numbers that captures what a token (or a sentence, or a whole document) means. Tokens are the input; embeddings are the meaning the model assigns them.

Do embeddings stop a model from hallucinating?

Not by themselves. Embeddings power the retrieval step in RAG, which finds real documents to ground an answer. That cuts made-up answers sharply, but only if the retrieval pulls the right passages. With bad embeddings, or bad chunks, the model is grounded in the wrong text.

Why are there so many numbers in one vector?

Each number is a learned direction of meaning — loosely, “how royal,” “how plural,” “how negative.” You need hundreds of them to separate millions of distinct ideas. 1,536 is a common size; some models use 3,072 for finer distinctions.

Can I make embeddings without sending data to a cloud?

Yes. Open models like EmbeddingGemma run in under 200 MB of RAM, so the text you index never leaves your machine. For private files — contracts, notes, medical records — that’s the whole point.
