What is an embedding? How AI turns meaning into numbers
Spotify hands you thirty unheard songs and a third are keepers. The trick: it turns meaning into coordinates, then measures the distance.
Every Monday, Spotify hands you a playlist of thirty songs you’ve never heard, and somehow a third of them are keepers. No human picked them for you. The app didn’t match the genre tag or the artist name. It did something stranger: it turned your taste into a set of coordinates, turned millions of songs into coordinates too, and handed you the closest ones.
That move — turning meaning into numbers you can measure distance between — is the single most useful trick in modern AI, and almost nobody outside the field has a name for it. The name is embedding. It quietly powers search, recommendations, the “chat with your documents” feature you’ve tried, spam filters, and the part of every AI assistant that decides what’s relevant.
An embedding turns a piece of text — a word, a sentence, a whole file — into a list of numbers that captures its meaning, so that things which mean similar things end up close together. Once meaning is a location, “find related” becomes “find nearby,” and a computer can do that in milliseconds. Here’s how it works, and why it’s under so much of what you use.
An embedding is a location, not a definition
Start with a map. Every town on a map is a pair of numbers — latitude and longitude. You don’t need a description of a town to know it’s near another; you just compare coordinates. Towns that are close on the map tend to share weather, accents, sports teams. The position carries information.
An embedding does the same thing for meaning. An embedding model reads a chunk of text and outputs its coordinates: not two numbers, but hundreds or thousands of them. Each is a position along some learned axis of meaning. You’ll never know exactly what axis number 419 stands for, but together the numbers place “dog” near “puppy” and far from “invoice,” with “wolf” somewhere in between.
Nobody hand-writes those positions. The model learns them by reading an enormous amount of text and noticing which words keep similar company. “Coffee” and “tea” show up in the same kinds of sentences, so they drift together. The output is a vector — the technical word for an ordered list of numbers — and that vector is the embedding.

Closeness in that space means closeness in meaning
Here’s the payoff that makes the whole idea worth the trouble: once every text is a point, “how similar are these two things?” becomes “how far apart are these two points?” That’s a question arithmetic can answer instantly, at any scale.
The usual measure is cosine similarity, which compares the direction two vectors point and ignores how long they are. Point the same way, score near 1, nearly identical meaning. Point at right angles, score near 0, unrelated. The arithmetic is nothing more than multiplying matching numbers and adding up the results, so a machine can rank a query against millions of stored vectors and return the closest in a blink.
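To make the measure concrete, here is a minimal sketch of cosine similarity in plain Python. The three-number vectors are invented for illustration; real embeddings are far longer, and production systems would normally use an optimized numeric library rather than a loop like this.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors, made up for this example.
# The second vector is twice as long as the first but points the same way.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# These two point along different axes, so they sit at right angles.
right_angle = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])

print(round(same_direction, 6), round(right_angle, 6))
```

The first score comes out at 1 (up to floating-point rounding) even though the vectors differ in length, which is the "ignores how long they are" part. The second comes out at 0.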
This is why search by meaning beats search by keyword. Type “how do I get my money back” into an old keyword system and it hunts for the words “money” and “back.” A document that says “refunds are processed within 14 days” shares none of those words, so the keyword system misses it. Embed both, though, and the question lands right next to the answer, because they mean the same thing. No shared vocabulary required.
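The keyword half of that comparison is easy to check for yourself. The sketch below uses a deliberately naive word matcher (the `shared_words` helper is hypothetical, not any real search engine's logic) to show that the question and the answer have no vocabulary in common. The embedding half would need a trained model, so it is not shown here.

```python
import re

def shared_words(query, document):
    """Return the set of words that appear in both texts, ignoring case."""
    def tokenize(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))
    return tokenize(query) & tokenize(document)

query = "how do I get my money back"
document = "refunds are processed within 14 days"

overlap = shared_words(query, document)
print(overlap)  # an empty set: literal word matching has nothing to go on
```

Real keyword engines add stemming, synonyms, and ranking, which can narrow this gap but only for the substitutions someone thought to configure.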
You can do arithmetic on meaning
The result that first made researchers sit up came in 2013, from a team led by Tomáš Mikolov at Google. Their system, word2vec, produced embeddings so well-structured you could do algebra on them. Take the vector for “king,” subtract “man,” add “woman,” and the nearest point in the whole space is “queen.”
Sit with that. The model was never told what a king is, or that gender exists. It just read text. Yet the direction you travel to get from “man” to “woman” turned out to be roughly the same direction that gets you from “king” to “queen,” and from “uncle” to “aunt.” Relationships became geometry. Mikolov and colleagues documented the regularity in a companion paper on linguistic regularities, and the discovery that meaning had shape was the moment embeddings stopped being a curiosity.
The cleaner versions of this trick are a little stage-managed — the original words are excluded from the answer, or the result lands close but not exactly on “queen.” The point isn’t that the algebra is perfect. It’s that meaning, encoded as position, has enough structure that a straight line through the space means something. Today’s sentence and document embeddings inherit that structure at a far larger scale.
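The mechanics of the analogy trick fit in a few lines. This is a toy sketch, not word2vec: the two-number vectors below are hand-picked so the example works (think of the axes loosely as "royalty" and "gender"), whereas real word vectors are learned and have hundreds of dimensions. The `analogy` helper is a hypothetical name, and it excludes the input words from the candidates, which is the stage-managing mentioned above.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c, vectors):
    """Return the word nearest to vectors[a] - vectors[b] + vectors[c]."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three input words, as analogy evaluations commonly do.
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine_similarity(target, candidates[w]))

# Invented vectors for illustration only.
toy_vectors = {
    "king":  [0.95, 0.90],
    "queen": [1.00, -0.90],
    "man":   [0.05, 1.00],
    "woman": [0.10, -0.95],
    "uncle": [0.20, 1.00],
    "aunt":  [0.20, -1.00],
}

print(analogy("king", "man", "woman", toy_vectors))  # queen
```

The computed point lands near "queen" rather than exactly on it, so the answer is whichever stored word is closest, not an exact match.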
Modern embeddings pack meaning into ~1,500 numbers
Word2vec embedded single words. The models you’d use now embed whole sentences and documents, and they’re a commodity. OpenAI’s text-embedding-3-small turns any passage into 1,536 numbers for about $0.02 per million tokens — pennies to index an entire knowledge base. Its larger sibling uses 3,072 numbers for finer distinctions, at $0.13 per million.
Google’s Gemini Embedding 001 also produces 3,072-dimensional vectors and topped the multilingual Massive Text Embedding Benchmark — MTEB, the standard scoreboard — at launch. The number of dimensions is the dial that matters: more numbers can capture finer shades of meaning, but every number is storage you pay for and time you spend comparing. Most teams don’t need the biggest one.
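A back-of-envelope calculation shows what the dimension dial costs. The sketch below assumes each number is stored as a 32-bit float (4 bytes) and uses a hypothetical corpus of one million passages at about 500 tokens each; both figures are assumptions for illustration. The prices and dimensions are the OpenAI ones quoted above.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each number is stored as a 32-bit float

def index_size_gb(num_vectors, dimensions, bytes_per_number=BYTES_PER_FLOAT32):
    """Raw vector storage in decimal gigabytes, ignoring any index overhead."""
    return num_vectors * dimensions * bytes_per_number / 1e9

def embedding_cost_usd(total_tokens, price_per_million_tokens):
    """One-time cost to embed a corpus at a per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million_tokens

# Hypothetical corpus: one million passages of about 500 tokens each.
passages = 1_000_000
tokens = passages * 500

for name, dims, price in [("small", 1536, 0.02), ("large", 3072, 0.13)]:
    size = index_size_gb(passages, dims)
    cost = embedding_cost_usd(tokens, price)
    print(f"{name}: {size:.2f} GB of vectors, ${cost:.2f} to embed")
```

Under those assumptions the smaller model needs about 6 GB and $10, the larger about 12 GB and $65. The one-time embedding bill is small either way; the storage and comparison time are what you keep paying for.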
Almost every “smart” feature is embeddings underneath
Once you can measure distance between meanings, a surprising number of products turn out to be the same machine wearing different labels. Semantic search is “find the nearest documents to this query.” RAG (retrieval-augmented generation) is that search, with the results handed to a model to answer from. Recommendations are “find the nearest items to what you liked.”
Spotify’s recommendations literally work this way. The company built and open-sourced Annoy, a tool for finding a vector’s nearest neighbors among millions, and later replaced it with a faster one, Voyager. Every song and every listener is a point; the next track is a nearby point you haven’t played. The same trick deduplicates support tickets, groups thousands of reviews into themes, and routes a message to the right team — all by distance, no rules written by hand.
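In its simplest form, "the next track is a nearby point you haven't played" is a sort by similarity. The sketch below is an exhaustive search over invented two-number vectors; it is not Spotify's code and does not use Annoy or Voyager, and the `recommend` helper and song names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recommend(listener, songs, played, k=2):
    """Return the k unplayed songs whose vectors are closest to the listener's."""
    unplayed = [name for name in songs if name not in played]
    # Check every unplayed song and rank by similarity, highest first.
    unplayed.sort(key=lambda name: cosine_similarity(listener, songs[name]), reverse=True)
    return unplayed[:k]

# Invented vectors for illustration only.
songs = {
    "song_a": [0.9, 0.1],
    "song_b": [0.8, 0.3],
    "song_c": [0.1, 0.9],
    "song_d": [0.7, 0.2],
}
listener = [1.0, 0.2]

print(recommend(listener, songs, played={"song_a"}))  # ['song_d', 'song_b']
```

Swap songs for support tickets or reviews and the same function does deduplication or routing. The only thing that changes is what the points represent.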

At true scale, systems don’t check every point, because that’s too slow across billions of vectors. They use approximate nearest-neighbor search, with index structures (HNSW is the popular one) that find matches that are almost certainly the closest while skipping most of the space. That’s the engine inside a vector database: store millions of embeddings, return the nearest to any query in milliseconds.
Shorter vectors, nearly the same meaning
A 3,072-number vector for every paragraph in a large archive adds up fast — in storage, in memory, in the time each comparison takes. So a clever training trick has become standard: build the embedding so its most important information sits in the early numbers, and you can chop off the rest with little loss.
It’s called Matryoshka representation learning, after the nesting dolls. One model gives you a full-size vector, but you can keep just the first 768 or 256 numbers and still get most of the accuracy. In testing, a Matryoshka embedding truncated to 128 numbers often matches an ordinary one trained at 512 — a 4× cut in storage for roughly the same quality. OpenAI’s and Google’s newest models both expose this as a simple dimensions setting.
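Mechanically, shortening a vector is just slicing it and rescaling what remains to unit length, so that similarity scores stay comparable. Here is a minimal sketch; the `truncate_embedding` helper is a hypothetical name, and the input values are invented rather than real model output. Truncation only preserves quality when the model was trained Matryoshka-style, which this toy example cannot demonstrate.

```python
import math

def truncate_embedding(vector, dimensions):
    """Keep the first `dimensions` numbers and rescale to unit length."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    head = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]

# Stand-in for a 512-number embedding; the values are invented, not model output.
full = [1.0 / (i + 1) for i in range(512)]
short = truncate_embedding(full, 128)

print(len(full) // len(short))  # 4: a quarter of the storage per vector
```

When a hosted model offers a dimensions setting, the provider does the equivalent step for you, so a helper like this is mainly useful for vectors you have already stored at full size.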
The practical upshot: you’re no longer forced to choose precision or thrift up front. Index at full size, serve at a smaller one, and dial the trade-off to the job. For most search and RAG setups, a shortened vector is indistinguishable from the full one and a fraction of the cost.
The most private embeddings run on your machine
For years, “use embeddings” meant “send your text to an API.” Indexing your own files — the contracts, the notes, the years of email — meant uploading exactly the material you have the strongest reasons to keep private. That trade-off is now optional.
Google’s EmbeddingGemma, released in September 2025, is a 308-million-parameter embedding model that runs in under 200 MB of RAM — small enough for a laptop or even a phone — while ranking as the top open multilingual embedder under 500M on MTEB. It produces vectors entirely on-device, so the documents you index never touch a server you don’t own.

That matters because the most valuable embeddings are over your stuff, not the public web. A local embedder plus a local model is a complete “search and answer over my files” system that works offline and leaks nothing, which is often the only architecture you’re allowed to ship in regulated work. It’s the bet CSuite is built on.
Strip away the math and an embedding is a simple idea with deep reach: give every piece of meaning an address, and “what’s related” turns into “what’s nearby.” The model didn’t learn definitions. It learned where things sit. Once you see that meaning has a map, half of what AI does — the searching, the recommending, the looking-things-up — stops looking like magic and starts looking like measuring distance.


