If you’ve spent time exploring machine learning or AI recently, you’ve probably heard the term “embedding” — vector embedding, word embedding, image embedding… But what exactly are embeddings? And why are they everywhere?
In this post, we’ll break down what vector embeddings really are, why they’re useful, and how we, as humans, can make sense of them. Along the way, we’ll use facial recognition models as a concrete example — but embeddings are also critical in systems like reverse image search, recommendation engines, and large language models (LLMs) like GPT.
🙋‍♂️ You may consider enrolling in my top-rated machine learning course on Udemy

What Is a Vector Embedding?
In traditional machine learning, we often build classification models that are trained to recognize a fixed set of categories.
For example:
– In binary classification, we might ask: Is this a hot dog or not?
– In multiclass classification, we might extend that to: Is this a hot dog, pizza, burger, or pasta?
These models can be very powerful. But they have a major limitation: if you want to add a new category (e.g., taco), you need to retrain the model from scratch — often using a large dataset.
That’s where vector embeddings come in.
Instead of assigning inputs to rigid class labels, a model can output a vector embedding — a list of numbers that captures the essence or features of the input (whether it’s a food image, a face, a sentence, etc.). In effect, this gives you a classifier with an unlimited number of possible classes.
For instance, if you pass an image of a burger through a deep neural network, it might produce a vector like:
embedding = [0.21, -0.57, 0.89, ..., 0.04] # e.g., 512 dimensions
This vector doesn’t just say “burger” — it encodes visual and semantic features (like round shape, color, texture, etc.).
Here’s the powerful part: once you have embeddings, you don’t need to retrain the model to handle new examples. If you introduce a new food item (say taco), you can extract its embedding and compare it to other embeddings — using simple distance metrics.
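To make this concrete, here is a minimal sketch with toy 4-dimensional vectors standing in for real 512-dimensional embeddings (all values are made up for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    # higher means more similar; 1.0 means identical direction
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# toy embeddings (real ones would come from a model and have e.g. 512 dimensions)
food_db = {
    "burger": np.array([0.9, 0.1, 0.3, 0.0]),
    "pizza": np.array([0.2, 0.8, 0.1, 0.5]),
    "pasta": np.array([0.1, 0.9, 0.2, 0.7]),
}

# a brand-new item: no retraining, just compare its embedding to the database
taco = np.array([0.85, 0.15, 0.35, 0.05])
best = max(food_db, key=lambda name: cosine_similarity(taco, food_db[name]))
print(best)  # burger
```

Adding yet another food only means adding one more vector to the dictionary — the model itself never changes.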
So instead of retraining a classification model every time the world changes, you can use a pre-trained embedding model that generalizes to any kind of food, person, sentence, etc.
Where Are Embeddings Used?
Embeddings are used everywhere in modern AI. Here are a few real-world examples:
1. Facial Recognition
A deep learning model extracts embeddings from face images. These embeddings represent unique facial features in a numeric form. When two embeddings are similar, the system considers the faces to belong to the same person.
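A sketch of that verification logic, using made-up embedding values and an arbitrary threshold (real systems tune the threshold per model):

```python
import numpy as np

# hypothetical pre-computed face embeddings (real ones come from a model like FaceNet)
emb_a = np.array([0.21, -0.57, 0.89, 0.04])
emb_b = np.array([0.19, -0.55, 0.91, 0.02])   # same person, slightly different photo
emb_c = np.array([-0.80, 0.33, -0.10, 0.95])  # a different person

def is_same_person(e1, e2, threshold=0.6):
    # the faces match if the Euclidean distance is below the tuned threshold
    return np.linalg.norm(e1 - e2) < threshold

print(is_same_person(emb_a, emb_b))  # True
print(is_same_person(emb_a, emb_c))  # False
```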
2. Reverse Image Search
When you upload an image to search for similar ones (e.g., Google Lens), the system converts your image into an embedding. It then compares that vector to millions of others to find visually similar results.
3. Large Language Models (LLMs)
LLMs like GPT use embeddings to represent words, sentences, and even entire documents. For example, the phrase “I’m feeling great today!” is turned into a vector that captures its sentiment and meaning — making it comparable to similar phrases like “I’m doing well.”
Why Are Embeddings Hard to Understand?
Here’s the tricky part: embeddings often live in high-dimensional spaces — with hundreds or even thousands of dimensions.
Humans naturally think in 2D and 3D (or 4D when you include time). We understand maps, graphs, and physical space. But try to imagine 512 dimensions — it’s impossible. Our brains aren’t wired for that.
Yet, for a computer, operating in high-dimensional spaces is totally normal. It can easily calculate similarities, distances, and clusters in those spaces using linear algebra.
So while embeddings are intuitive in what they represent, they’re not directly interpretable or visualizable — unless we simplify them.
Let’s Make Them Understandable
To help us see embeddings, we can use dimensionality reduction techniques like PCA (Principal Component Analysis). PCA takes high-dimensional vectors and projects them down to lower dimensional space (2D in our experiment) — preserving the most important relationships between them.
When we apply PCA to a set of embeddings, we can plot them on a graph. Here’s what typically happens:
- Similar items form tight clusters.
- Different items spread out across the graph.
This gives us a visual intuition for how the model understands data.
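We can sketch this with synthetic data. Conceptually, PCA centers the vectors and projects them onto the top principal directions — here via numpy’s SVD, with two artificial clusters in a 512-dimensional space:

```python
import numpy as np

rng = np.random.default_rng(42)

# synthetic "embeddings": two clusters in 512-dimensional space
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(20, 512))
cluster_b = rng.normal(loc=1.0, scale=0.1, size=(20, 512))
X = np.vstack([cluster_a, cluster_b])

# PCA via SVD: center the data, then project onto the top-2 principal directions
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_2d = X_centered @ Vt[:2].T

print(X_2d.shape)  # (40, 2)
```

After the projection, the first principal component already separates the two clusters, which is exactly the behavior we rely on when visualizing embeddings.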
By the way, PCA doesn’t have to reduce the dimensionality all the way to 2D or 3D. For instance, VGG-Face represents facial images as 4096-dimensional vectors, and the VGG researchers then applied a dimensionality reduction method (possibly PCA) to reduce them to 1024 dimensions before verification.
Example: Face Embeddings in 2D
Let’s take facial recognition as an example.
Say we extract 512-dimensional embeddings for multiple face images using the FaceNet model. Then we apply PCA to reduce the vectors to 2D.
When we plot them:
- All embeddings from the same person appear as a tight cluster.
- Embeddings from different people are distant from each other.
This confirms the key idea: similar data → similar vectors → close points. And this pattern holds whether you’re dealing with faces, texts, products, or users.
Preparing Embeddings
I will use the unit test images of the DeepFace library. Normally, images don’t come with identity labels, but I labeled them for you.
database = {
    "angelina_jolie": [
        "dataset/img1.jpg",
        "dataset/img2.jpg",
        "dataset/img4.jpg",
        "dataset/img5.jpg",
        "dataset/img6.jpg",
        "dataset/img7.jpg",
        "dataset/img10.jpg",
        "dataset/img11.jpg",
    ],
    "jennifer_aniston": [
        "dataset/img3.jpg",
        "dataset/img12.jpg",
        "dataset/img53.jpg",
        "dataset/img54.jpg",
        "dataset/img55.jpg",
        "dataset/img56.jpg",
    ],
    "scarlett_johansson": [
        "dataset/img9.jpg",
        "dataset/img47.jpg",
        "dataset/img48.jpg",
        "dataset/img49.jpg",
        "dataset/img50.jpg",
        "dataset/img51.jpg",
    ],
    "mark_zuckerberg": [
        "dataset/img13.jpg",
        "dataset/img14.jpg",
        "dataset/img15.jpg",
        "dataset/img57.jpg",
        "dataset/img58.jpg",
    ],
    "jack_dorsey": [
        "dataset/img16.jpg",
        "dataset/img17.jpg",
        "dataset/img59.jpg",
        "dataset/img61.jpg",
        "dataset/img62.jpg",
    ],
    "elon_musk": [
        "dataset/img18.jpg",
        "dataset/img19.jpg",
        "dataset/img67.jpg",
    ],
    "marissa_mayer": [
        "dataset/img22.jpg",
        "dataset/img23.jpg",
    ],
    "sundar_pichai": [
        "dataset/img24.jpg",
        "dataset/img25.jpg",
    ],
    "katty_perry": [
        "dataset/img26.jpg",
        "dataset/img27.jpg",
        "dataset/img28.jpg",
        "dataset/img42.jpg",
        "dataset/img43.jpg",
        "dataset/img44.jpg",
        "dataset/img45.jpg",
        "dataset/img46.jpg",
    ],
    "matt_damon": [
        "dataset/img29.jpg",
        "dataset/img30.jpg",
        "dataset/img31.jpg",
        "dataset/img32.jpg",
        "dataset/img33.jpg",
    ],
    "leonardo_dicaprio": [
        "dataset/img34.jpg",
        "dataset/img35.jpg",
        "dataset/img36.jpg",
        "dataset/img37.jpg",
    ],
    "george_clooney": [
        "dataset/img38.jpg",
        "dataset/img39.jpg",
        "dataset/img40.jpg",
        "dataset/img41.jpg",
    ],
}
Then, we will use DeepFace to represent those images as vector embeddings.
from deepface import DeepFace
from tqdm import tqdm

model_name = "Facenet512"

# store each identity's many embeddings in the vector_database dict
vector_database = {}
for identity, images in tqdm(database.items()):
    target_embeddings = []
    for image in images:
        emb = DeepFace.represent(
            img_path=image,
            model_name=model_name,
            detector_backend="mtcnn",
        )[0]["embedding"]
        target_embeddings.append(emb)
    vector_database[identity] = target_embeddings

# flatten all identities' embeddings into a single target_embeddings list,
# with the corresponding identity for each embedding in target_identities
target_identities = []
target_embeddings = []
for identity, embeddings in vector_database.items():
    for embedding in embeddings:
        target_embeddings.append(embedding)
        target_identities.append(identity)

# store the source file name (without extension) for each embedding
image_sources = []
for identity, images in tqdm(database.items()):
    for image in images:
        image_sources.append(image.split("/")[-1].split(".")[0])
PCA
Once we feed all embeddings into the PCA model, it gives us x and y coordinates for each embedding.
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
vectors_2d = pca.fit_transform(target_embeddings)
x, y = zip(*vectors_2d)
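As a sanity check, PCA’s explained_variance_ratio_ tells us how much of the original variance the 2D projection preserves. A self-contained sketch on synthetic stand-in vectors (the demo_ variable names are hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA

# synthetic stand-in for 512-dimensional face embeddings
rng = np.random.default_rng(0)
demo_embeddings = rng.normal(size=(60, 512))

demo_pca = PCA(n_components=2)
demo_2d = demo_pca.fit_transform(demo_embeddings)

# fraction of the total variance that survives the projection to 2D
print(demo_pca.explained_variance_ratio_.sum())
```

On real embeddings this fraction is often much higher than on pure noise, because the model packs most of the meaningful variation into a few directions.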
Visualizing
Let’s plot each embedding in 2D space, using a different color for each identity. That way, we will be able to see the clusters.
import matplotlib.pyplot as plt

# assign a distinct color to each identity (tab20 provides 20 distinct colors)
colors = plt.cm.tab20.colors

printed_labels = set()
plt.figure(figsize=(8, 6))
for i, (x, y) in enumerate(vectors_2d):
    target_identity = target_identities[i]
    # find the index of this identity in the database to pick its color
    target_idx = None
    for idx, (identity, _) in enumerate(database.items()):
        if target_identity == identity:
            target_idx = idx
    plt.scatter(x, y, color=colors[target_idx])
    # plt.text(x + 0.02, y + 0.02, image_sources[i], fontsize=12, color=colors[target_idx])
    if target_identity not in printed_labels:
        plt.text(x + 0.02, y + 0.02, target_identity, fontsize=12, color=colors[target_idx])
        printed_labels.add(target_identity)
plt.title(f"PCA-reduced 2D representations for {model_name}")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.grid(True)
plt.show()
When we run this experiment on the same dataset with different models, we see obvious clusters.
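One way to quantify “obvious clusters” is to compare within-identity and between-identity distances; a toy sketch with made-up 2D points:

```python
import numpy as np

def mean_pairwise_distance(points):
    # average Euclidean distance over all distinct pairs within one cluster
    dists = [np.linalg.norm(p - q) for i, p in enumerate(points) for q in points[i + 1:]]
    return float(np.mean(dists))

# toy 2D points imitating two identity clusters after PCA
person_a = np.array([[0.0, 0.1], [0.1, 0.0], [0.05, 0.05]])
person_b = np.array([[3.0, 3.1], [3.1, 3.0], [2.95, 3.05]])

intra = (mean_pairwise_distance(person_a) + mean_pairwise_distance(person_b)) / 2
inter = float(np.mean([np.linalg.norm(p - q) for p in person_a for q in person_b]))

# well-separated clusters: within-person distances are much smaller than between-person
print(intra < inter)  # True
```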
TL;DR — Key Takeaways
Vector embeddings are numeric representations of complex data like images or text. They allow computers to compare and reason about data using simple math. Embeddings are used in facial recognition, reverse image search, LLMs, recommendation systems, and more.
Although embeddings live in high-dimensional spaces, we can project them into 2D or 3D using PCA or t-SNE to visualize their structure. Similar embeddings should form clusters — making it easier to interpret how models “see” the world.
Final Thoughts
Embeddings are one of the most elegant ideas in AI — turning messy, unstructured data into structured, comparable vectors. Understanding how they work — and how to visualize them — is a big step toward understanding modern machine learning.
I pushed the source code of this experiment to GitHub. You can support this work by starring the repo.
If you liked this post, you can also support this blog financially!