Beyond keywords: Semantic search and vector embeddings for AI data retrieval

Learn and explore why keyword search isn’t enough for AI optimization

Traditional keyword search works by matching exact terms. For simple lookups, keyword matching can suffice. When you ask a question, however, you want an answer. Not just an answer, but an answer that makes sense. An answer with context. An answer with meaning.

Think of the following search query.

Why is the sky blue?

Keyword matching

When using keyword matching, the algorithm ranks results largely by keyword density: how often the query's terms appear in the text. To a human, the snippet below makes no sense, yet a pure keyword-matching algorithm would rank it highly.

blue blue blue. The sky is blue. The sky is blue because blue blue blue. Why do you think the sky is blue? The sky is blue because donuts! Blue blue sky blue.

In this guide, we'll tell you why the sky is blue. Before we tell you why the sky is blue, we need to answer a couple of basic questions.

- What is the sky?
- What does blue mean?
- Why are colors as we see them just an illusion?
- How do you code the blue sky?

...

The snippet you just read was a complete waste of your time. You’ll never get it back. However, this nonsense is packed with keywords like why, sky and blue. Keyword matching is useful but primitive. It’s easy to trick the algorithm into ranking garbage content higher. Over time, Search Engine Optimization (SEO) content shifted hard into keyword packing.
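To make this concrete, here is a minimal sketch of a naive keyword-density scorer. The function name and scoring rule are illustrative, not taken from any real search engine, but they capture the core flaw: counting term occurrences rewards the garbage snippet over a genuine answer.

```python
def keyword_score(text: str, query: str) -> int:
    """Naive density scoring: count how often any query term appears in the text."""
    terms = {t.strip(".,!?") for t in query.lower().split()}
    words = text.lower().split()
    return sum(1 for w in words if w.strip(".,!?") in terms)

query = "Why is the sky blue?"
garbage = ("blue blue blue. The sky is blue. The sky is blue because blue blue blue. "
           "Why do you think the sky is blue? The sky is blue because donuts!")
real_answer = "Sunlight scatters in the atmosphere, and shorter wavelengths scatter most."

print(keyword_score(garbage, query))      # high score
print(keyword_score(real_answer, query))  # much lower score
```

Run this and the keyword-stuffed nonsense wins by a wide margin, which is exactly the exploit SEO keyword packing relies on.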

Garbage content loaded with keywords can outrank the actual answer. This has contributed to a real decline in writing quality across the web. For over a decade, writers have been told that keyword counts matter more than writing good content.

Keyword packing is often about tricking the algorithm. By design, keyword search is flawed. The SEO writer’s primary duty is often to exploit these flaws. Sadly, good writing becomes a secondary concern.

Semantic search takes a completely different route for matching. The algorithm understands the meaning of the query. With a bit of abstraction, we can break down what actually happens.

  • The question mark signals that the user is asking a question.
  • Why is the sky blue:
    • The user already knows that the sky is blue.
    • The user knows what blue is and what the sky is.
    • The user wants to understand the relationship between the sky and blue.
  • The model searches through data that highlights this relationship until it finds an answer with context.

Then, you receive a result similar to the snippet below.

The sky is blue because of how sunlight scatters in Earth's atmosphere. Blue light has a shorter wavelength and is scattered more by air molecules than other colors, causing it to be dispersed throughout the sky.

Semantic search is used to find a real answer. It can highlight relationships between two objects like sky and blue. Our semantic answer has low keyword density — but density doesn’t matter here, context does. The user gets a clear answer with no fluff or mention of donuts.

What are embeddings and how do they work?

At the core of semantic search is something called an embedding. The search query is converted into a vector (a list of numbers) so the machine can compare it to other vectors by meaning, not by words. Think of our question from earlier.

Why is the sky blue?

Below, our text is split into a list of tokens. It looks more programmatic, but it's still an abstraction; real embeddings are numeric.

["why", "is", "the", "sky", "blue"]

The vector below is still an example, but it is closer to what AI models actually work with. The text is represented as a list of floating-point numbers, and together these values convey the meaning to the model.

[0.134, -0.982, 0.512, 0.088, ..., 0.274]

Whenever you talk to an AI model, your text gets converted into embeddings like the one above. If models operated on plaintext, they would have to encode and decode everything constantly, which is unsustainable at scale. Loosely speaking, humans have a similar bottleneck: the brain's fast electrical signaling between neurons outpaces slow spoken language, which is part of why thinking feels fast and speaking feels slow.

These embeddings allow the AI model to “think” — metaphorically speaking — in its optimized native format. When we store a ton of these embeddings in a searchable location, it’s called a vector database.
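How does a machine compare two embeddings "by meaning"? A common measure is cosine similarity: vectors pointing in nearly the same direction score close to 1.0. The sketch below uses tiny, hand-made 3-dimensional vectors purely for illustration; real models output hundreds of dimensions, and the names are invented.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 means same direction (similar meaning), near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (real models output hundreds of dimensions)
sky = [0.9, 0.1, 0.2]
azure = [0.85, 0.15, 0.25]  # close in meaning to "sky"
donut = [0.05, 0.9, 0.4]    # unrelated

print(cosine_similarity(sky, azure))  # close to 1.0
print(cosine_similarity(sky, donut))  # much lower
```

The exact numbers are arbitrary; the point is that semantically related texts end up as nearby vectors, and a single arithmetic comparison reveals that closeness.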

Storing embeddings in vector databases

Once you’ve embedded your data (scraped articles, product descriptions, customer reviews, etc.), you need to make it searchable for the machine. Traditional SQL and NoSQL databases are great for exact matching but poorly suited to semantic search.

If you’re new to computer science, think of a vector as just a long list of numbers. A vector database stores many such lists, each attached to a record with multiple data fields. The AI model can compare our query vector against every stored record to find the best matches.

{
  "id": "doc_001",
  "vector": [0.134, -0.982, 0.512, 0.088, ..., 0.274],
  "metadata": {
    "title": "Why is the sky blue?",
    "source": "weather-guide.com"
  }
}

The database contains a list of JSON-like objects. Each object contains a field holding its own embedded vector. The database compares the query vector against this field to find semantically similar records.
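The record structure above can be sketched as a tiny in-memory vector store. This is a toy, not a real database: `TinyVectorStore` and its two-dimensional vectors are invented for illustration, and the search is a brute-force L2 (Euclidean) scan of every record.

```python
import math

class TinyVectorStore:
    """A minimal in-memory vector store: brute-force nearest-neighbor search by L2 distance."""

    def __init__(self):
        self.records = []

    def add(self, doc_id, vector, metadata):
        self.records.append({"id": doc_id, "vector": vector, "metadata": metadata})

    def search(self, query_vector, k=1):
        def l2(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        # Rank every record by distance to the query; closest first
        ranked = sorted(self.records, key=lambda r: l2(r["vector"], query_vector))
        return ranked[:k]

store = TinyVectorStore()
store.add("doc_001", [0.9, 0.1], {"title": "Why is the sky blue?"})
store.add("doc_002", [0.1, 0.9], {"title": "What is a donut?"})

top = store.search([0.85, 0.2], k=1)
print(top[0]["id"])  # doc_001
```

Real vector databases do the same thing conceptually, just with far larger vectors, smarter indexing, and persistence.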

Implementing semantic search: Step-by-step

Now that we know how embeddings and vector databases work conceptually, let’s build a basic implementation. We need two main features — semantic search and a vector database. We’ll use Sentence Transformers to embed our data and Faiss to search it.

Before we get started, you’ll need to install some dependencies via pip.

Install Sentence Transformers

pip install sentence-transformers

Install Faiss

pip install faiss-cpu

1. Generate your embeddings

We can use the Sentence Transformers library to generate embeddings from text data. In the code below, we make a series of statements and stick them inside an array called docs. We then use model.encode() to encode each of our documents.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "The sky is blue because of how sunlight scatters.",
    "Dogs are domesticated mammals.",
    "The capital of France is Paris."
]

doc_embeddings = model.encode(docs)

2. Store embeddings in a Faiss index

The Faiss index allows us to turn the embeddings into a searchable structure. faiss.IndexFlatL2() creates a flat index that uses exact L2 (Euclidean) distance. doc_embeddings.shape[1] specifies the dimensionality of the vectors, as Faiss requires. We then use index.add() to add our embeddings to the index.

import faiss
import numpy as np

index = faiss.IndexFlatL2(doc_embeddings.shape[1])
index.add(np.array(doc_embeddings))

3. Embed the query

Now that we’ve got our searchable index, we need to create a query. In the snippet below, we create our query and encode it.

query = "Why is the sky blue?"
query_vector = model.encode([query])

4. Search for the top k most relevant results

In our final snippet, we actually search for the top k matches to the query. To keep things simple, we set k = 1 and print only the best match.

k = 1
distances, indices = index.search(np.array(query_vector), k)
print(docs[indices[0][0]])

Putting it all together

Below is a full script — composed of the snippets we went through above. Feel free to run the code yourself. Just make sure you ran the pip install commands from the beginning of this section.

from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# Load the embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

# Sample documents to search; just three, to keep it simple
docs = [
    "The sky is blue because of how sunlight scatters.",
    "Dogs are domesticated mammals.",
    "The capital of France is Paris."
]

# Generate embeddings
doc_embeddings = model.encode(docs)

# Create a searchable Faiss index
index = faiss.IndexFlatL2(doc_embeddings.shape[1])
index.add(np.array(doc_embeddings))

# Create a query and encode it
query = "Why is the sky blue?"
query_vector = model.encode([query])

# Search for the top-k closest documents
k = 1
distances, indices = index.search(np.array(query_vector), k)

# Print the result to the console
print("Query:", query)
print("Top match:", docs[indices[0][0]])

Depending on your installation, you might also see a few warning or debug messages in the console, but pay attention to the last few lines of the output.

Query: Why is the sky blue?
Top match: The sky is blue because of how sunlight scatters.

Even with a simple script like this, our model is able to actually perform the search and find the relevant data.

Integrating semantic search with LLMs (RAG)

When you combine semantic search with a Large Language Model (LLM), you can create a Retrieval-Augmented Generation (RAG) pipeline. The user asks the model a question it can’t answer from its training data alone. The model then uses semantic search to retrieve the relevant information before producing its output.

Workflow of a user query with a semantic search powered LLM

Without external data access, an LLM will do its best to give an answer — even if it has no clue what it’s talking about. When an LLM makes up an answer, this is called hallucination.

If you ask an LLM something it doesn’t know, its first instinct is to hallucinate an answer.

Why is the sky blue?

A model that doesn’t know the answer might say the sky is blue because... and proceed to cite a fictional scientific study justifying the color of the sky. A poorly trained model might even cite the keyword-filled garbage from the beginning of this article: “The sky is blue because donuts!”

When we combine our AI model with the system created earlier, the model retrieves the proper information ("The sky is blue because of how sunlight scatters.") and then generates output based on that external information, such as "The sky is blue due to the scattering of the sun's light."

RAG connects an LLM to external data sources. When combined with semantic search, an LLM can make sense of external data before responding to the user.
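The RAG flow can be sketched in a few lines. Everything here is a stand-in: `retrieve` represents the Faiss search built earlier, and `generate_answer` is a hypothetical placeholder for a real LLM API call, hard-coded so the sketch runs on its own.

```python
def retrieve(query):
    """Stand-in for the semantic search step (embed the query, search the vector index)."""
    # Hard-coded here; a real pipeline would return the top Faiss match
    return "The sky is blue because of how sunlight scatters."

def generate_answer(prompt):
    """Hypothetical stand-in for an LLM call; a real pipeline would call a model API."""
    return f"Answer based on: {prompt}"

def rag_pipeline(query):
    # 1. Retrieve grounding context via semantic search
    context = retrieve(query)
    # 2. Put the retrieved context into the prompt so the model answers from it
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )
    # 3. Generate the final answer
    return generate_answer(prompt)

print(rag_pipeline("Why is the sky blue?"))
```

The key design point is step 2: the model is handed retrieved facts inside its prompt, so its answer is grounded in external data instead of hallucinated.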

Use Cases: E-commerce, healthcare, customer support and research

Semantic search is not a niche tool; it is already spreading through industries that once relied on keyword matching.

  • E-commerce: Customers often don’t search for the exact product name. When you search for a laptop on Amazon, you’re more likely to type laptop than HP OmniBook X 14.
  • Healthcare: Between reports and submission forms, patients and providers generate vast amounts of text. Imagine a doctor using the following query, “What are the effects of this medication on patients using insulin and blood thinners?” A keyword search would run into major problems here, missing relevant records phrased differently and leaving the doctor with incomplete or scattered results.
  • Customer Support: Agents field repetitive questions phrased in countless ways, “Yes, I paid my bill” or “No, the machine isn’t turned on.” Semantic search lets agents quickly surface relevant past tickets and verify individual customer information.
  • Research and Knowledge Work: Academics often deal with vast archives where even automated keyword matching is still very inefficient. Semantic search is replacing keyword search the way keyword search replaced physical lookups in a library.

Wherever contextual understanding improves search quality, semantic search is overtaking keyword search, and this trend is likely to continue long term.

Scaling and optimizing for performance

Semantic search is powerful, but like all software, performance becomes a bottleneck as a system grows. Searching through 100 documents is a quick process. Searching through a million becomes a real issue.

Here are some strategies to help you achieve solid performance at scale.

  • Approximate Nearest Neighbor (ANN) Algorithms: On enormous databases, exact brute-force vector search doesn’t scale. ANN algorithms trade a little bit of accuracy for a huge step up in performance.
  • Chunk Your Data: Giant documents aren’t easy to search. When you cut a 10,000-word document into 1,000-word chunks, each chunk can be embedded and matched more precisely.
  • Store Metadata: When each document carries accurate summaries or proper metadata, matches can be found without parsing full documents.
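The chunking strategy above can be sketched as a small helper. The function name, chunk size, and overlap value are illustrative choices, not a standard; the overlap exists so a sentence straddling a chunk boundary still appears whole in at least one chunk.

```python
def chunk_words(text, chunk_size=1000, overlap=100):
    """Split text into word-based chunks, with a small overlap so context
    isn't lost at chunk boundaries."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # each chunk starts `step` words after the last
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the rest of the document is already covered
    return chunks

doc = ("word " * 2500).strip()  # a 2,500-word stand-in document
chunks = chunk_words(doc, chunk_size=1000, overlap=100)
print(len(chunks))  # 3 chunks, each at most 1,000 words
```

Each chunk would then be embedded and indexed individually, so a query matches the most relevant passage rather than the whole document.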

Challenges: Bias, latency, cost and privacy

Just like any other tool, semantic search comes with tradeoffs. Here are some challenges to watch out for. If you notice them, you can address them before they balloon into larger problems.

  • Bias: Your model reflects its training data. If the training data is biased, so is your model, and those biases can surface in your search results.
  • Latency: Real-time search only works if your vector database is structured properly. The larger the database, the slower the search. The larger each document, the slower the search.
  • Cost: Embeddings, storage and queries all add up. When using a managed service, hosting costs can quickly outrun your actual product growth. Keep your architecture as lean and efficient as possible.
  • Privacy: Don’t feed sensitive data into third-party models. If your data lives in the healthcare, finance, or legal industries, maintaining data privacy is a valid concern. No need to introduce unnecessary points of failure.

Conclusion

Semantic search changes the game. Information is no longer graded solely on keyword density, and models can now actually explain why the sky is blue instead of just matching the words. As time goes on, semantic search will only continue to grow.