With TopK, you can implement vector-powered semantic search in just a few lines of code.

TopK comes with built-in embeddings and reranking, removing the need for third-party embedding models or custom reranking solutions.

In the following example, we’ll:

1

Define a collection schema

Define a collection schema for semantic search.

2

Add documents

Add documents to the collection.

3

Query the collection with semantic search

Perform a semantic search.

Define a collection schema

Semantic search is enabled by adding a semantic_index() to a text field in the collection schema:

from topk_sdk.schema import text, semantic_index

client.collections().create(
    "books",
    schema={
        "title": text().required().index(semantic_index()),
    },
)

This configuration automatically generates embeddings as well as enables keyword search for the specified text fields.

If you want to use your own embeddings instead of TopK’s built-in semantic_index(), see Vector Search.

Add documents to the collection

Let’s add some documents to the collection:

client.collection("books").upsert(
    [
        {"_id": "gatsby", "title": "The Great Gatsby"},
        {"_id": "1984", "title": "1984"},
        {"_id": "catcher", "title": "The Catcher in the Rye"}
    ],
)

To search for documents based on semantic similarity, use the semantic_similarity() function:

from topk_sdk.query import select, field, fn

docs = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.semantic_similarity("title", "classic American novel")
    )
    .topk(field("title_similarity"), 10)
    .rerank()
)

# Example results:

[
  {
    "_id": "2",
    "title": "The Catcher in the Rye",
    "title_similarity": 0.9497610926628113,
    "_rank": 0,
    "_rerank_score": 0.048159245401620865,
  },
  {
    "_id": "1",
    "title": "The Great Gatsby",
    "title_similarity": 0.9480283856391907,
    "_rank": 1,
    "_rerank_score": 0.02818089909851551,
  }
]

Let’s break down the example above:

  1. The semantic_similarity() function computes the similarity between the query "classic American novel" and the text value stored in the title field for each document.
  2. TopK performs automatic query embedding under the hood using the model specified in the semantic_index() function.
  3. The results are ranked based on similarity, and the top 10 most relevant documents are returned.
  4. The optional .rerank() call uses a reranking model to improve relevance of the results. For more information, see our Reranking guide.

This works out of the box—no need to manage embeddings, external APIs, or third-party reranking models.

For certain use cases, you might want to use a combination of keyword search and semantic search:

from topk_sdk.query import select, fn, match

docs = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.semantic_similarity("title", "catcher"),
        text_score=fn.bm25_score()  # Keyword-based relevance
    )
    .filter(match("classic"))  # Ensure the book contains the keyword "classic" in any of the text-indexed fields
    .topk(field("title_similarity") * 0.7 + field("text_score") * 0.3, 10) # Add 70% weight to semantic similarity and 30% weight to keyword relevance
    .rerank()
)

This example above combines keyword relevance (BM25) with semantic similarity,
ensuring your search results capture both exact matches and contextual meaning with a custom scoring function that’s best suited for your use case.