Vector search is the core feature of TopK. It is designed to:

  • Stay above 95% recall, so your application (e.g. recommendation, image search, semantic search) rarely misses relevant results.
  • Provide consistently low latency (under 100 ms at p99).
  • Support both large-scale and multi-tenant configurations.
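Recall measures how many of the true nearest neighbors an approximate search actually returns. The following minimal sketch is plain Python and does not use the TopK SDK. It computes recall@k by comparing a hypothetical approximate result list against the exact, brute-force neighbors. The function name `recall_at_k` and the sample IDs are illustrative assumptions and are not part of TopK.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[str], exact_ids: Sequence[str], k: int) -> float:
    """Fraction of the exact top-k neighbors that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        raise ValueError("exact_ids must not be empty")
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / len(exact_top)


# Hypothetical example: the approximate index misses 1 of the 20 true neighbors.
exact = [f"doc-{i}" for i in range(20)]
approx = exact[:19] + ["doc-99"]
print(recall_at_k(approx, exact, k=20))  # 0.95
```

A recall of 0.95 means that, on average, 19 of every 20 true nearest neighbors appear in the returned results.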

Prerequisites

Define a schema with a vector field (e.g. f32_vector()) and add a vector_index() to it:

from topk_sdk.schema import text, f32_vector, vector_index

client.collections().create(
    "books",
    schema={
        "title": text().required(),
        "title_embedding": f32_vector(dimension=1536).required().index(vector_index(distance="cosine"))
    },
)

When defining a vector field, you need to specify the dimension of the vector (1536 in the example above).

To perform a vector search on this field, index it with a vector index and specify the distance parameter, which selects the distance metric (e.g. "cosine").
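The schema fixes the vector length, so every embedding you write or query with should have exactly that many components. A minimal client-side check might look like the sketch below. `EMBEDDING_DIMENSION` and `check_embedding` are hypothetical names. The sketch does not describe how TopK itself validates vectors. It only catches mismatches before any request is sent.

```python
import math

# Hypothetical constant: keep in sync with f32_vector(dimension=...) in the schema.
EMBEDDING_DIMENSION = 1536


def check_embedding(vector: list[float], dimension: int = EMBEDDING_DIMENSION) -> list[float]:
    """Return the vector unchanged, or raise ValueError if it cannot match the schema."""
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} components, got {len(vector)}")
    # NaN or infinite components would make any distance computation meaningless.
    if not all(math.isfinite(x) for x in vector):
        raise ValueError("vector contains NaN or infinite values")
    return vector
```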

Find the closest neighbors

To find the top-k closest neighbors of the query vector, use the vector_distance() function.

It computes a numeric score that you can use to sort the results. The meaning of the score depends on the distance metric specified in the vector index:

from topk_sdk.query import select, field, fn

docs = client.collection("books").query(
    select(
        "title",
        published_year=field("published_year"),
        # Compute the vector similarity between the query vector (e.g. the embedding of
        # the string "epic fantasy adventure") and the embedding stored in the `title_embedding` field.
        title_similarity=fn.vector_distance("title_embedding", [0.1, 0.2, 0.3, ...])
    )
    # Return the top 10 results.
    # Cosine similarity: larger = closer, so keep the default descending order.
    # Euclidean distance: smaller = closer, so sort in ascending order (asc=True).
    .topk(field("title_similarity"), 10)
)

# Example results:
[
  {
    "_id": "2",
    "title": "Lord of the Rings",
    "title_similarity": 0.8150404095649719
  },
  {
    "_id": "1",
    "title": "The Catcher in the Rye",
    "title_similarity": 0.7825378179550171,
  }
]

Let’s break down the example above:

  1. Compute the cosine similarity between the query embedding and the title_embedding field using the vector_distance() function.
  2. Store the computed cosine similarity in the title_similarity field.
  3. Return the top 10 results sorted by the title_similarity field in descending order.
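The following brute-force, in-memory sketch in plain Python makes the sort direction concrete. It is not how TopK executes the query, because TopK uses a vector index instead of scanning every document. The helper names `cosine_similarity`, `euclidean_distance` and `top_k` are assumptions for illustration, and so are the tiny 3-dimensional vectors. The example also shows that the two metrics can rank the same documents differently.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def euclidean_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(docs: list[dict], query: list[float], k: int, metric: str = "cosine") -> list[dict]:
    """Score every doc (step 1), store the score (step 2), then sort and truncate (step 3)."""
    if metric == "cosine":
        score_fn = cosine_similarity
    elif metric == "euclidean":
        score_fn = euclidean_distance
    else:
        raise ValueError(f"unknown metric: {metric}")
    scored = [
        {"_id": d["_id"], "title_similarity": score_fn(d["title_embedding"], query)}
        for d in docs
    ]
    # Larger cosine similarity = closer (descending); smaller euclidean distance = closer (ascending).
    scored.sort(key=lambda d: d["title_similarity"], reverse=(metric == "cosine"))
    return scored[:k]


# Hypothetical documents with made-up 3-dimensional embeddings.
docs = [
    {"_id": "1", "title_embedding": [0.9, 0.1, 0.0]},
    {"_id": "2", "title_embedding": [0.2, 0.4, 0.7]},
    {"_id": "3", "title_embedding": [0.15, 0.15, 0.3]},
]
query = [0.1, 0.2, 0.3]
print([d["_id"] for d in top_k(docs, query, k=2)])                      # ['2', '3']
print([d["_id"] for d in top_k(docs, query, k=2, metric="euclidean")])  # ['3', '2']
```

Document "2" points in almost the same direction as the query, so it wins under cosine similarity. Document "3" is closer in absolute position, so it wins under Euclidean distance.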

Combine vector search with metadata filtering

Vector search can be easily combined with metadata filtering by adding a filter() stage to the query:

from topk_sdk.query import select, field, fn

docs = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.vector_distance("title_embedding", [0.1, 0.2, 0.3, ...]),
        published_year=field("published_year"),
    )
    .filter(field("published_year") > 2000)
    .topk(field("title_similarity"), 10)
)
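Conceptually, the filter() stage restricts the set of candidates, and topk() then ranks only the documents that passed the filter. The result can therefore contain fewer than 10 documents when fewer match. The in-memory sketch below illustrates this order of operations in plain Python. It is not TopK's implementation. The helper `filtered_top_k`, the sample documents and their similarity scores are all hypothetical.

```python
from typing import Callable


def filtered_top_k(
    docs: list[dict],
    predicate: Callable[[dict], bool],
    score_key: str,
    k: int,
    descending: bool = True,
) -> list[dict]:
    """Keep only docs matching `predicate`, then return the k best by `score_key`."""
    candidates = [d for d in docs if predicate(d)]
    candidates.sort(key=lambda d: d[score_key], reverse=descending)
    return candidates[:k]


# Hypothetical documents with precomputed cosine similarities.
docs = [
    {"_id": "1", "published_year": 1951, "title_similarity": 0.78},
    {"_id": "2", "published_year": 1954, "title_similarity": 0.82},
    {"_id": "3", "published_year": 2005, "title_similarity": 0.64},
    {"_id": "4", "published_year": 2011, "title_similarity": 0.71},
]

results = filtered_top_k(docs, lambda d: d["published_year"] > 2000, "title_similarity", k=10)
print([d["_id"] for d in results])  # ['4', '3']
```

The two highest-scoring documents overall are excluded because they fail the `published_year > 2000` condition. Only two documents are returned even though k is 10.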