Collections are the core data structure in TopK. They store documents and provide the interface for querying them efficiently.

Creating a collection

To create a collection in TopK, call the create() function on client.collections().

The create() function takes two parameters:

name
string
required

The name of the collection.

schema
dict[str, FieldSpec]
required

Schema definition that describes the document structure.

Below is an example of creating a collection named books:

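# Note: importing int from topk_sdk.schema shadows Python's built-in int in this module.
from topk_sdk.schema import int, text, f32_vector, vector_index, keyword_index

# Assumes `client` is an already-initialized TopK client.
client.collections().create(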
    "books",
    schema={
        "title": text().required().index(keyword_index()),
        "title_embedding": f32_vector(dimension=1536)
            .required()
            .index(vector_index(metric="euclidean")),
        "published_year": int().required(),
    },
)

Schema

A collection schema in TopK is a map from field names to field specifications.

TopK supports the following field data types:

int()

The int() function defines an integer field:

from topk_sdk.schema import int

"published_year": int()

float()

The float() function defines a float field:

from topk_sdk.schema import float

"price": float()

bool()

The bool() function defines a boolean field:

from topk_sdk.schema import bool

"is_published": bool()

text()

The text() function defines a text field:

from topk_sdk.schema import text

"title": text()

f32_vector()

The f32_vector() function defines a vector field with 32-bit floating point values:

from topk_sdk.schema import f32_vector

"title_embedding": f32_vector(dimension=1536)

To configure the float vector dimension, pass a dimension parameter to the f32_vector() function:

dimension
int
required

The dimension of the vector.

The vector dimension will be validated when upserting documents. Passing a vector with a different dimension will result in an error.
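
The dimension check described above can also be mirrored on the client before upserting, so that a mismatch surfaces early. The following is a minimal sketch in plain Python; check_dimension is a hypothetical helper, not part of the TopK SDK, and the error TopK itself returns may look different.

```python
def check_dimension(vector, expected_dimension):
    """Raise ValueError if the vector length differs from the schema dimension."""
    if len(vector) != expected_dimension:
        raise ValueError(
            f"expected a vector of dimension {expected_dimension}, got {len(vector)}"
        )
    return vector


# A field declared with dimension=4 accepts a 4-element vector...
check_dimension([0.1, 0.2, 0.3, 0.4], expected_dimension=4)

# ...and rejects a 3-element one.
try:
    check_dimension([0.1, 0.2, 0.3], expected_dimension=4)
except ValueError as err:
    print(err)
```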

u8_vector()

The u8_vector() function defines a vector field with unsigned 8-bit integer (u8) values:

from topk_sdk.schema import u8_vector

"title_embedding": u8_vector(dimension=1536)

To configure the vector dimension, pass a dimension parameter to the u8_vector() function:

dimension
int
required

The dimension of the vector.

binary_vector()

The binary_vector() function defines a binary vector packed into u8 values. The dimension parameter is required, must be greater than 0, and is validated when upserting documents.

The dimension of a binary vector is expressed in bytes, not bits. For a 1024-bit binary vector, the dimension TopK expects is 128 (1024 / 8).

from topk_sdk.schema import binary_vector

"title_embedding": binary_vector(dimension=128)

To configure the binary vector dimension, pass a dimension parameter to the binary_vector() function:

dimension
int
required

The dimension of the vector.
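
To make the bits-to-bytes arithmetic concrete, here is a minimal sketch that packs a list of 0/1 values into bytes and derives the dimension from the packed length. pack_bits is a hypothetical helper, and the most-significant-bit-first ordering is an assumption; check which bit order your embedding provider uses.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, most significant bit first."""
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            # Shift the accumulated bits left and append the next one.
            byte = (byte << 1) | (1 if bit else 0)
        packed.append(byte)
    return bytes(packed)


bits = [1, 0] * 512        # a 1024-bit vector
packed = pack_bits(bits)
dimension = len(packed)    # number of bytes, the value to pass as dimension
print(dimension)
```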

bytes()

The bytes() function defines a bytes field:

from topk_sdk.schema import bytes

"image": bytes()

Properties

required()

required() is used to mark a field as required. All fields are optional by default.

"title": text().required()

Functions

index()

The index() function creates an index on a field.

This function accepts a single parameter specifying the index type:

semantic_index()

This function creates both a keyword index and a vector index on a given field, so you can run semantic search and keyword search over the same field. Note that semantic_index() can only be used on fields of the text() data type.

from topk_sdk.schema import semantic_index

"title": text().index(semantic_index())

Optionally, you can pass model and embedding_type parameters to the semantic_index() function:

model
string
default:"cohere/embed-multilingual-v3"

Embedding model to use for semantic search. Currently, these two models are supported:

  • cohere/embed-english-v3
  • cohere/embed-multilingual-v3 (default)

embedding_type
string
default:"float32"

TopK supports the following embedding types for Cohere models:

  • float32
  • uint8
  • binary

vector_index()

This function creates a vector index on a vector field. You can add a vector index to f32_vector, u8_vector, or binary_vector fields.

from topk_sdk.schema import vector_index, f32_vector

"title_embedding": f32_vector(dimension=1536).index(vector_index(metric="cosine"))

You must specify a metric when calling vector_index(). This parameter determines how vector similarity is calculated:

metric
string
required

Supported vector distance metrics:

  • euclidean
  • cosine
  • dot_product
  • hamming (only supported for binary_vector() type)
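
For reference, the textbook definitions of these four metrics can be sketched in plain Python as follows. This illustrates the math only and is not TopK's implementation; in particular, whether TopK reports cosine as a similarity or a distance, or applies any normalization, is not stated here.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def hamming(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


print(euclidean([0, 0], [3, 4]))
print(dot_product([1, 2], [3, 4]))
print(hamming(bytes([0b11110000]), bytes([0b00001111])))
```

Note that hamming operates on packed bytes, which matches the way binary_vector() fields store their data.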

keyword_index()

This function is used to create a keyword index on a text field:

from topk_sdk.schema import keyword_index

"title": text().index(keyword_index())

Adding a keyword index allows you to perform keyword search on this field.