Create a collection

Collections are the core data structure in TopK. They store documents and provide the interface for querying them efficiently.

Creating a collection

To create a collection in TopK, call the create() function on client.collections().

The create() function takes two parameters:

name

string

required

The name of the collection.

schema

dict[str, FieldSpec]

required

Schema definition that describes the document structure.

Below is an example of creating a collection named books:

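from topk_sdk.schema import int, text, f32_vector, vector_index, keyword_index

# Assumes `client` is an already initialized TopK client.
client.collections().create(
    "books",
    schema={
        # Required text field, indexed for keyword search.
        "title": text().required().index(keyword_index()),
        # Required 1536-dimensional float32 embedding with a Euclidean vector index.
        "title_embedding": f32_vector(dimension=1536)
            .required()
            .index(vector_index(metric="euclidean")),
        # Required integer field (not indexed).
        "published_year": int().required(),
    },
)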

Schema

A collection schema in TopK is a map from field names to field specifications.

TopK supports the following field data types:

int()

The int() function defines an integer field:

from topk_sdk.schema import int

"published_year": int()

float()

The float() function defines a floating-point field:

from topk_sdk.schema import float

"price": float()

bool()

The bool() function defines a boolean field:

from topk_sdk.schema import bool

"is_published": bool()

text()

The text() function defines a text field:

from topk_sdk.schema import text

"title": text()

f32_vector()

The f32_vector() function defines a vector field of 32-bit floating-point values:

from topk_sdk.schema import f32_vector

"title_embedding": f32_vector(dimension=1536)

To configure the float vector dimension, pass a dimension parameter to the f32_vector() function:

dimension

int

required

The dimension of the vector.

The vector dimension will be validated when upserting documents. Passing a vector with a different dimension will result in an error.
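As a rough mental model of that check, the sketch below rejects a vector whose length differs from the declared dimension. `validate_vector_dimension` is a hypothetical helper written for illustration, not part of topk_sdk; TopK performs the actual validation itself when documents are upserted.

```python
def validate_vector_dimension(field: str, vector: list[float], dimension: int) -> None:
    """Raise ValueError if `vector` does not have exactly `dimension` elements.

    Hypothetical client-side helper that mirrors the server-side check
    TopK performs when documents are upserted.
    """
    if len(vector) != dimension:
        raise ValueError(
            f"field {field!r}: expected dimension {dimension}, got {len(vector)}"
        )


# A 1536-dimensional embedding matches f32_vector(dimension=1536) ...
validate_vector_dimension("title_embedding", [0.0] * 1536, 1536)

# ... while a 768-dimensional one would be rejected.
try:
    validate_vector_dimension("title_embedding", [0.0] * 768, 1536)
except ValueError as err:
    print(err)  # field 'title_embedding': expected dimension 1536, got 768
```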

u8_vector()

The u8_vector() function defines a vector field of unsigned 8-bit integer (u8) values:

from topk_sdk.schema import u8_vector

"title_embedding": u8_vector(dimension=1536)

To configure the vector dimension, pass a dimension parameter to the u8_vector() function:

dimension

int

required

The dimension of the vector.

binary_vector()

The binary_vector() function defines a binary vector whose bits are packed into u8 values. The dimension parameter is required, must be greater than 0, and is validated when upserting documents.

Binary vector dimension is measured in bytes, not bits. For example, a 1024-bit binary vector has a dimension of 128 (1024 / 8).

from topk_sdk.schema import binary_vector

"title_embedding": binary_vector(dimension=128)

To configure the binary vector dimension, pass a dimension parameter to the binary_vector() function:

dimension

int

required

The dimension of the vector, in bytes.
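To see where the byte-based dimension comes from, the sketch below packs a sequence of 0/1 bits into byte values, eight bits per byte. `pack_bits` is a hypothetical helper, and the most-significant-bit-first ordering is an assumption: check the bit order your embedding provider uses before relying on it.

```python
def pack_bits(bits: list[int]) -> list[int]:
    """Pack a sequence of 0/1 values into byte values (0..255), 8 bits per byte.

    Assumes most-significant-bit-first ordering within each byte.
    """
    if len(bits) % 8 != 0:
        raise ValueError("bit count must be a multiple of 8")
    packed = []
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            # Shift the accumulated bits left and append the next one.
            byte = (byte << 1) | (bit & 1)
        packed.append(byte)
    return packed


bits = [1, 0] * 512          # a 1024-bit binary embedding
packed = pack_bits(bits)
print(len(packed))           # 128, i.e. binary_vector(dimension=128)
print(packed[0])             # 170 (0b10101010)
```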

bytes()

The bytes() function defines a bytes field:

from topk_sdk.schema import bytes

"image": bytes()

Properties

required()

required() is used to mark a field as required. All fields are optional by default.
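The sketch below illustrates what the distinction means for a document: required fields must be present, while optional fields may be omitted. It is a hypothetical client-side check; `missing_required_fields` and the optional `price` field are illustrative and not part of topk_sdk, and the real validation is done by TopK.

```python
# Hypothetical, simplified view of the "books" schema: field name -> required?
BOOKS_REQUIRED = {
    "title": True,
    "title_embedding": True,
    "published_year": True,
    "price": False,  # hypothetical optional field
}


def missing_required_fields(doc: dict, required: dict[str, bool]) -> list[str]:
    """Return the names of required fields absent from `doc`, sorted."""
    return sorted(
        name for name, is_required in required.items()
        if is_required and name not in doc
    )


doc = {"title": "The Great Gatsby", "published_year": 1925}
print(missing_required_fields(doc, BOOKS_REQUIRED))  # ['title_embedding']
```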

"title": text().required()

Functions

index()

The index() function creates an index on a field.

This function accepts a single parameter specifying the index type:

`semantic_index()`

The semantic_index() function creates both a keyword index and a vector index on a field, which lets you run both semantic search and keyword search over the same field. Note that semantic_index() can only be applied to text() fields.

from topk_sdk.schema import semantic_index

"title": text().index(semantic_index())

Optionally, you can pass model and embedding_type parameters to the semantic_index() function:

model

string

default: "cohere/embed-multilingual-v3"

Embedding model to use for semantic search. Currently, these two models are supported:

cohere/embed-english-v3
cohere/embed-multilingual-v3 (default)

embedding_type

string

default: "float32"

TopK supports the following embedding types for Cohere models:

float32
uint8
binary

`vector_index()`

The vector_index() function creates a vector index on a vector field. You can add a vector index to f32_vector, u8_vector, or binary_vector fields.

from topk_sdk.schema import vector_index, f32_vector

"title_embedding": f32_vector(dimension=1536).index(vector_index(metric="cosine"))

You must specify a metric when calling vector_index(). This parameter determines how vector similarity is calculated:

metric

string

required

Supported vector distance metrics:

euclidean
cosine
dot_product
hamming (only supported for binary_vector() type)
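For intuition, the sketch below implements the textbook definitions of these four measures in plain Python. It is illustrative only: this page does not specify TopK's exact scoring conventions (for example, whether cosine is reported as a similarity or a distance), so treat the functions as the standard formulas rather than TopK's implementation. The hamming function takes packed byte values, matching how binary_vector() stores bits.

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance: lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a: list[float], b: list[float]) -> float:
    """Dot product: higher means more similar (sensitive to vector length)."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    norms = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    if norms == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / norms


def hamming(a: list[int], b: list[int]) -> int:
    """Hamming distance between packed binary vectors (lists of byte values):
    the number of differing bits, i.e. popcount(x XOR y) summed over bytes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


print(euclidean([0.0, 0.0], [3.0, 4.0]))   # 5.0
print(dot_product([1.0, 2.0], [3.0, 4.0]))  # 11.0
print(cosine([1.0, 0.0], [0.0, 1.0]))       # 0.0
print(hamming([0b10101010], [0b10100000]))  # 2
```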

`keyword_index()`

The keyword_index() function creates a keyword index on a text field:

from topk_sdk.schema import keyword_index

"title": text().index(keyword_index())

Adding a keyword index allows you to perform keyword search on this field.
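Conceptually, a keyword index maps terms to the documents that contain them. The toy inverted index below shows that lookup with simple lowercase whitespace tokenization and AND semantics. It is a generic sketch, not TopK's implementation: this page does not describe TopK's tokenizer or ranking, and production keyword indexes typically add tokenization rules and relevance scoring such as BM25. `build_keyword_index` and `keyword_search` are hypothetical helpers.

```python
from collections import defaultdict


def build_keyword_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase whitespace-separated token to the ids of docs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def keyword_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of docs that contain every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


titles = {
    "1": "The Great Gatsby",
    "2": "Great Expectations",
    "3": "The Catcher in the Rye",
}
index = build_keyword_index(titles)
print(sorted(keyword_search(index, "great")))      # ['1', '2']
print(sorted(keyword_search(index, "the great")))  # ['1']
```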
