Create a collection
Collections are the core data structure in TopK. They store documents and provide the interface for querying them efficiently.
Creating a collection
To create a collection in TopK, call the create() function on client.collections().
The create() function takes two parameters:
The name of the collection.
A schema definition that describes the document structure.
Below is an example of creating a collection named books:
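As a minimal, hypothetical sketch of what create() receives, the toy Python model below pairs a collection name with a schema that maps field names to field specifications. FieldSpec, ToyCollections, names(), and schema() are illustrative stand-ins invented for this sketch and are not part of the TopK SDK; the real field types are the functions described in the Schema section.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for a TopK field specification (not the SDK type)."""

    data_type: str                   # e.g. "text", "int", "f32_vector"
    required: bool = False           # fields are optional unless marked required
    dimension: Optional[int] = None  # only meaningful for vector types


class ToyCollections:
    """Hypothetical in-memory stand-in for client.collections()."""

    def __init__(self) -> None:
        self._schemas: Dict[str, Dict[str, FieldSpec]] = {}

    def create(self, name: str, schema: Dict[str, FieldSpec]) -> None:
        # Mirrors the two documented inputs: a collection name and a schema map.
        self._schemas[name] = dict(schema)

    def names(self) -> List[str]:
        return sorted(self._schemas)

    def schema(self, name: str) -> Dict[str, FieldSpec]:
        return self._schemas[name]


collections = ToyCollections()
collections.create(
    "books",
    {
        "title": FieldSpec("text", required=True),
        "published_year": FieldSpec("int"),
        "title_embedding": FieldSpec("f32_vector", dimension=4),
    },
)
print(collections.names())  # ['books']
```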
Schema
A collection schema in TopK is a map from field names to field specifications.
TopK supports the following field data types:
int()
The int() function is used to define an integer field.
float()
The float() function is used to define a float field.
bool()
The bool() function is used to define a boolean field.
text()
The text() function is used to define a text field.
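To make the four scalar types concrete, here is a small Python sketch that checks document values against int, float, bool, and text fields. The helper names are hypothetical, and two behaviors are assumptions of this sketch rather than documented TopK rules: ints are not accepted for float fields, and fields missing from the schema are ignored.

```python
from typing import Any, Dict, List


def matches_scalar_type(data_type: str, value: Any) -> bool:
    """Return True if value fits one of the scalar types int, float, bool, text."""
    if data_type == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if data_type == "float":
        # Assumption: only Python floats are accepted; ints are not coerced.
        return isinstance(value, float)
    if data_type == "bool":
        return isinstance(value, bool)
    if data_type == "text":
        return isinstance(value, str)
    raise ValueError(f"unknown scalar type: {data_type!r}")


def type_errors(schema: Dict[str, str], doc: Dict[str, Any]) -> List[str]:
    """List fields whose values do not match the schema (unknown fields are skipped)."""
    return sorted(
        name
        for name, value in doc.items()
        if name in schema and not matches_scalar_type(schema[name], value)
    )


schema = {"title": "text", "pages": "int", "rating": "float", "in_print": "bool"}
print(type_errors(schema, {"title": "Dune", "pages": 412, "rating": 4.5, "in_print": True}))
# -> []
print(type_errors(schema, {"title": "Dune", "pages": True, "rating": 4}))
# -> ['pages', 'rating']
```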
f32_vector()
The f32_vector() function is used to define a vector field with 32-bit floating-point values.
To configure the vector dimension, pass a dimension parameter to the f32_vector() function:
dimension: The dimension of the vector.
The vector dimension will be validated when upserting documents. Passing a vector with a different dimension will result in an error.
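A minimal sketch of the upsert-time dimension check described above, written as a standalone Python function. validate_f32_vector is hypothetical; TopK performs this validation on its side, and the exact error it returns is not specified here.

```python
from typing import Sequence


def validate_f32_vector(values: Sequence[float], dimension: int) -> None:
    """Raise if values are not a numeric vector of exactly `dimension` elements."""
    if len(values) != dimension:
        raise ValueError(
            f"expected a vector of dimension {dimension}, got {len(values)}"
        )
    for v in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError(f"vector elements must be numbers, got {v!r}")


validate_f32_vector([0.1, 0.2, 0.3], dimension=3)  # passes silently

try:
    validate_f32_vector([0.1, 0.2], dimension=3)
except ValueError as err:
    print(err)  # expected a vector of dimension 3, got 2
```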
u8_vector()
The u8_vector() function is used to define a vector field with u8 (unsigned 8-bit integer) values.
To configure the vector dimension, pass a dimension parameter to the u8_vector() function:
dimension: The dimension of the vector.
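Each element of a u8 vector must fit in one unsigned byte (0 to 255). The hypothetical Python helper below checks the dimension and range and packs the vector into bytes; whether TopK rejects or coerces out-of-range values is not stated here, so rejecting them is an assumption of this sketch.

```python
from typing import Sequence


def to_u8_vector(values: Sequence[int], dimension: int) -> bytes:
    """Check a u8 vector's dimension and value range, then pack it into bytes."""
    if len(values) != dimension:
        raise ValueError(f"expected a vector of dimension {dimension}, got {len(values)}")
    for v in values:
        # Every element must be an integer in the unsigned 8-bit range 0..255.
        if isinstance(v, bool) or not isinstance(v, int) or not 0 <= v <= 255:
            raise ValueError(f"u8 elements must be integers in 0..255, got {v!r}")
    return bytes(values)  # one byte per element


print(to_u8_vector([0, 128, 255], dimension=3))  # b'\x00\x80\xff'
```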
binary_vector()
The binary_vector() function is used to define a binary vector packed into u8 values.
To configure the binary vector dimension, pass a dimension parameter to the binary_vector() function. The parameter is required, must be greater than 0, and is validated when upserting documents:
dimension: The dimension of the vector, in bytes.
Binary vector dimension is defined in terms of the number of bytes. This means that for a 1024-bit binary vector, the dimension TopK expects is 128 (1024 / 8).
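The bits-to-bytes arithmetic above can be sketched in Python. pack_bits is a hypothetical helper, and the most-significant-bit-first order it uses (the same default order as numpy.packbits) is an assumption; this section does not say which bit order TopK expects.

```python
from typing import Sequence


def pack_bits(bits: Sequence[int]) -> bytes:
    """Pack a sequence of 0/1 values into bytes, most significant bit first."""
    if len(bits) % 8 != 0:
        raise ValueError("bit count must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            if bit not in (0, 1):
                raise ValueError(f"bits must be 0 or 1, got {bit!r}")
            byte = (byte << 1) | bit  # shift earlier bits left, append this one
        packed.append(byte)
    return bytes(packed)


bits = [1, 0] * 512                          # a 1024-bit binary vector
print(len(pack_bits(bits)))                  # 128, the dimension for binary_vector()
print(pack_bits([1, 0, 0, 0, 0, 0, 0, 1]))   # b'\x81'
```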
bytes()
bytes() is used to define a bytes field in the schema.
Properties
required()
required() is used to mark a field as required. All fields are optional by default.
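Because fields are optional unless marked with required(), only the required ones need to be present in every document. This hypothetical Python sketch represents a schema as a map from field name to a "required" flag and reports what a document is missing; it only checks for absent keys, since whether TopK also treats an explicit null as missing is not stated.

```python
from typing import Any, Dict, List

# Hypothetical schema: field name -> whether the field is required.
REQUIRED = {"title": True, "published_year": False, "summary": False}


def missing_required(required: Dict[str, bool], doc: Dict[str, Any]) -> List[str]:
    """Return the required fields that doc does not provide, sorted by name."""
    return sorted(
        name for name, is_required in required.items()
        if is_required and name not in doc
    )


print(missing_required(REQUIRED, {"title": "Dune"}))         # []
print(missing_required(REQUIRED, {"published_year": 1965}))  # ['title']
```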
Functions
index()
The index() function is used to create an index on a field.
This function accepts a single parameter specifying the index type:
semantic_index()
This function is used to create both a keyword index and a vector index on a given field. This allows you to do both semantic search and keyword search over the same field. Note that semantic_index() can only be used on fields of the text() data type.
Optionally, you can pass a model parameter and an embedding_type parameter to the semantic_index() function:
model: The embedding model to use for semantic search. Currently, two models are supported:
cohere/embed-english-v3
cohere/embed-multilingual-v3 (default)
embedding_type: The embedding type to use. TopK supports the following embedding types for Cohere models:
float32
uint8
binary
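The rules for semantic_index() options can be summarized as a small validation function. This Python sketch is hypothetical (check_semantic_index and SemanticIndexOptions are not TopK names); it encodes only what is stated above: text fields only, two supported models with cohere/embed-multilingual-v3 as the default, and three embedding types. No default embedding type is documented here, so the sketch leaves it as None.

```python
from dataclasses import dataclass
from typing import Optional

SUPPORTED_MODELS = ("cohere/embed-english-v3", "cohere/embed-multilingual-v3")
DEFAULT_MODEL = "cohere/embed-multilingual-v3"
SUPPORTED_EMBEDDING_TYPES = ("float32", "uint8", "binary")


@dataclass(frozen=True)
class SemanticIndexOptions:
    """Hypothetical record of the options passed to semantic_index()."""

    model: str
    embedding_type: Optional[str]  # None: no explicit choice was made


def check_semantic_index(field_type: str,
                         model: Optional[str] = None,
                         embedding_type: Optional[str] = None) -> SemanticIndexOptions:
    """Apply the documented rules for semantic_index() options."""
    if field_type != "text":
        raise TypeError("semantic_index() can only be used on text() fields")
    model = DEFAULT_MODEL if model is None else model
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if embedding_type is not None and embedding_type not in SUPPORTED_EMBEDDING_TYPES:
        raise ValueError(f"unsupported embedding type: {embedding_type!r}")
    return SemanticIndexOptions(model, embedding_type)


print(check_semantic_index("text").model)  # cohere/embed-multilingual-v3
print(check_semantic_index("text", "cohere/embed-english-v3", "binary"))
```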
vector_index()
This function is used to create a vector index on a vector field. You can add a vector index on f32_vector, u8_vector, or binary_vector fields.
You must specify a metric when calling vector_index(). This parameter determines how vector similarity is calculated.
Supported vector distance metrics:
euclidean
cosine
dot_product
hamming (only supported for the binary_vector() type)
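For intuition about what each metric measures, here are the standard textbook definitions in plain Python. These are not TopK's implementation: whether TopK reports cosine as a similarity or a distance, or uses squared Euclidean distance internally, is not stated here, so treat the scores as illustrative. Hamming distance operates on packed bytes, which is why it only applies to binary vectors.

```python
import math
from typing import Sequence, Sized


def _check_lengths(a: Sized, b: Sized) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance; smaller means more similar."""
    _check_lengths(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    _check_lengths(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b, in [-1, 1]; larger means more similar."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine is undefined for zero vectors")
    return dot_product(a, b) / (norm_a * norm_b)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two packed binary vectors."""
    _check_lengths(a, b)
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


print(euclidean([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(dot_product([1, 2, 3], [4, 5, 6]))  # 32
print(cosine([1.0, 0.0], [0.0, 1.0]))     # 0.0
print(hamming(b"\x0f", b"\xff"))          # 4
```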
keyword_index()
This function is used to create a keyword index on a text field.
Adding a keyword index allows you to perform keyword search on this field.
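To illustrate what a keyword index makes possible, here is a minimal inverted index in Python that supports all-terms (AND) matching. It is a generic technique, not TopK's implementation: TopK's tokenization and relevance scoring are not described in this section, and production keyword search usually ranks matches rather than returning an unordered set.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"\w+", text.lower())


def build_keyword_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the IDs of the documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def keyword_search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return IDs of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


titles = {"1": "The Catcher in the Rye", "2": "The Great Gatsby", "3": "Catch-22"}
index = build_keyword_index(titles)
print(sorted(keyword_search(index, "the")))        # ['1', '2']
print(sorted(keyword_search(index, "great THE")))  # ['2']
```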