TopK documents are JSON-like objects containing key-value pairs.

Upsert function

To upsert documents, pass a list of documents to the upsert() function:

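# Assumes `client` is an already configured TopK client.
# Embeddings are truncated with `...` for brevity.
client.collection("books").upsert(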
    [
        {
            "_id": "book-1",
            "title": "The Great Gatsby",
            "published_year": 1925,
            "title_embedding": [0.12, 0.67, 0.82, 0.53, ...]
        },
        {
            "_id": "book-2",
            "title": "To Kill a Mockingbird",
            "published_year": 1960,
            "title_embedding": [0.42, 0.53, 0.65, 0.33, ...]
        },
        {
            "_id": "book-3",
            "title": "1984",
            "published_year": 1949,
            "title_embedding": [0.59, 0.33, 0.71, 0.61, ...]
        }
    ]
)
  • Every document must have a string _id field.
  • If a document with the specified _id doesn’t exist, a new document will be inserted.
  • If a document with the same _id already exists, the existing document will be replaced with the new one.

The upsert() function does not perform a partial update or merge: the entire document is replaced, so any field omitted from the new document is removed.
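The insert-or-replace rules above can be modeled with a plain dictionary keyed by `_id`. The sketch below is a hypothetical in-memory stand-in (`InMemoryCollection` is not part of the TopK SDK) that mimics only the documented behavior: a new `_id` is inserted, an existing one is replaced wholesale, and fields missing from the new document are dropped rather than merged.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


class InMemoryCollection:
    """Hypothetical local model of upsert semantics; not the TopK SDK."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def upsert(self, documents: List[Document]) -> None:
        for doc in documents:
            doc_id = doc.get("_id")
            # Every document must carry a string _id.
            if not isinstance(doc_id, str):
                raise ValueError("every document needs a string _id")
            # Insert if new, otherwise replace the whole document (no merge).
            self._docs[doc_id] = dict(doc)

    def get(self, doc_id: str) -> Document:
        return self._docs[doc_id]


books = InMemoryCollection()
books.upsert([{"_id": "book-1", "title": "The Great Gatsby", "published_year": 1925}])
# Upserting the same _id without published_year replaces the document,
# so the old published_year field is gone afterwards.
books.upsert([{"_id": "book-1", "title": "The Great Gatsby (Annotated)"}])
print(books.get("book-1"))
```

To change a single field without losing the others, one approach is to build the complete document on the client side (for example `{**old_doc, "title": new_title}`) and upsert that.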

Supported types

TopK documents have a flat structure of key-value pairs.

The following value types are supported:

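| Type           | Python Type | JavaScript Type | Helper Function |
|----------------|-------------|-----------------|-----------------|
| String         | str         | string          | -               |
| Integer        | int         | number          | -               |
| Float          | float       | number          | -               |
| Boolean        | bool        | boolean         | -               |
| Float32 vector | List[float] | number[]        | f32_vector()    |
| U8 vector      | use helper  | use helper      | u8_vector()     |
| Binary vector  | use helper  | use helper      | binary_vector() |
| Bytes          | use helper  | use helper      | bytes()         |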
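As a rough local pre-check, a hypothetical helper like `topk_type_of` below classifies plain Python values using the mapping in the table and rejects anything else, such as a nested dict, on the assumption that it does not fit a flat document. The type names come from the table; the function itself is an illustration, not SDK code, and it leaves u8 vectors, binary vectors, and bytes to the helper functions. Note that `bool` is checked before `int`, because in Python `True` is also an instance of `int`.

```python
from typing import Any


def topk_type_of(value: Any) -> str:
    """Map a plain Python value to a TopK type name from the table (sketch only)."""
    # Check bool before int: in Python, isinstance(True, int) is also True.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    # A bare list of numbers is treated as a float32 vector (no helper used).
    if isinstance(value, list) and all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in value
    ):
        return "Float32 vector"
    # Anything else (e.g. a nested dict) does not fit a flat document.
    raise TypeError(f"unsupported plain value: {value!r}")


doc = {"title": "1984", "published_year": 1949, "price": 10.99, "is_published": True}
print({key: topk_type_of(val) for key, val in doc.items()})
```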

Here’s an example of creating a collection with all supported types and inserting a document:


# Note: these schema helpers shadow Python's built-in int, float, bool, and bytes.
from topk_sdk.schema import (
    int,
    text,
    float,
    bool,
    f32Vector,
    u8Vector,
    binaryVector,
    bytes,
    vector_index,
)

client.collections().create(
    "books",
    schema={
        "title": text(),
        "published_year": int(),
        "price": float(),
        "is_published": bool(),
        "float_embedding": f32Vector(dimension=1536).index(vector_index(metric="cosine")),
        "u8_embedding": u8Vector(dimension=1536).index(vector_index(metric="euclidean")),
        "binary_embedding": binaryVector(dimension=1536).index(vector_index(metric="hamming")),
        "bytes": bytes(),
    },
)

Insert a document with all supported types:

from topk_sdk.data import f32_vector, u8_vector, binary_vector, bytes

client.collection("books").upsert([
    {
        "_id": "1",
        "title": "The Great Gatsby",
        "published_year": 1925,
        "price": 10.99,
        "is_published": True,
        # Vectors are shortened for readability; real ones should match the schema's dimension.
        "float_embedding": f32_vector([0.12, 0.67, 0.82, 0.53]),
        "u8_embedding": u8_vector([0, 1, 2, 3]),
        "binary_embedding": binary_vector([0, 1, 1, 0]),
        "bytes": bytes([0, 1, 1, 0]),
    },
])
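The schema above declares `dimension=1536` for each vector field, while the example document uses four-element vectors for readability. Before upserting real data, a local check like the hypothetical `dimension_errors` function below can catch length mismatches early. The `EXPECTED_DIMENSIONS` mapping is an assumption that mirrors the schema, and the check covers only the float32 and u8 fields because this page does not say whether a binary vector's `dimension` counts bits or bytes. Run it on the raw lists before wrapping them with the helper functions.

```python
from typing import Any, Dict, List

# Mirrors the dimensions declared in the schema above (assumption of this sketch).
EXPECTED_DIMENSIONS: Dict[str, int] = {
    "float_embedding": 1536,
    "u8_embedding": 1536,
}


def dimension_errors(doc: Dict[str, Any]) -> List[str]:
    """Return one message per vector field whose length differs from the schema."""
    errors = []
    for field, expected in EXPECTED_DIMENSIONS.items():
        if field in doc and len(doc[field]) != expected:
            errors.append(f"{field}: expected {expected} values, got {len(doc[field])}")
    return errors


raw_doc = {"float_embedding": [0.12, 0.67, 0.82, 0.53], "u8_embedding": [0] * 1536}
print(dimension_errors(raw_doc))
```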

Helper functions

In TopK, a vector is represented as a flat array of numbers. Because a plain array does not indicate its element type, use the provided helper functions to specify which type of vector you are passing.

TopK supports the following types of vectors:

  • Float32 vectors
  • U8 vectors
  • Binary vectors

If no helper function is used, the vector is assumed to be a float32 vector.
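One way to picture why the helpers are needed: a plain list of numbers carries no information about its element type, so something has to tag it. The sketch below models that idea with a hypothetical `TaggedVector` dataclass and stand-in functions `tag_u8` and `tag_binary`, with untagged lists defaulting to float32 as described above. These names are illustrative only; the real helpers live in `topk_sdk.data`, and their internal representation is not documented on this page.

```python
from dataclasses import dataclass
from typing import List, Union

Number = Union[int, float]


@dataclass
class TaggedVector:
    """Hypothetical container pairing raw numbers with a vector kind."""

    kind: str  # "f32", "u8", or "binary" (illustrative labels)
    values: List[Number]


def tag_u8(values: List[Number]) -> TaggedVector:
    return TaggedVector("u8", list(values))


def tag_binary(values: List[Number]) -> TaggedVector:
    return TaggedVector("binary", list(values))


def vector_kind(value: Union[TaggedVector, List[Number]]) -> str:
    """Return the kind of a tagged vector; untagged lists default to float32."""
    if isinstance(value, TaggedVector):
        return value.kind
    return "f32"


print(vector_kind([0, 1, 2, 3]))          # f32: no tag, so the default applies
print(vector_kind(tag_u8([0, 1, 2, 3])))  # u8
```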

f32_vector()

To pass a float32 vector, use the f32_vector() helper function:

from topk_sdk.data import f32_vector

f32_vector([0.12, 0.67, 0.82, 0.53])
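A float32 vector holds each component as a 32-bit IEEE 754 float, whereas a Python `float` is a 64-bit double. The standard-library sketch below (no TopK calls) round-trips the example values through 32-bit precision to show the small rounding this implies; `to_float32` is a hypothetical helper name.

```python
import struct
from typing import List


def to_float32(values: List[float]) -> List[float]:
    """Round each value to the nearest 32-bit float and back to a Python float."""
    packed = struct.pack(f"{len(values)}f", *values)
    return list(struct.unpack(f"{len(values)}f", packed))


original = [0.12, 0.67, 0.82, 0.53]
rounded = to_float32(original)
# At this magnitude each component shifts by less than 1e-7.
print([abs(a - b) for a, b in zip(original, rounded)])
```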

u8_vector()

To pass a u8 vector, use the u8_vector() helper function:

from topk_sdk.data import u8_vector

u8_vector([0, 1, 2, 3])
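Each component of a u8 vector is an unsigned 8-bit integer, so valid values run from 0 to 255. Python's standard `array` module with typecode `'B'` enforces the same range, which makes it a convenient local check before calling the helper. `check_u8` is a hypothetical name, and this page does not say whether the TopK SDK itself rejects or wraps out-of-range values.

```python
from array import array
from typing import List


def check_u8(values: List[int]) -> List[int]:
    """Return the values unchanged, or raise OverflowError if any is outside 0..255."""
    # array("B", ...) stores unsigned bytes and refuses values that do not fit.
    return list(array("B", values))


print(check_u8([0, 1, 2, 3]))
```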

binary_vector()

To pass a binary vector, use the binary_vector() helper function:

from topk_sdk.data import binary_vector

binary_vector([0, 1, 1, 0])
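The schema example indexes `binary_embedding` with the `hamming` metric, which counts the bit positions at which two vectors differ. The pure-Python sketch below computes it by XOR-ing corresponding elements and counting the set bits. It gives the same result whether each element is a single 0/1 bit, as in the example, or a packed byte; this page does not say which encoding `binary_vector()` expects. It is for intuition only, not the engine's implementation.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count differing bits between two equal-length integer sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # XOR leaves a 1 bit wherever the inputs differ; count those bits.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


print(hamming_distance([0, 1, 1, 0], [1, 1, 0, 0]))  # 2
```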

bytes()

To pass a bytes value, use the bytes() helper function:

from topk_sdk.data import bytes

bytes([0, 1, 1, 0])
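The example passes `bytes()` a list of integers. If your data is already a Python `bytes` object, such as a SHA-256 digest, `list()` converts it to that list-of-integers form. This sketch uses only the standard library, and `digest_as_int_list` is a hypothetical name; whether the helper also accepts a `bytes` object directly is not stated here. Because the helper shares its name with Python's built-in `bytes`, importing it under an alias (for example `from topk_sdk.data import bytes as topk_bytes`) avoids shadowing the built-in.

```python
import hashlib
from typing import List


def digest_as_int_list(text: str) -> List[int]:
    """Hash text with SHA-256 and return the 32 digest bytes as integers 0..255."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Iterating over a bytes object yields ints, so list() gives [int, ...].
    return list(digest)


values = digest_as_int_list("The Great Gatsby")
print(len(values))  # 32
```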