True Hybrid Search
Combine multiple retrieval techniques in a single query.
With TopK’s true hybrid search, you can combine the following retrieval techniques in a single query:
- vector search
- multi-vector search
- keyword search
- metadata filtering
How TopK differs from other “hybrid” search systems
Most databases that offer hybrid search maintain separate vector and keyword indexes. When a query is executed, they:
- Run a separate query against each index
- Collect the top results from each query (e.g. the first 100 candidates from each)
- Use techniques like Reciprocal Rank Fusion (RRF) to merge and rerank the two sets of results
This approach is fundamentally probabilistic: the final top-k results are not guaranteed to be the actual best candidates, because a document that misses the cutoff in each index’s candidate list is never considered, even if its combined score would place it among the best.
TopK is different. It evaluates the query against a single index (vector + keyword), ensuring that the “top 100” results are the actual top 100, not just a probabilistic approximation.
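To make the difference concrete, here is a minimal plain-Python sketch of the failure mode described above. It does not use TopK or any real search engine: the toy corpus, the equal 50/50 weighting in `combined()`, and the function names are all assumptions chosen for illustration. The RRF term `1 / (rrf_k + rank)` with `rrf_k = 60` follows the commonly used formulation.

```python
# Toy corpus: doc id -> (vector_score, keyword_score). All values are invented.
DOCS = {
    "a": (0.99, 0.10),
    "b": (0.95, 0.05),
    "c": (0.10, 0.99),
    "d": (0.05, 0.95),
    "e": (0.90, 0.90),  # strong on both signals, but only third in each ranking
}


def combined(doc_id):
    """Hypothetical blended score: equal weight on both signals."""
    vec, kw = DOCS[doc_id]
    return 0.5 * vec + 0.5 * kw


def two_stage_top_k(k, candidates_per_index, rrf_k=60):
    """Fetch candidates from each ranking separately, then merge them with RRF."""
    by_vector = sorted(DOCS, key=lambda d: DOCS[d][0], reverse=True)[:candidates_per_index]
    by_keyword = sorted(DOCS, key=lambda d: DOCS[d][1], reverse=True)[:candidates_per_index]
    fused = {}
    for ranking in (by_vector, by_keyword):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    return sorted(fused, key=fused.get, reverse=True)[:k]


def single_pass_top_k(k):
    """Score every document on both signals at once and keep the best k."""
    return sorted(DOCS, key=combined, reverse=True)[:k]


fused_results = two_stage_top_k(k=2, candidates_per_index=2)
exact_results = single_pass_top_k(k=1)
```

With two candidates per index, document `e` never enters either candidate list, so the fused result cannot contain it even though it has the best combined score. Raising `candidates_per_index` to 3 recovers it, which is exactly the overfetching that a two-stage pipeline relies on.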
With TopK, you can:
- Retrieve documents based on multiple embeddings (multi-vector retrieval)
- Combine semantic similarity (e.g. vector search) with keyword search (true hybrid retrieval)
- Filter documents by their metadata
- Apply custom scoring functions that blend multiple ranking factors (custom scoring)
Implementing Hybrid Search (Vector + Keyword)
Hybrid retrieval combines semantic similarity (vector-based search) with exact keyword matching. This approach ensures that documents with direct keyword matches are considered alongside those that are semantically similar to the query.
Let’s define a collection with one keyword_index() and one semantic_index():
In the following example, we’ll perform a hybrid search that combines keyword and vector (semantic) search in a single query:
Let’s break down the example above:
- We retrieve documents based on semantic meaning (content_similarity) and keyword matching (text_score).
- The filter() ensures that documents contain at least one relevant keyword.
- The topk() function weights the scores, prioritizing semantic meaning (60%) while still considering keyword matches (40%).
This balances precision and recall, capturing both exact keyword matches and meaningful context.
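The arithmetic behind the 60/40 weighting can be sketched in plain Python. This is not the TopK SDK: the candidate list, the `keyword_hit` flag standing in for the keyword filter, and the helper names are hypothetical. The sketch also assumes that text_score has already been scaled to the same 0 to 1 range as content_similarity, which raw BM25 scores are not.

```python
import heapq

# Hypothetical per-document scores; in a real query these come from the index.
CANDIDATES = [
    {"id": "doc1", "content_similarity": 0.92, "text_score": 0.10, "keyword_hit": True},
    {"id": "doc2", "content_similarity": 0.60, "text_score": 0.95, "keyword_hit": True},
    {"id": "doc3", "content_similarity": 0.97, "text_score": 0.00, "keyword_hit": False},
    {"id": "doc4", "content_similarity": 0.40, "text_score": 0.30, "keyword_hit": True},
]


def hybrid_score(doc, semantic_weight=0.6, keyword_weight=0.4):
    """Weighted blend of the semantic and keyword scores."""
    return semantic_weight * doc["content_similarity"] + keyword_weight * doc["text_score"]


def hybrid_top_k(docs, k):
    """Keep only documents that matched a keyword, then return the k best by blended score."""
    matching = [d for d in docs if d["keyword_hit"]]
    return heapq.nlargest(k, matching, key=hybrid_score)


top = hybrid_top_k(CANDIDATES, k=2)
top_ids = [d["id"] for d in top]
```

Here doc3 has the highest semantic similarity but is dropped by the filter because it matches no keyword, while doc2 overtakes doc1 thanks to its strong keyword score.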
Implementing Complex Search (Keyword + Vector + Filtering + Reranking)
In TopK, you can combine keyword search, vector search, filtering, and reranking in a single query. This lets you fetch the truly most relevant results with steady performance and without overfetching.
As you might have noticed, we are also sorting the top-k results using a custom scoring function. You can read more about custom scoring functions in the following section.
Custom Scoring Functions
TopK allows you to define custom scoring functions by combining:
- Semantic similarity score
- Keyword score (BM25)
- Vector distance
- “Bring-your-own” precomputed importance score
Defining a Collection with Custom Scoring Fields
Querying with a Custom Scoring Function
Let’s break down the example above:
- First, we retrieve documents based on both semantic similarity (content_score) and precomputed importance (importance_score).
- Then, the topk() function gives 80% weight to content score and 20% weight to document importance.
- Sorting by a custom scoring function allows us to boost more critical documents, ensuring that highly relevant but less “important” content doesn’t dominate.
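To see how an 80/20 blend changes the ordering, here is a small plain-Python sketch. It is independent of the TopK SDK: the `ScoredDoc` type, the sample documents, and their scores are invented for illustration, and both scores are assumed to lie in the 0 to 1 range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredDoc:
    doc_id: str
    content_score: float     # semantic similarity to the query
    importance_score: float  # precomputed and stored with the document


def custom_score(doc, content_weight=0.8, importance_weight=0.2):
    """Blend query relevance with the document's precomputed importance."""
    return content_weight * doc.content_score + importance_weight * doc.importance_score


def rank(docs, **weights):
    """Return document ids ordered from highest to lowest custom score."""
    ordered = sorted(docs, key=lambda d: custom_score(d, **weights), reverse=True)
    return [d.doc_id for d in ordered]


docs = [
    ScoredDoc("faq", content_score=0.90, importance_score=0.10),
    ScoredDoc("policy", content_score=0.85, importance_score=0.90),
    ScoredDoc("blog", content_score=0.50, importance_score=1.00),
]

by_content_only = rank(docs, content_weight=1.0, importance_weight=0.0)
blended = rank(docs)
```

Ranked by content score alone, "faq" comes first. With the 80/20 blend, "policy" moves ahead of it because it is nearly as relevant and far more important, while "blog" stays last: its high importance cannot compensate for weak relevance at a 20% weight.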