<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>FastEmbed on Qdrant - Vector Database</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/</link><description>Recent content in FastEmbed on Qdrant - Vector Database</description><generator>Hugo</generator><language>en-us</language><managingEditor>info@qdrant.tech (Andrey Vasnetsov)</managingEditor><webMaster>info@qdrant.tech (Andrey Vasnetsov)</webMaster><atom:link href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/index.xml" rel="self" type="application/rss+xml"/><item><title>Quickstart</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-quickstart/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-quickstart/</guid><description>&lt;h1 id="how-to-generate-text-embedings-with-fastembed">How to Generate Text Embeddings with FastEmbed&lt;/h1>
&lt;h2 id="install-fastembed">Install FastEmbed&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pip&lt;/span> &lt;span class="n">install&lt;/span> &lt;span class="n">fastembed&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Just for demo purposes, you will use Python lists and NumPy to work with sample data.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">typing&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">List&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">numpy&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">np&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="load-default-model">Load default model&lt;/h2>
&lt;p>In this example, you will use the default text embedding model, &lt;code>BAAI/bge-small-en-v1.5&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">fastembed&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">TextEmbedding&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="add-sample-data">Add sample data&lt;/h2>
&lt;p>Now, add two sample documents. Your documents must be in a list, and each document must be a string.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">documents&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;FastEmbed is lighter than Transformers &amp;amp; Sentence-Transformers.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;FastEmbed is supported by and maintained by Qdrant.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Download and initialize the model. Print a message to verify the process.&lt;/p></description></item><item><title>FastEmbed &amp; Qdrant</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-semantic-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-semantic-search/</guid><description>&lt;h1 id="using-fastembed-with-qdrant-for-vector-search">Using FastEmbed with Qdrant for Vector Search&lt;/h1>
&lt;h2 id="install-qdrant-client-and-fastembed">Install Qdrant Client and FastEmbed&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pip&lt;/span> &lt;span class="n">install&lt;/span> &lt;span class="s2">&amp;#34;qdrant-client[fastembed]&amp;gt;=1.14.2&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="initialize-the-client">Initialize the client&lt;/h2>
&lt;p>Qdrant Client has a simple in-memory mode that lets you try semantic search locally.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">models&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">client&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;:memory:&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># Qdrant is running from RAM.&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="add-data">Add data&lt;/h2>
&lt;p>Now you can add two sample documents, their associated metadata, and a point &lt;code>id&lt;/code> for each.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">docs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant has a LangChain integration for chatbots.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Qdrant has a LlamaIndex integration for agents.&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">metadata&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>&lt;span class="s2">&amp;#34;source&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;langchain-docs&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>&lt;span class="s2">&amp;#34;source&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;llamaindex-docs&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">ids&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">42&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="create-a-collection">Create a collection&lt;/h2>
&lt;p>Qdrant stores vectors and associated metadata in collections.
A collection requires vector parameters to be set during creation.
In this tutorial, we&amp;rsquo;ll be using &lt;code>BAAI/bge-small-en&lt;/code> to compute embeddings.&lt;/p></description></item><item><title>Working with miniCOIL</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-minicoil/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-minicoil/</guid><description>&lt;h1 id="how-to-use-minicoil-qdrants-sparse-neural-retriever">How to use miniCOIL, Qdrant&amp;rsquo;s Sparse Neural Retriever&lt;/h1>
&lt;p>&lt;strong>miniCOIL&lt;/strong> is an open-source sparse neural retrieval model that acts as if a BM25-based retriever understood the contextual meaning of keywords and ranked results accordingly.&lt;/p>
&lt;p>&lt;strong>miniCOIL&lt;/strong> scoring is based on the BM25 formula scaled by the semantic similarity between matched keywords in a query and a document.
$$
\text{miniCOIL}(D,Q) = \sum_{i=1}^{N} \text{IDF}(q_i) \cdot \text{Importance}^{q_i}_{D} \cdot {\color{YellowGreen}\text{Meaning}^{q_i \times d_j}} \text{, where keyword } d_j \in D \text{ equals } q_i
$$&lt;/p></description></item><item><title>Working with SPLADE</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-splade/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-splade/</guid><description>&lt;h1 id="how-to-generate-sparse-vectors-with-splade">How to Generate Sparse Vectors with SPLADE&lt;/h1>
&lt;p>SPLADE is a novel method for learning sparse text representation vectors, outperforming BM25 in tasks like information retrieval and document classification. Its main advantage is generating efficient and interpretable sparse vectors, making it effective for large-scale text data.&lt;/p>
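To make "sparse vector" concrete before the setup steps: a sparse embedding stores only nonzero token ids and their weights, and two sparse vectors are scored by a dot product over the ids they share. The sketch below uses plain NumPy with made-up ids and weights (a SPLADE model produces the real ones from text; nothing here is a FastEmbed call):

```python
import numpy as np

# A sparse embedding keeps only the nonzero entries: token ids ("indices")
# and their learned weights ("values"). These numbers are made up for
# illustration -- a SPLADE model derives them from the input text.
doc = {"indices": np.array([17, 301, 2048]), "values": np.array([0.9, 0.4, 1.2])}
query = {"indices": np.array([301, 999, 2048]), "values": np.array([0.7, 0.3, 0.5])}

def sparse_dot(a: dict, b: dict) -> float:
    # Relevance is the dot product taken over the token ids both sides share.
    common, a_idx, b_idx = np.intersect1d(
        a["indices"], b["indices"], return_indices=True
    )
    return float(np.dot(a["values"][a_idx], b["values"][b_idx]))

score = sparse_dot(query, doc)
print(round(score, 2))  # shared ids 301 and 2048: 0.7*0.4 + 0.5*1.2 = 0.88
```

Because each index maps back to a vocabulary token, you can inspect exactly which terms drive a match, which is what makes these vectors interpretable.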
&lt;h2 id="setup">Setup&lt;/h2>
&lt;p>First, install FastEmbed.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pip&lt;/span> &lt;span class="n">install&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="n">q&lt;/span> &lt;span class="n">fastembed&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Next, import the required modules for sparse embeddings and Python’s typing module.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">fastembed&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">SparseTextEmbedding&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">SparseEmbedding&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can check the list of all supported sparse embedding models at any time.&lt;/p></description></item><item><title>Working with ColBERT</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-colbert/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-colbert/</guid><description>&lt;h1 id="how-to-generate-colbert-multivectors-with-fastembed">How to Generate ColBERT Multivectors with FastEmbed&lt;/h1>
&lt;h2 id="colbert">ColBERT&lt;/h2>
&lt;p>ColBERT is an embedding model that produces a matrix (multivector) representation of input text,
generating one vector per token (a token being a meaningful text unit for a machine learning model).
This approach allows ColBERT to capture more nuanced input semantics than many dense embedding models,
which represent an entire input with a single vector. By producing more granular input representations,
ColBERT becomes a strong retriever. However, this advantage comes at the cost of increased resource consumption compared to
traditional dense embedding models, both in terms of speed and memory.&lt;/p></description></item><item><title>Reranking with FastEmbed</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-rerankers/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-rerankers/</guid><description>&lt;h1 id="how-to-use-rerankers-with-fastembed">How to use rerankers with FastEmbed&lt;/h1>
&lt;h2 id="rerankers">Rerankers&lt;/h2>
&lt;p>A reranker is a model that improves the ordering of search results. A subset of documents is initially retrieved using a fast, simple method (e.g., BM25 or dense embeddings). Then, a reranker &amp;ndash; a more powerful, precise, but slower and heavier model &amp;ndash; re-evaluates this subset to refine document relevance to the query.&lt;/p>
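The two-stage pattern can be sketched in a few lines of plain Python. The scoring functions below are deliberately simple stand-ins (token overlap for the fast stage, position-weighted overlap for the slow stage), not FastEmbed calls:

```python
# Toy sketch of the retrieve-then-rerank pattern. Both scorers are
# hypothetical stand-ins for illustration only.

def cheap_score(query: str, doc: str) -> float:
    # First stage: fast lexical overlap (the role BM25 or dense search plays).
    q_tokens, d_tokens = set(query.lower().split()), set(doc.lower().split())
    return len(q_tokens & d_tokens) / (len(q_tokens) or 1)

def expensive_score(query: str, doc: str) -> float:
    # Second stage stand-in: a real reranker models token-level
    # query-document interactions; here we merely weight hits by position.
    d_tokens = doc.lower().split()
    return sum(
        1.0 / (1 + d_tokens.index(q))
        for q in query.lower().split()
        if q in d_tokens
    )

corpus = [
    "Qdrant is a vector database",
    "Rerankers refine the order of retrieved documents",
    "FastEmbed generates embeddings",
    "A vector database stores embeddings",
]
query = "vector database"

# Stage 1: retrieve a small candidate set with the cheap scorer.
candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:2]
# Stage 2: rerank only those candidates with the expensive scorer.
reranked = sorted(candidates, key=lambda d: expensive_score(query, d), reverse=True)
print(reranked[0])  # "A vector database stores embeddings"
```

The expensive scorer touches only two documents instead of the whole corpus, which is exactly the trade described above.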
&lt;p>Rerankers analyze token-level interactions between the query and each document in depth, making them expensive to use but precise in defining relevance. They trade speed for accuracy, so they are best used on a limited candidate set rather than the entire corpus.&lt;/p></description></item><item><title>Multi-Vector Postprocessing</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-postprocessing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/fastembed/fastembed-postprocessing/</guid><description>&lt;h1 id="multi-vector-postprocessing">Multi-Vector Postprocessing&lt;/h1>
&lt;p>FastEmbed&amp;rsquo;s postprocessing module provides techniques for transforming and optimizing embeddings after generation. These
postprocessing methods can improve search performance, reduce storage requirements, or adapt embeddings for specific use
cases.&lt;/p>
&lt;p>Currently, the postprocessing module includes MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) for speeding up search over multi-vector
embeddings. Additional postprocessing techniques are planned for future releases.&lt;/p>
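To preview why this kind of postprocessing helps: multi-vector search is expensive, so a common trick is to collapse each token matrix into one fixed-size vector for coarse retrieval and keep the original multivectors only for reranking. Below is a toy NumPy sketch of that idea; mean pooling stands in for the real fixed-dimensional encoding, which is more involved:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-vector "documents": each is a (num_tokens, dim) matrix,
# as a ColBERT-style model would produce. Shapes are arbitrary.
docs = [rng.normal(size=(n, 8)) for n in (3, 5, 4)]
query = rng.normal(size=(4, 8))

def pool(mv: np.ndarray) -> np.ndarray:
    # Stand-in for a fixed-dimensional encoding: mean-pool the token
    # vectors into one unit-length vector.
    v = mv.mean(axis=0)
    return v / np.linalg.norm(v)

def maxsim(q: np.ndarray, d: np.ndarray) -> float:
    # Late-interaction (MaxSim) score on the original multivectors:
    # each query token takes its best-matching document token.
    return float((q @ d.T).max(axis=1).sum())

# Stage 1: fast single-vector retrieval over the pooled approximations.
q_pooled = pool(query)
coarse = sorted(range(len(docs)), key=lambda i: q_pooled @ pool(docs[i]), reverse=True)
candidates = coarse[:2]

# Stage 2: exact MaxSim reranking on the small candidate set only.
best = max(candidates, key=lambda i: maxsim(query, docs[i]))
print(best)
```

The coarse stage is compatible with ordinary single-vector indexes such as HNSW, while the costly MaxSim computation runs only on the shortlist.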
&lt;h2 id="muvera">MUVERA&lt;/h2>
&lt;p>MUVERA transforms variable-length sequences of vectors into fixed-dimensional single-vector representations. These
approximations can be used for fast initial retrieval using traditional vector search methods like HNSW. Once you&amp;rsquo;ve
retrieved a small set of candidates quickly, you can then rerank them using the original multi-vector representations
for maximum accuracy.&lt;/p></description></item></channel></rss>