<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Search Engineering on Qdrant - Vector Database</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/</link><description>Recent content in Search Engineering on Qdrant - Vector Database</description><generator>Hugo</generator><language>en-us</language><managingEditor>info@qdrant.tech (Andrey Vasnetsov)</managingEditor><webMaster>info@qdrant.tech (Andrey Vasnetsov)</webMaster><atom:link href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/index.xml" rel="self" type="application/rss+xml"/><item><title>Hybrid Search with Reranking</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/reranking-hybrid-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/reranking-hybrid-search/</guid><description>&lt;h1 id="qdrant-hybrid-search-with-reranking">Qdrant Hybrid Search with Reranking&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 40 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Hybrid search combines dense and sparse retrieval to deliver precise and comprehensive results. By adding reranking with ColBERT, you can further refine search outputs for maximum relevance.&lt;/p>
&lt;p>In this guide, we’ll show you how to implement hybrid search with reranking in Qdrant, leveraging dense, sparse, and late interaction embeddings to create an efficient, high-accuracy search system. Let’s get started!&lt;/p>
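&lt;p>Before diving in, the fusion step at the heart of hybrid search can be sketched in plain Python. The snippet below implements reciprocal rank fusion (RRF) over two ranked result lists; the document IDs are invented placeholders, and a real pipeline would obtain these lists from dense and sparse queries before passing the fused candidates to the ColBERT reranker.&lt;/p>

```python
# Reciprocal Rank Fusion (RRF): merge dense and sparse result lists
# into a single candidate list before the reranking stage.
# Document IDs are illustrative placeholders, not real data.

def rrf_fuse(rankings, k=60):
    """Score each document by the sum of 1 / (k + rank) over all rankings."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_a", "doc_b", "doc_c"]   # from dense vector search
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # from sparse (e.g. BM25-style) search
candidates = rrf_fuse([dense_hits, sparse_hits])
print(candidates)  # doc_b and doc_a lead: they appear in both lists
```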
&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>Let’s start by breaking down the architecture:&lt;/p></description></item><item><title>Multivectors and Late Interaction</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/using-multivector-representations/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/using-multivector-representations/</guid><description>&lt;h1 id="multivector-representations-for-reranking-in-qdrant">Multivector Representations for Reranking in Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Multivector representations are one of the most powerful features of Qdrant. However, most people don&amp;rsquo;t use them effectively, resulting in massive RAM overhead, slow inserts, and wasted compute.&lt;/p>
&lt;p>In this tutorial, you&amp;rsquo;ll discover how to effectively use multivector representations in Qdrant.&lt;/p>
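&lt;p>To see why multivectors matter, here is a hand-made comparison between a single pooled vector and a late-interaction (MaxSim) score computed over token-level vectors; the 2-d token embeddings are invented purely for illustration.&lt;/p>

```python
# Toy comparison: single pooled vector vs multivector (late interaction).
# The 2-d token embeddings are hand-made, purely illustrative values.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_pool(tokens):
    """Collapse token embeddings into one vector (single-vector approach)."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]

def max_sim(query_tokens, doc_tokens):
    """Late-interaction score: each query token keeps its best match."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

query = [[1.0, 0.0], [0.0, 1.0]]             # two query token vectors
doc = [[1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # three document token vectors

pooled_score = dot(mean_pool(query), mean_pool(doc))  # pooling averages away detail
maxsim_score = max_sim(query, doc)                    # per-token matches survive
print(pooled_score, maxsim_score)
```

Both query tokens have exact matches in the document, which MaxSim rewards, while the pooled score is diluted by the averaged-out third token.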
&lt;h2 id="what-are-multivector-representations">What are Multivector Representations?&lt;/h2>
&lt;p>In most vector engines, each document is represented by a single vector &amp;ndash; an approach that works well for short texts but often struggles with longer documents. Single-vector representations pool the token-level embeddings, which inevitably discards some information.&lt;/p></description></item><item><title>Semantic Search Basics</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/neural-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/neural-search/</guid><description>&lt;h1 id="semantic-search-basics-with-qdrant">Semantic Search Basics with Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Beginner&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/qdrant_demo/tree/sentense-transformers" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;th>&lt;a href="https://colab.research.google.com/drive/1kPktoudAP8Tu8n8l-iVMOQhVmHkWV_L9?usp=sharing" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>This tutorial shows you how to build and deploy your own neural search service to look through descriptions of companies from &lt;a href="https://www.startups-list.com/" target="_blank" rel="noopener nofollow">startups-list.com&lt;/a> and pick the most similar ones to your query. The website contains the company names, descriptions, locations, and a picture for each entry.&lt;/p>
&lt;p>A neural search service uses artificial neural networks to improve the accuracy and relevance of search results. Beyond simple keyword matching, such a system can retrieve results by meaning. It can understand and interpret complex search queries and provide more contextually relevant output, effectively enhancing the user&amp;rsquo;s search experience.&lt;/p></description></item><item><title>Semantic Search for Code</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/code-search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/code-search/</guid><description>&lt;h1 id="semantic-search-for-code-with-qdrant">Semantic Search for Code with Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 45 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>&lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/code-search/code-search.ipynb" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>You too can enrich your applications with Qdrant semantic search. In this
tutorial, we describe how to use Qdrant to navigate a codebase and find
relevant code snippets. As an example, we will use the &lt;a href="https://github.com/qdrant/qdrant" target="_blank" rel="noopener nofollow">Qdrant&lt;/a>
source code itself, which is mostly written in Rust.&lt;/p>
&lt;aside role="status">This tutorial may not work well on codebases that lack structure or discipline. For good code search results, you may need to refactor the project first.&lt;/aside>
&lt;h2 id="the-approach">The approach&lt;/h2>
&lt;p>We want to search codebases using natural semantic queries, and to search for code based on similar logic. You can set up these tasks with embeddings:&lt;/p></description></item><item><title>Collaborative Filtering</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/collaborative-filtering/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/collaborative-filtering/</guid><description>&lt;h1 id="build-a-recommendation-system-with-collaborative-filtering-using-qdrant">Build a Recommendation System with Collaborative Filtering using Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 45 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>&lt;a href="https://githubtocolab.com/qdrant/examples/blob/master/collaborative-filtering/collaborative-filtering.ipynb" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Every time Spotify recommends the next song from a band you&amp;rsquo;ve never heard of, it uses a recommendation algorithm based on other users&amp;rsquo; interactions with that song. This type of algorithm is known as &lt;strong>collaborative filtering&lt;/strong>.&lt;/p>
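&lt;p>In miniature, the idea can be sketched in a few lines of Python; the users, songs, and &amp;ldquo;liked&amp;rdquo; sets below are invented toy data standing in for real interaction logs.&lt;/p>

```python
# Collaborative filtering in miniature: recommend a song to a user based
# on what the most similar other user liked. All data here is invented.

likes = {
    "alice": {"song_1", "song_2", "song_3"},
    "bob":   {"song_2", "song_3", "song_4"},
    "carol": {"song_5"},
}

def jaccard(a, b):
    """Overlap of two interaction sets, 0.0 to 1.0."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0

def recommend(user):
    """Pick the most similar other user and suggest a song the user lacks."""
    others = [u for u in likes if u != user]
    nearest = max(others, key=lambda u: jaccard(likes[user], likes[u]))
    new_songs = likes[nearest] - likes[user]
    return min(new_songs) if new_songs else None

print(recommend("alice"))  # bob shares two songs with alice; suggests song_4
```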
&lt;p>Unlike content-based recommendations, collaborative filtering excels when the objects&amp;rsquo; semantics are only loosely related, or entirely unrelated, to users&amp;rsquo; preferences. This adaptability is what makes it so fascinating. Movie, music, or book recommendations are good examples of such use cases. After all, we rarely choose which book to read purely based on the plot twists.&lt;/p></description></item><item><title>Hybrid Search with FastEmbed</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/hybrid-search-fastembed/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/hybrid-search-fastembed/</guid><description>&lt;h1 id="hybrid-search-with-qdrants-fastembed">Hybrid Search with Qdrant&amp;rsquo;s FastEmbed&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 20 min&lt;/th>
 &lt;th>Level: Beginner&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/qdrant_demo/" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>This tutorial shows you how to build and deploy your own hybrid search service to look through descriptions of companies from &lt;a href="https://www.startups-list.com/" target="_blank" rel="noopener nofollow">startups-list.com&lt;/a> and pick the most similar ones to your query.
The website contains the company names, descriptions, locations, and a picture for each entry.&lt;/p>
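&lt;p>To make the distinction concrete before we start, here is a toy, side-by-side sketch of the two kinds of representation this tutorial combines; the vocabulary, term weights, and 4-d dense vector are hand-made stand-ins for real model output.&lt;/p>

```python
# Dense vs sparse representations, side by side. The vocabulary and all
# numeric values are invented stand-ins for real embedding model output.
from collections import Counter

vocabulary = {"vector": 0, "search": 1, "engine": 2, "startup": 3, "pizza": 4}

def sparse_embed(text):
    """Sparse vector: only non-zero dimensions, as term index -> weight.
    Here the weight is a raw term count, standing in for e.g. BM25 scores."""
    counts = Counter(t for t in text.lower().split() if t in vocabulary)
    return {vocabulary[t]: float(c) for t, c in counts.items()}

sparse = sparse_embed("vector search engine for startup search")
dense = [0.12, -0.48, 0.33, 0.71]  # every dimension populated, opaque meaning

print(sparse)      # {0: 1.0, 1: 2.0, 2: 1.0, 3: 1.0}
print(len(dense))  # 4
```

The sparse vector keeps an interpretable link to vocabulary terms, which is what ties it to traditional full-text search; the dense vector encodes meaning in every dimension.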
&lt;p>As we have already written on our &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/hybrid-search/">blog&lt;/a>, there is no single definition of hybrid search.
In this tutorial we are covering the case with a combination of dense and &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/sparse-vectors/">sparse embeddings&lt;/a>.
The former are embeddings generated by well-known neural networks such as BERT, while the latter are closer to a traditional full-text search approach.&lt;/p></description></item><item><title>Multivector Document Retrieval</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/pdf-retrieval-at-scale/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/pdf-retrieval-at-scale/</guid><description>&lt;h1 id="qdrant-multivector-document-retrieval-with-colpalicolqwen">Qdrant Multivector Document Retrieval with ColPali/ColQwen&lt;/h1>
&lt;p>&lt;img src="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials/pdf-retrieval-at-scale/image1.png" alt="scaling-pdf-retrieval-qdrant">&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>Output: &lt;a href="https://github.com/qdrant/examples/blob/master/pdf-retrieval-at-scale/ColPali_ColQwen2_Tutorial.ipynb" target="_blank" rel="noopener nofollow">GitHub&lt;/a>&lt;/th>
 &lt;th>&lt;a href="https://githubtocolab.com/qdrant/examples/blob/master/pdf-retrieval-at-scale/ColPali_ColQwen2_Tutorial.ipynb" target="_blank" rel="noopener nofollow">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Efficient PDF document retrieval is a common requirement in tasks like &lt;strong>(agentic) retrieval-augmented generation (RAG)&lt;/strong> and many other search-based applications. At the same time, setting up PDF document retrieval rarely comes without additional challenges.&lt;/p>
&lt;p>Many traditional PDF retrieval solutions rely on &lt;strong>optical character recognition (OCR)&lt;/strong> together with use case-specific heuristics to handle visually complex elements like tables, images and charts. With their task-customized parsing and chunking strategies, these algorithms are often non-transferable &amp;ndash; even within the same domain &amp;ndash; as well as labor-intensive, error-prone, and difficult to scale.&lt;/p></description></item><item><title>Retrieval Quality Evaluation</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/retrieval-quality/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/retrieval-quality/</guid><description>&lt;h1 id="evaluate-retrieval-quality-with-qdrant">Evaluate Retrieval Quality with Qdrant&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 30 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;th>&lt;/th>
 &lt;th>&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>Semantic search pipelines are only as good as the embeddings they use. If your model cannot properly represent the input data, similar objects may
end up far away from each other in the vector space, and the search results will unsurprisingly be poor. There is, however, another
component of the pipeline that can also degrade the quality of the search results: the ANN algorithm itself.&lt;/p></description></item><item><title>Static Embeddings</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/static-embeddings/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/tutorials-search-engineering/static-embeddings/</guid><description>&lt;h1 id="static-embeddings-in-practice">Static Embeddings in Practice&lt;/h1>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Time: 20 min&lt;/th>
 &lt;th>Level: Intermediate&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;/tbody>
&lt;/table>
&lt;p>In the world of resource-constrained computing, a quiet revolution is taking place. While transformers dominate
leaderboards with their impressive capabilities, static embeddings are making an unexpected comeback, offering
remarkable speed improvements with surprisingly small quality trade-offs. &lt;strong>We evaluated how Qdrant users can benefit
from this renaissance, and the results are promising&lt;/strong>.&lt;/p>
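&lt;p>To make the idea concrete: a static embedding model is essentially a fixed lookup table plus pooling, with no attention and no context. The toy 2-d vectors below are invented for illustration.&lt;/p>

```python
# Static embeddings in miniature: a fixed lookup table plus averaging.
# The 2-d vectors are invented toy values, not a real model's weights.

table = {
    "river":     [0.9, 0.1],
    "bank":      [0.4, 0.6],
    "financial": [0.0, 1.0],
}

def embed(text):
    """Average the static vectors of known tokens: one table lookup per
    token, which is why inference is so fast."""
    vectors = [table[t] for t in text.lower().split() if t in table]
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]

# "bank" contributes the same vector in both phrases: the surrounding
# context cannot change it, unlike a transformer's token representations.
print(embed("river bank"))      # [0.65, 0.35]
print(embed("financial bank"))  # [0.2, 0.8]
```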
&lt;h2 id="what-makes-static-embeddings-different">What makes static embeddings different?&lt;/h2>
&lt;p>Transformers are often seen as the only way to go when it comes to embeddings. The use of attention mechanisms helps to
capture the relationships between the input tokens, so each token gets a vector representation that is context-aware
and defined not only by the token itself but also by the surrounding tokens. Transformer-based models easily beat the
quality of the older methods, such as word2vec or GloVe, which could only create a single vector embedding per
word. As a result, with those static methods the word &amp;ldquo;bank&amp;rdquo; would have an identical representation in the contexts of &amp;ldquo;river bank&amp;rdquo; and &amp;ldquo;financial
institution&amp;rdquo;.&lt;/p></description></item></channel></rss>