<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Qdrant Articles on Qdrant - Vector Database</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/</link><description>Recent content in Qdrant Articles on Qdrant - Vector Database</description><generator>Hugo</generator><language>en-us</language><managingEditor>info@qdrant.tech (Andrey Vasnetsov)</managingEditor><webMaster>info@qdrant.tech (Andrey Vasnetsov)</webMaster><lastBuildDate>Wed, 28 Jan 2026 00:00:00 -0800</lastBuildDate><atom:link href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/index.xml" rel="self" type="application/rss+xml"/><item><title>Distance-based data exploration</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/distance-based-exploration/</link><pubDate>Tue, 11 Mar 2025 12:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/distance-based-exploration/</guid><description>&lt;h2 id="hidden-structure">Hidden Structure&lt;/h2>
&lt;p>When working with large collections of documents, images, or other arrays of unstructured data, it often becomes useful to understand the big picture.
Examining data points individually is not always the best way to grasp the structure of the data.&lt;/p>
&lt;figure>&lt;img src="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles_data/distance-based-exploration/no-context-data.png"
 alt="Data visualization">&lt;figcaption>
 &lt;p>Data points without context are pretty much useless&lt;/p>
 &lt;/figcaption>
&lt;/figure>

&lt;p>Just as numbers in a table gain meaning when plotted on a graph, visualizing distances (similar vs. dissimilar) between unstructured data items can reveal hidden structures and patterns.&lt;/p></description></item><item><title>Modern Sparse Neural Retrieval: From Theory to Practice</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/modern-sparse-neural-retrieval/</link><pubDate>Wed, 23 Oct 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/modern-sparse-neural-retrieval/</guid><description>&lt;p>Finding enough time to study all the modern solutions while keeping your production running is rarely feasible.
Dense retrievers, hybrid retrievers, late interaction… How do they work, and where do they fit best?
If only we could compare retrievers as easily as products on Amazon!&lt;/p>
&lt;p>We explored the most popular modern sparse neural retrieval models and broke them down for you.
By the end of this article, you’ll have a clear understanding of the current landscape in sparse neural retrieval and how to navigate through complex, math-heavy research papers with sky-high NDCG scores without getting overwhelmed.&lt;/p></description></item><item><title>Qdrant Summer of Code 2024 - ONNX Cross Encoders in Python</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/cross-encoder-integration-gsoc/</link><pubDate>Mon, 14 Oct 2024 08:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/cross-encoder-integration-gsoc/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hi everyone! I’m Huong (Celine) Hoang, and I’m thrilled to share my experience working at Qdrant this summer as part of their Summer of Code 2024 program. During my internship, I worked on integrating cross-encoders into the FastEmbed library for re-ranking tasks. This enhancement widened the capabilities of the Qdrant ecosystem, enabling developers to build more context-aware search applications, such as question-answering systems, using Qdrant&amp;rsquo;s suite of libraries.&lt;/p>
&lt;p>This project was both technically challenging and rewarding, pushing me to grow my skills in handling large-scale ONNX (Open Neural Network Exchange) model integrations, tokenization, and more. Let me take you through the journey, the lessons learned, and where things are headed next.&lt;/p></description></item><item><title>What is a Vector Database?</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/what-is-a-vector-database/</link><pubDate>Wed, 09 Oct 2024 09:29:33 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/what-is-a-vector-database/</guid><description>&lt;h2 id="an-introduction-to-vector-databases">An Introduction to Vector Databases&lt;/h2>
&lt;p>&lt;img src="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles_data/what-is-a-vector-database/vector-database-1.jpeg" alt="vector-database-architecture">&lt;/p>
&lt;p>Most of the millions of terabytes of data we generate each day is &lt;strong>unstructured&lt;/strong>. Think of the meal photos you snap, the PDFs shared at work, or the podcasts you save but may never listen to. None of it fits neatly into rows and columns.&lt;/p>
&lt;p>Unstructured data lacks a strict format or schema, making it challenging for conventional databases to manage. Yet, this unstructured data holds immense potential for &lt;strong>AI&lt;/strong>, &lt;strong>machine learning&lt;/strong>, and &lt;strong>modern search engines&lt;/strong>.&lt;/p></description></item><item><title>What is Vector Quantization?</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/what-is-vector-quantization/</link><pubDate>Wed, 25 Sep 2024 09:29:33 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/what-is-vector-quantization/</guid><description>&lt;p>Vector quantization is a data compression technique used to reduce the size of high-dimensional data. Compressing vectors reduces memory usage while maintaining nearly all of the essential information. This method allows for more efficient storage and faster search operations, particularly in large datasets.&lt;/p>
&lt;p>When working with high-dimensional vectors, such as embeddings from providers like OpenAI, a single 1536-dimensional vector requires &lt;strong>6 KB of memory&lt;/strong>.&lt;/p>
&lt;img src="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles_data/what-is-vector-quantization/vector-size.png" alt="1536-dimensional vector size is 6 KB" width="700">
&lt;p>With 1 million vectors needing around 6 GB of memory, as your dataset grows to multiple &lt;strong>millions of vectors&lt;/strong>, the memory and processing demands increase significantly.&lt;/p></description></item><item><title>Two Approaches to Helping AI Agents Use Your API (And Why You Need Both)</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/skill-md-meets-repl/</link><pubDate>Wed, 28 Jan 2026 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/skill-md-meets-repl/</guid><description>&lt;p>AI coding agents fail in predictable ways when working with APIs. Two recent approaches from Mintlify and Armin Ronacher attack different failure modes. Understanding both reveals something useful about how agents should interact with developer tools.&lt;/p>
&lt;h2 id="two-failure-modes">Two Failure Modes&lt;/h2>
&lt;p>When an agent writes code against your API, it can fail because:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>It doesn&amp;rsquo;t know what it doesn&amp;rsquo;t know.&lt;/strong> The agent uses a deprecated method, misconfigures a parameter, or violates a constraint that isn&amp;rsquo;t obvious from type signatures. This is the &amp;ldquo;unknown unknowns&amp;rdquo; problem: things the API maintainer knows but the agent doesn&amp;rsquo;t.&lt;/p></description></item><item><title>Vector Search Resource Optimization Guide</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/vector-search-resource-optimization/</link><pubDate>Sun, 09 Feb 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/vector-search-resource-optimization/</guid><description>&lt;h2 id="whats-in-this-guide">What’s in This Guide?&lt;/h2>
&lt;p>&lt;a href="#storage-disk-vs-ram">&lt;strong>Resource Management Strategies:&lt;/strong>&lt;/a> If you are trying to scale your app on a budget - this is the guide for you. We will show you how to avoid wasting compute resources and get the maximum return on your investment.&lt;/p>
&lt;p>&lt;a href="#configure-indexing-for-faster-searches">&lt;strong>Performance Improvement Tricks:&lt;/strong>&lt;/a> We’ll dive into advanced techniques like indexing, compression, and partitioning. Our tips will help you get better results at scale, while reducing total resource expenditure.&lt;/p></description></item><item><title>A Complete Guide to Filtering in Vector Search</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/vector-search-filtering/</link><pubDate>Tue, 10 Sep 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/vector-search-filtering/</guid><description>&lt;p>Imagine you sell computer hardware. To help shoppers easily find products on your website, you need to have a &lt;strong>user-friendly &lt;a href="https://qdrant.tech" target="_blank" rel="noopener nofollow">search engine&lt;/a>&lt;/strong>.&lt;/p>
&lt;p>&lt;img src="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles_data/vector-search-filtering/vector-search-ecommerce.png" alt="vector-search-ecommerce">&lt;/p>
&lt;p>If you’re selling computers and have extensive data on laptops, desktops, and accessories, your search feature should guide customers to the exact device they want - or at least a &lt;strong>very similar&lt;/strong> match.&lt;/p>
&lt;p>When storing data in Qdrant, each product is a point, consisting of an &lt;code>id&lt;/code>, a &lt;code>vector&lt;/code> and &lt;code>payload&lt;/code>:&lt;/p></description></item><item><title>Qdrant Internals: Immutable Data Structures</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/immutable-data-structures/</link><pubDate>Tue, 20 Aug 2024 10:45:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/immutable-data-structures/</guid><description>&lt;h2 id="data-structures-101">Data Structures 101&lt;/h2>
&lt;p>Those who took programming courses might remember that there is no such thing as a universal data structure.
Some structures are good at accessing elements by index (like arrays), while others shine in terms of insertion efficiency (like linked lists).&lt;/p>
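&lt;p>A tiny illustration of this classic trade-off (a generic Python sketch, unrelated to Qdrant&amp;rsquo;s internals):&lt;/p>

```python
from collections import deque

# Array-like list: O(1) access by index, but inserting at the
# front shifts every element, costing O(n).
arr = [1, 2, 3, 4]
assert arr[2] == 3       # constant-time random access
arr.insert(0, 0)         # linear-time front insertion
assert arr == [0, 1, 2, 3, 4]

# Deque (linked blocks): O(1) insertion at either end,
# but indexing into the middle is no longer constant time.
lst = deque([1, 2, 3, 4])
lst.appendleft(0)        # constant-time front insertion
assert list(lst) == [0, 1, 2, 3, 4]
```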
&lt;figure>&lt;img src="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles_data/immutable-data-structures/hardware-optimized.png"
 alt="Hardware-optimized data structure" width="80%">&lt;figcaption>
 &lt;p>Hardware-optimized data structure&lt;/p>
 &lt;/figcaption>
&lt;/figure>

&lt;p>However, when we move from theoretical data structures to real-world systems, and particularly in performance-critical areas such as &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/use-cases/">vector search&lt;/a>, things become more complex. &lt;a href="https://en.wikipedia.org/wiki/Big_O_notation" target="_blank" rel="noopener nofollow">Big-O notation&lt;/a> provides a good abstraction, but it doesn’t account for the realities of modern hardware: cache misses, memory layout, disk I/O, and other low-level considerations that influence actual performance.&lt;/p></description></item><item><title>miniCOIL: on the Road to Usable Sparse Neural Retrieval</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/minicoil/</link><pubDate>Tue, 13 May 2025 00:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/minicoil/</guid><description>&lt;p>Have you ever heard of sparse neural retrieval? If so, have you used it in production?&lt;/p>
&lt;p>It&amp;rsquo;s a field with excellent potential &amp;ndash; who wouldn&amp;rsquo;t want to use an approach that combines the strengths of dense and term-based text retrieval? Yet it&amp;rsquo;s not so popular. Is it due to the common curse of &lt;em>“What looks good on paper is not going to work in practice”&lt;/em>?&lt;/p>
&lt;p>This article describes our path towards sparse neural retrieval &lt;em>as it should be&lt;/em> &amp;ndash; lightweight term-based retrievers capable of distinguishing word meanings.&lt;/p></description></item><item><title>Relevance Feedback in Information Retrieval</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/search-feedback-loop/</link><pubDate>Thu, 27 Mar 2025 00:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/search-feedback-loop/</guid><description>&lt;blockquote>
&lt;p>A problem well stated is a problem half solved.&lt;/p>
&lt;/blockquote>
&lt;p>This quote applies as much to life as it does to information retrieval.&lt;/p>
&lt;p>With a well-formulated query, retrieving the relevant document becomes trivial.
In reality, however, most users struggle to precisely define what they are searching for.&lt;/p>
&lt;p>While users may struggle to formulate a perfect request — especially in unfamiliar topics — they can easily judge whether a retrieved answer is relevant or not.&lt;/p></description></item><item><title>Built for Vector Search</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/dedicated-vector-search/</link><pubDate>Mon, 17 Feb 2025 10:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/dedicated-vector-search/</guid><description>&lt;p>Any problem with even a bit of complexity requires a specialized solution. You can use a Swiss Army knife to open a bottle or poke a hole in a cardboard box, but you will need an axe to chop wood — the same goes for software.&lt;/p>
&lt;p>In this article, we will describe the unique challenges vector search poses and why a dedicated solution is the best way to tackle them.&lt;/p></description></item><item><title>Any* Embedding Model Can Become a Late Interaction Model... If You Give It a Chance!</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/late-interaction-models/</link><pubDate>Wed, 14 Aug 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/late-interaction-models/</guid><description>&lt;p>* At least any open-source model, since you need access to its internals.&lt;/p>
&lt;h2 id="you-can-adapt-dense-embedding-models-for-late-interaction">You Can Adapt Dense Embedding Models for Late Interaction&lt;/h2>
&lt;p>Qdrant 1.10 introduced support for multi-vector representations, with late interaction being a prominent example of this model. In essence, both documents and queries are represented by multiple vectors, and identifying the most relevant documents involves calculating a score based on the similarity between the corresponding query and document embeddings. If you&amp;rsquo;re not familiar with this paradigm, our updated &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/hybrid-search/">Hybrid Search&lt;/a> article explains how multi-vector representations can enhance retrieval quality.&lt;/p></description></item><item><title>Optimizing Memory for Bulk Uploads</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/indexing-optimization/</link><pubDate>Thu, 13 Feb 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/indexing-optimization/</guid><description>&lt;h1 id="optimizing-memory-consumption-during-bulk-uploads">Optimizing Memory Consumption During Bulk Uploads&lt;/h1>
&lt;p>Efficient memory management is a constant challenge when you’re dealing with &lt;strong>large-scale vector data&lt;/strong>. In high-volume ingestion scenarios, even seemingly minor configuration choices can significantly impact stability and performance.&lt;/p>
&lt;p>Let’s take a look at the best practices and recommendations to help you optimize memory usage during bulk uploads in Qdrant. We&amp;rsquo;ll cover scenarios with both &lt;strong>dense&lt;/strong> and &lt;strong>sparse&lt;/strong> vectors, helping your deployments remain performant even under high load and avoiding out-of-memory errors.&lt;/p></description></item><item><title>Introducing Gridstore: Qdrant's Custom Key-Value Store</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/gridstore-key-value-storage/</link><pubDate>Wed, 05 Feb 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/gridstore-key-value-storage/</guid><description>&lt;h2 id="why-we-built-our-own-storage-engine">Why We Built Our Own Storage Engine&lt;/h2>
&lt;p>Databases need a place to store and retrieve data. That’s what Qdrant&amp;rsquo;s &lt;a href="https://en.wikipedia.org/wiki/Key%e2%80%93value_database" target="_blank" rel="noopener nofollow">&lt;strong>key-value storage&lt;/strong>&lt;/a> does—it links keys to values.&lt;/p>
&lt;p>When we started building Qdrant, we needed to pick something ready for the task. So we chose &lt;a href="https://rocksdb.org" target="_blank" rel="noopener nofollow">&lt;strong>RocksDB&lt;/strong>&lt;/a> as our embedded key-value store.&lt;/p>
&lt;div style="text-align: center;">
 &lt;img src="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles_data/gridstore-key-value-storage/rocksdb.jpg" alt="RocksDB" style="width: 50%;">
 &lt;p>It is mature, reliable, and well-documented.&lt;/p>
&lt;/div>
&lt;p>Over time, we ran into issues. Its architecture required compaction (it is built on an &lt;a href="https://en.wikipedia.org/wiki/Log-structured_merge-tree" target="_blank" rel="noopener nofollow">LSM tree&lt;/a>), which caused random latency spikes. It handles generic keys, while we only use it for sequential IDs. Having lots of configuration options makes it versatile, but tuning it accurately was a headache. Finally, interoperating with C++ slowed us down (although we will still support it for quite some time 😭).&lt;/p></description></item><item><title>What is Agentic RAG? Building Agents with Qdrant</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/agentic-rag/</link><pubDate>Fri, 22 Nov 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/agentic-rag/</guid><description>&lt;p>Standard &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/what-is-rag-in-ai/">Retrieval Augmented Generation&lt;/a> follows a predictable, linear path: receive
a query, retrieve relevant documents, and generate a response. In many cases that might be enough to solve a particular
problem. In the worst-case scenario, your LLM will simply decline to answer the question because the context does not
provide enough information.&lt;/p>
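&lt;p>The linear flow can be summed up in a few lines (a hypothetical sketch; &lt;code>retrieve&lt;/code> and &lt;code>generate&lt;/code> stand in for your retriever and LLM call):&lt;/p>

```python
# Standard linear RAG: one retrieval step, one generation step, no loops.
def rag_answer(query, retrieve, generate):
    docs = retrieve(query)           # fetch relevant documents once
    return generate(query, docs)     # generate an answer from that context

# Toy stand-ins for a real retriever and LLM.
answer = rag_answer(
    "What is a vector database?",
    retrieve=lambda q: ["A vector database stores embeddings."],
    generate=lambda q, docs: f"Answer based on {len(docs)} document(s).",
)
print(answer)  # Answer based on 1 document(s).
```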
&lt;p>&lt;img src="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles_data/agentic-rag/linear-rag.png" alt="Standard, linear RAG pipeline">&lt;/p>
&lt;p>On the other hand, we have agents. These systems are given more freedom to act, and can take multiple non-linear steps
to achieve a certain goal. There isn&amp;rsquo;t a single definition of what an agent is, but in general, it is an application
that uses an LLM and, usually, some tools to communicate with the outside world. The LLM acts as a decision-maker that
chooses the next action. Actions can be anything, but they are usually well-defined and limited to a certain
set of possibilities. One of these actions might be to query a vector database, like Qdrant, to retrieve relevant
documents, if the context is not enough to make a decision. However, RAG is just a single tool in the agent&amp;rsquo;s arsenal.&lt;/p></description></item><item><title>Hybrid Search Revamped - Building with Qdrant's Query API</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/hybrid-search/</link><pubDate>Thu, 25 Jul 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/hybrid-search/</guid><description>&lt;p>It&amp;rsquo;s been over a year since we published the original article on how to build a hybrid
search system with Qdrant. The idea was straightforward: combine the results from different search methods to improve
retrieval quality. Back in 2023, you still needed to use an additional service to bring lexical search
capabilities and combine all the intermediate results. Things have changed since then. Once we introduced support for
sparse vectors, &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/sparse-vectors/">the additional search service became obsolete&lt;/a>, but you were still
required to combine the results from different methods on your end.&lt;/p></description></item><item><title>What is RAG: Understanding Retrieval-Augmented Generation</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/what-is-rag-in-ai/</link><pubDate>Tue, 19 Mar 2024 09:29:33 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/what-is-rag-in-ai/</guid><description>&lt;blockquote>
&lt;p>Retrieval-augmented generation (RAG) integrates external information retrieval into the process of generating responses by Large Language Models (LLMs). It searches a database for information beyond its pre-trained knowledge base, significantly improving the accuracy and relevance of the generated responses.&lt;/p>
&lt;/blockquote>
&lt;p>Language models have exploded on the internet ever since ChatGPT came out, and rightfully so. They can write essays, code entire programs, and even make memes (though we’re still deciding on whether that&amp;rsquo;s a good thing).&lt;/p></description></item><item><title>BM42: New Baseline for Hybrid Search</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/bm42/</link><pubDate>Mon, 01 Jul 2024 12:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/bm42/</guid><description>&lt;aside role="status">
Please note that the benchmark section of this article was updated after the publication due to a mistake in the evaluation script.
BM42 does not outperform the BM25 implementations of other vendors.
Please consider BM42 as an experimental approach, which requires further research and development before it can be used in production.
&lt;/aside>
&lt;p>For the last 40 years, BM25 has served as the standard for search engines.
It is a simple yet powerful algorithm that has been used by many search engines, including Google, Bing, and Yahoo.&lt;/p></description></item><item><title>Qdrant 1.8.0: Enhanced Search Capabilities for Better Results</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-1.8.x/</link><pubDate>Wed, 06 Mar 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-1.8.x/</guid><description>&lt;h1 id="unlocking-next-level-search-exploring-qdrant-180s-advanced-search-capabilities">Unlocking Next-Level Search: Exploring Qdrant 1.8.0&amp;rsquo;s Advanced Search Capabilities&lt;/h1>
&lt;p>&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.8.0" target="_blank" rel="noopener nofollow">Qdrant 1.8.0 is out!&lt;/a>.
This time around, we have focused on Qdrant&amp;rsquo;s internals. Our goal was to optimize performance so that your existing setup can run faster and save on compute. Here is what we&amp;rsquo;ve been up to:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Faster &lt;a href="https://qdrant.tech/articles/sparse-vectors/" target="_blank" rel="noopener nofollow">sparse vectors&lt;/a>:&lt;/strong> &lt;a href="https://qdrant.tech/articles/hybrid-search/" target="_blank" rel="noopener nofollow">Hybrid search&lt;/a> is up to 16x faster now!&lt;/li>
&lt;li>&lt;strong>CPU resource management:&lt;/strong> You can allocate CPU threads for faster indexing.&lt;/li>
&lt;li>&lt;strong>Better indexing performance:&lt;/strong> We optimized text &lt;a href="https://qdrant.tech/documentation/concepts/indexing/" target="_blank" rel="noopener nofollow">indexing&lt;/a> on the backend.&lt;/li>
&lt;/ul>
&lt;h2 id="faster-search-with-sparse-vectors">Faster search with sparse vectors&lt;/h2>
&lt;p>Search throughput is now up to 16 times faster for sparse vectors. If you are &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/sparse-vectors/">using Qdrant for hybrid search&lt;/a>, this means that you can now handle up to sixteen times as many queries. This improvement comes from extensive backend optimizations aimed at increasing efficiency and capacity.&lt;/p></description></item><item><title>Optimizing RAG Through an Evaluation-Based Methodology</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/rapid-rag-optimization-with-qdrant-and-quotient/</link><pubDate>Wed, 12 Jun 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/rapid-rag-optimization-with-qdrant-and-quotient/</guid><description>&lt;p>In today&amp;rsquo;s fast-paced, information-rich world, AI is revolutionizing knowledge management. The systematic process of capturing, distributing, and effectively using knowledge within an organization is one of the fields in which AI provides exceptional value today.&lt;/p>
&lt;blockquote>
&lt;p>The potential for AI-powered knowledge management increases when leveraging &lt;a href="https://qdrant.tech/rag/rag-evaluation-guide/" target="_blank" rel="noopener nofollow">Retrieval Augmented Generation (RAG)&lt;/a>, a methodology that enables LLMs to access a vast, diverse repository of factual information from knowledge stores, such as vector databases.&lt;/p>
&lt;/blockquote>
&lt;p>This process enhances the accuracy, relevance, and reliability of generated text, thereby mitigating the risk of faulty, incorrect, or nonsensical results sometimes associated with traditional LLMs. This method ensures that the answers are not only contextually relevant but also up-to-date, reflecting the latest insights and data available.&lt;/p></description></item><item><title>Is RAG Dead? The Role of Vector Databases in Vector Search | Qdrant</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/rag-is-dead/</link><pubDate>Tue, 27 Feb 2024 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/rag-is-dead/</guid><description>&lt;h1 id="is-rag-dead-the-role-of-vector-databases-in-ai-efficiency-and-vector-search">Is RAG Dead? The Role of Vector Databases in AI Efficiency and Vector Search&lt;/h1>
&lt;p>When Anthropic came out with a context window of 100K tokens, they said: “&lt;em>&lt;a href="https://qdrant.tech/solutions/" target="_blank" rel="noopener nofollow">Vector search&lt;/a> is dead. LLMs are getting more accurate and won’t need RAG anymore.&lt;/em>”&lt;/p>
&lt;p>Google’s Gemini 1.5 now offers a context window of 10 million tokens. &lt;a href="https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf" target="_blank" rel="noopener nofollow">Their supporting paper&lt;/a> claims victory over accuracy issues, even when applying Greg Kamradt’s &lt;a href="https://twitter.com/GregKamradt/status/1722386725635580292" target="_blank" rel="noopener nofollow">NIAH methodology&lt;/a>.&lt;/p>
&lt;p>&lt;em>It’s over. &lt;a href="https://qdrant.tech/articles/what-is-rag-in-ai/" target="_blank" rel="noopener nofollow">RAG&lt;/a> (Retrieval Augmented Generation) must be completely obsolete now. Right?&lt;/em>&lt;/p></description></item><item><title>Optimizing OpenAI Embeddings: Enhance Efficiency with Qdrant's Binary Quantization</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/binary-quantization-openai/</link><pubDate>Wed, 21 Feb 2024 13:12:08 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/binary-quantization-openai/</guid><description>&lt;p>OpenAI Ada-003 embeddings are a powerful tool for natural language processing (NLP). However, the size of the embeddings is a challenge, especially with real-time search and retrieval. In this article, we explore how you can use Qdrant&amp;rsquo;s Binary Quantization to enhance the performance and efficiency of OpenAI embeddings.&lt;/p>
&lt;p>In this post, we discuss:&lt;/p>
&lt;ul>
&lt;li>The significance of OpenAI embeddings and real-world challenges.&lt;/li>
&lt;li>Qdrant&amp;rsquo;s Binary Quantization, and how it can improve the performance of OpenAI embeddings.&lt;/li>
&lt;li>Results of an experiment that highlights improvements in search efficiency and accuracy.&lt;/li>
&lt;li>Implications of these findings for real-world applications.&lt;/li>
&lt;li>Best practices for leveraging Binary Quantization to enhance OpenAI embeddings.&lt;/li>
&lt;/ul>
&lt;p>If you&amp;rsquo;re new to Binary Quantization, consider reading our article which walks you through the concept and &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/binary-quantization/">how to use it with Qdrant&lt;/a>.&lt;/p></description></item><item><title>How to Implement Multitenancy and Custom Sharding in Qdrant</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/multitenancy/</link><pubDate>Tue, 06 Feb 2024 13:21:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/multitenancy/</guid><description>&lt;h1 id="scaling-your-machine-learning-setup-the-power-of-multitenancy-and-custom-sharding-in-qdrant">Scaling Your Machine Learning Setup: The Power of Multitenancy and Custom Sharding in Qdrant&lt;/h1>
&lt;p>We are seeing the topics of &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/guides/multiple-partitions/">multitenancy&lt;/a> and &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/guides/distributed_deployment/#sharding">distributed deployment&lt;/a> pop up daily on our &lt;a href="https://qdrant.to/discord" target="_blank" rel="noopener nofollow">Discord support channel&lt;/a>. This tells us that many of you are looking to scale Qdrant along with the rest of your machine learning setup.&lt;/p>
&lt;p>Whether you are building a bank fraud-detection system, &lt;a href="https://qdrant.tech/articles/what-is-rag-in-ai/" target="_blank" rel="noopener nofollow">RAG&lt;/a> for e-commerce, or services for the federal government, you will need to leverage a multitenant architecture to scale your product.
In the world of SaaS and enterprise apps, this setup is the norm. It will considerably increase your application&amp;rsquo;s performance and lower your hosting costs.&lt;/p></description></item><item><title> Data Privacy with Qdrant: Implementing Role-Based Access Control (RBAC)</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/data-privacy/</link><pubDate>Tue, 18 Jun 2024 08:00:00 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/data-privacy/</guid><description>&lt;p>Data stored in vector databases is often proprietary to the enterprise and may include sensitive information like customer records, legal contracts, electronic health records (EHR), financial data, and intellectual property. Moreover, strong security measures become critical to safeguarding this data. If the data stored in a vector database is not secured, it may open a vulnerability known as &amp;ldquo;&lt;a href="https://arxiv.org/abs/2004.00053" target="_blank" rel="noopener nofollow">embedding inversion attack&lt;/a>,&amp;rdquo; where malicious actors could potentially &lt;a href="https://arxiv.org/pdf/2305.03010" target="_blank" rel="noopener nofollow">reconstruct the original data from the embeddings&lt;/a> themselves.&lt;/p></description></item><item><title>Discovery needs context</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/discovery-search/</link><pubDate>Wed, 31 Jan 2024 08:00:00 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/discovery-search/</guid><description>&lt;h1 id="discovery-needs-context">Discovery needs context&lt;/h1>
&lt;p>When Christopher Columbus and his crew sailed to cross the Atlantic Ocean, they were not looking for the Americas. They were looking for a new route to India because they were convinced that the Earth was round. They didn&amp;rsquo;t know anything about a new continent, but since they were going west, they stumbled upon it.&lt;/p>
&lt;p>They couldn&amp;rsquo;t reach their &lt;em>target&lt;/em>, because the geography didn&amp;rsquo;t let them, but once they realized it wasn&amp;rsquo;t India, they claimed it a new &amp;ldquo;discovery&amp;rdquo; for their crown. If we consider that sailors need water to sail, then we can establish a &lt;em>context&lt;/em> which is positive in the water, and negative on land. Once the sailor&amp;rsquo;s search was stopped by the land, they could not go any further, and a new route was found. Let&amp;rsquo;s keep these concepts of &lt;em>target&lt;/em> and &lt;em>context&lt;/em> in mind as we explore the new functionality of Qdrant: &lt;strong>Discovery search&lt;/strong>.&lt;/p></description></item><item><title>What are Vector Embeddings? - Revolutionize Your Search Experience</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/what-are-embeddings/</link><pubDate>Tue, 06 Feb 2024 15:29:33 -0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/what-are-embeddings/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Embeddings&lt;/strong> are numerical machine learning representations of the semantics of the input data. They capture the meaning of complex, high-dimensional data, like text, images, or audio, in vectors, enabling algorithms to process and analyze the data more efficiently.&lt;/p>
&lt;/blockquote>
&lt;p>You know when you’re scrolling through your social media feeds and the content just feels incredibly tailored to you? There&amp;rsquo;s the news you care about, followed by a perfect tutorial with your favorite tech stack, and then a meme that makes you laugh so hard you snort.&lt;/p></description></item><item><title>What is a Sparse Vector? How to Achieve Vector-based Hybrid Search</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/sparse-vectors/</link><pubDate>Sat, 09 Dec 2023 13:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/sparse-vectors/</guid><description>&lt;p>Think of a library with a vast index card system. Each index card only has a few keywords marked out (sparse vector) of a large possible set for each book (document). This is what sparse vectors enable for text.&lt;/p>
&lt;h2 id="what-are-sparse-and-dense-vectors">What are sparse and dense vectors?&lt;/h2>
&lt;p>Sparse vectors are like the Marie Kondo of data—keeping only what sparks joy (or relevance, in this case).&lt;/p>
&lt;p>Consider a simplified example of 2 documents, each with 200 words. A dense vector would have several hundred non-zero values, whereas a sparse vector could have much fewer, say only 20, non-zero values.&lt;/p></description></item><item><title>Qdrant 1.7.0 has just landed!</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-1.7.x/</link><pubDate>Sun, 10 Dec 2023 10:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-1.7.x/</guid><description>&lt;p>Please welcome the long-awaited &lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.7.0" target="_blank" rel="noopener nofollow">Qdrant 1.7.0 release&lt;/a>. Besides a handful of minor fixes and improvements, this release brings some cool brand-new features that we are excited to share!
The latest version of your favorite vector search engine finally supports &lt;strong>sparse vectors&lt;/strong>. That&amp;rsquo;s a feature many of you requested, so how could we ignore it?
We also decided to continue our journey with &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/vector-similarity-beyond-search/">vector similarity beyond search&lt;/a>. The new Discovery API covers some utterly new use cases. We&amp;rsquo;re more than excited to see what you will build with it!
But there is more to it! Check out what&amp;rsquo;s new in &lt;strong>Qdrant 1.7.0&lt;/strong>!&lt;/p></description></item><item><title>Deliver Better Recommendations with Qdrant’s new API</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/new-recommendation-api/</link><pubDate>Wed, 25 Oct 2023 09:46:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/new-recommendation-api/</guid><description>&lt;p>The most popular use case for vector search engines, such as Qdrant, is Semantic search with a single query vector. Given the
query, we can vectorize (embed) it and find the closest points in the index. But &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/vector-similarity-beyond-search/">Vector Similarity beyond Search&lt;/a>
does exist, and recommendation systems are a great example. Recommendations might be seen as a multi-aim search, where we want
to find items close to positive and far from negative examples. This use of vector databases has many applications, including
recommendation systems for e-commerce, content, or even dating apps.&lt;/p></description></item><item><title>Vector Search as a dedicated service</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/dedicated-service/</link><pubDate>Thu, 30 Nov 2023 10:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/dedicated-service/</guid><description>&lt;p>Ever since the data science community discovered that vector search significantly improves LLM answers,
various vendors and enthusiasts have been arguing over the proper solutions to store embeddings.&lt;/p>
&lt;p>Some say storing them in a specialized engine (aka vector database) is better. Others say that it&amp;rsquo;s enough to use plugins for existing databases.&lt;/p>
&lt;p>Here are &lt;a href="https://nextword.substack.com/p/vector-database-is-not-a-separate" target="_blank" rel="noopener nofollow">just&lt;/a> a &lt;a href="https://stackoverflow.blog/2023/09/20/do-you-need-a-specialized-vector-database-to-implement-vector-search-well/" target="_blank" rel="noopener nofollow">few&lt;/a> of &lt;a href="https://www.singlestore.com/blog/why-your-vector-database-should-not-be-a-vector-database/" target="_blank" rel="noopener nofollow">them&lt;/a>.&lt;/p>
&lt;p>This article presents our vision and arguments on the topic.
We will:&lt;/p></description></item><item><title>FastEmbed: Qdrant's Efficient Python Library for Embedding Generation</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/fastembed/</link><pubDate>Wed, 18 Oct 2023 10:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/fastembed/</guid><description>&lt;p>Data Science and Machine Learning practitioners often find themselves navigating through a labyrinth of models, libraries, and frameworks. Which model to choose, what embedding size, and how to approach tokenizing, are just some questions you are faced with when starting your work. We understood how many data scientists wanted an easier and more intuitive means to do their embedding work. This is why we built FastEmbed, a Python library engineered for speed, efficiency, and usability. We have created easy to use default workflows, handling the 80% use cases in NLP embedding.&lt;/p></description></item><item><title>Google Summer of Code 2023 - Polygon Geo Filter for Qdrant Vector Database</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/geo-polygon-filter-gsoc/</link><pubDate>Thu, 12 Oct 2023 08:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/geo-polygon-filter-gsoc/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Greetings, I&amp;rsquo;m Zein Wen, and I was a Google Summer of Code 2023 participant at Qdrant. I got to work with an amazing mentor, Arnaud Gourlay, on enhancing the Qdrant Geo Polygon Filter. This new feature allows users to refine their query results using polygons. As the latest addition to the Geo Filter family of radius and rectangle filters, this enhancement promises greater flexibility in querying geo data, unlocking interesting new use cases.&lt;/p></description></item><item><title>Binary Quantization - Vector Search, 40x Faster</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/binary-quantization/</link><pubDate>Mon, 18 Sep 2023 13:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/binary-quantization/</guid><description>&lt;h1 id="optimizing-high-dimensional-vectors-with-binary-quantization">Optimizing High-Dimensional Vectors with Binary Quantization&lt;/h1>
&lt;p>Qdrant is built to handle typical scaling challenges: high throughput, low latency and efficient indexing. &lt;strong>Binary quantization (BQ)&lt;/strong> is our latest attempt to give our customers the edge they need to scale efficiently. This feature is particularly excellent for collections with large vector lengths and a large number of points.&lt;/p>
&lt;p>Our results are dramatic: Using BQ will reduce your memory consumption and improve retrieval speeds by up to 40x.&lt;/p></description></item><item><title>Food Discovery Demo</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/food-discovery-demo/</link><pubDate>Tue, 05 Sep 2023 11:32:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/food-discovery-demo/</guid><description>&lt;p>Not every search journey begins with a specific destination in mind. Sometimes, you just want to explore and see what’s out there and what you might like.
This is especially true when it comes to food. You might be craving something sweet, but you don’t know what. You might be also looking for a new dish to try,
and you just want to see the options available. In these cases, it&amp;rsquo;s impossible to express your needs in a textual query, as the thing you are looking for is not
yet defined. Qdrant&amp;rsquo;s semantic search for images is useful when you have a hard time expressing your tastes in words.&lt;/p></description></item><item><title>Google Summer of Code 2023 - Web UI for Visualization and Exploration</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/web-ui-gsoc/</link><pubDate>Mon, 28 Aug 2023 08:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/web-ui-gsoc/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello everyone! My name is Kartik Gupta, and I am thrilled to share my coding journey as part of the Google Summer of Code 2023 program. This summer, I had the incredible opportunity to work on an exciting project titled &amp;ldquo;Web UI for Visualization and Exploration&amp;rdquo; for Qdrant, a vector search engine. In this article, I will take you through my experience, challenges, and achievements during this enriching coding journey.&lt;/p></description></item><item><title>Qdrant Summer of Code 2024 - WASM based Dimension Reduction</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/dimension-reduction-qsoc/</link><pubDate>Sat, 31 Aug 2024 10:39:48 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/dimension-reduction-qsoc/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Hello, everyone! I&amp;rsquo;m Jishan Bhattacharya, and I had the incredible opportunity to intern at Qdrant this summer as part of the Qdrant Summer of Code 2024. Under the mentorship of &lt;a href="https://www.linkedin.com/in/andrey-vasnetsov-75268897/" target="_blank" rel="noopener nofollow">Andrey Vasnetsov&lt;/a>, I dived into the world of performance optimization, focusing on enhancing vector visualization using WebAssembly (WASM). In this article, I&amp;rsquo;ll share the insights, challenges, and accomplishments from my journey — one filled with learning, experimentation, and plenty of coding adventures.&lt;/p></description></item><item><title>Semantic Search As You Type</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/search-as-you-type/</link><pubDate>Mon, 14 Aug 2023 00:00:00 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/search-as-you-type/</guid><description>&lt;p>Qdrant is one of the fastest vector search engines out there, so while looking for a demo to show off, we came upon the idea to do a search-as-you-type box with a fully semantic search backend. Now we already have a semantic/keyword hybrid search on our website. But that one is written in Python, which incurs some overhead for the interpreter. 
Naturally, I wanted to see how fast I could go using Rust.&lt;/p></description></item><item><title>Vector Similarity: Going Beyond Full-Text Search | Qdrant</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/vector-similarity-beyond-search/</link><pubDate>Tue, 08 Aug 2023 08:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/vector-similarity-beyond-search/</guid><description>&lt;h1 id="vector-similarity-unleashing-data-insights-beyond-traditional-search">Vector Similarity: Unleashing Data Insights Beyond Traditional Search&lt;/h1>
&lt;p>When making use of unstructured data, there are traditional go-to solutions that are well-known for developers:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Full-text search&lt;/strong> when you need to find documents that contain a particular word or phrase.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://qdrant.tech/documentation/overview/vector-search/" target="_blank" rel="noopener nofollow">Vector search&lt;/a>&lt;/strong> when you need to find documents that are semantically similar to a given query.&lt;/li>
&lt;/ul>
&lt;p>Sometimes people mix those two approaches, so it might look like vector similarity is just an extension of full-text search. However, in this article, we will explore some promising new techniques that can be used to expand the use cases of unstructured data and demonstrate that vector similarity creates its own stack of data exploration tools.&lt;/p></description></item><item><title>Serverless Semantic Search</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/serverless/</link><pubDate>Wed, 12 Jul 2023 10:00:00 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/serverless/</guid><description>&lt;p>Do you want to insert a semantic search function into your website or online app? Now you can do so - without spending any money! In this example, you will learn how to create a free prototype search engine for your own non-commercial purposes.&lt;/p>
&lt;h2 id="ingredients">Ingredients&lt;/h2>
&lt;ul>
&lt;li>A &lt;a href="https://rust-lang.org" target="_blank" rel="noopener nofollow">Rust&lt;/a> toolchain&lt;/li>
&lt;li>&lt;a href="https://cargo-lambda.info" target="_blank" rel="noopener nofollow">cargo lambda&lt;/a> (install via package manager, &lt;a href="https://github.com/cargo-lambda/cargo-lambda/releases" target="_blank" rel="noopener nofollow">download&lt;/a> binary or &lt;code>cargo install cargo-lambda&lt;/code>)&lt;/li>
&lt;li>The &lt;a href="https://aws.amazon.com/cli" target="_blank" rel="noopener nofollow">AWS CLI&lt;/a>&lt;/li>
&lt;li>Qdrant instance (&lt;a href="https://cloud.qdrant.io" target="_blank" rel="noopener nofollow">free tier&lt;/a> available)&lt;/li>
&lt;li>An embedding provider service of your choice (see our &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/embeddings/">Embeddings docs&lt;/a>. You may be able to get credits from &lt;a href="https://aigrant.org" target="_blank" rel="noopener nofollow">AI Grant&lt;/a>, also Cohere has a &lt;a href="https://cohere.com/pricing" target="_blank" rel="noopener nofollow">rate-limited non-commercial free tier&lt;/a>)&lt;/li>
&lt;li>AWS Lambda account (12-month free tier available)&lt;/li>
&lt;/ul>
&lt;h2 id="what-youre-going-to-build">What you&amp;rsquo;re going to build&lt;/h2>
&lt;p>You&amp;rsquo;ll combine the embedding provider and the Qdrant instance into a neat semantic search, calling both services from a small Lambda function.&lt;/p></description></item><item><title>Introducing Qdrant 1.3.0</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-1.3.x/</link><pubDate>Mon, 26 Jun 2023 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-1.3.x/</guid><description>&lt;p>A brand-new &lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.3.0" target="_blank" rel="noopener nofollow">Qdrant 1.3.0 release&lt;/a> comes packed with a plethora of new features, performance improvements and bug fixes:&lt;/p>
&lt;ol>
&lt;li>Asynchronous I/O interface: Reduce overhead by managing I/O operations asynchronously, thus minimizing context switches.&lt;/li>
&lt;li>Oversampling for Quantization: Improve the accuracy and performance of your queries while using Scalar or Product Quantization.&lt;/li>
&lt;li>Grouping API lookup: Storage optimization method that lets you look for points in another collection using group ids.&lt;/li>
&lt;li>Qdrant Web UI: A convenient dashboard to help you manage data stored in Qdrant.&lt;/li>
&lt;li>Temp directory for Snapshots: Set a separate storage directory for temporary snapshots on a faster disk.&lt;/li>
&lt;li>Other important changes&lt;/li>
&lt;/ol>
&lt;p>Your feedback is valuable to us, and we are always trying to include some of your feature requests in our roadmap. Join &lt;a href="https://qdrant.to/discord" target="_blank" rel="noopener nofollow">our Discord community&lt;/a> and help us build Qdrant!&lt;/p></description></item><item><title>Qdrant under the hood: io_uring</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/io_uring/</link><pubDate>Wed, 21 Jun 2023 09:45:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/io_uring/</guid><description>&lt;p>With Qdrant &lt;a href="https://github.com/qdrant/qdrant/releases/tag/v1.3.0" target="_blank" rel="noopener nofollow">version 1.3.0&lt;/a> we
introduce the alternative io_uring based &lt;em>async uring&lt;/em> storage backend on
Linux-based systems. Since its introduction, io_uring has been known to improve
async throughput wherever the OS syscall overhead gets too high, which tends to
occur in situations where software becomes &lt;em>IO bound&lt;/em> (that is, mostly waiting
on disk).&lt;/p>
&lt;h2 id="inputoutput">Input+Output&lt;/h2>
&lt;p>Around the mid-90s, the internet took off. The first servers used a process-
per-request setup, which was good for serving hundreds if not thousands of
concurrent requests. The POSIX Input + Output (IO) was modeled in a strictly
synchronous way. The overhead of starting a new process for each request made
this model unsustainable. So servers started forgoing process separation, opting
for the thread-per-request model. But even that ran into limitations.&lt;/p></description></item><item><title>Product Quantization in Vector Search | Qdrant</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/product-quantization/</link><pubDate>Tue, 30 May 2023 09:45:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/product-quantization/</guid><description>&lt;h1 id="product-quantization-demystified-streamlining-efficiency-in-data-management">Product Quantization Demystified: Streamlining Efficiency in Data Management&lt;/h1>
&lt;p>Qdrant 1.1.0 brought the support of &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/scalar-quantization/">Scalar Quantization&lt;/a>,
a technique that reduces the memory footprint by up to four times by using &lt;code>int8&lt;/code> to represent
the values that would normally be represented by &lt;code>float32&lt;/code>.&lt;/p>
&lt;p>The memory usage in &lt;a href="https://qdrant.tech/solutions/" target="_blank" rel="noopener nofollow">vector search&lt;/a> might be reduced even further! Please welcome &lt;strong>Product
Quantization&lt;/strong>, a brand-new feature of Qdrant 1.2.0!&lt;/p>
&lt;h2 id="what-is-product-quantization">What is Product Quantization?&lt;/h2>
&lt;p>Product Quantization converts floating-point numbers into integers like every other quantization
method. However, the process is slightly more complicated than &lt;a href="https://qdrant.tech/articles/scalar-quantization/" target="_blank" rel="noopener nofollow">Scalar Quantization&lt;/a> and is more customizable, so you can find the sweet spot between memory usage and search precision. This article
covers all the steps required to perform Product Quantization and the way it&amp;rsquo;s implemented in Qdrant.&lt;/p></description></item><item><title>Scalar Quantization: Background, Practices &amp; More | Qdrant</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/scalar-quantization/</link><pubDate>Mon, 27 Mar 2023 10:45:00 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/scalar-quantization/</guid><description>&lt;h1 id="efficiency-unleashed-the-power-of-scalar-quantization">Efficiency Unleashed: The Power of Scalar Quantization&lt;/h1>
&lt;p>High-dimensional vector embeddings can be memory-intensive, especially when working with
large datasets consisting of millions of vectors. Memory footprint really starts being
a concern when we scale things up. A simple choice of the data type used to store a single
number impacts even billions of numbers and can drive the memory requirements crazy. The
higher the precision of your type, the more accurately you can represent the numbers.
The more accurate your vectors, the more precise the distance calculation. But the
advantages stop paying off when you need to order more and more memory.&lt;/p></description></item><item><title>On Unstructured Data, Vector Databases, New AI Age, and Our Seed Round.</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/seed-round/</link><pubDate>Wed, 19 Apr 2023 00:42:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/seed-round/</guid><description>&lt;blockquote>
&lt;p>Vector databases are here to stay. The New Age of AI is powered by vector embeddings, and vector databases are a foundational part of the stack. At Qdrant, we are working on cutting-edge open-source vector similarity search solutions to power fantastic AI applications with the best possible performance and excellent developer experience.&lt;/p>
&lt;p>Our 7.5M seed funding – led by &lt;a href="https://www.unusual.vc/" target="_blank" rel="noopener nofollow">Unusual Ventures&lt;/a>, awesome angels, and existing investors – will help us bring these innovations to engineers and empower them to make the most of their unstructured data and the awesome power of LLMs at any scale.&lt;/p></description></item><item><title>Using LangChain for Question Answering with Qdrant</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/langchain-integration/</link><pubDate>Tue, 31 Jan 2023 10:53:20 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/langchain-integration/</guid><description>&lt;h1 id="streamlining-question-answering-simplifying-integration-with-langchain-and-qdrant">Streamlining Question Answering: Simplifying Integration with LangChain and Qdrant&lt;/h1>
&lt;p>Building applications with Large Language Models doesn&amp;rsquo;t have to be complicated. A lot has been going on recently to simplify the development,
so you can utilize already pre-trained models and support even complex pipelines with a few lines of code. &lt;a href="https://langchain.readthedocs.io" target="_blank" rel="noopener nofollow">LangChain&lt;/a>
provides unified interfaces to different libraries, so you can avoid writing boilerplate code and focus on the value you want to bring.&lt;/p></description></item><item><title>Minimal RAM you need to serve a million vectors</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/memory-consumption/</link><pubDate>Wed, 07 Dec 2022 10:18:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/memory-consumption/</guid><description>&lt;!--
1. How people usually measure memory and why it might be misleading
2. How to properly measure memory
3. Try different configurations of Qdrant and see how they affect the memory consumption and search speed
4. Conclusion
-->
&lt;!--
Introduction:

1. We are used to measuring memory consumption by looking at `htop`. But it can be misleading.
2. There are multiple reasons why it is wrong:
 1. Process may allocate memory, but not use it.
 2. Process may not free deallocated memory.
 3. Process might be forked and memory is shared between processes.
 4. Process may use disk cache.
3. As a result, if you see `10GB` memory consumption in `htop`, it doesn't mean that your process actually needs `10GB` of RAM to work.
-->
&lt;p>When it comes to measuring the memory consumption of our processes, we often rely on tools such as &lt;code>htop&lt;/code> to give us an indication of how much RAM is being used. However, this method can be misleading and doesn&amp;rsquo;t always accurately reflect the true memory usage of a process.&lt;/p></description></item><item><title>Question Answering as a Service with Cohere and Qdrant</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qa-with-cohere-and-qdrant/</link><pubDate>Tue, 29 Nov 2022 15:45:00 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qa-with-cohere-and-qdrant/</guid><description>&lt;p>Bi-encoders are probably the most efficient way of setting up a semantic Question Answering system.
This architecture relies on the same neural model that creates vector embeddings for both questions and answers.
The assumption is, both question and answer should have representations close to each other in the latent space.
It should be like that because they should both describe the same semantic concept. That doesn&amp;rsquo;t apply
to answers like &amp;ldquo;Yes&amp;rdquo; or &amp;ldquo;No&amp;rdquo;, but standard FAQ-like problems are a bit easier, as there is typically
an overlap between both texts. Not necessarily in terms of wording, but in their semantics.&lt;/p></description></item><item><title>Introducing Qdrant 1.2.x</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-1.2.x/</link><pubDate>Wed, 24 May 2023 10:45:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-1.2.x/</guid><description>&lt;p>A brand-new Qdrant 1.2 release comes packed with a plethora of new features, some of which
were highly requested by our users. If you want to shape the development of the Qdrant vector
database, please &lt;a href="https://qdrant.to/discord" target="_blank" rel="noopener nofollow">join our Discord community&lt;/a> and let us know
how you use it!&lt;/p>
&lt;h2 id="new-features">New features&lt;/h2>
&lt;p>As usual, a minor version update of Qdrant brings some interesting new features. We love to see your
feedback, and we tried to include the features most requested by our community.&lt;/p></description></item><item><title>Finding errors in datasets with Similarity Search</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/dataset-quality/</link><pubDate>Mon, 18 Jul 2022 10:18:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/dataset-quality/</guid><description>&lt;p>Nowadays, people create a huge number of applications of various types and solve problems in different areas.
Despite such diversity, they have something in common - they need to process data.
Real-world data is a living structure: it grows day by day, changes a lot, and becomes harder to work with.&lt;/p>
&lt;p>In some cases, you need to categorize or label your data, which can be a tough problem given its scale.
The process of splitting or labelling is error-prone and these errors can be very costly.
Imagine that you failed to achieve the desired quality of the model due to inaccurate labels.
Worse, your users are faced with a lot of irrelevant items, unable to find what they need and getting annoyed by it.
Thus, you get poor retention, and it directly impacts company revenue.
It is really important to avoid such errors in your data.&lt;/p></description></item><item><title>Q&amp;A with Similarity Learning</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/faq-question-answering/</link><pubDate>Tue, 28 Jun 2022 08:57:07 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/faq-question-answering/</guid><description>&lt;h1 id="question-answering-system-with-similarity-learning-and-quaterion">Question-answering system with Similarity Learning and Quaterion&lt;/h1>
&lt;p>Many problems in modern machine learning are approached as classification tasks.
Some are classification tasks by design, but others are artificially transformed into such.
And when you apply an approach that does not naturally fit your problem, you risk ending up with an over-complicated or bulky solution.
In some cases, you would even get worse performance.&lt;/p>
&lt;p>Imagine that you got a new task and decided to solve it with a good old classification approach.
Firstly, you will need labeled data.
If it came on a plate with the task, you&amp;rsquo;re lucky, but if it didn&amp;rsquo;t, you might need to label it manually.
And I guess you are already familiar with how painful it might be.&lt;/p></description></item><item><title>Why Rust?</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/why-rust/</link><pubDate>Thu, 11 May 2023 10:00:00 +0100</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/why-rust/</guid><description>&lt;h1 id="building-qdrant-in-rust">Building Qdrant in Rust&lt;/h1>
&lt;p>Looking at the &lt;a href="https://github.com/qdrant/qdrant" target="_blank" rel="noopener nofollow">GitHub repository&lt;/a>, you can see that Qdrant is built in &lt;a href="https://rust-lang.org" target="_blank" rel="noopener nofollow">Rust&lt;/a>. Other offerings may be written in C++, Go, Java or even Python. So why did Qdrant choose Rust? Our founder Andrey had built the first prototype in C++, but didn’t trust his command of the language to scale to a production system (to be frank, he likened it to cutting his leg off). He was well versed in Java and Scala and also knew some Python. However, he considered neither a good fit:&lt;/p>
by Allen AI has attracted attention in the NLP community: the authors cache the output of an intermediate layer
in the training and inference phases to achieve a speedup of ~83%
with a negligible loss in model performance.
This technique is quite similar to &lt;a href="https://quaterion.qdrant.tech/tutorials/cache_tutorial.html" target="_blank" rel="noopener nofollow">the caching mechanism in Quaterion&lt;/a>,
but the latter supports any data modality, while the former focuses only on language models,
although it presents important insights from its experiments.
In this post, I will share our findings combined with theirs,
hoping to provide the community with a wider perspective on layer recycling.&lt;/p></description></item><item><title>Fine Tuning Similar Cars Search</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/cars-recognition/</link><pubDate>Tue, 28 Jun 2022 13:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/cars-recognition/</guid><description>&lt;p>Supervised classification is one of the most widely used training objectives in machine learning,
but not every task can be defined as such. For example,&lt;/p>
&lt;ol>
&lt;li>Your classes may change quickly, e.g., new classes may be added over time,&lt;/li>
&lt;li>You may not have samples from every possible category,&lt;/li>
&lt;li>It may be impossible to enumerate all the possible classes during the training time,&lt;/li>
&lt;li>You may have an essentially different task, e.g., search or retrieval.&lt;/li>
&lt;/ol>
&lt;p>All such problems may be efficiently solved with similarity learning.&lt;/p></description></item><item><title>Metric Learning Tips &amp; Tricks</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/metric-learning-tips/</link><pubDate>Sat, 15 May 2021 10:18:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/metric-learning-tips/</guid><description>&lt;h2 id="how-to-train-object-matching-model-with-no-labeled-data-and-use-it-in-production">How to train object matching model with no labeled data and use it in production&lt;/h2>
&lt;p>Currently, most machine-learning-related business cases are solved as classification problems.
Classification algorithms are so well studied in practice that even if the original problem is not directly a classification task, it is usually decomposed or approximately converted into one.&lt;/p>
&lt;p>However, despite its simplicity, the classification task has requirements that could complicate its production integration and scaling.
E.g., it requires a fixed number of classes, where each class should have a sufficient number of training samples.&lt;/p></description></item><item><title>Metric Learning for Anomaly Detection</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/detecting-coffee-anomalies/</link><pubDate>Wed, 04 May 2022 13:00:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/detecting-coffee-anomalies/</guid><description>&lt;p>Anomaly detection is an appealing yet challenging task that has numerous use cases across various industries.
The complexity results mainly from the fact that the task is data-scarce by definition.&lt;/p>
&lt;p>Moreover, anomalies are, again by definition, subject to frequent change, and they may take unexpected forms.
For that reason, supervised classification-based approaches are:&lt;/p>
&lt;ul>
&lt;li>Data-hungry - requiring a large amount of labeled data;&lt;/li>
&lt;li>Expensive - data labeling is an expensive task itself;&lt;/li>
&lt;li>Time-consuming - you would be trying to collect examples of what is, by definition, scarce;&lt;/li>
&lt;li>Hard to maintain - you would need to re-train the model repeatedly in response to changes in the data distribution.&lt;/li>
&lt;/ul>
&lt;p>These are not desirable features if you want to put your model into production in a rapidly-changing environment.
And, despite all the mentioned difficulties, such approaches do not necessarily offer superior performance compared to the alternatives.
In this post, we will detail the lessons learned from such a use case.&lt;/p></description></item><item><title>Triplet Loss - Advanced Intro</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/triplet-loss/</link><pubDate>Thu, 24 Mar 2022 15:12:00 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/triplet-loss/</guid><description>&lt;h2 id="what-is-triplet-loss">What is Triplet Loss?&lt;/h2>
&lt;p>Triplet Loss was first introduced in &lt;a href="https://arxiv.org/abs/1503.03832" target="_blank" rel="noopener nofollow">FaceNet: A Unified Embedding for Face Recognition and Clustering&lt;/a> in 2015,
and it has been one of the most popular loss functions for supervised similarity or metric learning ever since.
In its simplest explanation, Triplet Loss encourages that dissimilar pairs be distant from any similar pairs by at least a certain margin value.
Mathematically, the loss value can be calculated as
$L=max(d(a,p) - d(a,n) + m, 0)$, where:&lt;/p></description></item><item><title>Neural Search 101: A Complete Guide and Step-by-Step Tutorial</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/neural-search-tutorial/</link><pubDate>Thu, 10 Jun 2021 10:18:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/neural-search-tutorial/</guid><description>&lt;h1 id="neural-search-101-a-comprehensive-guide-and-step-by-step-tutorial">Neural Search 101: A Comprehensive Guide and Step-by-Step Tutorial&lt;/h1>
&lt;p>Information retrieval technology is one of the main technologies that enabled the modern Internet to exist.
These days, search technology is at the heart of a variety of applications,
from web page search to product recommendations.
For many years, this technology changed very little, until neural networks came into play.&lt;/p>
&lt;p>In this guide we are going to find answers to these questions:&lt;/p></description></item><item><title>Filterable HNSW</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/filterable-hnsw/</link><pubDate>Sun, 24 Nov 2019 22:44:08 +0300</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/filterable-hnsw/</guid><description>&lt;p>If you need to find some similar objects in vector space, provided e.g. by embeddings or matching NN, you can choose among a variety of libraries: Annoy, FAISS or NMSLib.
All of them will give you a fast approximate neighbors search within almost any space.&lt;/p>
&lt;p>But what if you need to introduce some constraints in your search?
For example, you want to search only for products in some category, or to select the most similar customer of a particular brand.
I did not find any simple solutions for this.
There are several discussions like &lt;a href="https://github.com/spotify/annoy/issues/263" target="_blank" rel="noopener nofollow">this&lt;/a>, but they only suggest iterating over the top search results and applying conditions afterwards.&lt;/p></description></item><item><title>Introducing Qdrant 0.11</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-0-11-release/</link><pubDate>Wed, 26 Oct 2022 13:55:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-0-11-release/</guid><description>&lt;p>We are excited to &lt;a href="https://github.com/qdrant/qdrant/releases/tag/v0.11.0" target="_blank" rel="noopener nofollow">announce the release of Qdrant v0.11&lt;/a>,
which introduces a number of new features and improvements.&lt;/p>
&lt;h2 id="replication">Replication&lt;/h2>
&lt;p>One of the key features in this release is replication support, which allows Qdrant to provide a high availability
setup with distributed deployment out of the box. This, combined with sharding, enables you to horizontally scale
both the size of your collections and the throughput of your cluster. This means that you can use Qdrant to handle
large amounts of data without sacrificing performance or reliability.&lt;/p></description></item><item><title>Qdrant 0.10 released</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-0-10-release/</link><pubDate>Mon, 19 Sep 2022 13:30:00 +0200</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-0-10-release/</guid><description>&lt;p>&lt;a href="https://github.com/qdrant/qdrant/releases/tag/v0.10.0" target="_blank" rel="noopener nofollow">Qdrant 0.10 is a new version&lt;/a> that brings a lot of performance
improvements, but also some new features which were heavily requested by our users. Here is an overview of what has changed.&lt;/p>
&lt;h2 id="storing-multiple-vectors-per-object">Storing multiple vectors per object&lt;/h2>
&lt;p>Previously, if you wanted to use semantic search with multiple vectors per object, you had to create separate collections
for each vector type, even if the vectors shared other attributes in the payload. With Qdrant 0.10, you can
now store all of these vectors together in the same collection and share a single copy of the payload.
This makes it easier to use semantic search with multiple vector types, and reduces the amount of work you need to do to
set up your collections.&lt;/p></description></item><item><title>Vector Search in constant time</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/quantum-quantization/</link><pubDate>Sat, 01 Apr 2023 00:48:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/quantum-quantization/</guid><description>&lt;p>The advent of quantum computing has revolutionized many areas of science and technology, and one of the most intriguing developments has been its potential application to artificial neural networks (ANNs). One area where quantum computing can significantly improve performance is in vector search, a critical component of many machine learning tasks. In this article, we will discuss the concept of quantum quantization for ANN vector search, focusing on the conversion of float32 to qbit vectors and the ability to perform vector search on arbitrary-sized databases in constant time.&lt;/p></description></item><item><title>Building Performant, Scaled Agentic Vector Search with Qdrant</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/agentic-builders-guide/</link><pubDate>Sun, 26 Oct 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/agentic-builders-guide/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>AI agents have grown from simple Q&amp;amp;A chatbots into systems that can independently plan, retrieve, act, and verify tasks. As developers work to recreate real-life workflows with agents, a common starting point is to give your agent access to a search API.&lt;/p>
&lt;p>&lt;img src="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles_data/agentic-builders-guide/agentic-architecture.png" alt="Agentic vector search architecture">&lt;/p>
&lt;h2 id="the-limitations-of-agents">The Limitations of Agents&lt;/h2>
&lt;p>While agents have proven they can create incredible impact, they still face serious limitations without the right tools. This is where a simple search box isn’t enough, and agents often fail when they move from prototype to production in three key areas:&lt;/p></description></item><item><title>MUVERA: Making Multivectors More Performant</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/muvera-embeddings/</link><pubDate>Fri, 05 Sep 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/muvera-embeddings/</guid><description>&lt;h2 id="what-are-muvera-embeddings">What are MUVERA Embeddings?&lt;/h2>
&lt;p>Multi-vector representations are superior to single-vector embeddings in many benchmarks. It might be tempting to use
them right away, but there is a catch: they are slower to search. Traditional vector search structures like
&lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/concepts/indexing/#vector-index">HNSW&lt;/a> are optimized for retrieving the nearest neighbors of a single
query vector using simple metrics such as cosine similarity. These indexes are not suitable for multi-vector retrieval
strategies, such as MaxSim, where a query and document are each represented by multiple vectors and the final score is
computed as the maximum similarity over all cross-pairings. MaxSim is inherently asymmetric and non-metric, so HNSW
could potentially help us find the closest document token to a given query token, but that does not mean the whole
document is the best hit for the query.&lt;/p></description></item><item><title>How to choose an embedding model</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/how-to-choose-an-embedding-model/</link><pubDate>Tue, 15 Jul 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/how-to-choose-an-embedding-model/</guid><description>&lt;p>No matter if you are just beginning your journey in the world of vector search, or you are a seasoned practitioner, you
have probably wondered how to choose the right embedding model to achieve the best search quality. There are some
public benchmarks, such as &lt;a href="https://huggingface.co/spaces/mteb/leaderboard" target="_blank" rel="noopener nofollow">MTEB&lt;/a>, that can help you narrow down the
options, but datasets used in those benchmarks will rarely be representative of your domain-specific data. Moreover,
search quality is not the only requirement you could have. For example, some of the best models might be amazingly
accurate for retrieval, but you can&amp;rsquo;t afford to run them, e.g., due to high resource usage or your budget constraints.&lt;/p></description></item><item><title>Vector Search in Production</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/vector-search-production/</link><pubDate>Wed, 30 Apr 2025 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/vector-search-production/</guid><description>&lt;h2 id="what-does-it-take-to-run-search-in-production">What Does it Take to Run Search in Production?&lt;/h2>
&lt;p>A mid-sized e-commerce company launched a vector search pilot to improve product discovery. During testing, everything ran smoothly. But in production, their queries began failing intermittently: memory errors, disk I/O spikes, and search delays sprang up unexpectedly.&lt;/p>
&lt;p>It turned out the team hadn&amp;rsquo;t adjusted the default configuration settings or reserved dedicated paths for write-ahead logs. Their vector index was too large to fit comfortably in RAM, and it frequently spilled to disk, causing slowdowns.&lt;/p></description></item><item><title>Semantic Cache: Accelerating AI with Lightning-Fast Data Retrieval</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/semantic-cache-ai-data-retrieval/</link><pubDate>Tue, 07 May 2024 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/semantic-cache-ai-data-retrieval/</guid><description>&lt;h2 id="what-is-semantic-cache">What is Semantic Cache?&lt;/h2>
&lt;p>&lt;strong>Semantic cache&lt;/strong> is a method of retrieval optimization, where similar queries instantly retrieve the same appropriate response from a knowledge base.&lt;/p>
&lt;p>Semantic cache differs from traditional caching methods. In computing, &lt;strong>cache&lt;/strong> refers to high-speed memory that efficiently stores frequently accessed data. In the context of &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/what-is-a-vector-database/">vector databases&lt;/a>, a &lt;strong>semantic cache&lt;/strong> improves AI application performance by storing previously retrieved results along with the conditions under which they were computed. This allows the application to reuse those results when the same or similar conditions occur again, rather than finding them from scratch.&lt;/p></description></item><item><title>Full-text filter and index are already available!</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-introduces-full-text-filters-and-indexes/</link><pubDate>Wed, 16 Nov 2022 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/qdrant-introduces-full-text-filters-and-indexes/</guid><description>&lt;p>Qdrant is designed as an efficient vector database, allowing for a quick search of the nearest neighbours. But, you may find yourself in need of applying some extra filtering on top of the semantic search. Up to version 0.10, Qdrant was offering support for keywords only. Since 0.10, there is a possibility to apply full-text constraints as well. 
There is a new type of filter that you can use to do that, and it can be combined with every other filter type.&lt;/p></description></item><item><title>Optimizing Semantic Search by Managing Multiple Vectors</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/storing-multiple-vectors-per-object-in-qdrant/</link><pubDate>Wed, 05 Oct 2022 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/storing-multiple-vectors-per-object-in-qdrant/</guid><description>&lt;h1 id="how-to-optimize-vector-storage-by-storing-multiple-vectors-per-object">How to Optimize Vector Storage by Storing Multiple Vectors Per Object&lt;/h1>
&lt;p>In a real case scenario, a single object might be described in several different ways. If you run an e-commerce business, then your items will typically have a name, longer textual description and also a bunch of photos. While cooking, you may care about the list of ingredients, and description of the taste but also the recipe and the way your meal is going to look. Up till now, if you wanted to enable &lt;a href="https://qdrant.tech/documentation/tutorials/search-beginners/" target="_blank" rel="noopener nofollow">semantic search&lt;/a> with multiple vectors per object, Qdrant would require you to create separate collections for each vector type, even though they could share some other attributes in a payload. However, since Qdrant 0.10 you are able to store all those vectors together in the same collection and share a single copy of the payload!&lt;/p></description></item><item><title>Mastering Batch Search for Vector Optimization</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/batch-vector-search-with-qdrant/</link><pubDate>Mon, 26 Sep 2022 00:00:00 -0800</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/articles/batch-vector-search-with-qdrant/</guid><description>&lt;h1 id="how-to-optimize-vector-search-using-batch-search-in-qdrant-0100">How to Optimize Vector Search Using Batch Search in Qdrant 0.10.0&lt;/h1>
&lt;p>The latest release of Qdrant 0.10.0 has introduced a lot of new functionality that simplifies some common tasks. Those new possibilities come with some slightly modified interfaces of the client library. One of the recently introduced features is the ability to query the collection with &lt;a href="https://qdrant.tech/blog/storing-multiple-vectors-per-object-in-qdrant/" target="_blank" rel="noopener nofollow">multiple vectors&lt;/a> at once — a batch search mechanism.&lt;/p>
&lt;p>There are a lot of scenarios in which you may need to perform multiple unrelated tasks at the same time. Previously, you could only send several separate requests to the Qdrant API on your own. But multiple parallel requests may cause significant network overhead and slow down the process, especially over a poor connection.&lt;/p></description></item></channel></rss>