<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data Management on Qdrant - Vector Database</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/</link><description>Recent content in Data Management on Qdrant - Vector Database</description><generator>Hugo</generator><language>en-us</language><managingEditor>info@qdrant.tech (Andrey Vasnetsov)</managingEditor><webMaster>info@qdrant.tech (Andrey Vasnetsov)</webMaster><atom:link href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/index.xml" rel="self" type="application/rss+xml"/><item><title>Airbyte</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/airbyte/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/airbyte/</guid><description>&lt;h1 id="airbyte">Airbyte&lt;/h1>
&lt;p>&lt;a href="https://airbyte.com/" target="_blank" rel="noopener nofollow">Airbyte&lt;/a> is an open-source data integration platform that helps you replicate your data
between different systems. It has a &lt;a href="https://docs.airbyte.io/integrations" target="_blank" rel="noopener nofollow">growing list of connectors&lt;/a> that can
be used to ingest data from multiple sources. Data pipelines are also essential for managing the data in
Qdrant, and Airbyte is well suited to building them.&lt;/p>
&lt;p>Airbyte can handle data ingestion from a selected source, while Qdrant helps you build a search
engine on top of it. There are three supported modes of how the data can be ingested into Qdrant:&lt;/p></description></item><item><title>Apache Airflow</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/airflow/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/airflow/</guid><description>&lt;h1 id="apache-airflow">Apache Airflow&lt;/h1>
&lt;p>&lt;a href="https://airflow.apache.org/" target="_blank" rel="noopener nofollow">Apache Airflow&lt;/a> is an open-source platform for authoring, scheduling and monitoring data and computing workflows. Airflow uses Python to create workflows that can be easily scheduled and monitored.&lt;/p>
&lt;p>Qdrant is available as a &lt;a href="https://airflow.apache.org/docs/apache-airflow-providers-qdrant/stable/index.html" target="_blank" rel="noopener nofollow">provider&lt;/a> in Airflow to interface with the database.&lt;/p>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;p>Before configuring Airflow, you need:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>A Qdrant instance to connect to. You can set one up by following our &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/guides/installation/">installation guide&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A running Airflow instance. You can use their &lt;a href="https://airflow.apache.org/docs/apache-airflow/stable/start.html" target="_blank" rel="noopener nofollow">Quick Start Guide&lt;/a>.&lt;/p></description></item><item><title>Apache Spark</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/spark/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/spark/</guid><description>&lt;h1 id="apache-spark">Apache Spark&lt;/h1>
&lt;p>&lt;a href="https://spark.apache.org/" target="_blank" rel="noopener nofollow">Spark&lt;/a> is a distributed computing framework designed for big data processing and analytics. The &lt;a href="https://github.com/qdrant/qdrant-spark" target="_blank" rel="noopener nofollow">Qdrant-Spark connector&lt;/a> enables Qdrant to be a storage destination in Spark.&lt;/p>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;p>To integrate the connector into your Spark environment, get the JAR file from one of the sources listed below.&lt;/p>
&lt;ul>
&lt;li>GitHub Releases&lt;/li>
&lt;/ul>
&lt;p>The packaged &lt;code>jar&lt;/code> file with all the required dependencies can be found &lt;a href="https://github.com/qdrant/qdrant-spark/releases" target="_blank" rel="noopener nofollow">here&lt;/a>.&lt;/p>
&lt;ul>
&lt;li>Building from Source&lt;/li>
&lt;/ul>
&lt;p>To build the &lt;code>jar&lt;/code> from source, you need &lt;a href="https://www.azul.com/downloads/#zulu" target="_blank" rel="noopener nofollow">JDK@8&lt;/a> and &lt;a href="https://maven.apache.org/" target="_blank" rel="noopener nofollow">Maven&lt;/a> installed. Once the requirements have been satisfied, run the following command in the &lt;a href="https://github.com/qdrant/qdrant-spark" target="_blank" rel="noopener nofollow">project root&lt;/a>.&lt;/p></description></item><item><title>Chonkie</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/chonkie/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/chonkie/</guid><description>&lt;h1 id="chonkie">Chonkie&lt;/h1>
&lt;p>&lt;a href="https://github.com/chonkie-inc/chonkie" target="_blank" rel="noopener nofollow">Chonkie&lt;/a> is a no-nonsense, ultra-light, and lightning-fast chunking library designed for RAG (Retrieval-Augmented Generation) applications.&lt;/p>
&lt;p>Chonkie integrates seamlessly with Qdrant through the &lt;strong>QdrantHandshake&lt;/strong> class, allowing you to chunk, embed, and store text data without ever leaving the Chonkie SDK.&lt;/p>
&lt;h2 id="setup">Setup&lt;/h2>
&lt;p>Install Chonkie with Qdrant support:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install &lt;span class="s2">&amp;#34;chonkie[qdrant]&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="basic-usage">Basic Usage&lt;/h2>
&lt;p>The &lt;code>QdrantHandshake&lt;/code> provides a simple interface for storing and searching chunks:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">chonkie&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantHandshake&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">SemanticChunker&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Initialize handshake with custom embedding model&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">handshake&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantHandshake&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">url&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;http://localhost:6333&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">collection_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;my_documents&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">embedding_model&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;sentence-transformers/all-MiniLM-L6-v2&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Create and write chunks&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">chunker&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">SemanticChunker&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">chunks&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">chunker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">chunk&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;Your text content here...&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">handshake&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">chunks&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Search using natural language&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">results&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">handshake&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">search&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">query&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;your search query&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">limit&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">5&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">result&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">results&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">result&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;score&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">: &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">result&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;text&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="qdrant-cloud">Qdrant Cloud&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">handshake&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantHandshake&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">url&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;https://your-cluster.qdrant.io&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">api_key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;your-api-key&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">collection_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;my_collection&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">embedding_model&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;BAAI/bge-small-en-v1.5&amp;#34;&lt;/span> &lt;span class="c1"># Change to your preferred model&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="complete-rag-pipeline">Complete RAG Pipeline&lt;/h2>
&lt;p>Build end-to-end RAG pipelines using Chonkie&amp;rsquo;s fluent Pipeline API:&lt;/p></description></item><item><title>CocoIndex</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/cocoindex/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/cocoindex/</guid><description>&lt;h1 id="cocoindex">CocoIndex&lt;/h1>
&lt;p>&lt;a href="https://cocoindex.io" target="_blank" rel="noopener nofollow">CocoIndex&lt;/a> is a high performance ETL framework to transform data for AI, with real-time incremental processing.&lt;/p>
&lt;p>Qdrant is natively supported as a built-in vector database for storing and retrieving embeddings.&lt;/p>
&lt;p>Install CocoIndex:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install -U cocoindex
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Install Postgres with &lt;a href="https://docs.docker.com/compose/install/" target="_blank" rel="noopener nofollow">Docker Compose&lt;/a>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">docker compose -f &amp;lt;&lt;span class="o">(&lt;/span>curl -L https://raw.githubusercontent.com/cocoindex-io/cocoindex/refs/heads/main/dev/postgres.yaml&lt;span class="o">)&lt;/span> up -d
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>CocoIndex is a stateful ETL framework and only processes data that has changed. It uses Postgres as a metadata store to track the state of the data.&lt;/p></description></item><item><title>Confluent Kafka</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/confluent/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/confluent/</guid><description>&lt;p>&lt;img src="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/frameworks/confluent/confluent-logo.png" alt="Confluent Logo">&lt;/p>
&lt;p>Built by the original creators of Apache Kafka®, &lt;a href="https://www.confluent.io/confluent-cloud/?utm_campaign=tm.pmm_cd.cwc_partner_Qdrant_generic&amp;amp;utm_source=Qdrant&amp;amp;utm_medium=partnerref" target="_blank" rel="noopener nofollow">Confluent Cloud&lt;/a> is a cloud-native and complete data streaming platform available on AWS, Azure, and Google Cloud. The platform includes a fully managed, elastically scaling Kafka engine, 120+ connectors, serverless Apache Flink®, enterprise-grade security controls, and a robust governance suite.&lt;/p>
&lt;p>With our &lt;a href="https://github.com/qdrant/qdrant-kafka" target="_blank" rel="noopener nofollow">Qdrant-Kafka Sink Connector&lt;/a>, Qdrant is part of the &lt;a href="https://www.confluent.io/partners/connect/" target="_blank" rel="noopener nofollow">Connect with Confluent&lt;/a> technology partner program. It brings fully managed data streams directly to organizations from Confluent Cloud, making it easier for organizations to stream any data to Qdrant with a fully managed Apache Kafka service.&lt;/p></description></item><item><title>DLT</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/dlt/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/dlt/</guid><description>&lt;h1 id="dltdata-load-tool">DLT(Data Load Tool)&lt;/h1>
&lt;p>&lt;a href="https://dlthub.com/" target="_blank" rel="noopener nofollow">DLT&lt;/a> is an open-source library that you can add to your Python scripts to load data from various and often messy data sources into well-structured, live datasets.&lt;/p>
&lt;p>With the DLT-Qdrant integration, you can now select Qdrant as a DLT destination to load data into.&lt;/p>
&lt;p>&lt;strong>DLT Enables&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Automated maintenance - with schema inference, alerts and short declarative code, maintenance becomes simple.&lt;/li>
&lt;li>Run it where Python runs - on Airflow, serverless functions, notebooks. Scales on micro and large infrastructure alike.&lt;/li>
&lt;li>User-friendly, declarative interface that removes knowledge obstacles for beginners while empowering senior professionals.&lt;/li>
&lt;/ul>
&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>To get started, install &lt;code>dlt&lt;/code> with the &lt;code>qdrant&lt;/code> extra.&lt;/p></description></item><item><title>InfinyOn Fluvio</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/fluvio/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/fluvio/</guid><description>&lt;p>&lt;img src="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/fluvio/fluvio-logo.png" alt="Fluvio Logo">&lt;/p>
&lt;p>&lt;a href="https://www.fluvio.io/" target="_blank" rel="noopener nofollow">InfinyOn Fluvio&lt;/a> is an open-source platform written in Rust for high speed, real-time data processing. It is cloud native, designed to work with any infrastructure type, from bare metal hardware to containerized platforms.&lt;/p>
&lt;h2 id="usage-with-qdrant">Usage with Qdrant&lt;/h2>
&lt;p>With the &lt;a href="https://github.com/qdrant/qdrant-fluvio" target="_blank" rel="noopener nofollow">Qdrant Fluvio Connector&lt;/a>, you can stream records from Fluvio topics to Qdrant collections, leveraging Fluvio&amp;rsquo;s delivery guarantees and high-throughput.&lt;/p>
&lt;h3 id="pre-requisites">Pre-requisites&lt;/h3>
&lt;ul>
&lt;li>A Fluvio installation. You can refer to the &lt;a href="https://www.fluvio.io/docs/fluvio/quickstart/" target="_blank" rel="noopener nofollow">Fluvio Quickstart&lt;/a> for instructions.&lt;/li>
&lt;li>A Qdrant server to connect to. You can set up a &lt;a href="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/quickstart/">local instance&lt;/a> or a free cloud instance at &lt;a href="https://cloud.qdrant.io/" target="_blank" rel="noopener nofollow">cloud.qdrant.io&lt;/a>.&lt;/li>
&lt;/ul>
&lt;h3 id="downloading-the-connector">Downloading the connector&lt;/h3>
&lt;p>Run the following commands after &lt;a href="https://www.fluvio.io/docs/fluvio/quickstart" target="_blank" rel="noopener nofollow">setting up Fluvio&lt;/a>.&lt;/p></description></item><item><title>Redpanda Connect</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/redpanda/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/redpanda/</guid><description>&lt;p>&lt;img src="https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/redpanda/redpanda-cover.png" alt="Redpanda Cover">&lt;/p>
&lt;p>&lt;a href="https://www.redpanda.com/connect" target="_blank" rel="noopener nofollow">Redpanda Connect&lt;/a> is a declarative data-agnostic streaming service designed for efficient, stateless processing steps. It offers transaction-based resiliency with back pressure, ensuring at-least-once delivery when connecting to at-least-once sources with sinks, without the need to persist messages during transit.&lt;/p>
&lt;p>Connect pipelines are configured using a YAML file, which organizes components hierarchically. Each section represents a different component type, such as inputs, processors and outputs, and these can have nested child components and &lt;a href="https://docs.redpanda.com/redpanda-connect/configuration/interpolation/" target="_blank" rel="noopener nofollow">dynamic values&lt;/a>.&lt;/p></description></item><item><title>Unstructured</title><link>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/unstructured/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2138--condescending-goldwasser-91acf0.netlify.app/documentation/data-management/unstructured/</guid><description>&lt;h1 id="unstructured">Unstructured&lt;/h1>
&lt;p>&lt;a href="https://unstructured.io/" target="_blank" rel="noopener nofollow">Unstructured&lt;/a> is a library designed to help preprocess, structure unstructured text documents for downstream machine learning tasks.&lt;/p>
&lt;p>Qdrant can be used as an ingestion destination in Unstructured.&lt;/p>
&lt;h2 id="setup">Setup&lt;/h2>
&lt;p>Install Unstructured with the &lt;code>qdrant&lt;/code> extra.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">pip install &lt;span class="s2">&amp;#34;unstructured-ingest[qdrant]&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>Depending on your use case, you can run Unstructured from the command line or call it from within your application.&lt;/p>
&lt;h3 id="cli">CLI&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">unstructured-ingest &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> &lt;span class="nb">local&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --input-path &lt;span class="nv">$LOCAL_FILE_INPUT_DIR&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --chunking-strategy by_title &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --embedding-provider huggingface &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --partition-by-api &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --api-key &lt;span class="nv">$UNSTRUCTURED_API_KEY&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --partition-endpoint &lt;span class="nv">$UNSTRUCTURED_API_URL&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --additional-partition-args&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;{\&amp;#34;split_pdf_page\&amp;#34;:\&amp;#34;true\&amp;#34;, \&amp;#34;split_pdf_allow_failed\&amp;#34;:\&amp;#34;true\&amp;#34;, \&amp;#34;split_pdf_concurrency_level\&amp;#34;: 15}&amp;#34;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> qdrant-cloud &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --url &lt;span class="nv">$QDRANT_URL&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --api-key &lt;span class="nv">$QDRANT_API_KEY&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --collection-name &lt;span class="nv">$QDRANT_COLLECTION&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --batch-size &lt;span class="m">50&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --num-processes &lt;span class="m">1&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For a full list of the options the CLI accepts, run &lt;code>unstructured-ingest &amp;lt;upstream connector&amp;gt; qdrant --help&lt;/code>.&lt;/p></description></item></channel></rss>