Welcome to the QDrant Loader developer documentation! This guide provides everything you need to understand, extend, test, and deploy QDrant Loader. Whether you're contributing to the core project or building custom extensions, you'll find detailed technical information and practical examples here.
- Architecture Guide - System design, components, and data flow
- Extending QDrant Loader - Custom connectors and processors
- Testing Guide - Testing strategies, frameworks, and best practices
- Deployment Guide - Production deployment, containerization, and CI/CD
- Best Practices - Pythonic patterns, AI/RAG guidelines, and PR review checklist
- Documentation Maintenance - Maintaining and updating documentation
QDrant Loader follows a modular architecture designed for multi-project document ingestion and vector storage:
┌─────────────────────────────────────────────────────────┐
│                   QDrant Loader Core                    │
├─────────────────────────────────────────────────────────┤
│ Data Sources    │ Processing      │ Vector Storage      │
│ ┌─────────────┐ │ ┌─────────────┐ │ ┌─────────────────┐ │
│ │ Connectors  │ │ │ Processors  │ │ │ QDrant Client   │ │
│ │ - Local     │ │ │ - MarkItDown│ │ │ - Collections   │ │
│ │ - Git       │ │ │ - Text      │ │ │ - Vectors       │ │
│ │ - Confluence│ │ │ - Chunking  │ │ │ - Search        │ │
│ │ - Jira      │ │ │ - Embedding │ │ │ - Metadata      │ │
│ │ - PublicDocs│ │ │             │ │ │                 │ │
│ └─────────────┘ │ └─────────────┘ │ └─────────────────┘ │
├─────────────────────────────────────────────────────────┤
│ MCP Server      │ CLI Interface   │ Configuration       │
│ ┌─────────────┐ │ ┌─────────────┐ │ ┌─────────────────┐ │
│ │ Search APIs │ │ │ Commands    │ │ │ YAML Config     │ │
│ │ - Semantic  │ │ │ - init      │ │ │ - Multi-project │ │
│ │ - Hierarchy │ │ │ - ingest    │ │ │ - Workspace     │ │
│ │ - Attachment│ │ │ - config    │ │ │ - Environment   │ │
│ │             │ │ │ - project   │ │ │ - Validation    │ │
│ └─────────────┘ │ └─────────────┘ │ └─────────────────┘ │
└─────────────────────────────────────────────────────────┘
# Clone the repository
git clone https://github.com/martin-papy/qdrant-loader.git
cd qdrant-loader
# Install all workspace packages with development dependencies
# uv automatically creates and manages the virtual environment
uv sync --all-packages --all-extras
# Start QDrant for development
docker run -p 6333:6333 qdrant/qdrant:latest
# Run all tests from workspace root
make test
# Run specific package tests
make test-loader
make test-mcp
make test-core
# Run with coverage
make test-coverage
# Or run pytest directly via uv
uv run pytest packages/qdrant-loader/tests/
uv run pytest packages/qdrant-loader-mcp-server/tests/ --cov=src --cov-report=html
# From workspace root
make lint
make format
# Or run tools directly via uv
uv run ruff check --fix .
uv run black .
uv run isort .
Understanding the data flow is crucial for development:
- Configuration Phase
  - Multi-project workspace configuration
  - Global settings and project-specific sources
  - Environment variable management
  - Validation and initialization
- Ingestion Phase
  - Connectors fetch documents from data sources
  - File conversion using the MarkItDown library
  - Content extraction and cleaning
  - Chunking strategies for large documents
  - Metadata extraction and enrichment
- Embedding Phase
  - Text content converted to embeddings via configurable LLM providers (OpenAI, Azure OpenAI, Ollama)
  - Batch processing for efficiency
  - Error handling and retries
  - Progress tracking and metrics
- Storage Phase
  - Vectors stored in QDrant collections
  - Metadata indexed for filtering
  - Project-based organization
  - State tracking and change detection
- Search Phase (MCP Server)
  - Semantic similarity search
  - Hierarchy-aware search
  - Attachment-specific search
  - Project filtering and organization
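The ingestion, embedding, and storage phases can be pictured with a small, self-contained sketch. It does not use QDrant Loader's own classes: `Chunk`, `chunk_text`, `batched`, `fake_embed`, and `ingest` are hypothetical names, the chunk sizes are arbitrary, and the embedding function is a stand-in that returns toy vectors instead of calling an LLM provider.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of a document plus the metadata carried with it (hypothetical type)."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def batched(items: list, batch_size: int):
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_embed(texts: list[str]) -> list[list[float]]:
    """Stand-in for an embedding provider: one toy 2-dimensional vector per text."""
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


def ingest(documents: dict[str, str], batch_size: int = 2) -> list[dict]:
    """Chunk every document, embed the chunks in batches, return storable records."""
    chunks = [
        Chunk(text=piece, metadata={"source": name, "chunk_index": i})
        for name, body in documents.items()
        for i, piece in enumerate(chunk_text(body))
    ]
    records = []
    for batch in batched(chunks, batch_size):
        vectors = fake_embed([c.text for c in batch])
        for chunk, vector in zip(batch, vectors):
            records.append({"vector": vector, "payload": {**chunk.metadata, "text": chunk.text}})
    return records
```

Each record pairs a vector with a payload, which mirrors the idea of storing vectors alongside filterable metadata; a real pipeline would add retries, progress tracking, and state tracking around the same loop.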
QDrant Loader uses a connector-based architecture for data sources:
# Example connector implementation
from qdrant_loader.connectors.base import BaseConnector
from qdrant_loader.core.document import Document


class CustomConnector(BaseConnector):
    async def get_documents(self) -> list[Document]:
        """Get documents from the source."""
        documents = []
        # Your custom logic here (fetch_data() is a placeholder for your own retrieval method)
        for item in self.fetch_data():
            doc = Document(
                content=item.content,
                metadata=item.metadata,
                source_type="custom",
                source_name=self.config.name,
            )
            documents.append(doc)
        return documents

Available connectors:
- LocalFileConnector - Local file system
- GitConnector - Git repositories
- ConfluenceConnector - Confluence spaces
- JiraConnector - Jira projects
- PublicDocsConnector - Public documentation sites
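As a rough illustration of how the async `get_documents()` contract might be exercised across several connectors, the sketch below fetches from two sources concurrently. The `Document` and `BaseConnector` classes here are simplified stand-ins written for this example, not the real QDrant Loader classes, and `InMemoryConnector` and `collect` are hypothetical names.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in with only the fields used in the connector example."""
    content: str
    source_type: str
    source_name: str
    metadata: dict = field(default_factory=dict)


class BaseConnector(ABC):
    """Stand-in base class exposing the async get_documents() contract."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    async def get_documents(self) -> list[Document]:
        ...


class InMemoryConnector(BaseConnector):
    """Hypothetical connector that serves documents from a dict."""

    def __init__(self, name: str, items: dict[str, str]):
        super().__init__(name)
        self._items = items

    async def get_documents(self) -> list[Document]:
        documents = []
        for key, text in self._items.items():
            await asyncio.sleep(0)  # yield control, as a real I/O call would
            documents.append(
                Document(
                    content=text,
                    source_type="memory",
                    source_name=self.name,
                    metadata={"key": key},
                )
            )
        return documents


async def collect(connectors: list[BaseConnector]) -> list[Document]:
    """Run all connectors concurrently and flatten their results in order."""
    batches = await asyncio.gather(*(c.get_documents() for c in connectors))
    return [doc for batch in batches for doc in batch]
```

A caller could run it with `asyncio.run(collect([...]))`; `asyncio.gather` returns results in the order the connectors were passed, so the flattened list is deterministic.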
- Fork and Clone

      git clone https://github.com/your-username/qdrant-loader.git
      cd qdrant-loader
      git remote add upstream https://github.com/martin-papy/qdrant-loader.git

- Create Feature Branch

      git checkout -b feature/your-feature-name

- Development Cycle

      # Make changes
      # Run tests
      make test
      # Check code quality
      make lint
      # Commit changes
      git commit -m "feat: add new feature"

- Submit Pull Request
  - Ensure all tests pass
  - Update documentation
  - Add changelog entry
  - Request review
- Create Connector Structure

      my-connector/
      ├── src/
      │   └── my_connector/
      │       ├── __init__.py
      │       ├── connector.py
      │       └── config.py
      ├── tests/
      └── pyproject.toml

- Implement Connector Interface

      from qdrant_loader.config.source_config import SourceConfig
      from qdrant_loader.connectors.base import BaseConnector
      from qdrant_loader.core.document import Document


      class MyConnector(BaseConnector):
          def __init__(self, config: SourceConfig):
              super().__init__(config)
              # Initialize your connector

          async def get_documents(self) -> list[Document]:
              # Implement document fetching logic
              pass

- Add Configuration Support

      from qdrant_loader.config.source_config import SourceConfig


      # Fields are declared as annotations, assuming SourceConfig is a pydantic model
      class MyConnectorConfig(SourceConfig):
          source_type: str = "my_connector"
          api_key: str
          base_url: str
          # Add your configuration fields
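One way to picture how a `source_type` value could tie a configuration to a connector class is a small registry. This sketch is not taken from QDrant Loader's code: `REGISTRY`, `register`, and `build_connector` are hypothetical, and a plain dataclass with manual checks stands in for the pydantic-based `SourceConfig` so the example runs with the standard library alone.

```python
from dataclasses import dataclass

# Maps a source_type string to (config class, connector class). Hypothetical.
REGISTRY: dict[str, tuple[type, type]] = {}


def register(source_type: str, config_cls: type):
    """Class decorator recording which config and connector handle a source_type."""
    def decorator(connector_cls: type) -> type:
        REGISTRY[source_type] = (config_cls, connector_cls)
        return connector_cls
    return decorator


@dataclass(frozen=True)
class MyConnectorConfig:
    """Stand-in for a SourceConfig subclass, validated in __post_init__."""
    api_key: str
    base_url: str
    source_type: str = "my_connector"

    def __post_init__(self) -> None:
        if not self.api_key:
            raise ValueError("api_key must not be empty")
        if not self.base_url.startswith(("http://", "https://")):
            raise ValueError("base_url must be an http(s) URL")


@register("my_connector", MyConnectorConfig)
class MyConnector:
    def __init__(self, config: MyConnectorConfig):
        self.config = config


def build_connector(raw: dict):
    """Validate a raw config mapping and instantiate the matching connector."""
    source_type = raw.get("source_type")
    if source_type not in REGISTRY:
        raise ValueError(f"unknown source_type: {source_type!r}")
    config_cls, connector_cls = REGISTRY[source_type]
    return connector_cls(config_cls(**raw))
```

Validating in the config class keeps bad values (an empty key, a non-HTTP URL) from reaching the connector at all, which is the same role the configuration schema plays in the steps above.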
Deep dive into system design, component interactions, and architectural decisions. Essential reading for understanding how QDrant Loader works internally.
Key Topics:
- Multi-project workspace architecture
- Connector and processor interfaces
- Async ingestion pipeline design
- State management and change detection
- MCP server integration
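Change detection, one of the topics listed above, is often built on content hashes. The following is a minimal sketch under that assumption; it is not QDrant Loader's actual state management, and `content_hash` and `detect_changes` are hypothetical helpers.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a stable SHA-256 fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(previous: dict[str, str], current_docs: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored hashes (doc id -> hash) with freshly fetched documents.

    `previous` holds hashes recorded by the last run; `current_docs` maps
    doc id -> content as fetched now.
    """
    current = {doc_id: content_hash(body) for doc_id, body in current_docs.items()}
    shared = current.keys() & previous.keys()
    return {
        "new": sorted(current.keys() - previous.keys()),
        "updated": sorted(doc_id for doc_id in shared if current[doc_id] != previous[doc_id]),
        "deleted": sorted(previous.keys() - current.keys()),
    }
```

Only documents reported as new or updated would need re-chunking and re-embedding, and deleted ones could be removed from the collection, which is what makes incremental ingestion cheaper than a full reload.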
Comprehensive guide for building custom functionality and connectors. Learn how to extend QDrant Loader for your specific needs.
Key Topics:
- Custom connector development
- File conversion extensions
- Configuration schema extensions
- Testing custom components
- Packaging and distribution
Testing strategies, frameworks, and best practices for ensuring code quality and reliability.
Key Topics:
- Unit testing with pytest
- Integration testing strategies
- Async testing patterns
- Mock and fixture usage
- CI/CD integration
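For the async testing and mocking topics, here is a small sketch using only the standard library (`unittest.IsolatedAsyncioTestCase` and `unittest.mock.AsyncMock`) instead of pytest plugins, so it runs without extra dependencies; pytest can still collect it. The function under test, `ingest_all`, and the `upsert` method on the store are hypothetical.

```python
import unittest
from unittest.mock import AsyncMock


async def ingest_all(connector, store) -> int:
    """Hypothetical function under test: fetch documents, store each, return the count."""
    documents = await connector.get_documents()
    for doc in documents:
        await store.upsert(doc)
    return len(documents)


class IngestAllTest(unittest.IsolatedAsyncioTestCase):
    async def test_stores_every_document(self):
        # AsyncMock attributes are themselves awaitable mocks.
        connector = AsyncMock()
        connector.get_documents.return_value = ["doc-1", "doc-2"]
        store = AsyncMock()

        count = await ingest_all(connector, store)

        self.assertEqual(count, 2)
        self.assertEqual(store.upsert.await_count, 2)
        store.upsert.assert_awaited_with("doc-2")  # checks the most recent await
```

Mocking both the connector and the store keeps the test free of network and database access, so it exercises only the orchestration logic.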
Production deployment strategies, containerization, and operational best practices.
Key Topics:
- Docker containerization
- Environment configuration
- Monitoring and logging
- Performance optimization
- Security considerations
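Environment configuration usually involves resolving placeholders such as `${LLM_API_KEY}` in the YAML config from environment variables. The sketch below shows one way that substitution could work on an already-parsed config; `expand_env` is a hypothetical helper and is not claimed to match how QDrant Loader resolves variables.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${NAME} placeholders in strings, dicts, and lists.

    Raises KeyError when a referenced variable is missing, so a deployment
    fails fast instead of sending a literal "${...}" as a credential.
    """
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, str):
        def replace(match: re.Match) -> str:
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(replace, value)
    return value  # numbers, booleans, None pass through unchanged
```

Passing `env` explicitly makes the helper easy to test; in production it falls back to `os.environ`, which keeps secrets out of the config file itself.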
# Initialize QDrant collection
qdrant-loader init --workspace .
# Ingest documents
qdrant-loader ingest --workspace .
# View configuration
qdrant-loader config --workspace .
# Project management
qdrant-loader config --workspace .
# Start MCP server
mcp-qdrant-loader
# Enable debug logging
qdrant-loader --log-level DEBUG --workspace . ingest
# Profile performance
qdrant-loader ingest --workspace . --profile
# Memory profiling (requires memory_profiler)
python -m memory_profiler your_script.py
# Makefile targets
make test # Run all tests
make lint # Run linting
make format # Format code
make docs # Build documentation
make clean # Clean build artifacts

# config.yaml
global:
  qdrant:
    url: "http://localhost:6333"
    collection_name: "my_collection"
  llm:
    provider: "openai"
    base_url: "https://api.openai.com/v1"
    api_key: "${LLM_API_KEY}"
    models:
      embeddings: "text-embedding-3-small"
      chat: "gpt-4o-mini"
    embeddings:
      vector_size: 1536
projects:
  - project_id: "docs"
    sources:
      - source_type: "local_files"
        name: "documentation"
        config:
          base_url: "file://./docs"
          include_paths:
            - "**/*.md"

import asyncio

from qdrant_loader.config import Settings, get_settings
from qdrant_loader.core.async_ingestion_pipeline import AsyncIngestionPipeline


async def main() -> None:
    # Load settings
    settings = get_settings()
    # Create and run pipeline
    pipeline = AsyncIngestionPipeline(settings)
    await pipeline.run()


# `await` is only valid inside a coroutine, so drive the pipeline with asyncio.run()
asyncio.run(main())

# The MCP server runs as a separate process
# Start with: mcp-qdrant-loader
# It provides search tools to AI development environments
# Tools available:
# - search_documents
# - search_with_hierarchy
# - search_attachments
- All tests pass (make test)
- Code style checks pass (make lint)
- Type checking passes (mypy)
- Documentation updated
- Changelog entry added (if applicable)
- Design document created (for major features)
- Tests cover all code paths
- Documentation includes examples
- Backward compatibility maintained
- Configuration schema updated (if needed)
- Root cause identified
- Regression test added
- Fix verified in multiple environments
- Documentation updated (if needed)
- GitHub Issues - Bug reports and feature requests
- Discussions - Questions and community support
- Documentation - Comprehensive guides and references
- Code Examples - Real-world usage patterns
- Code of Conduct - Be respectful and inclusive
- Issue Templates - Use provided templates for consistency
- Pull Request Process - Follow the established workflow
- Review Process - Participate in code reviews
- Documentation - Keep documentation up to date
- Core Features - Enhanced search capabilities and performance
- Connectors - Additional data source integrations
- Developer Experience - Better tooling and documentation
- Enterprise Features - Advanced security and compliance
Ready to start developing? Choose your path:
- New to QDrant Loader? Start with the Architecture Guide
- Creating connectors? Follow the Extending Guide
- Setting up CI/CD? Use the Deployment Guide
Need help? Join our community discussions or open an issue on GitHub!