🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL
Updated May 18, 2026 - Python
PyLate efficient inference engine
Official repository of TACHIOM.
Fused Triton kernels for late-interaction (MaxSim) scoring
High-performance late-interaction retrieval engine for on-prem AI. ColBERT/ColPali multi-vector search with Rust fused MaxSim, Triton GPU kernels, ROQ quantization, LEMUR routing, WAL-backed CRUD, and a FastAPI server — single machine, CPU or GPU.
Python library for MUVERA multi-vector retrieval via Fixed Dimensional Encodings. ColBERT / ColQwen2 / ColQwen3.5 compatible.
A physics-grounded, agent-driven digital twin for HP Metal Jet S100 3D printer
MCP server exposing late-interaction (ColBERT-style MaxSim) document retrieval over a local folder — pluggable encoder, runs offline by default.
Serve Jina Embeddings v4 multi-vector (ColBERT-style) multimodal text+image embeddings from a stock vLLM OpenAI server via an out-of-tree plugin, with configurable image fidelity.
ColFastVLM: Towards low-latency indexing in visual document retrieval
Accreted Intelligence — whitepaper + architecture writeup for acc4, an RLM over late-interaction scored-token memory (engine source private)
Repo for portfolio, containing working redirects to all projects.
ColPali-style multimodal late-interaction retrieval (text → document-image patches via MaxSim) — numpy reference implementation
ColBERT-style late-interaction MaxSim scoring for multi-vector retrieval — tiny, zero-dependency Rust crate
🌐 Build and share your personal website with ease using mjsushanth.github.io, a simple and effective static site generator.