Welcome to the FiftyOne Plugins ecosystem! 🚀
Here you’ll discover cutting-edge research, state-of-the-art models, and powerful add-ons that unlock new FiftyOne workflows.
FiftyOne plugins allow you to extend and customize the functionality of the core tool to suit your specific needs. From advanced computer vision models to integrations with other popular AI tools, this curated collection of plugins will transform FiftyOne into your bespoke visual AI development workbench.
by Burhan-Q
NVIDIA LocateAnything-3B is an open-vocabulary grounding VLM from the Eagle family, supporting object detection, phrase grounding, pointing, scene-text/OCR localization, document layout, and GUI element grounding across images and video.
Community,model,vlm,Model

by harpreetsahota
A FiftyOne Panel for modular node-based workflows which takes inspiration from ComfyUI.
Community,visualization

by harpreetsahota
A FiftyOne zoo model integration for Qwen3-VL that enables comprehensive video understanding with multiple label types in a single forward pass, and that can also compute video embeddings.
Community,model,vlm,Model

by danielgural
Search through your video datasets using FiftyOne Brain and Twelve Labs!
Community,model,search

by voxel51
Utilities for integrating FiftyOne with annotation tools
Voxel51,annotation

by voxel51
Utilities for working with the FiftyOne Brain
Voxel51,curation,visualization

by voxel51
Create your own custom dashboards from within the App
Voxel51,visualization

by voxel51
A collection of import/export utilities
Voxel51,io

by voxel51
Utilities for working with FiftyOne database indexes
Voxel51,utils
by voxel51
Utilities for managing and building FiftyOne plugins
Voxel51,utils

by voxel51
Utilities for managing your delegated operations
Voxel51,utils

by voxel51
Utilities for managing your custom runs
Voxel51,utils

by voxel51
Call your favorite SDK utilities from the App
Voxel51,utils

by voxel51
Download datasets and run inference with models from the FiftyOne Zoo, all without leaving the App
Voxel51,model,dataset
by jacobmarks
Play YouTube videos in the FiftyOne App!
Community,visualization
by Burhan-Q
ScreenParser is a YOLO11-L detector fine-tuned by the docling-project on ~1.45M screenshots to localize 55 UI element classes (buttons, tables, navigation bars, text inputs, icons, etc.) in application and web screenshots.
Community,model,detection

by Burhan-Q
Run inference using an online vLLM instance for image captioning, classification, object detection, VQA, and OCR.
Community,model,vlm

by Burhan-Q
Play the classic DOOM (1993) shareware game directly within the FiftyOne App.
Community,examples

by Burhan-Q
Label images with OpenAI vision models for classification, detection, captioning, VQA, and OCR via the Responses API with Pydantic-validated structured output.
Community,model,vlm
by Burhan-Q
Augment dataset samples with image-corruption artifacts: pixel sorting, block corruption, channel shifting, scan-line noise, frame tearing, and more.
Community,data,curation

by harpreetsahota
CRADIOv4 performs visual feature extraction whose image embeddings can be used by a downstream model for various tasks. This implementation also produces attention maps.
Community,model,embeddings,Model

by harpreetsahota
Import your LeRobot format dataset into FiftyOne format
Community,io

by harpreetsahota
Nomic Embed Multimodal is a family of vision language models built on Qwen2.5-VL that generates high-dimensional embeddings for both images and text in a shared vector space.
Community,model,embeddings,Model

by harpreetsahota
Integrating Online Video Depth Anything (oVDA), a temporally consistent monocular depth estimator for videos that runs in an online setting with low VRAM consumption.
Community,model,depth,Model

by voxel51
Track model training experiments on your FiftyOne datasets with MLflow!
Voxel51,training
by AdonaiVera
This plugin integrates Google Gemini's multimodal vision models (e.g., gemini-2.5-flash) into your FiftyOne workflows. Prompt with text and one or more images, and receive a text response grounded in the visual inputs.
Community,model,vlm,Model

by harpreetsahota
Jina Embeddings v4 is a state-of-the-art vision language model that generates embeddings for both images and text in a shared vector space.
Community,model,embeddings,Model

by landingai
Parse, extract, and split documents using LandingAI's Agentic Document Extraction (ADE) API. Converts PDFs, images, spreadsheets, and Office files into structured Markdown with spatial bounding box grounding.
Community,ocr,model,text

by ardamamur
EgoExOR is an Operating Room dataset fusing egocentric and exocentric perspectives for surgical procedures. See here to load it with FiftyOne.
Community,dataset,medical

by danielgural
Find the optimal confidence threshold for your detection models automatically!
Community,evaluation

by harpreetsahota
Moondream 3 (Preview) is a vision language model with a mixture-of-experts architecture (9B total parameters, 2B active). This model makes no compromises, delivering state-of-the-art visual reasoning while still retaining our efficient and deployment-friendly ethos.
Community,model,vlm,Model

by sherpan
Fine-tune pretrained torchvision backbones (ResNet-50, EfficientNet-B2, MobileNetV3) on FiftyOne datasets with Classification labels and run inference directly from the App.
Community,model,training

by parva101
Convert YouTube URLs or local videos into FiftyOne image datasets with uniform/scene-change/hybrid frame sampling, perceptual deduplication, and source metadata.
Community,utils

by harpreetsahota
A FiftyOne plugin for generating synthetic samples for datasets in COCO4GUI format
Community,model,vlm

by harpreetsahota
Implementing the NVLabs C-RADIOv3 Embeddings Model as a Remotely Sourced Zoo Model for FiftyOne
Community,model,embeddings,Model

by madave94
Tackle noisy annotation! Find and analyze annotation issues in datasets with multiple annotators per image.
Community,annotation

by voxel51
Load and explore the MOSE complex video object segmentation dataset via the FiftyOne Zoo.
Community,dataset,video,Dataset
by voxel51
Visualize Rerun data files (.rrd) inside the FiftyOne App
Voxel51,visualization

by ehofesmann
Explore hyperspectral image datasets, interactively visualize pixel-level spectra, and dynamically recolor images.
Community,visualization,hyperspectral

by harpreetsahota
Implementing the COCO4GUI dataset type in FiftyOne with importers and exporters
Community,io

by harpreetsahota
Qwen3-VL-Embedding maps text, images, and video into a unified representation space, enabling powerful cross-modal retrieval and understanding.
Community,model,embeddings,Model

by harpreetsahota
Implementing Qwen3.5VL as a Remote Source Zoo Model for FiftyOne.
Community,model,Model

by Burhan-Q
Google's Gemma 4 multimodal family as a Remote Zoo Model, supporting image and video operations including detection, classification, VQA, OCR, and temporal localization.
Community,model,vlm,video,Model

by voxel51
Load and explore the DAVIS-2017 video segmentation dataset via the FiftyOne Zoo.
Community,dataset,video,Dataset

by perceptron-ai-inc
Isaac-0.2 is Perceptron AI's hybrid-reasoning vision language model supporting object detection, keypoint detection, OCR, instance segmentation, visual question answering, and UI understanding. Includes thinking and tool use for improving detection in complex scenes.
Community,model,vlm,Model

by harpreetsahota
Integrating MolmoPoint, a model that locates and tracks objects in images and videos by pointing and returning precise pixel coordinates
Community,model,tracking,Model
by jacobmarks
Test out any Albumentations data augmentation transform with FiftyOne!
Community,data

by harpreetsahota
LightOnOCR-2-1B is a compact multilingual VLM that converts document images into clean, naturally ordered text without brittle multi-stage OCR pipelines.
Community,model,vlm,Model

by harpreetsahota
A plugin that intelligently displays and formats vision language model outputs and text fields. Perfect for viewing OCR results, receipt analysis, document processing, and any text-heavy computer vision workflows.
Community,visualization,vlm

by harpreetsahota
Chat-based image editing powered by the Hugging Face image-to-image Inference API.
Community,model,huggingface
by jacobmarks
Run zero-shot (open vocabulary) prediction on your data!
Community,model

by harpreetsahota
Experiment with any VLM that can be run in a Hugging Face image-text-to-text pipeline right in the FiftyOne App!
Community,model,vlm

by harpreetsahota
This plugin provides five text evaluation metrics for comparing predictions against ground truth: ANLS, Exact Match, Normalized Similarity, Character Error Rate, and Word Error Rate.
Community,model,evaluation,text

by harpreetsahota
FiftyOne Remotely Sourced Zoo Model integration for Moonshot AI's Kimi-VL-A3B models enabling object detection, keypoint localization, and image classification with strong GUI and document understanding capabilities.
Community,model,vlm,Model
by jacobmarks
Find common image quality issues in your datasets
Community,curation

by harpreetsahota
Chat-based image editing powered by drbaph/Qwen-Image-Edit-2511-FP8
Community,model

by harpreetsahota
A FiftyOne Remotely Sourced Zoo Model integration for Google's SigLIP2 model enabling natural language search across images in your FiftyOne Dataset
Community,model,vlm,Model

by allenleetc
Adjust image brightness and contrast and filter semantic masks by class in a sample detail view!
Community,curation

by mgustineli
Tile images into a configurable grid of ROI patches with adjustable overlap for region-based analysis, using FiftyOne's native patches view.
Community,curation

by vlm-run
Extract structured data from visual and audio sources including documents, images, and videos
Community,model,vlm

by harpreetsahota
Molmo2 is a family of open vision language models developed by the Allen Institute for AI (Ai2) that support image, video, and multi-image understanding and grounding.
Community,model,vlm,Model

by harpreetsahota
Compute embeddings for video using Facebook Hiera Models
Community,model,video

by harpreetsahota
Integration of Meta's SAM3 (Segment Anything Model 3) into FiftyOne, with full support of text prompts, keypoint prompts, bounding box prompts, auto segmentation, and image embeddings.
Community,model,segmentation,Model

by harpreetsahota
Integrating FastVLM as a Remote Source Zoo Model for FiftyOne
Community,model,vlm,Model
by harpreetsahota
A plugin to fine-tune Hugging Face models on your FiftyOne Dataset.
Community,model,huggingface

by mmoollllee
Tile your high resolution images to squares for training small object detection models
Community,visualization

by harpreetsahota
GLM-OCR is a lightweight 0.9B vision language model achieving state-of-the-art document understanding, including formula recognition, table recognition, and structured information extraction.
Community,model,vlm,Model

by harpreetsahota
Implementing Microsoft's GUI Actor as a Remote Zoo Model for FiftyOne
Community,model,vlm,Model
by jacobmarks
Caption all your images with state-of-the-art vision language models!
Community,model,vlm

by harpreetsahota
Implementing MedGemma 1.5 as a Remote Zoo Model for FiftyOne
Community,model,medical,Model

by harpreetsahota
This plugin connects FiftyOne datasets with Weights & Biases to enable reproducible, data-centric ML workflows.
Community,utils,evaluation

by harpreetsahota
Isaac-0.1 is the first in Perceptron AI's family of models built to be the intelligence layer for the physical world. This integration supports various computer vision tasks including object detection, classification, OCR, visual question answering, and more.
Community,model,vlm,Model

by harpreetsahota
SHARP is Apple's state-of-the-art model for predicting 3D Gaussian Splats from a single RGB image. This integration brings SHARP to FiftyOne, enabling batch inference on image datasets with 3D visualization.
Community,model,3d,Model
by jacobmarks
Accelerate your data labeling with Active Learning!
Community,annotation
by swheaton
Anonymize/blur images based on a FiftyOne Detections field.
Community,curation
by allenleetc
Plotly-based Map Panel with adjustable marker cosmetics!
Community,visualization

by danielgural
Find the clusters in your data using some of the best algorithms available!
Community,curation
by jacobmarks
Find the images in your dataset most similar to an image from the filesystem or the internet!
Community,curation

by voxel51
Push FiftyOne datasets to the Hugging Face Hub, and load datasets from the Hub into FiftyOne!
Voxel51,dataset,huggingface

by voxel51
Run inference on your datasets using Hugging Face Transformers models!
Voxel51,model,huggingface

by brimoor
Load your PDF documents into FiftyOne as per-page images
Community,io

by harpreetsahota
MinerU2.5 is a 1.2B-parameter vision language model for efficient high-resolution document parsing. This model can support grounding OCR as well as free text OCR.
Community,model,vlm,Model

by AdonaiVera
Improve VLM training data quality with state-of-the-art dataset pruning and quality techniques
Community,model,vlm,curation

by harpreetsahota
Implementing Llama-3.1-Nemotron-Nano-VL-8B-V1 as a Remote Zoo Model for FiftyOne
Community,model,vlm,Model

by allenleetc
Compare two object detection models!
Community,evaluation

by harpreetsahota
Implementing Meta AI's VGGT as a FiftyOne Remote Zoo Model
Community,model,3d,Model

by harpreetsahota
Implementing MedSigLIP as a Remote Zoo Model for FiftyOne
Community,model,medical,Model

by harpreetsahota
Nanonets-OCR2 transforms documents into structured markdown with intelligent content recognition and semantic tagging, making it ideal for downstream processing by Large Language Models (LLMs).
Community,model,ocr,Model

by harpreetsahota
olmOCR-2 is a state-of-the-art OCR model built on the Qwen2.5-VL architecture that extracts text from document images with high accuracy.
Community,model,ocr,Model

by harpreetsahota
DeepSeek-OCR is a vision language model designed for optical character recognition with a focus on "contextual optical compression."
Community,model,vlm,Model
by jacobmarks
Perform semantic search on text in your documents!
Community,search

by harpreetsahota
Kosmos-2.5 excels at two core tasks: generating spatially-aware text blocks (OCR) and producing structured markdown output from images.
Community,model,ocr,Model

by harpreetsahota
Implementing MedGemma as a Remote Zoo Model for FiftyOne
Community,model,medical,Model

by harpreetsahota
BiModernVBert is a vision language model built on the ModernVBert architecture that generates embeddings for both images and text in a shared 768-dimensional vector space.
Community,model,embeddings,Model

by harpreetsahota
ColModernVBert is a multi-vector vision language model built on the ModernVBert architecture that generates ColBERT-style embeddings for both images and text.
Community,model,embeddings,Model

by harpreetsahota
ColQwen2.5 is a vision language model based on Qwen2.5-VL-3B-Instruct that generates ColBERT-style multi-vector representations for efficient document retrieval. This version takes dynamic image resolutions (up to 768 image patches) and doesn't resize them, preserving aspect ratios for better accuracy.
Community,model,embeddings,Model
by AdonaiVera
Load and explore the BDDOIA Safe/Unsafe Action dataset via the FiftyOne Zoo
Community,dataset,Dataset

by harpreetsahota
Implementing UI-TARS-1.5 as a Remote Zoo Model for FiftyOne
Community,model,vlm,Model
by AdonaiVera
A comprehensive FiftyOne plugin for testing and evaluating multiple vision language models with dynamic prompts and built-in evaluation capabilities
Community,vlm,evaluation

by harpreetsahota
ColPali is a vision language model based on PaliGemma-3B that generates ColBERT-style multi-vector representations for efficient document retrieval.
Community,model,embeddings,Model

by harpreetsahota
Implementing PaliGemma-2-Mix as a Remote Zoo Model for FiftyOne
Community,model,vlm,Model

by harpreetsahota
Integrating MiniCPM-V 4.5 as a Remote Source Zoo Model in FiftyOne
Community,model,vlm,Model

by jacobmarks
Create and test multimodal RAG pipelines with LlamaIndex, Milvus, and FiftyOne!
Community,search,embeddings

by jacobmarks
Find the images in your dataset most similar to an audio file!
Community,audio

by harpreetsahota
Implementing NVIDIA NeMo Retriever Parse as a FiftyOne Plugin
Community,model,ocr
by jacobmarks
Cluster your images using embeddings with FiftyOne and scikit-learn!
Community,curation

by harpreetsahota
Run ViTPose Models from Hugging Face on your FiftyOne Dataset
Community,model,pose

by harpreetsahota
Moondream2 implementation as a remotely sourced zoo model for FiftyOne
Community,model,vlm,Model

by harpreetsahota
Implementing Florence2 as a Remote Zoo Model for FiftyOne
Community,model,vlm,Model

by harpreetsahota
Integrating ShowUI into FiftyOne as a Remote Source Zoo Model
Community,model,vlm,Model

by harpreetsahota
Implementing MiMo-VL as a Remote Zoo Model for FiftyOne
Community,model,vlm,Model

by harpreetsahota
Integrating OS-Atlas Base into FiftyOne as a Remote Source Zoo Model
Community,model,vlm,Model
by jacobmarks
Ask (and answer) open-ended visual questions about your images!
Community,model,vqa

by segmentsai
Integrate FiftyOne with the Segments.ai annotation tool!
Community,annotation
by ehofesmann
Edit attributes of your labels directly in the FiftyOne App!
Community,annotation

by harpreetsahota
Implementing Qwen2.5-VL as a Remote Zoo Model for FiftyOne
Community,model,vlm,Model
by jacobmarks
Run optical character recognition with PyTesseract!
Community,model,ocr

by danielgural
Import your audio datasets as spectrograms into FiftyOne!
Community,audio,visualization

by harpreetsahota
A FiftyOne Remotely Sourced Zoo Model integration for LlamaIndex's VDR model enabling natural language search across document images, screenshots, and charts in your datasets.
Community,model,ocr,Model
by jacobmarks
Find exact and approximate duplicates in your dataset!
Community,curation
by jacobmarks
Semantically search emojis and copy to clipboard!
Community,examples

by harpreetsahota
Run the Janus Pro Models from DeepSeek on your FiftyOne Dataset
Community,model,vlm

by harpreetsahota
Perform zero-shot metric monocular depth estimation using the Apple Depth Pro model
Community,model,depth

by danielgural
Find those troublesome outliers in your dataset automatically!
Community,curation
by jacobmarks
Add synthetic data from prompts with text-to-image models and FiftyOne!
Community,model,vlm
by jacobmarks
Navigate concept space with CLIP, vector search, and FiftyOne!
Community,embeddings
by jacobmarks
Find images that best interpolate between two text-based extremes!
Community,curation
by jacobmarks
Chat with your images using GPT-4 Vision!
Community,model,vlm

by mmoollllee
Compute datetime-related fields (sunrise, dawn, evening, weekday, ...) from your samples' filenames or creation dates
Community,curation
by jacobmarks
Perform keyword search on a specified field!
Community,search

by danielgural
Bring images to life with image to video!
Community,video
by jacobmarks
Filter on two numeric ranges simultaneously!
Community,search
by ehofesmann
Filter a field of your FiftyOne dataset by one or more values.
Community,search
by wayofsamu
Visualize x,y-Points as a line chart.
Community,visualization
by jacobmarks
Automate data ingestion with Twilio!
Community,data

by 51labs
Panel listing all the available FiftyOne Labs features
Labs,ml,utils

by 51labs
Image model to video dataset using torch dataloader
Labs,ml,video

by 51labs
Few-shot learning with multiple model types
Labs,ml,classification

by 51labs
Labels across frames of a video
Labs,ml,video,segmentation

by 51labs
Boxes Fusion for detections
Labs,ml,detection

by 51labs
Image segmentation via prompts
Labs,ml,segmentation

by 51labs
Coreset selection (ZCore) for unlabeled image data
Labs,ml
Note
Community plugins are external projects maintained by their respective authors. They are not part of FiftyOne core and may change independently. Please review each plugin’s documentation and license before use.
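
As a rough illustration of how a plugin from this listing might be installed, the sketch below assembles a `fiftyone plugins download` command for a given GitHub repository. The helper name `plugin_download_command` is hypothetical, and the repository URL and the `@voxel51/io` plugin name are not taken from this page; they are assumptions used only as an example. Check each plugin's own documentation for its actual install instructions.

```python
import shlex


def plugin_download_command(repo_url, plugin_names=()):
    """Build the argv list for a `fiftyone plugins download` call.

    Hypothetical helper for illustration only: it assembles the command
    but does not run it.
    """
    cmd = ["fiftyone", "plugins", "download", repo_url]
    if plugin_names:
        # Restrict the download to specific plugins within the repository
        cmd += ["--plugin-names", *plugin_names]
    return cmd


if __name__ == "__main__":
    # Assumed example values; substitute the repository of the plugin you want
    cmd = plugin_download_command(
        "https://github.com/voxel51/fiftyone-plugins",
        plugin_names=["@voxel51/io"],
    )
    # Print a shell-ready command line; pass `cmd` to subprocess.run to execute it
    print(shlex.join(cmd))
```

Omitting `plugin_names` leaves out the `--plugin-names` flag, which requests every plugin found in the repository.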