vision-encoder

A Gradio-based demonstration of the AllenAI Molmo2-8B multimodal model, supporting image QA, multi-image pointing, video QA, and temporal tracking. Users upload images or videos and provide natural-language prompts.

natural-language-processing pillow torch python3 pytorch vqa matplotlib gradio multimodal torchvision huggingface-transformers huggingface-spaces vision-language-model vlms multi-image-understanding vision-encoder allenai molmo2

Updated May 25, 2026
Python

PRITHIVSAKTHIUR / Coral-Health

Coral-Health is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify coral reef images into two health conditions using the SiglipForImageClassification architecture.

health healthy coral coral-reefs huggingface-transformers vision-encoder siglip2 bleached

Updated Apr 28, 2025
Python

PRITHIVSAKTHIUR / FineTuning-MetaCLIP-2

Demonstrates the process of fine-tuning a large-scale pretrained model, MetaCLIP 2, for a specific downstream task: image classification.

facebook torch pytorch accelerate clip evaluate metaai huggingface-transformers vision-transformer huggingface-models huggingface-datasets metaclip vision-encoder metaclip-2

Updated Nov 15, 2025
Jupyter Notebook

PRITHIVSAKTHIUR / Flood-Image-Detection

Flood-Image-Detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-512 for binary image classification. It is trained to detect whether an image shows a flooded scene or a non-flooded environment. The model uses the SiglipForImageClassification architecture.

google disaster flood gradio flooding huggingface-transformers vision-transformer vision-encoder siglip2

Updated May 27, 2025
Python

PRITHIVSAKTHIUR / Multilabel-GeoSceneNet

Multilabel-GeoSceneNet is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-label image classification. It is designed to recognize and label multiple geographic or environmental elements in a single image using the SiglipForImageClassification architecture.

map geospatial landscape spaces gradio huggingface-transformers hugging-face siglip vision-encoder siglip2 geoscenenet

Updated Apr 23, 2025
Python

sathishkumar67 / PaliGemma

Implementation of PaliGemma

deeplearning vlm llm siglip vision-encoder

Updated Nov 29, 2024
Python

PRITHIVSAKTHIUR / Multilabel-Portrait-SigLIP2

Multilabel-Portrait-SigLIP2 is a vision-language model fine-tuned from google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It classifies portrait-style images into one of the following visual portrait categories:

python google autoencoder image-classification gradio multilabel-classification portraits huggingface-transformers vision-transformer vision-encoder siglip2

Updated Apr 16, 2025
Python

PRITHIVSAKTHIUR / PussyCat-vs-Doggie-SigLIP2

PussyCat-vs-Doggie-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images as either a cat or a dog using the SiglipForImageClassification architecture.

cat dog prediction classification image-classification gradio huggingface-transformers vision-encoder siglip2

Updated Apr 19, 2025
Python

PRITHIVSAKTHIUR / shoe-type-detection

shoe-type-detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-512 for multi-class image classification. It is trained to detect different types of shoes such as Ballet Flats, Boat Shoes, Brogues, Clogs, and Sneakers. The model uses the SiglipForImageClassification architecture.

google type gradio multiclass-classification shoe huggingface-transformers huggingface-models vision-encoder siglip2

Updated Jun 7, 2025
Python

PRITHIVSAKTHIUR / Fashion-Product-Usage

Fashion-Product-Usage is a vision-language model fine-tuned from google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It classifies fashion product images based on their intended usage context.

google image-classification season gradio clothing huggingface-transformers vision-transformer vision-encoder wearing-time siglip2

Updated Apr 18, 2025
Python

fritzkeisler / FineTuning-MetaCLIP-2

🌍 Fine-tune MetaCLIP-2 for multilingual image classification on various tasks, enhancing performance with innovative training and data curation methods.

facebook torch pytorch accelerate clip evaluate metaai huggingface-transformers vision-transformer huggingface-models huggingface-datasets metaclip vision-encoder

Updated Jun 16, 2026
Jupyter Notebook

sitammeur / siglip2-litserve

Leverage SigLIP 2's capabilities using LitServe.

python deep-learning transformers artificial-intelligence fastapi lightning-ai zero-shot-image-classification siglip litserve vision-encoder

Updated Feb 28, 2025
Python

morikonon / kazakh-vlm-research

Research project for exploring Kazakh Vision Language Models.

transformers fine-tuning large-language-models vision-language-model low-rank-adaptation vision-encoder

Updated May 6, 2026
Python

Here are 16 public repositories matching this topic...

google-deepmind / tips

UCSC-VLAA / OpenVision

rwightman / timme

PRITHIVSAKTHIUR / Molmo2-HF-Demo

PRITHIVSAKTHIUR / Coral-Health

PRITHIVSAKTHIUR / FineTuning-MetaCLIP-2

PRITHIVSAKTHIUR / Flood-Image-Detection

PRITHIVSAKTHIUR / Multilabel-GeoSceneNet

sathishkumar67 / PaliGemma

PRITHIVSAKTHIUR / Multilabel-Portrait-SigLIP2

PRITHIVSAKTHIUR / PussyCat-vs-Doggie-SigLIP2

PRITHIVSAKTHIUR / shoe-type-detection

PRITHIVSAKTHIUR / Fashion-Product-Usage

fritzkeisler / FineTuning-MetaCLIP-2

sitammeur / siglip2-litserve

morikonon / kazakh-vlm-research
