aim-uofa/COSINE

Unified Open-World Segmentation with Multi-Modal Prompts

Yang Liu¹*, Yufei Yin²*, Chenchen Jing³, Muzhi Zhu¹, Hao Chen¹, Yuling Xi¹, Bo Feng⁴, Hao Wang⁴, Shiyu Li⁴, Chunhua Shen¹

¹Zhejiang University, ²Hangzhou Dianzi University, ³Zhejiang University of Technology, ⁴Apple

Overview

COSINE is a unified open-world segmentation model that handles both open-vocabulary segmentation and in-context segmentation with multi-modal prompts. It extracts foundation-model features from the input image and from the text or visual prompts, then aligns them through a segmentation decoder to predict prompt-specific masks.

This repository is organized for public release and reproduction. Keep local datasets, checkpoints, and generated outputs out of git, under the paths listed below.

Setup

conda create --name cosine python=3.9.17
conda activate cosine

pip install torch==2.0.1 torchvision==0.15.2
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2

pip install -r requirements.txt

Optional DINOv2 speedup:

pip install xformers==0.0.21 torch==2.0.1 torchvision==0.15.2 --extra-index-url https://download.pytorch.org/whl/cu117

Directory Layout

datasets/                 # common datasets for FSS, RefSeg, VOS, training
models/                   # pretrained backbones and COSINE checkpoints
outputs/                  # logs, predictions, visualizations
cosine/                   # shared COSINE implementation package
inference_fsod/datasets/  # FSOD datasets used by detectron2-style configs
inference_fsod/models/    # optional FSOD-local checkpoint links/copies
inference_fsod/outputs/   # FSOD outputs
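
The data, checkpoint, and output directories above are not tracked in git, so a fresh clone may need them created first. A minimal sketch, assuming nothing beyond the directory names listed above; create_layout is a hypothetical helper, not a script shipped with the repository, and the demo runs in a scratch directory rather than in the checkout:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create the untracked directories under a given root.
# cosine/ is omitted because it is source code that ships with the repository.
create_layout() {
  local root="${1:-.}"
  local dir
  for dir in datasets models outputs \
             inference_fsod/datasets inference_fsod/models inference_fsod/outputs; do
    mkdir -p "${root}/${dir}"
  done
}

# Demonstrate on a scratch directory so nothing in a real checkout is touched.
demo_root="$(mktemp -d)"
create_layout "${demo_root}"
ls "${demo_root}"
```

Calling create_layout with no argument from the repository root would create the same directories in place.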

The scripts default to these relative paths. To override the checkpoint root for a shell script, set WEIGHT_ROOT when invoking it:

WEIGHT_ROOT=/path/to/cosine-weights bash scripts/fss/eval_fss_coco20i.sh
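
An override of this kind is commonly implemented with a shell parameter default. The sketch below illustrates that pattern only; resolve_weight_root is a hypothetical function, and the models/cosine fallback is an assumption based on the checkpoint location described in the Weights section, not a copy of the repository's scripts:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: use WEIGHT_ROOT when it is set and non-empty,
# otherwise fall back to an assumed default of models/cosine.
resolve_weight_root() {
  printf '%s\n' "${WEIGHT_ROOT:-models/cosine}"
}

# Each call runs in a subshell, so the caller's environment is unchanged.
default_root="$(unset WEIGHT_ROOT; resolve_weight_root)"
override_root="$(WEIGHT_ROOT=/path/to/cosine-weights resolve_weight_root)"

echo "default:  ${default_root}"
echo "override: ${override_root}"
```

Because the assignment is written as a prefix to the command, it applies to that single invocation and does not persist in the calling shell.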

Weights

Download the DINOv2 ViT-L pretrained weight and place it at:

models/dinov2_vitl14_pretrain.pth

COSINE checkpoints are hosted on ModelScope and are expected under models/cosine/ using the public checkpoint directory names. See MODEL_ZOO.md for the checkpoint map.

With ModelScope access, download the release checkpoint files and place the weights/ contents under models/cosine/:

MODELSCOPE_TOKEN=... bash scripts/download_weights_modelscope.sh
bash scripts/check_required_assets.sh --weights-only

The token is optional when your ModelScope CLI is already authenticated.

Evaluation

Task-specific data layouts are documented in each subdirectory, and the consolidated dataset layout is documented in DATASETS.md. Before running evaluation, check local assets with:

bash scripts/check_required_assets.sh

Representative entry points:

bash scripts/fss/eval_fss_coco20i.sh
bash scripts/refseg/eval_referseg_dist_ms.sh
bash scripts/vos/eval_vos_d17_ms.sh

cd inference_fsod
bash scripts/coco_ms.sh
bash scripts/lvis_ms_fcclip.sh
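
When running several of these entry points in a row, it can help to stop at the first failure and keep one log per run. A minimal sketch of such a wrapper follows; run_logged and LOG_DIR are hypothetical names, and the demo uses a stand-in echo command and a scratch directory so it runs without datasets or checkpoints:

```shell
#!/usr/bin/env bash
# -e and pipefail make the script stop if a wrapped command fails.
set -euo pipefail

# Hypothetical helper: run a command and mirror its output into a log file.
run_logged() {
  local name="$1"
  shift
  mkdir -p "${LOG_DIR}"
  "$@" 2>&1 | tee "${LOG_DIR}/${name}.log"
}

# Scratch location for this demo; a directory under outputs/ would be a
# natural choice in a real checkout (assumption).
LOG_DIR="$(mktemp -d)"

# Stand-in command so the sketch runs anywhere.
run_logged smoke echo "asset check placeholder"

# In a prepared checkout the same helper could wrap the documented scripts:
#   run_logged assets bash scripts/check_required_assets.sh
#   run_logged fss_coco20i bash scripts/fss/eval_fss_coco20i.sh
```

Placing the asset check first means a missing dataset or checkpoint is reported before any evaluation time is spent.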

See REPRODUCTION.md for the current reproduction checklist. The script inventory is tracked in EVALUATION_SCRIPTS.md. The source layout and the role of cosine/ are documented in SOURCE_LAYOUT.md. For a quick functional check before full metrics, use the bounded smoke options recorded in REPRODUCTION.md.

Training

Training commands and dataset preparation notes are in TRAINING.md. The default training scripts use datasets/, models/, and outputs/ unless explicit command-line paths are provided.

License

For academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.

About

[ICCV'25] Unified Open-World Segmentation with Multi-Modal Prompts
