Unified Open-World Segmentation with Multi-Modal Prompts

Yang Liu¹*, Yufei Yin²*, Chenchen Jing³, Muzhi Zhu¹, Hao Chen¹, Yuling Xi¹, Bo Feng⁴, Hao Wang⁴, Shiyu Li⁴, Chunhua Shen¹

¹Zhejiang University, ²Hangzhou Dianzi University, ³Zhejiang University of Technology, ⁴Apple

Overview

COSINE is a unified open-world segmentation model that handles both open-vocabulary segmentation and in-context segmentation with multi-modal prompts. It extracts foundation-model features from the input image and from the text or visual prompts, then aligns them through a segmentation decoder to predict prompt-specific masks.

This repository is organized for public release and reproduction. Keep local datasets, checkpoints, and generated outputs out of git, under the paths listed in the Directory Layout section below.

Setup

conda create --name cosine python=3.9.17
conda activate cosine

pip install torch==2.0.1 torchvision==0.15.2
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2

pip install -r requirements.txt

Optional DINOv2 speedup:

pip install xformers==0.0.21 torch==2.0.1 torchvision==0.15.2 --extra-index-url https://download.pytorch.org/whl/cu117

Directory Layout

datasets/                 # common datasets for FSS, RefSeg, VOS, training
models/                   # pretrained backbones and COSINE checkpoints
outputs/                  # logs, predictions, visualizations
cosine/                   # shared COSINE implementation package
inference_fsod/datasets/  # FSOD datasets used by detectron2-style configs
inference_fsod/models/    # optional FSOD-local checkpoint links/copies
inference_fsod/outputs/   # FSOD outputs
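
The untracked directories in the layout above can be created up front. The following is a minimal sketch, not a script shipped with the repository: make_layout and the REPO_ROOT variable are hypothetical names, and it assumes models/cosine/ is the checkpoint directory described in the Weights section. It skips cosine/ because that directory is source code already tracked in git.

```shell
# Hypothetical helper (not part of the repository): create the untracked
# directories from the layout above under a given repository root.
# The body runs in a subshell so its variables do not leak into the caller.
make_layout() (
  root="$1"
  for dir in datasets models/cosine outputs \
      inference_fsod/datasets inference_fsod/models inference_fsod/outputs
  do
    mkdir -p "$root/$dir"
  done
)

# REPO_ROOT is an assumed variable name; default to the current directory,
# which is expected to be the repository root.
make_layout "${REPO_ROOT:-.}"
```

Because mkdir -p is idempotent, running the sketch again on an existing checkout leaves populated directories untouched.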

The scripts default to these relative paths. You can override checkpoint roots in shell scripts with:

WEIGHT_ROOT=/path/to/cosine-weights bash scripts/fss/eval_fss_coco20i.sh
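
An override of this kind is usually implemented with shell parameter expansion. The sketch below illustrates the pattern only; it is an assumption about how the scripts might resolve the checkpoint root, not a copy of their code. The function name resolve_weight_root is hypothetical, and the fallback models/cosine is inferred from the Weights section, so the real default inside each script may differ.

```shell
# Assumption: a script resolves the checkpoint root roughly like this.
# If WEIGHT_ROOT is unset or empty, fall back to the repository-relative
# default; otherwise use the caller's value unchanged.
resolve_weight_root() {
  printf '%s\n' "${WEIGHT_ROOT:-models/cosine}"
}

echo "Using checkpoints from: $(resolve_weight_root)"
```

Setting the variable on the command line, as in the example above, applies the override to that single invocation only and leaves the rest of the shell session on the default.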

Weights

Download the DINOv2 ViT-L pretrained weight and place it at:

models/dinov2_vitl14_pretrain.pth
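
Before a long run, it can help to confirm that the backbone weight is in place. The sketch below is a hypothetical pre-flight helper: require_file is not part of the repository, and it is independent of scripts/check_required_assets.sh. It only checks that the path is a non-empty regular file, not that the checkpoint is valid.

```shell
# Hypothetical helper, not part of the repository: succeed only if the given
# path is a non-empty regular file (symlinks to such files also pass).
require_file() {
  if [ -f "$1" ] && [ -s "$1" ]; then
    return 0
  fi
  echo "missing or empty: $1" >&2
  return 1
}

# Path taken from the text above; run from the repository root.
if require_file models/dinov2_vitl14_pretrain.pth; then
  echo "DINOv2 ViT-L weight found"
fi
```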

COSINE checkpoints are hosted on ModelScope and are expected under models/cosine/ using the public checkpoint directory names. See MODEL_ZOO.md for the checkpoint map.

With ModelScope access, download the release checkpoint files and place the weights/ contents under models/cosine/:

MODELSCOPE_TOKEN=... bash scripts/download_weights_modelscope.sh
bash scripts/check_required_assets.sh --weights-only

The token is optional when your ModelScope CLI is already authenticated.

Evaluation

Task-specific data layouts are documented in each subdirectory:

Few-shot semantic segmentation: inference_fss/EVALUATION.md
Few-shot instance segmentation: inference_fsod/EVALUATION.md
Video object segmentation: inference_vos/EVALUATION.md

The consolidated dataset layout is documented in DATASETS.md. Before running evaluation, check local assets with:

bash scripts/check_required_assets.sh

Representative entry points:

bash scripts/fss/eval_fss_coco20i.sh
bash scripts/refseg/eval_referseg_dist_ms.sh
bash scripts/vos/eval_vos_d17_ms.sh

cd inference_fsod
bash scripts/coco_ms.sh
bash scripts/lvis_ms_fcclip.sh

See REPRODUCTION.md for the current reproduction checklist, EVALUATION_SCRIPTS.md for the script inventory, and SOURCE_LAYOUT.md for the source layout and the role of cosine/. For a quick functional check before running full metrics, use the bounded smoke-test options recorded in REPRODUCTION.md.

Training

Training commands and dataset preparation notes are in TRAINING.md. The default training scripts use datasets/, models/, and outputs/ unless explicit command-line paths are provided.

License

For academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.
