Yang Liu1*, Yufei Yin2*, Chenchen Jing3, Muzhi Zhu1, Hao Chen1, Yuling Xi1, Bo Feng4, Hao Wang4, Shiyu Li4, Chunhua Shen1
1Zhejiang University, 2Hangzhou Dianzi University, 3Zhejiang University of Technology, 4Apple
COSINE is a unified open-world segmentation model for open-vocabulary segmentation and in-context segmentation with multi-modal prompts. It uses foundation-model features from the input image and text/visual prompts, then aligns them through a segmentation decoder to predict prompt-specific masks.
This repository is organized for public release and reproduction. Local datasets, checkpoints, and generated outputs should stay outside git under the paths listed below.
conda create --name cosine python=3.9.17
conda activate cosine
pip install torch==2.0.1 torchvision==0.15.2
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
pip install -r requirements.txt
Optional DINOv2 speedup:
pip install xformers==0.0.21 torch==2.0.1 torchvision==0.15.2 --extra-index-url https://download.pytorch.org/whl/cu117
datasets/ # common datasets for FSS, RefSeg, VOS, training
models/ # pretrained backbones and COSINE checkpoints
outputs/ # logs, predictions, visualizations
cosine/ # shared COSINE implementation package
inference_fsod/datasets/ # FSOD datasets used by detectron2-style configs
inference_fsod/models/ # optional FSOD-local checkpoint links/copies
inference_fsod/outputs/ # FSOD outputs
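The layout above can be prepared ahead of time with a few `mkdir` calls. The following is a minimal sketch, not part of the released scripts: `COSINE_ROOT` is a hypothetical variable introduced here for illustration and defaults to the current directory. It creates only the data, checkpoint, and output directories; `cosine/` is the source package and already ships with the repository.

```shell
# Sketch: create the empty directory skeleton listed above.
# COSINE_ROOT is a hypothetical variable used only in this example;
# it defaults to the current directory (assumed to be the repository root).
set -eu

COSINE_ROOT="${COSINE_ROOT:-.}"

# Directories that hold local assets and stay outside git.
# cosine/ is intentionally skipped because it is tracked source code.
for dir in \
    datasets \
    models \
    outputs \
    inference_fsod/datasets \
    inference_fsod/models \
    inference_fsod/outputs
do
    mkdir -p "${COSINE_ROOT}/${dir}"
done
```

Because `mkdir -p` is idempotent, running the snippet again leaves existing directories and their contents untouched. If the datasets live on another disk, replacing a directory with a symlink to that location should work equally well, assuming the scripts only follow the relative paths.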
The scripts default to these relative paths. You can override checkpoint roots in shell scripts with:
WEIGHT_ROOT=/path/to/cosine-weights bash scripts/fss/eval_fss_coco20i.sh
Download the DINOv2 ViT-L pretrained weight and place it at:
models/dinov2_vitl14_pretrain.pth
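An environment-variable override such as `WEIGHT_ROOT` is commonly resolved with shell parameter expansion. The sketch below illustrates that pattern and a presence check for the two weight locations; it is not taken from the release scripts. The default `models/cosine` is an assumption (it is the location this README names for COSINE checkpoints), and `DINOV2_WEIGHT` is a hypothetical variable name used only here.

```shell
# Sketch of an override-with-default pattern; the release scripts may
# resolve these paths differently.
# Assumption: the default checkpoint root is models/cosine.
WEIGHT_ROOT="${WEIGHT_ROOT:-models/cosine}"

# DINOV2_WEIGHT is a hypothetical variable name for this example.
DINOV2_WEIGHT="${DINOV2_WEIGHT:-models/dinov2_vitl14_pretrain.pth}"

# Check both locations and report every missing asset in one pass
# instead of stopping at the first problem.
missing=0
if [ ! -d "${WEIGHT_ROOT}" ]; then
    echo "missing checkpoint directory: ${WEIGHT_ROOT}" >&2
    missing=1
fi
if [ ! -f "${DINOV2_WEIGHT}" ]; then
    echo "missing DINOv2 weight: ${DINOV2_WEIGHT}" >&2
    missing=1
fi
echo "WEIGHT_ROOT=${WEIGHT_ROOT} missing=${missing}"
```

With this pattern, a value supplied on the command line takes precedence, and the relative default applies only when the variable is unset or empty.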
COSINE checkpoints are hosted on ModelScope and are expected under models/cosine/ using the public checkpoint directory names. See MODEL_ZOO.md for the checkpoint map.
With ModelScope access, download the release checkpoint files and place the weights/ contents under models/cosine/:
MODELSCOPE_TOKEN=... bash scripts/download_weights_modelscope.sh
bash scripts/check_required_assets.sh --weights-only
The token is optional when your ModelScope CLI is already authenticated.
Task-specific data layouts are documented in each subdirectory:
- Few-shot semantic segmentation: inference_fss/EVALUATION.md
- Few-shot instance segmentation: inference_fsod/EVALUATION.md
- Video object segmentation: inference_vos/EVALUATION.md
The consolidated dataset layout is documented in DATASETS.md. Before running evaluation, check local assets with:
bash scripts/check_required_assets.sh
Representative entry points:
bash scripts/fss/eval_fss_coco20i.sh
bash scripts/refseg/eval_referseg_dist_ms.sh
bash scripts/vos/eval_vos_d17_ms.sh
cd inference_fsod
bash scripts/coco_ms.sh
bash scripts/lvis_ms_fcclip.sh
See REPRODUCTION.md for the current reproduction checklist.
The script inventory is tracked in EVALUATION_SCRIPTS.md.
The source layout and the role of cosine/ are documented in SOURCE_LAYOUT.md.
For a quick functional check before full metrics, use the bounded smoke options recorded in REPRODUCTION.md.
Training commands and dataset preparation notes are in TRAINING.md. The default training scripts use datasets/, models/, and outputs/ unless explicit command-line paths are provided.
For academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.