SABR: A Stable Adaptive Bitrate Framework Using Behavior Cloning Pretraining and Reinforcement Learning Fine-Tuning
SABR (Stable Adaptive Bitrate) is a learning-based framework for adaptive bitrate (ABR) decision-making. It is trained in two stages, pretraining followed by fine-tuning:
- Behavior Cloning (BC) Pretraining
  - The policy is trained on demonstration data generated by expert policies (e.g., MPC with beam search), using DPO (Direct Preference Optimization) for efficient and stable imitation learning.
- Reinforcement Learning (RL) Fine-tuning
  - Starting from the pretrained model, PPO (Proximal Policy Optimization) is applied for further exploration and policy improvement.
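For readers new to the two objectives, the sketch below shows the standard per-sample DPO loss and the standard clipped PPO surrogate. It is an illustration, not SABR's training code, and it rests on several assumptions:

- For DPO, the expert's chosen bitrate is the preferred action and some other bitrate is the rejected one.
- The reference policy is a frozen copy of an earlier policy.
- The names, `beta=0.1` and `clip_eps=0.2` are illustrative choices.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) action pair.

    logp_*     : log-probabilities under the policy being trained
    ref_logp_* : log-probabilities under the frozen reference policy
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen action over the rejected one.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) == softplus(-margin), computed without overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def ppo_clip_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped PPO surrogate for a single sample, negated so it can be minimized."""
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # PPO maximizes min(ratio * A, clipped_ratio * A).
    return -min(ratio * advantage, clipped * advantage)
```

When the policy equals the reference, the DPO loss is log 2. The loss falls below log 2 as the policy shifts probability toward the chosen action. In PPO, clipping stops a single update from moving the probability ratio far outside [1 - ε, 1 + ε].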
- The simulation environment follows the OpenAI Gym (Gymnasium) interface.
  - In principle, it is compatible with the RL algorithms in Stable Baselines3, including: ✅ PPO ✅ SAC ✅ DQN ✅ TD3 ✅ A2C, etc. Each algorithm still needs a matching action space: in Stable Baselines3, DQN supports only discrete actions, while SAC and TD3 support only continuous ones.
  - Supports parallel simulation through VecEnv to speed up training.
- The ABR simulation environment used for training is implemented in C++ and exposed to Python via pybind11.
  - This makes simulation faster than a pure-Python implementation.
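The Gym/Gymnasium interface contract can be illustrated with a toy pure-Python ABR environment that uses the same `reset()`/`step()` return conventions. Everything in it is hypothetical: the class name, bitrate ladder, observation layout, penalty weights and bandwidth trace. It is not the pybind11-wrapped C++ simulator. It also does not import gymnasium, so it cannot be passed to Stable Baselines3 as-is.

```python
class ToyABREnv:
    """Hypothetical toy ABR environment using Gymnasium-style reset()/step() returns.

    Simplifications (assumptions): no buffer cap, one bandwidth sample per chunk,
    and no action_space / observation_space attributes.
    """

    BITRATES_KBPS = [300, 750, 1200, 1850, 2850, 4300]  # illustrative bitrate ladder
    CHUNK_SEC = 4.0
    REBUF_PENALTY = 4.3
    SMOOTH_PENALTY = 1.0

    def __init__(self, trace_mbps, max_chunks=48):
        self.trace_mbps = list(trace_mbps)  # bandwidth (Mbps) seen by each chunk
        self.max_chunks = max_chunks

    def _obs(self, throughput_mbps):
        # Observation: [buffer (s), last throughput (Mbps), last bitrate (Mbps)]
        return [self.buffer, throughput_mbps, self.last_bitrate / 1000.0]

    def reset(self, seed=None, options=None):
        # Gymnasium convention: reset returns (observation, info).
        self.chunk = 0
        self.buffer = 0.0
        self.last_bitrate = self.BITRATES_KBPS[0]
        return self._obs(0.0), {}

    def step(self, action):
        # Gymnasium convention: (obs, reward, terminated, truncated, info).
        bitrate = self.BITRATES_KBPS[action]
        bw = self.trace_mbps[self.chunk % len(self.trace_mbps)]
        download_sec = (bitrate / 1000.0) * self.CHUNK_SEC / bw
        rebuf = max(0.0, download_sec - self.buffer)
        self.buffer = max(self.buffer - download_sec, 0.0) + self.CHUNK_SEC
        reward = (bitrate / 1000.0
                  - self.REBUF_PENALTY * rebuf
                  - self.SMOOTH_PENALTY * abs(bitrate - self.last_bitrate) / 1000.0)
        self.last_bitrate = bitrate
        self.chunk += 1
        terminated = self.chunk >= self.max_chunks
        return self._obs(bw), reward, terminated, False, {"rebuffer_sec": rebuf}


env = ToyABREnv(trace_mbps=[2.0, 1.5, 3.0], max_chunks=6)
obs, info = env.reset(seed=0)
total, done = 0.0, False
while not done:
    obs, reward, terminated, truncated, info = env.step(2)  # always request 1200 kbps
    total += reward
    done = terminated or truncated
```

To use such an environment with Stable Baselines3, it would also need to subclass `gymnasium.Env` and declare its spaces, for example a discrete action space with one action per bitrate. Those parts are omitted here to keep the sketch dependency-free.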
- OS: Ubuntu / CentOS
- Python: 3.10
- Install dependencies:
pip install -r requirements.txt
- Download ABRBench.
- Place the unpacked folder in the project root and keep its name exactly ABRBench (the path must be ./ABRBench). Example:
  SABR/
  ├── ABRBench/          # dataset folder name must be exactly this
  ├── build_env_c_plus/
  ├── config.py
  ├── train_sabr.py
  └── ...
Step 1 (dataset): see the section above, Dataset: ABRBench.
Step 2 (config): select the same dataset on both the Python and the C++ side.
- In config.py, set _DATASET to one of: ABRBench-3G or ABRBench-4G+
- In build_env_c_plus/config.h, set DATASET_OPTION to: 20 (for the ABRBench-3G mixed suite) or 30 (for the ABRBench-4G+ mixed suite)
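The two settings above are edited by hand in different files, so a small pre-flight check can catch a mismatch before a long training run. This sketch is hypothetical and not part of the repository. It knows only the dataset/option pairs stated in this README. It also assumes config.h declares the value either as `#define DATASET_OPTION <n>` or as `... DATASET_OPTION = <n>;`, so adjust the pattern to the real file.

```python
import re

# DATASET_OPTION values stated in this README; other trace sets are omitted.
KNOWN_OPTIONS = {"ABRBench-3G": 20, "ABRBench-4G+": 30, "FCC-18": 2}

# Assumed forms in config.h: "#define DATASET_OPTION 20" or "... DATASET_OPTION = 20;"
_OPTION_RE = re.compile(r"(?:#define\s+DATASET_OPTION\s+|\bDATASET_OPTION\s*=\s*)(\d+)")


def read_dataset_option(config_h_text):
    """Return the first DATASET_OPTION value found in the text of config.h."""
    match = _OPTION_RE.search(config_h_text)
    if match is None:
        raise ValueError("DATASET_OPTION not found in config.h")
    return int(match.group(1))


def check_dataset_consistency(dataset, config_h_text):
    """Raise if config.py's _DATASET and config.h's DATASET_OPTION disagree."""
    if dataset not in KNOWN_OPTIONS:
        raise KeyError(f"no known DATASET_OPTION for {dataset!r}")
    expected = KNOWN_OPTIONS[dataset]
    actual = read_dataset_option(config_h_text)
    if actual != expected:
        raise RuntimeError(
            f"config.py uses {dataset!r} (expects DATASET_OPTION {expected}), "
            f"but config.h sets DATASET_OPTION {actual}; edit one and rebuild"
        )
```

In the project root, one could call `check_dataset_consistency(config._DATASET, pathlib.Path("build_env_c_plus/config.h").read_text())` before training. This assumes config.py is importable and exposes _DATASET as a plain string.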
Step 3 (build): recompile after every change to config.h:
cd build_env_c_plus
bash build_all.sh
cd ..
Step 4 (train):
python train_sabr.py
After training, the script automatically evaluates the model on each test set and OOD (out-of-distribution) set and writes out the results.
Steps 1/2/3 are the same as SABR (dataset, config, build).
Step 4 — Train:
# Comyco
python train_comyco.py
# Pensieve
python train_pensieve.py
After training, each baseline is automatically evaluated on all test and OOD sets.
Rule-based baselines included: QUETRA / BOLA / BB / RobustMPC, etc.
- Step 1 is the same as SABR (prepare the dataset).
- Step 2 (important difference): rule-based methods cannot select the mixed suites ABRBench-3G / ABRBench-4G+ directly. You must choose a single trace set and test the sets one by one. Example for FCC-18:
  - In config.py: set _DATASET = 'FCC-18'
  - In build_env_c_plus/config.h: set DATASET_OPTION = 2
- Step 3 is the same as SABR (rebuild after config.h changes):
  cd build_env_c_plus
  bash build_all.sh
  cd ..
- Step 4 — Run all rule-based baselines:
  bash ex_rule_baseline.sh
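As an example of how compact a rule-based baseline can be, here is a simplified buffer-based rule in the spirit of BB (BBA-0):

- Below a reservoir, the lowest bitrate is chosen.
- Above reservoir + cushion, the highest bitrate is chosen.
- In between, the target rate grows linearly with buffer occupancy.

The reservoir/cushion values and the ladder are illustrative assumptions, not the parameters of the BB baseline in this repository. The sketch also omits the rate-change hysteresis of the original BBA-0.

```python
def buffer_based_bitrate(buffer_sec, bitrates_kbps, reservoir_sec=5.0, cushion_sec=10.0):
    """Pick a bitrate index from buffer occupancy (simplified BBA-0 style rate map).

    bitrates_kbps must be sorted in ascending order.
    """
    lowest, highest = 0, len(bitrates_kbps) - 1
    if buffer_sec < reservoir_sec:
        return lowest  # buffer too low: protect against stalls
    if buffer_sec >= reservoir_sec + cushion_sec:
        return highest  # buffer comfortably full: take the top rate
    # Linear map from buffer level to a target rate between min and max bitrate.
    frac = (buffer_sec - reservoir_sec) / cushion_sec
    target = bitrates_kbps[0] + frac * (bitrates_kbps[-1] - bitrates_kbps[0])
    # Choose the highest available bitrate that does not exceed the target.
    idx = lowest
    for i, rate in enumerate(bitrates_kbps):
        if rate <= target:
            idx = i
    return idx


ladder = [300, 750, 1200, 1850, 2850, 4300]
choice = buffer_based_bitrate(10.0, ladder)  # mid-cushion buffer -> a middle rung
```

The rule uses only the current buffer level and no throughput estimate. This is why buffer-based methods are simple and robust, yet can react slowly at startup.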
These three methods use future bandwidth, which a real player cannot know, and serve as lower bounds or optimal solutions for case studies:
# 1) Beam-search based lower bound
python run_bs_mpc.py bs
# 2) MPC-based lower bound
python run_bs_mpc.py mfd
# 3) Dynamic programming (dp_my): per-case optimal solution (see Pensieve paper)
./dp_my
Note: dp_my may fail on some trace sets (you can skip it if needed).
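The beam-search idea behind the oracle baselines can be sketched as follows. This is not the code in run_bs_mpc.py. Given the known bandwidth of every future chunk, the search keeps the K best partial bitrate sequences after each chunk. It scores them with a linear QoE: bitrate in Mbps, minus a rebuffer penalty, minus a switching penalty.

The following are all assumptions:

- the simplified download model (one bandwidth value per chunk, no buffer cap)
- the penalty weights
- the beam width

```python
def simulate_chunk(buffer_sec, bitrate_kbps, bw_mbps, chunk_sec):
    """Download one chunk at a known bandwidth; return (new_buffer, rebuffer_sec)."""
    download_sec = (bitrate_kbps / 1000.0) * chunk_sec / bw_mbps
    rebuf = max(0.0, download_sec - buffer_sec)
    new_buffer = max(buffer_sec - download_sec, 0.0) + chunk_sec
    return new_buffer, rebuf


def beam_search_plan(future_bw_mbps, bitrates_kbps, beam_width=5, chunk_sec=4.0,
                     rebuf_penalty=4.3, smooth_penalty=1.0):
    """Plan one bitrate index per chunk, given the future bandwidth of each chunk."""
    # Each beam entry: (cumulative QoE, buffer, previous bitrate index, plan so far).
    beams = [(0.0, 0.0, None, [])]
    for bw in future_bw_mbps:
        candidates = []
        for qoe, buf, prev, plan in beams:
            for i, rate in enumerate(bitrates_kbps):
                new_buf, rebuf = simulate_chunk(buf, rate, bw, chunk_sec)
                smooth = 0.0 if prev is None else abs(rate - bitrates_kbps[prev]) / 1000.0
                step_qoe = rate / 1000.0 - rebuf_penalty * rebuf - smooth_penalty * smooth
                candidates.append((qoe + step_qoe, new_buf, i, plan + [i]))
        # Prune: keep only the beam_width best partial plans by cumulative QoE.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best_qoe, _, _, best_plan = beams[0]
    return best_plan, best_qoe


plan, qoe = beam_search_plan([10.0, 10.0], [300, 1200], beam_width=2)
```

Pruning can discard the partial plan that leads to the best overall sequence, so the resulting QoE is a lower bound on the offline optimum. An exhaustive dynamic program removes that gap. In the toy call above, `beam_width=1` keeps only the cheaper first chunk and ends with a lower QoE than `beam_width=2`.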
After running the rule-based (and learning-based) methods, use the plotting script to compare QoE across schemes. Select the schemes to display via SCHEMES in plot_results.py, then run:
python plot_results.py
- Changed config.h but nothing happened?
  You need to rebuild: cd build_env_c_plus && bash build_all.sh
  ⚠️ If you change config.h (C++ side) but do not update config.py accordingly, training/evaluation and plotting may use inconsistent QoE parameters.
- Does plot_results.py use QoE parameters from config.py?
  Yes. plot_results.py reads the bitrate levels and rebuffer penalty for the current _DATASET.
  - These parameters come from the _DATASET_OPTION entry in config.py.
  - If you change VIDEO_BIT_RATE or REBUF_PENALTY in config.py, the QoE values computed by plot_results.py change accordingly.
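The linear QoE metric from the Pensieve paper shows how these parameters enter the score. The symbols are:

- R_n is the bitrate of chunk n, chosen from the VIDEO_BIT_RATE levels.
- q(R_n) = R_n, expressed in Mbps.
- T_n is the rebuffering time caused by downloading chunk n.
- μ is the rebuffer penalty (REBUF_PENALTY).

Whether plot_results.py uses exactly this form is an assumption here.

```latex
\mathrm{QoE} = \sum_{n=1}^{N} q(R_n) \;-\; \mu \sum_{n=1}^{N} T_n \;-\; \sum_{n=1}^{N-1} \bigl| q(R_{n+1}) - q(R_n) \bigr|
```

For example, with μ = 4.3 and q in Mbps, one second of rebuffering cancels the bitrate reward of a single 4.3 Mbps chunk. Raising REBUF_PENALTY or rescaling VIDEO_BIT_RATE therefore shifts every scheme's QoE, and can also change how schemes rank against each other.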
- Implementations / baselines:

