SABR: A Stable Adaptive Bitrate Framework Using Behavior Cloning Pretraining and Reinforcement Learning Fine-Tuning

SABR (Stable Adaptive Bitrate) is a learning framework for adaptive bitrate (ABR) decision-making. SABR adopts a two-stage training framework (Pretraining + Fine-tuning):

Behavior Cloning (BC) Pretraining
- Trained on demonstration data generated by expert policies (e.g., MPC with beam search), using DPO (Direct Preference Optimization) for efficient and stable imitation learning.
Reinforcement Learning (RL) Fine-tuning
- Starts from the pretrained model and applies PPO (Proximal Policy Optimization) for further exploration and policy improvement.
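For readers unfamiliar with the two objectives, here is a minimal sketch of their textbook per-sample forms over a discrete set of bitrate actions. It is not taken from the SABR code. The pairing of an expert (chosen) action with a non-expert (rejected) action, and the `beta` and `clip_eps` defaults, are assumptions for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def softplus(x):
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_logits, ref_logits, chosen, rejected, beta=0.1):
    """DPO loss for one state with a preferred (chosen) and a dispreferred (rejected) action.

    ref_logits come from a frozen reference policy. Treating the expert's action
    as `chosen` is an assumed pairing scheme, not SABR's confirmed one.
    """
    lp, lr = log_softmax(policy_logits), log_softmax(ref_logits)
    margin = beta * ((lp[chosen] - lr[chosen]) - (lp[rejected] - lr[rejected]))
    return softplus(-margin)  # equals -log(sigmoid(margin))


def ppo_clip_loss(new_logp, old_logp, advantage, clip_eps=0.2):
    """PPO clipped surrogate loss (to minimize) for one sampled action."""
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return -min(ratio * advantage, clipped_ratio * advantage)


# Raising the expert action's logit relative to the reference lowers the DPO loss below log(2).
print(dpo_loss([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], chosen=0, rejected=1))  # log(2), about 0.693
print(dpo_loss([2.0, 0.0, 0.0], [0.0, 0.0, 0.0], chosen=0, rejected=1))
```

The DPO term rewards the policy for moving probability toward the expert action relative to the reference model. The PPO term limits how far one update can move the policy away from the one that collected the data.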

⚙️ Features

1️⃣ Standardized Gym Interface

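The simulation environment follows the OpenAI Gym (Gymnasium) interface.
It is therefore, in principle, compatible with RL algorithms in Stable Baselines3, including: ✅ PPO ✅ SAC ✅ DQN ✅ TD3 ✅ A2C, etc. Each algorithm must still support the environment's action space: in Stable Baselines3, DQN works only with discrete action spaces, while SAC and TD3 work only with continuous (Box) action spaces.
Supports running multiple simulation environments in parallel via VecEnv to speed up training.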
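A toy environment can make the Gym-style contract concrete. The sketch below only mirrors the Gymnasium `reset`/`step` return conventions (`reset` returns `(obs, info)`, `step` returns `(obs, reward, terminated, truncated, info)`). The bitrate ladder, 4-second chunks, per-chunk throughput trace, observation tuple, and reward weights are hypothetical stand-ins, not the C++ core's actual state or constants. A real environment would subclass `gymnasium.Env` and declare `action_space` and `observation_space` so that Stable Baselines3 can consume it.

```python
import random


class ToyABREnv:
    """Toy ABR environment following the Gymnasium reset/step return conventions.

    All constants and the observation layout are hypothetical placeholders.
    """

    BITRATES_KBPS = [300, 750, 1200, 1850, 2850, 4300]
    CHUNK_SEC = 4.0
    REBUF_PENALTY = 4.3

    def __init__(self, trace_kbps, n_chunks=48):
        self.trace_kbps = trace_kbps  # throughput while downloading each chunk
        self.n_chunks = n_chunks

    def reset(self, *, seed=None, options=None):
        rng = random.Random(seed)
        self.start = rng.randrange(len(self.trace_kbps))  # random offset into the trace
        self.chunk, self.buffer, self.last = 0, 0.0, 0
        return self._obs(0.0), {}

    def _obs(self, throughput_kbps):
        return (self.last, self.buffer, throughput_kbps, self.n_chunks - self.chunk)

    def step(self, action):
        rate = self.BITRATES_KBPS[action]
        bw = self.trace_kbps[(self.start + self.chunk) % len(self.trace_kbps)]
        download = rate * self.CHUNK_SEC / bw  # seconds to fetch one chunk
        rebuf = max(download - self.buffer, 0.0)
        # No buffer cap in this toy: the buffer drains during download, then gains one chunk.
        self.buffer = max(self.buffer - download, 0.0) + self.CHUNK_SEC
        reward = (rate / 1000
                  - self.REBUF_PENALTY * rebuf
                  - abs(rate - self.BITRATES_KBPS[self.last]) / 1000)
        self.last = action
        self.chunk += 1
        terminated = self.chunk >= self.n_chunks
        return self._obs(bw), reward, terminated, False, {"rebuffer": rebuf}


env = ToyABREnv(trace_kbps=[3000.0] * 10)
obs, info = env.reset(seed=0)
terminated = False
while not terminated:
    obs, reward, terminated, truncated, info = env.step(2)  # always request 1200 kbps
```

Keeping the per-chunk physics inside `step` is what lets a C++ core replace it transparently. Python code only sees the same five-tuple.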

2️⃣ High-Performance C++ Simulation Core

The ABR simulation environment used for training is implemented in C++ and exposed to Python via pybind11.
This makes simulation faster than a pure Python implementation.

Prerequisites

OS: Ubuntu / CentOS
Python: 3.10
Install dependencies:
```
pip install -r requirements.txt
```

Dataset: ABRBench

Download ABRBench.

Place the unpacked folder in the project root and keep the name exactly ABRBench (the path must be ./ABRBench). Example:

```
SABR/
├── ABRBench/              # dataset folder name must be exactly this
├── build_env_c_plus/
├── config.py
├── train_sabr.py
└── ...
```
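Because a misnamed folder is an easy mistake, a small pre-flight check can catch it before training starts. This is a hypothetical helper (`check_dataset_root` is not part of the repository). It only verifies that `./ABRBench` exists and otherwise suggests similarly named folders.

```python
from pathlib import Path


def check_dataset_root(project_root="."):
    """Return the path to ABRBench under project_root, or raise with a hint if it is missing."""
    root = Path(project_root).resolve()
    dataset = root / "ABRBench"
    if dataset.is_dir():
        return dataset
    # Look for near-misses such as 'abrbench' or 'ABRBench-main' left over from unpacking.
    near = sorted(p.name for p in root.iterdir()
                  if p.is_dir() and "abrbench" in p.name.lower())
    hint = f" Rename one of {near} to 'ABRBench'." if near else ""
    raise FileNotFoundError(f"Expected the dataset folder at {dataset}.{hint}")


# Example, run from the project root (e.g. at the top of a training script):
# check_dataset_root(".")
```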

Train & Evaluate: SABR

Step 1 — Prepare the dataset

See the section above: Dataset: ABRBench.

Step 2 — Configure dataset options

In config.py, set _DATASET to one of:
- ABRBench-3G, or
- ABRBench-4G+
In build_env_c_plus/config.h, set DATASET_OPTION to:
- 20 (for the ABRBench-3G mixed suite), or
- 30 (for the ABRBench-4G+ mixed suite)
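Since `_DATASET` (Python side) and `DATASET_OPTION` (C++ side) must agree, a consistency check can be run before building. The sketch below is hypothetical. It assumes `config.py` contains a line like `_DATASET = 'ABRBench-3G'` and that `config.h` sets the option either as `#define DATASET_OPTION 20` or as an assignment such as `const int DATASET_OPTION = 20;`. The mapping lists only the values given in this README; other single trace sets would need their own entries.

```python
import re

# Values from this README; extend with other trace sets as needed.
EXPECTED_OPTION = {"ABRBench-3G": 20, "ABRBench-4G+": 30, "FCC-18": 2}


def read_py_dataset(config_py_text):
    """Extract the _DATASET string (ignores names like _DATASET_OPTION)."""
    m = re.search(r"^_DATASET\s*=\s*['\"]([^'\"]+)['\"]", config_py_text, re.MULTILINE)
    if m is None:
        raise ValueError("_DATASET not found in config.py")
    return m.group(1)


def read_h_option(config_h_text):
    """Extract DATASET_OPTION from a #define or an assignment (assumed formats)."""
    m = re.search(r"(?:#define\s+DATASET_OPTION\s+|\bDATASET_OPTION\s*=\s*)(\d+)", config_h_text)
    if m is None:
        raise ValueError("DATASET_OPTION not found in config.h")
    return int(m.group(1))


def check_consistency(config_py_text, config_h_text):
    """Raise if config.py and config.h point at different datasets."""
    dataset = read_py_dataset(config_py_text)
    option = read_h_option(config_h_text)
    expected = EXPECTED_OPTION.get(dataset)
    if expected is None:
        raise KeyError(f"No known DATASET_OPTION for {dataset!r}; extend EXPECTED_OPTION")
    if option != expected:
        raise ValueError(f"config.py uses {dataset!r} but config.h has "
                         f"DATASET_OPTION={option} (expected {expected})")
    return dataset, option


# Example, run from the project root:
# from pathlib import Path
# check_consistency(Path("config.py").read_text(),
#                   Path("build_env_c_plus/config.h").read_text())
```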

Step 3 — Rebuild the C++ environment

Recompile after every change to config.h:

```
cd build_env_c_plus
bash build_all.sh
```
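The rebuild requirement can also be checked mechanically by comparing modification times. This is a hypothetical sketch (`build_is_stale` is not part of the repository). Where `build_all.sh` places the compiled pybind11 extension (`*.so`) is not stated here, so the caller has to pass those paths in.

```python
from pathlib import Path


def build_is_stale(config_h, built_modules):
    """Return True if config.h is newer than any compiled module (or nothing is built)."""
    config_mtime = Path(config_h).stat().st_mtime
    modules = [Path(p) for p in built_modules]
    if not modules:
        return True
    return any(m.stat().st_mtime < config_mtime for m in modules)


# Example (the *.so location is an assumption; adjust it to where build_all.sh writes):
# if build_is_stale("build_env_c_plus/config.h", Path("build_env_c_plus").rglob("*.so")):
#     print("config.h changed since the last build; run: bash build_all.sh")
```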

Step 4 — Train and auto-evaluate

```
python train_sabr.py
```

After training, the script automatically evaluates the model on each test set and each out-of-distribution (OOD) set and writes out the results.

Learning-based Baselines: Comyco & Pensieve

Steps 1 to 3 (dataset, configuration, build) are the same as for SABR.

Step 4 — Train:

```
# Comyco
python train_comyco.py

# Pensieve
python train_pensieve.py
```

After training, each baseline is automatically evaluated on all test and OOD sets.

Rule-based Baselines

Included: QUETRA / BOLA / BB / RobustMPC, etc.
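To show how simple these rules can be, here is a minimal sketch of the classic buffer-based (BB) rule. Below a reservoir it picks the lowest bitrate, above reservoir plus cushion it picks the highest, and in between it maps the buffer level linearly onto the bitrate ladder. The 5 s reservoir and 10 s cushion are illustrative values; `bb.py` in this repository may use different parameters or details.

```python
def bb_select(buffer_sec, n_levels, reservoir=5.0, cushion=10.0):
    """Buffer-based bitrate choice: returns a bitrate index in [0, n_levels - 1]."""
    if buffer_sec < reservoir:
        return 0  # protect against stalls when the buffer is low
    if buffer_sec >= reservoir + cushion:
        return n_levels - 1  # buffer is comfortably full: use the top bitrate
    # Linear map of the buffer level inside the cushion onto the bitrate ladder
    return int((n_levels - 1) * (buffer_sec - reservoir) / cushion)


print(bb_select(2.0, 6), bb_select(10.0, 6), bb_select(20.0, 6))  # 0 2 5
```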

Step 1 is the same as SABR (prepare the dataset).
Step 2 (important difference):
Rule-based methods cannot select the mixed suites ABRBench-3G / ABRBench-4G+ directly.
You must select a single trace set and evaluate the trace sets one at a time.
- Example for FCC-18:
  - In config.py: set _DATASET = 'FCC-18'
  - In build_env_c_plus/config.h: set DATASET_OPTION = 2
Step 3 is the same as SABR (rebuild after config.h changes):
```
cd build_env_c_plus
bash build_all.sh
```
Step 4 — Run all rule-based baselines:
```
bash ex_rule_baseline.sh
```

Optional: Lower-bound / Oracle-style analyses

These three methods use future bandwidth information, which is unavailable to a real player, so they serve as lower bounds or optimal solutions for case studies:

```
# 1) Beam-search based lower bound
python run_bs_mpc.py bs

# 2) MPC-based lower bound
python run_bs_mpc.py mfd

# 3) Dynamic programming (dp_my): per-case optimal solution (see Pensieve paper)
./dp_my
```

Note: dp_my may fail on some trace sets (you can skip it if needed).
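The idea behind a per-trace optimum can be sketched as a dynamic program over (last bitrate, buffer level) states. This is not the `dp_my` implementation. It assumes a Pensieve-style linear QoE, a throughput value that is known in advance and constant while each chunk downloads, 4 s chunks, a 60 s buffer cap (waiting at the cap is not penalized), and a buffer quantized to `buf_step` seconds, so the result is only approximately optimal.

```python
def oracle_dp(throughput_kbps, bitrates_kbps, chunk_sec=4.0, rebuf_penalty=4.3,
              max_buffer=60.0, buf_step=0.1):
    """Approximate best total QoE and bitrate-index sequence given future per-chunk throughput."""
    # State: (last bitrate index or None, quantized buffer) -> (best QoE so far, choices so far)
    states = {(None, 0): (0.0, [])}
    for bw in throughput_kbps:
        nxt = {}
        for (last, buf_q), (qoe, path) in states.items():
            buf = buf_q * buf_step
            for a, rate in enumerate(bitrates_kbps):
                download = rate * chunk_sec / bw
                rebuf = max(download - buf, 0.0)
                new_buf = min(max(buf - download, 0.0) + chunk_sec, max_buffer)
                smooth = 0.0 if last is None else abs(rate - bitrates_kbps[last]) / 1000
                value = qoe + rate / 1000 - rebuf_penalty * rebuf - smooth
                key = (a, round(new_buf / buf_step))
                # Keep only the best-scoring history that reaches each state
                if key not in nxt or value > nxt[key][0]:
                    nxt[key] = (value, path + [a])
        states = nxt
    return max(states.values(), key=lambda v: v[0])


best_qoe, choices = oracle_dp([100000.0] * 3, [300, 4300])
```

Quantizing the buffer keeps the state space bounded (bitrate levels times `max_buffer / buf_step + 1` states per chunk), which is what makes an exhaustive optimum tractable over long traces.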

QoE Results

After running the rule-based (and learning-based) methods, use the plotting script to compare QoE across schemes. Select which schemes to display via SCHEMES in plot_results.py, then:

```
python plot_results.py
```

FAQ

Changed config.h but nothing happened?
You need to rebuild:
```
cd build_env_c_plus && bash build_all.sh
```
⚠️ If you change config.h (C++ side) but do not update config.py accordingly, your training/evaluation and plotting may use inconsistent QoE parameters.
Does plot_results.py use QoE parameters from config.py?
Yes.
- plot_results.py reads the bitrate levels and rebuffer penalty for the current _DATASET.
- These parameters come from the _DATASET_OPTION entry in config.py.
- If you change VIDEO_BIT_RATE or REBUF_PENALTY in config.py, the QoE values computed by plot_results.py will change accordingly.
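The names `VIDEO_BIT_RATE` and `REBUF_PENALTY` suggest the linear QoE metric popularized by Pensieve: total bitrate utility minus a rebuffering penalty minus a bitrate-switching penalty. The sketch below assumes that form, with bitrates in kbps converted to Mbps, a hypothetical example ladder, and a hypothetical smoothness weight of 1. The exact formula in plot_results.py is not confirmed here. The sketch shows why editing either parameter changes the reported QoE.

```python
# Hypothetical example ladder in kbps; the real values come from config.py.
VIDEO_BIT_RATE = [300, 750, 1200, 1850, 2850, 4300]


def linear_qoe(bitrate_idx, rebuffer_sec, video_bit_rate, rebuf_penalty, smooth_penalty=1.0):
    """Session QoE: bitrate utility minus rebuffering and bitrate-switching penalties."""
    rates_mbps = [video_bit_rate[i] / 1000 for i in bitrate_idx]
    utility = sum(rates_mbps)
    rebuf = rebuf_penalty * sum(rebuffer_sec)
    switches = smooth_penalty * sum(abs(b - a) for a, b in zip(rates_mbps, rates_mbps[1:]))
    return utility - rebuf - switches


# Same session, two rebuffer penalties: only the penalty term changes.
print(linear_qoe([0, 2, 2], [1.0, 0.0, 0.0], VIDEO_BIT_RATE, rebuf_penalty=4.3))
print(linear_qoe([0, 2, 2], [1.0, 0.0, 0.0], VIDEO_BIT_RATE, rebuf_penalty=10.0))
```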

References & Acknowledgments

Implementations / baselines:
