feat(rl): add REINFORCE advantage estimator #2083

Open
EazyReal wants to merge 1 commit into THUDM:main from EazyReal:reinforce-estimator

Commits

Commits on Jun 15, 2026