[ENH] add online update to BaggingRegressor#1064

Open
patelchaitany wants to merge 1 commit into sktime:main from patelchaitany:enh/bagging-regressor-update

@patelchaitany

Member

Reference Issues/PRs

Partially addresses #1049.

What does this implement/fix? Explain your changes.

Adds online update support to BaggingRegressor.

On update, each fitted bagged clone is updated on a row subsample of the incoming batch, drawn with the same n_samples and bootstrap settings as in fit. The feature subsets cols_[i] chosen in fit are reused; column sampling is not repeated. The PR sets capability:update=True on the meta-estimator so that the public update path runs. Meaningful incremental learning still depends on the inner regressor: with a batch-only inner regressor, update is effectively a no-op. When bootstrap=False, the subsample size is capped at the update batch size, so small batches do not raise an error.
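The row subsampling described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the helper name `sample_update_rows` is hypothetical, and it assumes `n_samples` has already been resolved to an integer count.

```python
import numpy as np


def sample_update_rows(n_rows, n_samples, bootstrap, rng):
    """Return row positions for one bagged clone's update batch.

    Hypothetical helper for illustration only; assumes n_samples is an int.
    """
    size = n_samples
    if not bootstrap:
        # Without replacement we cannot draw more rows than the batch holds,
        # so the subsample size is capped at the batch size.
        size = min(size, n_rows)
    return rng.choice(n_rows, size=size, replace=bootstrap)


rng = np.random.default_rng(0)
small_batch_rows = 5  # update batch smaller than n_samples

idx_no_boot = sample_update_rows(small_batch_rows, n_samples=20, bootstrap=False, rng=rng)
idx_boot = sample_update_rows(small_batch_rows, n_samples=20, bootstrap=True, rng=rng)
```

With `bootstrap=False` the draw returns each of the 5 batch rows exactly once; with `bootstrap=True` it returns 20 positions sampled with replacement from the same 5 rows.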

Docstring updated to describe update behaviour.

Does your contribution introduce a new dependency? If yes, which one?

No.

What should a reviewer concentrate their feedback on?

  • Whether _update row subsampling matches fit semantics (including bootstrap=False on small update batches).

Did you add any tests for the change?

No dedicated tests added. Covered by existing test_online_update in test_all_regressors.py for BaggingRegressor via get_test_params().

Any other comments?

Follow-up for #1049 may include EnbpiRegressor and a River wrapper for online inner estimators.

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured dependency isolation, see the estimator dependencies guide.

@fkiraly fkiraly left a comment

Collaborator

Thanks!

For the algorithm to make sense, in the case where n_samples is an integer, a fraction n_samples / n (with n the number of rows seen in _fit) should be computed in _fit, and applied to the sample in _update.

That is, the row selection probability should remain the same for _fit and _update.

With this insight, it should also be possible to merge the two "resolve size" methods into a single one.
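One way to read this suggestion is sketched below, as an illustration rather than the eventual implementation. The function names `resolve_row_fraction` and `resolve_size` are hypothetical, and the sketch assumes `n_samples` is either an integer count or a float fraction.

```python
import numpy as np


def resolve_row_fraction(n_samples, n_fit_rows):
    """Turn n_samples into a row fraction, computed once in fit (hypothetical)."""
    if isinstance(n_samples, (int, np.integer)):
        # Integer count: express it relative to the number of rows seen in fit.
        return n_samples / n_fit_rows
    # Assumption: a float n_samples is already a fraction of rows.
    return float(n_samples)


def resolve_size(fraction, n_rows, bootstrap):
    """Single size resolver usable by both fit and update (hypothetical)."""
    size = max(1, int(round(fraction * n_rows)))
    if not bootstrap:
        # Sampling without replacement cannot exceed the available rows.
        size = min(size, n_rows)
    return size


# fit: 200 rows, n_samples=50 gives a fraction of 0.25
fraction = resolve_row_fraction(50, n_fit_rows=200)
fit_size = resolve_size(fraction, n_rows=200, bootstrap=False)

# update: a batch of 40 rows keeps the same per-row selection probability
update_size = resolve_size(fraction, n_rows=40, bootstrap=False)
```

Here the fit call draws 50 of 200 rows and the update call draws 10 of 40, so the fraction of rows selected is 0.25 in both, and the same resolver serves both code paths.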

Implement _update so each bagged clone is updated on a row subsample of
the incoming batch (same n_samples and bootstrap as fit) with cols_[i]
fixed from fit. Set capability:update on the meta-estimator.

Partially addresses sktime#1049.
@patelchaitany patelchaitany force-pushed the enh/bagging-regressor-update branch from 1520b1e to 4e1b46f June 8, 2026 07:16
@patelchaitany patelchaitany requested a review from fkiraly June 8, 2026 07:16
@patelchaitany

Member Author

Hey @fkiraly, I've addressed your review comments! Whenever you get some time, please take another look and let me know if it's good to go. Thanks!

Labels

enhancement module:regression probabilistic regression module

3 participants