[Repo Assist] fix: estimate_effect_naive handles list treatment_variable and uses actual treatment/control values #1553
Draft
github-actions[bot] wants to merge 1 commit into
…ctual treatment/control values

The estimate_effect_naive method had two bugs:

1. self._target_estimand.treatment_variable is always a list (set via parse_state()). Indexing a DataFrame with a list returns a DataFrame, and data.loc[bool_dataframe] raises ValueError: Cannot index with multidimensional key.
2. The method hardcoded == 1 and == 0 instead of using self._treatment_value and self._control_value, causing incorrect comparisons for non-binary treatments.

Fix:
- For a single treatment variable, use data[treatment_var[0]] to get a Series before comparison.
- For multiple treatment variables, use (data[cols] == value).all(axis=1) to get a boolean Series.
- Replace hardcoded 0/1 with self._control_value / self._treatment_value.

Closes #416

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
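To illustrate the second bug, here is a minimal pandas sketch. It uses a simplified difference in means, not DoWhy's actual method body, and the toy data (treatment coded as 1 for control and 2 for treated) is an assumption chosen so that the hardcoded `== 0` / `== 1` comparisons go wrong.

```python
import pandas as pd

# Toy data (illustrative only): treatment coded as 1 = control, 2 = treated
data = pd.DataFrame({"t": [1, 1, 2, 2], "y": [10.0, 12.0, 20.0, 22.0]})
control_value, treatment_value = 1, 2

# Hardcoded comparisons, as in the old code: no row has t == 0,
# and t == 1 actually picks up the control rows
old_control = data.loc[data["t"] == 0, "y"]
old_treated = data.loc[data["t"] == 1, "y"]
old_effect = old_treated.mean() - old_control.mean()  # mean of an empty Series is NaN

# Using the caller's actual control/treatment values
new_control = data.loc[data["t"] == control_value, "y"]
new_treated = data.loc[data["t"] == treatment_value, "y"]
new_effect = new_treated.mean() - new_control.mean()

print(old_effect)  # nan
print(new_effect)  # 10.0
```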
This was referenced May 30, 2026
Pull request overview
This pull request fixes CausalEstimator.estimate_effect_naive so that effect-strength evaluation works when treatment_variable is stored as a list (DoWhy’s internal convention) and so that naive comparisons use the caller’s actual control_value / treatment_value rather than hardcoded 0/1 (closing #416).
Changes:
- Fix boolean masking in `estimate_effect_naive` for a single treatment (a one-element list of column names) and add support for multi-treatment masking via row-wise `.all(axis=1)`.
- Use `self._control_value` / `self._treatment_value` instead of hardcoded `0`/`1` when forming the naive treatment/control subsets.
- Add regression tests intended to cover `evaluate_effect_strength=True` scenarios.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `dowhy/causal_estimator.py` | Fixes naive observational estimate masking for list-based treatment variables and uses actual treatment/control values. |
| `tests/causal_estimators/test_linear_regression_estimator.py` | Adds regression tests for effect-strength evaluation; one test currently needs adjustment to truly cover non-standard treatment/control values. |
Comment on lines +347 to +348

```python
# Recode continuous treatment to binary {0, 1} so both control and treatment rows exist
df[data["treatment_name"][0]] = np.where(df[data["treatment_name"][0]] > 0, 1, 0)
target_estimand.set_identifier_method("backdoor")
estimator = LinearRegressionEstimator(identified_estimand=target_estimand)
estimator.fit(df)
ate_estimate = estimator.estimate_effect(df, control_value=0, treatment_value=1)
```
🤖 This is an automated PR from Repo Assist, an AI assistant.
Summary
Fixes a bug in `CausalEstimator.estimate_effect_naive` that caused `ValueError: Cannot index with multidimensional key` when `evaluate_effect_strength=True` is passed to `estimate_effect`.

Closes #416
Root Cause
`self._target_estimand.treatment_variable` is always a list (set via `parse_state()` in `IdentifiedEstimand.__init__`). Indexing a pandas DataFrame with a list of column names returns a DataFrame, not a Series. Using that DataFrame as a boolean mask in `data.loc[bool_dataframe]` raises `ValueError: Cannot index with multidimensional key`.
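A minimal pandas sketch of this failure, assuming a toy DataFrame with a hypothetical treatment column `v0` and outcome `y`; `treatment_var` only mimics the list form stored on the estimand.

```python
import pandas as pd

data = pd.DataFrame({"v0": [0, 1, 0, 1], "y": [1.0, 2.0, 3.0, 4.0]})
treatment_var = ["v0"]  # list form, as stored on the estimand

# Indexing with a list returns a one-column DataFrame, so the mask is 2-D too
mask_df = data[treatment_var] == 1
print(type(mask_df).__name__)  # DataFrame

raised = False
try:
    data.loc[mask_df]
except ValueError as err:
    raised = True
    print(err)  # Cannot index with multidimensional key

# Indexing with the single column name gives a 1-D boolean Series, which .loc accepts
mask_series = data[treatment_var[0]] == 1
print(data.loc[mask_series, "y"].tolist())  # [2.0, 4.0]
```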
Additionally, the method hardcoded `== 1` and `== 0` for the treatment/control comparison instead of using `self._treatment_value` and `self._control_value`, which are always available at this point in the call chain.

Fix
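- Single treatment variable (`len == 1`): use `data[treatment_var[0]]` to get a Series before comparing it with the treatment or control value.
- Multiple treatment variables: use `(data[cols] == value).all(axis=1)` to reduce the comparison to a boolean Series.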
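The two masking rules can be sketched as standalone functions. `treatment_mask` and `naive_effect` are hypothetical names for illustration, not DoWhy's API, and the effect is a plain difference in outcome means assumed to approximate what the naive estimate computes.

```python
import pandas as pd


def treatment_mask(data: pd.DataFrame, treatment_var: list, value) -> pd.Series:
    """Hypothetical helper: 1-D boolean mask of rows whose treatment equals `value`."""
    if len(treatment_var) == 1:
        # Single treatment: index by the bare column name to get a Series
        return data[treatment_var[0]] == value
    # Multiple treatments: a row matches only if every treatment column equals `value`
    return (data[treatment_var] == value).all(axis=1)


def naive_effect(data, treatment_var, outcome, control_value, treatment_value):
    """Hypothetical naive estimate: mean outcome of treated rows minus control rows."""
    treated = data.loc[treatment_mask(data, treatment_var, treatment_value), outcome]
    control = data.loc[treatment_mask(data, treatment_var, control_value), outcome]
    return treated.mean() - control.mean()


# Toy data with treatments coded 1 (control) and 2 (treated)
df = pd.DataFrame({
    "a": [1, 1, 2, 2, 2],
    "b": [1, 2, 1, 2, 2],
    "y": [1.0, 5.0, 7.0, 9.0, 11.0],
})

single = naive_effect(df, ["a"], "y", control_value=1, treatment_value=2)
multi = naive_effect(df, ["a", "b"], "y", control_value=1, treatment_value=2)
print(single)  # 6.0
print(multi)   # 9.0
```

With multiple treatments, rows where only some columns equal the value (here `a == 1, b == 2` and `a == 2, b == 1`) fall into neither group, which is what the row-wise `.all(axis=1)` implies.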
- Format check: ✅ `poetry run poe format_check` passes
- Lint: ✅ `poetry run poe lint` passes