Fix data subset refuter index misalignment #1546
…ssue py-why#1372) Signed-off-by: Tasfin Mahmud <tasfinmahmud1@gmail.com>
8646f73 to 98b091b
🤖 This is an automated response from Repo Assist. Welcome to the project, This PR addresses issue #1372 — the Heads-up — overlap with an existing PR: There is an existing Repo Assist PR (#1533) that makes the identical A few suggestions:
Thanks again for the contribution!
… with categorical columns (py-why#1372) Signed-off-by: Tasfin Mahmud <tasfinmahmud1@gmail.com>
Thanks for the thorough review, @github-actions[bot]! I've addressed all three suggestions:
@all-contributors please add @TasfinMahmud for code
I've put up a pull request to add @TasfinMahmud! 🎉
Preserves the DataFrame index in `encoding.py` to prevent "Unalignable boolean Series" errors in `distance_matching_estimator`.
Closes #1372
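For reviewers who want to see the failure mode in isolation, here is a minimal sketch of how resetting the index on an encoded subset can trigger the "Unalignable boolean Series" error. It uses plain `pd.get_dummies` as a stand-in for the encoder; the toy data and variable names are assumptions for illustration and are not taken from DoWhy.

```python
import warnings

import pandas as pd

# Toy data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "treatment": [1, 0, 1, 0, 1, 0],
        "color": ["red", "blue", "red", "green", "blue", "green"],
    }
)

# A subset keeps its original, non-contiguous index labels (1, 2, 4).
subset = df.iloc[[1, 2, 4]]

# Encoding followed by reset_index(drop=True) relabels the rows 0, 1, 2.
encoded_reset = pd.get_dummies(subset[["color"]]).reset_index(drop=True)

# Encoding without the reset keeps the labels 1, 2, 4.
encoded_kept = pd.get_dummies(subset[["color"]])

# Boolean mask built from the subset, so it is indexed by 1, 2, 4.
mask = subset["treatment"] == 1

# The mask cannot be aligned with the relabelled frame.
try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        encoded_reset.loc[mask]
    error_name = None
except Exception as exc:  # pandas raises IndexingError here
    error_name = type(exc).__name__

# With the index preserved, the same mask selects the treated rows.
treated = encoded_kept.loc[mask]
print(error_name, list(treated.index))
```

The mask and the index-preserving encoding share the same labels, so selection works without any positional bookkeeping.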
Changes
1. `dowhy/utils/encoding.py`: Removed the `reset_index(drop=True)` calls in `one_hot_encode` that were discarding the original DataFrame index after encoding. This preserves index alignment when the function is called on subsets of data (e.g., by DataSubsetRefuter).
2. `dowhy/causal_estimators/distance_matching_estimator.py`: Changed `estimate_effect()` to re-encode `self._observed_common_causes` fresh on every call instead of reusing the stale cached version from `fit()`. This ensures the encoded data always matches the index of the current (potentially subsetted) DataFrame, preventing index misalignment even when the encoder itself preserves the index correctly.
Why both changes?
The `encoding.py` fix alone prevents the immediate crash, but the estimator would still carry a stale cached reference from the fitting phase with the original (pre-subset) index. Re-encoding on each `estimate_effect()` call is the more robust fix that handles all refuter scenarios correctly.
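The stale-cache problem described above can be sketched with a toy estimator. This is a simplified illustration under stated assumptions, not DoWhy's actual implementation: `ToyMatchingEstimator`, `encode`, and both `treated_features_*` methods are hypothetical names, and `pd.get_dummies` stands in for an index-preserving encoder.

```python
import warnings

import pandas as pd


def encode(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for an index-preserving one-hot encoder."""
    return pd.get_dummies(frame)


class ToyMatchingEstimator:
    """Hypothetical estimator used only to illustrate the caching issue."""

    def __init__(self, common_causes):
        self.common_causes = common_causes
        self._cached = None

    def fit(self, data):
        # Encoding is cached with the index of the full data set.
        self._cached = encode(data[self.common_causes])
        return self

    def treated_features_stale(self, data, treatment):
        # Reuses the cache from fit(); its index may not match `data`.
        return self._cached.loc[data[treatment] == 1]

    def treated_features_fresh(self, data, treatment):
        # Re-encodes the frame passed in, so the index always matches `data`.
        encoded = encode(data[self.common_causes])
        return encoded.loc[data[treatment] == 1]


df = pd.DataFrame(
    {
        "treatment": [1, 0, 1, 0, 1, 0],
        "color": ["red", "blue", "red", "green", "blue", "green"],
    }
)
estimator = ToyMatchingEstimator(["color"]).fit(df)

# A refuter-style subset with index labels 1, 2, 4.
subset = df.iloc[[1, 2, 4]]

try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        estimator.treated_features_stale(subset, "treatment")
    stale_error = None
except Exception as exc:  # mask (1, 2, 4) cannot align with cache (0..5)
    stale_error = type(exc).__name__

fresh = estimator.treated_features_fresh(subset, "treatment")
print(stale_error, list(fresh.index))
```

In this sketch the encoder preserves the index, yet the cached frame still fails because it covers rows that the subset's mask does not. One side effect worth checking in any real implementation is that `pd.get_dummies` only creates columns for categories present in the frame it is given, so a re-encoded subset can have fewer columns than the encoding from `fit()`.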