feat: Add z-identifiability via surrogate experiments as identification method in identify_ate_effect #1531
Omar-Camara wants to merge 4 commits into
Thanks @Omar-Camara for this PR! A couple quick notes:
Thanks again!
…on step 5

- Add ZIDIdentifier bridge class (zid_identifier.py)
  - Converts DoWhy nx.DiGraph to ananke-causal ADMG format
  - Handles unobserved common-cause nodes and explicit bidirected edge attrs
  - Lazy import: ananke-causal not required at import time
- Add surrogate_nodes: Optional[List[str]] = None to identify_effect_auto and identify_ate_effect (fully backward-compatible)
- Insert z-ID as step 5 in identify_ate_effect pipeline
- Export ZIDIdentifier from causal_identifier __init__
- Add tests/causal_identifiers/test_zid_identifier.py

Implements Bareinboim & Pearl (2012) Theorem 3 via ananke-causal. Validated: 5/5 unit tests, 500/500 rescue benchmark, 600/600 mixed benchmark. Draft pending upstream ananke-causal compatibility PR.

Signed-off-by: Omar Camara <omarcamara000@gmail.com>
Signed-off-by: Omar Camara <117489682+Username273183@users.noreply.github.com>
Signed-off-by: Omar Camara <117489682+Username273183@users.noreply.github.com>
- Add pyananke >= 0.6.0 as optional dep in pyproject.toml
- Add zid extras group: pip install dowhy[zid]
- Update ZIDIdentifier import error message to reference dowhy[zid]

pyananke is a modernized fork of ananke-causal with Python 3.10-3.12 compatibility fixes and the idz module for z-identifiability. See: https://pypi.org/project/pyananke/

Signed-off-by: Omar Camara <117489682+Username273183@users.noreply.github.com>
Signed-off-by: Omar Camara <117489682+Username273183@users.noreply.github.com>
Hi @emrekiciman, I've addressed all the feedback:

- DCO: Fixed, all commits signed off

An upstream MR to ananke-causal is also open (causal/ananke!69). If that merges, I'll switch the dependency from pyananke to ananke-causal in a follow-up commit.

Thanks for adding this @Omar-Camara. Much appreciated. Also, is there a way to avoid dependence on a fork of ananke?
What this PR does

Adds z-identifiability (Bareinboim & Pearl, 2012) as a new identification strategy (step 5) in DoWhy's `identify_ate_effect` pipeline. A new `ZIDIdentifier` bridge class and a `surrogate_nodes` parameter to `identify_effect_auto` enable identification of causal effects in graphs with hidden confounding where all existing methods fail but a valid surrogate experiment exists.

Motivation

DoWhy's current pipeline (backdoor, IV, frontdoor, general adjustment) silently returns `None` on graphs where there is unblockable hidden confounding between treatment and outcome and no observed variable satisfies any existing criterion. Z-identifiability offers a principled rescue: when a surrogate variable `Z` is available, `P(Y | do(X))` is identifiable via `Σ_z P(Y|X,Z) P(Z)` even when standard identification fails.

Empirically: on a balanced benchmark of 600 graphs (200 standard-ID, 200 z-ID rescue, 200 non-identifiable), the existing DoWhy identifier handles 400/600. With this PR it handles all 600, recovering the 200 rescue cases that previously returned `None`.

Changes
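The central change is a graph conversion step: DoWhy encodes hidden confounding as unobserved nodes (or as edges flagged bidirected), while an ADMG is usually specified as a triple of vertices, directed edges, and bidirected edges. The following is a rough, library-free sketch of that idea, not the PR's code. The function name `to_admg`, the plain dict/list input format, and the restriction to latent nodes that have no parents are all assumptions made for illustration; only the attribute names (`observed="no"`, `style="bidirected"`, `bidirected=True`, `arrowhead="both"`) come from this PR's description.

```python
from itertools import combinations


def _is_bidirected(attrs):
    # Edge attributes that this PR's description lists as marking a bidirected edge.
    return (
        attrs.get("style") == "bidirected"
        or attrs.get("bidirected") is True
        or attrs.get("arrowhead") == "both"
    )


def to_admg(nodes, edges):
    """Hypothetical helper: turn a DAG with latent nodes into an ADMG triple.

    nodes: {name: attribute dict}, edges: [(source, target, attribute dict)].
    Returns (vertices, directed_edges, bidirected_edges).
    Assumption: every latent node is a parentless common cause.
    """
    latent = {n for n, attrs in nodes.items() if attrs.get("observed") == "no"}
    vertices = sorted(n for n in nodes if n not in latent)
    di_edges = []
    bi_edges = set()
    children = {u: [] for u in latent}

    for u, v, attrs in edges:
        if _is_bidirected(attrs):
            # Already an explicit bidirected edge: store it in canonical order.
            bi_edges.add(tuple(sorted((u, v))))
        elif v in latent:
            # A latent node with a parent needs a full latent projection.
            raise ValueError(f"latent node {v!r} has a parent; not handled here")
        elif u in latent:
            children[u].append(v)
        else:
            di_edges.append((u, v))

    # Each latent common cause becomes bidirected edges between its children.
    for kids in children.values():
        for a, b in combinations(sorted(set(kids)), 2):
            bi_edges.add((a, b))

    return vertices, di_edges, sorted(bi_edges)


# Toy graph: U is a hidden confounder of X and Y, Z is a candidate surrogate.
nodes = {"U": {"observed": "no"}, "Z": {}, "X": {}, "Y": {}}
edges = [("U", "X", {}), ("U", "Y", {}), ("Z", "X", {}), ("X", "Y", {})]
print(to_admg(nodes, edges))
# (['X', 'Y', 'Z'], [('Z', 'X'), ('X', 'Y')], [('X', 'Y')])
```

Graphs in which a latent node has parents of its own are rejected here rather than projected, which keeps the sketch short at the cost of generality.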
New file: `dowhy/causal_identifier/zid_identifier.py`

- `ZIDIdentifier` bridge class between DoWhy's `nx.DiGraph` and `pyananke`'s `ADMG`
- `_convert_to_ananke()`: handles DoWhy's primary latent-confounder encoding (unobserved nodes with `observed="no"`) and explicit bidirected edge attributes (`style="bidirected"`, `bidirected=True`, `arrowhead="both"`)
- `identify_effect()`: delegates to `pyananke`'s `idz_id()` oracle (sound and complete per Bareinboim & Pearl 2012 Theorem 3); returns `IdentifiedEstimand` on success, raises on failure
- `pyananke` is not imported at module load time; the z-ID step is skipped gracefully if the package is absent

Modified: `dowhy/causal_identifier/auto_identifier.py`

- `surrogate_nodes: Optional[List[str]] = None` added to `identify_effect_auto` and `identify_ate_effect`; fully backward-compatible, all existing behavior unchanged when omitted
- `identify_ate_effect`: when `surrogate_nodes` is provided, attempts z-ID after steps 1–4; result stored as `estimands_dict["zid"]` (always present, `None` if not identifiable or surrogates not provided)

Modified: `dowhy/causal_identifier/__init__.py`

- Exports `ZIDIdentifier` for direct use

Modified: `pyproject.toml`

- `pyananke >= 0.6.1` as optional dependency
- `zid = ["pyananke"]` extras group

Usage
Or directly:
Validation
Benchmarks: `zid_surrogateZ_500`, `zid_mixed_benchmark`

The mixed benchmark specifically confirms the bridge does not false-positive on non-identifiable graphs and correctly defers when standard methods already succeed.
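The three behaviours the mixed benchmark checks (defer, rescue, refuse) can be sketched with stand-in callables. This is an illustrative sketch only: `identify_with_fallback`, `NotIdentifiable`, and the string "estimands" are hypothetical names, not DoWhy or pyananke APIs. It also assumes z-ID is skipped outright when an earlier step succeeds; the PR description only says the bridge "defers", so the real pipeline may instead run z-ID and prefer the standard result.

```python
from typing import Callable, Dict, List, Optional


class NotIdentifiable(Exception):
    """Hypothetical error raised when no valid surrogate experiment exists."""


def identify_with_fallback(
    standard_steps: Dict[str, Callable[[], Optional[str]]],
    zid_step: Callable[[List[str]], str],
    surrogate_nodes: Optional[List[str]] = None,
) -> Dict[str, Optional[str]]:
    # Run the existing strategies first; each returns an estimand or None.
    estimands = {name: step() for name, step in standard_steps.items()}
    standard_succeeded = any(v is not None for v in estimands.values())

    # The "zid" key is always present, defaulting to None.
    estimands["zid"] = None
    if surrogate_nodes and not standard_succeeded:
        try:
            estimands["zid"] = zid_step(surrogate_nodes)
        except (NotIdentifiable, ImportError):
            # Not z-identifiable, or the optional backend is missing.
            estimands["zid"] = None
    return estimands


def zid_ok(surrogates: List[str]) -> str:
    return "zid estimand via " + ",".join(surrogates)


def zid_fail(surrogates: List[str]) -> str:
    raise NotIdentifiable("no valid surrogate experiment")


# Standard-ID graph: backdoor succeeds, so z-ID is not attempted.
deferred = identify_with_fallback({"backdoor": lambda: "backdoor estimand"}, zid_ok, ["Z"])
# Rescue graph: every standard step fails, z-ID succeeds.
rescued = identify_with_fallback({"backdoor": lambda: None}, zid_ok, ["Z"])
# Non-identifiable graph: everything fails and "zid" stays None.
refused = identify_with_fallback({"backdoor": lambda: None}, zid_fail, ["Z"])

print(deferred["zid"], rescued["zid"], refused["zid"])
# None zid estimand via Z None
```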
Dependency
`pyananke >= 0.6.1`: a modernized fork of `ananke-causal` with Python 3.10–3.12 compatibility fixes and the `idz` module for z-identifiability. Available on PyPI: https://pypi.org/project/pyananke/

Install via: `pip install dowhy[zid]`

An upstream MR to the original `ananke-causal` repo is also open (causal/ananke!69). If that merges, a follow-up commit will switch the dependency from `pyananke` to `ananke-causal`.

References
Bareinboim, E. & Pearl, J. (2012). Causal Inference by Surrogate Experiments: z-Identifiability. UAI.