feat: Add z-identifiability via surrogate experiments as identification method in identify_ate_effect#1531

Open
Omar-Camara wants to merge 4 commits into py-why:main from Omar-Camara:main

Omar-Camara commented May 18, 2026

What this PR does

Adds z-identifiability (Bareinboim & Pearl, 2012) as a new identification strategy — step 5 — in DoWhy's identify_ate_effect pipeline. A new ZIDIdentifier bridge class and a surrogate_nodes parameter to identify_effect_auto enable identification of causal effects in graphs with hidden confounding where all existing methods fail but a valid surrogate experiment exists.

Motivation

DoWhy's current pipeline (backdoor, IV, frontdoor, general adjustment) silently returns None on graphs where there is unblockable hidden confounding between treatment and outcome and no observed variable satisfies any existing criterion. Z-identifiability offers a principled rescue: when a surrogate variable Z is available, P(Y | do(X)) is identifiable via Σ_z P(Y|X,Z) P(Z) even when standard identification fails.
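
To make the adjustment sum concrete, here is a minimal sketch that evaluates Σ_z P(Y|X,Z) P(Z) on a toy joint distribution over binary X, Z, Y. The probability table and the helper names `prob` and `adjusted_effect` are made up for illustration and are not part of this PR. Whether the sum equals P(Y | do(X)) depends on the graph, which this sketch does not check.

```python
# Toy joint distribution over binary (X, Z, Y); all numbers are invented.
# joint[(x, z, y)] = P(X=x, Z=z, Y=y)
joint = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.05,
    (0, 1, 0): 0.10, (0, 1, 1): 0.15,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10,
    (1, 1, 0): 0.05, (1, 1, 1): 0.30,
}

_INDEX = {"x": 0, "z": 1, "y": 2}


def prob(**fixed):
    """Probability of the event given by keyword arguments x, z and/or y."""
    return sum(
        p
        for key, p in joint.items()
        if all(key[_INDEX[name]] == value for name, value in fixed.items())
    )


def adjusted_effect(x, y):
    """Evaluate sum_z P(Y=y | X=x, Z=z) * P(Z=z) on the toy table."""
    total = 0.0
    for z in (0, 1):
        p_y_given_xz = prob(x=x, z=z, y=y) / prob(x=x, z=z)
        total += p_y_given_xz * prob(z=z)
    return total


# Difference of the two adjusted quantities for Y=1.
print(adjusted_effect(1, 1) - adjusted_effect(0, 1))
```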

Empirically: on a balanced benchmark of 600 graphs (200 standard-ID, 200 z-ID rescue, 200 non-identifiable), the existing DoWhy identifier handles 400/600. With this PR it handles all 600, recovering the 200 rescue cases that previously returned None.

Changes

New file: dowhy/causal_identifier/zid_identifier.py

  • ZIDIdentifier bridge class between DoWhy's nx.DiGraph and pyananke's ADMG
  • _convert_to_ananke(): handles DoWhy's primary latent-confounder encoding (unobserved nodes with observed="no") and explicit bidirected edge attributes (style="bidirected", bidirected=True, arrowhead="both")
  • identify_effect(): delegates to pyananke's idz_id() oracle (sound and complete per Bareinboim & Pearl 2012 Theorem 3); returns IdentifiedEstimand on success, raises on failure
  • Fully lazy import — pyananke is not imported at module load time; the z-ID step is skipped gracefully if the package is absent
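
The graph conversion described above can be sketched roughly as follows. This is not the PR's `_convert_to_ananke()` and it does not build a pyananke ADMG; it only shows the idea of turning the two encodings into directed and bidirected edge lists. The function name `split_edges` is hypothetical, and the sketch assumes every latent node is a root whose children are all observed.

```python
def split_edges(nodes, edges):
    """Split a graph with latent nodes into directed and bidirected edge lists.

    nodes: iterable of (name, attrs) pairs, the shape G.nodes(data=True) yields.
    edges: iterable of (u, v, attrs) triples, the shape G.edges(data=True) yields.
    """
    latent = {name for name, attrs in nodes if attrs.get("observed") == "no"}
    children = {u: [] for u in latent}
    directed, bidirected = [], set()

    for u, v, attrs in edges:
        explicit = (
            attrs.get("style") == "bidirected"
            or attrs.get("bidirected") is True
            or attrs.get("arrowhead") == "both"
        )
        if explicit:
            # Edge already marked as bidirected by one of the three attributes.
            bidirected.add(tuple(sorted((u, v))))
        elif u in latent:
            children[u].append(v)
        elif v not in latent:
            directed.append((u, v))
        # Edges pointing into a latent node are ignored (assumed absent).

    # Each pair of children of the same latent node shares a hidden cause.
    for kids in children.values():
        for i, a in enumerate(kids):
            for b in kids[i + 1:]:
                bidirected.add(tuple(sorted((a, b))))

    return directed, sorted(bidirected)


# Latent U confounds X and Y; Z is an observed parent of X.
nodes = [("U", {"observed": "no"}), ("X", {}), ("Y", {}), ("Z", {})]
edges = [("U", "X", {}), ("U", "Y", {}), ("Z", "X", {}), ("X", "Y", {})]
print(split_edges(nodes, edges))
```

Because the function only relies on the `(name, attrs)` and `(u, v, attrs)` shapes, it should accept `graph.nodes(data=True)` and `graph.edges(data=True)` from a networkx graph without importing networkx itself.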

Modified: dowhy/causal_identifier/auto_identifier.py

  • surrogate_nodes: Optional[List[str]] = None added to identify_effect_auto and identify_ate_effect — fully backward-compatible, all existing behavior unchanged when omitted
  • Step 5 in identify_ate_effect: when surrogate_nodes is provided, attempts z-ID after steps 1–4; result stored as estimands_dict["zid"] (always present, None if not identifiable or surrogates not provided)
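
As a rough illustration of the step-5 contract described above (the `"zid"` key is always present, and nothing changes when `surrogate_nodes` is omitted), consider the sketch below. The names `run_zid_step` and `try_zid` are hypothetical stand-ins, and the exception types caught are assumptions; the PR's actual control flow may differ.

```python
from typing import Callable, Dict, List, Optional


def run_zid_step(
    estimands_dict: Dict[str, Optional[object]],
    surrogate_nodes: Optional[List[str]],
    try_zid: Callable[[List[str]], object],
) -> Dict[str, Optional[object]]:
    """Fill estimands_dict["zid"] without touching keys set by earlier steps."""
    estimands_dict["zid"] = None  # the key is always present
    if not surrogate_nodes:
        return estimands_dict  # no surrogates given: behave exactly as before
    try:
        estimands_dict["zid"] = try_zid(surrogate_nodes)
    except (ImportError, ValueError):
        # Assumed failure modes: optional backend missing, or not z-identifiable.
        pass
    return estimands_dict


def fake_zid(surrogates):
    """Stand-in for the real identifier; returns a placeholder estimand."""
    return ("zid-estimand", tuple(surrogates))


print(run_zid_step({"backdoor": None}, ["Z"], fake_zid))
```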

Modified: dowhy/causal_identifier/__init__.py

  • Exported ZIDIdentifier for direct use

Modified: pyproject.toml

  • Added pyananke >= 0.6.1 as optional dependency
  • Added zid = ["pyananke"] extras group

Usage

from dowhy.causal_identifier.auto_identifier import identify_effect_auto, EstimandType

# Existing call signature unchanged — surrogate_nodes is optional
estimand = identify_effect_auto(
    graph,
    action_nodes=["X"],
    outcome_nodes=["Y"],
    observed_nodes=observed,
    estimand_type=EstimandType.NONPARAMETRIC_ATE,
    surrogate_nodes=["Z"],  # new — omit to preserve existing behavior exactly
)

# All existing keys are still populated as before
estimand.estimands["backdoor"]  # None if standard-ID failed
zid_estimand = estimand.estimands["zid"]  # IdentifiedEstimand if z-identifiable, else None
if zid_estimand is not None:  # guard: attribute access on None would raise
    zid_estimand.backdoor_variables  # surrogate adjustment set, e.g. ["Z"]

Or directly:

from dowhy.causal_identifier import ZIDIdentifier

zid = ZIDIdentifier(graph, ["X"], ["Y"], surrogate_nodes=["Z"])
estimand = zid.identify_effect()  # raises if not z-identifiable

Validation

| Benchmark | Description | N | Result |
| --- | --- | --- | --- |
| Unit tests | True/False/edge cases, multi-child latents, explicit bidirected attrs | 5 | 5/5 |
| zid_surrogateZ_500 | Rescue-only (std-ID fails, z-ID succeeds) | 500 | 500/500 |
| zid_mixed_benchmark | Balanced: std-ID / rescue / non-identifiable | 600 | 600/600 |

The mixed benchmark specifically confirms the bridge does not false-positive on non-identifiable graphs and correctly defers when standard methods already succeed.

Dependency

pyananke >= 0.6.1 — a modernized fork of ananke-causal with Python 3.10–3.12 compatibility fixes and the idz module for z-identifiability. Available on PyPI: https://pypi.org/project/pyananke/

Install via:

pip install dowhy[zid]
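
For readers unfamiliar with the lazy-import pattern behind an optional extra, a minimal sketch follows. The helper name `load_optional` is hypothetical and this is not the code in `zid_identifier.py`. The module name is passed in as a parameter because the import name that pyananke exposes is not stated in this PR.

```python
import importlib


def load_optional(module_name: str, extra: str):
    """Import module_name on first use; raise a helpful error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature. "
            f"Install it with: pip install dowhy[{extra}]"
        ) from exc


# Nothing is imported until the function is called, so importing the
# enclosing package never fails just because the extra is absent.
json_module = load_optional("json", "zid")  # stdlib module used as a stand-in
print(json_module.dumps({"ok": True}))
```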

An upstream MR to the original ananke-causal repo is also open (causal/ananke!69). If that merges, a follow-up commit will switch the dependency from pyananke to ananke-causal.

References

Bareinboim, E. & Pearl, J. (2012). Causal Inference by Surrogate Experiments: z-Identifiability. UAI.

@emrekiciman
Member

Thanks @Omar-Camara for this PR!

A couple quick notes:

Thanks again!

…on step 5

- Add ZIDIdentifier bridge class (zid_identifier.py)
  - Converts DoWhy nx.DiGraph to ananke-causal ADMG format
  - Handles unobserved common-cause nodes and explicit bidirected edge attrs
  - Lazy import: ananke-causal not required at import time
- Add surrogate_nodes: Optional[List[str]] = None to identify_effect_auto
  and identify_ate_effect (fully backward-compatible)
- Insert z-ID as step 5 in identify_ate_effect pipeline
- Export ZIDIdentifier from causal_identifier __init__
- Add tests/causal_identifiers/test_zid_identifier.py

Implements Bareinboim & Pearl (2012) Theorem 3 via ananke-causal.

Validated: 5/5 unit tests, 500/500 rescue benchmark, 600/600 mixed benchmark.

Draft pending upstream ananke-causal compatibility PR.

Signed-off-by: Omar Camara <omarcamara000@gmail.com>
Signed-off-by: Omar Camara <117489682+Username273183@users.noreply.github.com>
Signed-off-by: Omar Camara <117489682+Username273183@users.noreply.github.com>
- Add pyananke >= 0.6.0 as optional dep in pyproject.toml
- Add zid extras group: pip install dowhy[zid]
- Update ZIDIdentifier import error message to reference dowhy[zid]

pyananke is a modernized fork of ananke-causal with Python 3.10-3.12
compatibility fixes and the idz module for z-identifiability.
See: https://pypi.org/project/pyananke/

Signed-off-by: Omar Camara <117489682+Username273183@users.noreply.github.com>
Signed-off-by: Omar Camara <117489682+Username273183@users.noreply.github.com>
@Omar-Camara Omar-Camara marked this pull request as ready for review May 20, 2026 03:36
@Omar-Camara
Author

> Thanks @Omar-Camara for this PR!
>
> A couple quick notes:
>
> Thanks again!

Hi @emrekiciman — I've addressed all the feedback:

DCO: Fixed, all commits signed off
Tests: Added tests/causal_identifiers/test_zid_identifier.py covering ADMG conversion, decision procedure, and identify_effect_auto integration
Dependency: Resolved. Published pyananke to PyPI as a modernized fork of ananke-causal with Python 3.12 compatibility and the idz module, so pip install dowhy[zid] now works
Draft status: Marked ready for review

An upstream MR to ananke-causal is also open (causal/ananke!69). If that merges, I'll switch the dependency from pyananke to ananke-causal in a follow-up commit.
Happy to make any adjustments. Thanks!

@amit-sharma
Member

amit-sharma commented Jun 6, 2026

Thanks for adding this, @Omar-Camara. Much appreciated.
One suggestion: can you add the usage examples from this PR to the docs, e.g., as an example notebook? https://github.com/py-why/dowhy/tree/main/docs/source/example_notebooks
Otherwise this addition may get lost and new users won't be able to discover it.

Also, is there a way to avoid depending on a fork of ananke?
