diff --git a/qualtran/bloqs/mcmt/multi_qubit_toffoli.ipynb b/qualtran/bloqs/mcmt/multi_qubit_toffoli.ipynb
new file mode 100644
index 000000000..6d560d2e4
--- /dev/null
+++ b/qualtran/bloqs/mcmt/multi_qubit_toffoli.ipynb
@@ -0,0 +1,2992 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "6186bdee-982c-4061-88dc-d8ba9b62dccc",
+ "metadata": {},
+ "source": [
+ "This note presents the approximate $n$-qubit AND/Toffoli construction introduced in [GKZ25]. Instead of computing the full AND directly, it randomly samples a small number of subsets of the input bits, computes the parity of each subset, and combines those parities through a much smaller exact AND together with Clifford processing.\n",
+ "\n",
+ "The idea is easiest to see in the simpler classical setting of the $n$-bit OR function. For each subset $S \\subseteq [n]$, define\n",
+ "$$\n",
+ "\\mathrm{XOR}_S(x)=\\bigoplus_{i\\in S} x_i\n",
+ "$$\n",
+ "to be the parity function. If $\\mathrm{OR}_n(x)=0$, then every sampled parity is $0$. If $\\mathrm{OR}_n(x)=1$, pick an index $i$ with $x_i=1$; toggling whether $i$ belongs to $S$ flips $\\mathrm{XOR}_S(x)$, so for a uniformly random subset $S$ the parity equals $1$ with probability exactly $1/2$. This suggests the following approximation scheme: if we sample independent subsets $S_1,\\dots,S_k$ and compute\n",
+ "$$\n",
+ "g_{S_1,\\dots,S_k}(x)=\\mathrm{OR}_k\\bigl(\\mathrm{XOR}_{S_1}(x),\\dots,\\mathrm{XOR}_{S_k}(x)\\bigr),\n",
+ "$$\n",
+ "then $g_{S_1,\\dots,S_k}(x)$ always agrees with $\\mathrm{OR}_n(x)$ when $\\mathrm{OR}_n(x)=0$, and when $\\mathrm{OR}_n(x)=1$ it fails only if all $k$ sampled parities are $0$, which happens with probability\n",
+ "$$\n",
+ "\\Pr[g_{S_1,\\dots,S_k}(x)\\neq \\mathrm{OR}_n(x)]\\le 2^{-k}.\n",
+ "$$\n",
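+ "Hence, $\\mathrm{OR}_n$ can be approximated by computing a small number of random parities and then applying an exact OR on only $k$ bits. Since $\\mathrm{AND}_n(x)=1-\\mathrm{OR}_n(\\bar{x})$, where $\\bar{x}$ is the bitwise complement of $x$, the same scheme approximates $\\mathrm{AND}_n$ with one-sided error: it can only fail by outputting $1$ on an input that is not all ones. Performing this procedure in quantum superposition together with Clifford post-processing gives the approximate $n$-qubit Toffoli construction.\n",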
+ "\n",
+ "This note is organized as follows. First, we implement the standard exact construction of logarithmic depth, a balanced tree of 2-bit AND gates, as `MultiAndLogDepth`, keeping the same external interface as the `MultiAnd` bloq. We then turn to the approximate construction of [GKZ25]: the `ParityMask` bloq prepares the masked strings whose parities encode the sampled subsets, and `ApproxToffoli` packages the full approximate construction into a single bloq."
+ ]
+ },
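+ {
+ "cell_type": "markdown",
+ "id": "or-parity-sketch-md",
+ "metadata": {},
+ "source": [
+ "As a quick sanity check of this argument, the following pure-Python sketch (it does not use Qualtran) implements $g_{S_1,\\dots,S_k}$ literally and estimates its failure probability on a fixed input with $\\mathrm{OR}_n(x)=1$. The helper names `xor_s`, `approx_or`, `approx_and` and `sample_masks` are hypothetical and only for illustration; `approx_and` assumes the complement identity $\\mathrm{AND}_n(x)=1-\\mathrm{OR}_n(\\bar{x})$ to turn the OR scheme into an AND. For the chosen input, which has a single nonzero bit, the scheme fails exactly when all $k$ masks exclude that bit, so the estimate should be close to $2^{-k}$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "or-parity-sketch-code",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import random\n",
+ "\n",
+ "\n",
+ "def xor_s(x, s):\n",
+ "    # XOR_S(x): parity of the bits of x selected by the 0/1 mask s.\n",
+ "    return sum(xi & si for xi, si in zip(x, s)) % 2\n",
+ "\n",
+ "\n",
+ "def approx_or(x, masks):\n",
+ "    # g_{S_1,...,S_k}(x): OR of the k sampled parities.\n",
+ "    return int(any(xor_s(x, s) for s in masks))\n",
+ "\n",
+ "\n",
+ "def approx_and(x, masks):\n",
+ "    # AND_n(x) = 1 - OR_n(complement of x), so reuse approx_or on the complement.\n",
+ "    return 1 - approx_or([1 - xi for xi in x], masks)\n",
+ "\n",
+ "\n",
+ "def sample_masks(n, k, rng):\n",
+ "    # k independent, uniformly random subsets of [n], encoded as 0/1 masks.\n",
+ "    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]\n",
+ "\n",
+ "\n",
+ "rng = random.Random(1234)\n",
+ "n_bits, k_samples, trials = 8, 3, 20000\n",
+ "x_demo = [0, 0, 1, 0, 0, 0, 0, 0]  # OR_n(x) = 1, so g can only err by outputting 0\n",
+ "failures = sum(approx_or(x_demo, sample_masks(n_bits, k_samples, rng)) == 0 for _ in range(trials))\n",
+ "print(f'empirical failure rate {failures / trials:.4f}, predicted 2^-k = {2 ** -k_samples:.4f}')\n",
+ "\n",
+ "# On the all-zeros input every parity is 0, so g is always correct there.\n",
+ "assert all(approx_or([0] * n_bits, sample_masks(n_bits, k_samples, rng)) == 0 for _ in range(100))\n",
+ "# Dually, approx_and is always correct on the all-ones input.\n",
+ "assert all(approx_and([1] * n_bits, sample_masks(n_bits, k_samples, rng)) == 1 for _ in range(100))"
+ ]
+ },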
+ {
+ "cell_type": "markdown",
+ "id": "1392227b-2d95-421c-9c97-7582d9519ac7",
+ "metadata": {},
+ "source": [
+ "The following example shows the external interface of the `MultiAnd` bloq."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "7b638d0e-f828-4a78-904b-08b91f5345e2",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from qualtran.drawing import show_bloq\n",
+ "from qualtran.bloqs.mcmt import MultiAnd\n",
+ "\n",
+ "bloq = MultiAnd(cvs=(1, 1, 1, 1, 1))\n",
+ "show_bloq(bloq)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c36e9695-a17b-472f-90c7-8f01c0d5ea55",
+ "metadata": {},
+ "source": [
+ "Next, we implement an AND of $n$ controls with Toffoli depth $\\lceil \\log_2 n \\rceil$, packaged as a bloq with the same external interface."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "68917f9b-0b8f-45a3-ad22-79c90aebe375",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from dataclasses import dataclass\n",
+ "from typing import Dict, Optional, Tuple\n",
+ "\n",
+ "import numpy as np\n",
+ "\n",
+ "from qualtran import Bloq, BloqBuilder, Signature, Register, QBit, Side\n",
+ "from qualtran.bloqs.basic_gates import Toffoli, CNOT, XGate\n",
+ "from qualtran.drawing import (\n",
+ " Circle,\n",
+ " Text,\n",
+ " WireSymbol,\n",
+ " directional_text_box,\n",
+ " show_bloq,\n",
+ ")\n",
+ "\n",
+ "\n",
+ "@dataclass(frozen=True)\n",
+ "class MultiAndLogDepth(Bloq):\n",
+ " r\"\"\"A log-depth many-bit AND with the same public I/O as `MultiAnd`.\n",
+ "\n",
+ " This bloq computes the conjunction of a control register into a right-sided\n",
+ " output qubit using a balanced tree of Toffoli gates. The public register\n",
+ " interface matches `MultiAnd`: the input control register is preserved, the\n",
+ " output is written to a right-sided `target` qubit, and the intermediate tree\n",
+ " values are exposed as a right-sided `junk` register of length `n_ctrls - 2`.\n",
+ "\n",
+ " The control pattern is specified by `cvs`. An entry `cvs[i] = 1` denotes a\n",
+ " positive control on `ctrl[i]`, while `cvs[i] = 0` denotes a negative control.\n",
+ " Negative controls are implemented by conjugating the corresponding wire with\n",
+ " `X` before and after the tree computation.\n",
+ "\n",
+ " This implementation differs from the upstream `MultiAnd` in the semantics of\n",
+ " the `junk` register: here, `junk` stores balanced-tree intermediate values\n",
+ " rather than the ladder-style prefix-AND values used by the reference\n",
+ " implementation.\n",
+ "\n",
+ " Args:\n",
+ " cvs: A tuple of control values. Each entry must be `0` or `1`. The number\n",
+ " of controls is `len(cvs)`, and must be at least `3`.\n",
+ "\n",
+ " Registers:\n",
+ " ctrl: An `n`-bit control register.\n",
+ " junk [right]: An `n - 2` qubit junk register storing intermediate tree values.\n",
+ " target [right]: The output bit. \n",
+ " \"\"\"\n",
+ "\n",
+ " cvs: Tuple[int, ...]\n",
+ "\n",
+ " def __post_init__(self):\n",
+ " if len(self.cvs) < 3:\n",
+ " raise ValueError(\"MultiAndLogDepth must have at least 3 control values.\")\n",
+ " if any(cv not in (0, 1) for cv in self.cvs):\n",
+ " raise ValueError(\"Each control value in `cvs` must be 0 or 1.\")\n",
+ "\n",
+ " @property\n",
+ " def n_ctrls(self) -> int:\n",
+ " return len(self.cvs)\n",
+ "\n",
+ " @property\n",
+ " def concrete_cvs(self) -> Tuple[int, ...]:\n",
+ " return self.cvs\n",
+ "\n",
+ " @property\n",
+ " def signature(self) -> Signature:\n",
+ " return Signature(\n",
+ " [\n",
+ " Register(\"ctrl\", QBit(), shape=(self.n_ctrls,)),\n",
+ " Register(\"junk\", QBit(), shape=(self.n_ctrls - 2,), side=Side.RIGHT),\n",
+ " Register(\"target\", QBit(), side=Side.RIGHT),\n",
+ " ]\n",
+ " )\n",
+ "\n",
+ " def _build_tree(\n",
+ " self,\n",
+ " bb: BloqBuilder,\n",
+ " ctrls,\n",
+ " junk,\n",
+ " out,\n",
+ " ):\n",
+ " m = len(ctrls)\n",
+ "\n",
+ " if m == 1:\n",
+ " ctrls[0], out = bb.add(CNOT(), ctrl=ctrls[0], target=out)\n",
+ " return ctrls, junk, out\n",
+ "\n",
+ " if m == 2:\n",
+ " (ctrls[0], ctrls[1]), out = bb.add(\n",
+ " Toffoli(), ctrl=[ctrls[0], ctrls[1]], target=out\n",
+ " )\n",
+ " return ctrls, junk, out\n",
+ "\n",
+ " a = (m + 1) // 2\n",
+ " b = m - a\n",
+ "\n",
+ " left_ctrls = list(ctrls[:a])\n",
+ " right_ctrls = list(ctrls[a:])\n",
+ "\n",
+ " left_need = a - 1 if a > 1 else 0\n",
+ " right_need = b - 1 if b > 1 else 0\n",
+ "\n",
+ " left_pool = list(junk[:left_need])\n",
+ " right_pool = list(junk[left_need:left_need + right_need])\n",
+ "\n",
+ " if a == 1:\n",
+ " left_out = left_ctrls[0]\n",
+ " else:\n",
+ " left_out = left_pool[0]\n",
+ " left_ctrls, left_internal, left_out = self._build_tree(\n",
+ " bb, left_ctrls, left_pool[1:], left_out\n",
+ " )\n",
+ " left_pool = [left_out] + left_internal\n",
+ "\n",
+ " if b == 1:\n",
+ " right_out = right_ctrls[0]\n",
+ " else:\n",
+ " right_out = right_pool[0]\n",
+ " right_ctrls, right_internal, right_out = self._build_tree(\n",
+ " bb, right_ctrls, right_pool[1:], right_out\n",
+ " )\n",
+ " right_pool = [right_out] + right_internal\n",
+ "\n",
+ " (left_out, right_out), out = bb.add(\n",
+ " Toffoli(), ctrl=[left_out, right_out], target=out\n",
+ " )\n",
+ "\n",
+ " if a == 1:\n",
+ " left_ctrls[0] = left_out\n",
+ " else:\n",
+ " left_pool[0] = left_out\n",
+ "\n",
+ " if b == 1:\n",
+ " right_ctrls[0] = right_out\n",
+ " else:\n",
+ " right_pool[0] = right_out\n",
+ "\n",
+ " return left_ctrls + right_ctrls, left_pool + right_pool, out\n",
+ "\n",
+ " def build_composite_bloq(self, bb: BloqBuilder, **soqs):\n",
+ " ctrls = list(np.ravel(soqs[\"ctrl\"]))\n",
+ "\n",
+ " junk_reg = bb.allocate(self.n_ctrls - 2)\n",
+ " junk = list(bb.split(junk_reg))\n",
+ "\n",
+ " target_reg = bb.allocate(1)\n",
+ " target = bb.split(target_reg)[0]\n",
+ "\n",
+ " for i, cv in enumerate(self.concrete_cvs):\n",
+ " if cv == 0:\n",
+ " ctrls[i] = bb.add(XGate(), q=ctrls[i])\n",
+ "\n",
+ " ctrls, junk, target = self._build_tree(bb, ctrls, junk, target)\n",
+ "\n",
+ " for i, cv in enumerate(self.concrete_cvs):\n",
+ " if cv == 0:\n",
+ " ctrls[i] = bb.add(XGate(), q=ctrls[i])\n",
+ "\n",
+ " return {\n",
+ " \"ctrl\": np.asarray(ctrls, dtype=object),\n",
+ " \"junk\": np.asarray(junk, dtype=object),\n",
+ " \"target\": target,\n",
+ " }\n",
+ "\n",
+ " def _classical_tree(self, vals):\n",
+ " m = len(vals)\n",
+ "\n",
+ " if m == 1:\n",
+ " return [], np.uint8(vals[0])\n",
+ "\n",
+ " if m == 2:\n",
+ " return [], np.uint8(vals[0] & vals[1])\n",
+ "\n",
+ " a = (m + 1) // 2\n",
+ " b = m - a\n",
+ "\n",
+ " left_junk, left_out = self._classical_tree(vals[:a])\n",
+ " right_junk, right_out = self._classical_tree(vals[a:])\n",
+ "\n",
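+ "        out = np.uint8(left_out & right_out)\n",
+ "        # Mirror `_build_tree`: only a subtree with at least 2 leaves stores its result\n",
+ "        # in a junk qubit; a single-leaf subtree is just a control bit, not junk.\n",
+ "        junk = []\n",
+ "        if a > 1:\n",
+ "            junk += [left_out] + left_junk\n",
+ "        if b > 1:\n",
+ "            junk += [right_out] + right_junk\n",
+ "        return junk, out\n",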
+ "\n",
+ " def on_classical_vals(self, ctrl: np.ndarray) -> Dict[str, np.ndarray]:\n",
+ " ctrl = np.asarray(ctrl, dtype=np.uint8)\n",
+ " effective_ctrl = np.equal(ctrl, np.asarray(self.concrete_cvs)).astype(np.uint8)\n",
+ " junk, target = self._classical_tree(list(effective_ctrl))\n",
+ " return {\n",
+ " \"ctrl\": ctrl,\n",
+ " \"junk\": np.asarray(junk, dtype=np.uint8),\n",
+ " \"target\": np.uint8(target),\n",
+ " }\n",
+ "\n",
+ " def __pow__(self, power: int):\n",
+ " if power == 1:\n",
+ " return self\n",
+ " if power == -1:\n",
+ " return self.adjoint()\n",
+ " return NotImplemented\n",
+ "\n",
+ " def wire_symbol(\n",
+ " self, reg: Optional[Register], idx: Tuple[int, ...] = tuple()\n",
+ " ) -> WireSymbol:\n",
+ " if reg is None:\n",
+ " return Text(\"\")\n",
+ " if reg.name == \"ctrl\":\n",
+ " return Circle(filled=self.concrete_cvs[idx[0]] == 1)\n",
+ " if reg.name == \"target\":\n",
+ " return directional_text_box(\"∧\", side=reg.side)\n",
+ " if len(idx) > 0:\n",
+ " pretty_text = f'{reg.name}[{\", \".join(str(i) for i in idx)}]'\n",
+ " else:\n",
+ " pretty_text = reg.name\n",
+ " return directional_text_box(text=pretty_text, side=reg.side)\n",
+ "\n",
+ " def __str__(self):\n",
+ " return f\"MultiAndLogDepth(n={self.n_ctrls})\"\n",
+ "\n",
+ " def build_call_graph(self, ssa=None):\n",
+ " cost = {Toffoli(): self.n_ctrls - 1}\n",
+ " n_neg = sum(int(cv == 0) for cv in self.concrete_cvs)\n",
+ " if n_neg:\n",
+ " cost[XGate()] = 2 * n_neg\n",
+ " return cost"
+ ]
+ },
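+ {
+ "cell_type": "markdown",
+ "id": "tree-stats-sketch-md",
+ "metadata": {},
+ "source": [
+ "The balanced split in `_build_tree` can be checked with a short recursion. The pure-Python sketch below (the function `tree_stats` is hypothetical and not part of Qualtran) counts Toffolis, Toffoli depth and intermediate qubits for $m$ controls, assuming the two subtrees of every node run in parallel since they act on disjoint qubits. For small $m$ it confirms $m-1$ Toffolis (as in `build_call_graph`), $m-2$ junk qubits, and depth $\\lceil \\log_2 m \\rceil$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "tree-stats-sketch-code",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import math\n",
+ "\n",
+ "\n",
+ "def tree_stats(m):\n",
+ "    # Returns (toffolis, depth, intermediates) for an m-leaf balanced AND tree that\n",
+ "    # splits like _build_tree: left size a = ceil(m / 2), right size b = m - a.\n",
+ "    if m == 1:\n",
+ "        return 0, 0, 0\n",
+ "    if m == 2:\n",
+ "        return 1, 1, 0\n",
+ "    a = (m + 1) // 2\n",
+ "    b = m - a\n",
+ "    lt, ld, li = tree_stats(a)\n",
+ "    rt, rd, ri = tree_stats(b)\n",
+ "    # Each subtree with at least 2 leaves writes its result to one junk qubit;\n",
+ "    # the two subtrees act on disjoint qubits, so their depths overlap.\n",
+ "    extra = int(a > 1) + int(b > 1)\n",
+ "    return lt + rt + 1, max(ld, rd) + 1, li + ri + extra\n",
+ "\n",
+ "\n",
+ "for m in range(3, 10):\n",
+ "    toffolis, depth, junk = tree_stats(m)\n",
+ "    print(f'm={m}: Toffolis={toffolis}, Toffoli depth={depth}, junk={junk}')\n",
+ "    assert toffolis == m - 1  # matches build_call_graph\n",
+ "    assert junk == m - 2  # matches the size of the junk register\n",
+ "    assert depth == math.ceil(math.log2(m))"
+ ]
+ },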
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "59d77954-2b87-423e-90fa-26331ec46ea2",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Example\n",
+ "bloq = MultiAndLogDepth(cvs=(1, 1, 1, 1))\n",
+ "show_bloq(bloq)\n",
+ "show_bloq(bloq.decompose_bloq())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "162d97fb-419f-48d8-803e-da0a6b875e5f",
+ "metadata": {},
+ "source": [
+ "The recent work [GKZ25] gives an $\\epsilon$-approximate implementation of the $n$-qubit Toffoli gate using exponentially fewer $T$ gates. The construction is based on the following randomized classical algorithm for the OR function: sample $k=O(\\log n)$ subsets of $[n-1]$, compute their parities, and apply a $k$-bit OR together with additional Clifford operations. Implementing this procedure in superposition, with the appropriate negations, yields an approximate implementation of the $(n-1)$-bit AND, i.e. of the $n$-qubit Toffoli gate.\n",
+ "\n",
+ "Here, we assume the sampling has already been done and the $k$ subsets are given as $k$ classical $(n-1)$-bit strings $s_1,\\ldots,s_k$, where $[s_i]_j=1$ if and only if $j$ belongs to the $i$-th subset. We first define an intermediate bloq that computes $k$ strings $x_1,\\ldots,x_k$ of $n-1$ bits each, where $[x_i]_j=x_j\\cdot [s_i]_j$ for all $i,j$. The parity of $x_i$ equals the parity of the input bits in the $i$-th subset; these parities are computed in a later step."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "a6d66b78-7f14-4fc7-b231-1c0fe754bca2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from dataclasses import dataclass\n",
+ "from typing import Tuple, Dict, Any as AnyType\n",
+ "\n",
+ "from qualtran import Bloq, BloqBuilder, Signature, Register, QAny, Side\n",
+ "from qualtran.bloqs.basic_gates import CNOT, ZeroState\n",
+ "from qualtran.drawing import show_bloq\n",
+ "\n",
+ "\n",
+ "@dataclass(frozen=True)\n",
+ "class ParityMask(Bloq):\n",
+ "    r\"\"\"Prepare masked copies of an input register for sampled subset-parity computation.\n",
+ "\n",
+ " This bloq takes an $(n - 1)$-qubit input register `x` and prepares `k` output\n",
+ " registers `x_0, ..., x_{k-1}` according to a collection of classical sample\n",
+ " strings `s[0], ..., s[k-1]`. For each `i` and `j`,\n",
+ " the output bit $[x_i]_j = x_j \\cdot s[i]_j$.\n",
+ "\n",
+ "    Equivalently, `x_i` is obtained by copying into the `i`-th output register\n",
+ "    exactly those positions of `x` selected by the `i`-th mask; all other\n",
+ "    positions of `x_i` remain in the zero state. The parity of each output string\n",
+ "    `x_i` therefore equals the parity of the subset of input bits selected by `s[i]`.\n",
+ "\n",
+ " This bloq is intended as a sample-preparation subroutine in an approximate\n",
+ " multi-controlled AND / Toffoli construction based on classically sampled\n",
+ " subsets. The sampling itself is assumed to have already been performed, and\n",
+ " the sampled subsets are provided explicitly through `sample_strings`.\n",
+ "\n",
+ " Args:\n",
+ " n: The total number of qubits in the target approximate Toffoli construction.\n",
+ " This bloq acts on registers of length `n - 1`.\n",
+ " k: The number of sampled subsets, equivalently the number of output registers.\n",
+ "        sample_strings: A tuple of `k` classical bit strings `s[0], ..., s[k-1]`, each of length `n - 1`.\n",
+ " The entry `s[i]_j` equals `1` iff the `j`-th input bit is\n",
+ " included in the `i`-th sampled subset.\n",
+ "\n",
+ " Registers:\n",
+ " x: An `(n - 1)`-qubit input register.\n",
+ " x_0 [right]: An `(n - 1)`-qubit output register containing the masked copy\n",
+ " for the first sampled subset.\n",
+ " x_1 [right]: An `(n - 1)`-qubit output register containing the masked copy\n",
+ " for the second sampled subset.\n",
+ " ...\n",
+ " x_{k-1} [right]: An `(n - 1)`-qubit output register containing the masked\n",
+ " copy for the last sampled subset.\n",
+ " \"\"\"\n",
+ "\n",
+ " n: int\n",
+ " k: int\n",
+ "    sample_strings: Tuple[Tuple[int, ...], ...]\n",
+ "\n",
+ "    def __post_init__(self):\n",
+ "        # Validate eagerly so a malformed mask fails at construction time.\n",
+ "        if len(self.sample_strings) != self.k:\n",
+ "            raise ValueError(\"sample_strings must have length k.\")\n",
+ "        for row in self.sample_strings:\n",
+ "            if len(row) != self.n - 1:\n",
+ "                raise ValueError(\"Each mask in sample_strings must have length n-1.\")\n",
+ "\n",
+ "    @property\n",
+ "    def signature(self) -> Signature:\n",
+ "        regs = [Register(\"x\", QAny(self.n - 1))]\n",
+ "        regs += [\n",
+ "            Register(f\"x_{i}\", QAny(self.n - 1), side=Side.RIGHT)\n",
+ "            for i in range(self.k)\n",
+ "        ]\n",
+ "        return Signature(regs)\n",
+ "\n",
+ "    def build_composite_bloq(self, bb: BloqBuilder, **soqs) -> Dict[str, AnyType]:\n",
+ " x = soqs[\"x\"]\n",
+ " x_bits = list(bb.split(x))\n",
+ "\n",
+ " out_regs = []\n",
+ " for i in range(self.k):\n",
+ " # Explicitly prepare |0...0> for the right-output register x_i.\n",
+ " xi_bits = [bb.add(ZeroState()) for _ in range(self.n - 1)]\n",
+ "\n",
+ " for j in range(self.n - 1):\n",
+ " if self.sample_strings[i][j]:\n",
+ " x_bits[j], xi_bits[j] = bb.add(\n",
+ " CNOT(), ctrl=x_bits[j], target=xi_bits[j]\n",
+ " )\n",
+ "\n",
+ " out_regs.append(bb.join(xi_bits))\n",
+ "\n",
+ " x = bb.join(x_bits)\n",
+ "\n",
+ " return {\n",
+ " \"x\": x,\n",
+ " **{f\"x_{i}\": out_regs[i] for i in range(self.k)},\n",
+ " }\n",
+ "\n",
+ " def build_call_graph(self, ssa=None):\n",
+ " cnot_count = sum(sum(int(bit) for bit in row) for row in self.sample_strings)\n",
+ " return {\n",
+ " ZeroState(): self.k * (self.n - 1),\n",
+ " CNOT(): cnot_count,\n",
+ " }"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "bb44882d-3e2b-418f-80f0-38046fe9965f",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Example for this ParityMask subroutine with n=5, k=3\n",
+ "sample_strings = (\n",
+ " (1, 0, 1, 0),\n",
+ " (0, 1, 1, 0),\n",
+ " (1, 1, 0, 1),\n",
+ ")\n",
+ "\n",
+ "B = ParityMask(n=5, k=3, sample_strings=sample_strings)\n",
+ "show_bloq(B)\n",
+ "show_bloq(B.decompose_bloq())"
+ ]
+ },
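+ {
+ "cell_type": "markdown",
+ "id": "parity-mask-classical-md",
+ "metadata": {},
+ "source": [
+ "On computational-basis inputs, `ParityMask` is simply a bitwise AND of `x` with each mask. The pure-Python sketch below (the helper `parity_mask_classical` and the input `x_example` are hypothetical, chosen only for illustration) reproduces this for the masks of the $n=5$, $k=3$ example and checks that each masked copy has the same parity as the corresponding subset of `x`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "parity-mask-classical-code",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def parity_mask_classical(x, sample_strings):\n",
+ "    # Basis-state action of ParityMask: [x_i]_j = x_j * s_i[j]; unselected bits stay 0.\n",
+ "    return [tuple(xj & sj for xj, sj in zip(x, s)) for s in sample_strings]\n",
+ "\n",
+ "\n",
+ "example_masks = (\n",
+ "    (1, 0, 1, 0),\n",
+ "    (0, 1, 1, 0),\n",
+ "    (1, 1, 0, 1),\n",
+ ")\n",
+ "x_example = (1, 1, 0, 1)  # an arbitrary (n - 1) = 4 bit input\n",
+ "\n",
+ "masked = parity_mask_classical(x_example, example_masks)\n",
+ "for s, xi in zip(example_masks, masked):\n",
+ "    # Parity of the input bits that the mask s selects.\n",
+ "    subset_parity = sum(x_example[j] for j, sj in enumerate(s) if sj) % 2\n",
+ "    print(s, '->', xi, 'parity', sum(xi) % 2)\n",
+ "    assert sum(xi) % 2 == subset_parity"
+ ]
+ },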
+ {
+ "cell_type": "markdown",
+ "id": "f5ab1f58-9332-4521-af42-4167dfabce0b",
+ "metadata": {},
+ "source": [
+ "Using `ParityMask` as a subroutine, we obtain the bloq `ApproxToffoli`, which approximates the $n$-qubit Toffoli gate."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "4e9e8c28-824b-4779-b433-0768370c462e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from dataclasses import dataclass\n",
+ "from typing import Tuple, Dict, Any as AnyType\n",
+ "\n",
+ "import numpy as np\n",
+ "\n",
+ "from qualtran import Bloq, BloqBuilder, Signature, Register, QBit, QAny, Side\n",
+ "from qualtran.bloqs.basic_gates import XGate, CNOT, Toffoli, ZeroState\n",
+ "from qualtran.drawing import show_bloq\n",
+ "\n",
+ "\n",
+ "@dataclass(frozen=True)\n",
+ "class ApproxToffoli(Bloq):\n",
+ "\n",
+ " \"\"\"Approximate multi-controlled Toffoli via sampled subset parities.\n",
+ "\n",
+ " This bloq implements an approximate `n`-qubit Toffoli / AND construction based on\n",
+ "    classically sampled subsets. Given an `(n - 1)`-qubit control register `ctrl`, it\n",
+ "    first complements the controls, prepares `k` masked copies of the complement\n",
+ "    determined by the sampled subset indicators in `sample_strings`, computes the parity\n",
+ "    of each masked copy, negates those parity bits, and then computes their conjunction\n",
+ "    into the output qubit `target`. Finally, it uncomputes the parities and restores `ctrl`.\n",
+ "\n",
+ " The resulting output approximates the `(n - 1)`-bit AND of the control register.\n",
+ " Equivalently, it approximates an `n`-qubit Toffoli with `(n - 1)` controls and one\n",
+ " target, except that this bloq exposes a fresh right-sided output qubit `target`\n",
+ " rather than toggling an input target in place.\n",
+ "\n",
+ "    All intermediate data are exposed through the right-sided `junk` register. In the\n",
+ "    current implementation, `junk` is packed in the order\n",
+ "    `x_0, ..., x_{k-1}, OR_values, ancilla`, where `x_i` are the masked copies of the\n",
+ "    complemented controls, `OR_values` are the `k` parity qubits (returned to zero by\n",
+ "    the uncomputation step), and `ancilla` are the `max(k - 2, 0)` intermediate qubits\n",
+ "    of the `k`-bit AND subroutine.\n",
+ "\n",
+ " Args:\n",
+ " n: The total number of qubits in the target Toffoli interpretation. The bloq\n",
+ " acts on an `(n - 1)`-qubit control register.\n",
+ " k: The number of sampled subsets used in the approximation.\n",
+ " sample_strings: A tuple of `k` classical bit strings, each of length `n - 1`.\n",
+ " The entry `sample_strings[i][j]` equals `1` iff the `j`-th control bit is\n",
+ " included in the `i`-th sampled subset.\n",
+ "\n",
+ " Registers:\n",
+ " ctrl: An `(n - 1)`-qubit control register.\n",
+ " junk [right]: A right-sided junk register containing all intermediate masked\n",
+ " copies, parity values, and ancilla qubits.\n",
+ " target [right]: The output bit of the approximate multi-controlled AND.\n",
+ " \"\"\"\n",
+ " \n",
+ " n: int\n",
+ " k: int\n",
+ " sample_strings: Tuple[Tuple[int, ...], ...]\n",
+ "\n",
+ " @property\n",
+ " def n_junk(self) -> int:\n",
+ " return self.k * (self.n - 1) + self.k + max(self.k - 2, 0)\n",
+ "\n",
+ " @property\n",
+ " def signature(self) -> Signature:\n",
+ " return Signature(\n",
+ " [\n",
+ " Register(\"ctrl\", QBit(), shape=(self.n - 1,)),\n",
+ " Register(\"junk\", QBit(), shape=(self.n_junk,), side=Side.RIGHT),\n",
+ " Register(\"target\", QBit(), side=Side.RIGHT),\n",
+ " ]\n",
+ " )\n",
+ "\n",
+ " def build_composite_bloq(self, bb: BloqBuilder, **soqs) -> Dict[str, AnyType]:\n",
+ " n, k = self.n, self.k\n",
+ "\n",
+ "        ctrl = list(np.ravel(soqs[\"ctrl\"]))\n",
+ "\n",
+ "        # 1) Flip every control bit, since AND(x) = NOT OR(NOT x).\n",
+ "        for j in range(n - 1):\n",
+ "            ctrl[j] = bb.add(XGate(), q=ctrl[j])\n",
+ "        x = bb.join(ctrl)\n",
+ "\n",
+ " # 2) Parity Mask: produces right-output registers x_0, ..., x_{k-1}\n",
+ " sp = ParityMask(n=n, k=k, sample_strings=self.sample_strings)\n",
+ " sp_out = bb.add(sp, x=x)\n",
+ " x = sp_out[0]\n",
+ " x_regs = list(sp_out[1:])\n",
+ "\n",
+ " # 3) Explicitly prepare OR_values in |0...0>\n",
+ " OR_regs = [bb.add(ZeroState()) for _ in range(k)]\n",
+ "\n",
+ "        # 4) Compute the parity of each masked copy x_i into OR_regs[i]\n",
+ " for i in range(k):\n",
+ " xi = x_regs[i]\n",
+ " oi = OR_regs[i]\n",
+ "\n",
+ " xi_bits = list(bb.split(xi))\n",
+ " for j in range(n - 1):\n",
+ " xi_bits[j], oi = bb.add(CNOT(), ctrl=xi_bits[j], target=oi)\n",
+ "\n",
+ " x_regs[i] = bb.join(xi_bits)\n",
+ " OR_regs[i] = oi\n",
+ "\n",
+ " # 5) Compute AND of negated parity bits into target\n",
+ " for i in range(k):\n",
+ " OR_regs[i] = bb.add(XGate(), q=OR_regs[i])\n",
+ "\n",
+ " ancilla_bits = []\n",
+ " if k == 1:\n",
+ " target = bb.add(ZeroState())\n",
+ " OR_regs[0], target = bb.add(CNOT(), ctrl=OR_regs[0], target=target)\n",
+ " elif k == 2:\n",
+ " target = bb.add(ZeroState())\n",
+ " (OR_regs[0], OR_regs[1]), target = bb.add(\n",
+ " Toffoli(), ctrl=[OR_regs[0], OR_regs[1]], target=target\n",
+ " )\n",
+ " else:\n",
+ " and_k = MultiAndLogDepth(cvs=(1,) * k)\n",
+ " OR_ctrl = np.asarray(OR_regs, dtype=object)\n",
+ " OR_ctrl, ancilla, target = bb.add(and_k, ctrl=OR_ctrl)\n",
+ " OR_regs = list(np.ravel(OR_ctrl))\n",
+ " ancilla_bits = list(np.ravel(ancilla))\n",
+ "\n",
+ " for i in range(k):\n",
+ " OR_regs[i] = bb.add(XGate(), q=OR_regs[i])\n",
+ "\n",
+ " # 6) Uncompute parity so OR_values return to |0...0>\n",
+ " for i in reversed(range(k)):\n",
+ " xi = x_regs[i]\n",
+ " oi = OR_regs[i]\n",
+ "\n",
+ " xi_bits = list(bb.split(xi))\n",
+ " for j in range(n - 1):\n",
+ " xi_bits[j], oi = bb.add(CNOT(), ctrl=xi_bits[j], target=oi)\n",
+ "\n",
+ " x_regs[i] = bb.join(xi_bits)\n",
+ " OR_regs[i] = oi\n",
+ "\n",
+ "        # 7) Unflip the controls and return them as individual qubits\n",
+ "        x_bits = list(bb.split(x))\n",
+ "        for j in range(n - 1):\n",
+ "            x_bits[j] = bb.add(XGate(), q=x_bits[j])\n",
+ "        ctrl = np.asarray(x_bits, dtype=object)\n",
+ "\n",
+ " # Pack junk as: x_0, ..., x_{k-1}, OR_values, ancilla\n",
+ " junk_bits = []\n",
+ " for i in range(k):\n",
+ " junk_bits.extend(list(bb.split(x_regs[i])))\n",
+ " junk_bits.extend(OR_regs)\n",
+ " junk_bits.extend(ancilla_bits)\n",
+ "\n",
+ " return {\n",
+ " \"ctrl\": ctrl,\n",
+ " \"junk\": np.asarray(junk_bits, dtype=object),\n",
+ " \"target\": target,\n",
+ " }"
+ ]
+ },
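+ {
+ "cell_type": "markdown",
+ "id": "approx-toffoli-classical-md",
+ "metadata": {},
+ "source": [
+ "On computational-basis states, the steps of `build_composite_bloq` reduce to simple bit manipulations. The pure-Python sketch below (hypothetical helpers `approx_toffoli_classical` and `n_junk`, not part of Qualtran) traces steps 1, 2, 4 and 5 and skips the uncomputation steps, which do not change `target`; it also recomputes the size of the `junk` register from its packing order. With the two masks used here, the all-zeros input is wrongly mapped to 1, because both masked copies of its complement `(1, 1, 1)` have even parity."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "approx-toffoli-classical-code",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from itertools import product\n",
+ "\n",
+ "\n",
+ "def approx_toffoli_classical(ctrl, sample_strings):\n",
+ "    # Steps 1-2: complement the controls, then mask the complement with each sample string.\n",
+ "    x_bar = [1 - c for c in ctrl]\n",
+ "    masked = [[xb & sj for xb, sj in zip(x_bar, s)] for s in sample_strings]\n",
+ "    # Step 4: parity of each masked copy.\n",
+ "    parities = [sum(xi) % 2 for xi in masked]\n",
+ "    # Step 5: AND of the negated parities, written to a fresh target bit.\n",
+ "    return int(all(p == 0 for p in parities))\n",
+ "\n",
+ "\n",
+ "def n_junk(n, k):\n",
+ "    # Junk layout: k masked copies of n - 1 bits, k parity qubits, max(k - 2, 0) AND ancillas.\n",
+ "    return k * (n - 1) + k + max(k - 2, 0)\n",
+ "\n",
+ "\n",
+ "n_demo, k_demo = 4, 2\n",
+ "masks_demo = ((1, 1, 0), (0, 1, 1))\n",
+ "for ctrl in product((0, 1), repeat=n_demo - 1):\n",
+ "    approx = approx_toffoli_classical(ctrl, masks_demo)\n",
+ "    ideal = int(all(ctrl))\n",
+ "    print(ctrl, 'approx =', approx, 'ideal =', ideal)\n",
+ "    # The error is one-sided: the all-ones input is always mapped to 1.\n",
+ "    if ideal == 1:\n",
+ "        assert approx == 1\n",
+ "print('n_junk =', n_junk(n_demo, k_demo))"
+ ]
+ },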
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "47ccbac3-07b7-48ed-97db-c9c11c4ed3df",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Example: n = 3, k = 2\n",
+ "n = 3\n",
+ "k = 2\n",
+ "sample_strings = (\n",
+ " (1, 0),\n",
+ " (0, 1),\n",
+ ")\n",
+ "\n",
+ "B = ApproxToffoli(n=n, k=k, sample_strings=sample_strings)\n",
+ "show_bloq(B)\n",
+ "show_bloq(B.decompose_bloq())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7aab3b41-d463-43b1-a935-35fae37c3331",
+ "metadata": {},
+ "source": [
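+ "Finally, we test the construction classically on computational-basis inputs. In the example below we set $n=17$ (16 controls) and $k=4$, and draw both $x$ and `sample_strings` at random on every run. For any fixed $x$ that is not all ones, the output is wrong with probability exactly $2^{-k}=1/16$ over the choice of `sample_strings`, while the all-ones input is always handled correctly. Since a uniformly random $x$ is almost never all ones, a single run mostly exercises the case where the ideal output is $0$."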
+ ]
+ },
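+ {
+ "cell_type": "markdown",
+ "id": "error-rate-estimate-md",
+ "metadata": {},
+ "source": [
+ "A single run says little about the error rate. The pure-Python sketch below does not call the bloq; it models its basis-state behavior with a hypothetical helper `approx_and_int` (controls packed into an integer, bit $j$ being `ctrl[j]`), fixes one input that is not all ones, and redraws the sample strings as random bitmasks many times to estimate how often the output is wrong. For this input the estimate should land close to $2^{-k}=1/16$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "error-rate-estimate-code",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import random\n",
+ "\n",
+ "\n",
+ "def approx_and_int(x, masks, m):\n",
+ "    # Basis-state model of ApproxToffoli with m controls packed into an int:\n",
+ "    # complement x, then output 1 iff every masked complement has even parity.\n",
+ "    x_bar = ~x & ((1 << m) - 1)\n",
+ "    return int(all(bin(x_bar & s).count('1') % 2 == 0 for s in masks))\n",
+ "\n",
+ "\n",
+ "rng = random.Random(2025)\n",
+ "m_mc, k_mc, trials = 16, 4, 40000\n",
+ "x_fixed = ((1 << m_mc) - 1) & ~(1 << 5)  # all ones except bit 5, so the ideal AND is 0\n",
+ "errors = 0\n",
+ "for _ in range(trials):\n",
+ "    masks = [rng.getrandbits(m_mc) for _ in range(k_mc)]  # fresh sample strings as bitmasks\n",
+ "    errors += approx_and_int(x_fixed, masks, m_mc)  # any output of 1 is an error here\n",
+ "rate = errors / trials\n",
+ "print(f'estimated error rate {rate:.4f}, predicted 2^-k = {2 ** -k_mc:.4f}')"
+ ]
+ },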
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "id": "7d44573e-9902-4b15-8edc-f26947445a97",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Single randomized test case\n",
+ " x (binary) = 1100111100000011\n",
+ " sample_strings = ((0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0), (0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1), (1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1), (1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0))\n",
+ " Approx t = 0\n",
+ " Ideal t = 0\n",
+ " MATCH = True\n"
+ ]
+ }
+ ],
+ "source": [
+ "import random\n",
+ "\n",
+ "# n = 17 (16 controls), k = 4\n",
+ "n = 17\n",
+ "k = 4\n",
+ "m = n - 1\n",
+ "ALL_ONES = (1 << m) - 1\n",
+ "\n",
+ "def random_sample_string(width: int) -> tuple[int, ...]:\n",
+ " return tuple(random.randint(0, 1) for _ in range(width))\n",
+ "\n",
+ "def int_to_bits(x: int, width: int) -> tuple[int, ...]:\n",
+ "    # Little-endian: bit j of x becomes ctrl[j].\n",
+ "    return tuple((x >> j) & 1 for j in range(width))\n",
+ "\n",
+ "def ideal_and(x_int: int) -> int:\n",
+ "    # ApproxToffoli writes to a fresh target bit, so the ideal output is AND(x).\n",
+ "    return int(x_int == ALL_ONES)\n",
+ "\n",
+ "# ----------------------------\n",
+ "# Single randomized example\n",
+ "# ----------------------------\n",
+ "\n",
+ "# Randomly sample x\n",
+ "x_int = random.randint(0, ALL_ONES)\n",
+ "\n",
+ "# Fresh random sample strings\n",
+ "sample_strings = tuple(random_sample_string(m) for _ in range(k))\n",
+ "\n",
+ "B = ApproxToffoli(n=n, k=k, sample_strings=sample_strings)\n",
+ "\n",
+ "# call_classically returns the output registers in order: (ctrl, junk, target).\n",
+ "ctrl_out, junk_out, t_out = B.call_classically(ctrl=int_to_bits(x_int, m))\n",
+ "t_expected = ideal_and(x_int)\n",
+ "\n",
+ "print(\"Single randomized test case\")\n",
+ "print(f\" x (binary) = {x_int:016b}\")\n",
+ "print(f\" sample_strings = {sample_strings}\")\n",
+ "print(f\" Approx t = {t_out}\")\n",
+ "print(f\" Ideal t = {t_expected}\")\n",
+ "print(f\" MATCH = {t_out == t_expected}\")"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python (qualtran311)",
+ "language": "python",
+ "name": "qualtran311"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}