Skip to content

feat(weave): add OTel tracing primitives to weave.trace_server (PR-7a of ddtrace migration)#7196

Draft
amwarrier wants to merge 1 commit into
masterfrom
aravind/weave-trace/otel-migration-pr-7a
Draft

feat(weave): add OTel tracing primitives to weave.trace_server (PR-7a of ddtrace migration)#7196
amwarrier wants to merge 1 commit into
masterfrom
aravind/weave-trace/otel-migration-pr-7a

Conversation

@amwarrier

Copy link
Copy Markdown
Contributor

Summary

First PR of the weave/trace_server/ ddtrace → OpenTelemetry migration. Adds the abstraction layer (@traced, @traced_generator) that subsequent PRs will use to replace @ddtrace.tracer.wrap decorators across 12 files (66 wraps total).

This PR is intentionally scope-limited to the decorator surface + tests. No call sites are migrated. Call-site migration follows:

  • PR-7b1 — clickhouse_trace_server_batched.py (47 of 66 wraps) — draft opened immediately after this PR.
  • PR-7b2 — remaining 11 files (kafka.py, usage_utils.py, ttl_settings.py, etc).
  • PR-7c — datadog.py rewrite + remove ddtrace>=2.7.0 from pyproject.toml.

Why a new module

weave.trace_server.datadog keeps its existing public symbols (record_db_insert, set_root_span_dd_tags, db_insert_path, etc.). Only the tracing surface — @traced and the generator variant — moves to the new module. This keeps the PR-7a diff scoped to additions, and the module name is honest: the SDK is OpenTelemetry, DD is just the backend.

Public surface

from weave.trace_server.tracing import traced, traced_generator

@traced(name="my_op")
def my_op(x: int) -> int:
    ...

@traced(name="my_async_op")
async def my_async_op(x: int) -> int:
    ...

@traced_generator(name="stream_calls")
def stream_calls(limit: int) -> Iterator[dict]:
    yield from ...

Why no try/except in the decorator body

OTel's start_as_current_span context manager already:

  • Records the exception as a span event on __exit__
  • Sets Status(ERROR, "{ExcType}: {msg}") for Exception subclasses
  • Lets BaseException (GeneratorExit, CancelledError, KeyboardInterrupt, SystemExit) pass through without marking the span errored

Reference: opentelemetry-python's use_span helper catches except Exception, not BaseException, per open-telemetry/opentelemetry-python#4484.

This also means the historical generator_trace wrapper in datadog.py (which existed to suppress ddtrace's auto-mark-error-on-GeneratorExit) is removable in PR-7c.

Why _tracer = trace.get_tracer(...) is hoisted to module scope

Measured: 16.5μs/call with get_tracer(...) inside the wrapper vs 9.95μs/call with it hoisted (40% per-call overhead saved). These decorators run on hot CH-query paths with dozens of spans per request, so the hoist matters.

get_tracer returns a ProxyTracer that caches its real-tracer reference on first span open and never re-resolves. To preserve test isolation when the global TracerProvider is swapped between cases, a reset_tracer_cache() helper is exported. Production code never needs to call it — there is exactly one TracerProvider per process, set during application startup before any span opens.

Test plan

  • All 12 unit tests pass (validated locally against opentelemetry-sdk 1.28.0+):
    • Sync function: span name, error marking, functools.wraps metadata preservation
    • Async function: span name, error marking, CancelledError pass-through
    • Generator/async-generator refusal at decoration time (clear TypeError)
    • @traced_generator: span covers full iteration, GeneratorExit pass-through, exception marking
    • Nested decorator calls produce correct parent/child span linkage
  • No regressions in the existing tests/trace_server/ suite (no call sites migrated yet)
  • Tested in the wandb/core image via submodule bump on wandbench-small before merging

Out of scope

  • No traced_with_service (per-span service override) — none of upstream's existing wraps use @ddtrace.tracer.wrap(service="X"), so adding it would be speculative. Will add in a future PR if a call site needs it.
  • Phase A in wandb/core (services/weave-trace/src/tracing/decorator.py) has its own equivalent decorator. Once this lands and the upstream snapshot advances in wandb/core, that file becomes deletable — tracked separately.

🤖 Generated with Claude Code

Introduces `weave.trace_server.tracing` — the OTel-equivalent of the
ddtrace-flavored helpers in `weave.trace_server.datadog`. Exposes two
public symbols:

  - `@traced(name)` — wraps a function in an OTel span; refuses generators
    at decoration time with a clear TypeError
  - `@traced_generator(name)` — for streaming endpoints; `yield from`s
    inside the span body so the span covers the full iteration

This is the abstraction layer for the upcoming ddtrace -> OpenTelemetry
migration of `weave/trace_server/`. Call-site migration follows in
subsequent PRs (PR-7b1 for clickhouse_trace_server_batched.py, PR-7b2 for
the remaining 11 files, PR-7c for datadog.py + pyproject removal).

The decorator's body is minimal because OTel's `start_as_current_span`
context manager already records exceptions and sets `Status(ERROR,
"{ExcType}: {msg}")` on Exception raises. BaseException subclasses
(GeneratorExit, CancelledError, KeyboardInterrupt, SystemExit) pass
through without marking the span errored — see
open-telemetry/opentelemetry-python#4484. So
the wrapper does not need its own try/except, and the historical
`generator_trace` wrapper in `datadog.py` (which existed to suppress
ddtrace's auto-mark-error-on-GeneratorExit) becomes deletable in PR-7c.

The module-scope `_tracer = trace.get_tracer(...)` is hoisted because
re-resolving inside every wrapper call costs ~7us/span (measured), and
these decorators run on hot CH-query paths. A `reset_tracer_cache()`
helper is exported for tests that swap the global TracerProvider
mid-process.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@wandbot-3000

wandbot-3000 Bot commented Jun 11, 2026

Copy link
Copy Markdown

@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 95.00000% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
weave/trace_server/tracing.py 95.00% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

amwarrier added a commit that referenced this pull request Jun 12, 2026
…ated)

Integrated commit for testing in wandb/core's weave-trace image on
wandbench-small. Represents the combined effect of upstream PRs:
  - #7196 (PR-7a): @Traced + @traced_generator in weave/trace_server/tracing.py
  - #7201 (PR-7b1): 47 wraps in clickhouse_trace_server_batched.py
  - #7202 (PR-7b2): 17 wraps + 2 trace blocks + 3 set_tag + 1 current_root in 10 files
  - #7203 (PR-7c): inline DogStatsD in datadog.py, drop ddtrace dep from pyproject.toml

This branch sits on top of the wandb/core-pinned submodule SHA so the
weave-trace image build picks up only PR-7 changes — no unrelated
upstream drift.

After this commit, `grep -rn "import ddtrace\|from ddtrace"` returns empty
across weave/trace_server/. Only docstring/comment references remain.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant