BE-519: Set up OpenTelemetry across hash-api and Temporal workers #8681
TimDiekmann merged 17 commits into main
Conversation
Two rounds of multi-agent review surfaced 5 Critical and 8 Important
items, plus a Critical regression introduced by the first cleanup
pass. This commit addresses all of them.
Production telemetry fixes:
- Bundle scripts now register the OTEL workflow interceptor as a
  workflow-bundle module; without this, prod workers shipped no
  workflow spans because `Worker.create.interceptors.workflowModules`
  is ignored when a prebuilt bundle is loaded (see the sketch after this list).
- `BatchSpanProcessor` / `BatchLogRecordProcessor` replace the
Simple variants so OTLP exports stay off the request path.
- `registerOpenTelemetry` shutdown surfaces `Promise.allSettled`
rejections to stderr and applies a 2-second per-provider timeout.
- `instrument.mjs` (and the worker shims) wrap the bootstrap in
try/catch so a misconfigured collector falls back to no telemetry
rather than crashing the process before the logger is wired.
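
A minimal sketch of the bundle-time interceptor registration from the first bullet above. The file names and the interceptor module path are illustrative, not the repo's actual paths:

```typescript
// bundle-workflows.mts, roughly — not the repo's actual bundle script.
import { writeFile } from "node:fs/promises";
import { fileURLToPath } from "node:url";
import { bundleWorkflowCode } from "@temporalio/worker";

const { code } = await bundleWorkflowCode({
  workflowsPath: fileURLToPath(new URL("./workflows.ts", import.meta.url)),
  // Interceptors must be baked in at bundle time: when Worker.create is handed
  // a prebuilt `workflowBundle`, the runtime `interceptors.workflowModules`
  // option is not applied, so omitting this line ships zero workflow spans.
  workflowInterceptorModules: [
    // Hypothetical local module exporting a workflow-interceptors factory that
    // instantiates the OTEL inbound/outbound workflow interceptors.
    fileURLToPath(new URL("./workflow-otel-interceptors.ts", import.meta.url)),
  ],
});

await writeFile(new URL("./dist/workflow-bundle.js", import.meta.url), code);
```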
Worker shutdown:
- Workers now call `worker.shutdown()` and await `worker.run()` from
  the SIGTERM handler so in-flight activities drain cleanly (see the
  sketch after this list). The previous draft called `process.exit`
  before the drain completed.
- Exit code is 0 on graceful signals, 1 only on actual failure.
- `OpenTelemetryActivityOutboundInterceptor` is wired so log lines
carry `trace_id` / `span_id` for Loki ↔ Tempo correlation.
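
A sketch of the drain-then-exit shape from the first bullet above; the address, task queue, and workflow path are placeholders:

```typescript
// Illustrative worker entrypoint showing the drain-on-SIGTERM shape.
import { fileURLToPath } from "node:url";
import { NativeConnection, Worker } from "@temporalio/worker";

async function main(): Promise<void> {
  const connection = await NativeConnection.connect({ address: "temporal:7233" });
  const worker = await Worker.create({
    connection,
    taskQueue: "example",
    workflowsPath: fileURLToPath(new URL("./workflows.ts", import.meta.url)),
  });

  for (const signal of ["SIGINT", "SIGTERM"] as const) {
    process.once(signal, () => {
      // Stop polling for new tasks; in-flight activities keep running until
      // they complete, at which point worker.run() below resolves.
      worker.shutdown();
    });
  }

  try {
    await worker.run(); // resolves after the graceful drain
    process.exitCode = 0; // graceful-signal path
  } catch (error) {
    console.error("Worker failed", error);
    process.exitCode = 1; // only an actual failure exits non-zero
  } finally {
    await connection.close();
    // OTEL providers would be flushed / shut down here, after the drain.
  }
}

await main();
```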
Refactor:
- New `runWorker(opts)` helper in `@local/hash-backend-utils/temporal/worker-bootstrap`
collapses the previously-duplicated bootstrap logic in both worker
`main.ts` files. Sentry init stays per-worker (ESM ordering).
- `WorkflowSource` discriminated union (`{ kind: "bundle" | "path" }`)
  replaces `Partial<WorkerOptions>` for workflow-source config, and
  `ExtraWorkerOptions = Omit<WorkerOptions, ...>` excludes helper-owned
  keys from the escape hatch (sketched after this list).
- `makeV2WorkflowSink(setup)` quarantines the `as unknown as Parameters<...>`
casts to a single helper, ready to drop with BE-520.
- `httpRequestSpanNameHook` is shared across the three Node bootstraps.
- The `getActiveOpenTelemetrySetup` singleton is gone; `instrument.mjs`
exports `otelSetup` directly.
- Rust `build_otel_header` emits a once-per-process warning when the
trace-context carrier is empty, surfacing missing-propagator
configuration instead of silently producing parent-less workflows.
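
Roughly the shapes behind the `WorkflowSource` and `ExtraWorkerOptions` items above; the field names and the excluded-key list are illustrative, not the helper's exact definition:

```typescript
import type { WorkerOptions } from "@temporalio/worker";

// Discriminated union: a worker either loads a prebuilt bundle (production)
// or compiles workflows from a path (development), never both.
type WorkflowSource =
  | { kind: "bundle"; workflowBundlePath: string }
  | { kind: "path"; workflowsPath: string };

// Escape hatch for per-worker tuning. The excluded keys are owned by the
// bootstrap helper (the exact list here is illustrative).
type ExtraWorkerOptions = Omit<
  WorkerOptions,
  | "activities"
  | "connection"
  | "taskQueue"
  | "sinks"
  | "interceptors"
  | "workflowBundle"
  | "workflowsPath"
>;

interface RunWorkerOptions {
  serviceName: string;
  taskQueue: string;
  workflowSource: WorkflowSource;
  activities: WorkerOptions["activities"];
  extraWorkerOptions?: ExtraWorkerOptions;
}
```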
Tests (23 unit tests covering the v1↔v2 normalisation, peer.service
resolution, and HTTP hook behaviour):
- `wrapWorkflowSpanExporter` / `normaliseSpan` against v1-shaped
spans, mixed batches, the existing-`parentSpanContext` precedence
case, the `parentSpanId === ""` edge case, and result-callback
propagation.
- `resolvePeerService` exact / suffix matching and lookalike-domain
  rejection (see the test sketch after this section).
- `httpRequestSpanNameHook` for incoming, outgoing, Express-wrapped,
query-stripped, and missing-method paths.
The drain semantics are verified manually — `TestWorkflowEnvironment`
integration tests would be a follow-up.
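
A vitest-style sketch of the `resolvePeerService` cases, assuming a `(hostname: string) => string | undefined` signature; the specific host-to-service mappings shown are illustrative:

```typescript
import { describe, expect, it } from "vitest";

// Assumed signature: exact/suffix host rules mapping hostnames to peer.service.
import { resolvePeerService } from "./opentelemetry";

describe("resolvePeerService", () => {
  it("matches exact hosts", () => {
    expect(resolvePeerService("api.openai.com")).toBe("openai");
  });

  it("matches suffix rules only on a dot boundary", () => {
    expect(resolvePeerService("uploads.linear.app")).toBe("linear");
    // A lookalike domain must not satisfy a suffix rule.
    expect(resolvePeerService("evil-linear.app")).toBeUndefined();
  });

  it("returns undefined for unknown hosts", () => {
    expect(resolvePeerService("example.com")).toBeUndefined();
  });
});
```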
Cursor Bugbot PR Summary (Medium Risk): adds end-to-end trace context propagation into Temporal by attaching the OTEL workflow client interceptor in `createTemporalClient`, and updates the Rust Temporal client to match. Reviewed by Cursor Bugbot for commit 98041ff.
🤖 Augment PR Summary: This PR standardizes OpenTelemetry across the Node API and Temporal workers, and propagates trace context into Temporal workflow starts so caller → workflow/activity spans join into one trace.
Technical Notes: Uses batch processors to keep export off the request path and includes explicit shutdown flushing with per-provider timeouts; Temporal workflow spans are bridged from the v1 interceptor shape to the v2 OTLP exporter shape.
Codecov Report ❌
Additional details and impacted files:
@@ Coverage Diff @@
## main #8681 +/- ##
==========================================
- Coverage 62.08% 62.03% -0.05%
==========================================
Files 1341 1345 +4
Lines 135072 135263 +191
Branches 5744 5782 +38
==========================================
+ Hits 83854 83908 +54
- Misses 50310 50446 +136
- Partials 908 909 +1
- Fix tsc failure in @tests/hash-backend-integration: setup-opentelemetry.ts pointed at the removed @apps/hash-api/src/graphql/opentelemetry module; migrate it to @local/hash-backend-utils/opentelemetry with the new options-object signature.
- Restore the workflow-start identity field that the high-level WorkflowClientTrait::start_workflow auto-populated; the low-level StartWorkflowExecutionRequest defaulted it to "" so Temporal Server could not attribute starts to a client.
- Use URL.hostname (not URL.host) when resolving peer.service so outbound origins like https://api.openai.com:443/ still match the exact-host rules (see the sketch below).
- Fix the worker-bootstrap serviceName doc-comment: it claimed the value is also used as Temporal worker identity, but Worker.create keeps the SDK default (pid@hostname). Document the actual usage (service.name in the OTEL resource).
- Move the OTEL shutdown error logging into the per-target try/catch so the label is captured at execution time instead of indexed back out of the targets array.
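
For the `URL.hostname` item, a quick illustration of the difference. Note that the WHATWG `URL` parser drops a scheme-default port such as `:443`, so the distinction only bites on non-default ports:

```typescript
const openai = new URL("https://api.openai.com:443/v1/embeddings");
console.log(openai.host, openai.hostname); // "api.openai.com" "api.openai.com" (default port dropped)

const collector = new URL("http://collector:4318/v1/traces");
console.log(collector.host);     // "collector:4318" (would never match an exact-host rule)
console.log(collector.hostname); // "collector" (what the peer.service rules compare against)
```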
ESLint's no-unnecessary-condition flags this check because TS flow analysis narrows shuttingDown to its initial false here: the only mutation lives inside the SIGINT/SIGTERM handler closure, which the flow analysis does not see from the surrounding scope. Same shape for the rethrow: workerError holds whatever the SDK threw (typed unknown), so only-throw-error fires on the rethrow even though preserving the original value is the correct behaviour.
The helper imports DefaultLogger from @temporalio/worker, which bundles the native Rust core bindings. Keeping it in temporal.ts pulled the worker package into every consumer of createTemporalClient — including hash-api, where it has no business being. worker-bootstrap.ts is the only caller, so the helper moves there and becomes file-private. temporal.ts is back to a pure @temporalio/client surface.
- worker-bootstrap.ts: drop the dead try/catch around httpServer.close(); http.Server.close() reports failures via the optional callback, not synchronously. Pass an error-logging callback instead so close failures actually surface (sketched below).
- opentelemetry.ts: tighten the host vs hostname comment. The example api.openai.com:443 was wrong — URL strips the default :443 for https. Use collector:4318 instead so the rationale matches what the URL spec does.
- hash-api/index.ts: call out the LIFO cleanup order explicitly. Without the GracefulShutdown.reverse() reference, a future maintainer reading "Flush OpenTelemetry last" alongside addCleanup() calls below would reasonably conclude the comment is stale and move the registration.
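
A sketch of the callback-based close; the function and message are illustrative:

```typescript
import type { Server } from "node:http";

// http.Server.close() reports failure through its optional callback rather
// than throwing synchronously, so a try/catch around it can never fire.
export function closeHttpServer(httpServer: Server): void {
  httpServer.close((error) => {
    if (error) {
      console.error("Failed to close HTTP server", error);
    }
  });
}
```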
A SIGTERM during the worker startup window — NativeConnection.connect, workflow bundle compile, Worker.create — would hit the Node default handler and terminate the process without flushing OTEL. K8s pod evictions during startup are a real edge case in rolling deploys. Move SIGINT/SIGTERM registration to the top of runWorker, before any async startup work. The handler captures `worker` once it's set; if a signal arrives earlier, it just flips `shuttingDown` and the linear cleanup path below handles flush+exit. The worker.run() call gates on `shuttingDown` so we skip running entirely if startup got interrupted.
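
Roughly that ordering, condensed into a sketch; the connection address, task queue, and workflow path are placeholders:

```typescript
import { fileURLToPath } from "node:url";
import { NativeConnection, Worker } from "@temporalio/worker";

export async function runWorker(): Promise<void> {
  let shuttingDown = false;
  let worker: Worker | undefined;

  // Registered before any async startup work, so a SIGTERM during
  // NativeConnection.connect / bundle compile / Worker.create is not left to
  // Node's default handler (which would exit without flushing OTEL).
  for (const signal of ["SIGINT", "SIGTERM"] as const) {
    process.once(signal, () => {
      shuttingDown = true;
      worker?.shutdown(); // still undefined if the signal arrives mid-startup
    });
  }

  const connection = await NativeConnection.connect({ address: "temporal:7233" });
  worker = await Worker.create({
    connection,
    taskQueue: "example",
    workflowsPath: fileURLToPath(new URL("./workflows.ts", import.meta.url)),
  });

  // TS flow analysis only sees the initial `false` here (the mutation lives in
  // the signal-handler closure), hence the lint suppressions in the real helper.
  if (!shuttingDown) {
    await worker.run();
  }

  await connection.close();
  // ...flush OTEL providers and set the exit code here...
}
```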
Locks down the port-derivation contract that prevents the exporter from tracing its own outbound traffic. The failure mode is exponential span amplification per export batch, hard to spot from logs — easier to catch with a unit test. Covers: configured gRPC port (4317), non-default port from URL (4318), fallback when the URL has no explicit port, and fallback on a malformed endpoint (the helper must not throw on every outgoing request). Reads the hook back via HttpInstrumentation.getConfig() so the test exercises the same wiring runtime callers see.
Switching from Omit to Pick means future Temporal SDK additions don't silently leak through and let callers override helper-owned wiring (activities, connection, taskQueue, sinks, interceptors, workflowBundle). Only key currently in use is maxHeartbeatThrottleInterval; add more on demand.
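
The Pick-based shape, roughly; only the single key named above is allow-listed today:

```typescript
import type { WorkerOptions } from "@temporalio/worker";

// Allow-list instead of deny-list: keys added by future Temporal SDK versions
// stay blocked until they are deliberately added here.
type ExtraWorkerOptions = Pick<WorkerOptions, "maxHeartbeatThrottleInterval">;
```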
The previous catch swallowed every throw — including the kind that matters most in dev / CI: typos in instrumentation construction, bad endpoint URLs, missing peer deps. A regression that disabled telemetry on every deploy would slip through. Keep the fall-through-to-undefined behaviour for production (don't crash a prod service over the collector layer) but rethrow elsewhere so the bootstrap error surfaces during development.
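
The guarded fall-through might look like this; the environment check and function name are illustrative:

```typescript
// Wrap the OTEL bootstrap: swallow failures only in production, rethrow
// everywhere else so misconfiguration is caught immediately in dev / CI.
export function tryRegisterOpenTelemetry<Setup>(register: () => Setup): Setup | undefined {
  try {
    return register();
  } catch (error) {
    if (process.env.NODE_ENV === "production") {
      // Don't crash a prod service over the collector layer.
      console.error("OpenTelemetry setup failed; continuing without telemetry", error);
      return undefined;
    }
    // Surface bootstrap mistakes (bad endpoint URLs, missing peer deps,
    // instrumentation construction typos) during development.
    throw error;
  }
}
```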
- biome auto-removed the now-redundant `as HttpInstrumentationConfig` cast (getConfig() already types it), leaving the import unused.
- worker-bootstrap.ts:306: same TS-narrowing-vs-closure-mutation pattern as the other shuttingDown checks — TS sees the initial false but the SIGTERM handler can flip it before this line runs.
See the output of git range-diff at https://github.com/hashintel/hash/actions/runs/25175069658
http.RequestOptions.port is string | number | null | undefined; some callers pass a string and `"4317" === 4317` is false. The filter would miss the exporter's own outbound traffic and the very feedback loop this filter exists to prevent would slip through. Number(options.port) handles both shapes, undefined → NaN → not equal, which preserves the original "let unrelated traffic through" path. Test covers the string and undefined cases the original suite missed.
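
A sketch of the coercion, assuming the exporter port has already been derived from the OTLP endpoint elsewhere:

```typescript
import type { RequestOptions } from "node:http";
import { HttpInstrumentation } from "@opentelemetry/instrumentation-http";

const exporterPort = 4317; // derived from HASH_OTLP_ENDPOINT in the real module

const httpInstrumentation = new HttpInstrumentation({
  ignoreOutgoingRequestHook: (request: RequestOptions): boolean => {
    // request.port may be a string, a number, null, or undefined depending on
    // the caller. Number() handles both string and number; undefined coerces
    // to NaN and null to 0, neither of which equals a real exporter port, so
    // unrelated traffic stays instrumented.
    return Number(request.port) === exporterPort;
  },
});
```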
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Benchmark results

Benchmark groups run in CI (mean timings and flame-graph links were attached to the CI comment and are not reproduced here):
- policy_resolution_medium / policy_resolution_none / policy_resolution_small — `resolve_policies_for_actor` across empty / seeded / system users and low / medium / high selectivity.
- read_scaling_complete / read_scaling_linkless — `entity_by_id` at zero / one / two link depths, from 1 up to 10,000 entities.
- representative_read_entity / representative_read_entity_type / representative_read_multiple_entities — `entity_by_id`, `get_entity_type_by_id`, `entity_by_property`, and `link_by_source_by_property` over the seeded Block Protocol entity types.
- scenarios — `full_test` and `linked_queries`, query-limited and query-unlimited.
🌟 What is the purpose of this PR?
Wires OpenTelemetry through hash-api, hash-ai-worker-ts, and hash-integration-worker so traces, logs, and metrics flow into the existing OTLP collector. Caller-side trace context (Express HTTP spans, Rust `temporal-client::start_workflow`) is propagated into Temporal workflow start headers so the worker-side `RunWorkflow` and `RunActivity` spans chain off the caller's trace in Tempo. Outbound `fetch` calls (OpenAI, Anthropic, Linear, …) are now traced via `@opentelemetry/instrumentation-undici` with a shared `peer.service` mapping so Tempo's `service_graphs` processor renders external dependencies as edges in the service map.
🔗 Related links
- Tracking issue for when `@temporalio/interceptors-opentelemetry-v2` is released
🚫 Blocked by
🔍 What does this change?
New shared OTEL module (`@local/hash-backend-utils/opentelemetry`):
- `registerOpenTelemetry({ endpoint, serviceName, instrumentations })` registers a global `NodeTracerProvider` / `LoggerProvider` / `MeterProvider` against an OTLP/gRPC collector.
- `createUndiciInstrumentation()` and `httpRequestSpanNameHook` shared across all three services.
- `peer.service` mapping (discriminated exact/suffix host rules) for OpenAI, Anthropic, Linear, Google Cloud.
- `BatchSpanProcessor` / `BatchLogRecordProcessor` keep export off the request path. Shutdown surfaces `Promise.allSettled` rejections and applies a 2-second per-provider timeout (see the sketch below).
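
A minimal sketch of that shutdown behaviour, assuming each provider exposes a `shutdown(): Promise<void>` method; the helper names are illustrative:

```typescript
interface ShutdownTarget {
  label: string;
  shutdown: () => Promise<void>;
}

const withTimeout = (promise: Promise<void>, ms: number, label: string): Promise<void> =>
  Promise.race([
    promise,
    new Promise<void>((_resolve, reject) => {
      setTimeout(() => reject(new Error(`${label} shutdown timed out after ${ms}ms`)), ms);
    }),
  ]);

export async function shutdownTelemetry(targets: ShutdownTarget[]): Promise<void> {
  // allSettled is a belt-and-braces guard; each target already catches and
  // logs its own failure, so a hung or failing provider cannot block the rest.
  await Promise.allSettled(
    targets.map(async ({ label, shutdown }) => {
      try {
        await withTimeout(shutdown(), 2_000, label);
      } catch (error) {
        // The label is captured at execution time, per target.
        console.error(`OpenTelemetry shutdown failed for ${label}:`, error);
      }
    }),
  );
}
```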
Temporal trace propagation:
- The workflow client interceptor (`OpenTelemetryWorkflowClientInterceptor`) attached in `createTemporalClient` injects the active context into workflow start headers.
- Activity interceptors (the `activity` factory shape) extract context and stamp `trace_id` / `span_id` on activity log lines for Loki ↔ Tempo correlation.
- The workflow interceptor module is registered in `bundleWorkflowCode` (production path) AND in `Worker.create` (dev path) — without the bundle registration, prebuilt bundles ignore the runtime `workflowModules` config and prod ships zero workflow spans.
- `wrapWorkflowSpanExporter` / `makeV2WorkflowSink` bridges v1-shaped `ReadableSpan` from `@temporalio/interceptors-opentelemetry@1` to our v2 stack (synthesises `instrumentationScope` and `parentSpanContext`); marked `TODO(BE-520)` and quarantined to one helper.
Rust trace context propagation (`hash-temporal-client::ai`):
- Switches `start_ai_workflow` to the low-level `WorkflowService::start_workflow_execution` so the proto `header` field is exposed.
- `build_otel_header()` injects the active trace context as a `_tracer-data` payload; emits a once-per-process `tracing::warn!` when the carrier is empty.
- Spans are marked `otel.kind = "producer"` for the asynchronous fire-and-forget shape.
Worker bootstrap helper (`@local/hash-backend-utils/temporal/worker-bootstrap`):
- `runWorker(opts)` collapses the previously-duplicated bootstrap logic in both worker `main.ts` files (see the sketch below). `Sentry.init` stays per-worker because of ESM import-ordering.
- `WorkflowSource` discriminated union (`{ kind: "bundle" } | { kind: "path" }`) replaces `Partial<WorkerOptions>`; `ExtraWorkerOptions = Omit<WorkerOptions, …>` excludes helper-owned and source-owned keys from the escape hatch.
- The SIGTERM handler awaits `worker.run()` so in-flight activities drain before OTEL providers shut down. Exit code 0 on graceful signals.
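
A rough caller-side shape for `runWorker`; the task queue, bundle path, and option names beyond those mentioned in this PR are illustrative:

```typescript
// hash-ai-worker-ts/src/main.ts, roughly — not the actual file.
import { fileURLToPath } from "node:url";

import { runWorker } from "@local/hash-backend-utils/temporal/worker-bootstrap";

import * as activities from "./activities";

await runWorker({
  serviceName: "hash-ai-worker-ts", // becomes service.name in the OTEL resource
  taskQueue: "ai",                  // illustrative queue name
  workflowSource:
    process.env.NODE_ENV === "production"
      ? { kind: "bundle", workflowBundlePath: "./dist/workflow-bundle.js" }
      : {
          kind: "path",
          workflowsPath: fileURLToPath(new URL("./workflows.ts", import.meta.url)),
        },
  activities,
  extraWorkerOptions: { maxHeartbeatThrottleInterval: "30 seconds" },
});
```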
Sentry coexistence:
- `skipOpenTelemetrySetup: !!otelSetup` so Sentry shares our `NodeTracerProvider`. If Sentry registered its own, it would shadow the global provider and the OTEL workflow client interceptor would silently break trace context propagation.
Metrics scrape:
- `apps/hash-external-services/opentelemetry-collector/otel-collector-config.yaml` now scrapes Temporal Server's Prometheus endpoint (`temporal:8000`) with a `temporal_` prefix relabel.
Pre-Merge Checklist 🚀
🚢 Has this modified a publishable library?
This PR:
📜 Does this require a change to the docs?
The changes in this PR:
🕸️ Does this require a change to the Turbo Graph?
The changes in this PR:
- The workflow span adapter is temporary until `@temporalio/interceptors-opentelemetry-v2` lands — tracked in BE-520.
- Caller → workflow spans use the parent/child shape that `@temporalio/interceptors-opentelemetry` produces. The OTEL spec for async messaging recommends PRODUCER/CONSUMER + Span Links, but neither v1 nor v2 of the upstream package implements that. Doesn't affect our metrics today: workflow spans are `INTERNAL` kind and start after the caller span ends, so Tempo's service-graphs processor doesn't inflate edge latencies. Captured as a follow-up note in BE-520.
- `HASH_OTLP_ENDPOINT` must be set on ALL backend services or NONE — a partial config silently breaks caller↔worker context propagation. No runtime check (deferred — infra-level concern).
🐾 Next steps
- Once `@temporalio/interceptors-opentelemetry-v2` is released, move to it and drop the workflow span adapter.
🛡 What tests cover this?
23 unit tests in `libs/@local/hash-backend-utils/`:
- `src/opentelemetry.test.ts` (12 tests) — `resolvePeerService` (exact/suffix matching, lookalike-domain rejection, suffix-dot-boundary), `httpRequestSpanNameHook` (incoming, outgoing, Express-wrapped, query-stripped, missing-method paths).
- `src/temporal/workflow-span-adapter.test.ts` (11 tests) — v1→v2 normalisation (instrumentationLibrary → instrumentationScope, parentSpanId → parentSpanContext), already-v2 passthrough identity check, attribute / event preservation through the rewrite path, mixed batch handling, the `parentSpanId === ""` edge case, existing-`parentSpanContext` precedence, result-callback propagation, shutdown / forceFlush delegation.

Drain semantics on SIGTERM are verified manually — `TestWorkflowEnvironment` integration tests deferred.
❓ How to test this?
1. Set `HASH_OTLP_ENDPOINT=http://localhost:4317` and start the OTEL collector + Tempo + Mimir + Loki via the `apps/hash-external-services` docker-compose.
2. Run `yarn dev:backend` + `yarn start:worker:ai`.
3. Trigger a workflow from the Node API (e.g. the `updateEntityEmbeddings` workflow). Expected trace shape:
   - `Node API POST /graphql` (~150ms) at the root.
   - `start_ai_workflow` (~12ms) as child, marked `Producer` kind.
   - `AI Worker RunWorkflow:updateEntityEmbeddings` (~2.5s) as child of the producer.
   - `RunActivity:createAndStoreEntityEmbeddingsActivity` nested under it.
   - `OpenAI POST /v1/embeddings` as an external-service span (undici instrumentation).
4. The service graph shows `Node API`, `Graph API`, `AI Worker`, `OpenAI`, `Postgres`, with an `AI Worker → OpenAI` edge.
5. `kill -TERM <ai-worker-pid>` while a workflow is mid-flight. Logs show `Received SIGTERM, exiting…`, then activity completion logs, then OTEL flush, then exit 0.
📹 Demo
Tempo screenshots from local testing — see Linear BE-519 for image attachments.