Make the shared background updater safe for library consumers#404
Make the shared background updater safe for library consumers#404adtyavrdhn wants to merge 19 commits into
Conversation
update_prices_in_background() is intended for logfire and pydantic-ai to opt into price updates independently without duplicating updater threads. Harden the handle/refcount design for that use: - Bind each UpdatePricesHandle to the updater it was created for, so directly-constructed or stale handles are inert and cannot stop an updater they never claimed - Return an inert handle instead of raising when a manually started UpdatePrices is already keeping prices fresh - Never raise from UpdatePricesHandle.close(); log background fetch errors instead so consumers can close in shutdown paths - Add GENAI_PRICES_DISABLE_AUTO_UPDATE env var as a process-wide opt-out - Document shared-updater semantics: revert to bundled data on last close, non-blocking first fetch, and the inert-handle caveat Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
update_prices_in_background() is intended for logfire and pydantic-ai to opt into price updates independently without duplicating updater threads. Harden the handle/refcount design for that use: - Bind each UpdatePricesHandle to the updater it was created for, so directly-constructed or stale handles are inert and cannot stop an updater they never claimed - Return an inert handle instead of raising when a manually started UpdatePrices is already keeping prices fresh - Never raise from UpdatePricesHandle.close(); log background fetch errors instead so consumers can close in shutdown paths - Add GENAI_PRICES_DISABLE_AUTO_UPDATE env var as a process-wide opt-out - Document shared-updater semantics: revert to bundled data on last close, non-blocking first fetch, and the inert-handle caveat
…s into genai-prices-update # Conflicts: # packages/python/genai_prices/update_prices.py
…pdater A manually started UpdatePrices now takes precedence over the shared updater from update_prices_in_background() regardless of start order: start() stops the shared updater and takes over instead of raising RuntimeError, and existing handles become inert. Two manually created instances still conflict loudly. Both precedence paths (takeover, and deferring to an already-running manual updater) are logged at INFO on the existing genai-prices logger, matching the module's lifecycle-at-INFO convention. Also documents the known fork limitation (e.g. gunicorn preload_app): the module-level updater state does not survive os.fork(), so a forked worker re-enabling background updates never starts a new thread. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Address the confirmed findings from an adversarially-verified review of this PR: - wait_prices_updated_sync/async never raise and return False until a fetch actually succeeds (new _update_succeeded flag): previously the first waiter got the raw fetch exception and later waiters got True with bundled data still active. The stored exception is no longer consumed by module-level waits, so it still surfaces exactly once to the updater's owner via wait()/stop(), or in close()'s log. - UpdatePrices.wait() no longer returns True when the only completed fetch attempt failed. - Document the blocking window: close()/takeover join the updater thread under the global lock, so lifecycle calls can block for roughly the request timeout mid-fetch (calc_price is unaffected). - Extend the fork TODO with the inherited-lock deadlock mode; warn against copying handles; document stop() as a no-op when unstarted. - New tests: 16-thread acquire/close stress, stale handle closed while a different live updater runs, failed-fetch wait semantics, env var not affecting manual updaters, stop() on an unstarted instance, and the inert-handle INFO log. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Both planned consumers (logfire, pydantic-ai) expose their own opt-in flags, so a library-side opt-out has no user while auto-update is not on by default. It can be reintroduced if defaults ever flip. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Previously UpdatePricesHandle.close() (and takeover in start()) joined the background thread while holding the global lock, so a fetch in flight could block the closer - and every other updater API call - for roughly the request timeout (10-15s). That cost lands directly on consumers: logfire calls close() from configure() and atexit. Following the pattern used by LaunchDarkly FDv2, statsig, and OpenTelemetry (join with a short timeout outside any lock, then abandon the daemon thread), and LaunchDarkly FDv2's stop-flag fencing gate before publishing: - Snapshot installs are gated: the background thread re-checks _stop_event under a new _snapshot_lock before calling set_custom_snapshot, and the stop path sets the event and clears the snapshot under the same lock, so a draining fetch can never install its result after stop. - close() and takeover now detach under the global lock (bookkeeping only), then join with a 5s grace period outside it, logging a warning and abandoning the daemon thread on expiry. - Manual UpdatePrices.stop() keeps its unbounded join (owners expect completion and the re-raised stored exception) but joins outside the global lock, so it no longer blocks unrelated API calls. httpx requests cannot be aborted cross-thread (encode/httpx#1633), so the in-flight fetch itself remains bounded only by the request timeout. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…in the child Previously the module-level state did not survive os.fork() (e.g. gunicorn with preload_app=True): the child inherited bookkeeping claiming an updater was running while its thread was dead - so prices stayed frozen at the fork-time snapshot - and a fork landing while another thread held the global lock left it permanently held in the child, deadlocking every lock-taking updater call. With logfire starting the updater from configure(), preload+fork is a mainstream deployment path for it. Following the OpenTelemetry BatchProcessor pattern: - Fork hooks are registered lazily on first updater start. before/ after_in_parent hold the global lock across the fork so the child inherits consistent bookkeeping, never a torn mid-mutation state. - after_in_child replaces the lock and restarts a running updater on the same UpdatePrices instance - fresh events and snapshot lock, new daemon thread - preserving instance identity so handles held by consumers (logfire, pydantic-ai) remain valid claims in the child. - On any revival failure the child degrades to a clean reset (bundled prices, no deadlock) with an error log. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
| def wait(self, timeout: float | None = None) -> bool: | ||
| """Wait for the prices to be updated in the background task. | ||
|
|
||
| Raises the stored exception if the update attempt failed. |
There was a problem hiding this comment.
I haven't changed the contract, it was just not documented
| except Exception as e: | ||
| self._background_exc = e | ||
| self._prices_updated.set() | ||
| logger.error('Error updating genai-prices in the background (%s): %s', type(e).__name__, e) |
There was a problem hiding this comment.
I changed this to keep it cleaner but I don't feel strongly about it.
| - If an `UpdatePrices` instance has already been started manually, `update_prices_in_background()` does not start | ||
| a second updater and returns a handle that does nothing on close: prices are already being kept up to date, and | ||
| the manual updater's lifetime stays with whoever started it. | ||
| - If `UpdatePrices.start()` is called while the shared updater is running, the shared updater is stopped and the |
There was a problem hiding this comment.
I don't like the idea of this additional API, it adds weird complexity, this particular bullet point of how they interact is the strongest case.
I think UpdatePrices should be updated to behave more like the new API, meaning multiple calls to it are allowed and use a shared object with 'ref counting'. Multiple calls with different urls or other config can raise an error or emit a warning.
The instantiable UpdatePrices class let N "independent" instances all write one process-global snapshot, which forced the takeover/precedence state machine, the "only one active" RuntimeError, and a second snapshot lock. Make update_prices_in_background() the single, ref-counted entry point with first-wins config, and reduce UpdatePrices to a deprecated shim over it. Fold the snapshot fencing onto the one module lock, so the whole updater is guarded by a single RLock over a trivial state machine. Revert-on-last-close is preserved. Adds docs/adr/0001 capturing the decision and its open questions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lead with update_prices_in_background() as the primary, configurable, ref-counted entry point with first-wins config and library/app guidance; document UpdatePrices as deprecated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
3 issues found across 6 files (changes from recent commits).
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
- update_prices_in_background() now also warns when request_timeout differs from the running updater, matching the documented first-wins contract. - Restore the historical UpdatePrices.stop() behaviour of re-raising a stored background fetch error: split handle teardown into _release() (returns the exception) so close() logs it while the deprecated stop() raises it. - Remove docs/adr/0001 (internal decision aid, not repo material). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ound Per review (alexmojaki): don't add a second public surface. Collapse back onto UpdatePrices as the single API and make starting it a ref-counted claim on one shared, process-wide updater (first-wins config, warn on mismatch). Remove update_prices_in_background() and UpdatePricesHandle. UpdatePrices is no longer deprecated; it is the primary library API. As the library-shutdown-friendly surface, stop() now logs background fetch errors instead of raising (wait() still raises). Internals unchanged: one RLock, the folded snapshot fencing, and os.fork() restart all carry over. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Race found by Codex review: when stop() wins the fence and an in-flight fetch is discarded by _install_snapshot, the background loop still set _update_succeeded=True and signalled the event, so a waiter parked before the stop could get True while the snapshot had reverted to bundled data. _install_snapshot now returns whether it installed; the loop reflects that in _update_succeeded. Adds a deterministic regression test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Makes
UpdatePricesa ref-counted claim on one shared, process-wide background updater, so several libraries (e.g. Logfire, Pydantic AI) can opt into background price updates independently. Context: pydantic/logfire#1960.Changes
UpdatePricesis a claim on a single shared updater, not a private thread. The firststart()launches the background daemon thread; laterstart()s on other instances join it and bump a reference count. The thread stops, and prices revert to the bundled data, only when the last started instance is stopped.RuntimeError(starting the same instance twice still does).url/update_interval/request_timeout; starting another with different settings logs a warning and joins the running updater.stop()never raises — a background fetch error is logged instead;wait()still raises it.update_prices_in_background()andUpdatePricesHandleare removed;UpdatePricesis the single public surface (an internal_BackgroundUpdateris the worker).os.fork()is handled (e.g. gunicornpreload_app=True): the daemon thread is restarted in the child.