Skip to content

Make the shared background updater safe for library consumers#404

Open
adtyavrdhn wants to merge 19 commits into
mainfrom
genai-prices-update
Open

Make the shared background updater safe for library consumers#404
adtyavrdhn wants to merge 19 commits into
mainfrom
genai-prices-update

Conversation

@adtyavrdhn

@adtyavrdhn adtyavrdhn commented Jun 10, 2026

Copy link
Copy Markdown
Member

What

Makes UpdatePrices a ref-counted claim on one shared, process-wide background updater, so several libraries (e.g. Logfire, Pydantic AI) can opt into background price updates independently. Context: pydantic/logfire#1960.

Changes

  • Starting an UpdatePrices is a claim on a single shared updater, not a private thread. The first start() launches the background daemon thread; later start()s on other instances join it and bump a reference count. The thread stops, and prices revert to the bundled data, only when the last started instance is stopped.
  • Safe to create and start from anywhere, including multiple libraries at once — no duplicate threads, and a second instance no longer raises RuntimeError (starting the same instance twice still does).
  • Config is first-wins: the first instance to start fixes url / update_interval / request_timeout; starting another with different settings logs a warning and joins the running updater.
  • stop() never raises — a background fetch error is logged instead; wait() still raises it.
  • update_prices_in_background() and UpdatePricesHandle are removed; UpdatePrices is the single public surface (an internal _BackgroundUpdater is the worker).
  • A single module lock guards both the ref-counting and the snapshot install/revert fencing; a fetch in flight when the updater stops can finish but never installs its result, and is not reported to waiters as a successful update.
  • os.fork() is handled (e.g. gunicorn preload_app=True): the daemon thread is restarted in the child.
  • README updated.

adtyavrdhn and others added 5 commits May 26, 2026 15:23
update_prices_in_background() is intended for logfire and pydantic-ai to
opt into price updates independently without duplicating updater threads.
Harden the handle/refcount design for that use:

- Bind each UpdatePricesHandle to the updater it was created for, so
  directly-constructed or stale handles are inert and cannot stop an
  updater they never claimed
- Return an inert handle instead of raising when a manually started
  UpdatePrices is already keeping prices fresh
- Never raise from UpdatePricesHandle.close(); log background fetch
  errors instead so consumers can close in shutdown paths
- Add GENAI_PRICES_DISABLE_AUTO_UPDATE env var as a process-wide opt-out
- Document shared-updater semantics: revert to bundled data on last
  close, non-blocking first fetch, and the inert-handle caveat

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
update_prices_in_background() is intended for logfire and pydantic-ai to
opt into price updates independently without duplicating updater threads.
Harden the handle/refcount design for that use:

- Bind each UpdatePricesHandle to the updater it was created for, so
  directly-constructed or stale handles are inert and cannot stop an
  updater they never claimed
- Return an inert handle instead of raising when a manually started
  UpdatePrices is already keeping prices fresh
- Never raise from UpdatePricesHandle.close(); log background fetch
  errors instead so consumers can close in shutdown paths
- Add GENAI_PRICES_DISABLE_AUTO_UPDATE env var as a process-wide opt-out
- Document shared-updater semantics: revert to bundled data on last
  close, non-blocking first fetch, and the inert-handle caveat

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 4 files

Re-trigger cubic

…s into genai-prices-update

# Conflicts:
#	packages/python/genai_prices/update_prices.py

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Open in Devin Review

@adtyavrdhn adtyavrdhn requested a review from dmontagu June 10, 2026 08:23
adtyavrdhn and others added 6 commits June 10, 2026 14:20
…pdater

A manually started UpdatePrices now takes precedence over the shared
updater from update_prices_in_background() regardless of start order:
start() stops the shared updater and takes over instead of raising
RuntimeError, and existing handles become inert. Two manually created
instances still conflict loudly.

Both precedence paths (takeover, and deferring to an already-running
manual updater) are logged at INFO on the existing genai-prices logger,
matching the module's lifecycle-at-INFO convention.

Also documents the known fork limitation (e.g. gunicorn preload_app):
the module-level updater state does not survive os.fork(), so a forked
worker re-enabling background updates never starts a new thread.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Address the confirmed findings from an adversarially-verified review of
this PR:

- wait_prices_updated_sync/async never raise and return False until a
  fetch actually succeeds (new _update_succeeded flag): previously the
  first waiter got the raw fetch exception and later waiters got True
  with bundled data still active. The stored exception is no longer
  consumed by module-level waits, so it still surfaces exactly once to
  the updater's owner via wait()/stop(), or in close()'s log.
- UpdatePrices.wait() no longer returns True when the only completed
  fetch attempt failed.
- Document the blocking window: close()/takeover join the updater
  thread under the global lock, so lifecycle calls can block for
  roughly the request timeout mid-fetch (calc_price is unaffected).
- Extend the fork TODO with the inherited-lock deadlock mode; warn
  against copying handles; document stop() as a no-op when unstarted.
- New tests: 16-thread acquire/close stress, stale handle closed while
  a different live updater runs, failed-fetch wait semantics, env var
  not affecting manual updaters, stop() on an unstarted instance, and
  the inert-handle INFO log.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Both planned consumers (logfire, pydantic-ai) expose their own opt-in
flags, so a library-side opt-out has no user while auto-update is not
on by default. It can be reintroduced if defaults ever flip.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Previously UpdatePricesHandle.close() (and takeover in start()) joined the
background thread while holding the global lock, so a fetch in flight could
block the closer - and every other updater API call - for roughly the
request timeout (10-15s). That cost lands directly on consumers: logfire
calls close() from configure() and atexit.

Following the pattern used by LaunchDarkly FDv2, statsig, and OpenTelemetry
(join with a short timeout outside any lock, then abandon the daemon
thread), and LaunchDarkly FDv2's stop-flag fencing gate before publishing:

- Snapshot installs are gated: the background thread re-checks _stop_event
  under a new _snapshot_lock before calling set_custom_snapshot, and the
  stop path sets the event and clears the snapshot under the same lock, so
  a draining fetch can never install its result after stop.
- close() and takeover now detach under the global lock (bookkeeping only),
  then join with a 5s grace period outside it, logging a warning and
  abandoning the daemon thread on expiry.
- Manual UpdatePrices.stop() keeps its unbounded join (owners expect
  completion and the re-raised stored exception) but joins outside the
  global lock, so it no longer blocks unrelated API calls.

httpx requests cannot be aborted cross-thread (encode/httpx#1633), so the
in-flight fetch itself remains bounded only by the request timeout.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…in the child

Previously the module-level state did not survive os.fork() (e.g. gunicorn
with preload_app=True): the child inherited bookkeeping claiming an updater
was running while its thread was dead - so prices stayed frozen at the
fork-time snapshot - and a fork landing while another thread held the
global lock left it permanently held in the child, deadlocking every
lock-taking updater call. With logfire starting the updater from
configure(), preload+fork is a mainstream deployment path for it.

Following the OpenTelemetry BatchProcessor pattern:

- Fork hooks are registered lazily on first updater start. before/
  after_in_parent hold the global lock across the fork so the child
  inherits consistent bookkeeping, never a torn mid-mutation state.
- after_in_child replaces the lock and restarts a running updater on the
  same UpdatePrices instance - fresh events and snapshot lock, new daemon
  thread - preserving instance identity so handles held by consumers
  (logfire, pydantic-ai) remain valid claims in the child.
- On any revival failure the child degrades to a clean reset (bundled
  prices, no deadlock) with an error log.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Comment thread packages/python/genai_prices/update_prices.py
def wait(self, timeout: float | None = None) -> bool:
"""Wait for the prices to be updated in the background task.

Raises the stored exception if the update attempt failed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't changed the contract, it was just not documented

except Exception as e:
self._background_exc = e
self._prices_updated.set()
logger.error('Error updating genai-prices in the background (%s): %s', type(e).__name__, e)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to keep it cleaner but I don't feel strongly about it.

@adtyavrdhn adtyavrdhn requested a review from alexmojaki June 18, 2026 10:00
Comment thread packages/python/README.md Outdated
- If an `UpdatePrices` instance has already been started manually, `update_prices_in_background()` does not start
a second updater and returns a handle that does nothing on close: prices are already being kept up to date, and
the manual updater's lifetime stays with whoever started it.
- If `UpdatePrices.start()` is called while the shared updater is running, the shared updater is stopped and the

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the idea of this additional API, it adds weird complexity, this particular bullet point of how they interact is the strongest case.

I think UpdatePrices should be updated to behave more like the new API, meaning multiple calls to it are allowed and use a shared object with 'ref counting'. Multiple calls with different urls or other config can raise an error or emit a warning.

adtyavrdhn and others added 2 commits June 26, 2026 19:31
The instantiable UpdatePrices class let N "independent" instances all write
one process-global snapshot, which forced the takeover/precedence state
machine, the "only one active" RuntimeError, and a second snapshot lock.

Make update_prices_in_background() the single, ref-counted entry point with
first-wins config, and reduce UpdatePrices to a deprecated shim over it.
Fold the snapshot fencing onto the one module lock, so the whole updater is
guarded by a single RLock over a trivial state machine. Revert-on-last-close
is preserved.

Adds docs/adr/0001 capturing the decision and its open questions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lead with update_prices_in_background() as the primary, configurable,
ref-counted entry point with first-wins config and library/app guidance;
document UpdatePrices as deprecated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 6 files (changes from recent commits).

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread docs/adr/0001-singleton-background-updater.md Outdated
Comment thread packages/python/genai_prices/update_prices.py Outdated
Comment thread packages/python/genai_prices/update_prices.py Outdated
adtyavrdhn and others added 3 commits June 26, 2026 19:44
- update_prices_in_background() now also warns when request_timeout differs
  from the running updater, matching the documented first-wins contract.
- Restore the historical UpdatePrices.stop() behaviour of re-raising a stored
  background fetch error: split handle teardown into _release() (returns the
  exception) so close() logs it while the deprecated stop() raises it.
- Remove docs/adr/0001 (internal decision aid, not repo material).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ound

Per review (alexmojaki): don't add a second public surface. Collapse back
onto UpdatePrices as the single API and make starting it a ref-counted claim
on one shared, process-wide updater (first-wins config, warn on mismatch).
Remove update_prices_in_background() and UpdatePricesHandle.

UpdatePrices is no longer deprecated; it is the primary library API. As the
library-shutdown-friendly surface, stop() now logs background fetch errors
instead of raising (wait() still raises). Internals unchanged: one RLock, the
folded snapshot fencing, and os.fork() restart all carry over.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Race found by Codex review: when stop() wins the fence and an in-flight
fetch is discarded by _install_snapshot, the background loop still set
_update_succeeded=True and signalled the event, so a waiter parked before
the stop could get True while the snapshot had reverted to bundled data.

_install_snapshot now returns whether it installed; the loop reflects that
in _update_succeeded. Adds a deterministic regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants