feat(proxy): add per-service proxy auth via get_proxy_params()#12724

Open
mekarpeles wants to merge 4 commits into master from 12715/per-service-proxy-params

Conversation

@mekarpeles
Member

@mekarpeles mekarpeles commented May 12, 2026

Summary

Fixes #12715. The global setup_requests() approach (setting HTTP_PROXY/HTTPS_PROXY env vars with no credentials) is insufficient for services that now require per-service proxy auth under the new squid ACL policy.

  • Adds get_proxy_params(service_tag) to utils.py — reads http_proxy for the shared proxy URL and http_proxy_services.<service> for credentials (user:password), returns a requests-compatible proxies dict with credentials embedded, or None to fall through to the global env var default set by setup_requests()
  • Updates recaptcha.py to pass proxies=get_proxy_params("recaptcha") on the verify call
  • Updates affiliate_server.py load_config to read http_proxy + http_proxy_services.amazon directly
  • Documents the new config shape in conf/openlibrary.yml (commented out — no proxy needed in dev)

Config shape (in /olsystem/etc/openlibrary.yml)

# All services route through the same proxy URL.
# setup_requests() sets HTTP_PROXY/HTTPS_PROXY from this value — every request
# uses it by default. Services listed in http_proxy_services additionally
# authenticate with per-service credentials embedded in the proxy URL.
http_proxy: 'http://http-proxy.us.archive.org:8080'

http_proxy_services:
  recaptcha: 'user:password'
  amazon: 'user:password'

When http_proxy_services.<service> is absent, get_proxy_params returns None and requests falls back to the HTTP_PROXY env var (same proxy, no credentials).
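
A minimal sketch of the lookup described above, assuming the config is available as a plain dict. The name `build_proxy_params` and the `config` argument are illustrative and are not the code added to utils.py. The sketch embeds the credentials as-is, so it assumes they contain no characters that need URL-encoding.

```python
from urllib.parse import urlsplit, urlunsplit


def build_proxy_params(config, service_tag):
    """Return a requests-style proxies dict, or None to defer to env vars.

    Hypothetical stand-in for get_proxy_params(); `config` is assumed to be
    a dict shaped like the YAML above.
    """
    proxy_url = config.get("http_proxy")
    credentials = (config.get("http_proxy_services") or {}).get(service_tag)
    if not proxy_url or not credentials:
        # No shared proxy or no per-service entry: the caller falls back
        # to HTTP_PROXY/HTTPS_PROXY.
        return None
    parts = urlsplit(proxy_url)
    # Prepend "user:password@" to the host:port portion of the proxy URL.
    authed = urlunsplit(parts._replace(netloc=f"{credentials}@{parts.netloc}"))
    return {"http": authed, "https": authed}


config = {
    "http_proxy": "http://proxy:3128",
    "http_proxy_services": {"recaptcha": "user:pass"},
}
print(build_proxy_params(config, "recaptcha"))
print(build_proxy_params(config, "amazon"))
```

The result is meant to be passed straight through as `requests.post(url, proxies=...)`. Because requests treats `proxies=None` as "use the environment settings", returning None is enough to keep the existing behaviour for unlisted services.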

Testing

  • get_proxy_params("recaptcha") returns None when http_proxy is unset → existing env var behaviour unchanged
  • get_proxy_params("recaptcha") returns None when service is absent from http_proxy_services → falls back to env var proxy (no credentials)
  • get_proxy_params("recaptcha") returns {"http": "http://user:pass@proxy:3128", "https": ...} when service entry is present

Stakeholders

/cc @cdrini @scottbarnes (ops — needs /olsystem/etc/openlibrary.yml updated with service credentials once this merges)

Copilot AI review requested due to automatic review settings May 12, 2026 04:55
@github-actions github-actions Bot added the Priority: 0 Fix now: Issue prevents users from using the site or active data corruption. [managed] label May 12, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds per-service HTTP proxy configuration to support authenticated squid ACLs (fixing #12715) while keeping existing global http_proxy behavior as the default fallback.

Changes:

  • Introduces get_proxy_params(service_tag) to build a requests-compatible proxies dict from http_proxies.<service> config.
  • Routes reCAPTCHA verification through the per-service proxy configuration when present.
  • Updates the affiliate server to prefer http_proxies.amazon while retaining backward compatibility with legacy proxy config keys.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File | Description
scripts/affiliate_server.py | Prefer http_proxies.amazon for Amazon proxy settings, with fallback to legacy flat keys.
openlibrary/plugins/upstream/utils.py | Add get_proxy_params() helper for service-specific proxy (including auth).
openlibrary/plugins/recaptcha/recaptcha.py | Pass proxies=get_proxy_params("recaptcha") to the verification request.
conf/openlibrary.yml | Document the new http_proxies config shape (commented examples).

Comment thread openlibrary/plugins/upstream/utils.py Outdated
Comment thread openlibrary/plugins/upstream/utils.py
@mekarpeles mekarpeles force-pushed the 12715/per-service-proxy-params branch from bfc7d8c to 940f8f8 Compare May 12, 2026 05:03
Comment thread scripts/affiliate_server.py
@mekarpeles mekarpeles force-pushed the 12715/per-service-proxy-params branch from 93e1643 to 7822fec Compare May 12, 2026 15:39
@mekarpeles
Member Author

Investigation: Amazon 403 on worker restart

Root cause identified. The creatorsapi_python_sdk's OAuth2TokenManager.refresh_token() (in creatorsapi_python_sdk/auth/oauth2_token_manager.py) uses bare requests.post() to fetch OAuth tokens — no proxy configuration of its own, just whatever HTTP_PROXY/HTTPS_PROXY env vars are set.

The flow:

  1. load_config() creates AmazonCreatorsAPI(proxy_url=..., proxy_creds=...), which injects a custom RESTClientObject with proxy+auth into self.api._api_client.rest_client — this covers API calls only.
  2. setup_requests() sets HTTP_PROXY/HTTPS_PROXY to the bare proxy URL (no auth).
  3. OAuth2TokenManager caches the access token in-process memory. On every worker restart, the cache is cold and refresh_token() is called.
  4. refresh_token() does requests.post("https://api.amazon.com/auth/o2/token", ...) — this goes through the bare env-var proxy, which has no auth → 403 under the new squid ACL policy.

Why it worked initially: The token was valid in-process during the session. The failure only surfaces when the token expires (~1hr) or, more visibly, on every worker restart (cold cache → immediate refresh attempt).
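
The timing of the failure (step 3 and the note above) can be illustrated with a small in-memory cache. This is a sketch only: `InMemoryTokenCache` and its parameters are hypothetical stand-ins, not the SDK's OAuth2TokenManager, and the one-hour lifetime is taken from the "~1hr" figure above.

```python
import time


class InMemoryTokenCache:
    """Hypothetical stand-in for an in-process OAuth token cache."""

    def __init__(self, fetch_token, ttl_seconds=3600, clock=time.monotonic):
        self._fetch_token = fetch_token  # callable that performs the refresh
        self._ttl = ttl_seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get_token(self):
        now = self._clock()
        # Cold cache (fresh process) or expired token: refresh is required.
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl
        return self._token


refreshes = []
fake_now = [0.0]


def fake_refresh():
    # In production this is the call that goes through the unauthenticated proxy.
    refreshes.append(fake_now[0])
    return "token"


cache = InMemoryTokenCache(fake_refresh, clock=lambda: fake_now[0])
cache.get_token()            # cold cache: refresh
cache.get_token()            # still valid: no refresh
fake_now[0] = 3600.0
cache.get_token()            # expired: refresh
restarted = InMemoryTokenCache(fake_refresh, clock=lambda: fake_now[0])
restarted.get_token()        # new process: refresh again
print(len(refreshes))
```

Every refresh in this sketch corresponds to a point where the real token fetch would hit the proxy without credentials, which is why restarts make the 403 show up immediately.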

Our PR does not fix this. Our change is backward-compatible in production (falls back to config.get("http_proxy") since http_proxies.amazon won't be set yet), but neither the old nor the new code routes the OAuth token fetch through an authenticated proxy.

Fix options (not in this PR — needs a follow-up):

  1. Add token endpoints to no_proxy_addresses (simplest — config-only, no code change):

    no_proxy_addresses:
      - api.amazon.com          # LWA v3.1 token endpoint
      - api.amazon.co.uk        # LWA v3.2
      - creatorsapi.auth.us-east-1.amazoncognito.com  # Cognito v2.1

    This lets the token fetch go direct, bypassing the squid proxy entirely.

  2. Set the auth-bearing proxy in env vars — change setup_requests() to use the authenticated URL for HTTP_PROXY/HTTPS_PROXY. This makes all requests calls (including OAuth refresh) go through the authenticated proxy. Riskier: affects everything globally.

  3. Patch OAuth2TokenManager in our code to inject a proxy-aware requests.Session — surgical but couples us to the SDK's internals.

Recommended: Option 1 (add to no_proxy_addresses in olsystem config) as an immediate ops fix, independent of this PR. This PR can still merge — it doesn't worsen the situation.
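
Option 1 can be sanity-checked before touching olsystem config. The sketch below uses the standard library's NO_PROXY matching as an approximation. It assumes that the entries in no_proxy_addresses end up in the NO_PROXY environment variable, which is not shown in this thread, and `goes_direct` is an illustrative helper name.

```python
from urllib.request import proxy_bypass_environment


def goes_direct(host, no_proxy_hosts):
    """Return True if `host` would bypass the proxy for this NO_PROXY list."""
    # proxy_bypass_environment reads the bypass list from the "no" key.
    proxies = {"no": ",".join(no_proxy_hosts)}
    return bool(proxy_bypass_environment(host, proxies))


token_endpoints = [
    "api.amazon.com",
    "api.amazon.co.uk",
    "creatorsapi.auth.us-east-1.amazoncognito.com",
]

for host in token_endpoints + ["www.google.com"]:
    print(host, goes_direct(host, token_endpoints))
```

Hosts outside the list still go through the proxy, so the bypass stays limited to the token endpoints.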

@jimchamp jimchamp added the Needs: Special Deploy This PR will need a non-standard deploy to production label May 18, 2026
Collaborator

@jimchamp jimchamp left a comment

I'm not certain that the is_lwa method exists on OAuth2Config objects. Other than that, this looks okay to me.

Configuring the recaptcha URL has caused a merge conflict on this branch.

Comment thread openlibrary/core/vendors.py
@jimchamp jimchamp added the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label May 20, 2026

The global setup_requests() approach (HTTP_PROXY env vars, no auth)
is insufficient for services that require per-service proxy credentials.

Adds get_proxy_params(service_tag) to utils.py: reads from a new
http_proxies config section (url/user/password per service) and returns
a requests-compatible proxies dict with credentials embedded in the URL,
or None to fall through to the global env var default.

Updates recaptcha to pass proxies=get_proxy_params("recaptcha") so
that service-specific proxy auth is used when configured.

Updates affiliate_server load_config to read amazon proxy settings
from http_proxies.amazon, falling back to the legacy http_proxy /
http_proxy_creds flat keys for backward compatibility.

Documents the new config section in conf/openlibrary.yml.

Closes #12715

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

URL-encode user and password before embedding in the proxy netloc so
special characters (@ : % etc.) don't break the URL. Also relax the
auth guard to inject credentials whenever user is set, even when
password is empty.

Adds TestGetProxyParams unit tests covering: no config → None,
unknown service → None, url-only, url+auth, and special-char encoding.
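
A minimal sketch of the encoding step this commit describes, using only the standard library. The helper name `embed_credentials` is illustrative, and how the real code formats an empty password (`user@host` versus `user:@host`) is an assumption here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def embed_credentials(proxy_url, user, password=""):
    """Embed percent-encoded credentials in a proxy URL (illustrative helper)."""
    if not user:
        # No user configured: leave the URL untouched.
        return proxy_url
    # safe="" forces reserved characters such as @ : % / to be encoded.
    userinfo = quote(user, safe="")
    if password:
        userinfo += ":" + quote(password, safe="")
    parts = urlsplit(proxy_url)
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{parts.netloc}"))


print(embed_credentials("http://proxy:3128", "ol@svc", "p:ss%"))
print(embed_credentials("http://proxy:3128", "user"))
```

Without the encoding, an `@` or `:` inside the user or password would be read as a URL delimiter and the proxy host would be parsed incorrectly.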

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

OAuth2TokenManager.refresh_token() calls bare requests.post(), which
reads HTTP_PROXY from the environment.  After the per-service proxy
config lands (http_proxy = bare squid URL, http_proxies.amazon = creds),
that env var no longer carries auth → Amazon token endpoint returns 403.

Inject a _ProxyAwareTokenManager subclass into the SDK's ApiClient when
proxy_creds are present.  The subclass overrides refresh_token() to use
a requests.Session with the authenticated proxy URL embedded directly,
bypassing env-var lookup entirely.  The existing rest_client injection
(urllib3 / RESTClientObject) already handles all other API calls; this
commit covers the one OAuth path that goes through requests directly.
@mekarpeles mekarpeles force-pushed the 12715/per-service-proxy-params branch from 9939df3 to ed7d782 Compare May 20, 2026 23:07
@github-actions github-actions Bot removed the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label May 20, 2026
Labels

Needs: Special Deploy This PR will need a non-standard deploy to production Priority: 0 Fix now: Issue prevents users from using the site or active data corruption. [managed]

Development

Successfully merging this pull request may close these issues.

Setting global http_proxy via os now preventing domain-specific urls that require auth

3 participants