feat(proxy): add per-service proxy auth via get_proxy_params()#12724
feat(proxy): add per-service proxy auth via get_proxy_params()#12724mekarpeles wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Adds per-service HTTP proxy configuration to support authenticated squid ACLs (fixing #12715) while keeping existing global http_proxy behavior as the default fallback.
Changes:
- Introduces
get_proxy_params(service_tag)to build arequests-compatibleproxiesdict fromhttp_proxies.<service>config. - Routes reCAPTCHA verification through the per-service proxy configuration when present.
- Updates the affiliate server to prefer
http_proxies.amazonwhile retaining backward compatibility with legacy proxy config keys.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/affiliate_server.py | Prefer http_proxies.amazon for Amazon proxy settings, with fallback to legacy flat keys. |
| openlibrary/plugins/upstream/utils.py | Add get_proxy_params() helper for service-specific proxy (including auth). |
| openlibrary/plugins/recaptcha/recaptcha.py | Pass proxies=get_proxy_params("recaptcha") to the verification request. |
| conf/openlibrary.yml | Document the new http_proxies config shape (commented examples). |
bfc7d8c to
940f8f8
Compare
93e1643 to
7822fec
Compare
Investigation: Amazon 403 on worker restartRoot cause identified. The The flow:
Why it worked initially: The token was valid in-process during the session. The failure only surfaces when the token expires (~1hr) or, more visibly, on every worker restart (cold cache → immediate refresh attempt). Our PR does not fix this. Our change is backward-compatible in production (falls back to Fix options (not in this PR — needs a follow-up):
Recommended: Option 1 (add to |
jimchamp
left a comment
There was a problem hiding this comment.
I'm not certain that the is_lwa method exists on OAuth2Config objects. Other than that, this looks okay to me.
Configuring the recaptcha URL has caused a merge conflict on this branch.
The global setup_requests() approach (HTTP_PROXY env vars, no auth)
is insufficient for services that require per-service proxy credentials.
Adds get_proxy_params(service_tag) to utils.py: reads from a new
http_proxies config section (url/user/password per service) and returns
a requests-compatible proxies dict with credentials embedded in the URL,
or None to fall through to the global env var default.
Updates recaptcha to pass proxies=get_proxy_params("recaptcha") so
that service-specific proxy auth is used when configured.
Updates affiliate_server load_config to read amazon proxy settings
from http_proxies.amazon, falling back to the legacy http_proxy /
http_proxy_creds flat keys for backward compatibility.
Documents the new config section in conf/openlibrary.yml.
Closes #12715
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
URL-encode user and password before embedding in the proxy netloc so special characters (@ : % etc.) don't break the URL. Also relax the auth guard to inject credentials whenever user is set, even when password is empty. Adds TestGetProxyParams unit tests covering: no config → None, unknown service → None, url-only, url+auth, and special-char encoding. [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
OAuth2TokenManager.refresh_token() calls bare requests.post(), which reads HTTP_PROXY from the environment. After the per-service proxy config lands (http_proxy = bare squid URL, http_proxies.amazon = creds), that env var no longer carries auth → Amazon token endpoint returns 403. Inject a _ProxyAwareTokenManager subclass into the SDK's ApiClient when proxy_creds are present. The subclass overrides refresh_token() to use a requests.Session with the authenticated proxy URL embedded directly, bypassing env-var lookup entirely. The existing rest_client injection (urllib3 / RESTClientObject) already handles all other API calls; this commit covers the one OAuth path that goes through requests directly.
9939df3 to
ed7d782
Compare
for more information, see https://pre-commit.ci
Summary
Fixes #12715. The global
setup_requests()approach (settingHTTP_PROXY/HTTPS_PROXYenv vars with no credentials) is insufficient for services that now require per-service proxy auth under the new squid ACL policy.get_proxy_params(service_tag)toutils.py— readshttp_proxyfor the shared proxy URL andhttp_proxy_services.<service>for credentials (user:password), returns arequests-compatibleproxiesdict with credentials embedded, orNoneto fall through to the global env var default set bysetup_requests()recaptcha.pyto passproxies=get_proxy_params("recaptcha")on the verify callaffiliate_server.pyload_configto readhttp_proxy+http_proxy_services.amazondirectlyconf/openlibrary.yml(commented out — no proxy needed in dev)Config shape (in
/olsystem/etc/openlibrary.yml)When
http_proxy_services.<service>is absent,get_proxy_paramsreturnsNoneandrequestsfalls back to theHTTP_PROXYenv var (same proxy, no credentials).Testing
get_proxy_params("recaptcha")returnsNonewhenhttp_proxyis unset → existing env var behaviour unchangedget_proxy_params("recaptcha")returnsNonewhen service is absent fromhttp_proxy_services→ falls back to env var proxy (no credentials)get_proxy_params("recaptcha")returns{"http": "http://user:pass@proxy:3128", "https": ...}when service entry is presentStakeholders
/cc @cdrini @scottbarnes (ops — needs
/olsystem/etc/openlibrary.ymlupdated with service credentials once this merges)