Skip to content

update from Gluejar#94

Open
eshellman wants to merge 1049 commits intoEbookFoundation:masterfrom
Gluejar:master
Open

update from Gluejar#94
eshellman wants to merge 1049 commits intoEbookFoundation:masterfrom
Gluejar:master

Conversation

@eshellman
Copy link
Copy Markdown

No description provided.

rdhyee and others added 30 commits March 3, 2026 09:21
When DOAB returns 429 Too Many Requests, pyoai raises
urllib.error.HTTPError which previously propagated unhandled, crashing
the management command without printing the "loaded N records" summary.
The Retry-After header (typically 86400s = 24h) was silently discarded.

Now catches 429 specifically, logs the Retry-After value and how many
records were harvested before being cut off, then returns normally.
Any other HTTPError is still re-raised.

Fixes #1100

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Changed from 15s to 60s read timeout per Eric's feedback that 15s
is too aggressive for some publisher servers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three pre-existing bugs in the DOAB loader:

1. groups(1) -> group(1) in Springer FTP cover handling
   re.Match.groups() returns a tuple; .format(tuple) produces a broken
   URL like '.../('filename.jpg',)/...'. group(1) returns the string.

2. Non-FTP 302 redirect fetched url (original) instead of redirurl
   (the redirect destination). Now correctly follows the redirect.

3. item_type filter was case-sensitive and checked only the first list
   element via unlist(). Records where DOAB returns ['Book'] or
   ['peer-reviewed', 'book'] were silently skipped. Now checks the
   full list case-insensitively.

Fixes #1102

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_streamdata

Two reliability improvements:

1. Per-record exception isolation in load_doab_oai()
   add_by_doab() was called with no try/except. A DB constraint
   violation or unexpected metadata error at any record aborted the
   entire remaining harvest window silently. Now catches Exception,
   logs the DOAB ID and error, and continues to the next record.

2. Explicit 429 handling in get_streamdata()
   Previously a 429 response caused response.json() to raise
   ValueError (DOAB returns HTML on rate-limit), logged only as
   "decoder error" — masking the true cause. Now checks status code
   before calling .json() and logs a clear rate-limit error with the
   Retry-After value.

Note: surfacing partial 429 harvests in management command output
requires coordination with PR #1101 and is tracked in #1105.

Fixes #1105 (partial — management command visibility pending #1101)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ContentTyper: timeout=(5, 60) on all 4 requests calls
- load_ebookfile: timeout=(10, 60) on 3 requests calls (file downloads)
- get_soup: timeout=(10, 30) + Timeout exception handler

Read timeout changed from 15s to 60s per Eric's feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge as part of DOAB harvester fix rollout (Wave 1)
Add 60s timeout to DOAB cover image HTTP requests
Handle DOAB OAI HTTP 429 rate-limit gracefully
Fix Springer cover URL bug, redirect target bug, and item_type filter
Add timeouts to ContentTyper, load_ebookfile, and get_soup HTTP calls
DOAB harvest reliability: per-record exception isolation and 429 in get_streamdata
Remove django-tastypie and python-mimeparse from requirements,
delete api/resources.py (Tastypie resource definitions), remove
tastypie from INSTALLED_APPS, clean up tastypie imports and API key
signal/retrieval code from core/signals.py, api/views.py, and
frontend/views/__init__.py. Update api/urls.py to remove v1_api
registration. Update api_help.html to remove REST API documentation.
Remove Tastypie-specific tests while preserving all OPDS, ONIX,
widget, AllowedRepo, and webhook functionality.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file is generated by Ansible during deployment and contains
server-specific config. It was already in .gitignore but was still
tracked, requiring a stash/pop cycle on every production merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove Booxtream watermarking code paths that were specific to
BUY2UNGLUE campaigns, which no longer exist:

- Remove B2U transaction handling branch in signals.py (provisioning
  books, watermark_acq call, gift handling for B2U purchases)
- Remove Campaign.watermark_success() and Campaign.make_unglued_ebf()
  methods (only called for B2U campaign success)
- Remove B2U-specific template blocks in manage_campaign.html
  (watermark checkbox) and edition_uploads.html (B2U upload UI)
- Clean up unused imports (BUY2UNGLUE, Library, ungluify, date_today)

Preserved all library lending Booxtream integration:
- Acq.get_watermarked(), Acq.borrow() watermark_acq calls
- BooXtream class, Boox model, watermark_acq task
- Campaign.do_watermark field (used by library lending path)
- All Booxtream settings and configuration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Makes the Turnstile challenge invisible unless user interaction is
actually needed, reducing visual clutter. Applied to all 4 templates
that use the cf-turnstile widget:
- base.html (search bar)
- registration/welcome.html (welcome search)
- supporter.html (supporter page search)
- add_your_books.html (book submission search)

Fixes #1092

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eric approved during 2026-03-19 meeting
Tested on test.unglue.it, Eric approved direction, ry:approved. Deploying to production.
Tested on test.unglue.it, Eric approved direction, ry:approved. Deploying to production.
Rebased on fix/all-facet-alias-1117 which handles the 'all' base
token. This commit adds keyword isolation on top:

- get_other_groups(): keyword active → no compounds; non-keyword
  active → keywords excluded from sidebar/OPDS
- get_facet_object(): raises InvalidFacetCombination for keyword
  compounds → 404 in web, OPDS, OPDS-JSON, ONIX views
- explore.html: removed per-subject links from sidebar
- 10 new tests (frontend + API) for isolation + 404 behavior

Refs #1110, #1095

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eric approved. Merging keyword facet isolation (Phase 1 of #1110).
Context:
  On 2026-04-23, a template-not-found bug on test.unglue.it produced
  7,700+ admin emails in a single day — bots hitting /accounts/register/
  triggered Django's AdminEmailHandler on every 500 response, with no
  built-in rate limit. One of the two ADMINS addresses
  (rdhyee+ungluebugs@gluejar.com) had silently stopped reaching a human,
  so the flood was invisible to one admin while hammering the other.

  Full incident and design discussion:
  EbookFoundation/security-private#11
  Issue:
  #1127

Changes:

1. utils/custom_logging.py: add RateLimitFilter
   - Dedupes log records by (module, funcName, lineno, exc_type)
     within a configurable window (default 60s).
   - Per-process state; with n mod_wsgi workers the effective rate is
     ~n emails per window per error signature. Cross-worker
     coordination (Redis) is a possible future improvement.
   - Opportunistic cleanup of the dedupe dict prevents unbounded growth
     when many distinct error signatures fire.

2. settings/common.py: attach the filter to the mail_admins handler.
   - `require_debug_false` still runs (keeps admin emails off in DEBUG);
     the new `mail_admins_rate_limit` filter then dedupes.
   - Verified with a small isolated test that repeated identical records
     are suppressed and distinct signatures or post-window records pass.

3. settings/dev.py: update the stale ADMINS entry to
   raymond.yee+ungluebugs@gmail.com (a mailbox Raymond actively reads).
   The production ADMINS value is rendered by Ansible in prod.py.j2 —
   that's tracked in the companion provisioning PR.

Not addressed here (separate / already deferred):
- Per-environment ADMINS override (dj42/test should not page prod
  admins): handled by the companion regluit-provisioning PR via a
  `disable_admin_emails` group_var.
- Sentry or other structured error aggregation: proper replacement
  for the email-as-error-channel pattern. Evaluation deferred.

Closes part of #1127.
…sponse

Codex finding from review: the previous signature keyed on
(record.module, record.funcName, record.lineno, exc_type) — but for
django.request 500 errors those three fields always identify Django's
shared log_response call site, NOT the application frame that raised.
Effect: every 500 from every view would collide on one signature, and
only the first would ever pass the rate-limit filter in any given
window. Two different bugs in two different views within 60s → admin
only sees one email, the other is silently dropped.

Fix: derive the signature from the traceback leaf frame (walk
record.exc_info[2] to the deepest frame), plus exc_type, plus
request.path when the record has a request attribute (django.request
always does on 500s).

Trade-off noted in the docstring: same bug hitting different paths now
produces one email per distinct path per window instead of one email
total per window. Slight amplification, but bounded, and preserves the
"this bug affects many URLs" signal.

Falls back gracefully to record.module/lineno when exc_info is empty
(e.g., logger.error() call without an exception), so the filter stays
well-defined for non-exception log records.

Verified against simulated django.request records:
- Two different bugs at different leaf frames → both pass (was: one
  suppressed)
- Same bug at two different paths → both pass once, then suppress
- Same bug same path → first passes, second suppresses
- Non-exception records → fall back to record.module/lineno signature

Refs: #1127, EbookFoundation/security-private#11
Closes part of #1131.

The SEND_TEST_EMAIL_JOB is a dev-era artifact that's been hardcoded to
mail `unglueit@ebookfoundation.org` 72 times/day whenever a deploy is
tagged `deploy_type=test`. On production it's never registered
(`'prod' == 'test'` is False), but on test.unglue.it it fired 1,796 times
through 2026-02-24, polluting Eric's inbox with "hi there 20,25,30" /
"testing 1, 2, 3" messages. See #1131 for the full analysis.

The companion change removing the `if '{{ deploy_type }}' == 'test':`
branch from `prod.py.j2` is in EbookFoundation/regluit-provisioning.

Also removes the commented-out reference in settings/dev.py so the
symbol name doesn't dangle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eric reviewed the rate-limit filter on the 2026-04-23 email-flood follow-up
and pushed back: he treats admin-email volume as a live intensity gauge
(e.g., bot sessions missing against the 20-minute Apache restart cycle —
the count of emails is his feel for how bad things are). A dedup filter
that collapses bursts of identical errors dampens that gauge even though
it preserves the *presence* of the signal.

The real amplification vector was addressed in regluit-provisioning#27
(per-env ADMINS override — staging/dev environments now render
`ADMINS = ()`, eliminating the Apr 20-23 class of incident at the source).
CloudWatch SES reputation alarms are also in place to catch any future
prod-side runaway before it reaches pathological scale.

What remains in this PR is only the stale-address fix in settings/dev.py
that was bundled with the filter work. That single-line change stands on
its own merit — the old `rdhyee+ungluebugs@gluejar.com` alias stopped
reaching Raymond a while back, which is why the April incident went
undetected for 72 hours.

The filter code (utils/custom_logging.py RateLimitFilter + LOGGING
wiring in settings/common.py) is reverted in this commit, keeping
utils/custom_logging.py at its pre-branch state (GroupWriteRotatingFileHandler
only).

See #1131 discussion thread and the Eric meeting notes 2026-04-23.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rotection

Update stale admin email address
Release: #1128 (update stale admin email address)
…task

Remove dead SEND_TEST_EMAIL_JOB ("HiThere" test task)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants