Perf: batch site.get() calls across 6 N+1 hotspots (refs #12432) #12748

Open
WeiGuang-2099 wants to merge 22 commits into internetarchive:master from WeiGuang-2099:finalsubmit

WeiGuang-2099 commented May 15, 2026

Refs #12432

Summary

This PR addresses several N+1 query patterns flagged in #12432 by replacing
per-iteration web.ctx.site.get() calls with single batched
web.ctx.site.get_many() calls. All six changes follow the same shape:
collect keys up front, batch-load, then assemble the existing data shape
expected by templates/callers. Backend only, no intended behavior changes.
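
The shared shape can be sketched roughly as follows. This is an illustration only, not code from the PR: `FakeSite` is a hypothetical stand-in for `web.ctx.site` that counts round trips, documents are plain dicts rather than real site objects, and the assumption that missing keys are omitted from the `get_many()` result is ours.

```python
class FakeSite:
    """Hypothetical stand-in for web.ctx.site; counts backend round trips."""

    def __init__(self, docs):
        self.docs = docs
        self.calls = 0

    def get(self, key):
        self.calls += 1
        return self.docs.get(key)

    def get_many(self, keys):
        self.calls += 1
        # Assumption: keys with no document are simply left out of the result.
        return [self.docs[k] for k in keys if k in self.docs]


def hydrate_one_by_one(site, loans):
    # N+1 shape: one lookup per loan.
    return [{**loan, "book_doc": site.get(loan["book"])} for loan in loans]


def hydrate_batched(site, loans):
    # 1. Collect keys up front.
    keys = [loan["book"] for loan in loans]
    # 2. Batch-load once and index the results by key.
    docs_by_key = {doc["key"]: doc for doc in site.get_many(keys)}
    # 3. Assemble the same data shape the caller already expects.
    return [{**loan, "book_doc": docs_by_key.get(loan["book"])} for loan in loans]


docs = {
    f"/books/OL{i}M": {"key": f"/books/OL{i}M", "title": f"Book {i}"}
    for i in range(1, 4)
}
loans = [{"book": key} for key in docs]

slow_site, fast_site = FakeSite(docs), FakeSite(docs)
assert hydrate_one_by_one(slow_site, loans) == hydrate_batched(fast_site, loans)
print(slow_site.calls, fast_site.calls)  # 3 lookups versus 1
```

Both functions return the same structure; only the number of round trips differs.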

One of these locations had a pre-existing `# TODO: should do in one web.ctx.get_many fetch` comment, which this PR resolves.

Technical

Three changes are in openlibrary/plugins/upstream/mybooks.py, one is in
openlibrary/plugins/openlibrary/api.py, and two additional changes are in
openlibrary/plugins/openlibrary/bulk_tag.py and openlibrary/views/loanstats.py.

| File | Function / block | Change | Author |
| --- | --- | --- | --- |
| plugins/upstream/mybooks.py | MyBooksTemplate.render_template(), loans block (~L74) | Replace per-loan site.get(loan["book"]) with get_many(book_keys) and keyed hydration. Resolves the existing TODO comment. | @Winterhuli |
| plugins/upstream/mybooks.py | get_observations() (~L590) | Batch work hydration via get_many(work_keys) instead of per-entry self._get_work() | @Hussain5001 |
| plugins/upstream/mybooks.py | PatronBooknotes.get_notes() (~L578) | Batch works and authors via get_many, pre-build works_by_key / authors_by_key / work_details_by_key dicts | @MuchiniGun |
| plugins/openlibrary/api.py | POST() (L881) | Replace [site.get(key) for key in edition_keys] with get_many(edition_keys) | @WeiGuang-2099 |
| plugins/openlibrary/bulk_tag.py | bulk_tag_works.POST() | Replace per-work site.get(f"/works/{work}") with a single get_many(work_keys) and keyed lookup | @WeiGuang-2099 |
| views/loanstats.py | readinglog_stats.GET() | Batch-fetch Solr misses with get_many(missed_keys) instead of per-item site.get(key) fallback | @WeiGuang-2099 |

Testing

  • Added regression test openlibrary/plugins/upstream/tests/test_mybooks.py
    for PatronBooknotes.get_notes(). It verifies:
    • works / authors / editions are fetched via batched get_many()
    • no per-item site.get() is performed inside the loop
    • the data shape consumed by the template is preserved
  • Ran: docker compose run --rm home pytest openlibrary/plugins/upstream/tests/test_mybooks.py -> 1 passed, 1 warning
  • Ran: python -m pytest openlibrary/plugins/openlibrary/tests/test_bulk_tag.py -q -> 1 passed, 1 warning in 0.06s
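
A regression test of this kind might look roughly like the sketch below. It is not the test added in this PR: `get_titles` is a hypothetical function under test, and the site object is a plain `MagicMock` passed in directly, whereas the real test presumably patches `web.ctx.site`.

```python
from unittest.mock import MagicMock


def get_titles(site, keys):
    """Hypothetical function under test: hydrates keys with one batched call."""
    docs_by_key = {doc["key"]: doc for doc in site.get_many(keys)}
    return [docs_by_key[k]["title"] for k in keys if k in docs_by_key]


def test_uses_single_batched_call():
    site = MagicMock()
    # Return the documents out of order to prove the result is matched by key.
    site.get_many.return_value = [
        {"key": "/works/OL2W", "title": "Second"},
        {"key": "/works/OL1W", "title": "First"},
    ]

    titles = get_titles(site, ["/works/OL1W", "/works/OL2W"])

    # Exactly one batched fetch, and no per-item fetches.
    site.get_many.assert_called_once_with(["/works/OL1W", "/works/OL2W"])
    site.get.assert_not_called()
    # The output shape and order follow the input keys.
    assert titles == ["First", "Second"]


test_uses_single_batched_call()
```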

No production-scale benchmark was run. The additional bulk_tag.py and
loanstats.py changes are structural transformations from repeated
loop-based site.get() calls to a single get_many() over the same key set,
and are intended to be behavior-preserving.

Scope

This PR fixes six of the N+1 hotspots in the audit done for #12432. Other
call sites flagged in the audit may remain, so this PR uses Refs rather
than Closes.

Screenshot

N/A — backend-only change, no UI changes.

Stakeholders

@WeiGuang-2099 @Winterhuli @Hussain5001 @MuchiniGun

Winterhuli and others added 10 commits May 7, 2026 21:07
Find all locations of web.ctx.site.get() and whether they are called …
All parts have been modified, so just delete this file
fix(api): replace list comprehension web.ctx.site.get(key) with get_m…
Batch load works in get_observations using get_many
Reduce N+1 lookups in PatronBooknotes.get_notes by batching work,
author, and edition hydration for the notes page
Batch note page lookups in mybooks
github-actions bot added the label "Priority: 2" (Important, as time permits.) [managed] on May 15, 2026
WeiGuang-2099 changed the title from "Fix: reduce N+1 queries in mybooks notes loading" to "Perf: batch site.get() calls across 4 N+1 hotspots (refs #12432)" on May 18, 2026

jimchamp (Collaborator) left a comment

Thanks @WeiGuang-2099.

I noticed the following issues with these changes:

  • It looks like you added some code that was previously removed.
  • zip is being used in an unsafe way.
  • PatronBooknotes._get_work() is no longer referenced, yet the method remains.

Comment thread openlibrary/plugins/openlibrary/api.py Outdated
Comment thread openlibrary/plugins/upstream/mybooks.py Outdated
book_keys = [loan["book"] for loan in myloans]
books = web.ctx.site.get_many(book_keys)

for loan, book in zip(myloans, books):
Collaborator

This looks like it will incorrectly match loan data to editions. There is no guarantee that get_many will return editions in the same order as they are in myloans.

Will update this to avoid relying on get_many() result order.
We can build a books_by_key map from the batched results and match each loan using loan["book"].
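
To make the ordering concern concrete, here is a small sketch. It is illustrative only: `get_many_unordered` is a hypothetical stand-in that deliberately returns results in a different order from the input, and documents are plain dicts rather than real site objects.

```python
def get_many_unordered(keys):
    """Hypothetical get_many that returns documents in reverse key order."""
    store = {
        "/books/OL1M": {"key": "/books/OL1M", "title": "A"},
        "/books/OL2M": {"key": "/books/OL2M", "title": "B"},
    }
    return [store[k] for k in sorted(keys, reverse=True) if k in store]


myloans = [{"book": "/books/OL1M"}, {"book": "/books/OL2M"}]
books = get_many_unordered([loan["book"] for loan in myloans])

# Positional pairing silently attaches the wrong edition to each loan.
zipped = [(loan["book"], book["key"]) for loan, book in zip(myloans, books)]

# Keyed pairing does not depend on the order of the batched result.
books_by_key = {book["key"]: book for book in books}
keyed = [(loan["book"], books_by_key[loan["book"]]["key"]) for loan in myloans]

print(zipped)
print(keyed)
```

With `zip`, every pair here is mismatched; with the `books_by_key` lookup, each loan gets its own edition.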

Comment on lines +581 to +582
if not notes:
    return notes
Collaborator

Why was this added?

My initial idea was just to avoid making empty get_many([]) calls when there are no notes, but I see it's unnecessary since the normal batch flow handles empty notes. I will remove the guard.

Comment thread openlibrary/plugins/upstream/mybooks.py Outdated
work_keys = [f"/works/OL{entry['work_id']}W" for entry in observations]
works = web.ctx.site.get_many(work_keys)

for entry, work_key, work in zip(observations, work_keys, works):
Collaborator

works can be in any order, potentially causing observation data to be assigned to the wrong object.

Will update this as well to avoid assuming get_many() preserves input order.
We can build a works_by_key map and look up each observation’s work by work_key.
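
For observations, the keyed lookup could be sketched like this. The helper name `attach_works`, the `value` field, and the choice to skip observations whose work is missing are all assumptions for illustration, not what the PR necessarily does.

```python
def attach_works(observations, get_many):
    """Attach a work to each observation by key lookup, not by position."""
    # Several observations can share a work, so de-duplicate the keys
    # while keeping first-seen order.
    work_keys = list(
        dict.fromkeys(f"/works/OL{o['work_id']}W" for o in observations)
    )
    works_by_key = {work["key"]: work for work in get_many(work_keys)}

    results = []
    for o in observations:
        work = works_by_key.get(f"/works/OL{o['work_id']}W")
        if work is None:
            continue  # Assumption: skip observations whose work is missing.
        results.append({"work": work, "observation": o})
    return results


# Hypothetical data and batch fetcher for the demonstration.
WORKS = {"/works/OL1W": {"key": "/works/OL1W", "title": "One"}}


def fake_get_many(keys):
    return [WORKS[k] for k in keys if k in WORKS]


observations = [
    {"work_id": 1, "value": "fast-paced"},
    {"work_id": 1, "value": "dark"},
    {"work_id": 9, "value": "orphan"},
]
attached = attach_works(observations, fake_get_many)
print([(a["work"]["title"], a["observation"]["value"]) for a in attached])
```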

Comment thread openlibrary/plugins/upstream/mybooks.py Outdated
Comment on lines +591 to +594
"cover_url": (work.get_cover_url("S") or "https://openlibrary.org/static/images/icons/avatar_book-sm.png"),
"title": work.get("title"),
"authors": [authors_by_key[a.author.key].name for a in work.get("authors", []) if a.author.key in authors_by_key],
"first_publish_year": work.first_publish_year or None,
Collaborator

It looks like this was copied from the PatronBooknotes._get_work_details method. Why wasn't that used here?

I had copied it inline because I was trying to batch the author lookup for get_notes(). I'll use _get_work_details() so it can optionally receive a preloaded authors_by_key map.
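
One way such an optional parameter could work is sketched below. This is not the actual `_get_work_details()` signature: the standalone function, the `get_author` callback, and the dict-shaped works and authors are hypothetical simplifications.

```python
def get_work_details(work, get_author, authors_by_key=None):
    """Build a details dict; use preloaded authors when a map is supplied."""
    names = []
    for author_key in work["author_keys"]:
        if authors_by_key is not None:
            # Batched path: the caller already fetched all authors.
            author = authors_by_key.get(author_key)
        else:
            # Standalone path: fall back to a per-author lookup.
            author = get_author(author_key)
        if author is not None:
            names.append(author["name"])
    return {"title": work["title"], "authors": names}


# Hypothetical data; `lookups` records every per-author fetch.
AUTHORS = {"/authors/OL1A": {"name": "Ada"}, "/authors/OL2A": {"name": "Ben"}}
lookups = []


def get_author(key):
    lookups.append(key)
    return AUTHORS.get(key)


work = {"title": "Example", "author_keys": ["/authors/OL1A", "/authors/OL2A"]}

standalone = get_work_details(work, get_author)
preloaded = get_work_details(work, get_author, authors_by_key=AUTHORS)
assert standalone == preloaded
print(len(lookups))  # only the standalone call performed per-author lookups
```

Keeping the parameter optional lets existing callers stay unchanged while batched callers avoid the per-author fetches.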

jimchamp added the label "Needs: Submitter Input" (Waiting on input from the creator of the issue/pr) [managed] on May 18, 2026
`if len(editions) > 1` block removed

Co-authored-by: jimchamp <28732543+jimchamp@users.noreply.github.com>
github-actions bot removed the label "Needs: Submitter Input" (Waiting on input from the creator of the issue/pr) [managed] on May 19, 2026
jimchamp added the label "Needs: Submitter Input" (Waiting on input from the creator of the issue/pr) [managed] on May 19, 2026
- get_many() results are now matched by document key and no longer rely on the order of the response
- Removed redundant empty-notes check
- Reuse _get_work_details() with a slight refactoring so it accepts an optional authors_by_key
- Removed unused `_get_work`
github-actions bot removed the label "Needs: Submitter Input" (Waiting on input from the creator of the issue/pr) [managed] on May 21, 2026
WeiGuang-2099 changed the title from "Perf: batch site.get() calls across 4 N+1 hotspots (refs #12432)" to "Perf: batch site.get() calls across 6 N+1 hotspots (refs #12432)" on May 23, 2026

5 participants