Perf: batch site.get() calls across 6 N+1 hotspots (refs #12432) by WeiGuang-2099 · Pull Request #12748 · internetarchive/openlibrary

WeiGuang-2099 · 2026-05-15T13:13:32Z

Summary

This PR addresses several N+1 query patterns flagged in #12432 by replacing
per-iteration web.ctx.site.get() calls with single batched
web.ctx.site.get_many() calls. All six changes follow the same shape:
collect keys up front, batch-load, then assemble the existing data shape
expected by templates/callers. Backend only, no intended behavior changes.
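
The "collect keys, batch-load, assemble" shape described above can be sketched as follows. This is a minimal illustration, not the PR's code: `FakeSite`, `titles_n_plus_one`, and `titles_batched` are hypothetical names, documents are plain dicts, and the stub assumes `get_many()` may skip missing keys and return results in any order.

```python
class FakeSite:
    """Hypothetical stand-in for web.ctx.site that counts round trips."""

    def __init__(self, docs):
        self.docs = docs  # maps key -> document dict
        self.calls = 0

    def get(self, key):
        self.calls += 1
        return self.docs.get(key)

    def get_many(self, keys):
        self.calls += 1
        # Assumption: missing keys are skipped and order is unspecified,
        # so this stub deliberately returns documents in reverse order.
        return [self.docs[k] for k in reversed(keys) if k in self.docs]


def titles_n_plus_one(site, keys):
    titles = []
    for key in keys:
        doc = site.get(key)  # one round trip per key
        titles.append(doc["title"] if doc else None)
    return titles


def titles_batched(site, keys):
    # Batch-load once, then assemble results in the caller's key order.
    docs_by_key = {doc["key"]: doc for doc in site.get_many(keys)}
    return [docs_by_key[k]["title"] if k in docs_by_key else None for k in keys]


docs = {
    "/works/OL1W": {"key": "/works/OL1W", "title": "First"},
    "/works/OL2W": {"key": "/works/OL2W", "title": "Second"},
}
keys = ["/works/OL1W", "/works/OL2W", "/works/OL404W"]

slow, fast = FakeSite(docs), FakeSite(docs)
assert titles_n_plus_one(slow, keys) == titles_batched(fast, keys)
print(slow.calls, fast.calls)  # 3 round trips versus 1
```

Both functions return the same list, but the batched version makes one call regardless of how many keys are requested.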

One of these locations had a pre-existing `# TODO: should do in one web.ctx.get_many fetch` comment, which this PR resolves.

Technical

Three changes are in openlibrary/plugins/upstream/mybooks.py and one is in
openlibrary/plugins/openlibrary/api.py. The two additional changes are in
openlibrary/plugins/openlibrary/bulk_tag.py and openlibrary/views/loanstats.py, one in each.

| File | Function / block | Change | Author |
| --- | --- | --- | --- |
| `plugins/upstream/mybooks.py` | `MyBooksTemplate.render_template()`, loans block (~L74) | Replace per-loan `site.get(loan["book"])` with `get_many(book_keys)` and keyed hydration. Resolves the existing `TODO` comment. | @Winterhuli |
| `plugins/upstream/mybooks.py` | `get_observations()` (~L590) | Batch work hydration via `get_many(work_keys)` instead of per-entry `self._get_work()` | @Hussain5001 |
| `plugins/upstream/mybooks.py` | `PatronBooknotes.get_notes()` (~L578) | Batch works and authors via `get_many`, pre-build `works_by_key` / `authors_by_key` / `work_details_by_key` dicts | @MuchiniGun |
| `plugins/openlibrary/api.py` | `POST()` (L881) | Replace `[site.get(key) for key in edition_keys]` with `get_many(edition_keys)` | @WeiGuang-2099 |
| `plugins/openlibrary/bulk_tag.py` | `bulk_tag_works.POST()` | Replace per-work `site.get(f"/works/{work}")` with a single `get_many(work_keys)` and keyed lookup | @WeiGuang-2099 |
| `views/loanstats.py` | `readinglog_stats.GET()` | Batch-fetch Solr misses with `get_many(missed_keys)` instead of per-item `site.get(key)` fallback | @WeiGuang-2099 |
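
The last row (batch-fetching Solr misses) might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual `readinglog_stats.GET()` code: `StubSite`, `hydrate_with_fallback`, and `solr_docs_by_key` are hypothetical names, and documents are plain dicts with a `"key"` field.

```python
class StubSite:
    """Hypothetical stand-in for web.ctx.site; only get_many() is modelled."""

    def __init__(self, docs):
        self.docs = docs
        self.get_many_calls = 0

    def get_many(self, keys):
        self.get_many_calls += 1
        # Assumption: keys without a document are simply left out.
        return [self.docs[k] for k in keys if k in self.docs]


def hydrate_with_fallback(keys, solr_docs_by_key, site):
    """Prefer Solr documents; load everything Solr missed in one batch."""
    missed_keys = [k for k in keys if k not in solr_docs_by_key]
    fallback_by_key = {doc["key"]: doc for doc in site.get_many(missed_keys)}
    # Assemble in the original key order; unresolved keys become None.
    return [solr_docs_by_key.get(k) or fallback_by_key.get(k) for k in keys]


solr_docs_by_key = {"/works/OL1W": {"key": "/works/OL1W", "source": "solr"}}
site = StubSite(
    {
        "/works/OL2W": {"key": "/works/OL2W", "source": "db"},
        "/works/OL3W": {"key": "/works/OL3W", "source": "db"},
    }
)
keys = ["/works/OL1W", "/works/OL2W", "/works/OL3W"]
result = hydrate_with_fallback(keys, solr_docs_by_key, site)
print([doc["source"] for doc in result])  # ['solr', 'db', 'db']
```

The two misses are resolved with a single `get_many()` call instead of one `get()` per missed key.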

Testing

Added regression test openlibrary/plugins/upstream/tests/test_mybooks.py
for PatronBooknotes.get_notes(). It verifies:
- works / authors / editions are fetched via batched get_many()
- no per-item site.get() is performed inside the loop
- the data shape consumed by the template is preserved
Ran: docker compose run --rm home pytest openlibrary/plugins/upstream/tests/test_mybooks.py -> 1 passed, 1 warning
Ran: python -m pytest openlibrary/plugins/openlibrary/tests/test_bulk_tag.py -q -> 1 passed, 1 warning in 0.06s
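
For readers who want to see what the three checks listed above can look like in practice, here is a minimal sketch using `unittest.mock`. It does not reproduce the test in `test_mybooks.py`; `load_titles` is a hypothetical function under test, and the mocked `get_many()` deliberately returns documents out of order.

```python
from unittest.mock import MagicMock


def load_titles(site, keys):
    """Hypothetical function under test: one batched call, keyed lookup."""
    docs_by_key = {doc["key"]: doc for doc in site.get_many(keys)}
    return [docs_by_key[k]["title"] for k in keys if k in docs_by_key]


def test_load_titles_uses_single_batch():
    site = MagicMock()
    # Return documents in a different order than requested.
    site.get_many.return_value = [
        {"key": "/works/OL2W", "title": "B"},
        {"key": "/works/OL1W", "title": "A"},
    ]

    result = load_titles(site, ["/works/OL1W", "/works/OL2W"])

    # The batch API is used exactly once with the full key list...
    site.get_many.assert_called_once_with(["/works/OL1W", "/works/OL2W"])
    # ...no per-item get() happens...
    site.get.assert_not_called()
    # ...and the output follows the caller's key order.
    assert result == ["A", "B"]
```

Run under pytest, this fails if a per-item `site.get()` is reintroduced or if the result ordering starts to depend on the `get_many()` response order.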

No production-scale benchmark was run. The additional bulk_tag.py and
loanstats.py changes are structural transformations from repeated
loop-based site.get() calls to a single get_many() over the same key set,
and are intended to be behavior-preserving.

Scope

This PR fixes six of the N+1 hotspots in the audit done for #12432. Other
call sites flagged in the audit may remain, so this PR uses Refs rather
than Closes.

Screenshot

N/A — backend-only change, no UI changes.

Stakeholders

@WeiGuang-2099 @Winterhuli @Hussain5001 @MuchiniGun


jimchamp

Thanks @WeiGuang-2099.

I noticed the following issues with these changes:

- It looks like you added some code that was previously removed.
- zip is being used in an unsafe way.
- PatronBooknotes._get_work() is no longer referenced, yet the method remains.

jimchamp · 2026-05-18T18:47:32Z

+            book_keys = [loan["book"] for loan in myloans]
+            books = web.ctx.site.get_many(book_keys)
+
+            for loan, book in zip(myloans, books):


This looks like it will incorrectly match loan data to editions. There is no guarantee that get_many will return editions in the same order as they are in myloans.

Hussain5001 · May 21, 2026

Will update this to avoid relying on get_many() result order.
We can build a books_by_key map from the batched results and match each loan using loan["book"].
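
To make the concern concrete, the sketch below contrasts positional pairing with the keyed lookup proposed in the reply. It is illustrative only: `pair_by_zip` and `pair_by_key` are hypothetical helpers, loans and books are plain dicts, and the `books` list simulates a batched response that arrives in a different order than the loans.

```python
def pair_by_zip(loans, books):
    # Positional pairing: only correct if `books` mirrors `loans` exactly.
    return [(loan["book"], book["key"]) for loan, book in zip(loans, books)]


def pair_by_key(loans, books):
    # Keyed pairing: independent of response order and of missing documents.
    books_by_key = {book["key"]: book for book in books}
    return [
        (loan["book"], books_by_key[loan["book"]]["key"])
        for loan in loans
        if loan["book"] in books_by_key
    ]


loans = [{"book": "/books/OL1M"}, {"book": "/books/OL2M"}]
# Simulated batched response in a different order than the request.
books = [{"key": "/books/OL2M"}, {"key": "/books/OL1M"}]

print(pair_by_zip(loans, books))
# [('/books/OL1M', '/books/OL2M'), ('/books/OL2M', '/books/OL1M')]  mismatched
print(pair_by_key(loans, books))
# [('/books/OL1M', '/books/OL1M'), ('/books/OL2M', '/books/OL2M')]  matched
```

The keyed version also degrades gracefully when a document is absent from the response, whereas `zip` would silently shift every later pairing.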

jimchamp · 2026-05-18T18:51:37Z

+        if not notes:
+            return notes


Why was this added?

MuchiniGun · May 21, 2026

My initial idea was just to avoid making empty get_many([]) calls when there are no notes, but I see it's unnecessary since the normal batch flow handles empty notes. I will remove the guard.

jimchamp · 2026-05-18T18:55:56Z

+        work_keys = [f"/works/OL{entry['work_id']}W" for entry in observations]
+        works = web.ctx.site.get_many(work_keys)
+
+        for entry, work_key, work in zip(observations, work_keys, works):


works can be in any order, potentially causing observation data to be assigned to the wrong object.

Hussain5001 · May 21, 2026

Will update this as well to avoid assuming get_many() preserves input order.
We can build a works_by_key map and look up each observation’s work by work_key.

jimchamp · 2026-05-18T18:59:35Z

+                "cover_url": (work.get_cover_url("S") or "https://openlibrary.org/static/images/icons/avatar_book-sm.png"),
+                "title": work.get("title"),
+                "authors": [authors_by_key[a.author.key].name for a in work.get("authors", []) if a.author.key in authors_by_key],
+                "first_publish_year": work.first_publish_year or None,


It looks like this was copied from the PatronBooknotes._get_work_details method. Why wasn't that used here?

MuchiniGun · May 21, 2026

I had copied it inline because I was trying to batch the author lookup for get_notes(). I'll use _get_work_details() so it can optionally receive a preloaded authors_by_key map.
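
One way a details helper could accept an optional preloaded map is sketched below. This is not the real `PatronBooknotes._get_work_details()`: `get_work_details`, `CountingSite`, the dict-based document shapes, and the fallback to a local batch load when no map is passed are all assumptions made for illustration.

```python
class CountingSite:
    """Hypothetical stand-in for web.ctx.site that counts get_many() calls."""

    def __init__(self, docs):
        self.docs = docs
        self.get_many_calls = 0

    def get_many(self, keys):
        self.get_many_calls += 1
        return [self.docs[k] for k in keys if k in self.docs]


def get_work_details(work, site, authors_by_key=None):
    author_keys = [a["author"]["key"] for a in work.get("authors", [])]
    if authors_by_key is None:
        # Assumed fallback: no preloaded map, so batch-load this work's authors.
        authors_by_key = {a["key"]: a for a in site.get_many(author_keys)}
    return {
        "title": work.get("title"),
        "authors": [
            authors_by_key[k]["name"] for k in author_keys if k in authors_by_key
        ],
    }


authors = {"/authors/OL1A": {"key": "/authors/OL1A", "name": "Ada"}}
work = {"title": "Notes", "authors": [{"author": {"key": "/authors/OL1A"}}]}

site = CountingSite(authors)
# Caller already batched the authors for many works: no extra fetch happens.
print(get_work_details(work, site, authors_by_key=authors), site.get_many_calls)
# Standalone call: the helper performs its own single batched fetch.
print(get_work_details(work, site), site.get_many_calls)
```

A single helper can then serve both the batched notes page and any existing caller that passes only a work.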

Follow-up commit message (after review):
- get_many() results are now matched by document key and do not rely on the order of the response
- Removed the redundant empty-notes check
- Reuse _get_work_details(), slightly refactored so it accepts an optional authors_by_key
- Removed unused `_get_work`

Winterhuli and others added 10 commits May 7, 2026 21:07

Find all locations of web.ctx.site.get() and whether they are called …

e84146c

…in a loop.

replace per-loan site.get with site.get_many

0910ea0

fix(api): replace list comprehension web.ctx.site.get(key) with get_m…

8036a9b

…any at plugins/openlibrary/api.py:881

Merge pull request #1 from WeiGuang-2099/Cheng

c4a1a99

Find all locations of web.ctx.site.get() and whether they are called …

Delete N+1 queries.txt

bb14d0b

All parts have been modified, so just delete this file

Merge pull request #2 from WeiGuang-2099/yuheng

0af2fef

fix(api): replace list comprehension web.ctx.site.get(key) with get_m…

Batch load works in get_observations using get_many

ef3d073

Merge pull request #3 from WeiGuang-2099/hussain

11fed38

Batch load works in get_observations using get_many

Batch note page lookups in mybooks

3dd00c0

Reduce N+1 lookups in PatronBooknotes.get_notes by batching work, author, and edition hydration for the notes page

Merge pull request #4 from WeiGuang-2099/owen

6c42073

Batch note page lookups in mybooks

github-actions Bot assigned jimchamp May 15, 2026

github-actions Bot added the Priority: 2 Important, as time permits. [managed] label May 15, 2026

WeiGuang-2099 and others added 2 commits May 16, 2026 23:51

Merge branch 'master' into finalsubmit

b97ff2e

[pre-commit.ci] auto fixes from pre-commit.com hooks

c0478d9

for more information, see https://pre-commit.ci

WeiGuang-2099 changed the title ~~Fix: reduce N+1 queries in mybooks notes loading~~ Perf: batch site.get() calls across 4 N+1 hotspots (refs #12432) May 18, 2026

jimchamp requested changes May 18, 2026


jimchamp added the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label May 18, 2026

Update openlibrary/plugins/openlibrary/api.py

53172e8

`if len(editions) > 1` block removed Co-authored-by: jimchamp <28732543+jimchamp@users.noreply.github.com>

github-actions Bot removed the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label May 19, 2026

jimchamp added the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label May 19, 2026

github-actions Bot removed the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label May 21, 2026

MuchiniGun and others added 8 commits May 21, 2026 16:38

Updating the remaining changes

6d96026

fix: avoid relying on get_many result order in mybooks

8286b6a

Avoid N+1 work fetches in bulk tag and reading log stats

a4807d6

Merge pull request #7 from WeiGuang-2099/yuheng

22e2a59

Avoid N+1 work fetches in bulk tag and reading log stats

Merge branch 'master' into finalsubmit

54ae068

Merge branch 'finalsubmit' into Cheng

17a9045

Merge pull request #8 from WeiGuang-2099/Cheng

a457177

fix: avoid relying on get_many() result order in mybooks

[pre-commit.ci] auto fixes from pre-commit.com hooks

85822dc

for more information, see https://pre-commit.ci

WeiGuang-2099 changed the title ~~Perf: batch site.get() calls across 4 N+1 hotspots (refs #12432)~~ Perf: batch site.get() calls across 6 N+1 hotspots (refs #12432) May 23, 2026

WeiGuang-2099 wants to merge 22 commits into internetarchive:master from WeiGuang-2099:finalsubmit

5 participants
