Perf: batch site.get() calls across 6 N+1 hotspots (refs #12432)#12748
Perf: batch site.get() calls across 6 N+1 hotspots (refs #12432)#12748WeiGuang-2099 wants to merge 22 commits into
Conversation
…any at plugins/openlibrary/api.py:881
Find all locations of web.ctx.site.get() and whether they are called …
All parts have been modified, so just delete this file
fix(api): replace list comprehension web.ctx.site.get(key) with get_m…
Batch load works in get_observations using get_many
Reduce N+1 lookups in PatronBooknotes.get_notes by batching work, author, and edition hydration for the notes page
Batch note page lookups in mybooks
jimchamp
left a comment
There was a problem hiding this comment.
Thanks @WeiGuang-2099.
I noticed the following issues with these changes:
- It looks like you added some code that was previously removed.
zipis being used in an unsafe way.PatronBooknotes._get_work()is no longer referenced, yet the method remains.
| book_keys = [loan["book"] for loan in myloans] | ||
| books = web.ctx.site.get_many(book_keys) | ||
|
|
||
| for loan, book in zip(myloans, books): |
There was a problem hiding this comment.
This looks like it will incorrectly match loan data to editions. There is no guarantee that get_many will return editions in the same order as they are in myloans.
There was a problem hiding this comment.
Will update this to avoid relying on get_many() result order.
We can build a books_by_key map from the batched results and match each loan using loan["book"]
| if not notes: | ||
| return notes |
There was a problem hiding this comment.
My initial idea was just to avoid making empty get_many([]) calls when there are no notes, but I see it's unnecessary since the normal batch flow handles empty notes. I will remove the guard.
| work_keys = [f"/works/OL{entry['work_id']}W" for entry in observations] | ||
| works = web.ctx.site.get_many(work_keys) | ||
|
|
||
| for entry, work_key, work in zip(observations, work_keys, works): |
There was a problem hiding this comment.
works can be in any order, potentially causing observation data to be assigned to the wrong object.
There was a problem hiding this comment.
Will update this as well to avoid assuming get_many() preserves input order.
We can build a works_by_key map and look up each observation’s work by work_key
| "cover_url": (work.get_cover_url("S") or "https://openlibrary.org/static/images/icons/avatar_book-sm.png"), | ||
| "title": work.get("title"), | ||
| "authors": [authors_by_key[a.author.key].name for a in work.get("authors", []) if a.author.key in authors_by_key], | ||
| "first_publish_year": work.first_publish_year or None, |
There was a problem hiding this comment.
It looks like this was copied from the PatronBooknotes._get_work_details method. Why wasn't that used here?
There was a problem hiding this comment.
I had copied it inline because I was trying to batch the author lookup for get_notes(). I'll use _get_work_details() so it can optionally receive a preloaded authors_by_key map.
`if len(editions) > 1` block removed Co-authored-by: jimchamp <28732543+jimchamp@users.noreply.github.com>
- get_many() results now matches by document key and doesn't rely on the order of the response - removed redundant empty-notes check - Reuse _get_work_details() with a slight refactoring so it accepts option authors_by_key - Removed unused `_get_work`
Avoid N+1 work fetches in bulk tag and reading log stats
fix: avoid relying on get_many() result order in mybooks
for more information, see https://pre-commit.ci
Refs #12432
Summary
This PR addresses several N+1 query patterns flagged in #12432 by replacing
per-iteration
web.ctx.site.get()calls with single batchedweb.ctx.site.get_many()calls. All six changes follow the same shape:collect keys up front, batch-load, then assemble the existing data shape
expected by templates/callers. Backend only, no intended behavior changes.
One of these locations had a pre-existing
# TODO: should do in one web.ctx.get_many fetchcomment, which this PR resolves.Technical
Three changes are in
openlibrary/plugins/upstream/mybooks.py, one is inopenlibrary/plugins/openlibrary/api.py, and two additional changes are inopenlibrary/plugins/openlibrary/bulk_tag.pyandopenlibrary/views/loanstats.py.plugins/upstream/mybooks.pyMyBooksTemplate.render_template()— loans block (~L74)site.get(loan["book"])withget_many(book_keys)and keyed hydration. Resolves the existingTODOcomment.plugins/upstream/mybooks.pyget_observations()(~L590)get_many(work_keys)instead of per-entryself._get_work()plugins/upstream/mybooks.pyPatronBooknotes.get_notes()(~L578)get_many, pre-buildworks_by_key/authors_by_key/work_details_by_keydictsplugins/openlibrary/api.pyPOST()(L881)[site.get(key) for key in edition_keys]withget_many(edition_keys)plugins/openlibrary/bulk_tag.pybulk_tag_works.POST()site.get(f"/works/{work}")with a singleget_many(work_keys)and keyed lookupviews/loanstats.pyreadinglog_stats.GET()get_many(missed_keys)instead of per-itemsite.get(key)fallbackTesting
openlibrary/plugins/upstream/tests/test_mybooks.pyfor
PatronBooknotes.get_notes(). It verifies:get_many()site.get()is performed inside the loopdocker compose run --rm home pytest openlibrary/plugins/upstream/tests/test_mybooks.py->1 passed, 1 warningpython -m pytest openlibrary/plugins/openlibrary/tests/test_bulk_tag.py -q->1 passed, 1 warning in 0.06sNo production-scale benchmark was run. The additional
bulk_tag.pyandloanstats.pychanges are structural transformations from repeatedloop-based
site.get()calls to a singleget_many()over the same key set,and are intended to be behavior-preserving.
Scope
This PR fixes six of the N+1 hotspots in the audit done for #12432. Other
call sites flagged in the audit may remain, so this PR uses
Refsratherthan
Closes.Screenshot
N/A — backend-only change, no UI changes.
Stakeholders
@WeiGuang-2099 @Winterhuli @Hussain5001 @MuchiniGun