Warm caches when batch creating missing indexables #23355

Open
leonidasmi wants to merge 2 commits into trunk from prime-caches-find-by-multiple-ids-and-type

@leonidasmi leonidasmi commented Jun 12, 2026

Contributor

Context

  • During post indexation, the post builder fires 2 queries per post on a cold cache:
    • get_post($post_id) → 1 query for the post row
    • The first WPSEO_Meta::get_value() → 1 query that loads all that post's meta.
  • _prime_post_caches($ids) replaces all of that with ~3 batched queries for the whole chunk (post rows + postmeta + the term-relationship prime).
  • So, we're shaving around 47 queries (25 × 2 - 3) per 25-post batch during each post indexation action.
  • Similarly, during term indexation, the term builder fires ~1 query per term on a cold cache:
    • get_term($term_id) → 1 query for the term row
  • _prime_term_caches($ids) now replaces the per-term get_term() queries with ~2 batched queries.
  • So, we're shaving around 23 queries (25 - 2) per 25-term batch during each term indexation action.
  • There would be similar gains in link indexation, but that mostly runs after post and term indexation have finished, so the related indexables already exist and there's usually no need for queries anyway.
  • The post builder also fires those 2 queries per post when the schema aggregation endpoint is hit and indexables are disabled.
    • Here too, _prime_post_caches($ids) replaces all of that with ~3 batched queries, but the impact is much bigger because the batch is larger (1000 posts, maximum) and this always runs when the endpoint is hit, because indexable data is not persisted.
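
The arithmetic behind those estimates, as a minimal sketch. The function name is hypothetical, and the per-item and batched counts are the approximate figures quoted above, not measurements:

```php
<?php
/**
 * Rough estimate of the queries saved per batch by priming caches up front.
 * Hypothetical helper for illustration only; the inputs are the approximate
 * figures from the description, not measured values.
 */
function estimate_queries_saved( int $batch_size, int $queries_per_item, int $batched_queries ): int {
	// Cold cache: every item in the batch triggers its own lookups.
	$unprimed = $batch_size * $queries_per_item;

	// Primed cache: a fixed number of batched queries, whatever the batch size.
	return $unprimed - $batched_queries;
}

// Post indexation: 2 queries per post vs ~3 batched queries.
echo estimate_queries_saved( 25, 2, 3 ), PHP_EOL;   // 47
// Term indexation: 1 query per term vs ~2 batched queries.
echo estimate_queries_saved( 25, 1, 2 ), PHP_EOL;   // 23
// Schema aggregation with the maximum batch of 1000 posts.
echo estimate_queries_saved( 1000, 2, 3 ), PHP_EOL; // 1997
```

The batched cost stays roughly constant while the unprimed cost grows with the batch, which is why the schema aggregation case benefits the most.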

Summary

This PR can be summarized in the following changelog entry:

  • Improves performance when running the SEO optimization by warming post and term caches in bulk.

Relevant technical choices:

  • I've added all instances where the affected find_by_multiple_ids_and_type() function is used to the test instructions, to verify the optimization and to perform regression tests. There are two exceptions:
    • Meta_Surface::for_posts() is affected by our changes but is not used anymore by our plugins, since this PR. That being said, since it's part of our Surface API it might be used in third-party code. I couldn't find any such instance using https://veloria.dev/, so there are no impact check instructions for that.
    • Premium also uses the find_by_multiple_ids_and_type() function when rendering the schema for the publishing pages. But in 99.99% of cases, the indexables for these pages will already be present when doing that, so the code will not go into bulk creating indexables.
  • Decided to use _prime_term_caches( $ids ) instead of _prime_term_caches( $ids, false ), even though the term builder doesn't need term meta, which means that we're doing one extra query per batch for no reason. My reasons for that:
    • The win is negligible. We would be trading away 1 batched query per pass of 25 terms against the ~23 we already eliminated
    • false is fragile. That query is useless now but if we ever add term meta in the term builder, we would introduce 25 more queries per batch silently.
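
To make that choice concrete, here's a minimal sketch of priming a batch with the core defaults. The helper name and its shape are hypothetical and not the code in this PR; only _prime_post_caches() and _prime_term_caches() are WordPress core functions, and outside WordPress the helper simply does nothing:

```php
<?php
/**
 * Hypothetical helper (not the PR's actual code): warm the object cache for a
 * batch of IDs before the builders run. Outside WordPress it is a no-op.
 *
 * @param int[]  $ids  Object IDs in the current batch.
 * @param string $type Either 'post' or 'term'.
 *
 * @return bool Whether a core priming function was called.
 */
function example_prime_caches( array $ids, string $type ): bool {
	// Normalize the input: integers only, no zeros, no duplicates.
	$ids = array_values( array_unique( array_filter( array_map( 'intval', $ids ) ) ) );
	if ( $ids === [] ) {
		return false;
	}

	if ( $type === 'post' && function_exists( '_prime_post_caches' ) ) {
		// Core defaults also prime term relationships and post meta.
		_prime_post_caches( $ids );
		return true;
	}

	if ( $type === 'term' && function_exists( '_prime_term_caches' ) ) {
		// Second argument left at its default (true), so term meta is primed too.
		_prime_term_caches( $ids );
		return true;
	}

	return false;
}
```

Passing false as the second argument to _prime_term_caches() would skip the term meta query, which is the trade-off discussed above.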

Test instructions

Test instructions for the acceptance test before the PR gets merged

This PR can be acceptance tested by following these steps:

Verify the performance optimization in SEOO

  • Using our docker repo, add the following count-indexing-queries.php file in the wordpress folder:
<?php

global $wpdb;

$container = YoastSEO()->classes;

$actions = [
	'post'      => Yoast\WP\SEO\Actions\Indexing\Indexable_Post_Indexation_Action::class,
	'term'      => Yoast\WP\SEO\Actions\Indexing\Indexable_Term_Indexation_Action::class,
	'post-link' => Yoast\WP\SEO\Actions\Indexing\Post_Link_Indexing_Action::class,
	'term-link' => Yoast\WP\SEO\Actions\Indexing\Term_Link_Indexing_Action::class,
];

foreach ( $actions as $label => $class ) {
	$action = $container->get( $class );

	$before     = $wpdb->num_queries;
	$indexables = $action->index();
	$after      = $wpdb->num_queries;

	WP_CLI::log(
		sprintf(
			'%-10s | indexables built: %3d | queries: %d',
			$label,
			count( $indexables ),
			( $after - $before )
		)
	);
}
  • This should log the number of queries for each of the actions that is affected by our changes
  • Reset indexables and run the following command in WP CLI: ./wp.sh eval-file count-indexing-queries.php
  • Check the message shown in the console:
post       | indexables built:  25 | queries: 154
term       | indexables built:  25 | queries: 123
post-link  | indexables built:   5 | queries: 34
term-link  | indexables built:   5 | queries: 17
  • Switch to trunk/production version, reset indexables and repeat the WP CLI command
  • Compare the two runs and confirm that this PR issues far fewer queries than trunk/production (except probably for the post-link and term-link actions)
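
If you'd rather not compare the two outputs by eye, something along these lines could do it. This is a sketch with hypothetical helper names, and it assumes the log lines keep the exact format shown above:

```php
<?php
/**
 * Hypothetical helper: parse the lines logged by count-indexing-queries.php
 * into a [ label => queries ] map. Lines in any other format are skipped.
 */
function parse_query_log( string $log ): array {
	$counts = [];
	foreach ( preg_split( '/\R/', trim( $log ) ) as $line ) {
		if ( preg_match( '/^(\S+)\s*\|.*\|\s*queries:\s*(\d+)\s*$/', $line, $m ) ) {
			$counts[ $m[1] ] = (int) $m[2];
		}
	}
	return $counts;
}

/**
 * Hypothetical helper: queries saved per action (baseline minus candidate),
 * for the labels that appear in both runs.
 */
function query_deltas( array $baseline, array $candidate ): array {
	$deltas = [];
	foreach ( $candidate as $label => $queries ) {
		if ( isset( $baseline[ $label ] ) ) {
			$deltas[ $label ] = $baseline[ $label ] - $queries;
		}
	}
	return $deltas;
}

// Parsing two of the sample lines from above.
$sample = <<<LOG
post       | indexables built:  25 | queries: 154
term       | indexables built:  25 | queries: 123
LOG;

print_r( parse_query_log( $sample ) ); // [ 'post' => 154, 'term' => 123 ]
```

Feed the trunk output in as the baseline and the PR output as the candidate; positive deltas mean the PR issued fewer queries for that action.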

Regression test SEOO

  • Reset indexables and run the SEOO with this PR/RC. Take note of the data in the indexable and link tables afterwards
  • Switch to trunk/production version, reset indexables and run the SEOO again. Compare the indexable and link tables afterwards with what you got before
    • Easiest thing to check is the tables' size, they should be the same
    • Randomly check a couple of posts/pages/terms in each case and verify that they mostly contain the same data (timestamps excluded)
    • Randomly check a couple of links in each case and verify that they contain the same data

Regression test SEOO with persistent object cache

  • Reset indexables via the test helper
  • Install and activate the SQLite Object Cache plugin (or use Redis if your stack has it); confirm wp-content/object-cache.php now exists
  • Run the SEOO with this PR/RC and take note of the data in the indexable and link tables afterwards
  • Flush the cache of the SQLite Object Cache plugin and reset indexables again
  • Run the SEOO with trunk/production version and compare the data in the indexable and link tables with what you noted above. They should be more or less the same (with differences mostly in timestamps)
  • Deactivate the plugin and confirm that there's no wp-content/object-cache.php drop-in afterwards

Verify the performance optimization in schema aggregator

  • Add the following snippet in a mu-plugin:
// Record the query count right before the schema aggregator route is dispatched.
add_filter( 'rest_pre_dispatch', function ( $result, $server, $request ) {
	if ( strpos( $request->get_route(), 'schema-aggregator/get-schema' ) !== false ) {
		global $wpdb;
		$GLOBALS['_sa_queries_before'] = $wpdb->num_queries;
	}
	return $result;
}, 10, 3 );

// After dispatch, log how many queries the request ran and clear the marker.
add_filter( 'rest_post_dispatch', function ( $response, $server, $request ) {
	if ( isset( $GLOBALS['_sa_queries_before'] ) && strpos( $request->get_route(), 'schema-aggregator/get-schema' ) !== false ) {
		global $wpdb;
		$delta = $wpdb->num_queries - $GLOBALS['_sa_queries_before'];
		error_log( sprintf( '[schema-aggregator] %s -> %d queries', $request->get_route(), $delta ) );
		unset( $GLOBALS['_sa_queries_before'] );
	}
	return $response;
}, 10, 3 );
  • This logs the number of queries that run when the schema aggregator endpoint is hit
  • Reset indexables and then disable them
  • Have a couple of posts that are:
    • draft
    • private
    • no-indexed
    • public
  • GET the http://example.com/wp-json/yoast/v1/schema-aggregator/get-schema/post endpoint
    • check the [schema-aggregator] /yoast/v1/schema-aggregator/get-schema/post -> X queries number in your debug.log
    • delete transients and switch to trunk/production
    • GET the same endpoint and first confirm the output stays the same.
    • also, check the [schema-aggregator] /yoast/v1/schema-aggregator/get-schema/post -> X queries number in your debug.log and confirm that the number logged with this PR is much smaller than the one logged on trunk/production
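
For the "output stays the same" check, you can save both responses and compare them structurally instead of diffing raw text. A minimal sketch with hypothetical helper names, assuming the endpoint returns JSON and that only object key order may differ between runs:

```php
<?php
/**
 * Hypothetical helper: recursively sort associative arrays by key, so two JSON
 * documents that differ only in object key order compare as equal.
 * List order is preserved, since it may be meaningful.
 */
function canonicalize( $value ) {
	if ( ! is_array( $value ) ) {
		return $value;
	}

	$value = array_map( 'canonicalize', $value );

	// Arrays with sequential 0..n-1 keys are treated as lists and left in order.
	$is_list = ( $value === [] ) || ( array_keys( $value ) === range( 0, count( $value ) - 1 ) );
	if ( ! $is_list ) {
		ksort( $value );
	}

	return $value;
}

/**
 * Hypothetical helper: whether two JSON strings describe the same data.
 * Throws a JsonException when either string is not valid JSON.
 */
function same_json( string $a, string $b ): bool {
	$decoded_a = json_decode( $a, true, 512, JSON_THROW_ON_ERROR );
	$decoded_b = json_decode( $b, true, 512, JSON_THROW_ON_ERROR );

	return canonicalize( $decoded_a ) === canonicalize( $decoded_b );
}
```

Call same_json() with the response body saved from the PR run and the one saved from the trunk run; it returns true when the two match.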

Relevant test scenarios

  • Changes should be tested with the browser console open
  • Changes should be tested on different posts/pages/taxonomies/custom post types/custom taxonomies
  • Changes should be tested on different editors (Default Block/Gutenberg/Classic/Elementor/other)
  • Changes should be tested on different browsers
  • Changes should be tested on multisite

Test instructions for QA when the code is in the RC

  • QA should use the same steps as above.

Impact check

This PR affects the following parts of the plugin, which may require extra testing:

Other environments

  • This PR also affects Shopify. I have added a changelog entry starting with [shopify-seo], added test instructions for Shopify and attached the Shopify label to this PR.
  • This PR also affects Yoast SEO for Google Docs. I have added a changelog entry starting with [yoast-doc-extension], added test instructions for Yoast SEO for Google Docs and attached the Google Docs Add-on label to this PR.

Documentation

  • I have written documentation for this change. For example, comments in the Relevant technical choices, comments in the code, documentation on Confluence / shared Google Drive / Yoast developer portal, or other.

Quality assurance

  • I have tested this code to the best of my abilities.
  • During testing, I had activated all plugins that Yoast SEO provides integrations for.
  • I have added unit tests to verify the code works as intended.
  • If any part of the code is behind a feature flag, my test instructions also cover cases where the feature flag is switched off.
  • I have written this PR in accordance with my team's definition of done.
  • I have checked that the base branch is correctly set.
  • I have run grunt build:images and committed the results, if my PR introduces or edits images or SVGs.

Innovation

  • No innovation project is applicable for this PR.
  • This PR falls under an innovation project. I have attached the innovation label.
  • I have added my hours to the WBSO document.

Fixes #

@leonidasmi leonidasmi marked this pull request as draft June 12, 2026 12:32
@leonidasmi leonidasmi added the changelog: enhancement Needs to be included in the 'Enhancements' category in the changelog label Jun 12, 2026
@coveralls

Coverage Report for CI Build 25144886830

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR.

Coverage decreased (-0.4%) to 53.681%

Details

  • Coverage decreased (-0.4%) from the base build.
  • Patch coverage: 5 of 5 lines across 1 file are fully covered (100%).
  • 21 coverage regressions across 1 file.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

21 previously-covered lines in 1 file lost coverage.

File | Lines Losing Coverage | Coverage
src/builders/indexable-link-builder.php | 21 | 73.13%

Coverage Stats

Coverage Status
Relevant Lines: 67717
Covered Lines: 36204
Line Coverage: 53.46%
Relevant Branches: 16577
Covered Branches: 9046
Branch Coverage: 54.57%
Branches in Coverage %: Yes
Coverage Strength: 44437.89 hits per line

💛 - Coveralls

@leonidasmi leonidasmi self-assigned this Jun 15, 2026
@leonidasmi leonidasmi marked this pull request as ready for review June 15, 2026 13:00
@leonidasmi leonidasmi removed their assignment Jun 15, 2026
Labels

changelog: enhancement Needs to be included in the 'Enhancements' category in the changelog

3 participants