
yoastseo package: define serializable PaperDTO schema and conversion function #23336

Open
FAMarfuaty wants to merge 17 commits into trunk from 634-api-migration-define-a-serializable-paperkeyphrase-input-contract-for-non-wordpress-consumers

Conversation

@FAMarfuaty

@FAMarfuaty FAMarfuaty commented Jun 5, 2026

Contributor

Context

The API-driven content-analysis migration needs a documented, serializable input that the analysis engine can validate, so non-WordPress consumers (a hosted web API, the Shopify app, the Google Docs extension) stop hand-rolling Paper input and each shipping their own variant of input bugs (e.g. the Shopify siteUrl/domain confusion in Yoast/lingo-other-tasks#97).

This PR adds that input contract — a serializable PaperDTO plus a toPaper boundary that validates it and constructs the engine's internal Paper — exposed as an opt-in yoastseo/contract entry, and converts the in-repo content-analysis-api reference app to use it as the first consumer. The WordPress plugin is intentionally untouched: it keeps constructing Paper directly.

Summary

This PR can be summarized in the following changelog entry:

  • [yoastseo 0.1.0 enhancement] Adds a serializable PaperDTO input contract and toPaper boundary, exposed via the yoastseo/contract entry point, for validating and mapping analysis input.

Relevant technical choices:

  • zod as the single source of truth, exposed at a separate yoastseo/contract entry (mirroring yoastseo/researcher) rather than from the package root. The plugin loads yoastseo as a webpack external/global, so keeping the contract off the root keeps zod out of the WordPress bundle — only consumers that import the contract pay for it. No exports map was added (it would break existing deep importers of the open build/ tree); the entry is a root redirect dir listed in files.
  • The contract is opt-in, not enforced in the engine. Enforcing it in the core engine (so all consumers, incl. WP, must adhere) is a larger, breaking, major-version change tracked separately in Yoast/lingo-other-tasks#654; this PR deliberately stops at exposing the boundary.
  • keyphrase is canonical; keyword is accepted as a deprecated alias. Every current consumer speaks the engine's keyword, so the alias lets them adopt without renaming; keyphrase is the name we steer toward.
  • WP-transitional fields (wpBlocks/shortcodes/isFrontPage) are included as optional + deprecated, despite the goal of a neutral contract. They are real analysis inputs that change WordPress scores (shortcodes are stripped before counting/matching; blocks drive tree construction), so a remote/API analysis can only reproduce in-browser scores if it can send them. They are marked deprecated pending the neutral structured-content work (#264).
  • siteUrl/domain deferred. No consumer feeds them through Paper today and no assessment reads them (competing-links gets the site URL from context); they’ll be added with that assessment’s refactor so the semantics are shaped against a real reader.
  • customData kept open (z.record, shape-checked but contents unvalidated) to avoid coupling the contract to platform/product (Shopify/WooCommerce) shapes.
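The choices above can be illustrated with a dependency-free sketch. It is not the PR's implementation: the real contract is defined with zod, and the hand-written checks below only stand in for it. The `FIELD_CHECKS` table, the reduced field list and the `toPaperAttributes` name are assumptions for illustration; the strict rejection of unknown keys, the keyphrase/keyword alias and the open customData record follow the description.

```javascript
// Hypothetical stand-in for the zod schema: field name -> type check.
// Only a subset of the contract's fields is shown.
const FIELD_CHECKS = {
	text: ( value ) => typeof value === "string",
	keyphrase: ( value ) => typeof value === "string",
	// Deprecated alias of keyphrase.
	keyword: ( value ) => typeof value === "string",
	locale: ( value ) => typeof value === "string",
	// Open record: the shape is checked, the contents are not.
	customData: ( value ) => typeof value === "object" && value !== null && ! Array.isArray( value ),
};

// Validates a DTO and maps it to the text plus attributes a Paper would receive.
function toPaperAttributes( dto ) {
	if ( typeof dto !== "object" || dto === null ) {
		throw new TypeError( "The input must be an object" );
	}
	for ( const [ key, value ] of Object.entries( dto ) ) {
		// Strict: unknown keys (for example the typo "keyphrse") are rejected.
		if ( ! Object.hasOwn( FIELD_CHECKS, key ) ) {
			throw new TypeError( `Unknown field: ${ key }` );
		}
		if ( value !== undefined && ! FIELD_CHECKS[ key ]( value ) ) {
			throw new TypeError( `Invalid value for field: ${ key }` );
		}
	}
	if ( typeof dto.text !== "string" ) {
		throw new TypeError( "text is required" );
	}

	const { text, keyphrase, keyword, ...rest } = dto;
	const attributes = { ...rest };
	// keyphrase is canonical; keyword is only used when keyphrase is absent.
	const resolved = keyphrase ?? keyword;
	if ( resolved !== undefined ) {
		attributes.keyword = resolved;
	}
	return { text, attributes };
}

console.log( toPaperAttributes( { text: "A cat post", keyphrase: "cat food" } ).attributes.keyword ); // cat food
```

With this shape, a consumer that still sends keyword keeps working, while a misspelled key fails loudly instead of being silently ignored.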

Test instructions

Test instructions for the acceptance test before the PR gets merged

This is a developer-facing change to the yoastseo package (no UI). It can be verified as follows:

  • In packages/yoastseo, run the contract unit tests using yarn test [full path to paperDtoSpec.js].
  • Build the package (yarn build) and confirm the public entry resolves: node -e 'const { toPaper } = require("yoastseo/contract"); console.log(toPaper({ text: "A cat post", keyphrase: "cat food" }).getKeyword());' prints cat food.
  • Confirm invalid input is rejected: toPaper({ text: 123 }) and toPaper({ text: "x", keyphrse: "typo" }) both throw.
  • Confirm the contract bundles in a browser/webpack build: in apps/content-analysis-webworker, run yarn install && yarn build (the app is not part of the root Yarn workspace, so its webpack devDependency must be installed first) — it compiles with no errors, exercising import { toPaper } from "yoastseo/contract" through the bundler (the content-analysis-api consumer only exercises the Node require path).
  • Confirm the WordPress plugin is unaffected: the editor analysis still runs exactly as before (no packages/js changes in this PR).

Relevant test scenarios

  • Changes should be tested with the browser console open
  • Changes should be tested on different posts/pages/taxonomies/custom post types/custom taxonomies
  • Changes should be tested on different editors (Default Block/Gutenberg/Classic/Elementor/other)
  • Changes should be tested on different browsers
  • Changes should be tested on multisite

Test instructions for QA when the code is in the RC

  • QA should use the same steps as above.

Not applicable — this is a non-user-facing package change with no RC-visible behaviour.

Impact check

This PR affects the following parts, which may require extra testing:

  • packages/yoastseo: a new opt-in yoastseo/contract entry and zod added as a dependency (install-size only; the contract is off the package root, so the WordPress bundle does not gain zod). The bundle impact for consumers that do import the contract has not been measured with the analyzer — flagged for follow-up.
  • apps/content-analysis-api: routes refactored to build Paper via the contract (now returns a 400 on structurally invalid input instead of silently analysing it). In-repo reference app; no plugin impact.
  • apps/content-analysis-webworker: the browser/webpack demo now builds its Paper via the contract too — both confirms the yoastseo/contract entry resolves and bundles client-side, and serves as the second reference consumer. In-repo demo; no plugin impact.
  • WordPress plugin (packages/js): no changes — analysis input is built exactly as before.

Other environments

  • This PR also affects Shopify. I have added a changelog entry starting with [shopify-seo], added test instructions for Shopify and attached the Shopify label to this PR.
  • This PR also affects Yoast SEO for Google Docs. I have added a changelog entry starting with [yoast-doc-extension], added test instructions for Yoast SEO for Google Docs and attached the Google Docs Add-on label to this PR.

Documentation

  • I have written documentation for this change. (README "Serializable input contract" section, GLOSSARY PaperDTO entry, and JSDoc on the schema/mapper.)

Quality assurance

  • I have tested this code to the best of my abilities.
  • During testing, I had activated all plugins that Yoast SEO provides integrations for. (Not applicable — package unit tests.)
  • I have added unit tests to verify the code works as intended.
  • If any part of the code is behind a feature flag, my test instructions also cover cases where the feature flag is switched off. (No feature flag.)
  • I have written this PR in accordance with my team's definition of done.
  • I have checked that the base branch is correctly set.
  • I have run grunt build:images and committed the results, if my PR introduces or edits images or SVGs. (No images.)

Innovation

  • No innovation project is applicable for this PR.
  • This PR falls under an innovation project. I have attached the innovation label.
  • I have added my hours to the WBSO document.

Fixes Yoast/lingo-other-tasks#634

@coveralls

coveralls commented Jun 5, 2026


Coverage Report for CI Build 1

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR. Learn more →

Coverage increased (+0.2%) to 57.64%

Details

  • Coverage increased (+0.2%) from the base build.
  • Patch coverage: 7 of 7 lines across 1 file are fully covered (100%).
  • 73 coverage regressions across 9 files.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

73 previously-covered lines in 9 files lost coverage.

| File | Lines losing coverage | File coverage |
| --- | --- | --- |
| packages/js/src/elementor/initialize.js | 23 | 0.0% |
| packages/js/src/inline-links/edit-link.js | 20 | 5.56% |
| packages/js/src/settings/app.js | 11 | 0.0% |
| packages/js/src/ai-consent/initialize.js | 6 | 0.0% |
| packages/js/src/ai-consent/store/index.js | 4 | 0.0% |
| packages/js/src/ai-generator/components/errors/generic-alert.js | 3 | 28.57% |
| packages/js/src/ai-consent/components/revoke-consent.js | 2 | 0.0% |
| packages/ui-library/src/components/sidebar-navigation/stories.js | 2 | 0.0% |
| packages/ui-library/src/components/sidebar-navigation/index.js | 2 | 92.86% |

Coverage Stats

Coverage Status
Relevant Lines: 28572
Covered Lines: 16866
Line Coverage: 59.03%
Relevant Branches: 18130
Covered Branches: 10053
Branch Coverage: 55.45%
Branches in Coverage %: Yes
Coverage Strength: 105321.74 hits per line

💛 - Coveralls

FAMarfuaty and others added 3 commits June 5, 2026 07:42
Adds a root `contract/` redirect mirroring `yoastseo/researcher` so consumers import `yoastseo/contract` instead of deep-requiring `build/`. Keeps zod off the package-root bundle (only pulled when the contract is imported) and avoids an `exports` map, which would break existing deep importers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 5, 2026


A merge conflict has been detected for the proposed code changes in this PR. Please resolve the conflict by either rebasing the PR or merging in changes from the base branch.

@FAMarfuaty FAMarfuaty added the changelog: non-user-facing Needs to be included in the 'Non-userfacing' category in the changelog label Jun 9, 2026
…igration-define-a-serializable-paperkeyphrase-input-contract-for-non-wordpress-consumers
@FAMarfuaty FAMarfuaty marked this pull request as ready for review June 9, 2026 06:37
FAMarfuaty and others added 4 commits June 9, 2026 09:14
…ict extension

Addresses PR review: replace lodash.isUndefined with nullish coalescing for the keyphrase/keyword alias (no extra dependency, avoids the no-undefined rule), and note in createToPaper's JSDoc that .extend() preserves .strict() so open-ended extra keys need .passthrough().

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@FAMarfuaty FAMarfuaty added the innovation Innovative issue. Relating to performance, memory or data-flow. label Jun 11, 2026
@vraja-pro

Contributor

/build-zip

@github-actions


📦 Plugin zip built successfully!

Download it from the workflow run.

@vraja-pro vraja-pro left a comment

Contributor


  • When building I get changes in the yarn.lock files; should they be added to this PR?
  • Should the changelog entry include the version increase, like [yoastseo 0.1.0]?
  • Lint warnings were reduced to 18; let's update the package.json (it's now 19).
  • When running yarn build in apps/content-analysis-webworker I get:
warning package.json: No license field
$ webpack
/bin/sh: webpack: command not found
error Command failed with exit code 127.

Some feedback from Claude that seems valid:

  1. createToPaper is promised but not delivered

The PR description calls out extensibility as a key feature:

▎ "Extensible by consumers via createToPaper(paperDtoSchema.extend({…}))"

But createToPaper is not exported anywhere in the diff, only toPaper and paperDtoSchema. A consumer who wants to extend the schema must know the internal mapping (keyphrase → keyword, etc.) to rebuild the attributes object themselves, which defeats the stated goal. Either implement it or drop the claim from the description.


  2. Routes read request.body.locale directly after the contract validated it

In every route handler, after paperFromRequest validates the body through Zod, the code still reads the raw request body:

  const paper = paperFromRequest( request, response ); // Zod validates here
  if ( ! paper ) { return; }
  
  const language = request.body.locale || "en";       // raw body again — bypasses the contract

paper.getLocale() returns the validated locale (e.g. "en_US"). The language prefix could be extracted from that. As-is, the contract validates the locale field but the routes ignore the validated value and reach back into the raw body, which is inconsistent.
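A minimal sketch of how the language prefix could be derived from the validated locale. The `languageFromLocale` name, the "en" fallback and the acceptance of both "_" and "-" separators are assumptions for illustration, not code from this PR.

```javascript
// Hypothetical helper: derive the language code from a validated locale string.
function languageFromLocale( locale, fallback = "en" ) {
	if ( typeof locale !== "string" || locale === "" ) {
		return fallback;
	}
	// Accept both "en_US" and "en-US"; the language is the part before the separator.
	const [ language ] = locale.split( /[_-]/ );
	return language.toLowerCase() || fallback;
}

console.log( languageFromLocale( "en_US" ) ); // en
console.log( languageFromLocale( "nl-NL" ) ); // nl
console.log( languageFromLocale( undefined ) ); // en
```

In a route this would be called as languageFromLocale( paper.getLocale() ), so the handler never reaches back into request.body.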


  3. meta-description and seo-title routes check request.body before the contract
  // analyze.js — /analyze/meta-description
  if ( ! request.body.description ) {
      return response.status( 400 ).json( { error: "Description is required" } );
  }
  const paper = paperFromRequest( request, response ); // Zod runs after

But /analyze/keyphrase does it the right way — contract first, then paper.hasKeyword(). The meta-description and seo-title routes do a falsy check on the raw body before Zod has had a chance to type-check. The pattern should be consistent: validate through the contract first, then inspect the resulting Paper via its accessors.
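The contract-first ordering could look roughly like the sketch below. `toPaperStub` is a hypothetical stand-in for the real toPaper, the body of `paperFromRequest` is an assumption (only its name and its request/response arguments appear in the PR), and the 200 response is a placeholder for the actual analysis.

```javascript
// Hypothetical stand-in for the contract's toPaper: throws on structurally invalid input.
function toPaperStub( body ) {
	if ( typeof body !== "object" || body === null || typeof body.text !== "string" ) {
		throw new TypeError( "text must be a string" );
	}
	if ( body.description !== undefined && typeof body.description !== "string" ) {
		throw new TypeError( "description must be a string" );
	}
	return { getDescription: () => body.description ?? "" };
}

// Validate first; on failure send a 400 and return null so the handler can bail out.
function paperFromRequest( request, response, toPaper = toPaperStub ) {
	try {
		return toPaper( request.body );
	} catch ( error ) {
		response.status( 400 ).json( { error: error.message } );
		return null;
	}
}

// Handler for a route such as /analyze/meta-description.
function metaDescriptionHandler( request, response ) {
	const paper = paperFromRequest( request, response );
	if ( ! paper ) {
		return;
	}
	// Gate on the validated Paper, not on the raw request body.
	if ( ! paper.getDescription() ) {
		response.status( 400 ).json( { error: "Description is required" } );
		return;
	}
	// Placeholder for running the actual assessment.
	response.status( 200 ).json( { description: paper.getDescription() } );
}
```

A non-string description is now rejected by the validation step with its own error message, while an absent or empty one still produces the "Description is required" response.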


  4. Test accesses private _attributes directly
  // paperDtoSpec.js  
  expect( paper._attributes.wpBlocks ).toEqual( wpBlocks );

_attributes is internal state. If Paper has no public getter for wpBlocks, the test should assert behaviour (e.g. that the resulting analysis score changes) rather than poking at private state. As written, the test is coupled to Paper's internals and will need rewriting if they are ever restructured.


  5. undefined values passed explicitly to Paper
  const attributes = {
      keyword: keyphrase,      // undefined when neither keyphrase nor keyword supplied
      synonyms: data.synonyms, // undefined
      locale: data.locale,     // undefined
      ...
  }; 
  return new Paper( data.text, attributes );

The attributes object is built with explicit undefined values for absent optional fields. This is different from omitting the keys entirely. Paper's constructor likely handles undefined gracefully, but it's fragile: if it ever iterates Object.keys(attributes), it would see all keys, not just the ones with values. A defensive approach:

Object.fromEntries( Object.entries( attributes ).filter( ( [ , v ] ) => v !== undefined ) )
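To make the difference concrete, here is a small illustration of what key iteration sees before and after that filter. The `dropUndefined` name and the sample values are assumptions for illustration.

```javascript
// Hypothetical helper wrapping the filter shown above.
function dropUndefined( attributes ) {
	return Object.fromEntries(
		Object.entries( attributes ).filter( ( [ , value ] ) => value !== undefined )
	);
}

const attributes = { keyword: "cat food", synonyms: undefined, locale: undefined };
const supplied = dropUndefined( attributes );

console.log( Object.keys( attributes ) ); // [ 'keyword', 'synonyms', 'locale' ]
console.log( Object.keys( supplied ) ); // [ 'keyword' ]
console.log( "locale" in attributes, "locale" in supplied ); // true false
```

Falsy but defined values such as "" or false survive the filter, so only truly absent fields are dropped.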

  6. GET endpoints with request bodies (pre-existing, but worsened)

All routes use app.get(...) but read request.body. GET bodies are not part of the HTTP spec and are stripped by many proxies and CDNs. This is pre-existing, but this PR cements the pattern by wrapping it in paperFromRequest. Worth tracking — the right fix would be to switch these to POST, which is a separate task.

@FAMarfuaty

Contributor Author

Thanks for the review! Addressed in the latest push.

Build/process questions

  • yarn.lock — the zod resolution is already committed in the branch's yarn.lock (the working tree is clean on a fresh install). Any further local churn you saw is yarn/environment drift, not a missing change, so there's nothing extra to commit here.
  • Lint warnings — fixed: --max-warnings is now 18.
  • webpack: command not found - apps/* aren't part of the root Yarn workspace, so the app's webpack devDependency isn't installed by a root yarn install. Run yarn install && yarn build inside apps/content-analysis-webworker. I've clarified this in the test instructions.

Code findings

  1. createToPaper — we've decided not to pursue consumer extensibility for this contract, so there's no createToPaper. I've dropped the claim from the description/changelog; paperDtoSchema + toPaper are the whole surface.
  2. locale from raw body — routes now derive the language from paper.getLocale() via a small paperLanguage helper, instead of reaching back into request.body.
  3. meta-description/seo-title order — both now validate through the contract first, then gate on paper.getDescription()/paper.getTitle(), consistent with /analyze/keyphrase.
  4. test reads _attributes — Paper exposes no public getter for wpBlocks/shortcodes (only isFrontPage() is public), and asserting full analysis behaviour would make a focused mapping unit test slow and brittle. I've kept the _attributes assertion as the most direct check that the mapping lands; happy to revisit if we add public getters.
  5. explicit undefined - toPaper now filters undefined values so only supplied keys reach Paper.
  6. GET with request body — agreed it's pre-existing; switching to POST is a separate task, out of scope here.


Labels

changelog: non-user-facing Needs to be included in the 'Non-userfacing' category in the changelog innovation Innovative issue. Relating to performance, memory or data-flow.


3 participants