
docs: explain custom analyzer Docker language config #2037

Open
ynachiket wants to merge 1 commit into microsoft:main from ynachiket:ynachiket/docs-custom-docker-languages-1663-20260524

@ynachiket
Contributor

Change

Adds a focused installation docs section for customizing the analyzer Docker image when deploying additional NLP languages. The new section covers:

  • which analyzer/NLP/recognizer YAML files map to Docker build arguments
  • a minimal Spanish spaCy configuration example
  • building the image so referenced NLP models are installed
  • runtime config mounting when models already exist in the image
  • common pitfalls around supported_languages, NLP recognizer warnings, missing model downloads, and memory growth
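
The `supported_languages` pitfall listed above can be illustrated with a small check. This is a minimal sketch, not part of the PR: it assumes the analyzer configuration exposes a `supported_languages` list and the NLP configuration lists its models under `models` with a `lang_code` each, and it uses plain dictionaries in place of parsed YAML. The helper name `find_language_mismatches` is hypothetical.

```python
from typing import Dict, List, Set


def find_language_mismatches(analyzer_conf: Dict, nlp_conf: Dict) -> Dict[str, List[str]]:
    """Compare the analyzer's languages with the languages that have an NLP model.

    Both arguments are assumed to be already-parsed config mappings
    (hypothetical shapes, see the lead-in above).
    """
    supported: Set[str] = set(analyzer_conf.get("supported_languages", []))
    modeled: Set[str] = {m["lang_code"] for m in nlp_conf.get("models", [])}
    return {
        # Languages the analyzer advertises but no model is configured for.
        "missing_model": sorted(supported - modeled),
        # Models that would be installed but never used by the analyzer.
        "unused_model": sorted(modeled - supported),
    }


# Example: the analyzer lists English and Spanish, but only a Spanish model is configured.
analyzer_conf = {"supported_languages": ["en", "es"]}
nlp_conf = {
    "nlp_engine_name": "spacy",
    "models": [{"lang_code": "es", "model_name": "es_core_news_md"}],
}
mismatches = find_language_mismatches(analyzer_conf, nlp_conf)
print(mismatches)
```

Running a check like this before building the image surfaces a mismatch early, rather than at analyzer startup.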

Fixes #1663.

Validation

  • ruby -ryaml ... parsed the new YAML snippet and checked the Spanish model and recognizer language lists
  • rg content check for the new Docker language guidance, build arguments, Spanish model, and health/recognizer endpoints
  • local link target check for the referenced analyzer configuration docs
  • git diff --check

I did not run the full MkDocs build because the docs dependency set is not installed in this local checkout; this is a single-page Markdown documentation update.

Contributor

Copilot AI left a comment


Pull request overview

Adds Docker-focused installation guidance for customizing Presidio Analyzer images to support additional NLP languages, addressing the linked request for clearer YAML and build instructions.

Changes:

  • Documents analyzer Docker build arguments for analyzer, NLP, and recognizer registry configuration.
  • Adds a minimal Spanish spaCy combined-configuration example and custom image build/run commands.
  • Lists common pitfalls around language alignment, NLP recognizers, missing model downloads, and runtime validation endpoints.
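
As a rough illustration of how configuration files map to build arguments, the sketch below assembles a `docker build` command line. It is not taken from the PR. The build argument names (`NLP_CONF_FILE`, `ANALYZER_CONF_FILE`, `RECOGNIZER_REGISTRY_CONF_FILE`) are assumed to match the analyzer Dockerfile and should be verified against it; the image tag, file paths, and the helper `docker_build_command` are hypothetical.

```python
from typing import Dict, List


def docker_build_command(tag: str, conf_files: Dict[str, str], context: str = ".") -> List[str]:
    """Return an argv list for `docker build` with one --build-arg per config file."""
    cmd: List[str] = ["docker", "build", "-t", tag]
    # Sort for a stable, reproducible command line.
    for arg_name, path in sorted(conf_files.items()):
        cmd += ["--build-arg", f"{arg_name}={path}"]
    cmd.append(context)
    return cmd


# Hypothetical paths; the build-arg names are an assumption to check against the Dockerfile.
command = docker_build_command(
    "presidio-analyzer-es",
    {
        "NLP_CONF_FILE": "conf/es_nlp.yaml",
        "ANALYZER_CONF_FILE": "conf/es_analyzer.yaml",
        "RECOGNIZER_REGISTRY_CONF_FILE": "conf/es_recognizers.yaml",
    },
)
print(" ".join(command))
```

Building the argument list in code keeps the three file paths in one place, which makes it harder to update one configuration file and forget its matching build argument.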


Development

Successfully merging this pull request may close these issues.

More elaborate description how to build custom Docker images for Presidio

2 participants