docs: add guide for building custom Docker images with additional languages #2054

Open
Jah-yee wants to merge 1 commit into microsoft:main from Jah-yee:fix/docker-docs-add-custom-languages
@Jah-yee Jah-yee commented Jun 3, 2026

Fixes #1663

What this PR does

Adds a new section, "Building custom Docker images for additional languages", to `docs/installation.md`, directly addressing issue #1663.

Changes

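  • Documents which YAML files to modify and what each controls (`default.yaml`, `default_recognizers.yaml`, `default_analyzer.yaml`)
  • Explains how to add language entries to `default_recognizers.yaml`
  • Provides a `docker build` command with `--build-arg` flags for custom configs
  • Explains how to add spaCy language models via `default.yaml`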
  • Documents three common pitfalls:
    • OOM from adding too many languages at once
    • NLP recognizer warnings after adding new languages
    • Memory tuning for production
  • Links to related docs (NLP engine config, supported entities, development)
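
The spaCy model step listed above can be sketched in Python as a dictionary edit. This is a minimal sketch, not the content of the PR: the keys `nlp_engine_name`, `models`, `lang_code`, and `model_name` are assumed to mirror the layout of `default.yaml`, and the German model name is only an illustrative choice.

```python
import copy

# Stand-in for the parsed NLP config. The key names are assumptions about the
# layout of default.yaml, not something confirmed by this PR.
nlp_config = {
    "nlp_engine_name": "spacy",
    "models": [{"lang_code": "en", "model_name": "en_core_web_lg"}],
}


def add_model(config, lang_code, model_name):
    """Return a copy of the config with a model registered for lang_code."""
    updated = copy.deepcopy(config)
    models = updated.setdefault("models", [])
    for model in models:
        if model["lang_code"] == lang_code:
            # Language already present: swap the model instead of duplicating it.
            model["model_name"] = model_name
            return updated
    models.append({"lang_code": lang_code, "model_name": model_name})
    return updated


# Hypothetical example: register a German spaCy model next to the English one.
updated_config = add_model(nlp_config, "de", "de_core_news_md")
for model in updated_config["models"]:
    print(model["lang_code"], model["model_name"])
```

Each extra model is loaded into memory at startup, which is why the pitfalls above recommend adding languages one at a time.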

Testing

  • `docs/installation.md` renders correctly as part of the Docusaurus docs build

…guages

Fixes microsoft#1663

- Documents which YAML files to modify (default.yaml, default_recognizers.yaml,
  default_analyzer.yaml) and what each controls
- Explains how to add language entries to default_recognizers.yaml
- Provides docker build command with --build-arg flags for custom configs
- Explains how to add spaCy language models via default.yaml
- Documents three common pitfalls: OOM from too many languages, NLP recognizer
  warnings, and memory tuning for production
- Links to related docs (NLP engine config, supported entities, development)
Copilot AI review requested due to automatic review settings June 3, 2026 06:12

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the installation documentation and adds guidance for building custom Docker images to support additional languages beyond English.

Changes:

  • Reformats/rewrites the existing installation instructions while keeping the same structure (pip, Docker, source).
  • Adds a new section explaining how to customize analyzer Docker builds for additional languages via YAML config and spaCy models.
  • Includes operational notes/pitfalls for multi-language deployments (memory/worker tuning).

Comment thread docs/installation.md
## Description

This document describes the installation of the entire
Presidio suite using `pip` (as Python packages) or using `Docker` (As containerized services).
Comment thread docs/installation.md

### Supported Python Versions

Presidio is supported for the following python versions:
Comment thread docs/installation.md
Comment on lines +180 to +184
| Build argument | Default value | Purpose |
|---|---|---|
| `NLP_CONF_FILE` | `presidio_analyzer/conf/default.yaml` | Defines which NLP engine and model to use |
| `ANALYZER_CONF_FILE` | `presidio_analyzer/conf/default_analyzer.yaml` | Analyzer behavior (thresholds, entity mapping) |
| `RECOGNIZER_REGISTRY_CONF_FILE` | `presidio_analyzer/conf/default_recognizers.yaml` | Which recognizers to load per language |
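
As a rough illustration of how the three build arguments in the table could be passed to `docker build`, the sketch below assembles the command line in Python. The build-argument names come from the table; the custom config paths, the image tag, and the build context are hypothetical placeholders, not values taken from the PR.

```python
import shlex

# Build-argument names come from the table above. The file paths are
# hypothetical placeholders for configs you would supply yourself.
BUILD_ARGS = {
    "NLP_CONF_FILE": "presidio_analyzer/conf/custom_nlp.yaml",
    "ANALYZER_CONF_FILE": "presidio_analyzer/conf/default_analyzer.yaml",
    "RECOGNIZER_REGISTRY_CONF_FILE": "presidio_analyzer/conf/custom_recognizers.yaml",
}


def docker_build_command(build_args, tag, context="."):
    """Return the argv list for `docker build` with one --build-arg per entry."""
    argv = ["docker", "build"]
    for name, value in build_args.items():
        argv += ["--build-arg", f"{name}={value}"]
    argv += ["-t", tag, context]
    return argv


# The tag is a hypothetical image name; the context defaults to the current directory.
command = docker_build_command(BUILD_ARGS, tag="presidio-analyzer-custom")
print(shlex.join(command))
```

The list can be handed to `subprocess.run(command, check=True)` from the directory that holds the analyzer Dockerfile; printing it first makes it easy to review the flags before starting a long build.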
Comment thread docs/installation.md
Comment on lines +195 to +199
  - name: SpacyRecognizer
    supported_languages:
      - language: en
      - language: de  # ← add your language here
    type: predefined
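
The same edit to the recognizer entry above can be expressed programmatically. The following is a minimal sketch that works on an in-memory dictionary rather than the real file; the top-level `recognizers` key and the helper name `add_language` are assumptions made for illustration.

```python
import copy

# In-memory stand-in for the parsed registry YAML. The top-level "recognizers"
# key is an assumption; the entry itself mirrors the snippet above.
registry = {
    "recognizers": [
        {
            "name": "SpacyRecognizer",
            "supported_languages": [{"language": "en"}],
            "type": "predefined",
        }
    ]
}


def add_language(registry, recognizer_name, language):
    """Return a copy of the registry with `language` added to one recognizer."""
    updated = copy.deepcopy(registry)
    for entry in updated["recognizers"]:
        if entry["name"] != recognizer_name:
            continue
        languages = entry.setdefault("supported_languages", [])
        # Skip the append if the language is already listed.
        if not any(item["language"] == language for item in languages):
            languages.append({"language": language})
        return updated
    raise KeyError(f"recognizer not found: {recognizer_name}")


updated_registry = add_language(registry, "SpacyRecognizer", "de")
spacy_entry = updated_registry["recognizers"][0]
print([item["language"] for item in spacy_entry["supported_languages"]])
```

Returning a copy keeps the original configuration untouched, so the change can be compared against the default before it is written back to disk.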
Comment thread docs/installation.md
```

Not all recognizers support every language. Check the recognizer's source code or the
[supported entities list](../supported_entities.md) for per-language coverage.
Comment thread docs/installation.md
Comment on lines +278 to +279
- [NLP engine configuration](../analyzer/nlp_engines/spacy_stanza.md)
- [Supported entities list](../supported_entities.md)
Development

Successfully merging this pull request may close these issues.

More elaborate description how to build custom Docker images for Presidio
