docs: add guide for building custom Docker images with additional languages#2054
Open
Jah-yee wants to merge 1 commit into
…guages

Fixes microsoft#1663

- Documents which YAML files to modify (default.yaml, default_recognizers.yaml, default_analyzer.yaml) and what each controls
- Explains how to add language entries to default_recognizers.yaml
- Provides docker build command with --build-arg flags for custom configs
- Explains how to add spaCy language models via default.yaml
- Documents three common pitfalls: OOM from too many languages, NLP recognizer warnings, and memory tuning for production
- Links to related docs (NLP engine config, supported entities, development)
Contributor
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates the installation documentation and adds guidance for building custom Docker images to support additional languages beyond English.
Changes:
- Reformats/rewrites the existing installation instructions while keeping the same structure (pip, Docker, source).
- Adds a new section explaining how to customize analyzer Docker builds for additional languages via YAML config and spaCy models.
- Includes operational notes/pitfalls for multi-language deployments (memory/worker tuning).
## Description

This document describes the installation of the entire
Presidio suite using `pip` (as Python packages) or using `Docker` (As containerized services).

### Supported Python Versions

Presidio is supported for the following python versions:
Comment on lines +180 to +184
| Build argument | Default value | Purpose |
|---|---|---|
| `NLP_CONF_FILE` | `presidio_analyzer/conf/default.yaml` | Defines which NLP engine and model to use |
| `ANALYZER_CONF_FILE` | `presidio_analyzer/conf/default_analyzer.yaml` | Analyzer behavior (thresholds, entity mapping) |
| `RECOGNIZER_REGISTRY_CONF_FILE` | `presidio_analyzer/conf/default_recognizers.yaml` | Which recognizers to load per language |
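
To make the table concrete, here is a minimal sketch of how the three build arguments might be assembled into a `docker build` invocation. It only constructs and prints the command; it does not run Docker. The custom config file names, the image tag, and the build context path are assumptions for illustration and are not taken from the PR.

```python
import shlex

# Build arguments named in the table above. The two "custom_*" paths are
# hypothetical files, not files that ship with Presidio.
build_args = {
    "NLP_CONF_FILE": "presidio_analyzer/conf/custom_nlp.yaml",
    "ANALYZER_CONF_FILE": "presidio_analyzer/conf/default_analyzer.yaml",
    "RECOGNIZER_REGISTRY_CONF_FILE": "presidio_analyzer/conf/custom_recognizers.yaml",
}


def docker_build_command(args, tag, context):
    """Return a `docker build` argv list with one --build-arg per entry."""
    cmd = ["docker", "build"]
    for name, value in args.items():
        cmd += ["--build-arg", f"{name}={value}"]
    cmd += ["-t", tag, context]
    return cmd


# The image tag and build context are assumptions for illustration.
command = docker_build_command(
    build_args, "presidio-analyzer-custom", "./presidio-analyzer"
)
print(shlex.join(command))
```

Keeping the arguments in a mapping makes it easy to override only the file that changes while leaving the other two at their defaults.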
Comment on lines +195 to +199
```yaml
- name: SpacyRecognizer
  supported_languages:
    - language: en
    - language: de  # ← add your language here
  type: predefined
```

Not all recognizers support every language. Check the recognizer's source code or the
[supported entities list](../supported_entities.md) for per-language coverage.
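
The edit shown in this hunk can also be expressed programmatically. The sketch below works on an already-parsed recognizer registry (a plain dict, as a YAML loader would produce) and appends a language to one recognizer only if it is not already listed. The top-level `recognizers` key and the helper name `add_language` are assumptions for illustration; the entry itself mirrors the quoted YAML.

```python
# Parsed form of the recognizer entry quoted above. The top-level
# "recognizers" key is an assumption about the file layout.
registry = {
    "recognizers": [
        {
            "name": "SpacyRecognizer",
            "supported_languages": [{"language": "en"}],
            "type": "predefined",
        }
    ]
}


def add_language(registry, recognizer_name, language):
    """Add `language` to the named recognizer; return True if it was added."""
    for entry in registry["recognizers"]:
        if entry["name"] != recognizer_name:
            continue
        langs = entry.setdefault("supported_languages", [])
        if any(item.get("language") == language for item in langs):
            return False  # already listed, nothing to do
        langs.append({"language": language})
        return True
    raise KeyError(f"no recognizer named {recognizer_name!r}")


added = add_language(registry, "SpacyRecognizer", "de")
languages = [
    item["language"]
    for item in registry["recognizers"][0]["supported_languages"]
]
print(added, languages)
```

Raising on an unknown recognizer name, rather than silently adding one, keeps a typo in the name from producing a config that loads but never matches.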
Comment on lines +278 to +279
- [NLP engine configuration](../analyzer/nlp_engines/spacy_stanza.md)
- [Supported entities list](../supported_entities.md)
Fixes #1663
What this PR does
Adds a new section Building custom Docker images for additional languages to , directly addressing issue #1663.
Changes
Testing