feat(analyzer): Deduplicate NER model instances in multilingual configuration to reduce memory usage by yuriihavrylko · Pull Request #2052 · microsoft/presidio

yuriihavrylko · 2026-05-31T15:12:55Z

Change Description

Problem

When running Presidio Analyzer with a multilingual HuggingFace NER configuration (e.g., 8 languages), the RecognizerListLoader creates one recognizer instance per language. Each instance calls load() during __init__, loading a separate copy of the same model into memory. For a HuggingFace transformer model (~400MB), 8 languages means ~3.2GB of duplicated weights. The same issue affects GLiNER and spaCy multilingual models (xx_ent_wiki_sm).

Before fix: ~3.5GB memory (Docker, CPU, 8 languages with dslim/bert-base-NER)
After fix: ~1.1GB memory (same setup)

The same improvement applies to GPU deployments - model weights are the bottleneck regardless of device (RAM or VRAM).

Root causes

HuggingFaceNerRecognizer: N identical HuggingFace pipelines loaded (one per language)
GLiNERRecognizer: N identical GLiNER models loaded (one per language)
SpacyNlpEngine: N identical spaCy Language instances for the same xx_ent_wiki_sm model, each with independently growing StringStore/Vocab

Changes

Model sharing (memory fix)

HuggingFaceNerRecognizer: added class-level _shared_pipelines cache keyed by (model_name, tokenizer_name, aggregation_strategy, device). Instances with identical config reuse the same pipeline.
GLiNERRecognizer: added class-level _shared_models cache keyed by (model_name, map_location, load_onnx_model, onnx_model_file). Same pattern.
SpacyNlpEngine.load(): uses a local loaded_models dict to share a single spacy.Language instance when multiple languages use the same model_name.
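
The class-level cache pattern described above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: `CachedRecognizer` and the `FakeModel` stand-in are hypothetical names, and a real recognizer would call the HuggingFace or GLiNER loader where `FakeModel(...)` appears.

```python
from typing import Any, ClassVar, Dict, Optional, Tuple


class FakeModel:
    """Stand-in for an expensive model object (hypothetical, for illustration)."""

    load_count = 0  # counts how many times a "model" was actually loaded

    def __init__(self, model_name: str) -> None:
        FakeModel.load_count += 1
        self.model_name = model_name


class CachedRecognizer:
    """Hypothetical recognizer that shares models through a class-level cache."""

    # One entry per distinct configuration, shared by all instances.
    _shared_models: ClassVar[Dict[Tuple[Any, ...], FakeModel]] = {}

    def __init__(
        self,
        model_name: str,
        supported_language: str,
        device: Optional[str] = None,
    ) -> None:
        self.supported_language = supported_language
        # The key must include every setting that changes the loaded object.
        key = (model_name, device)
        if key not in self._shared_models:
            self._shared_models[key] = FakeModel(model_name)
        self.model = self._shared_models[key]


# Three languages, one configuration: the model is loaded once and reused.
recognizers = [
    CachedRecognizer("dslim/bert-base-NER", lang) for lang in ("en", "es", "de")
]
```

Any setting left out of the key would cause instances with different configurations to silently share one model, which is why the PR's keys list every load-affecting parameter.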

Tests

Added sharing/non-sharing tests for all three recognizers
Added autouse cache-clearing fixtures to prevent test pollution

Thread safety note

The class-level caches are safe for multiprocessing deployments (e.g., gunicorn sync workers, which is the default). For multithreaded deployments, a lock would be needed - but Presidio's standard deployment pattern uses multiprocessing.
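
For multithreaded deployments, the lock mentioned above could look roughly like this. It is a sketch under assumptions: `SharedModelCache` and `get_or_load` are hypothetical names, not Presidio APIs, and `object()` stands in for a real model load.

```python
import threading
from typing import Any, Callable, Dict, Hashable, List


class SharedModelCache:
    """Hypothetical thread-safe model cache; not part of Presidio."""

    def __init__(self) -> None:
        self._models: Dict[Hashable, Any] = {}
        self._lock = threading.Lock()

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        # Holding the lock across the load guarantees one load per key,
        # at the cost of serializing loads for different keys.
        with self._lock:
            if key not in self._models:
                self._models[key] = loader()
            return self._models[key]


cache = SharedModelCache()
load_calls: List[int] = []
results: List[Any] = []


def load_model() -> object:
    load_calls.append(1)  # record each real load
    return object()


def worker() -> None:
    results.append(cache.get_or_load("ner", load_model))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two threads could both miss the cache and each load a copy, which would reintroduce the duplication this PR removes.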

Issue reference

No linked issue

Checklist

I have reviewed the contribution guidelines
I have signed the CLA (if required)
My code includes unit tests
All unit tests and lint checks pass locally
My PR contains documentation updates / additions if required


omri374 · 2026-06-01T06:34:05Z

+    # Class-level cache for sharing GLiNER models across instances.
+    # Keyed by (model_name, map_location, load_onnx_model, onnx_model_file).
+    # Avoids loading duplicate copies when the same model serves multiple languages.
+    _shared_models: dict = {}


Can we think of an alternative approach using dependency injection or a model registry? With a class-level cache, the model would never get released.
For example:

    model = GLiNER.from_pretrained(...)
    recognizer_en = GLiNERRecognizer(..., model=model)
    recognizer_es = GLiNERRecognizer(..., model=model)
    recognizer_fr = GLiNERRecognizer(..., model=model)

Or

self.gliner = GLiNERModelRegistry.get_model(...)

The user should be able to control this model registry (add, remove, update)
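
One possible shape for such a user-controlled registry is sketched below. All names here (`ModelRegistry`, `add`, `update`, `remove`) are assumptions made for illustration. The snippet above calls `get_model` on a class, whereas this sketch uses an instance so that its lifetime is explicit.

```python
from typing import Any, Dict


class ModelRegistry:
    """Hypothetical user-controlled registry; names are illustrative only."""

    def __init__(self) -> None:
        self._models: Dict[str, Any] = {}

    def add(self, name: str, model: Any) -> None:
        if name in self._models:
            raise KeyError(f"model {name!r} is already registered")
        self._models[name] = model

    def update(self, name: str, model: Any) -> None:
        if name not in self._models:
            raise KeyError(f"model {name!r} is not registered")
        self._models[name] = model

    def remove(self, name: str) -> Any:
        # Dropping the registry's reference lets the model be garbage
        # collected once no recognizer holds it any more.
        return self._models.pop(name)

    def get_model(self, name: str) -> Any:
        return self._models[name]


registry = ModelRegistry()
registry.add("gliner", object())  # object() stands in for a loaded model
shared = registry.get_model("gliner")
```

Because the user owns the registry object, releasing a model becomes an explicit `remove` call rather than something tied to the lifetime of a class attribute.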

yuriihavrylko · 2026-06-04

Good point - the initial approach is naive, but it demonstrates the issue and the potential gains in multilingual setups. My intent was to support both the programmatic and the YAML config-based use cases.

Here are the options I'm considering based on DI and model registry ideas:

Option A: DI + loader-level sharing
Dependency injection as you have shown, plus:

    # YAML path:
    # RecognizerListLoader detects same-model recognizers and
    # injects the loaded model into subsequent instances

Simple, no new classes. But couples the loader to each recognizer's internals (gliner_model= vs ner_pipeline=, different cache key shapes). Every new recognizer type would need loader changes.

Option B: ModelRegistry

    # Programmatic: the user controls the registry
    registry = ModelRegistry()
    rec_en = GLiNERRecognizer(model_registry=registry, supported_language="en")
    rec_es = GLiNERRecognizer(model_registry=registry, supported_language="es")
    # First instance loads and registers, second reuses

    # Or direct injection (no registry needed):
    model = GLiNER.from_pretrained(...)
    rec = GLiNERRecognizer(gliner_model=model, ...)

    # YAML path: automatic
    # RecognizerListLoader creates a ModelRegistry and injects it
    # into recognizers that accept `model_registry` parameter

The caching logic (key shape, what to store) stays inside each recognizer - the loader just provides the shared bucket and doesn't need model-specific knowledge.
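
To make the division of responsibilities concrete, here is a rough sketch of how a loader might inject a registry without model-specific knowledge. Everything in it is hypothetical: `build_recognizers` stands in for the loader, the two recognizer classes are toy examples, and `object` stands in for a real model load.

```python
import inspect
from typing import Any, Callable, Dict, Hashable, List, Optional


class ModelRegistry:
    """Hypothetical shared bucket; recognizers choose the key and the loader."""

    def __init__(self) -> None:
        self._models: Dict[Hashable, Any] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        if key not in self._models:
            self._models[key] = loader()
        return self._models[key]


class RegistryAwareRecognizer:
    """Hypothetical model-backed recognizer that accepts a registry."""

    def __init__(
        self,
        model_name: str,
        supported_language: str,
        model_registry: Optional[ModelRegistry] = None,
    ) -> None:
        self.supported_language = supported_language
        if model_registry is None:
            model_registry = ModelRegistry()  # private bucket, no sharing
        # The key shape stays inside the recognizer, not the loader.
        self.model = model_registry.get_or_load(("fake", model_name), object)


class PatternOnlyRecognizer:
    """Hypothetical recognizer without a model, so no registry parameter."""

    def __init__(self, supported_language: str) -> None:
        self.supported_language = supported_language


def build_recognizers(specs: List[Dict[str, Any]]) -> List[Any]:
    """Create recognizers, injecting one registry wherever it is accepted."""
    registry = ModelRegistry()
    built = []
    for spec in specs:
        cls, kwargs = spec["cls"], dict(spec["kwargs"])
        if "model_registry" in inspect.signature(cls.__init__).parameters:
            kwargs["model_registry"] = registry
        built.append(cls(**kwargs))
    return built


recognizers = build_recognizers([
    {"cls": RegistryAwareRecognizer,
     "kwargs": {"model_name": "m", "supported_language": "en"}},
    {"cls": RegistryAwareRecognizer,
     "kwargs": {"model_name": "m", "supported_language": "es"}},
    {"cls": PatternOnlyRecognizer, "kwargs": {"supported_language": "en"}},
])
```

The loader only checks whether a constructor accepts `model_registry`, so adding a new model-backed recognizer would not require loader changes.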

Option C: Multi-language recognizer

GLiNERRecognizer(supported_languages=["en", "es", "de", ...])

Eliminates the problem at the root - one instance, one model, no sharing needed. But EntityRecognizer is built around supported_language (singular). This is the cleanest long-term solution, but it is a major refactor and out of scope for a bug fix or small improvement.

Option D: Lazy loading
Remove self.load() from EntityRecognizer.__init__, load on first analyze() call instead. Makes DI/registry/sharing trivially easy since all instances are created cheaply and models configured afterward. But it's a base class behavior change that affects every recognizer.
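
A rough sketch of what lazy loading could look like, assuming a simplified recognizer. `LazyRecognizer` is a hypothetical class, not the real EntityRecognizer, and the string assigned in `load()` is a placeholder for a real model.

```python
from typing import Any, List, Optional


class LazyRecognizer:
    """Hypothetical recognizer that defers load() until the first analyze()."""

    def __init__(self, model_name: str, model: Optional[Any] = None) -> None:
        self.model_name = model_name
        self.model = model  # may also be assigned after construction
        self.load_count = 0

    def load(self) -> None:
        self.load_count += 1
        self.model = f"loaded:{self.model_name}"  # placeholder for a real load

    def analyze(self, text: str) -> List[str]:
        if self.model is None:
            self.load()  # the first call pays the loading cost
        return [f"{self.model} saw {len(text)} chars"]


rec_en = LazyRecognizer("ner")
rec_es = LazyRecognizer("ner")   # cheap: nothing is loaded yet
first = rec_en.analyze("hello")  # triggers the only load
rec_es.model = rec_en.model      # share before rec_es ever loads
second = rec_es.analyze("hola")
```

The trade-off is that the first request absorbs the load latency, and load failures surface at analysis time rather than at construction.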

I prefer option B for this PR - covers both use cases, keeps the loader generic, gives the user full control.

Options C and D are worth considering as longer-term architectural improvements.

What do you think?

omri374 · 2026-06-01T14:30:02Z

Thanks @yuriihavrylko, great addition! I've added one comment on the design - let's discuss.

yuriihavrylko added 4 commits May 31, 2026 11:49

4315b87 feat: implement model sharing for spacy, gliner and HuggingFace recognizers to avoid in-memory duplicates
5c6bc67 test: add shared model caching tests for spacy, gliner and HuggingFace recognizers
e2e9a08 fix: exclude None values in recognizer registry configuration validation
0e04855 docs: add configuration files for multilingual support in analyzer, nlp engine, and recognizers

Copilot AI review requested due to automatic review settings May 31, 2026 15:12

github-actions Bot added the external label May 31, 2026

omri374 reviewed Jun 1, 2026

yuriihavrylko wants to merge 4 commits into microsoft:main from yuriihavrylko:feat/deduplicate-ner-model-instances
