feat(analyzer): Deduplicate NER model instances in multilingual configuration to reduce memory usage#2052

Open
yuriihavrylko wants to merge 4 commits into
microsoft:mainfrom
yuriihavrylko:feat/deduplicate-ner-model-instances

Conversation

@yuriihavrylko
Contributor

Change Description

Problem

When running Presidio Analyzer with a multilingual HuggingFace NER configuration (e.g., 8 languages), the RecognizerListLoader creates one recognizer instance per language. Each instance calls load() during __init__, loading a separate copy of the same model into memory. For a HuggingFace transformer model (~400MB), 8 languages means ~3.2GB of duplicated weights. The same issue affects GLiNER and spaCy multilingual models (xx_ent_wiki_sm).

Before fix: ~3.5GB memory (Docker, CPU, 8 languages with dslim/bert-base-NER)
After fix: ~1.1GB memory (same setup)

The same improvement applies to GPU deployments - model weights are the bottleneck regardless of device (RAM or VRAM).

Root causes

  1. HuggingFaceNerRecognizer: N identical HuggingFace pipelines loaded (one per language)
  2. GLiNERRecognizer: N identical GLiNER models loaded (one per language)
  3. SpacyNlpEngine: N identical spaCy Language instances for the same xx_ent_wiki_sm model, each with independently growing StringStore/Vocab

Changes

Model sharing (memory fix)

  • HuggingFaceNerRecognizer: added class-level _shared_pipelines cache keyed by (model_name, tokenizer_name, aggregation_strategy, device). Instances with identical config reuse the same pipeline.
  • GLiNERRecognizer: added class-level _shared_models cache keyed by (model_name, map_location, load_onnx_model, onnx_model_file). Same pattern.
  • SpacyNlpEngine.load(): uses a local loaded_models dict to share a single spacy.Language instance when multiple languages use the same model_name.
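The class-level cache pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `CachedRecognizer`, `fake_loader`, and the two-element key are hypothetical stand-ins, and the real recognizers use the longer keys listed in the bullets.

```python
from typing import Any, Callable, ClassVar, Dict, Tuple


class CachedRecognizer:
    """Hypothetical stand-in for a recognizer; not Presidio's actual class."""

    # Shared by every instance; keyed by the config that determines the model.
    _shared_models: ClassVar[Dict[Tuple[Any, ...], Any]] = {}

    def __init__(
        self,
        model_name: str,
        device: str,
        supported_language: str,
        loader: Callable[[str, str], Any],
    ) -> None:
        self.supported_language = supported_language
        key = (model_name, device)
        if key not in self._shared_models:
            # Only the first instance with this config pays the loading cost.
            self._shared_models[key] = loader(model_name, device)
        self.model = self._shared_models[key]


load_calls = []


def fake_loader(model_name: str, device: str) -> object:
    """Placeholder for an expensive model load; records each call."""
    load_calls.append((model_name, device))
    return object()


recognizers = [
    CachedRecognizer("dslim/bert-base-NER", "cpu", lang, fake_loader)
    for lang in ("en", "es", "fr")
]
```

Language is deliberately left out of the key, so three languages trigger a single load. Any parameter that changes the loaded weights or device has to be in the key, otherwise instances with different configs would silently share a model.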

Tests

  • Added sharing/non-sharing tests for all three recognizers
  • Added autouse cache-clearing fixtures to prevent test pollution

Thread safety note

The class-level caches are safe for multiprocessing deployments (e.g., gunicorn sync workers, which is the default). For multithreaded deployments, a lock would be needed - but Presidio's standard deployment pattern uses multiprocessing.

Issue reference

No linked issue

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

Copilot AI review requested due to automatic review settings May 31, 2026 15:12
# Class-level cache for sharing GLiNER models across instances.
# Keyed by (model_name, map_location, load_onnx_model, onnx_model_file).
# Avoids loading duplicate copies when the same model serves multiple languages.
_shared_models: dict = {}
Collaborator

Can we think of an alternative approach using dependency injection or a model registry? With a class-level cache like this, the model would never get released.
For example:

model = GLiNER.from_pretrained(...)

recognizer_en = GLiNERRecognizer(..., model=model)
recognizer_es = GLiNERRecognizer(..., model=model)
recognizer_fr = GLiNERRecognizer(..., model=model)

Or

self.gliner = GLiNERModelRegistry.get_model(...)

The user should be able to control this model registry (add, remove, update)

Contributor Author

Good point - the initial approach is naive, but it demonstrates that the issue is real and shows the potential gains in multilingual setups. My intent was to implement it for both the programmatic and the YAML config-based use cases.

Here are the options I'm considering based on DI and model registry ideas:

Option A: DI + loader-level sharing
Dependency injection as you've shown, plus:

# YAML path:
# RecognizerListLoader detects same-model recognizers and
# injects the loaded model into subsequent instances

Simple, no new classes. But couples the loader to each recognizer's internals (gliner_model= vs ner_pipeline=, different cache key shapes). Every new recognizer type would need loader changes.
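To make the coupling concrete, here is a rough sketch of what loader-level injection might look like. Everything in it is hypothetical (`FakeGliner`, `INJECT_KWARG`, `build_recognizers`); only the `gliner_model=` keyword name is taken from the discussion above.

```python
from typing import Any, Dict, List, Tuple, Type


class FakeGliner:
    """Hypothetical recognizer that accepts an already-loaded model."""

    def __init__(
        self, supported_language: str, model_name: str, gliner_model: Any = None
    ) -> None:
        self.supported_language = supported_language
        # Stand-in for an expensive load when nothing is injected.
        self.model = gliner_model if gliner_model is not None else object()


# The loader must know, per recognizer type, which keyword receives the model.
INJECT_KWARG: Dict[Type, str] = {FakeGliner: "gliner_model"}


def build_recognizers(configs: List[Tuple[Type, Dict[str, Any]]]) -> List[Any]:
    """Create recognizers, injecting the first loaded model into later ones."""
    loaded: Dict[Tuple[Type, str], Any] = {}
    result = []
    for cls, kwargs in configs:
        key = (cls, kwargs["model_name"])
        if key in loaded:
            kwargs = {**kwargs, INJECT_KWARG[cls]: loaded[key]}
        recognizer = cls(**kwargs)
        loaded.setdefault(key, recognizer.model)
        result.append(recognizer)
    return result


recs = build_recognizers(
    [
        (FakeGliner, {"supported_language": "en", "model_name": "some-model"}),
        (FakeGliner, {"supported_language": "es", "model_name": "some-model"}),
    ]
)
```

The `INJECT_KWARG` table and the hard-coded key shape are exactly the per-recognizer knowledge that would have to grow with every new recognizer type.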

Option B: ModelRegistry

# Programmatic — user controls the registry:
registry = ModelRegistry()
rec_en = GLiNERRecognizer(model_registry=registry, supported_language="en")
rec_es = GLiNERRecognizer(model_registry=registry, supported_language="es")
# First instance loads and registers, second reuses

# Or direct injection (no registry needed):
model = GLiNER.from_pretrained(...)
rec = GLiNERRecognizer(gliner_model=model, ...)

# YAML path — automatic:
# RecognizerListLoader creates a ModelRegistry and injects it
# into recognizers that accept `model_registry` parameter

The caching logic (key shape, what to store) stays inside each recognizer - the loader just provides the shared bucket and doesn't need model-specific knowledge.
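A minimal sketch of how such a registry could look, assuming a simple dict-backed design. The class and method names (`ModelRegistry.get_or_load`, `add`, `remove`, `DemoRecognizer`) are placeholders for discussion, not a settled API.

```python
from typing import Any, Callable, Dict, Hashable


class ModelRegistry:
    """Hypothetical registry; names and methods are illustrative only."""

    def __init__(self) -> None:
        self._models: Dict[Hashable, Any] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the model for key, calling loader only on a miss."""
        if key not in self._models:
            self._models[key] = loader()
        return self._models[key]

    def add(self, key: Hashable, model: Any) -> None:
        """Register a model, replacing any existing entry (add or update)."""
        self._models[key] = model

    def remove(self, key: Hashable) -> None:
        """Drop the registry's reference to a model, if present."""
        self._models.pop(key, None)


class DemoRecognizer:
    """Hypothetical recognizer that owns its key shape but not the storage."""

    def __init__(
        self,
        model_name: str,
        supported_language: str,
        model_registry: ModelRegistry,
        loader: Callable[[str], Any],
    ) -> None:
        self.supported_language = supported_language
        self.model = model_registry.get_or_load(
            model_name, lambda: loader(model_name)
        )


loads = []


def fake_loader(model_name: str) -> object:
    """Placeholder for an expensive model load; records each call."""
    loads.append(model_name)
    return object()


registry = ModelRegistry()
rec_en = DemoRecognizer("some-model", "en", registry, fake_loader)
rec_es = DemoRecognizer("some-model", "es", registry, fake_loader)
registry.remove("some-model")
rec_fr = DemoRecognizer("some-model", "fr", registry, fake_loader)
```

Note that `remove` only drops the registry's own reference; the memory is freed once the recognizers still holding the model are dropped as well.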

Option C: Multi-language recognizer

GLiNERRecognizer(supported_languages=["en", "es", "de", ...])

Eliminates the problem at the root - one instance, one model, no sharing needed. But EntityRecognizer is built around supported_language (singular). This is the cleanest long-term solution, but it is a major refactor and out of scope for a bug fix or small improvement.

Option D: Lazy loading
Remove self.load() from EntityRecognizer.__init__, load on first analyze() call instead. Makes DI/registry/sharing trivially easy since all instances are created cheaply and models configured afterward. But it's a base class behavior change that affects every recognizer.
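Roughly, lazy loading could look like the sketch below. `LazyRecognizer` and `fake_loader` are hypothetical; the real change would live in the EntityRecognizer base class and would need to account for every subclass.

```python
from typing import Any, Callable, List, Optional, Tuple


class LazyRecognizer:
    """Hypothetical recognizer that defers model loading to first use."""

    def __init__(
        self,
        model_name: str,
        supported_language: str,
        loader: Callable[[str], Any],
        model: Optional[Any] = None,
    ) -> None:
        self.model_name = model_name
        self.supported_language = supported_language
        self._loader = loader
        self._model = model  # may be injected or shared after construction

    @property
    def model(self) -> Any:
        if self._model is None:
            # The first access triggers the load; later accesses reuse it.
            self._model = self._loader(self.model_name)
        return self._model

    def analyze(self, text: str) -> List[Tuple[str, str]]:
        return self.model(text)


loads = []


def fake_loader(model_name: str) -> Callable[[str], List[Tuple[str, str]]]:
    """Placeholder loader returning a trivial tagger that labels every token 'O'."""
    loads.append(model_name)
    return lambda text: [(token, "O") for token in text.split()]


rec = LazyRecognizer("some-model", "en", fake_loader)
loads_after_init = len(loads)  # nothing has been loaded yet
first = rec.analyze("hello world")
second = rec.analyze("again")
```

The trade-off is that the first analyze() call absorbs the load latency, which may matter for services that expect to be warm at startup.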

I prefer option B for this PR - covers both use cases, keeps the loader generic, gives the user full control.

Options C and D are worth considering as longer-term architectural improvements.

What do you think?

@omri374
Collaborator

omri374 commented Jun 1, 2026

Thanks @yuriihavrylko, great addition! I've added one comment on the design; let's discuss.
