Add Indian UPI ID recognizer for NPCI payment compliance #2036
Open
shauryaraghav wants to merge 2 commits into
Author
@microsoft-github-policy-service agree
Contributor
Pull request overview
Adds a new India-specific predefined recognizer (InUpiRecognizer) for detecting UPI (Unified Payments Interface) IDs of the form username@handle, with a high-confidence regex for known bank/PSP handles (score 0.7) and a generic medium-confidence regex (score 0.4), plus context words and unit tests.
Changes:
- New `InUpiRecognizer` class with two regex patterns and UPI-related context words.
- Exported the new class from the India package `__init__.py` and the top-level `predefined_recognizers/__init__.py`.
- Added parametrized unit tests covering known handles, unknown handles, invalid inputs, and a sentence example.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/india/in_upi_recognizer.py | New InUpiRecognizer with high/medium patterns and IN_UPI entity. |
| presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/india/__init__.py | Exports InUpiRecognizer from the India package. |
| presidio-analyzer/presidio_analyzer/predefined_recognizers/__init__.py | Re-exports InUpiRecognizer from the top-level recognizers package. |
| presidio-analyzer/tests/test_in_upi_recognizer.py | Adds parametrized tests for known/unknown handles and invalid inputs. |
| "SgUenRecognizer", | ||
| "InVoterRecognizer", | ||
| "InPassportRecognizer", | ||
| "InUpiRecognizer", |
Comment on lines +29 to +35
| r"\b([a-zA-Z0-9.\-_]{2,256}@(okicici|okhdfcbank|okaxis|oksbi|paytm|ybl|upi|apl|ibl|axl|waicici|wahdfcbank|timecosmos|rapl|mbk|ikwik|freecharge))\b", | ||
| 0.7, | ||
| ), | ||
| Pattern( | ||
| "UPI ID (Medium)", | ||
| r"\b([a-zA-Z0-9.\-_]{2,256}@[a-zA-Z]{2,64})\b", | ||
| 0.4, |
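The two patterns in this hunk can be exercised outside Presidio with the standard `re` module. The sketch below copies both regexes from the diff and picks the stronger match; the helper name `best_score` and the sample inputs are illustrative assumptions, not part of the PR. It also suggests one thing worth checking: the generic pattern appears to accept the local part and first domain label of an ordinary email address.

```python
import re

# Handle list and character class copied from the patterns in the diff above.
KNOWN_HANDLES = (
    "okicici|okhdfcbank|okaxis|oksbi|paytm|ybl|upi|apl|ibl|axl|"
    "waicici|wahdfcbank|timecosmos|rapl|mbk|ikwik|freecharge"
)
LOCAL_PART = r"[a-zA-Z0-9.\-_]{2,256}"

HIGH = re.compile(rf"\b({LOCAL_PART}@({KNOWN_HANDLES}))\b")
MEDIUM = re.compile(rf"\b({LOCAL_PART}@[a-zA-Z]{{2,64}})\b")


def best_score(text):
    """Hypothetical helper: return (matched_text, score) or None."""
    m = HIGH.search(text)
    if m:
        return m.group(1), 0.7  # known bank/PSP handle
    m = MEDIUM.search(text)
    if m:
        return m.group(1), 0.4  # generic username@handle
    return None


if __name__ == "__main__":
    for sample in ("shaurya@okicici", "someone@unknownbank", "john@gmail.com"):
        print(sample, "->", best_score(sample))
```

If the email overlap matters for this recognizer, the medium pattern may need a guard against a trailing `.tld`, or the overlap could be left to score ranking against the email recognizer.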
Comment on lines +29 to +31
| # Invalid UPI IDs | ||
| ("notaupiid", 0, (), ()), | ||
| ("@okicici", 0, (), ()), |
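For reference, the invalid-input cases above can be checked against the generic regex with a small table-driven loop. This is a minimal sketch using only the standard library: the three-field case shape `(text, expected_count, expected_spans)` and the sentence case are assumptions made for illustration, since the full test signature is not visible in this hunk.

```python
import re

# Generic (medium) pattern copied from the recognizer diff.
GENERIC = re.compile(r"\b([a-zA-Z0-9.\-_]{2,256}@[a-zA-Z]{2,64})\b")

# (text, expected_count, expected_spans); the last case is hypothetical.
CASES = [
    ("notaupiid", 0, ()),
    ("@okicici", 0, ()),
    ("pay shaurya@okicici now", 1, ((4, 19),)),
]


def find_spans(text):
    """Return the (start, end) span of every generic match in text."""
    return tuple(m.span(1) for m in GENERIC.finditer(text))


def run_cases():
    """Check every case and return how many were checked."""
    for text, expected_count, expected_spans in CASES:
        spans = find_spans(text)
        assert len(spans) == expected_count, text
        assert spans == expected_spans, text
    return len(CASES)
```

`@okicici` is rejected because the local part requires at least two characters before the `@`.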
| entities, | ||
| ): | ||
| results = recognizer.analyze(text, entities) | ||
| print(results) |
Comment on lines +5 to +22
| class InUpiRecognizer(PatternRecognizer): | ||
| """ | ||
| Recognizes Indian UPI (Unified Payments Interface) IDs. | ||
| | ||
| UPI IDs are used for digital payments in India and follow the format: | ||
| username@bankhandle (e.g., shaurya@okicici, 9876543210@paytm) | ||
| | ||
| Common UPI handles include: okicici, okhdfcbank, okaxis, paytm, | ||
| ybl, upi, apl, ibl, axl, timecosmos, waicici, wahdfcbank | ||
| | ||
| This recognizer identifies UPI IDs using regex and context words. | ||
| Reference: https://www.npci.org.in/what-we-do/upi/product-overview | ||
| | ||
| :param patterns: List of patterns to be used by this recognizer | ||
| :param context: List of context words to increase confidence in detection | ||
| :param supported_language: Language this recognizer supports | ||
| :param supported_entity: The entity this recognizer can detect | ||
| """ |
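The docstring says context words increase confidence in a detection. As a rough illustration of that idea, the sketch below raises a base score when a context word appears shortly before the match. It is a simplified stand-in, not Presidio's actual context enhancer (which works on NLP lemmas); the word list, the boost value, and the window size are all assumptions chosen for the example.

```python
import re

CONTEXT_WORDS = {"upi", "vpa", "payment"}  # hypothetical list, not from the PR
BOOST = 0.35      # assumed illustrative boost
MAX_SCORE = 1.0   # scores are capped at 1.0


def boosted_score(text, match_start, base_score, window=5):
    """Add BOOST if a context word is among the `window` words before the match."""
    preceding = re.findall(r"[a-z]+", text[:match_start].lower())[-window:]
    if CONTEXT_WORDS.intersection(preceding):
        return min(base_score + BOOST, MAX_SCORE)
    return base_score


if __name__ == "__main__":
    sentence = "my upi id is someone@unknownbank"
    start = sentence.index("someone")
    print(boosted_score(sentence, start, 0.4))
```

Under these assumptions, a medium-confidence match preceded by "upi" would end up scoring above an unboosted high-confidence match, which is worth keeping in mind when choosing the base scores.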
Change Description
Adds a new recognizer for Indian UPI (Unified Payments Interface) IDs.
Presidio currently supports Indian PII like Aadhaar, PAN, GSTIN and
Passport but has no support for UPI IDs, which are widely used for
digital payments across India.
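Two confidence levels: high (0.7) for IDs with a known bank/PSP handle, and medium (0.4) for any generic username@handle match.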
Issue reference
N/A - New feature addition
Checklist
Testing
Added 8 unit tests covering known handles, unknown handles, invalid inputs, and a sentence example.
All 8 tests pass.
Reference
https://www.npci.org.in/what-we-do/upi/product-overview