Add Indian UPI ID recognizer for NPCI payment compliance #2036

Open
shauryaraghav wants to merge 2 commits into microsoft:main from shauryaraghav:main

@shauryaraghav

Change Description

Adds a new recognizer for Indian UPI (Unified Payments Interface) IDs.

Presidio currently supports Indian PII such as Aadhaar, PAN, GSTIN, and
passport numbers, but has no support for UPI IDs, which are widely used for
digital payments across India.

Two confidence levels:

  • High (0.7): Matches known UPI handles such as okicici, okhdfcbank, okaxis, paytm, and ybl.
  • Medium (0.4): Matches any string in the username@bankhandle format, regardless of handle.
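
As a rough illustration of the two tiers, the sketch below applies the two regexes from this PR using only the standard `re` module. It does not use Presidio. The names `KNOWN_HANDLES`, `HIGH`, `MEDIUM`, and `score_upi_candidates` are hypothetical, and keeping only the higher score per span is an assumption made for the example, not a description of how Presidio merges results.

```python
import re

# Handle list copied from the PR's high-confidence pattern.
KNOWN_HANDLES = (
    "okicici|okhdfcbank|okaxis|oksbi|paytm|ybl|upi|apl|ibl|axl|"
    "waicici|wahdfcbank|timecosmos|rapl|mbk|ikwik|freecharge"
)

# The two regexes as they appear in the PR.
HIGH = re.compile(r"\b([a-zA-Z0-9.\-_]{2,256}@(" + KNOWN_HANDLES + r"))\b")
MEDIUM = re.compile(r"\b([a-zA-Z0-9.\-_]{2,256}@[a-zA-Z]{2,64})\b")


def score_upi_candidates(text):
    """Return (match, score) pairs, keeping the higher score for each span.

    Hypothetical helper for illustration only; the per-span merge is an
    assumption of this sketch.
    """
    best = {}
    for pattern, score in ((HIGH, 0.7), (MEDIUM, 0.4)):
        for m in pattern.finditer(text):
            span = m.span(1)
            if score > best.get(span, ("", 0.0))[1]:
                best[span] = (m.group(1), score)
    return [best[span] for span in sorted(best)]


print(score_upi_candidates("Pay shaurya@okicici today"))  # [('shaurya@okicici', 0.7)]
print(score_upi_candidates("someone@mybank"))             # [('someone@mybank', 0.4)]
print(score_upi_candidates("@okicici"))                   # []
```

With these patterns as written, an email address such as user@gmail.com also produces a medium-tier match on `user@gmail`, because the trailing `\b` is satisfied just before the dot.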

Issue reference

N/A - New feature addition

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

Testing

Added 8 unit tests covering:

  • Valid UPI IDs with known handles
  • Valid UPI IDs with unknown handles
  • Invalid formats
  • UPI IDs within sentences

All 8 tests pass.

Reference

https://www.npci.org.in/what-we-do/upi/product-overview

@shauryaraghav
Author

@microsoft-github-policy-service agree


Contributor

Copilot AI left a comment

Pull request overview

Adds a new India-specific predefined recognizer (InUpiRecognizer) for detecting UPI (Unified Payments Interface) IDs of the form username@handle, with a high-confidence regex for known bank/PSP handles (score 0.7) and a generic medium-confidence regex (score 0.4), plus context words and unit tests.

Changes:

  • New InUpiRecognizer class with two regex patterns and UPI-related context words.
  • Exported the new class from the India package __init__.py and the top-level predefined_recognizers/__init__.py.
  • Added parametrized unit tests covering known handles, unknown handles, invalid inputs, and a sentence example.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/india/in_upi_recognizer.py | New InUpiRecognizer with high/medium patterns and IN_UPI entity. |
| presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/india/__init__.py | Exports InUpiRecognizer from the India package. |
| presidio-analyzer/presidio_analyzer/predefined_recognizers/__init__.py | Re-exports InUpiRecognizer from the top-level recognizers package. |
| presidio-analyzer/tests/test_in_upi_recognizer.py | Adds parametrized tests for known/unknown handles and invalid inputs. |

    "SgUenRecognizer",
    "InVoterRecognizer",
    "InPassportRecognizer",
    "InUpiRecognizer",

Comment on lines +29 to +35
            r"\b([a-zA-Z0-9.\-_]{2,256}@(okicici|okhdfcbank|okaxis|oksbi|paytm|ybl|upi|apl|ibl|axl|waicici|wahdfcbank|timecosmos|rapl|mbk|ikwik|freecharge))\b",
            0.7,
        ),
        Pattern(
            "UPI ID (Medium)",
            r"\b([a-zA-Z0-9.\-_]{2,256}@[a-zA-Z]{2,64})\b",
            0.4,

Comment on lines +29 to +31
        # Invalid UPI IDs
        ("notaupiid", 0, (), ()),
        ("@okicici", 0, (), ()),

    entities,
):
    results = recognizer.analyze(text, entities)
    print(results)

Comment on lines +5 to +22
class InUpiRecognizer(PatternRecognizer):
    """
    Recognizes Indian UPI (Unified Payments Interface) IDs.

    UPI IDs are used for digital payments in India and follow the format:
    username@bankhandle (e.g., shaurya@okicici, 9876543210@paytm)

    Common UPI handles include: okicici, okhdfcbank, okaxis, paytm,
    ybl, upi, apl, ibl, axl, timecosmos, waicici, wahdfcbank

    This recognizer identifies UPI IDs using regex and context words.
    Reference: https://www.npci.org.in/what-we-do/upi/product-overview

    :param patterns: List of patterns to be used by this recognizer
    :param context: List of context words to increase confidence in detection
    :param supported_language: Language this recognizer supports
    :param supported_entity: The entity this recognizer can detect
    """