fix(safaa): warn when threshold argument cannot be honored by Valyrian-Code · Pull Request #52 · fossology/safaa

Valyrian-Code · 2026-05-23T11:03:34Z

Description

SafaaAgent.predict documents a configurable threshold parameter, but the shipped classifier (SGDClassifier(loss='hinge')) does not implement predict_proba. At runtime:

hasattr(self.false_positive_detector, "predict_proba")

returns False, execution falls through to the binary predict() branch, and threshold is silently ignored.
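For readers wondering how an attribute that is defined on the class can still fail a hasattr() check: the method is exposed conditionally, and accessing it raises AttributeError when the condition is not met. The sketch below mimics that mechanism in plain Python. `ConditionalMethod` and `ToyClassifier` are hypothetical names used only for illustration; this is not scikit-learn's actual implementation.

```python
class ConditionalMethod:
    """Descriptor that exposes a method only when check(instance) is true."""

    def __init__(self, func, check):
        self.func = func
        self.check = check

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if not self.check(instance):
            # hasattr() interprets AttributeError as "attribute is missing".
            raise AttributeError(
                f"predict_proba is not available when loss={instance.loss!r}"
            )
        return self.func.__get__(instance, owner)


def _predict_proba(self, texts):
    # Dummy probabilities; a real model would compute these.
    return [[0.5, 0.5] for _ in texts]


class ToyClassifier:
    """Hypothetical stand-in for a classifier with a configurable loss."""

    def __init__(self, loss):
        self.loss = loss

    def predict(self, texts):
        return [0 for _ in texts]

    predict_proba = ConditionalMethod(
        _predict_proba,
        check=lambda est: est.loss in ("log_loss", "modified_huber"),
    )


print(hasattr(ToyClassifier("hinge"), "predict_proba"))     # False
print(hasattr(ToyClassifier("log_loss"), "predict_proba"))  # True
```

Any code that branches on hasattr(model, "predict_proba") therefore takes the binary path for a hinge-loss model without raising an error.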

This PR emits a UserWarning when a non-default threshold is passed to a model that cannot honor it, and documents the limitation in the docstring. Default usage remains warning-free, so existing callers are unaffected.
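The control flow described above can be sketched roughly as follows. This is a simplified illustration, not the code in Safaa.py: `predict_with_threshold`, `DEFAULT_THRESHOLD`, and `HingeLikeModel` are hypothetical names, and the stand-in model only imitates a classifier without predict_proba.

```python
import warnings

DEFAULT_THRESHOLD = 0.5  # assumed default, matching the "threshold != 0.5" check


def predict_with_threshold(model, texts, threshold=DEFAULT_THRESHOLD):
    """Return one boolean per text (True means the positive class)."""
    if hasattr(model, "predict_proba"):
        # Probabilistic path: compare the positive-class probability
        # against the caller's threshold.
        return [row[1] >= threshold for row in model.predict_proba(texts)]

    if threshold != DEFAULT_THRESHOLD:
        # The caller asked for a custom threshold that cannot be honored.
        warnings.warn(
            "The loaded model does not support probability estimates "
            "(predict_proba); the 'threshold' argument is being ignored.",
            UserWarning,
            stacklevel=2,
        )
    # Binary path: the model's own decision boundary decides.
    return [bool(label) for label in model.predict(texts)]


class HingeLikeModel:
    """Hypothetical stand-in for a classifier that only offers predict()."""

    def predict(self, texts):
        return [0 for _ in texts]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    predict_with_threshold(HingeLikeModel(), ["Copyright 2024 Foo"])       # silent
    predict_with_threshold(HingeLikeModel(), ["Copyright 2024 Foo"], 0.3)  # warns

print(len(caught))  # 1
```

Warning only on a non-default threshold keeps existing callers quiet while still flagging the case where a caller's tuning has no effect.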

Changes

[Safaa.py](https://github.com/fossology/safaa/blob/main/Safaa/src/safaa/Safaa.py?utm_source=chatgpt.com)
- add import warnings
- emit UserWarning when threshold != 0.5 but the loaded model lacks predict_proba
- update the predict() docstring to document the constraint and workaround (loss='log_loss' or 'modified_huber')
tests/__init__.py, tests/test_safaa.py
- add 9 pytest tests covering warning behavior, threshold semantics, and probabilistic-model behavior
[pyproject.toml](https://github.com/fossology/safaa/blob/main/pyproject.toml?utm_source=chatgpt.com)
- add pytest to dev dependencies

Test coverage

TestPredictThreshold verifies:

- no warning at the default threshold
- warnings for non-default thresholds (0.0, 1.0, 0.99, 0.01)
- warning text mentions predict_proba
- explicitly passing threshold=0.5 remains silent
- predictions still succeed when warnings are emitted
- threshold behavior works correctly when predict_proba is available
- the >= threshold boundary is inclusive
- no warning is emitted for probabilistic classifiers

How to test

poetry install
poetry run pytest tests/test_safaa.py -v

Manual reproduction

import warnings
from safaa.Safaa import SafaaAgent

agent = SafaaAgent()

with warnings.catch_warnings(record=True) as ws:
    warnings.simplefilter("always")
    agent.predict(["Copyright 2024 Foo"], threshold=0.3)

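    # Print the category together with the message so the output matches the line shown below.
    print(f"{ws[0].category.__name__}: {ws[0].message}")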

Output:

UserWarning: The loaded false positive detector does not support
probability estimates (predict_proba); the 'threshold' argument is being ignored...

This closes #51.

Note: this PR shares the tests/ scaffolding introduced in #48 and #50. Depending on merge order, a small rebase may be needed for tests/__init__.py and the test module header.

SafaaAgent.predict accepts a `threshold` parameter, but only the predict_proba branch consults it. The shipped SGD classifier uses loss='hinge' and therefore has no predict_proba (sklearn's @available_if descriptor raises AttributeError on access), so the hasattr() check is False at runtime and execution falls to the binary predict() path that ignores the threshold entirely. Callers tuning the sensitivity get the default SVM decision boundary with no indication that their argument had no effect.

This was working when the original model was trained with a probability-supporting loss; the regression slipped in when the SGD(hinge) model replaced it.

Emit a UserWarning when a non-default threshold is passed but the loaded model cannot honor it, and document the constraint in the docstring. Default usage stays warning-free.

Tests cover:
- no warning at default threshold
- warning at non-default threshold
- warning text mentions predict_proba
- warnings at extreme threshold values
- predictions remain valid alongside the warning
- a monkeypatched fake classifier proves the threshold actually controls output when predict_proba is available (with a boundary-inclusive check)

Signed-off-by: RAJVEER42 <irajveer.bishnoi2310@gmail.com>

Valyrian-Code · 2026-05-23T11:08:43Z

Hi @GMishx & @Kaushl2208

I opened a small PR with a bug fix and regression tests. I'd appreciate a review whenever you have time. Thanks for maintaining the project!

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings May 23, 2026 11:03

Valyrian-Code mentioned this pull request May 23, 2026

predict() threshold argument is silently ignored by the shipped SGD(hinge) model #51

Open

Copilot AI reviewed May 23, 2026

Valyrian-Code wants to merge 1 commit into fossology:main from Valyrian-Code:fix/threshold-warning
