Skip to content

fix(safaa): warn when threshold argument cannot be honored#52

Open
Valyrian-Code wants to merge 1 commit into
fossology:mainfrom
Valyrian-Code:fix/threshold-warning
Open

fix(safaa): warn when threshold argument cannot be honored#52
Valyrian-Code wants to merge 1 commit into
fossology:mainfrom
Valyrian-Code:fix/threshold-warning

Conversation

@Valyrian-Code

@Valyrian-Code Valyrian-Code commented May 23, 2026

Copy link
Copy Markdown

Description

SafaaAgent.predict documents a configurable threshold parameter, but the shipped classifier (SGDClassifier(loss='hinge')) does not implement predict_proba. At runtime:

hasattr(self.false_positive_detector, "predict_proba")

returns False, execution falls through to the binary predict() branch, and threshold is silently ignored.

This PR emits a UserWarning when a non-default threshold is passed to a model that cannot honor it, and documents the limitation in the docstring. Default usage remains warning-free, so existing callers are unaffected.

Changes

Test coverage

TestPredictThreshold verifies:

  • no warning at the default threshold
  • warnings for non-default thresholds (0.0, 1.0, 0.99, 0.01)
  • warning text mentions predict_proba
  • explicitly passing threshold=0.5 remains silent
  • predictions still succeed when warnings are emitted
  • threshold behavior works correctly when predict_proba is available
  • the >= threshold boundary is inclusive
  • no warning is emitted for probabilistic classifiers

How to test

poetry install
poetry run pytest tests/test_safaa.py -v

Manual reproduction

import warnings
from safaa.Safaa import SafaaAgent

agent = SafaaAgent()

with warnings.catch_warnings(record=True) as ws:
    warnings.simplefilter("always")
    agent.predict(["Copyright 2024 Foo"], threshold=0.3)

    print(ws[0].message)

Output:

UserWarning: The loaded false positive detector does not support
probability estimates (predict_proba); the 'threshold' argument is being ignored...

This closes #51.

Note: this PR shares the tests/ scaffolding introduced in #48 and #50. Depending on merge order, a small rebase may be needed for tests/__init__.py and the test module header.

SafaaAgent.predict accepts a `threshold` parameter, but only the
predict_proba branch consults it. The shipped SGD classifier uses
loss='hinge' and therefore has no predict_proba (sklearn's
@available_if descriptor raises AttributeError on access), so the
hasattr() check is False at runtime and execution falls to the binary
predict() path that ignores the threshold entirely.

Callers tuning the sensitivity get the default SVM decision boundary
with no indication that their argument had no effect. This was working
when the original model was trained with a probability-supporting loss;
the regression slipped in when the SGD(hinge) model replaced it.

Emit a UserWarning when a non-default threshold is passed but the
loaded model cannot honor it, and document the constraint in the
docstring. Default usage stays warning-free.

Tests cover: no warning at default threshold, warning at non-default
threshold, warning text mentions predict_proba, warnings at extreme
threshold values, predictions remain valid alongside the warning, and a
monkeypatched fake classifier proves the threshold actually controls
output when predict_proba is available (with a boundary-inclusive
check).

Signed-off-by: RAJVEER42 <irajveer.bishnoi2310@gmail.com>
Copilot AI review requested due to automatic review settings May 23, 2026 11:03
@Valyrian-Code

Valyrian-Code commented May 23, 2026

Copy link
Copy Markdown
Author

Hi @GMishx & @Kaushl2208

Hi! I opened a small PR with bug fixe and regression test. I’d appreciate a review whenever you have time thanks for maintaining the project!

GIF

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

predict() threshold argument is silently ignored by the shipped SGD(hinge) model

2 participants