Python: Add exception, reachability, and other kinds of modelling#21668

Open
tausbn wants to merge 7 commits into main from tausbn/python-port-shared-infrastructure

@tausbn tausbn commented Apr 8, 2026

A grab bag of infrastructure that is needed for the next bunch of queries to be ported.

Should be reviewed commit-by-commit.

@tausbn tausbn added the no-change-note-required label (This PR does not need a change note) Apr 8, 2026
@github-actions github-actions bot added the Python label Apr 8, 2026
@tausbn tausbn force-pushed the tausbn/python-port-shared-infrastructure branch from c51a476 to c70427d April 8, 2026 14:48
tausbn added 6 commits April 8, 2026 15:54
  • The implementation is essentially the same as the one from `BasicBlockWithPointsTo`, with the main difference being that this one uses the exception machinery we just added (and some extensions added in this commit).
  • Adds support for finding instances, and adds things like a `BaseException` convenience class.
  • Used for queries where we mention the class of a literal in the alert message.
  • Adds `if False: ...` and `if typing.TYPE_CHECKING: ...` to the set of nodes that are unlikely to be reachable.
  • Adds `maybeUndefined` to the reachability module, modelling which names/variables may be undefined at runtime. The approach is very close to the one used in points-to, though it of course relies on our new modelling of exceptions/reachability instead.
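For readers less familiar with the Python side, the following is a minimal sketch of the kind of code in which a name "may be undefined at runtime". The function names are hypothetical and chosen only for illustration; nothing here is taken from the PR or its tests, and it says nothing about how `maybeUndefined` is implemented.

```python
def conditionally_bound(flag):
    # `value` is assigned on only one branch, so it may be undefined below.
    if flag:
        value = 1
    return value


def deleted_before_use():
    # `del` unbinds the name, so the later read fails at runtime.
    value = 1
    del value
    return value


def raises_unbound(func, *args):
    """Return True if calling func(*args) raises UnboundLocalError."""
    try:
        func(*args)
    except UnboundLocalError:
        return True
    return False
```

Calling `conditionally_bound(True)` succeeds, while `conditionally_bound(False)` and `deleted_before_use()` both fail with `UnboundLocalError`, which is the runtime behaviour a static "may be undefined" analysis tries to anticipate.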
@tausbn tausbn force-pushed the tausbn/python-port-shared-infrastructure branch from c70427d to ca59ca0 April 8, 2026 15:58
@tausbn tausbn marked this pull request as ready for review April 13, 2026 11:20
@tausbn tausbn requested a review from a team as a code owner April 13, 2026 11:20
Copilot AI review requested due to automatic review settings April 13, 2026 11:20
Copilot AI left a comment

Pull request overview

This PR introduces shared infrastructure in the Python dataflow internals to support upcoming query ports, including centralized exception-type modeling and reachability utilities.

Changes:

  • Refactors IncorrectExceptOrder to reuse a shared ExceptionTypes model instead of defining exception hierarchy logic locally.
  • Adds ExceptionTypes and Reachability modules (and supporting predicates) to DataFlowDispatch.qll for exception reasoning and reachability/undefined-name analysis.
  • Extends DuckTyping with new predicates for globally defined names and monkey-patched builtins.
| File | Description |
| --- | --- |
| python/ql/src/Exceptions/IncorrectExceptOrder.ql | Switches to shared exception-type infrastructure via ExceptionTypes. |
| python/ql/lib/semmle/python/dataflow/new/internal/DataFlowDispatch.qll | Adds new internal modules/predicates for exception hierarchy and reachability reasoning. |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 0
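
As background for the `IncorrectExceptOrder` refactoring mentioned in the overview, here is a small runtime sketch of why handler order matters in Python: an `except` clause for a superclass listed first shadows any later clause for a subclass. The helper names are hypothetical, and the sketch uses Python's own class hierarchy via `issubclass`; it is not the static `ExceptionTypes` model from this PR.

```python
def first_matching_handler(exc, handler_types):
    """Return the index of the first handler type that would catch `exc`."""
    for index, handler_type in enumerate(handler_types):
        if isinstance(exc, handler_type):
            return index
    return None


def shadowed_handlers(handler_types):
    """Return (earlier, later) pairs where `later` can never be selected
    because `earlier` is the same class or a superclass and comes first."""
    pairs = []
    for i, earlier in enumerate(handler_types):
        for later in handler_types[i + 1:]:
            if issubclass(later, earlier):
                pairs.append((earlier, later))
    return pairs
```

With handlers ordered as `[Exception, ValueError]`, a `ValueError` is always caught by the first handler, so the second is dead code; reversing the order removes the shadowing.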

@yoff yoff left a comment

I think my comments can mostly be addressed by explanations and documentation :-)

class UserExceptType extends ExceptType, TUserExceptType {
  Class cls;

  UserExceptType() { this = TUserExceptType(cls) }
Contributor

I know this is effectively what was there before, but would it be possible to not have all classes here?

Contributor Author

As in: would you prefer that these were moved to a separate file? If not, then I'm not sure what you're asking.

 * is dominated by a call to a never-returning function or an unconditional raise.
 */
predicate neverReturns(Function f) {
  exists(f.getANormalExit()) and
Contributor

Why is this conjunct included? If it is to keep the predicate small, then it seems to have a more specialized context than its name and doc suggest.
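
For context on what a "never-returning function" looks like in the analysed Python code, here is a minimal sketch. All function names are hypothetical and only illustrate the two cases named in the doc comment (an unconditional raise, and a call to a function that itself never returns); the sketch does not reflect how `neverReturns` is computed.

```python
import sys


def fail(message):
    # Unconditionally raises, so control never reaches a normal return.
    raise RuntimeError(message)


def exit_with_error(message):
    # Every path ends in sys.exit, which raises SystemExit instead of returning.
    sys.exit(message)


def parse_positive(text):
    number = int(text)
    if number <= 0:
        fail("expected a positive number")
        # Any statement placed here could not be reached at runtime.
    return number
```

In `parse_positive`, the call to `fail` has a syntactic successor inside the `if` body, but control never actually continues past it.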

 * and simple name lookups.
 */
private predicate unlikelyToRaise(ControlFlowNode node) {
  exists(node.getAnExceptionalSuccessor()) and
Contributor

I expect this is here to keep the predicate small.

isCallToNeverReturningFunction(node) and
succ = node.getASuccessor() and
not succ = node.getAnExceptionalSuccessor() and
not succ.getNode() instanceof Yield
Contributor

How does yield circumvent an exception?

/**
* Holds if it is highly unlikely for control to flow from `node` to `succ`.
*/
predicate unlikelySuccessor(ControlFlowNode node, ControlFlowNode succ) {
Contributor

Perhaps the doc should make clear that an unlikelySuccessor is a successor.

* Holds if basic block `b` is likely to be reachable from the entry of its
* enclosing scope.
*/
predicate likelyReachable(BasicBlock b) { startBbLikelyReachable(b) }
Contributor

I would have expected us to export a predicate on control flow nodes here.

Comment on lines +2483 to +2484
isAlwaysFalseGuard(node) and
succ = node.getATrueSuccessor()
Contributor

This seems a bit specific and ad hoc. Do we happen to know that this is the prevalent pattern? I guess `if False` is a quick way to comment out code. Did the extractor prune these at some point?

Contributor Author

This was indeed added to remove a bunch of false positives. I think these were implicitly pruned by the reachability check inherent to points-to (i.e. the fact that unreachable branches simply have no points-to information available).
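
To make the two guard patterns under discussion concrete, here is a rough sketch that finds them with Python's standard `ast` module. The helper names `is_always_false_guard` and `guarded_lines` are hypothetical, and the sketch is purely syntactic; it is not a port of the QL predicate `isAlwaysFalseGuard` and does not track aliases or imports.

```python
import ast


def is_always_false_guard(test):
    """Recognise the literal `False` and the attribute `typing.TYPE_CHECKING`."""
    if isinstance(test, ast.Constant) and test.value is False:
        return True
    return (
        isinstance(test, ast.Attribute)
        and test.attr == "TYPE_CHECKING"
        and isinstance(test.value, ast.Name)
        and test.value.id == "typing"
    )


def guarded_lines(source):
    """Return the first line of each statement in the true branch of such guards."""
    lines = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If) and is_always_false_guard(node.test):
            for stmt in node.body:
                lines.add(stmt.lineno)
    return sorted(lines)
```

Only the true branch is collected, mirroring the fact that an `else` branch of such a guard does run; `typing.TYPE_CHECKING` is `False` at runtime, so imports under it exist only for type checkers.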

Comment on lines +2014 to +2019
subscr.getObject() =
  API::moduleImport("builtins")
      .getMember("__dict__")
      .getAValueReachableFromSource()
      .asCfgNode() and
subscr.getIndex().getNode().(StringLiteral).getText() = name
Contributor

I believe you can simply do

Suggested change
- subscr.getObject() =
-   API::moduleImport("builtins")
-       .getMember("__dict__")
-       .getAValueReachableFromSource()
-       .asCfgNode() and
- subscr.getIndex().getNode().(StringLiteral).getText() = name
+ subscr =
+   API::moduleImport("builtins")
+       .getMember("__dict__")
+       .getSubscript(name)
+       .getAValueReachableFromSource()
+       .asCfgNode()
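
On the Python side, the snippet under review matches subscripts of `builtins.__dict__` with a string literal index. A minimal sketch of code that has this shape, and of its runtime effect, follows; the helper names and the name `_sketch_answer` are hypothetical and used only for illustration.

```python
import builtins


def install_builtin(name, value):
    # Writing through builtins.__dict__ adds `name` to the builtins namespace,
    # so a plain name lookup can resolve it without any import.
    builtins.__dict__[name] = value


def remove_builtin(name):
    # Undo the patch; a missing name is ignored.
    builtins.__dict__.pop(name, None)


def resolves(name):
    """Return True if `name` resolves as a bare name with empty globals."""
    try:
        eval(name, {"__builtins__": builtins})
    except NameError:
        return False
    return True
```

After `install_builtin("_sketch_answer", 42)`, the bare name `_sketch_answer` resolves even though no module ever assigned or imported it, which is why a name-based analysis may want to know about such writes.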
