Python: Add exception, reachability, and other kinds of modelling#21668
Python: Add exception, reachability, and other kinds of modelling#21668
Conversation
c51a476 to
c70427d
Compare
The implementation is essentially the same as the one from `BasicBlockWithPointsTo`, with the main difference being that this one uses the exception machinery we just added (and some extensions added in this commit).
Adds support for finding instances, and adds things like a `BaseException` convenience class.
Used for queries where we mention the class of a literal in the alert message.
Adds `if False: ...` and `if typing.TYPE_CHECKING: ...` to the set of nodes that are unlikely to be reachable.
Adds `maybeUndefined` to the reachability module, modelling which names/variables may be undefined at runtime. The approach is very close to the one used in points-to, though it of course relies on our new modelling of exceptions/reachability instead.
c70427d to
ca59ca0
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces shared infrastructure in the Python dataflow internals to support upcoming query ports, including centralized exception-type modeling and reachability utilities.
Changes:
- Refactors
IncorrectExceptOrderto reuse a sharedExceptionTypesmodel instead of defining exception hierarchy logic locally. - Adds
ExceptionTypesandReachabilitymodules (and supporting predicates) toDataFlowDispatch.qllfor exception reasoning and reachability/undefined-name analysis. - Extends
DuckTypingwith new predicates for globally defined names and monkey-patched builtins.
Show a summary per file
| File | Description |
|---|---|
| python/ql/src/Exceptions/IncorrectExceptOrder.ql | Switches to shared exception-type infrastructure via ExceptionTypes. |
| python/ql/lib/semmle/python/dataflow/new/internal/DataFlowDispatch.qll | Adds new internal modules/predicates for exception hierarchy and reachability reasoning. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 0
yoff
left a comment
There was a problem hiding this comment.
I think my comments can mostly be addressed by explanations and documentation :-)
| class UserExceptType extends ExceptType, TUserExceptType { | ||
| Class cls; | ||
|
|
||
| UserExceptType() { this = TUserExceptType(cls) } |
There was a problem hiding this comment.
I know this is effectively what was there before, but would it be possible to not have all classes here?
There was a problem hiding this comment.
As in: would you prefer that these were moved to a separate file? If not, then I'm not sure what you're asking.
| * is dominated by a call to a never-returning function or an unconditional raise. | ||
| */ | ||
| predicate neverReturns(Function f) { | ||
| exists(f.getANormalExit()) and |
There was a problem hiding this comment.
Why is this conjunct included? If it is to keep the predicate small, then it seems to have a more specialized context than its name and doc suggests.
| * and simple name lookups. | ||
| */ | ||
| private predicate unlikelyToRaise(ControlFlowNode node) { | ||
| exists(node.getAnExceptionalSuccessor()) and |
There was a problem hiding this comment.
I expect this is here to keep the predicate small.
| isCallToNeverReturningFunction(node) and | ||
| succ = node.getASuccessor() and | ||
| not succ = node.getAnExceptionalSuccessor() and | ||
| not succ.getNode() instanceof Yield |
There was a problem hiding this comment.
How does yield circumvent an exception?
| /** | ||
| * Holds if it is highly unlikely for control to flow from `node` to `succ`. | ||
| */ | ||
| predicate unlikelySuccessor(ControlFlowNode node, ControlFlowNode succ) { |
There was a problem hiding this comment.
Perhaps the doc should make clear that an unlikelySuccessor is a successor.
| * Holds if basic block `b` is likely to be reachable from the entry of its | ||
| * enclosing scope. | ||
| */ | ||
| predicate likelyReachable(BasicBlock b) { startBbLikelyReachable(b) } |
There was a problem hiding this comment.
I would have expected us to export a predicate on control flow nodes here..
| isAlwaysFalseGuard(node) and | ||
| succ = node.getATrueSuccessor() |
There was a problem hiding this comment.
This seems a bit specific and ad-hoc. Do we happen to know that this is the prevalent pattern? I guess if false is a quick way to comment out stuff. Did the extractor prune these at some point?
There was a problem hiding this comment.
This was indeed added to remove a bunch of false positives. I think these were implicitly pruned by the reachability check inherent to points-to (i.e. the fact that unreachable branches simply have no points-to information available).
| subscr.getObject() = | ||
| API::moduleImport("builtins") | ||
| .getMember("__dict__") | ||
| .getAValueReachableFromSource() | ||
| .asCfgNode() and | ||
| subscr.getIndex().getNode().(StringLiteral).getText() = name |
There was a problem hiding this comment.
I believe you can simply do
| subscr.getObject() = | |
| API::moduleImport("builtins") | |
| .getMember("__dict__") | |
| .getAValueReachableFromSource() | |
| .asCfgNode() and | |
| subscr.getIndex().getNode().(StringLiteral).getText() = name | |
| subscr = | |
| API::moduleImport("builtins") | |
| .getMember("__dict__") | |
| .getSubscript(name) | |
| .getAValueReachableFromSource() | |
| .asCfgNode() |
A grab bag of infrastructure that is needed for the next bunch of queries to be ported.
Should be reviewed commit-by-commit.