Configurable pattern matching semantics in response to #174 #175
Open
boggle
wants to merge
38
commits into
opencypher:master
from
boggle:isomatch
8b6513b
Add proposal for isomorphic pattern matching in response to #174
boggle ed1a824
Support walks, trails, paths; shortest and cheapest paths matching
boggle 6117618
Rework CIP
boggle 5742fdf
Simplified set of match modes and default handling
boggle 13f3711
Fix example
boggle fe68d48
Small fixups
boggle 856ae1b
Removed duplicate line
boggle dd397a5
Restructure CIP and improve language
Mats-SX ce78377
Rename pattern binder to pattern variable
Mats-SX cd54871
Remove section on multiline patterns
Mats-SX 9386b70
Remove section on pre-parser options
Mats-SX 048ac32
Move functions to appendix
Mats-SX 00dc3c1
Rename Walks section to walks, trails, paths
Mats-SX 91622f5
Morphism CIP: add definitions of WTP
db32e46
Structural improvements
7cccfff
Tie pattern variable classes together with formal definitions
8f2982d
Tidy up referencing
254b890
Added plural forms to grammar
boggle d6384c2
Mandate correct cardinality warnings consistently
boggle b789402
Updated grammar to include all proposed syntactic forms
boggle e908196
Update grammar to allow match modes etc without giving a variable name
boggle dd74857
Reinstated pre-parser suggestions
f98570c
Add exemplar data graph: used for examples in CIPs
fce68fa
Clarified relationship of WTP to *morphisms
boggle 40713bb
Add expansive example using data graph
50e7ac3
Added a detailed example; fix-ups; language changes
5a8436d
Formatted query headings so they are more readable in default GH view
08afc88
Added TOC
efb9fee
Removed Cycles and Circuits for now
boggle 1c39c76
Change to minimize breaking existing queries
boggle f8c45b4
Update examples to reflect latest changes
boggle c6b01b5
Textual tidy-ups
ac72e91
Updated default rules + Moved disjoint back into proposal
boggle c139130
Reformatted title
8cb74ae
Updated to reflect recent discussions
boggle ad333a0
Merge remote-tracking branch 'me/isomatch' into isomatch
boggle 3b4255f
Fix query result blocks
boggle b74e09e
Clarified DIFFERENT(vars) modifier
boggle
271 changes: 271 additions & 0 deletions
cip/1.accepted/CIP2017-01-18-configurable-pattern-matching-semantics.adoc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,271 @@ | ||
| = CIP2017-01-18 - Configurable Pattern Matching Semantics | ||
| :numbered: | ||
| :toc: | ||
| :toc-placement: macro | ||
| :source-highlighter: codemirror | ||
|
|
||
| *Author:* Stefan Plantikow <stefan.plantikow@neotechnology.com> | ||
|
|
||
| This proposal is a response to CIR-2017-174. | ||
|
|
||
| == Motivation | ||
|
|
||
| Currently, Cypher uses pattern matching semantics that treat all patterns that occur in a `MATCH` clause as a unit (called a *uniqueness scope*) and only consider pattern instances that bind different relationships to each fixed length relationship pattern variable and to each element of a variable length relationship pattern variable. | ||
| This has informally come to be called *cyphermorphism* and is a variation of edge isomorphism. | ||
|
|
||
| Cyphermorphism lies at the intersection of returning as many results as possible while still ruling out returning an infinite number of paths when matching graphs that contain cycles. | ||
|
|
||
| However, the notion of *uniqueness scope* has proven to be non-standard and is occasionally confusing for users, and cyphermorphic matching is not computationally tractable for some graphs. | ||
|
|
||
| The CIP aims to address these issues. | ||
|
|
||
| == Background | ||
|
|
||
| This CIP relies on the terminology introduced by the openCypher grammar. | ||
|
|
||
| Most notably, a pattern in Cypher consists of a comma separated list of *pattern parts*. | ||
| Pattern parts may be bound to a path variable and consist of a linear chain of connected node and relationship patterns. | ||
|
|
||
| While Cypher allows omitting path, node, and relationship variables in a pattern, this is just syntactic sugar, i.e. all parts of a pattern should be considered to be bound to a variable name from the viewpoint of pattern matching semantics (names are either provided in the query or automatically generated by a conforming implementation). | ||
|
|
||
| == Proposal | ||
|
|
||
| This CIP has been submitted in the belief that *CIP2017-02-06 Path Pattern Queries* will be accepted and is aligned with it. | ||
|
|
||
| === Deprecations | ||
|
|
||
| This CIP proposes to replace the notion of *uniqueness scope* and *cyphermorphism* and all associated rules by providing new, configurable pattern matching semantics for Cypher as outlined in this section. | ||
|
|
||
| This CIP proposes to deprecate support for binding relationship list variables in variable length relationship patterns. | ||
|
|
||
| This CIP proposes to deprecate the existing syntax for both `shortestPath` and `allShortestPaths` matching of Cypher. | ||
|
|
||
|
|
||
| === Basic pattern matching semantics | ||
|
|
||
| Each pattern consists of one or more top-level pattern parts that are given in a comma separated list. | ||
|
|
||
| [source=cypher] | ||
| ---- | ||
| MATCH (a)-->(b), (c)<--(d) | ||
| RETURN * | ||
| ---- | ||
|
|
||
| The solution (set of successful matches) of a pattern is the cross product over the solutions of all its top-level pattern parts, i.e. the above is the same as | ||
|
|
||
| [source=cypher] | ||
| ---- | ||
| MATCH (a)-->(b) | ||
| // a sequence of matches acts like a cross product: | ||
| // for each incoming row with a and b, find all matches (c)<--(d) | ||
| MATCH (c)<--(d) | ||
| RETURN * | ||
| ---- | ||
|
|
||
| (ignoring uniqueness). | ||
|
|
||
| Binding any two node patterns, relationship patterns, or path patterns that are contained in the same pattern to the same pattern variable describes an implicit join, i.e. | ||
|
|
||
| [source=cypher] | ||
| ---- | ||
| MATCH (a)-->()<--(a)-->(b) | ||
| RETURN a | ||
| ---- | ||
|
|
||
| is semantically the same as | ||
|
|
||
| [source=cypher] | ||
| ---- | ||
| MATCH (n1)-->(n2), (n3)<--(n4), (n4)-->(b) WHERE n1 = n4 AND n2 = n3 | ||
| RETURN n1 AS a | ||
| ---- | ||
|
|
||
| === Pattern binders | ||
|
|
||
| This CIP proposes to rename the path variable that occurs before a pattern element of a pattern part to *pattern binder* in the grammar. | ||
| Note that such variables are always bound to a linear sequence of node, relationship, and path patterns of its pattern element. | ||
|
|
||
| === Walks | ||
|
|
||
| This CIP introduces the following kinds of walks: | ||
|
|
||
| * `WALK`: A walk is an arbitrary, non-empty sequence of alternating nodes and relationships that starts with a node and ends with a node. | ||
| * `TRAIL`: A trail is a walk that does not contain the same relationship twice. | ||
| * `PATH`: A simple path is a trail that does not contain the same node twice unless that node is both the start node and the end node of the path. | ||
|
|
||
| Note that every `PATH` is a `TRAIL` and that every `TRAIL` is a `WALK`. | ||
|
|
||
| This CIP proposes to rename the Cypher type `PATH` to `WALK`. | ||
|
|
||
| === Pattern binder class | ||
|
|
||
| This CIP proposes introducing the notion of a *pattern binder class* that may be written before a pattern binder in a read-only pattern (i.e. a pattern that is not used as an argument to an updating clause) and restricts the set of valid pattern matches for the following pattern element. | ||
| The proposed pattern binder classes in both singular and plural form are: | ||
|
|
||
| * `WALK` (plural: `WALKS`) This pattern binder should only be bound to a `WALK` that matches all node, relationship, and path patterns given in the following pattern element. | ||
| * `TRAIL` (plural: `TRAILS`) This pattern binder should only be bound to a `TRAIL` that matches all node, relationship, and path patterns given in the following pattern element. | ||
| * `PATH` (plural: `PATHS`) This pattern binder should only be bound to a simple `PATH` that matches all node, relationship, and path patterns given in the following pattern element. | ||
|
|
||
| This CIP proposes the default pattern binder class to be `WALK`. | ||
|
|
||
| The pattern binder class may be further qualified with one of the following prefixes: | ||
|
|
||
| * `OPEN WALK[S]|TRAIL[S]|PATH[S]` This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are _not the same node_ | ||
| * `CLOSED WALK[S]|TRAIL[S]|PATH[S]` This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are _the same node_ | ||
|
|
||
| The following additional pattern binder classes are proposed to accommodate existing terminology that is commonly used in graph theory: | ||
|
|
||
| * `CIRCUIT` is a synonym for `CLOSED TRAIL` | ||
| * `CYCLE` is a synonym for `CLOSED PATH` | ||
| * `CIRCUITS` is a synonym for `CLOSED TRAILS` | ||
| * `CYCLES` is a synonym for `CLOSED PATHS` | ||
|
|
||
| Implementations are advised to signal a warning for every use of an `OPEN` pattern binder class if the two endpoints of the pattern element are both unbound and both use the same variable name. | ||
|
|
||
| Implementations are advised to signal a warning for every use of a `CLOSED` pattern binder class if the two endpoints of the pattern element are both unbound and use different variable names. | ||
|
|
||
| === Pattern match modes | ||
|
|
||
| This CIP proposes introducing the notion of a *pattern match mode* that may be written before a pattern binder in a read-only pattern (i.e. a pattern that is not used as an argument to an updating clause) and restricts the set of valid pattern matches for the following pattern element. | ||
|
|
||
| A pattern match mode is always written before any pattern binder class that has been explicitly given for the same pattern binder. | ||
|
|
||
| ==== Matching node patterns | ||
|
|
||
| A node pattern always matches all described nodes from the graph. | ||
|
|
||
| Different pattern match modes do not influence the set of matched nodes. | ||
|
|
||
| ==== MATCH ALL mode | ||
|
|
||
| This CIP proposes the new `MATCH ALL` pattern match mode that matches every walk (or trail, or path respectively) as described by all node, relationship, and path patterns given in the following pattern elements. | ||
|
|
||
| `MATCH ALL` may only be used in conjunction with a binder class in plural form (i.e. `WALKS`, `TRAILS`, `PATHS`). | ||
|
|
||
| This CIP proposes that an error should be raised for any use of `MATCH ALL` without an explicit binder class in combination with variable length relationship or path patterns. | ||
|
|
||
| Implementations are advised to signal a warning for any use of `MATCH ALL (OPEN|CLOSED) WALKS` that may return an infinite or prohibitively large result. | ||
|
|
||
| ==== MATCH ALL SHORTEST mode | ||
|
|
||
| This CIP proposes the new `MATCH ALL SHORTEST` pattern match mode that matches every _shortest_ walk (or trail, or path respectively) as described by all node, relationship, and path patterns in the following pattern elements. | ||
|
|
||
| `MATCH ALL SHORTEST` may only be used in conjunction with a binder class in plural form (i.e. `WALKS`, `TRAILS`, `PATHS`). | ||
|
|
||
| ==== MATCH SHORTEST mode | ||
|
|
||
| This CIP proposes the new `MATCH SHORTEST` pattern match mode that matches one _shortest_ walk (or trail, or path respectively) as described by all node, relationship, and path patterns in the following pattern elements. | ||
|
|
||
| `MATCH SHORTEST` may only be used in conjunction with a binder class in singular form (i.e. `WALK`, `TRAIL`, `PATH`). | ||
|
|
||
| === Default MATCH mode | ||
|
|
||
| This CIP proposes a new default pattern match mode that assigns a different pattern match mode to each type of pattern element: | ||
|
|
||
| * Simple relationship patterns (e.g. `()-[]->()`) are to be matched using `MATCH ALL` (which is identical to `MATCH ALL SHORTEST` for simple relationship patterns) | ||
| * Bounded variable length relationship patterns (e.g. `()-[*2..4]->()`) are to be matched using `MATCH ALL` | ||
| * Unbounded variable length relationship patterns (e.g. `()-[*]->()`) are to be matched using `MATCH ALL` | ||
| * Path patterns (e.g. `()-/../->()`) are to be matched using `MATCH ALL SHORTEST` | ||
|
|
||
| This CIP proposes that an error should be raised for any use of the default pattern match mode without an explicit binder class in combination with variable length relationship patterns. | ||
|
|
||
| The default pattern match mode may only be used in conjunction with a binder class in plural form (i.e. `WALKS`, `TRAILS`, `PATHS`). | ||
|
|
||
| This changes Cypher to use homomorphic matching for simple relationship patterns. | ||
|
|
||
| === Predicates and functions for working with walks | ||
|
|
||
| This CIP proposes to introduce additional predicates and functions for working with walks | ||
|
|
||
| * `isOpen(p)`: true if the start node and the end node of `p` are not the same node | ||
| * `isClosed(p)`: true if the start node and the end node of `p` are the same node | ||
| * `toTrail(p)`: `p` if `p` contains no duplicate relationships, `NULL` otherwise | ||
| * `toPath(p)`: `p` if `p` contains no duplicate relationships and either no duplicate nodes at all or the start node and the end node are the same node, `NULL` otherwise | ||
| * `toCircuit(p)`: returns `toTrail(p)` if `isClosed(p)` is true, `NULL` otherwise | ||
| * `toCycle(p)`: returns `toPath(p)` if `isClosed(p)` is true, `NULL` otherwise | ||
| * `disjoint(list1, list2, ..., list_n)` is true if the lists do not share any elements | ||
|
|
||
| === Multiline patterns | ||
|
|
||
| Finally, this CIP proposes additional syntax for splitting a pattern binding across multiple lines: | ||
|
|
||
| [source=cypher] | ||
| ---- | ||
| MATCH p=(a)-/~very_long_path_pattern/->(b)-/~another-long_path_pattern/->(c) | ||
| RETURN * | ||
| ---- | ||
|
|
||
| may be split as: | ||
|
|
||
| [source=cypher] | ||
| ---- | ||
| MATCH p=(a)-/~very_long_path_pattern/->(b) | ||
| + (b)-/~another-long_path_pattern/->(c) | ||
| RETURN * | ||
| ---- | ||
|
|
||
| This additional syntax is necessary due to the changed uniqueness scoping rules for pattern binders. | ||
| Splitting the pattern using `,` instead of the proposed `+` would have changed the result by only binding the first part of the pattern to `p`. | ||
|
|
||
| == Examples | ||
|
|
||
| The following examples demonstrate various ways in which the newly proposed constructs may be used if this CIP is adopted. | ||
|
|
||
| === Matching shortest paths | ||
|
|
||
| [source=cypher] | ||
| ---- | ||
| // MATCH p=shortestPath((a)-[:X*]->()) today becomes: | ||
| MATCH SHORTEST TRAIL p=(a)-[:X*]->() | ||
| RETURN * | ||
|
|
||
| // MATCH p=shortestPath((a)-[:X*]->()) today may be approximated using path patterns: | ||
| MATCH SHORTEST p=(a)-/:X*/->() | ||
| RETURN * | ||
|
|
||
| // MATCH p=allShortestPaths((a)-[:X*]->()) today becomes: | ||
| MATCH ALL SHORTEST TRAILS p=(a)-[:X*]->() | ||
| RETURN * | ||
|
|
||
| // MATCH p=allShortestPaths((a)-[:X*]->()) today may be approximated using path patterns: | ||
| MATCH p=(a)-/:X*/->() | ||
| RETURN * | ||
| ---- | ||
|
|
||
| === Matching with existing semantics | ||
|
|
||
| `disjoint` may be used to precisely express Cypher's current pattern matching semantics. | ||
|
|
||
| [source=cypher] | ||
| ---- | ||
| // Today (using the same uniqueness scope for pat1, pat2, and pat3) | ||
| MATCH pat1=..., pat2=..., pat3=... | ||
|
|
||
| // This CIP | ||
| MATCH TRAILS pat1=... | ||
| MATCH TRAILS pat2=... | ||
| MATCH TRAILS pat3=... | ||
| WHERE disjoint(rels(pat1), rels(pat2), rels(pat3)) | ||
| ---- | ||
|
|
||
| == Pre-parser options | ||
|
|
||
| It is suggested that a conforming implementation should provide pre-parser options for defining the default pattern binder class as well as the default pattern match mode for each class of pattern parts: | ||
|
|
||
| * `binder-class=walk[s]|trail[s]|path[s]` for configuring a different default pattern binder class | ||
| * `match-mode=all|all-shortest|shortest` for configuring a different default pattern match mode | ||
|
|
||
| == Benefits to this proposal | ||
|
|
||
| This proposal adds a facility to Cypher for selecting from multiple desirable pattern matching semantics. | ||
|
|
||
| == Caveats to this proposal | ||
|
|
||
| A moderate increase in language complexity. | ||
|
|
||
| A substantial departure from current pattern matching semantics. | ||
| However, care has been taken to retain access to current semantics. | ||
|
|
||
| `MATCH ALL [OPEN|CLOSED] WALKS` allows for non-terminating queries. | ||
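The walk, trail, and path definitions in the proposal, together with the predicates it lists for working with walks, can be illustrated outside of Cypher. The following is a minimal Python sketch, not part of the CIP and not an openCypher API. It assumes a hypothetical representation in which a walk is a tuple that alternates node and relationship identifiers; the snake_case function names are stand-ins for `isOpen`, `isClosed`, `toTrail`, `toPath`, `toCircuit`, `toCycle`, and `disjoint`, and `None` stands in for `NULL`.

```python
from typing import Hashable, Iterable, Optional, Tuple

# Assumption: a walk is a non-empty tuple that alternates node and relationship
# identifiers and starts and ends with a node, e.g. ("a", "r1", "b", "r2", "a").
Walk = Tuple[Hashable, ...]


def nodes(walk: Walk) -> Tuple[Hashable, ...]:
    """Nodes sit at the even positions of the alternating sequence."""
    return walk[0::2]


def rels(walk: Walk) -> Tuple[Hashable, ...]:
    """Relationships sit at the odd positions of the alternating sequence."""
    return walk[1::2]


def is_closed(walk: Walk) -> bool:
    """True if the start node and the end node are the same node."""
    return walk[0] == walk[-1]


def is_open(walk: Walk) -> bool:
    """True if the start node and the end node are not the same node."""
    return not is_closed(walk)


def to_trail(walk: Walk) -> Optional[Walk]:
    """Return the walk if it contains no duplicate relationships, else None."""
    r = rels(walk)
    return walk if len(set(r)) == len(r) else None


def to_path(walk: Walk) -> Optional[Walk]:
    """Return the walk if it is a trail whose only permitted repeated node
    is a shared start and end node, else None."""
    if to_trail(walk) is None:
        return None
    n = nodes(walk)
    # For a closed walk, ignore the final node so that the shared
    # start/end node is not counted as a duplicate.
    inner = n[:-1] if is_closed(walk) and len(n) > 1 else n
    return walk if len(set(inner)) == len(inner) else None


def to_circuit(walk: Walk) -> Optional[Walk]:
    """A circuit is a closed trail."""
    return to_trail(walk) if is_closed(walk) else None


def to_cycle(walk: Walk) -> Optional[Walk]:
    """A cycle is a closed path."""
    return to_path(walk) if is_closed(walk) else None


def disjoint(*lists: Iterable[Hashable]) -> bool:
    """True if no element occurs in more than one of the given lists."""
    seen: set = set()
    for items in lists:
        current = set(items)
        if seen & current:
            return False
        seen |= current
    return True


revisits_node = ("a", "r1", "b", "r2", "c", "r3", "b")  # trail, but not a path
round_trip = ("a", "r1", "b", "r2", "a")                # closed path (cycle)
reuses_rel = ("a", "r1", "b", "r1", "a")                # walk, but not a trail
```

Under these assumptions, `revisits_node` is an open trail that is not a path because node `b` occurs twice, `round_trip` qualifies as both a circuit and a cycle, and `reuses_rel` is only a walk. The `disjoint` check over `rels(...)` mirrors the way the proposal expresses today's uniqueness scope across separately matched trails.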
I thought these two were synonymous; what is the variation?
'Academic' edge isomorphism only talks about a single, connected candidate walk while cyphermorphism considers all relationships bound by any pattern in the same match (even relationships bound by different, disconnected walks) for uniqueness.
Aha! Thanks for the clarification.
I don't think that this is a difference of "morphism". If one followed strict isomorphism ("path isomorphism" in Walks, Trails, Paths terms, no repeated vertices, and therefore also no repeated edges), then Cypher's current "pattern gluing" rules would apply (unless we change those rules), and we would end up evaluating matches against the compound, glued pattern, but using isomorphic semantics. Gluing may be syntactic salt, but is orthogonal to "morphism". Cyphermorphism, in my view, is no different to "Trail morphism", or "edge isomorphism".