Commits
8b6513b
Add proposal for isomorphic pattern matching in response to #174
boggle Jan 25, 2017
ed1a824
Support walks, trails, paths; shortest and cheapest paths matching
boggle Jul 16, 2017
6117618
Rework CIP
boggle Jul 16, 2017
5742fdf
Simplified set of match modes and default handling
boggle Jul 19, 2017
13f3711
Fix example
boggle Jul 19, 2017
fe68d48
Small fixups
boggle Jul 23, 2017
856ae1b
Removed duplicate line
boggle Jul 23, 2017
dd397a5
Restructure CIP and improve language
Mats-SX Jul 24, 2017
ce78377
Rename pattern binder to pattern variable
Mats-SX Jul 24, 2017
cd54871
Remove section on multiline patterns
Mats-SX Jul 24, 2017
9386b70
Remove section on pre-parser options
Mats-SX Jul 24, 2017
048ac32
Move functions to appendix
Mats-SX Jul 24, 2017
00dc3c1
Rename Walks section to walks, trails, paths
Mats-SX Jul 24, 2017
91622f5
Morphism CIP: add definitions of WTP
Jul 24, 2017
db32e46
Structural improvements
Jul 24, 2017
7cccfff
Tie pattern variable classes together with formal definitions
Jul 24, 2017
8f2982d
Tidy up referencing
Jul 24, 2017
254b890
Added plural forms to grammar
boggle Jul 24, 2017
d6384c2
Mandate correct cardinality warnings consistently
boggle Jul 24, 2017
b789402
Updated grammar to include all proposed syntactic forms
boggle Jul 24, 2017
e908196
Update grammar to allow match modes etc without giving a variable name
boggle Jul 24, 2017
dd74857
Reinstated pre-parser suggestions
Jul 25, 2017
f98570c
Add exemplar data graph: used for examples in CIPs
Jul 25, 2017
fce68fa
Clarified relationship of WTP to *morphisms
boggle Jul 26, 2017
40713bb
Add expansive example using data graph
Jul 26, 2017
50e7ac3
Added a detailed example; fix-ups; language changes
Jul 26, 2017
5a8436d
Formatted query headings so they are more readable in default GH view
Jul 26, 2017
08afc88
Added TOC
Jul 26, 2017
efb9fee
Removed Cycles and Circuits for now
boggle Jul 26, 2017
1c39c76
Change to minimize breaking existing queries
boggle Jul 26, 2017
f8c45b4
Update examples to reflect latest changes
boggle Jul 26, 2017
c6b01b5
Textual tidy-ups
Jul 27, 2017
ac72e91
Updated default rules + Moved disjoint back into proposal
boggle Jul 27, 2017
c139130
Reformatted title
Jan 17, 2018
8cb74ae
Updated to reflect recent discussions
boggle Mar 26, 2018
ad333a0
Merge remote-tracking branch 'me/isomatch' into isomatch
boggle Mar 26, 2018
3b4255f
Fix query result blocks
boggle Mar 26, 2018
b74e09e
Clarified DIFFERENT(vars) modifier
boggle Mar 26, 2018
@@ -0,0 +1,271 @@
= CIP2017-01-18 - Configurable Pattern Matching Semantics
:numbered:
:toc:
:toc-placement: macro
:source-highlighter: codemirror

*Author:* Stefan Plantikow <stefan.plantikow@neotechnology.com>

This proposal is a response to CIR-2017-174.

== Motivation

Currently, Cypher uses pattern matching semantics that treat all patterns that occur in a `MATCH` clause as a unit (called a *uniqueness scope*) and only consider pattern instances that bind different relationships to each fixed-length relationship pattern variable and to each element of a variable-length relationship pattern variable.
This has informally come to be called *cyphermorphism* and is a variation of edge isomorphism.

I thought these two were synonymous; what is the variation?


'Academic' edge isomorphism only talks about a single, connected candidate walk while cyphermorphism considers all relationships bound by any pattern in the same match (even relationships bound by different, disconnected walks) for uniqueness.


Aha! Thanks for the clarification.


I don't think that this is a difference of "morphism". If one followed strict isomorphism ("path isomorphism" in Walks, Trails, Paths terms, no repeated vertices, and therefore also no repeated edges), then Cypher's current "pattern gluing" rules would apply (unless we change those rules), and we would end up evaluating matches against the compound, glued pattern, but using isomorphic semantics. Gluing may be syntactic salt, but is orthogonal to "morphism". Cyphermorphism, in my view, is no different to "Trail morphism", or "edge isomorphism".


Cyphermorphism strikes a balance: it returns as many results as possible while still ruling out an infinite number of paths when matching graphs that contain cycles.

However, the notion of *uniqueness scope* has proven to be non-standard and is occasionally confusing for users.
In addition, cyphermorphic matching is not tractable in terms of computational complexity for some graphs.

This CIP aims to address these issues.

== Background

This CIP relies on the terminology introduced by the openCypher grammar.

Most notably, a pattern in Cypher consists of a comma-separated list of *pattern parts*.
Pattern parts may be bound to a path variable and consist of a linear chain of connected node and relationship patterns.

While Cypher allows omitting path, node, and relationship variables in a pattern, this is just syntactic sugar, i.e. all parts of a pattern should be considered to be bound to a variable name from the viewpoint of pattern matching semantics (names are either provided in the query or automatically generated by a conforming implementation).

== Proposal

This CIP has been submitted in the belief that *CIP2017-02-06 Path Pattern Queries* will be accepted and is aligned with it.

=== Deprecations

This CIP proposes to replace the notion of *uniqueness scope* and *cyphermorphism* and all associated rules by providing new, configurable pattern matching semantics for Cypher as outlined in this section.

This CIP proposes to deprecate support for binding relationship list variables in variable length relationship patterns.

This CIP proposes to deprecate the existing syntax for both `shortestPath` and `allShortestPaths` matching of Cypher.


=== Basic pattern matching semantics

Each pattern consists of one or more top-level pattern parts that are given in a comma-separated list.

[source=cypher]
----
MATCH (a)-->(b), (c)<--(d)
RETURN *
----

The solution (set of successful matches) of a pattern is the cross product over the solutions of all its top-level pattern parts, i.e. the above is the same as

[source=cypher]
----
MATCH (a)-->(b)
// sequence of matches acts like a cross product:
// for each incoming row with a and b, find all matches (c)<--(d)
MATCH (c)<--(d)
RETURN *
----

(ignoring uniqueness).
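
As an illustration of the cross-product reading above, the following Python sketch combines per-part solutions into the solution of the whole pattern. It is not part of the proposal: the function name `combine_pattern_parts`, the dictionary representation of a match, and the sample bindings are assumptions made for this example, and uniqueness is ignored as in the text.

```python
from itertools import product


def combine_pattern_parts(*part_solutions):
    """Return the cross product of the given per-part solutions.

    Each solution is a list of rows; each row is a dict mapping a
    variable name to the matched element (hypothetical representation).
    The parts are assumed to bind disjoint sets of variables.
    """
    combined = []
    for rows in product(*part_solutions):
        merged = {}
        for row in rows:
            merged.update(row)
        combined.append(merged)
    return combined


# Hypothetical graph with two relationships: 1 --> 2 and 3 --> 4.
part1 = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]  # matches of (a)-->(b)
part2 = [{"d": 1, "c": 2}, {"d": 3, "c": 4}]  # matches of (c)<--(d)

rows = combine_pattern_parts(part1, part2)
```

With two matches per part, `rows` holds four combined rows, one for every pairing of a match of the first part with a match of the second.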

Binding any two node patterns, relationship patterns, or path patterns that are contained in the same pattern to the same pattern variable describes an implicit join, i.e.

[source=cypher]
----
MATCH (a)-->()<--(a)-->(b)
RETURN a
----

is semantically the same as

[source=cypher]
----
MATCH (n1)-->(n2), (n3)<--(n4), (n4)-->(b) WHERE n1 = n4 AND n2 = n3
RETURN n1 AS a
----
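
The equivalence can be checked on a small example. The Python sketch below is an illustration under stated assumptions: the graph is a hypothetical list of `(source, target)` pairs, both function names are invented for this example, and uniqueness is ignored. One function filters the cross product of three independent parts with equality predicates; the other binds the shared variable directly.

```python
from itertools import product

# Hypothetical directed graph given as (source, target) pairs.
EDGES = [(1, 2), (1, 3), (4, 2)]


def joined_by_predicates(edges):
    """(n1)-->(n2), (n3)<--(n4), (n4)-->(b) WHERE n1 = n4 AND n2 = n3."""
    rows = []
    for (n1, n2), (n4, n3), (src, b) in product(edges, repeat=3):
        # The third part reuses n4, so its source node must equal n4.
        if src == n4 and n1 == n4 and n2 == n3:
            rows.append((n1, b))
    return rows


def joined_by_shared_variable(edges):
    """(a)-->()<--(a)-->(b): every occurrence of a is the same node."""
    rows = []
    for a, middle in edges:
        for src2, tgt2 in edges:
            if src2 != a or tgt2 != middle:
                continue  # second relationship must run from a to the same ()
            for src3, b in edges:
                if src3 == a:
                    rows.append((a, b))
    return rows
```

Both functions return the same `(a, b)` rows for the sample graph, which is the sense in which reusing a variable acts as an implicit join.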

=== Pattern binders

This CIP proposes to name the path variable that occurs before a pattern element of a pattern part a *pattern binder* in the grammar.
Note that such a variable is always bound to a linear sequence of node, relationship, and path patterns of its pattern element.

=== Walks

This CIP introduces the following kinds of walks:

* `WALK`: A walk is an arbitrary, non-empty sequence of alternating nodes and relationships that starts with a node and ends with a node.
* `TRAIL`: A trail is a walk that does not contain the same relationship twice.
* `PATH`: A simple path is a trail that does not contain the same node twice unless that node is both the start node and the end node of the path.

Note that every `PATH` is a `TRAIL` and that every `TRAIL` is a `WALK`.
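
The three definitions can be made concrete with a short Python sketch. It is only an illustration: a walk is assumed to be a list that alternates node identifiers and relationship identifiers (nodes at even positions), and the function names are hypothetical rather than part of this proposal.

```python
def is_walk(seq):
    """Non-empty, odd length: starts with a node and ends with a node."""
    return len(seq) >= 1 and len(seq) % 2 == 1


def is_trail(seq):
    """A walk that does not contain the same relationship twice."""
    rels = seq[1::2]
    return is_walk(seq) and len(set(rels)) == len(rels)


def is_path(seq):
    """A trail without repeated nodes, except start node == end node."""
    if not is_trail(seq):
        return False
    nodes = seq[0::2]
    if len(nodes) > 1 and nodes[0] == nodes[-1]:
        nodes = nodes[:-1]  # the shared start/end node is allowed once
    return len(set(nodes)) == len(nodes)


def classify(seq):
    """Return the most specific kind that applies to seq."""
    if is_path(seq):
        return "PATH"
    if is_trail(seq):
        return "TRAIL"
    if is_walk(seq):
        return "WALK"
    raise ValueError("not a walk")
```

For example, `classify([1, "r1", 2, "r2", 1, "r3", 3])` yields `"TRAIL"`: no relationship repeats, but node `1` is visited twice and is not both the start and the end node.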

This CIP proposes to rename the Cypher type `PATH` to `WALK`.

=== Pattern binder class

This CIP proposes introducing the notion of a *pattern binder class* that may be written before a pattern binder in a read-only pattern (i.e. a pattern that is not used as an argument to an updating clause) and restricts the set of valid pattern matches for the following pattern element.
The proposed pattern binder classes in both singular and plural form are:

* `WALK` (plural: `WALKS`) This pattern binder should only be bound to a `WALK` that matches all node, relationship, and path patterns given in the following pattern element.
* `TRAIL` (plural: `TRAILS`) This pattern binder should only be bound to a `TRAIL` that matches all node, relationship, and path patterns given in the following pattern element.
* `PATH` (plural: `PATHS`) This pattern binder should only be bound to a simple `PATH` that matches all node, relationship, and path patterns given in the following pattern element.

This CIP proposes the default pattern binder class to be `WALK`.

The pattern binder class may be further qualified with one of the following prefixes:

* `OPEN WALK[S]|TRAIL[S]|PATH[S]` This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are _not the same node_.
* `CLOSED WALK[S]|TRAIL[S]|PATH[S]` This pattern binder should only be bound to walks (or trails, or paths respectively) whose start and end nodes are _the same node_.

The following additional pattern binder classes are proposed to accommodate existing terminology that is commonly used in graph theory:

* `CIRCUIT` is a synonym for `CLOSED TRAIL`
* `CYCLE` is a synonym for `CLOSED PATH`
* `CIRCUITS` is a synonym for `CLOSED TRAILS`
* `CYCLES` is a synonym for `CLOSED PATHS`
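
One way to read the qualifiers and synonyms above is as a normalization step. The Python sketch below is a hypothetical helper, not part of the proposed grammar: `normalize_binder_class` and its tuple result are assumptions made for illustration. It expands the synonyms and splits a class into its `OPEN`/`CLOSED` qualifier, its kind, and whether the plural form was used.

```python
SYNONYMS = {
    "CIRCUIT": "CLOSED TRAIL",
    "CYCLE": "CLOSED PATH",
    "CIRCUITS": "CLOSED TRAILS",
    "CYCLES": "CLOSED PATHS",
}

KINDS = ("WALK", "TRAIL", "PATH")


def normalize_binder_class(text):
    """Return (qualifier, kind, is_plural) for a pattern binder class."""
    text = " ".join(text.upper().split())
    words = SYNONYMS.get(text, text).split()
    qualifier = None
    if words and words[0] in ("OPEN", "CLOSED"):
        qualifier = words.pop(0)
    if len(words) != 1:
        raise ValueError("expected exactly one of WALK[S], TRAIL[S], PATH[S]")
    word = words[0]
    is_plural = word.endswith("S")
    kind = word[:-1] if is_plural else word
    if kind not in KINDS:
        raise ValueError("unknown pattern binder class: " + word)
    return qualifier, kind, is_plural
```

Under these assumptions, `normalize_binder_class("cycles")` yields `("CLOSED", "PATH", True)`, and an unqualified `WALK` yields `(None, "WALK", False)`.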

Implementations are advised to signal a warning for every use of an `OPEN` pattern binder class if the two endpoints of the pattern element are both unbound and both use the same variable name.

Implementations are advised to signal a warning for every use of a `CLOSED` pattern binder class if the two endpoints of the pattern element are both unbound and use different variable names.

=== Pattern match modes

This CIP proposes introducing the notion of a *pattern match mode* that may be written before a pattern binder in a read-only pattern (i.e. a pattern that is not used as an argument to an updating clause) and restricts the set of valid pattern matches for the following pattern element.

A pattern match mode is always written before any pattern binder class that has been explicitly given for the same pattern binder.

==== Matching node patterns

A node pattern always matches all described nodes from the graph.

Different pattern match modes do not influence the set of matched nodes.

==== MATCH ALL mode

This CIP proposes the new `MATCH ALL` pattern match mode that matches every walk (or trail, or path respectively) as described by all node, relationship, and path patterns given in the following pattern elements.

`MATCH ALL` may only be used in conjunction with a binder class in plural form (i.e. `WALKS`, `TRAILS`, `PATHS`).

This CIP proposes that an error should be raised for any use of `MATCH ALL` without an explicit binder class in combination with variable length relationship or path patterns.

Implementations are advised to signal a warning for any use of `MATCH ALL (OPEN|CLOSED) WALKS` that may return an infinite or prohibitively large result.

==== MATCH ALL SHORTEST mode

This CIP proposes the new `MATCH ALL SHORTEST` pattern match mode that matches every _shortest_ walk (or trail, or path respectively) as described by all node, relationship, and path patterns in the following pattern elements.

`MATCH ALL SHORTEST` may only be used in conjunction with a binder class in plural form (i.e. `WALKS`, `TRAILS`, `PATHS`).

==== MATCH SHORTEST mode

This CIP proposes the new `MATCH SHORTEST` pattern match mode that matches one _shortest_ walk (or trail, or path respectively) as described by all node, relationship, and path patterns in the following pattern elements.

`MATCH SHORTEST` may only be used in conjunction with a binder class in singular form (i.e. `WALK`, `TRAIL`, `PATH`).
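
The difference between the three modes can be sketched in Python. This is an illustration under assumptions that the text above does not state: the candidates are all walks found for one fixed pair of endpoints, a walk is a list alternating node and relationship identifiers, its length is its number of relationships, and `MATCH SHORTEST` is free to pick any one shortest candidate (the sketch picks the first). The names `select_matches` and `check_binder_form` are hypothetical.

```python
# Whether each mode requires the plural form of the binder class.
REQUIRES_PLURAL = {"ALL": True, "ALL SHORTEST": True, "SHORTEST": False}


def check_binder_form(mode, binder_class):
    """Raise if the singular/plural form does not fit the match mode."""
    is_plural = binder_class.upper().endswith("S")
    if REQUIRES_PLURAL[mode] != is_plural:
        raise ValueError(mode + " cannot be combined with " + binder_class)


def walk_length(walk):
    """Number of relationships in an alternating node/relationship list."""
    return len(walk) // 2


def select_matches(mode, candidates):
    """Pick the candidates that the given match mode would return."""
    if mode not in REQUIRES_PLURAL:
        raise ValueError("unknown match mode: " + mode)
    if mode == "ALL" or not candidates:
        return list(candidates)
    shortest = min(walk_length(w) for w in candidates)
    best = [w for w in candidates if walk_length(w) == shortest]
    return best if mode == "ALL SHORTEST" else best[:1]


# Hypothetical candidates between node 1 and node 3.
candidates = [[1, "r1", 2, "r2", 3], [1, "r3", 3], [1, "r4", 3]]
```

Here `ALL` keeps all three candidates, `ALL SHORTEST` keeps the two single-relationship walks, and `SHORTEST` keeps one of those two.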

=== Default MATCH mode

This CIP proposes a new default pattern match mode that assigns a different pattern match mode to each type of pattern element:

* Simple relationship patterns (e.g. `()-[]->()`) are to be matched using `MATCH ALL` (which is identical to `MATCH ALL SHORTEST` for simple relationship patterns)
* Bounded variable length relationship patterns (e.g. `()-[*2..4]->()`) are to be matched using `MATCH ALL`
* Unbounded variable length relationship patterns (e.g. `()-[*]->()`) are to be matched using `MATCH ALL`
* Path patterns (e.g. `()-/../->()`) are to be matched using `MATCH ALL SHORTEST`

This CIP proposes that an error should be raised for any use of the default pattern match mode without an explicit binder class in combination with variable length relationship patterns.

The default pattern match mode may only be used in conjunction with a binder class in plural form (i.e. `WALKS`, `TRAILS`, `PATHS`).

This changes Cypher to use homomorphic matching for simple relationship patterns.

=== Predicates and functions for working with walks

This CIP proposes to introduce additional predicates and functions for working with walks:

* `isOpen(p)`: true if the start node and the end node of `p` are not the same node
* `isClosed(p)`: true if the start node and the end node of `p` are the same node
* `toTrail(p)`: `p` if `p` contains no duplicate relationships, `NULL` otherwise
* `toPath(p)`: `p` if `p` contains no duplicate relationships and either no duplicate nodes at all or the start node and the end node are the same node, `NULL` otherwise
* `toCircuit(p)`: returns `toTrail(p)` if `isClosed(p)` is true, `NULL` otherwise
* `toCycle(p)`: returns `toPath(p)` if `isClosed(p)` is true, `NULL` otherwise
* `disjoint(list1, list2, ..., list_n)` is true if the lists do not share any elements
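
The intended behaviour of these functions can be sketched in Python. This is a reading of the list above rather than a reference implementation: a walk is assumed to be a list alternating node and relationship identifiers, `None` stands in for `NULL`, and `disjoint` is assumed to compare elements across lists only, so duplicates within a single list do not count.

```python
def is_open(p):
    return p[0] != p[-1]


def is_closed(p):
    return p[0] == p[-1]


def to_trail(p):
    """p if no relationship repeats, None (NULL) otherwise."""
    rels = p[1::2]
    return p if len(set(rels)) == len(rels) else None


def to_path(p):
    """p if it is a trail whose nodes are distinct, apart from start == end."""
    if to_trail(p) is None:
        return None
    nodes = p[0::2]
    if len(nodes) > 1 and nodes[0] == nodes[-1]:
        nodes = nodes[:-1]  # a closed walk may share its start and end node
    return p if len(set(nodes)) == len(nodes) else None


def to_circuit(p):
    return to_trail(p) if is_closed(p) else None


def to_cycle(p):
    return to_path(p) if is_closed(p) else None


def disjoint(*lists):
    """True if no element occurs in more than one of the given lists."""
    seen = set()
    for items in lists:
        current = set(items)
        if seen & current:
            return False
        seen |= current
    return True
```

For the closed walk `[1, "r1", 2, "r2", 1]`, both `to_circuit` and `to_cycle` return the walk itself, while `to_trail([1, "r1", 2, "r1", 1])` returns `None` because `r1` is traversed twice.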

=== Multiline patterns

Finally, this CIP proposes additional syntax for splitting a pattern binding across multiple lines:

[source=cypher]
----
MATCH p=(a)-/~very_long_path_pattern/->(b)-/~another-long_path_pattern/->(c)
RETURN *
----

may be split as:

[source=cypher]
----
MATCH p=(a)-/~very_long_path_pattern/->(b)
+ (b)-/~another-long_path_pattern/->(c)
RETURN *
----

This additional syntax is necessary due to the changed uniqueness scoping rules for pattern binders.
Splitting the pattern using `,` instead of the proposed `+` would have changed the result by only binding the first part of the pattern to `p`.

== Examples

The following examples demonstrate various ways in which the newly proposed constructs may be used if this CIP is adopted.

=== Matching shortest paths

[source=cypher]
----
// MATCH p=shortestPath((a)-[:X*]->()) today becomes:
MATCH SHORTEST TRAIL p=(a)-[:X*]->()
RETURN *

// MATCH p=shortestPath((a)-[:X*]->()) today may be approximated using path patterns:
MATCH SHORTEST p=(a)-/:X*/->()
RETURN *

// MATCH p=allShortestPaths((a)-[:X*]->()) today becomes:
MATCH ALL SHORTEST TRAILS p=(a)-[:X*]->()
RETURN *

// MATCH p=allShortestPaths((a)-[:X*]->()) today may be approximated using path patterns:
MATCH p=(a)-/:X*/->()
RETURN *
----

=== Matching with existing semantics

`disjoint` may be used to precisely express Cypher's current pattern matching semantics.

[source=cypher]
----
// Today (using same uniqueness scope for pat1, pat2, and pat3)
MATCH pat1=..., pat2=..., pat3=...

// This CIP
MATCH TRAILS pat1=...
MATCH TRAILS pat2=...
MATCH TRAILS pat3=...
WHERE disjoint(rels(pat1), rels(pat2), rels(pat3))
----
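
The effect of the `disjoint` predicate in the query above can be sketched in Python. The sketch assumes that each pattern has already been matched separately as a list of trails (lists alternating node and relationship identifiers); the function names and the sample data are hypothetical.

```python
from itertools import product


def rels(walk):
    """Relationships of an alternating node/relationship list."""
    return walk[1::2]


def disjoint(*lists):
    """True if no element occurs in more than one of the given lists."""
    seen = set()
    for items in lists:
        current = set(items)
        if seen & current:
            return False
        seen |= current
    return True


def match_with_shared_scope(*trail_solutions):
    """Keep only combinations whose trails share no relationship."""
    return [
        combo
        for combo in product(*trail_solutions)
        if disjoint(*(rels(trail) for trail in combo))
    ]


# Hypothetical per-pattern solutions.
pat1 = [[1, "r1", 2]]
pat2 = [[1, "r1", 2], [2, "r2", 3]]
result = match_with_shared_scope(pat1, pat2)
```

Only the combination that uses `r1` for `pat1` and `r2` for `pat2` survives, which mirrors a single uniqueness scope spanning both patterns.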

== Pre-parser options

It is suggested that a conforming implementation should provide pre-parser options for defining the default pattern binder class as well as the default pattern match mode:

* `binder-class=walk[s]|trail[s]|path[s]` for configuring a different default pattern binder class
* `match-mode=all|all-shortest|shortest` for configuring a different default pattern match mode

== Benefits to this proposal

This proposal adds a facility to Cypher for selecting from multiple desirable pattern matching semantics.

== Caveats to this proposal

A moderate increase in language complexity.

A substantial departure from current pattern matching semantics.
However, care has been taken to retain access to current semantics.

`MATCH ALL [OPEN|CLOSED] WALKS` allows for non-terminating queries.