Intro to Graph Databases Episode #6 – Continuing with Cypher


Hey, everyone. Welcome back to the Intro to
Graph Databases series. My name is Ryan Boyd, and this is episode six. Today, we are going
to be covering more about Cypher. We’re going to be discussing how you do filtering, aggregates,
and also cover some additional syntax that is helpful to know as you’re writing your
Cypher queries. Let’s first give a recap of episode five.
In episode five, we learned that Cypher is just ASCII art patterns. And these are patterns
that are drawn with nodes, which are surrounded by parentheses and also drawn with relationships
as part of the patterns. Now, your patterns that you’re drawing out are ASCII art because
you’re indicating the directionality of the relationships using greater than and less
than signs. Optionally, with Neo4j, you could actually query without regard to direction,
and that’s the first example here. Now, the components of a Cypher query were also covered
in the last episode. So you can see here, we have our different Cypher keywords, match
and return. We have a variable called path and we have a node label and a relationship.
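
As a rough sketch of how those pieces fit together: the slide itself is not reproduced in this transcript, so the exact query below is an assumption based on the description. It uses the Person and Movie labels and the ACTED_IN relationship type from Neo4j's example movie data.

```cypher
// MATCH and RETURN are the keywords; path is the variable.
// Person and Movie are node labels; ACTED_IN is the relationship type.
MATCH path = (:Person)-[:ACTED_IN]->(:Movie)
RETURN path;

// The same pattern written without an arrowhead ignores direction.
MATCH path = (:Person)-[:ACTED_IN]-(:Movie)
RETURN path;
```
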
So in this case, we’re looking for people who acted in movies, and we return both the nodes
and the relationships that match by returning the path.

Now, a little bit more on syntax. You may
have noticed in some of the tutorials, or guides, or just your browsing around online
that you can get back both graphs as well as tabular results. And you may be wondering,
“How do I control what I get back?” Well, here is an example that was executed in the
Neo4j Browser. On the left-hand side here, we are returning the person nodes, the relationships,
whether the relationship is an acted in or a directed relationship, as well as the movies.
So in that case, we are going to get a graph returned, whereas, alternatively, on the right-hand
side, we are returning specific properties by name. So the name of the actor, for instance,
or the title of the movie. And you can see a little bit thrown in here indicating the
name of the relationship type that connects those two. So in this case, we’re returning
very specific properties, and that’s when we yield a tabular result. Properties are
accessed using the syntax you just saw, where we specify the variable as well as the property
key. Now, you may have heard me in the last episode referring to variables actually as
aliases. I use the terms interchangeably, but the official term is variables, so I just
wanted to correct myself on that. And variables are used to refer to relationships that you’re
defining as your patterns or in your queries. And variables are also used to refer to nodes.
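
The two result shapes can be sketched roughly as follows. The on-screen queries are not part of the transcript, so the variable names p, r, and m and the name and title property keys are assumptions taken from Neo4j's example movie data.

```cypher
// Returning whole nodes and relationships: Neo4j Browser can draw a graph.
MATCH (p:Person)-[r:ACTED_IN|DIRECTED]->(m:Movie)
RETURN p, r, m;

// Returning individual properties (variable.propertyKey): a table of values.
// type(r) gives the relationship type as a string.
MATCH (p:Person)-[r:ACTED_IN|DIRECTED]->(m:Movie)
RETURN p.name, type(r), m.title;
```
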
And you can see bolded here, or highlighted here, are the variables used for each.

Case sensitivity is an important topic. Because
Cypher is a schema-optional query language, taking into account the case sensitivity is
important, especially as you’re executing your queries for the first time and you’re
not quite sure what to expect from your results. So the things that are case-sensitive are
the node labels, the relationship types, and the property keys. You can follow the Cypher
style guide, which I’ve linked here. And the Cypher style guide is written by the folks
on the openCypher project, and covers the preferred way of casing different parts of
your Cypher syntax. Now, we also do, though, have case-insensitive parts, and those are
the Cypher keywords. So here are some examples of each of these. So you can see Person, ACTED_IN,
and name are all case-sensitive, and you can see the appropriate, preferred Cypher style
here. And then case-insensitive are the Cypher keywords, such as MATCH and RETURN. And we
tend to capitalize those in our statements, but they are case-insensitive.

Now, after watching the last episode of this
series, episode five, one of my colleagues had a tip that he wanted to share with you, and
that was a tip around setting properties. So in episode five, we were talking about
a person driving a car, so Ann and Dan and their relationship, and in this case here,
I was trying to find a person who drives the car, and I was trying to set two properties.
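
The two ways of setting these properties discussed here might look roughly like this. The Person and Car labels and the DRIVES relationship type are assumptions for illustration, since the episode five query is not reproduced in this transcript.

```cypher
// Setting properties one at a time.
MATCH (:Person)-[:DRIVES]->(c:Car)
SET c.brand = 'Volvo', c.model = 'V70';

// Setting the same properties from a map with +=.
// += adds or updates the listed keys and leaves other properties untouched.
MATCH (:Person)-[:DRIVES]->(c:Car)
SET c += {brand: 'Volvo', model: 'V70'};
```

With a driver, the map would typically be supplied as a query parameter, for example SET c += $props, where props is a hypothetical parameter name.
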
And so I set them by saying c.brand, where c is the variable, equals Volvo, and c.model
equals V70. Alternatively, I could have used an object to set these properties. So you can
see here on the right-hand side, we actually say SET c +=, and then an object
of the various properties. This can be easiest to do when you are using certain drivers in
programming languages that handle objects really well.

So let’s get into some aggregate functions
as your next step on your journey. If you are already a SQL developer and understand
SQL, you might want to understand the difference between how Cypher handles aggregates and
how SQL handles aggregates. And the main difference is that in Cypher, you do not need to specify
a grouping key. The grouping key in Cypher is implicitly specified by any non-aggregate
fields in the return statement. So in this case here, we’re trying to group by the name,
and we are doing an aggregate counting the number of movies that that particular actor
acted in. And this is what the return looks like. It actually pages through several more
pages here, but the return is basically giving us each of the names of the actors in the
database as well as the number of movies that they’ve acted in. Now, there are a lot of
other aggregating functions that are available in Cypher and in Neo4j. Those aggregating
functions are things like additional ways of counting, collections, sums, and some basic
statistics functions. You can see the Cypher ref card with a list of these aggregating
functions at this URL. Now, I also just want to mention the Cypher ref card in general
is a fantastic resource as you’re starting to learn Cypher, and it gives you all of the
different types of functions, and operators, and things like that that are available within
Cypher. So check that out. Highly encourage you to accelerate your learning path.

Now, Neo4j is pretty extensible. And we’re
not going to cover all the ways that Neo4j is extensible in this particular episode,
but I do want to say that you can write your own aggregating functions or aggregation functions.
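
As a baseline before writing anything custom, the built-in aggregation with implicit grouping described earlier might be sketched like this. The aliases movieCount and titles are illustrative names, and the labels and property keys are assumed from Neo4j's example movie data.

```cypher
// p.name is the only non-aggregate expression in RETURN,
// so it acts as the implicit grouping key for count(m).
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name, count(m) AS movieCount;

// collect() is another built-in aggregate: it gathers the titles into a list per actor.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name, collect(m.title) AS titles;
```
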
And the user-defined function documentation is part of our core reference docs at neo4j.com/docs,
so I’d encourage you to check that out. And there’s an explicit section along with an example
of how you might write your own aggregation function, in this case, in order to compute
the longest string in a set. Now, before you go off and write your own aggregation functions,
you might want to take a look at what open source has already been written for Neo4j
developers to do aggregation. So for that, you can check out one of the most popular
open source projects used by the Neo4j developer community, the APOC library. The APOC library
is a library of 400 or 500 different user-defined procedures, and functions, and aggregation
functions. But since we’re talking about aggregation functions here, you can see a list of some
of the types of aggregations that it allows you to do, including doing medians and percentiles,
and slicing up the results, and that sort of thing. So check out APOC at this URL. The
APOC library is super, super useful. And if you don’t find it in the APOC library, someone
might have written it and it just hasn’t been released yet. So check that out. Check out
the source. But you can also contribute yourself to the APOC library as you get more advanced.

Now, an important part of Cypher querying
is specifying the where clause. And the where clause is used for filtering results. Now,
you’ve seen in some of the past examples in this episode and the previous how we filtered
the results for a particular match. In this case, we want to find a movie with the title
of The Matrix, and we’re using that JSON-like syntax in order to filter the results. Well,
we can also use the where clause and specify the query this way. So find a movie where
m.title equals The Matrix. Now both of these represent the same thing. The way that we
interpret the queries is to actually convert the top to the bottom, but it’s just easier
at times to use that JSON-like syntax. But we’re going to be talking about more things,
types of comparisons, that you can do in the where clause. So you can see here a basic
string comparison, and you can also do things like numeric comparisons here. So in this
case here, we’re looking for a person who acted in a movie, and we’re looking for where
that movie’s released date is greater than or equal to the year 2000. And you saw us
use the greater than or equal to operator. All sorts of other operators are there as
you’d expect, including less than, or not equal to, equal to, etc., and some special
operators that we’ll cover a little bit more later on checking whether something is null
or is not null.

So null is not null. That’s probably a pretty
confusing statement, but null represents missing or undefined values in Cypher. So when you
store a value on a property, or store a value for a property on a node, you never actually
store the value null. You simply do not specify that property on that node, and that’s okay.
That’s the way that Cypher as a schema-optional language works. The nodes that have the properties
have them defined. The nodes that do not have those properties do not have them defined.
So this basically prevents the very sparse type of storage that you might see in a table
where a lot of the values may be null. In the case of Neo4j, the nodes don’t store that
there’s a null value for a particular property. It actually just doesn’t have that property.
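
Because a missing property reads back as null, the comparison behavior can be sketched as below. The tagline property key is an assumption borrowed from Neo4j's example movie data; any property that some nodes lack would do.

```cypher
// Comparing null with = does not yield true; the result is null.
RETURN null = null AS result;

// To test for a missing property, use IS NULL or IS NOT NULL instead.
MATCH (m:Movie)
WHERE m.tagline IS NULL
RETURN m.title;
```
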
So null is not null. Null represents missing or undefined values. So in the case of comparing
null to null, you’re actually going to not get a true response. And why is that? Well,
you’re going to actually get a null response. And that’s because we don’t know. The values
are missing. So we don’t know if they’re equivalent to each other.

Now, there’s another special comparison operator
available in Cypher, the equal tilde operator. And the equal tilde operator actually allows
us to do regular expression matching. So in this case, we’re looking for a person who
acted in a movie where that person’s name starts with a K and then is followed by zero
or one other characters. And so this is an important way to interact with data that is
stored as strings in Cypher, is looking at that as regular expressions. Now, you may
have noticed here that I also used some Boolean logic trying to compare the name based off
of that regular expression, but also comparing the released date, and returning those movies
that match one or the other of those two criteria. There are additional Boolean operators that
you can use such as and, xor, and not, and you can group your Boolean operators in a
more complex statement. So you’re seeing here, we’re grouping with parentheses the released
equals 1997, and the title is As Good As It Gets. And that’s grouped with parentheses
so that we apply the appropriate Boolean logic.

So that’s all we’re going to be covering here
today in this episode. We’re going to be doing more episodes in the future with other filtering
techniques for Cypher, some more things around indexing, how to use lists and other features
of the Cypher query language. So look forward to the next episode and really hope that you’re
enjoying this series thus far. Thank you very much.
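
As a recap of the filtering covered in this episode, the WHERE examples might be sketched as follows. The slides are not reproduced in the transcript, so these queries are assumptions based on the spoken description. In particular, the pattern K.+ (a K followed by one or more characters) is only an illustrative choice, and the title spelling is assumed from Neo4j's example movie data. Note that =~ has to match the whole string, not just a part of it.

```cypher
// Inline map syntax and an explicit WHERE clause are equivalent.
MATCH (m:Movie {title: 'The Matrix'})
RETURN m;

MATCH (m:Movie)
WHERE m.title = 'The Matrix'
RETURN m;

// Numeric comparison on the movie's released property.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released >= 2000
RETURN p, m;

// Regular expression match combined with Boolean logic; parentheses group the AND.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name =~ 'K.+' OR (m.released = 1997 AND m.title = 'As Good as It Gets')
RETURN p, m;
```
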

Danny Hutson
