Hierarchical Queries: Databases for Developers #16

Hierarchical Queries: Databases for Developers #16


Trees. They’re everywhere in nature. But
they’re also common in your data. Such as your company’s hierarchy or your family
tree. But when you store these hierarchies in your database it can be hard
to tell where each row fits in the tree. Luckily all you need is a bit of SQL magic to turn your rows into a tree! It’s your dear old grandma’s
birthday coming up. To celebrate you want to draw out a family tree for her showing
all her children; your parents, aunts, and uncles; their children you, your siblings,
and cousins; their children and so on. To help with this you’ve got details of all
your family members stored in a database table. But, run a standard query against
this and it’s not obvious which generation each person belongs to and
who their parents are. You can get around this by joining the table to itself. Over
and over and over again for each new generation. But this is a paint to write.
And, if you want to redraw your family tree starting with your
great-great-great great-great great great grandma, you’ve got to
rewrite your SQL query. You can overcome these problems using
hierarchical operators. But before we dive into the SQL, some terminology. In the real world a tree starts with roots and grows out to its leaves. A data tree
is the same. The starting points are the roots, the ends the leaves. Each point in
between is a node. And, like a family tree, the direct ancestor of a node is its
parent and it’s direct descendants its children. A root is a node with no parent
and a leaf is one with no children. Back to the SQL! To write your query you
need to know how to identify which rows in the roots and which columns contain
the relationships between parents and children. Your grandma is the start of
the tree. So she’s the root. And the relationship between parents and
children is just like in real life; the mother or father of each person is the
parent of that child. Armed with this information you can now start to write
your SQL query in Oracle Database. But first you need to choose between the
proprietary connect my clause. Or the ANSI compatible recursive with. In both cases you want to begin with the root, your grandma. With connect by select her in
the start with clause. Using with you’ll have a normal select to get this row. As
in real life each person’s parent is their mother or father from the previous
generation. So you need to check these columns against the family member from
the previous level. Using Oracle syntax you can access this with a keyword PRIOR. So in the connect by clause you need to check in the prior person matches the
current mother or father. Recursive with is more complicated. You need to union
all the subquery with the initial query to fetch your granny. Then join this
back to the original family table where the person from the subquery matches
the mother or father. At this point both methods have built your tree. But the results look like you’ve just selected from the table. It’s not obvious which
rows are the children, grandchildren, or great-grandchildren. To show this you
need to add in which generation each person belongs to and indent the rows
appropriately. With connect by this is easy the pseudo-column level shows which
position you are in the hierarchy. This starts at 1 and then increments for each
new set of children in the tree. So granny is at level 1.
Your parents, aunts, and uncles level 2. You, your siblings, and cousins are level
3, and so on. You can then use this to pad spaces before the names so you can
easily see who is in each generation Recursive with doesn’t have a level
function built in. But this is easy enough to do manually. Just select one as the level in the base query. Then, in the recursive part, increment the level and add spaces before the names. Now it’s easy to see which generation each person belongs to. But the order may be a bit jumbled. Usually you want to sort siblings from
oldest to youngest and show all the person’s children next to them in the
tree. And it’s here that recursive with as a
big advantage over connect by. You can specify how you want to walk down a tree.
Depth-first or breadth-first. With depth-first search you start at the root
then keep going all the way down until you hit a leaf. You then go back up until
you find a node with a child you’ve not visited before. Follow that down until
you find the next leaf, and so on. So you’re always going down as far as you
can. On the other hand breadth-first search visits all nodes at the same
level before going down to the next. So it starts with your granny. Then gets all
her children; your parents, aunts, and uncles. Then all their children and so on. This search type can affect how quickly the SQL runs or even how the query
works. But for now it allows you to specify how to sort your results. To show
eldest children first, immediately followed by their children you want
depth-first search ordered by birth date. Doing this using recursive with this is
easy as sticking this after the subquery. You also need to define a column
showing which order the database visits the rows. This goes in the set clause.
To guarantee you display the rows in this sequence place this column in your
order by. Sadly connect by doesn’t have an option to specify how you want to
search the tree. It displays rows in the order it accesses them. Adding an order
by destroys this sequence. Luckily order by has a siblings clause
which allows you to specify which order to show rows in the same generation. So
to get the same effect order siblings by birth date. So now you built your family
tree showing all your granny’s blood descendants. But real family trees
usually include both parents of every child. Trying to do this adds a lot of
complexity to your SQL. For example, if you try and add grandpa in as a root you
build your whole family tree twice, duplicating everyone else in the family. This is because in data terms each node in a tree has at most one parent. But, of
course, in real life everyone has two parents. So a family tree isn’t a real
tree. It’s a directed acyclic graph! Bear this in mind if you’re trying to draw
out family trees using SQL. You’re best off following the descendants down
one gender, for example the mothers, then joining the fathers in as appropriate. So now we built your family tree. But there’s still one important aspect of
trees we need to cover off: Loops! Unless we discover time travel it’s impossible
to be your own ancestor. So you can’t have a loop in a family tree. But they
are possible in other data structures. If a loop does exist in your hierarchy, as
you walk along it you’ll eventually get stuck in an infinite loop.
Luckily the database can detect these and stop the query. Unluckily it will
error out when it does, leaving you with no results. You can avoid this problem
using cycle detection. With connect by just add the nocycle keyword after
connect by and Oracle Database will bail out if it comes back to the same row.
Doing the same using ANSI-style is more typing, but also as more power! For example say you’ve arranged your bricks in a tree with largest at the top and
smallest at the bottom. But you’ve accidentally pointed one of the leaves
back to another branch in the tree. There’s no loops. But this node has two
parents. So as you walk down the tree you’ll reach it twice, meaning you’ll get
two copies of all these bricks in your results. Something you want to avoid. You
know bricks gets smaller as you go down the tree. So if you’ve seen a brick of
the same size you’ve seen on the path so far you’ve got a problem. The cycle
clause allows you to detect this and stop processing. To use the cycle clause
lists the columns you want to check for duplicates. Then define a new column with
the value to set when the database sees a repeated value and, a default if it doesn’t. So that was a quick overview of how to build hierarchies using SQL. There are also a host of functions you can use with
connect by to find things like which rows are the leaves, or the ancestors of a
row, and so on. Check the links in the description for more details on these
and how to emulate these functions using recursive with. Just remember: when
building trees watch out for loops and make sure you define the roots. Otherwise
you’ll build a new tree for every row in you table! Thanks for watching! If you enjoyed this and want to watch more SQL videos click the links on screen
now

Danny Hutson

4 thoughts on “Hierarchical Queries: Databases for Developers #16

  1. won't UNION all produce or reveal duplication? Also at 5:47 wouldn't FROM Your_Family have to have the yf in front of it such that you could then use yf.family_member? I see you have Your_Family yf at the bottom when you are joining but doesn't order matter?

Leave a Reply

Your email address will not be published. Required fields are marked *