Build a Small Knowledge Graph Part 1 of 3: Creating and Processing Linked Data

Hi, I’m Jarek Wilkiewicz,
and I’m a developer advocate. I was born in Poland. And like many other
engineers, I grew up reading science fiction. In fact, this is
my favorite author. My childhood home
was actually very close to his house in Krakow. And little did I know
that his work would have a major influence on my life. Did you read science fiction too? Do you remember the moment
when the protagonist, while busy saving the
world, asks her computer for something? Computer, make it so. And the machine just
does it, and it nails it. Now, why can’t the real
world be more like this? For example, when a
user says, OK, Google, and asks for something,
the application or service most suitable to
fulfill the request should be invoked and get
the job done just like that. Today I want to talk to
you about technology based on Schema.org
Actions, which I think is a small step
in that direction. With Actions, you
have an opportunity to delight your users and bring
more engagement to your app. However, we can’t do it alone. We need your help. At Google, we have been working
on organizing the world’s information to
make it universally accessible and useful. To help with that we’ve
built the Knowledge Graph. The Knowledge Graph
contains information about entities and
their relationships. One of the interesting
applications of the Knowledge Graph is resolving
ambiguities when processing language queries. For example, the
artist Dual Core is a band, which is different
than the concept of dual core. As far as the strings are
concerned, these two are equal. But to a user who is asking
their phone to play Dual Core, the difference is quite clear. Dual Core the band is assigned
a machine ID in the graph. We like to refer to these types
of object-based identifiers as things, not strings. Having a graph entity
with a machine ID, or MID, makes a big difference. The Knowledge Graph can also
help satisfy user requests with your application. I thought a good way to show you
how this works is to actually build a small Knowledge
Graph together with you. But first, let me introduce
you to my buddy Shawn. This is Shawn. We both like music. Shawn likes to DJ and spin
dance LPs in his free time. One day, we would love
to open a record store. We would curate an
awesome selection of LPs. And on weekends, we would
invite our favorite artists to give concerts at our store. Now, wouldn’t that be great? Our music selection
would of course be available for
purchase online. We would offer web and mobile
streaming access as well. Since today most
music recommendations are provided by
computers, we would want to make it
easy for computers to discover and understand
what we have to offer. For example, when
the user says, OK, Google, play Dual Core, we would like our app to be ready to fulfill the user's request. All right. So we have this music store. We are selling LPs,
offering streaming access, and hosting events on weekends. How do we express
this information in a machine-readable way? And how would it be used? Let’s look at a simple
architecture diagram. The flow of information
is illustrated by the direction of the arrows. As you can see, the
crawler fetches data from our music website and
saves it in the Knowledge Graph. The graph is then consulted
when the user issues a voice command or a web search. Shawn and I publish
this information about our music artists and
albums on our store website. Now, how do we do that? Rather than inventing a
new publishing format, we will apply the
linked data principles. The term “linked data” refers
to a set of best practices for publishing and connecting
structured data on the web. We’ll use Schema.org vocabulary
and JSON-LD serialization to describe the entities
in our music store. If you haven’t heard about
JSON-LD or Schema.org, don’t worry. Click the links on the
screen or follow along. It’s pretty straightforward. For example, here’s what an artist looks like in JSON-LD. The JSON-LD markup is embedded in an artist’s web page on our site.
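A minimal sketch of that markup (the artist name is real, but the URLs below are placeholders rather than the exact values shown in the video) might look like this:

{
  "@context": "http://schema.org",
  "@type": "MusicGroup",
  "name": "Dual Core",
  "url": "http://musicstore.example.com/artist/dual-core",
  "sameAs": "http://en.wikipedia.org/wiki/Dual_Core_(band)"
}

The url property points at the artist page on our own site, while sameAs points at a page about the same artist somewhere else on the web.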
Now, take a look at the sameAs property. This property makes it easier to reconcile, or match, the JSON-LD document describing an artist with the corresponding entity in our Knowledge Graph. In the spirit of linked
data, the sameAs property links the artist with
information on another site. This stuff is pretty
important, but we’ll show you more about how
that’s used in the next video. As I mentioned earlier,
we built a mobile app so our customers can
enjoy music on the go. How do we express the fact
that the music artists we carry can be listened
to using our app? Remember, I want our
mobile app to be triggered by the OK Google voice command. Fortunately, we can also
use Schema.org and JSON-LD for that. A recent addition to the Schema.org vocabulary makes this possible through something called potentialAction. Here’s an example; note that in our markup, we are making an association between an artist and the mobile application we have built.
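As a sketch of that association (the app package name and the URLs here are placeholders, and the real sample markup may differ in detail):

{
  "@context": "http://schema.org",
  "@type": "MusicGroup",
  "name": "Dual Core",
  "potentialAction": {
    "@type": "ListenAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "android-app://com.example.musicstore/http/musicstore.example.com/artist/dual-core",
      "actionPlatform": "http://schema.org/AndroidPlatform"
    }
  }
}

The potentialAction property says a ListenAction can be performed on this artist, and the EntryPoint target tells the platform which app, and which deep link inside it, can fulfill that action on Android.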
Let’s recap. Shawn and I described our store
using JSON-LD and Schema.org, so now this information
is machine readable. We have also associated
our music app with artists that it can handle. Does this mean that
our music app will now be triggered when
the user asks Google to listen to the music
we have marked up? Not quite. First we need to test. There are several things
that can go wrong. We want to verify. The crawler has to
be able to parse the JSON-LD markup
we have published. Artists described in
our JSON-LD documents have to successfully
reconcile or match with entities in
our Knowledge Graph. Our mobile app has to
handle specific intents so that the music starts playing
when the app is triggered. Quite a few things
to test, aren’t there? Don’t worry. To help you with
testing all this, we have provided sample
open source implementations of the components you see
highlighted in this diagram– the website, the mobile
app, the crawler, and the small
Knowledge Graph running in the graph database
called Cayley. We have also built a
mobile and web simulator to help you trigger the
Schema.org actions defined in the JSON-LD documents
for the music store. All right. What do we do next? Now that our JSON-LD markup
is on our music store site, we can run the crawler to make
sure our markup is correct. The crawler parses the JSON-LD markup and transforms it into triples, simple subject-predicate-object statements that a graph database can store. It looks like this.
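Roughly, and with the same placeholder URLs as before (the real crawler output may use a slightly different serialization, such as N-Quads):

<http://musicstore.example.com/artist/dual-core> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/MusicGroup> .
<http://musicstore.example.com/artist/dual-core> <http://schema.org/name> "Dual Core" .
<http://musicstore.example.com/artist/dual-core> <http://schema.org/sameAs> <http://en.wikipedia.org/wiki/Dual_Core_(band)> .

Each line is one statement (a subject, a predicate, and an object), which is exactly the shape a graph database like Cayley can load.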
If this sounds intriguing to you (it sure is for me), see the second video, where my colleague
Barak will show you how the output of the
crawler can be loaded into an open source
database called Cayley. Barak will also show you how
you can join the music store data with Freebase, a free
structured data repository with millions of people,
places, and things. If you would rather skip
ahead and learn more about how the entities published on
our music store website work in conjunction with
Schema.org actions, see the third video
in this series. Better yet, clone the
repository that we have shared and play around with
the code before you proceed to the next video. Thank you, and have a great day.


12 thoughts on “Build a Small Knowledge Graph Part 1 of 3: Creating and Processing Linked Data”

  1. I'm really excited to try this out. Thanks for posting +Jarek Wilkiewicz. I was researching schema.org actions today, so the timing is perfect. 

  2. Hi! I cannot find the git repo you are talking about at the end of the video. Can you help me? Thanks, and well done!

  3. Hello, I'm new to schema.org and trying to figure out how it works to improve a Knowledge Graph for a client. In this video series you seem to speak only about JSON and not about Microdata (which is surprising in my opinion, since Schema.org is mostly about Microdata, right?). Why is that? Is JSON the way to go, leaving Microdata behind? Or do you use both at the same time: add JSON in the <head> section and then define all the divs in the body with Microdata? Can you please explain?
    Thank you, Max

  4. What about processing large software sources? For example, parsing all mainstream languages using some sort of variative parsing, like Prolog DCG grammars, and processing attribute grammars? I mean software analysis and transformation.

  5. Hi there, thank you for sharing this information and knowledge within your Cayley project. Would you or somebody mind linking to an article/tutorial on how to crawl semantic structured data from a website and save it in a knowledge graph? Your video is great for STRUCTURED data, but it is quite difficult to use with unstructured data.
