Oracle In-database Machine Learning in 7 minutes (ATP and ADWC)


Hello everyone. My name is Jeroen Kloosterman. I’ll show you how to get started with in-database
machine learning in seven minutes. In this time we’re going to create our test
environment and build a first machine learning example that predicts which customers
are likely to buy certain products. If you’re planning to add machine learning
capabilities to an application, then the database can be a very good place for that. In this approach we run the
machine learning processes directly where the data lives, in this case in the same database that serves one of our
end-user applications. That means we don’t have to move
any data around: neither the input data nor the resulting predictive data. And all that our end-user applications have
to do is to pick up the predictive data from the same database that they’re already connected
to. So this approach also works regardless of
what application technology is used. It can be Java, .NET, APEX, you name it.

In terms of skillset, you’ll see that you probably already have a large part of the required knowledge.
80 percent of machine learning is about data preparation, and the great thing about that
of course is that we can just use SQL. Even the commands for model training and prediction
can be written in a language that’s familiar to us: PL/SQL.

To do in-database machine
learning you have several product options. Most of the Oracle databases have this possibility.
To get started with this technology I recommend the autonomous cloud databases. These are very developer-friendly options. You can spin up your own free test environment
in minutes and there’s hardly any system administration or tuning that you have to do. So for our example case that is perfect. In this video I will use the Autonomous Transaction
Processing (ATP) database, but just remember that machine learning works in exactly the same way in both products, ATP and ADWC.

We’re going to go through three steps. First, we’re going to create an Oracle
Cloud account. Then we’ll create the database test environment. And lastly we’re
going to create the machine learning script, in this case to predict which customers are
likely to buy a particular product.

To create an Oracle Cloud account, just follow this
URL. Oracle gives you 300 dollars or 250 Euros to spend for free and
you can spend that on any of the cloud services actually. The URL brings me to this
page. So I just go to “Create a free account”. Important to know here is the “cloud account
name”. This is a logical name for your entire cloud
environment. So all the services that you will create will
form part of this context. The data region determines the physical location from which
your services are run. The rest of the fields are self-explanatory. Your account must be verified by phone, so
you will receive an SMS message when you press the “request code” button. And of course you need to enter that code to
verify it. Last of all, a credit card or a debit card is
required. In this case I’m using my personal credit
card as I’m just doing this experiment by myself. I think it’s important to mention that your
card details are only required for verification, so the trial really is free. Now let me just open the welcome email. The
note here confirms that even after the trial you will not be charged anything unless
you have explicitly upgraded to a paid account inside Oracle Cloud. Otherwise the trial will simply end, and that’s it.

Now we will create an Autonomous Transaction
Processing Database environment. In my case I will go for a minimum size environment
of 1 OCPU and one terabyte of storage. This process will create a database user with
mandatory name “admin” for which we have to choose a password. Another option that requires some explanation
is the “license type”. This only becomes relevant when you convert
the trial to an actual paid subscription; if you do, this option lets
you choose whether to use existing Oracle on-premises licenses or new licenses. As long as we’re in the trial phase,
this option is not relevant, so I’ll leave it at its default. That’s it. Now the ATP service is being provisioned. This
normally takes a few minutes, so I’m going to fast-forward. My ATP database instance
has just been created. So we’ve already got two users and we’ll now
create one more user that will be the developer of the machine learning processes in this
ATP database instance. I’m just going to log in with the details of the “admin” user that
we just created. I call my user “MLUSER1” and give
it a password. This is actually a real database user
as well.

Now we can start writing our machine learning script. Let me log in with the details of
the machine learning user that we created earlier. Let’s create a new notebook, which is where the real development happens. In our case we will build a model that predicts
how likely a customer is to be interested in a product we call “Y Box Games”. This is done by learning from data from existing
customers. Here we have a table that shows whether customers
already own “Y Box Games”, and we see many factors that we believe may have
some influence on whether a customer owns these games, for example their level of education,
their occupation, the household size, etc. The magic of machine learning is that it will
find out exactly what the relationships are between these variables and our target variable,
which is the “Y Box Games” column. We’re splitting our data into two sets: 60 percent of the records for training and 40 percent of
the records for testing.

Before we can build the model, we must first create a table with the hyperparameters
for our model. This table can have any name. Right now the only parameter that we’re inserting
is the type of machine learning algorithm that we want to use, in this case a Decision Tree.

To actually build and train the model, we use a single PL/SQL command that takes some parameters. First of all there is the name that we will
give to our new model. Then there is the keyword “CLASSIFICATION” because
this is a classification problem. Then there is the name of the table that we’re
going to use for training that we created earlier. We also see the name of our target variable. The “Y Box Games”. And of course we see the name of the hyper parameters table. So that’s it. And now we can train the model. Before we can use our trained model, we should validate its quality. We’re doing this by applying our model to
our test set. In other words we’re going to predict whether
the people in the test set are likely to have “Y Box Games”. To do this we create a new placeholder column for our
prediction and we run the prediction to fill that new column. The results show now a new column. We have a predicted “Y Box Games” column
and we also of course have the actual known value of the “Y Box Games” column. So let’s see in what percentage of cases our
prediction is correct. We’re predicting ownership correctly
in about 90 percent of the cases. We can look into this number in more detail with a confusion
matrix, which can easily be created by grouping on the two “Y Box Games” columns, the actual one and the predicted one. Here we see, from top to bottom, the true
negatives, the false positives, the false negatives and the true positives.

This might have been a bit fast if you don’t have a background in machine
learning yet. In that case I recommend that you watch my other videos to get more background
on the topic. Before I go, please press the subscribe button so you’re kept up to date on new videos. Also, if you have any feedback or questions
for me, please use the short questionnaire that you will find below the video and I’ll be
happy to help. Enjoy your first steps with machine learning! Bye!
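
As a recap of the validation step in the walkthrough, here is a minimal sketch of how the accuracy figure and the confusion matrix can be obtained by grouping on the actual and the predicted column. It is not the notebook code from the video: it uses Python's built-in sqlite3 module as a stand-in for the Oracle database, and the table name test_results, the column names y_box_games and y_box_games_pred, and all row values are hypothetical, made up for illustration only.

```python
import sqlite3

# Hypothetical test-set rows: (customer id, actual ownership, predicted ownership).
# These values are invented for illustration; they are not the data from the video.
ROWS = [
    (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 1),
    (5, 1, 0), (6, 1, 1), (7, 1, 1), (8, 0, 0),
    (9, 1, 1), (10, 0, 0),
]


def build_results_table(rows):
    """Load the rows into an in-memory SQLite table (assumed name: test_results)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test_results ("
        "cust_id INTEGER PRIMARY KEY, "
        "y_box_games INTEGER, "        # actual, known value
        "y_box_games_pred INTEGER)"    # value predicted by the model
    )
    conn.executemany("INSERT INTO test_results VALUES (?, ?, ?)", rows)
    return conn


def accuracy(conn):
    """Fraction of rows where the prediction equals the actual value."""
    (value,) = conn.execute(
        "SELECT AVG(CASE WHEN y_box_games = y_box_games_pred "
        "THEN 1.0 ELSE 0.0 END) FROM test_results"
    ).fetchone()
    return value


def confusion_matrix(conn):
    """Count rows per (actual, predicted) pair by grouping on both columns."""
    cursor = conn.execute(
        "SELECT y_box_games, y_box_games_pred, COUNT(*) "
        "FROM test_results "
        "GROUP BY y_box_games, y_box_games_pred "
        "ORDER BY y_box_games, y_box_games_pred"
    )
    return {(actual, predicted): count for actual, predicted, count in cursor}


conn = build_results_table(ROWS)
print("accuracy:", accuracy(conn))
for (actual, predicted), count in confusion_matrix(conn).items():
    print(f"actual={actual} predicted={predicted} count={count}")
```

Because the query orders by the actual value first and the predicted value second, the rows come out as true negatives, false positives, false negatives and true positives, the same top-to-bottom order described in the walkthrough. With the made-up rows above, 8 of the 10 predictions match the actual value.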

Danny Hutson
