Database Clustering Tutorial 1 – Intro to Database Clustering

What is up my peeps from the intertubes? It’s me from CalebTheVideoMaker2, and in this
series we are going to be talking about database clustering and managing those database clusters
with a tool called ClusterControl. A database cluster, put simply, is when you
have multiple computers that are all used to store your data. I’ve come up with four reasons why database
clustering is a good thing. Those reasons are data redundancy (the good
kind of data redundancy, and we will talk about that in just a moment), availability,
scalability, and monitoring. Okay… I need spell check this time.

The very first reason is data redundancy. Now I’ve talked a lot about data redundancy
in my videos, and all through one of my series, Database Design, I talk about how data redundancy
is really bad, but that is a different kind of data redundancy. That kind of data redundancy is when you have
unnecessary duplicate data, where one copy can change and the other can stay the same, for
example, and then you have ambiguity, because you don’t know if these are two separate entries
or if they are talking about the same thing and one of them is incorrect, and it just gets
really messy and confusing.

Data redundancy when it comes to database
clustering is a little bit different. So with database clustering we are essentially
using three computers to store our data. But how exactly is that data going to be stored? Are we going to store a third of the data
in each one of these computers, or are we going to store most of the data on one and then
more important stuff on the other? Or what? Actually, we are going to have these synchronize
to where all of the computers store the same information. That’s a great thing because if one of these
computers blows up, well, you still have two copies of the data! So this data redundancy is not a bad thing
because everything is going to be synchronized. What that means is that as soon as there is
a change on one of these computers, that change is going to propagate to all of the computers
inside of this cluster. The data being synced this way gets rid of
the risk of ambiguity because all of the data is going to be the same.

Data redundancy is one of the features you’re
not going to be able to get if you are just using a standalone computer for all of your
data, because then if this computer blows up, goodbye all your data! Or you are going to have to make regular backups
and store them on a hard drive or on something else. But we don’t want to do all that extra stuff. We just want it so that if something catastrophic
happens, the data is not going to be lost. We don’t have to worry about backing up data
and all that stuff. Obviously you should still back up your data,
but there is a whole new layer of protection when you have a cluster.

Now when it comes to scalability, the thing
I’m most interested in is load balancing. What a load balancer does is take
all of the incoming traffic (so let’s say this big bubble is our load
balancer) and direct each request to the node it should go
to. The benefit of that is that, well… there
is only one communication point to the database cluster, right here, so it’s easier for the
programmer. The second thing is that the load balancer
is not going to tax one particular computer too much. So you can just think of it as balancing the
demands all across these different computers and not taxing one particular computer so
much that it slows down or explodes! Worst-case scenario…

When it comes to load balancing, though, it
can be a little confusing because there are a lot of different competitors and options,
so not all of them work exactly the same way. This is just a general example that will help
you understand the concept of load balancing.

When it comes to scalability, that means that
you can expand the application very easily. If you are working with one computer, for example,
and you just get like BOOM 4 million visitors, well, that might tax the system really badly. With a load balancer those visits are going
to be distributed, and when you have spikes of growth, your system is not pressured as
much.

Scalability and availability go hand in hand. Essentially what availability is, is how much
or how often your application or database is available. How often are you able to use it and how often
is it broken? We want the availability to be as high as
possible. So for example, let’s say we have 99 percent
availability. Well, that seems pretty good. I mean, it’s almost 100! But if you think about it, 1 percent of the
time your database is going to be unavailable. So if you think of 365 days in a year, 10 percent would be 36.5 and 1 percent would be
3.65. So we’re talking about 3 1/2 days a year that your database is not going to be available. So 99 percent is actually a really bad value.

How high of a number do you actually need? That depends on what the database is being
used for and how many users you’re going to be getting. For example, if a mom and pop shop can save
some money by going to 99 percent and not paying extra for that extremely high availability,
then 3 1/2 days down out of the year isn’t that bad. But if you’re Google, that’s really not that
good.

Scalability comes in because if you’re load
balancing, you’re going to make your availability much higher. Because, you know, if this computer explodes,
nothing breaks, because you can communicate with this other computer just the same. And the load balancer understands all
that, so it knows which computers to talk to. If you were just connecting right to this
database here and then it breaks, that’s not good, because now, until it’s fixed, your software
system’s not going to work. That’s bad!

The last thing is monitoring and automation. I just clumped those together. So let’s make sure we are all on the same
page. When I’m talking about monitoring, I’m talking
about looking at the health of your database system. You can do that manually, but there are a
lot of tools out there to help you automate the process. So you can write scripts against your database
cluster and say, “Hey! If this is going on, make sure you tell me,
because that’s a big problem.” And you can have these scripts run regularly,
like once a week or once a day or every so often, and that will help you monitor the health
of your database.

Also, another piece of automation
is doing a lot of this database clustering… [dog in background drinking water] Onyx, will
you be quiet! I’m making a video! You can do a lot of this database clustering
automatically with some of the software tools out there. For example, you can just say, “Hey! I want a load balancer.” Enable. Boom! You have a load balancer. That’s all you have to do. We will be getting into some of those tools. Until then, I just want to talk about theory.

So that is all I really have to talk about
in this video. In the next couple of videos there will be some
more informational stuff, and then what we’re going to do is actually make a database cluster. That’s going to be cool because then we can
test stuff. We can shut one down and see how things work
and all that good stuff. So this series is going to be sick! That’s good, by the way… That’s kind of a dumb slang term because sick
is like a bad thing, but when you say something is sick somehow it’s a good thing… but who
knows!

So just as some extra information to help
you get through this series the best, I’m going to have a study guide available at the end
of this video and also in the description, probably… In addition to that, I’m going to have extra
information on my website, which you can probably find in the description, assuming
I don’t forget to put it there.

Also, the company that is helping me make these
videos is called Severalnines. They are the ones that have made some of this database
clustering software, and they decided that they wanted to help support my channel. So that is really great for me and it’s also
great for you because I can make more content. So if you appreciate these videos, go check
out their stuff (obviously we are going to check it out because we are going to be using
their stuff throughout this series), but I encourage you to go to them and say, “Hey! I saw you from CalebTheVideoMaker2, you should
pay him tons of money!” You know, that would help me out.

So yeah, guys! That’s all I have. Please, please, please remember to subscribe
to this channel because there is going to be a lot of great new content coming out, and
I’m also going to go back and finish my C Programming series… I know… I’ll get to it. Just relax! So stay tuned. One last thing: check out the newsletter, which
will be used to give you guys notifications of cool new stuff going on with CalebTheVideoMaker2. Lastly, follow me on social networks and all
of that good stuff… Alright, that is enough advertising for like
a week! I will see you guys in the next video. Hopefully this was helpful. If you have any questions at all, please leave
them in the description… [face palm] leave them as a comment and I will see if I can
help you out! Thanks, and I will see you in the next one!
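
The availability arithmetic from the transcript (1 percent of 365 days is about 3.65 days of downtime) can be checked with a few lines of Python. This is only a minimal sketch and is not shown in the video; the function name `downtime_days_per_year` is made up for illustration, and it assumes the same 365-day year used above.

```python
DAYS_PER_YEAR = 365  # same simplification used in the transcript


def downtime_days_per_year(availability_percent: float) -> float:
    """Return the expected days of downtime per year for a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = (100 - availability_percent) / 100
    return DAYS_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Compare a few availability targets side by side.
    for percent in (90, 99, 99.9, 99.99):
        days = downtime_days_per_year(percent)
        print(f"{percent}% available -> {days:.2f} days down per year")
```

For 99 percent this gives the roughly 3.65 days mentioned in the transcript, and each extra "nine" cuts the downtime by a factor of ten.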

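The load balancing idea from the transcript (one communication point that spreads requests across the nodes and avoids a node that has gone down) can be sketched as a toy round-robin balancer. This is an illustrative assumption, not how ClusterControl or any particular load balancer works; the class `RoundRobinBalancer`, its methods, and the node names `db1` to `db3` are all hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: one entry point that rotates across healthy nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._rotation = cycle(self.nodes)

    def mark_down(self, node):
        """Record that a node is unavailable so it stops receiving traffic."""
        self.healthy.discard(node)

    def mark_up(self, node):
        """Record that a known node is available again."""
        if node in self.nodes:
            self.healthy.add(node)

    def pick_node(self):
        """Return the next healthy node in rotation."""
        # One full lap over the rotation is enough to find a healthy node.
        for _ in range(len(self.nodes)):
            node = next(self._rotation)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes left in the cluster")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["db1", "db2", "db3"])
    print([balancer.pick_node() for _ in range(3)])
    balancer.mark_down("db2")  # simulate one computer "exploding"
    print([balancer.pick_node() for _ in range(4)])
```

The application only ever talks to the balancer, so when a node goes down the requests keep flowing to the remaining nodes. Real load balancers differ in how they detect failures and choose nodes, as the transcript points out.
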
Danny Hutson

24 thoughts on “Database Clustering Tutorial 1 – Intro to Database Clustering”

  1. Thanks Caleb for a lovely and informative video. Can you please also use a polarizing filter for your camera or change the shooting angle? The lamp is reflecting on the board, and some of the text on it is not readable.

  2. Just starting the series, haven't been in the tech field since 2001 so I'm catching up.

    I might shoot you some questions soon so that I can get an idea of the software available and the beneficial features that are best suited for what I want to do.

  3. doesn't the load balancing point contradict the redundancy point?  If all data is going to all nodes, what is there to balance?  Everything goes to all nodes, not just some data going to the least used node.

  4. only 173 likes out of 9696 views? come on people u can do better! this is a great video i watched about clustering!

  5. Awesome video. Super smart tech guy using a chalk board in 2017? I'm digging the old school meets new school vibe.

  6. Thanks for the video. Let's say your organization has 20+ critical applications and requires high availability(Active/Passive). Would you create one cluster and place all nodes(that serve the 20+ applications ) or would you try and divide it into cluster A, B, C and split the nodes and respective applications between them? What is the best approach, with cost not being a factor.

  7. Nice one dude.. I like your maneere.. just remove the lamp from the front of the board, it makes the text invisible XD
