Database Lesson #1 of 8 – Introduction to Databases

Welcome, everyone. This is Dr. Soper
And today we will be going through our first
lecture for our database class. I need to operate
under the assumption that not everyone in the
class has had experience with databases in the past. And so this first lecture is
intended to bring everyone up to the same basic
level of understanding with respect to several
important database concepts. We have several
different objectives, which we will seek to
achieve in this lecture. These are listed on
the screen right now. First, we want to
try to understand why storing data in a
list is not necessarily a very good idea. It can cause many different
types of problems. Second, we want to see if we
can gain some insight into why an organization might
want to use a database. Third, we will see how the
notion of related tables of data, which forms a core
part of the relational database model, provides a
basis for eliminating many of the problems which are
associated with storing data in a list. We’ll also explore the various
components and elements that comprise a database
or a database system. And we will learn about
the purpose of something that we call a DBMS, a
database management system. And along the way,
we’re going to explore some of the functions
that a robust database application can provide to us. What, then, is the
purpose of a database? Well, it’s important to remember
that a database does not have just a single purpose. Instead, there are
several key advantages that databases provide. First, a database provides a
repository for storing data. That’s kind of
implicit in the name. Database implies that we
have a place to store data. However, what might
not be so obvious is that databases provide
an organizational structure for data. That is, we don’t just
have a place to store data, but the database also provides
an organized structure into which those
data can be placed. Finally, a database
provides us with a mechanism for interacting with our data. Now, interacting with
data can generally be described in four
different operations. Here they’re listed as
querying, creating, modifying, and deleting data. But there’s another more
interesting acronym, which may help you to remember this. And that acronym
is CRUD, C-R-U-D. This stands for create,
read, update, and delete. These are the four
basic operations that we can use when
interacting with data. A key point to remember here
is that, in business, there are many natural, hierarchical
relationships among data. For example, a customer
can place many orders. Another way of saying that
is many different orders can be associated with
the same customer. Or another example
is a department can have many
different employees, but a given employee might
work in one, and only one, department. So these are hierarchical
relationships among the data. And a relational database
allows us to model and represent these relationships. Let’s take a look
at a list of data. What I mean by a list here is
a simple, two-dimensional table of data. And in this table,
we store information that is important to
us for some reason. An example might be, say that
we have many different projects in our company, and we want to
keep track of who the project manager is for each project. Therefore, for each
project, we may track the project manager’s
name, their ID number, and maybe their phone
extension within our company. Now, ordinarily this wouldn’t
be such a big problem. But imagine if the same person,
the same project manager, is simultaneously managing
multiple projects. In that case, using
a simple list, the project
manager’s information would necessarily appear
on the list multiple times. Now, what is the
problem with this? Well, there isn’t really
any major problem, aside from the fact
that it’s redundant. Another way of saying that
is we are using more space than is necessary to record
which project manager is associated with which projects. Another problem with
storing data in a list is that the list of data may
contain more than one concept or more than one theme. I want you to remember this
idea of a business concept, because we will see it again
and again throughout our course. A business theme
or concept refers to a number of different
attributes or properties that are all naturally related
to one type of business entity. An example might be an employee. An employee is a
business concept. Employees have
different attributes that we may want to
track for each employee in our organization. For example, we may want
to track the employee’s ID number, the employee’s
name, the employee’s salary. Other examples of business
concepts or business themes include things like departments,
products, customers, orders, locations, and so forth. Returning to lists,
a major problem is with these multiple themes. That is, for each
row of the list, we might be recording
information about more than one of these business concepts. As an example, consider our
list of project managers. If we also wanted to include
project information– that is, perhaps, a project
name, a project ID, any additional information
about the project– we might store that in the
same row along with the project manager. Aside from redundancy,
as I mentioned earlier, the problem here
is not necessarily how we are storing
this information, but what might happen
to the information if it is stored in this way. When we store
information in a list, we introduce the
possibility of something called data anomalies. And specifically, there are
three types of these anomalies. Here they’re listed as a
deletion problem, an update problem, and an
insertion problem. Later in our course,
we will refer to these as deletion
anomalies, update anomalies, and insertion anomalies. So the structure of a list
and its associated problems of redundancy and
multiple themes can produce a situation in which
these modification problems potentially could occur. So let’s take a look at some
of these modification issues. Here we have a list of data,
which contains information on students. So we have a student’s
last name, first name, their email address. And then we see who that
student’s advisor is. So the last name
of the student’s advisor, the email of
the student’s advisor, and the department in
which they work, along with some additional
information. Let’s take a look at how some
of these modification anomalies might emerge. In the first case,
imagine that we want to change a student’s advisor. And for this
example, we’re going to change Advisor Baker
to Advisor [? Tiang. ?] When we do this, not
only are we going to have to update the last name
of the advisor in the list, but to maintain the overall
quality of the data, a concept that we call data
integrity, we are going to have to also update
the advisor’s email address. Note, in this case,
that we do not need to update the department or
the administrator’s last name, because those are the same for
advisors [? Tiang ?] and Baker. However, if, for some
reason, we wanted to change the student’s advisor
from, say, Baker to Valdez, well, now not only
do we need to update the last name and
the email address, but we also need to
update the department and the admin last name
attributes as well. Just to do something as simple
as changing the student’s advisor with this
list approach requires that four separate pieces
of information be updated. So this is what we might call
an update anomaly or an update problem. Next, let’s look at
a deletion problem. Imagine that our student,
listed here as Chip Marino, decides to drop out
of the university. So we need to remove Chip
from our list of students. Now, look what happens if
we delete this row of data. When the row of data
is deleted, not only do we lose the
student’s information, but we’ve also lost
information about the advisor and the department in
which that advisor works. That is, you will notice that
Advisor [? Tran ?] is not currently assigned to
advise any other students. So when we delete
this row of data, we may entirely
lose the knowledge that Advisor
[? Tran ?] even exists. And that can be a problem. In this case, this is called a
deletion problem or a deletion anomaly. Finally, let’s look at
an insertion anomaly. Let’s say that at
our university, we decide that we want
to add a new department. So we create a
biology department. Well, this is fine. We add information to our list. We now have a biology department
and an administrator’s last name. However, we now have all of
this missing data in our table. There are empty cells. There’s no information
for a student. There’s no information
for an advisor. All we have here is
information for the department. This means that
data are missing, and we’re not efficiently
utilizing our available storage space. These are some of the
problems with these lists. Now, remember,
earlier I mentioned that in the business world
there are complex relationships among different types of data. For example, a department
may have many employees who work in that department. Or a product may be assembled
from many different components. Or a customer might place
many different orders with our company. So there are these
natural complexities that arise in business. And relational databases, as
we will see, not only solve the problems that are
associated with lists, but also allow us to model
these natural relationships among business data. Now, I’ve been using this
term relational database. So it would probably
be a good idea for us to consider what a relational
database actually is. On a very basic level,
relational databases store data in a table. Now, a table is simply
a two-dimensional grid of data that contains
columns, and it contains rows. The convention in the
relational database world is that the columns represent
different attributes of an entity. And each row in the
table represents an instance of the entity. So for example, if I
have an employee table with an employee ID, an employee
name, and an employee telephone number, we would list
those three attributes– ID, name, and telephone number–
as columns in the table. And then each row in the
table would represent an individual employee. Again, we said that there are
these natural relationships among different
business concepts out there in the business world. In a relational database, data
for each of these concepts is stored in its own table. So I may have an employee table. I may have a department table. And then I can
create a relationship between those tables,
which will allow me to figure out which employees
work in which departments. So a good way of
thinking about this is instead of storing all of
the information in one big list, we break the list apart
into separate pieces according to which
information is associated with which business
theme or business concept. Therefore, all of
the information associated with
an employee might go into an employee table. And all of the information
associated with a department might go into a
department table. Although, inside a
relational database, information about different
business concepts or business themes is stored
in separate tables, it may be necessary– say, for
reporting purposes– for us to reassemble all of the data
from these different tables back into a list. Now, we can accomplish this in
the relational database world by performing something
called a join operation. That is, we join
the tables together. Now, a very, very important
concept for you to understand is that in the relational
database world, we link the records
in different tables together using matched
pairs of values. These matched pairs
of values allow us to associate a
row in one table with one or more rows
in another table. For example, if we
have a project table and we have a
customer table, and we use an attribute
called a customer ID to uniquely identify each
customer, then what we can do is we can place a customer ID
attribute in the project table. And we can then use
the ID of the customer in each table to link
related rows together. Now, of course, you may
be listening to this, and you may think,
why do we want to do all of this extra work? And it is true that relational
databases are more complicated than storing data in a list. However, as we will see,
relational databases have many, many
advantages over a list. Among these, first,
a relational database minimizes data redundancy. In the examples
we saw earlier, we saw that there exists
with lists the possibility to have all sorts
of redundant data. Relational databases
eliminate this. Relational databases
also allow us to model the natural
relationships that exist among business
concepts in the real world. And there are several
other advantages as well, including providing
a solid basis from which to do things
like generate reports or build user interface forms. I mentioned a few
minutes ago that there are four basic ways of
interacting with data– create, read, update, and delete. In the relational
database world, we have access to something
called the Structured Query Language, often called SQL,
or SEQUEL, if you like. And this is the
primary tool that we can use for
performing these four basic operations on our data. In a few weeks, we will begin
learning the Structured Query Language. This is an extremely valuable
skill for you to develop. If you are able to learn the
Structured Query Language well, then you will be able
to sit down and work with virtually any modern
database with a very, very short learning curve. It doesn’t matter if you need
to work with a SQL Server database, or an Access
database, or an Oracle database, or a DB2, or a MySQL
database, or even a database for, say, a mobile device. Most databases, including all
of those that I mentioned, support the Structured
Query Language. So if you can learn the
Structured Query Language, you will be very well
positioned to work with almost any database. Even though we’re still
early in our course, I want to give you an example
of the Structured Query Language now so that you can see
that it’s really not that difficult to understand. This is certainly not a full
scale computer programming language. The Structured Query Language
was designed from the ground up to be simple to use and to
be simple to understand. So what we have
here is an example where we have three tables. There is a customer
table, and a course table, and an enrollment table. So let’s say that
these are art courses. We have customers who have
signed up to take different art courses to see, perhaps,
if they can learn to paint or they can learn to sculpt. Now, although we start
with three tables, if we need to combine
all of the information together in order to
restore the original list structure of the
data, we can do that by using something called
a SQL SELECT statement. Here we see an example
of such a statement. I know this looks complicated. But hopefully, you are able
to read this even right now at the beginning of
our course and get a good idea for what this
statement is supposed to do. So let’s take a look. We are selecting the customer’s
last name, the customer’s first name, the customer’s phone
number, and the date of the art course, the amount paid out
of the enrollment table, the course itself,
and the total course fee from our three
different tables– customer, enrollment,
and course. We are then linking the related
records in each table together using these matched
pairs of values that I mentioned earlier. In the first case, we are
linking the customer table to the enrollment table using
matched pairs of customer numbers. And in the second case, we’re
linking the course table to the enrollment table
using matched pairs of course numbers. What this will do
for us is restore the original list of data. So you can see our list. It contains the customer’s
last name and first name, customer’s phone number,
the course date, the amount that the customer
has paid so far, the course they are taking,
and the total course fee. Now, the SQL statement that
we saw was a SELECT statement. And that is a statement
which just pulls data from the database. There are many
other things that we can do with SQL, like deleting
data, inserting new data, updating data in the
database, and so forth. And we will learn how to do
all of those various tasks when we study the Structured
Query Language in more depth in a few weeks. Now I would like to
turn our attention to some conceptual information. Let’s focus on a
database system. It’s important to remember
that a database system is not just the database itself. Rather, it exists in a
broader hierarchy, which includes users, application
software programs which rely on the database, something
called a database management system, a DBMS, and then, of
course, the database itself. The broader picture
of a database system includes these four components. Here we see the relationships
between the four components of a database system. On the far left, we see users. Now, users are
people, just like you or me, who might need to
use the data in a database. But a user doesn’t necessarily
have to be a human being. It is possible, for example,
for other software programs to be users of our
database as well. Now, you will see
that users interact with database applications. These are the programs
that rely on the database. So if we are a human user,
we might, for example, use a website, which
interacts with a database. That website could be considered
a database application. You will notice that
the database application talks to the DBMS. The DBMS, the database
management system, acts as a gatekeeper. All of the information flowing
in or out of the database must travel through the DBMS. It is a critical mechanism
for maintaining the quality of the data in the database. We do not allow our users
or our database applications to directly access the
data in the database. To work with the data,
to access those data, database applications must go
through the database management system. And then, finally, far over on
the right side of our screen, we see the database itself. And this is where we
actually store the raw data in their separate tables. So what do the users of
a database system do? Well, they do many
different things. You and I are users
of database systems on almost a daily basis,
even if we don’t realize it. So we can look up our checking
account balances online. We can produce reports. We can seek information
on Wikipedia. All of these are examples
of us as users interacting with database applications. A few interesting things
to note about the database itself is that it is a
self-describing collection of related records. What this means is that
the database does not just contain the data, but it
also contains a definition of its own structure. Databases also contain metadata. The simplest way to
understand metadata is that metadata are data
which describe other data. And in the context of
the database world, metadata are data which
describe the structure of the data in the database. So for example, I may say that
an employee table contains an attribute called Employee
ID, and that ID value is an integer. That knowledge that the
Employee ID value is an integer is metadata. And of course, databases
allow us to establish relationships between tables. So that means the
tables within a database are related to each other
according to the relationships that we establish. So it’s important to remember
that a database does not just contain the data that
we have put into it. A database also
contains metadata, which describe our data. It contains, or
potentially can contain, what we might refer
to as overhead data. These are tables
of data that are used to improve
performance or track the status of the database
or how users are interacting with the database. And one of the major types
of these overhead data are called indexes. Later in our course, we will
examine database indexes. And of course,
databases can also contain application metadata. So these are data
values that are used by the applications
which rely upon the database. An application, for example,
might store user preferences in the database. Now, to reiterate, a DBMS, or
a database management system, serves as a gatekeeper
or an intermediary between the database
and the applications which rely upon that database. The purpose of the DBMS
is to control and manage the operations of the database. The DBMS is a special kind
of software program, which is used to not only
create databases, but also to process and
administer those databases. So a DBMS is a type of
application program. But it is not the database. The DBMS interacts
with the database, but it itself is
not the database. What, then, can a database
management system do for us? Well, the answer is it can
do many different things, as we see here on your screen. We can use the DBMS to
create a new database. And then, within
that database, we can create tables and establish
relationships between tables. We can read, write, edit, and
delete data using the DBMS. We can establish rules
which regulate and control the data in the database. The DBMS provides an additional
layer of security for us. And it also provides
mechanisms through which we can backup our data and
recover our data if anything ever goes wrong. So the DBMS is a very
critical piece of software. Now, businesses often operate
with many different types of business rules
or constraints. And one of the great things
about working with databases is that the DBMS can
establish and enforce many different types
of constraints. One of the most useful
types of constraint that a DBMS can enforce is
called a referential integrity constraint. The purpose of a referential
integrity constraint is to ensure that values that
we are entering in one table have matching values
in another table. Now, that’s a very
abstract way of trying to convey a simple concept. As an example, imagine
that we have a customer table and a project table. And we are trying
to enter information about a new project
in the project table. And we specify that
the customer ID that is associated with this
project is customer number 5. Referential integrity means
that the database will actually check in the
customer table to see if customer number 5 exists. If customer number
5 does not exist, then the database will not
allow us to add that customer ID to our project table. And in this way,
I hope you can see it helps to maintain
the quality of the data in the database, what
we call data integrity. Now, the fourth component
of our database system were database applications. And database
applications are simply those software programs
that need to use or work with data in the database. So these might be
data driven websites. They might be mobile apps. They might be
desktop applications or productivity software. Just remember that these
applications are not allowed to talk to or use
the database directly. But rather, they must
go through the DBMS to work with or use those data. Finally, for this lecture, I
want to just talk about some of the different types
of database systems that are available. First, we can consider something
called a personal database system. These are very small databases. They are typically used by
a single user at a time. They are not complicated. They are simple in design. And they’re not really
suitable for supporting medium or large businesses. They might be used by very
small businesses just out of convenience. But as those companies grow,
they will, at some point, certainly need to choose a
more robust database solution. An example of a
personal database system might be Microsoft Access. Convenient, easy to use,
but not very powerful. One of the interesting
characteristics of personal database
systems is that they often contain the capability
to build a database application, or at least
interfaces to the database within the DBMS. So in Microsoft
Access, for example, I can create data
entry forms or generate reports for my Microsoft
Access database. That is, it kind of combines
the database application and the database management
system into a single entity, conceptually speaking. On the other hand, we have
a class of database systems that are called Enterprise-Level
database systems. These are databases that are
used by medium and large size companies to support
their operations. And it is
Enterprise-Level databases that we will learn to use and
interact with in this class. Compared to personal
database systems, Enterprise-Level
databases are designed to support many different
users at the same time. And this could be thousands
or tens of thousands of users all at the same time. Enterprise-Level
databases also serve data to more than one application. For example, if we are
a retailer like Target or Walmart, we may
have a database that provides data
to our website and also provides data
to the dashboards that are used by our managers. Enterprise-Level
databases are commonly spread across multiple
physical computing devices. So many different computers. So a single database
might be running on many physical servers. Enterprise-Level
databases can also be geographically distributed. So I might have
part of my database in my corporate
offices in Los Angeles, another part in Beijing, another
part in Berlin in Germany. And Enterprise-Level
database management systems support more than one database. So within the same
database management system, we might, for example, have
our operational database, which allows us to
run our business and keep track of
transactions in real time. And we might also create a
data warehouse or data marts. And they can all be managed
by the same Enterprise-Level database management system. These are large databases. As an example, consider
an ERP system like SAP. A typical ERP
implementation will have thousands of
tables, all related to each other in some way. An Enterprise-Level
database management system can handle this level of
complexity with relative ease. So here we see a
graphical example of these Enterprise-Level
database systems. The DBMS, again, serves
as an intermediary or a gatekeeper
between the databases and all of the various
database applications that want to rely on the
data in those databases. And they can be
mobile applications, applications that are
written in Java, or C#, or even web applications. It might be ASP.NET or PHP apps. Just as examples of some
commercial DBMS products, again, I mentioned that personal
or desktop DBMS products might include Microsoft Access. Whereas,
Enterprise-Level products include SQL Server,
which is what we will be using in this
class, Oracle, MySQL, and DB2. MySQL, if you’re interested,
is a completely free and open source database. Well, my friends, thus
ends our first lecture for our database class. I hope you learned something. And come back soon, and
we’ll begin our exploration of our next topic. Until then, have a great day.

Danny Hutson

Leave a Reply

Your email address will not be published. Required fields are marked *