Hello and welcome to this short summary presentation on Distributed Database Management Systems. This video is a re-presentation of an infographic created as part of a final project for UW-Milwaukee Information Studies 410, Database Information Retrieval Systems.

A distributed database management system, or DDBMS, can be classified as either homogeneous or heterogeneous. A homogeneous DDBMS is one in which the database management system is identical at every node of the distributed database. A homogeneous DDBMS is easiest to set up within a single organization that controls the overall system design and can therefore ensure that every database location runs the same system.

By contrast, a “heterogeneous” DDBMS is one in which at least one location runs a different database management system from the others. A heterogeneous DDBMS could, for example, integrate data from two organizations when they merge. In this case, the newly created organization needs to access data from multiple systems without reformatting all of its records and without the expense of building an entirely new system from scratch. A heterogeneous DDBMS uses a software layer called “middleware” that lets users access both of the original databases through a single interface. The same strategy could also serve an organization that needs to connect data held in different systems across multiple departments, or one that needs to access data stored in legacy systems running older database software.

There are three characteristics shared by all distributed database management systems, each of which relates to how users interact with the distributed database. The first is location transparency: a user can access data stored at multiple locations across the system, yet it should appear as if all the data is contained in a single, locally hosted database.
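To make the middleware idea and location transparency more concrete, here is a minimal Python sketch. Everything in it is hypothetical and simplified: `SqlSite` stands in for a relational database (an in-memory SQLite database), `LegacySite` stands in for an older system with its own record format, and `Middleware.get_customer` is an invented single interface. The caller asks for a customer by ID and never names the site that stores the record.

```python
import sqlite3
from typing import Optional


class SqlSite:
    """Hypothetical site A: a relational store (in-memory SQLite)."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT)")

    def insert(self, cust_id: int, name: str) -> None:
        self.conn.execute("INSERT INTO customers VALUES (?, ?)", (cust_id, name))

    def lookup(self, cust_id: int) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT cust_id, name FROM customers WHERE cust_id = ?", (cust_id,)
        ).fetchone()
        # Translate the SQL row into the middleware's common record format.
        return {"id": row[0], "name": row[1]} if row else None


class LegacySite:
    """Hypothetical site B: a legacy key-value store with its own layout."""

    def __init__(self) -> None:
        self.records = {}  # key "CUST-<id>" -> "LAST|FIRST"

    def insert(self, cust_id: int, last: str, first: str) -> None:
        self.records[f"CUST-{cust_id}"] = f"{last}|{first}"

    def lookup(self, cust_id: int) -> Optional[dict]:
        raw = self.records.get(f"CUST-{cust_id}")
        if raw is None:
            return None
        last, first = raw.split("|")
        # Translate the legacy format into the same common record format.
        return {"id": cust_id, "name": f"{first} {last}"}


class Middleware:
    """Single interface over heterogeneous sites; callers never name a site."""

    def __init__(self, *sites) -> None:
        self.sites = sites

    def get_customer(self, cust_id: int) -> Optional[dict]:
        for site in self.sites:
            record = site.lookup(cust_id)
            if record is not None:
                return record
        return None


site_a, site_b = SqlSite(), LegacySite()
site_a.insert(1, "Ada Lovelace")
site_b.insert(2, "Hopper", "Grace")
mw = Middleware(site_a, site_b)
print(mw.get_customer(1))  # {'id': 1, 'name': 'Ada Lovelace'}
print(mw.get_customer(2))  # {'id': 2, 'name': 'Grace Hopper'}
```

For simplicity, this middleware asks each site in turn. A real system would more likely consult a catalog that records where data lives, but the caller's view is the same either way: one interface, one record format, and no site names.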
Fragmentation transparency is a somewhat similar concept: a user may need to retrieve data from multiple parts, or fragments, of a distributed database, and the DDBMS creates the appearance that all the data is contained in one database. In other words, the user does not need to specify which fragments to pull data from or where they are located. Finally, replication transparency concerns the updating and replication procedures that the DDBMS carries out to keep copies of the data at different locations consistent. These processes should run in the background and should not affect the way a user interacts with the system.

Here is a brief list of several real-world organizations that use distributed databases. Best Buy and Amazon.com, for example, maintain retail websites that take advantage of the “high availability” of a DDBMS: even if individual nodes of the database become unavailable because of hardware or network communication failures, the overall system remains accessible to customers. McAfee and Craigslist are two companies that need to store large amounts of data and to “scale” their storage as the system grows; adding storage capacity quickly and easily is another advantage of a DDBMS strategy. Netflix and CERN (the European Organization for Nuclear Research) are two very different organizations, one commercial and one scientific, yet both use an open-source DDBMS called Apache Cassandra.

Cassandra is used in a wide variety of applications. Many organizations take advantage of the fact that it is free, which reduces development costs. Because it is open source, Cassandra can also be customized to the needs of a particular company or institution. The table on this slide lists some of Cassandra’s features as described on the Apache website.
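The following Python sketch, again hypothetical and greatly simplified, illustrates fragmentation transparency, replication transparency, and high availability together. An invented `DistributedTable` class splits records into fragments by key range (an assumed fragmentation scheme), stores each fragment on two replica nodes, and routes reads and writes itself, so the caller never names a fragment or a replica. Writes are copied to every live replica synchronously here for simplicity, and marking a node as down simulates a hardware or network failure.

```python
from dataclasses import dataclass, field


class NodeDown(Exception):
    """Raised when every replica of a fragment is unreachable."""


@dataclass
class Node:
    name: str
    up: bool = True                           # False simulates a failed node
    data: dict = field(default_factory=dict)  # this node's copy of its fragment


class DistributedTable:
    def __init__(self, fragments):
        # fragments: list of (low, high, replicas); a fragment holds keys low <= key < high
        self.fragments = fragments

    def _replicas_for(self, key):
        # Fragmentation transparency: the table, not the caller, finds the fragment.
        for low, high, replicas in self.fragments:
            if low <= key < high:
                return replicas
        raise KeyError(f"no fragment covers key {key}")

    def write(self, key, value):
        # Replication transparency: one call updates every live copy.
        live = [n for n in self._replicas_for(key) if n.up]
        if not live:
            raise NodeDown(f"all replicas for key {key} are down")
        for node in live:
            node.data[key] = value
        return len(live)  # number of replicas updated

    def read(self, key):
        # High availability: any live replica holding the key can answer.
        live = [n for n in self._replicas_for(key) if n.up]
        if not live:
            raise NodeDown(f"all replicas for key {key} are down")
        for node in live:
            if key in node.data:
                return node.data[key]
        raise KeyError(key)


# Example: two fragments, each replicated on two nodes.
n1, n2, n3, n4 = Node("n1"), Node("n2"), Node("n3"), Node("n4")
table = DistributedTable([
    (0, 1000, [n1, n2]),     # customer IDs 0-999
    (1000, 2000, [n3, n4]),  # customer IDs 1000-1999
])
table.write(42, "order for customer 42")
table.write(1500, "order for customer 1500")
n1.up = False                # simulate a failure at n1
print(table.read(42))        # prints: order for customer 42 (served by n2)
```

This sketch does not resynchronize a replica that missed writes while it was down; a production DDBMS needs a repair step for that. As a rough guide to why replication helps, if each replica is independently available with probability p, a fragment stored on n replicas is reachable with probability 1 − (1 − p)^n; with p = 0.99 and n = 3, that is 0.999999.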
Of the Cassandra features listed in the table, “fault tolerant,” “decentralized,” and “durable” are of particular note, because all of them relate to system availability. System availability is arguably the most important factor to consider when designing a DDBMS strategy. Hardware components will fail at times, and network communication between distributed database nodes can be interrupted by events such as cut cables or natural disasters. A well-designed DDBMS is robust enough to overcome these obstacles, keeping the system accessible and functional nearly 100% of the time. “High availability” is an especially significant consideration for retail and e-commerce websites. Customers expect a website to be available at all times, and system outages result in lost sales and can reduce customer loyalty.

To describe the characteristics of a distributed database, C. J. Date created a list of 12 rules that any distributed database management system should follow. Rules 1 and 2 relate to the decentralized nature of the distributed database. Rule 3 identifies system availability, or continuous operation, as an essential attribute. Rules 4 through 6 concern user interaction with a distributed database: the location, fragmentation, and replication transparency described previously. Rules 7 and 8 describe how a DDBMS should handle queries and transactions across the multiple locations of the database. Rules 9 through 12 require the DDBMS to make the database accessible to all users regardless of the hardware, operating system, network access methods, or database software they may be using. It is difficult for any DDBMS to satisfy all 12 requirements completely, but the list does outline the goals that engineers and designers should aim for when adopting a DDBMS strategy.

Apple’s iTunes software is an example of how a DDBMS can be built to comply with one of Date’s 12 rules, in this case rule 10: operating system independence. iTunes can be installed on computers running either the Windows or the Macintosh operating system.
Users create a profile that connects them to customer data in Apple’s distributed database, stored in the “cloud” at multiple locations. Regardless of which OS the computer is running, iTunes provides the same functionality and access to data, making it operating system independent.