Parallel and Distributed Database Systems

The architecture of a database system is greatly influenced by the underlying computer system on which it runs. Generally, databases are stored and managed on computers having any one of the following three architectures:

Server Architecture
Parallel Architecture
Distributed Architecture

Server Architecture

In server architecture, computers are connected to a network that consists of one server system and multiple client systems. The functionality of the system is split between the server and the clients, and the server satisfies the requests generated by the client systems. This division of work led to the concept of client-server database systems.

The functionality provided by client-server database systems can be broadly divided into two parts: the front end and the back end. The front end of a database system consists of tools such as SQL user interfaces, forms interfaces, report-generation tools, and data-mining and analysis tools. The back end, by contrast, manages database-related tasks such as access structures, query evaluation and optimization, concurrency control, and recovery.

Communication between the front end and the back end generally takes place through a common language, Structured Query Language (SQL). Standards such as ODBC (Open Database Connectivity) and JDBC (Java Database Connectivity) were also developed to interface clients with servers.
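
To make the SQL boundary between front end and back end concrete, here is a minimal Python sketch. It assumes Python's DB-API 2.0 (PEP 249), which plays a role similar to ODBC and JDBC, and it uses the built-in sqlite3 module as a stand-in back end. Note that sqlite3 is an embedded engine rather than a network server, and the account table and the run_report function are hypothetical examples. The front-end function only ships SQL text and parameters across the interface and reads rows back.

```python
import sqlite3


def run_report(conn, min_balance):
    """Front-end code: it sends SQL text plus parameters through a standard
    interface (DB-API 2.0) and reads back rows. Query evaluation and
    optimization happen entirely inside the back end."""
    cur = conn.cursor()
    cur.execute(
        "SELECT name, balance FROM account WHERE balance >= ? ORDER BY name",
        (min_balance,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # sqlite3 stands in for the back end; a client/server driver would be
    # opened with its own connect() call but used through the same interface.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)",
        [("alice", 500), ("bob", 50), ("carol", 900)],
    )
    conn.commit()
    print(run_report(conn, 100))  # [('alice', 500), ('carol', 900)]
```

Because run_report depends only on the standard interface, the same front-end code could in principle be pointed at a different DB-API driver, which is the same idea ODBC and JDBC apply for other languages.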

Systems that serve large numbers of users typically adopt a three-tier architecture, in which the front end is a Web browser that talks to an application server. The application server, in turn, talks to the database server to store and retrieve data in the centralized database.
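
The three tiers can be sketched in Python as plain objects, assuming for illustration that everything runs in one process. The class names DatabaseServer and ApplicationServer, the browser_get helper and the product table are all hypothetical, and sqlite3 again stands in for the database server. What the sketch shows is the division of responsibility: the browser sends only an HTTP-style query string, and SQL is issued solely by the application server.

```python
import html
import sqlite3
from urllib.parse import parse_qsl


class DatabaseServer:
    """Tier 3: owns the centralized data (sqlite3 is only a stand-in here)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
        )
        self.conn.executemany(
            "INSERT INTO product (name, price) VALUES (?, ?)",
            [("pen", 1.5), ("notebook", 3.0), ("bag", 25.0)],
        )
        self.conn.commit()

    def query(self, sql, params=()):
        return self.conn.execute(sql, params).fetchall()


class ApplicationServer:
    """Tier 2: business logic; the only tier that talks SQL to the database."""

    def __init__(self, db):
        self.db = db

    def handle_request(self, params):
        if "max_price" in params:
            rows = self.db.query(
                "SELECT name FROM product WHERE price <= ? ORDER BY price",
                (float(params["max_price"]),),
            )
        else:
            rows = self.db.query("SELECT name FROM product ORDER BY price")
        # Render an HTML fragment for the browser, escaping the stored data.
        items = "".join(f"<li>{html.escape(name)}</li>" for (name,) in rows)
        return f"<ul>{items}</ul>"


def browser_get(app, query_string):
    """Tier 1: the browser sends only a query string such as 'max_price=5'."""
    return app.handle_request(dict(parse_qsl(query_string)))


if __name__ == "__main__":
    app = ApplicationServer(DatabaseServer())
    print(browser_get(app, "max_price=5"))  # <ul><li>pen</li><li>notebook</li></ul>
```

In a real deployment each tier would be a separate process or machine, with HTTP between the browser and the application server and a database driver between the application server and the database server.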

Parallel Architecture

In parallel architecture, processing takes place on multiple CPUs of the same computer, or on the processors of several computers, running in parallel. Parallel processing within a computer system speeds up database-system activities, allowing faster response to transactions as well as more transactions per second. Queries can also be processed in a way that exploits the parallelism offered by the underlying computer system. This led to the development of parallel database systems.
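
As an illustration of a query that exploits parallelism, the sketch below evaluates the equivalent of a filtered COUNT(*) by splitting the table into slices, counting each slice in a separate worker, and adding the partial counts. The table layout and the function names are hypothetical; this is a sketch of the split, per-slice work and combine pattern, not of any particular database's executor.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matching(rows, predicate):
    """Work done by one worker on its own slice of the table."""
    return sum(1 for row in rows if predicate(row))


def parallel_count(table, predicate, workers=4):
    """Equivalent of SELECT COUNT(*) FROM table WHERE predicate: split the
    rows into at most `workers` slices, count each slice concurrently,
    then add up the partial counts."""
    chunk = max(1, -(-len(table) // workers))  # ceiling division
    slices = [table[i:i + chunk] for i in range(0, len(table), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(count_matching, slices, [predicate] * len(slices))
        return sum(partial_counts)


if __name__ == "__main__":
    orders = [{"id": i, "amount": i * 10} for i in range(1, 101)]
    print(parallel_count(orders, lambda r: r["amount"] > 500))  # 50
```

In CPython, threads do not execute Python bytecode on several CPUs at once, so real CPU speedup would need a ProcessPoolExecutor with a module-level (picklable) predicate, or a parallel DBMS. The decomposition into partial results and a combine step stays the same.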

Parallel systems improve processing and I/O speeds by using multiple CPUs and disks in parallel. In parallel processing, many operations are performed simultaneously, as opposed to serial processing, in which the computational steps are performed sequentially. There are two main measures of performance for a database system that makes use of parallel processing: throughput and response time.

Throughput refers to the number of tasks that can be completed in a given time interval, and response time refers to the amount of time it takes to complete a single task from the time it is submitted. A system that processes large transactions can improve response time as well as throughput by performing subtasks of each transaction in parallel.
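
A small model makes the two measures concrete. The sketch below assumes a transaction made of independent subtasks with hypothetical durations, assigns them greedily to whichever processor is free first, and ignores start-up, interference and skew costs, so it shows an idealized best case. Throughput is computed under the extra assumption that transactions run one after another on the whole machine.

```python
import heapq


def parallel_response_time(subtask_times, processors):
    """Finish time of one transaction whose independent subtasks are handed,
    in the given order, to whichever processor becomes free first
    (greedy list scheduling). Overheads are ignored."""
    free_at = [0.0] * processors  # min-heap of times at which each processor is free
    for t in subtask_times:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + t)
    return max(free_at)


def throughput_per_second(response_time_ms):
    """Transactions completed per second if transactions run back to back."""
    return 1000.0 / response_time_ms


if __name__ == "__main__":
    subtasks_ms = [4, 3, 2, 1]  # hypothetical subtask durations in milliseconds
    for p in (1, 2):
        rt = parallel_response_time(subtasks_ms, p)
        print(p, rt, throughput_per_second(rt))
    # 1 10.0 100.0
    # 2 5.0 200.0
```

With one processor the transaction takes 10 ms (100 transactions per second); splitting the same subtasks over two processors halves the response time to 5 ms and doubles throughput to 200 transactions per second in this idealized model.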

There are several architecture models for parallel machines. The following four architectures all run multiple processors in parallel, but they differ in how resources such as memory and disks are shared among the processors:

Shared Memory Architecture: In this architecture, all the processors share a common memory.
Shared Disk Architecture: All the processors share a common set of disks, but each has its own private memory. Shared-disk systems are sometimes called clusters.
Shared Nothing Architecture: In this kind of architecture, the processors share neither a common memory nor a common disk.
Hierarchical Architecture: This model of parallel processing is a hybrid that combines characteristics of more than one of the above architectures.
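
The shared-nothing model can be sketched with hash partitioning, a common (but not the only) way to spread data across nodes. In this hypothetical sketch each Node keeps its partition in a private dict that stands in for its own memory and disk, and the SharedNothingCluster class and its methods are illustrative names only. A point lookup touches just the owning node, while a full count runs on every node and combines the partial results.

```python
from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class Node:
    """One shared-nothing node: its dict stands in for private memory and disk."""
    node_id: int
    rows: dict = field(default_factory=dict)


class SharedNothingCluster:
    def __init__(self, n_nodes):
        self.nodes = [Node(i) for i in range(n_nodes)]

    def owner(self, key):
        # Hash partitioning with a stable hash (crc32), so a key always maps
        # to the same node; Python's built-in str hash is randomized per run.
        return self.nodes[crc32(key.encode("utf-8")) % len(self.nodes)]

    def insert(self, key, value):
        self.owner(key).rows[key] = value

    def lookup(self, key):
        # A point query is routed to the one node that owns the key;
        # no other node's memory or disk is touched.
        return self.owner(key).rows.get(key)

    def count_all(self):
        # A full scan runs on every node over its own partition; in a real
        # system the partial counts would travel over the interconnect.
        return sum(len(node.rows) for node in self.nodes)


if __name__ == "__main__":
    cluster = SharedNothingCluster(3)
    for i in range(10):
        cluster.insert(f"cust{i}", {"id": i})
    print(cluster.lookup("cust7"), cluster.count_all())  # {'id': 7} 10
```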

Distributed Architecture

In distributed architecture, the database is stored on several computers. The computers in the distributed environment communicate with one another through various communication media, such as high-speed networks or telephone lines. They do not share main memory or disks. The computers may also vary in size and function, ranging from workstations up to mainframe systems. The computers in a distributed system are referred to as sites or nodes.

A distributed architecture looks similar to the shared-nothing architecture of parallel systems. The main differences between a distributed architecture and a shared-nothing parallel architecture are the following:

Distributed systems are typically geographically separated.
They are separately administered.
They have a slower interconnection.

Another major difference is that a distributed database system differentiates between local and global transactions. A local transaction is one that accesses data only at the site where the transaction was initiated. A global transaction, in contrast, either accesses data at a site different from the one at which the transaction was initiated, or accesses data at several different sites.
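
The local/global distinction can be expressed as a simple check against a catalog that records where each data item is stored. The CATALOG contents, the site names and both functions below are hypothetical; the sketch only assumes that the system can look up the site of every item a transaction reads or writes.

```python
# Hypothetical catalog recording which site stores each data item.
CATALOG = {
    "acct_A": "site_1",
    "acct_B": "site_1",
    "acct_C": "site_2",
}


def sites_accessed(items, catalog=CATALOG):
    """Set of sites that hold the data items a transaction reads or writes."""
    return {catalog[item] for item in items}


def classify_transaction(origin_site, items, catalog=CATALOG):
    """'local' if every item lives at the originating site, else 'global'."""
    if sites_accessed(items, catalog) <= {origin_site}:
        return "local"
    return "global"


if __name__ == "__main__":
    print(classify_transaction("site_1", ["acct_A", "acct_B"]))  # local
    print(classify_transaction("site_1", ["acct_C"]))            # global: data at another site
    print(classify_transaction("site_1", ["acct_A", "acct_C"]))  # global: data at several sites
```

The two "global" calls correspond to the two cases in the definition: data held only at a different site, and data spread across several sites.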
