HA-BLOG
Parallel Databases
(Ref. https://www.educba.com/database-parallelism/)
Contents
- Introduction
- What are parallel databases?
- Working of Parallel Database
- Performance measurement of Parallel Databases
- Benefits of Parallel Databases
- Benefits for queries
- Disadvantages of Parallel Databases
- Types of Parallelism in database
- Architectures of Parallel Databases
- Applications
- Conclusion
- References
Introduction
Database systems are used in almost every organization to store and manipulate data. They make it easier to add, retrieve, update, and delete data, and to perform other operations on it.
Over the years, different types of database systems have been developed to make these operations easier and faster. One type that is widely used today is the parallel database. Let's learn more about it.
What are parallel databases?
- Parallel databases are databases that use multiple processors to provide database services simultaneously.
- The main objective of a parallel database system is to improve performance by parallelizing operations such as loading data, building indexes, and evaluating queries.
- These systems use multiple CPUs and disks simultaneously to achieve high performance.
(Ref: https://www.tutorialride.com/parallel-databases/parallel-databases-tutorial.htm)
Working of Parallel Database
- Parallel processing divides a large task into smaller subtasks and then works on those subtasks concurrently on several CPUs.
- The main motivation for parallel databases is the growing number of applications that must query very large databases or process a large number of transactions per second.
- In parallel processing, many operations are performed simultaneously, as opposed to serial processing, in which the computational steps are performed one after another.
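The divide-and-combine idea described above can be sketched in a few lines of Python. This is only an illustration, not how any particular DBMS is implemented: the `split` and `parallel_sum` helpers, the chunking scheme, and the worker count are all assumptions made for the example, and a thread pool stands in for the separate CPUs of a real parallel database (CPython threads do not give true CPU parallelism).

```python
from concurrent.futures import ThreadPoolExecutor

def split(rows, parts):
    """Divide rows into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(rows), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks

def parallel_sum(rows, workers=4):
    """Sum `rows` by summing each chunk concurrently, then combining."""
    chunks = split(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))  # one subtask per chunk
    return sum(partial_sums)  # combine the partial results

rows = list(range(1, 101))
total = parallel_sum(rows)
print(total)
```

The serial version would be a single `sum(rows)`; the parallel version does the same work as several smaller sums whose results are combined at the end.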
Performance measurement of Parallel Databases
- We measure the performance of a parallel database using two factors: speed-up and scale-up.
- Response time is the basis for judging the performance of parallel databases.
- Response time: the total amount of time it takes to respond to a request.
- Speed-up: completing a given task in less time by increasing the resources (processors and disks) applied to it.
Speed-up = t1/tn
t1 = time taken by 1 processor to execute a task.
tn = time taken by n processors to execute the same task.
- Scale-up: the ability to keep performance constant when the workload and the resources increase proportionally.
Scale-up = Vn/V1
Vn = volume of work (e.g. queries) executed by n processors in a given amount of time.
V1 = volume of work executed by 1 processor in the same amount of time.
E.g. suppose 10 users keep a single CPU 100% busy. If more users are added, response time degrades, because a single CPU cannot handle the additional load. Adding a second processor restores the response time and, ideally, raises capacity to 200% of the original.
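To make the two ratios concrete, here is a small worked example in Python. The measurements are hypothetical numbers chosen only to show the arithmetic, and the sketch assumes that V1 and Vn are read as the amount of work completed in a fixed time window.

```python
def speed_up(t1, tn):
    """Speed-up = t1 / tn: same task, 1 processor vs. n processors."""
    return t1 / tn

def scale_up(vn, v1):
    """Scale-up = Vn / V1: work done in a fixed time, n processors vs. 1."""
    return vn / v1

# Hypothetical measurements, not taken from any real system.
n = 4
s = speed_up(t1=120.0, tn=40.0)   # 120 s on 1 CPU, 40 s on 4 CPUs
c = scale_up(vn=360.0, v1=100.0)  # 100 queries/min on 1 CPU, 360 on 4 CPUs

print(f"speed-up = {s:.1f} (linear would be {n})")
print(f"scale-up = {c:.1f} (linear would be {n})")
```

In both cases the ideal (linear) value equals the number of processors. Real systems usually fall short of it because of coordination and communication overhead.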
Benefits of Parallel Databases
- Speed: A parallel database system works on a divide-and-conquer approach. It breaks one request into several parts and sends each part to a different processor or machine. The parts are executed in parallel, and their outputs are then combined and returned.
- Capacity: To handle a growing number of requests, more machines can be added to the parallel system, increasing its capacity.
- Reliability: When one machine in the cluster fails, the server detects that it is not responding and immediately redirects the request to another machine. A single failure is therefore much less likely to interrupt service, which makes the system more reliable.
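The redirect-on-failure behaviour behind the reliability benefit can be sketched as follows. Everything here is hypothetical: the `NodeDown` exception, the node functions, and the try-the-next-node policy are assumptions for illustration, and real clusters typically use heartbeats, timeouts, and replication rather than a simple loop.

```python
class NodeDown(Exception):
    """Raised by a (hypothetical) node that does not respond."""

def run_with_failover(request, nodes):
    """Try each node in turn; return (node_name, result) from the first that answers."""
    for name, handler in nodes:
        try:
            return name, handler(request)
        except NodeDown:
            continue  # no response from this node: redirect to the next one
    raise RuntimeError("all nodes failed")

def broken_node(request):
    raise NodeDown("node-a is not responding")

def healthy_node(request):
    return f"handled {request}"

served_by, result = run_with_failover(
    "SELECT 1", [("node-a", broken_node), ("node-b", healthy_node)]
)
print(served_by, result)
```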
Benefits for queries
Parallel query processing can benefit the following types of queries:
- SELECT statements that scan large numbers of pages.
- SELECT statements that include UNION, ORDER BY, or DISTINCT, which can make use of parallel sorting.
- CREATE INDEX statements, and ALTER TABLE ... ADD CONSTRAINT clauses that create indexes, such as unique and primary key constraints.
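Parallel sorting, mentioned above for ORDER BY and DISTINCT, can be illustrated with a minimal sketch: sort independent chunks concurrently, then merge the sorted runs. The `parallel_sort` helper and the sample data are assumptions for the example, and a real engine would sort pages on separate processors or disks rather than Python lists on threads.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(values, workers=4):
    """Sort chunks concurrently, then merge the sorted runs into one list."""
    step = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + step] for i in range(0, len(values), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sorted, chunks))  # each chunk sorted independently
    return list(heapq.merge(*runs))            # k-way merge of the sorted runs

prices = [42, 7, 19, 7, 88, 3, 56, 19, 1]
ordered = parallel_sort(prices, workers=3)               # like ORDER BY
distinct = [k for k, _ in itertools.groupby(ordered)]    # like DISTINCT on sorted input

print(ordered)
print(distinct)
```

Once the rows are sorted, duplicates sit next to each other, which is why DISTINCT and UNION can reuse the same parallel sort.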
Disadvantages of Parallel Databases
- Cost: Parallel database systems require a large number of processors and disks working simultaneously, so the cost of implementing them is very high.
- Resources: Maintaining a parallel database requires frequent renewal, modification, or replacement of resources, which makes it a complex affair.
- Difficulty in managing the system: With a large number of machines, processors, and disks, the system is difficult and time-consuming to manage. When an update is needed, each machine takes a considerable amount of time to update.
Types of parallelism in database
- Intraquery Parallelism
  - Intraquery parallelism means executing a single query in parallel on multiple processors and disks. It is essential for speeding up long-running queries.
  - It breaks a serial SQL query down into lower-level operations such as scan, join, sort, and aggregation.
  - These lower-level operations are then executed in parallel.
- Interquery Parallelism
  - In interquery parallelism, different queries or transactions are executed in parallel with one another.
  - This increases transaction throughput. The primary purpose of interquery parallelism is to scale up transaction processing to support more transactions per second.
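The difference between the two kinds of parallelism can be shown side by side. In this sketch, an in-memory list stands in for a table and plain functions stand in for queries; the `ORDERS` data and all function names are hypothetical and serve only to contrast splitting one query with running several queries at once.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table; a real system would scan disk pages instead.
ORDERS = [
    {"id": 1, "region": "east", "amount": 120},
    {"id": 2, "region": "west", "amount": 80},
    {"id": 3, "region": "east", "amount": 50},
    {"id": 4, "region": "west", "amount": 200},
]

def total_amount(rows):
    return sum(r["amount"] for r in rows)

def count_region(rows, region):
    return sum(1 for r in rows if r["region"] == region)

def intraquery_total(rows, workers=2):
    """Intraquery: ONE query (total amount) split into partial scans."""
    slices = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(total_amount, slices))

def interquery(rows):
    """Interquery: TWO independent queries running at the same time."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        total = pool.submit(total_amount, rows)
        east = pool.submit(count_region, rows, "east")
        return total.result(), east.result()

one_query = intraquery_total(ORDERS)
two_queries = interquery(ORDERS)
print(one_query, two_queries)
```

Intraquery parallelism shortens the response time of a single query, while interquery parallelism raises the number of queries completed per unit of time.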
Architectures of Parallel Databases
- Shared-Disk Architecture
  - In a shared-disk architecture, multiple CPUs are attached to an interconnection network. Each CPU has its own memory, and all of them have access to the same disks.
  - Since no memory is shared among the CPUs, each node runs its own copy of the operating system and DBMS.
- Shared-Memory Architecture
  - In a shared-memory architecture, multiple CPUs are attached to an interconnection network and share a global main memory and common disk arrays.
  - The main advantage of this architecture is that a single RDBMS server can use all processors, access all memory, and access the entire database, thus presenting the client with a consistent single system image.
- Shared-Nothing Architecture
  - It is a multiprocessor architecture in which each processor has its own memory and disk storage. Each CPU is attached to the interconnection network through its own node.
  - No memory or disk resources are shared: each CPU has its own disk area.
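A shared-nothing layout is easiest to picture with a toy model in which every node owns a private slice of the data. The sketch below assumes hash partitioning, which is one common way to spread rows across nodes and not something the text above prescribes; the `Node` and `Cluster` classes are hypothetical.

```python
import zlib

class Node:
    """A hypothetical shared-nothing node with private storage."""
    def __init__(self, name):
        self.name = name
        self.rows = {}  # stands in for this node's own disk

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)

class Cluster:
    """Routes each key to exactly one node; nodes share nothing."""
    def __init__(self, node_count):
        self.nodes = [Node(f"node-{i}") for i in range(node_count)]

    def owner(self, key):
        # crc32 is stable across runs, unlike Python's built-in str hash.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key, value):
        self.owner(key).put(key, value)

    def get(self, key):
        return self.owner(key).get(key)

cluster = Cluster(3)
for customer in ["alice", "bob", "carol", "dave"]:
    cluster.put(customer, {"name": customer})

print(cluster.get("bob"))
```

Because a key always maps to the same node, a lookup touches only that node's memory and disk. A shared-disk design would let every node read the same storage instead.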
Applications
Several applications nowadays require very large databases, ranging from hundreds of terabytes to petabytes. Such applications rely heavily on parallel database systems.
- E-commerce: large numbers of customers request data at the same time, and transactions must be processed at a high rate.
- Data warehousing: businesses and organizations store large volumes of important information and then manipulate it using parallel database systems.
- Data mining: parallelism makes working on huge datasets efficient.
- On-Line Transaction Processing (OLTP): concurrent transactions are performed on large databases.
- On-Line Analytical Processing (OLAP): relatively complex queries, such as decision-support queries, must be executed.
Conclusion
- Parallel databases have changed the way data is stored and the way databases perform.
- Because they are more efficient and faster than traditional centralized databases on large workloads, parallel databases are now used by several tech giants and businesses.
- If their few disadvantages are addressed, these databases can speed up many operations and transactions by a large factor.
References