Design of Main Memory Database System/Overview of DBMS

Chapter 2: Introduction to Database Management Systems

Table of Contents — Previous: Introduction — Next: Introduction to MMDB

2.1 Overview

The database management system (DBMS) is the software that handles storage and retrieval of data. Most of the DBMS present today are relational DBMS. This book concentrates only on relational database management systems. RDBMS has five main components

Interface Drivers
SQL Engine
Transaction Engine
Relational Engine
Storage Engine

Figure 1 contains DBMS components, Memory layout and disk files associated with relational database management system. From early days of database system evolution, disk is considered to be the backing store for the data to achieve durability. The architecture above applies for disk resident database systems (DRDB). Nowadays there are two different approaches other than DRDBs. They are

Main memory databases (MMDB) – data is stored in main memory.
Network Databases – data is stored in another host over the network.

Most of the components in the DRDB system architecture above are present in main memory and network databases as well.

2.2 Driver Interfaces

A user or application program shall initiate either schema modification or content modification. These application requests are broadly classified by SQL as Data Definition Language (DDL), Data Manipulation Language (DML) and Data Control Language (DCL). DDL deals with schema modifications; DML deals with content modifications; DCL deals with user access and privilege modifications. If the application program is written in C/C++, it shall use ODBC drivers to connect to the DBMS, or if it is written in Java, it shall use JDBC drivers to connect to DBMS. Some vendors provide language specific proprietary interfaces. For example MySQL provides drivers for PHP, Python, etc.

These drivers are built on top of SQL. They provide methods to prepare statements, execute statements, fetch results, etc.

2.3 SQL Engine

This component is responsible for interpreting and executing the SQL query. It comprises three major components

Compiler – builds a data structure from the SQL statement and then does semantic checking on the query such as whether the table exists, field exists, etc.

Optimizer – transforms the initial query plan (data structure created by compiler), into sequence of operations usually pipelined together to achieve fast execution. It refers the metadata (dictionary) and statistical information stored about the data to decide which sequence of operations is likely to be faster and based on that it creates the optimal query plan. Both cost and rule based optimizers are used in case of DRDBs.

Execution Engine – executes each step in the query plan chosen by optimizer. It interacts with the relational engine to retrieve and store records.

2.4 Transaction Engine

Transactions are sequence of operations that read or write database elements, which are grouped together. Transaction should have the following ACID properties

Atomicity: Either all or none of the effect should appear in database after transaction completes.

Consistency: Constraints should always keep the database in consistent state

Isolation: Transaction should run as though no other transaction is running.

Durability: Once the transaction completes, effect of the transaction on the database must never be lost.

All the above properties are explained in detail under the Transaction Chapter.

Transaction engine comprises three major components

Concurrency Manager – responsible for concurrent synchronized access to data. This is usually implemented using latches and locks. Latches or Mutexes are acquired and released for short duration synchronization and locks are used for that of long duration.

Log Manager – responsible for atomicity and durability property of transaction. Undo logs make sure that transaction rollback takes the database state to previous consistent state when that transaction started. Redo logs make sure that all committed transactions shall be recovered in case of crash.

Recovery Manager- responsible for recovering the database from the disk image and redo log files. Most of the databases uses a technique called ‘shadow paging’ to maintain consistent image of memory in disk.

2.5 Relational Engine

Relational objects such as Table, Index, and Referential integrity constraints are implemented in this component. Some of the main components are

Field – abstracts column level information including type, length, etc. Catalog – maintains Meta data information about the relational database objects such as table, index, trigger, field, etc.

Table – responsible for insert, update, delete, fetch, execute. It interacts with the allocator subsystem of storage engine, which in turn talks to buffer manager to get the job done.

Index – responsible for insert, update, delete, and scan of index nodes for all index types. Popular index types are hash and tree. Hash index is used for improving the point lookup (predicate with equality on primary key) and tree index is used for improving the range query (predicate with greater or less than operator on key).

Expression Engine – represents the predicate (WHERE clause of SQL statement) of the data retrieval operation and responsible for evaluating the expressions, which shall include arithmetic, comparison, and logical expressions.

2.6 Storage Engine

This component is responsible to store and retrieve data records. It also provides mechanism to store meta data information and control information such as undo logs, redo logs, lock tables, etc. Important storage engine components are

Buffer manager – responsible for loading pages from disk to memory and to manage the buffer pool based on Least Recently Used (LRU) algorithm. This also has special purpose allocator for storing control information, which are transient. Buffer pool is the memory space used by buffer manager to cache disk pages associated with records, index information, Meta data information. Some database systems have space limit at individual level and some at global level for buffer pool size.

File Manager – Database in DRDB is nothing but a physical file at disk. File manager maps disk pages of the file to the memory pages and does the actual disk I/O operations in case of major faults generated by buffer manager module.

Process Manager – responsible for registering and deregistering database application process and threads and account all the resources (transactions, locks, latches) acquired by them.

2.7 SELECT Execution Sequence

This is what happens conceptually, when user issues a SELECT SQL statement, Fig 2: SQL SELECT Execution Sequence

User issues transaction start request (startTrans())
DBMS reserves one free slot for the transaction started (allocSlot())
DBMS returns to the user.
User issues an SELECT SQL request (stmtExecute())
DBMS interprets the request and represents it in data structure (parse())
DBMS checks whether table and field names exist in database (check())
DBMS identifies the optimum way to execute the statement (optimize())
DBMS executes the statement by interacting with relational engine (execute())
DBMS checks with buffer manager whether the disk page where data is present is already present in memory (isPageInMemory())
DBMS interacts with file manager to load the page into memory buffer if not already loaded.( loadPage())
DBMS evaluates records which satisfy the predicate (evaluate())
DBMS takes lock on the records based on the isolation level of the transaction (lockRecord())
DBMS retrieves records and returns to the application (returnRecords)
User issues transaction commit (commit())
DBMS releases all the locks acquired during the transaction (releaseLocks())
DBMS releases the transaction slot allocated for this transaction (freeSlot())
DBMS returns to the application

2.8 INSERT Execution Sequence

This is what happens conceptually, when user issues INSERT SQL statement,

Fig 3: SQL INSERT Execution Sequence

User issues transaction start request (startTrans())
DBMS reserves one free slot for the transaction started (allocSlot())
DBMS returns to the user.
User issues an INSERT SQL request (stmtExecute())
DBMS interprets the request and represents it in data structure (parse())
DBMS checks whether table and field names exist in database (check())
DBMS identifies the optimum way to execute the statement (optimize())
DBMS executes the statement by interacting with relational engine (execute())
DBMS checks with buffer manager whether the disk page where record needs to be allocated is already present in memory (isPageInMemory())
DBMS interacts with file manager to load the page into memory buffer if not already loaded.( loadPage()), not shown in diagram above
DBMS creates undo log records for the newly inserted record (createUndoLog())
DBMS copies the values from the application buffer to allocated new record (copyValues())
DBMS creates redo log records for the newly inserted record (createRedoLog())
DBMS takes lock on the allocated record based on the isolation level of the transaction (lockRecord())
DBMS checks if indexes are available, if yes it does index node insertion for all indexes on this table. (insertIndexNode())
DBMS checks with buffer manager whether the index disk page where index node needs to be allocated is already present in memory (isPageInMemory())
DBMS interacts with file manager to load the page into memory buffer if not already loaded.( loadPage()), not shown in diagram above
DBMS takes lock on the allocated index node (lockIndexNode())
DBMS returns to the application with the number of rows affected (returnRowsAffected)
User issues transaction commit (commit())
DBMS releases all the locks acquired during the transaction (releaseLocks())
DBMS releases the transaction slot allocated for this transaction (freeSlot())
DBMS returns to the application