Oracle and DB2, Comparison and Compatibility/Process Model/Locking
Overview[edit | edit source]
As soon as you have more than one user on a database you have to have a mechanism to prevent them from accessing the same data at the same time. There is a sequence to how transactions make changes to data in memory, how those changes are logged, and how those changes in memory are written to disk. Each of these steps takes a certain amount of time to execute, and some of the steps are designed to circumvent expensive I/O operations – changes will not be written as they are made, they will be saved up so that a group of changes can be written in one fell swoop. Each step in this sequence of operations takes time, and if a database is servicing many users there is an opportunity for changes from one user to impact changes from another user. With modern high speed processors, delayed writes, and many concurrent users the opportunity for this type of event increases dramatically, but it only has to happen once for it to be a big problem.
An example of this type of problem is that if two users read and change the same data, if there is no control then the last user to write their changes ‘wins’ – they will overwrite the changes made by the other user. This type of problem is called a Lost Update.
The way that Oracle and DB2 handle this is with Locking – where records that are going to be updated are locked first. As with many aspects of database operation, there is a trade off when using locks, and this trade off is that if you lock large pieces of the database, the database rapidly becomes unusable. To account for this, databases employ different levels of locking within the database, and different strategies for locking and lock escalation. This overview describes the types of situations that a database needs to consider in its locking strategy, the individual sections for Oracle and DB2 describe how they implement these strategies.
The types of problems that occur when users are reading and writing the same data without any controls are:
• Phantom reads – where a transaction reads a number of rows, and another independent transaction either deletes or adds rows. If the first transaction issues exactly the same read, it will get a different number of rows.
• Dirty reads – a transaction reading uncommitted data from another transaction.
• Lost updates – an update in one transaction overwrites data in another transaction.
• Non-repeatable reads – a transaction that reads data twice should not have that data change between reads by other transactions.
It should be noted that it’s not just DML that changes databases, DDL like Drop or Alter will also change tables. Your locking strategy needs to cater for DDL changes as well as DML changes.
Many database operations involve Cursors. A cursor is a device for navigation through a set of returned rows – it’s a pointer to where you are in the set. Locking needs to cater for cursor operations, so that changes to rows in the set (inserts and deletes) do not disrupt the cursor handling, this type of disruption is called Cursor Instability, and it’s an especially sensitive area when large amounts of users can be fetching large sets of rows simultaneously. In this instance the user is not affecting a single row of data, but multiple rows. If you have many users doing this, a poorly implemented locking strategy can bring the database to its knees.
To avoid these problems the ANSI/ISO SQL standard defines a number of isolation levels:
SERIALIZABLE. All transactions execute as if they were issued serially (i.e. one after the other). This holds true even if the DBMS can execute multiple transactions simultaneously, to all intents and purposes the transaction execute as if they were completely isolated.
REPEATABLE READ. All rows read by a SELECT cannot change. However, if the SELECT has a ranged WHERE clause, phantom reads may occur since this isolation level gets read locks on all retrieved data, but does not get range locks.
READ COMMITTED. Data retrieved in a SELECT statement may be modified and because of this Non-repeatable reads may occur. This happens because read locks are acquired on selected data but they are released immediately whereas write locks are released at the end of the transaction.
READ UNCOMMITTED. One transaction may see uncommitted changes made by some other transaction.
The default isolation level for DBMS’s vary, and most vendors allow users to change isolation levels and specify different locks in their SELECT syntax.
Isolation levels are a way to balance the requirements of multi user databases that need to maintain read consistency. This is a two fold issue, first you need to understand the methods employed by vendor, and then you need to understand which methods are implemented (and why) in the system you are working on.