Oracle and DB2, Comparison and Compatibility/Database Scaling

From Wikibooks, open books for an open world

Introduction

Because of their flexibility and degree of standardization, relational databases have become the preeminent storage choice for businesses. Relational databases can store huge amounts of data (literally petabytes at the time of writing), and organizations want to gather more and more information. Together these pressures have pushed the requirements placed on relational databases well beyond simply being huge repositories of information.

The first of these requirements is speed. People have firm expectations about system response time, and they expect modern systems to be faster. Explaining that individual response times may be slowing because far more concurrent users and orders of magnitude more data are being handled sounds more like an excuse than a reason. This creates an immediate opportunity for those organizations (both database vendors and the organizations that use databases) that can serve larger user and data populations faster.

Next comes availability. The Internet has made round-the-clock business operations possible, so the database systems that hold this information need to be highly available (HA). High availability implies fault tolerance: the ability of data systems to recover quickly from failures of any kind, and to protect themselves against single points of failure that could hamstring operations. This naturally leads to decentralization, the geographical and logical separation of computing resources and data.

The requirement for “lots of data, quickly available to lots of people, all the time, everywhere” can be met in several different ways, all of which amount to spreading the load across the available resources: spreading out the data, spreading out the processing, spreading out the hardware, or some combination of all three. The strategies themselves are relatively straightforward, and each is described in this section. Because implementing them requires coordinating several moving parts (data, memory, hardware and networking), they add cost and demand a degree of advance planning. As a result, organizations rarely adopt them just because it is technically possible; implementations tend to be driven by the question “what is the simplest and most cost-effective solution to my requirement?”
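As a rough illustration of the first strategy, spreading out the data, the Python sketch below assigns each row to one of several nodes by hashing its key. This is simple hash partitioning, one common technique among several; the node names and the helper functions `node_for_key` and `partition` are hypothetical and are not taken from Oracle or DB2. A real system would also have to deal with rebalancing, replication and node failures, which this sketch ignores.

```python
import zlib
from collections import defaultdict

# Hypothetical node names, assumed for illustration only.
NODES = ["node_a", "node_b", "node_c"]


def node_for_key(key: str, nodes: list[str] = NODES) -> str:
    """Pick the node that stores a row, by hashing the row's key.

    zlib.crc32 is used instead of Python's built-in hash() because
    str hashes are salted per process, and row placement must be
    stable from one run to the next.
    """
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def partition(rows: list[tuple[str, str]],
              nodes: list[str] = NODES) -> dict[str, list[tuple[str, str]]]:
    """Spread (key, value) rows across nodes; each row lands on exactly one node."""
    placement: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for key, value in rows:
        placement[node_for_key(key, nodes)].append((key, value))
    return dict(placement)


if __name__ == "__main__":
    customers = [(f"cust{i}", f"Customer {i}") for i in range(10)]
    for node, rows in sorted(partition(customers).items()):
        print(node, [key for key, _ in rows])
```

Because a given key always maps to the same node, a lookup by key needs to touch only one node, while a scan of the whole table can be split across all of them. Note that adding a node changes `len(nodes)` and therefore moves most keys to a different node, which is one reason production systems use more elaborate placement schemes.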

There are a number of ways of organizing data and processing power that are independent of any particular database vendor. We begin with a description of these:

Overview of Database Topologies:

So far our architectural descriptions of a database have looked like this:

[Figure: General Database Architecture]


This model has provided a framework for comparing functionality because, in general, databases need to do the same things and go about them in much the same ways (e.g. moving data from disk to memory and back again, logging transactions and executing queries).