IB/Group 4/Computer Science/Databases/Basic concepts
Basic concepts[edit | edit source]
Computers store data. Data can be any one of several different types (e.g. numeric, text, Boolean, etc.) but has no intrinsic meaning to a human. Data becomes information when it is put into a context that gives it meaning.
For example: 32 23 11 08 40 17 is data, but it has no meaning.
If we provide a context for that data, it becomes information, e.g.:
- The home and away scores for 6 soccer teams last Saturday
- The temperatures in degrees Celsius for 6 cities around the world at mid-day today
- The ages in years of the last 6 people to walk through the turnstiles of the Eiffel Tower in Paris.
Thus: Information = Data + context
Strictly speaking, databases store data, not information. However the terminology is commonly used loosely as there is an assumption that data stored in and retrieved from a database is interpreted by human beings within the scope of an appropriate context.
"An information system (IS) is any organized system for the collection, organization, storage and communication of information.
"An information system (IS) is a group of components that interact to produce information."
"..information systems .. are made up of six components: hardware, software, data, people, network, and process."
Information system is therefore a wider term than database as it refers to a complete system, although many (if not most) information systems include one or more databases as part of their overall structure.
There are several unique problems that arise when not using a database:
1. Data Size: Storing a small amount of data in a spreadsheet is manageable; however, when the data becomes extensive, a spreadsheet solution is no longer viable. If the number of data records reaches millions, storing the data in multiple spreadsheets can cause speed issues, making it time-consuming to find a specific record.
2. Ease of Updating Data: Multiple users cannot edit the same file simultaneously. As a result, others must wait for the file to become available for updating, leading to wasted time.
3. Accuracy: Data entry into spreadsheets or files may be prone to errors since there is no built-in data validation. Users might input incorrect spellings, dates, or amounts, making it challenging to maintain data accuracy.
4. Security: Data stored in text files or spreadsheets cannot be adequately secured. Unauthorized users can access and read the information, making these storage methods unsuitable for applications requiring privacy, such as banking, healthcare, or payroll departments.
5. Redundancy: Duplication of data is possible when using text files or spreadsheets. Limiting the addition of multiple data copies is difficult, leading to accuracy issues. Managing and updating numerous copies is not an easy task.
6. Incomplete Data: Some data may not be considered important, and therefore not entered into the file due to a lack of validation. This compromises data integrity.
To avoid the aforementioned problems associated with storing data in text files or spreadsheets, a database is necessary.
A transaction is a unit of work that is performed against a database. Transactions are units or sequences of work accomplished in a logical order, whether in a manual fashion by a user or automatically by some sort of a database program.
A transaction is the propagation of one or more changes to the database. For example, if you are creating a record or updating a record or deleting a record from the table, then you are performing a transaction on that table. It is important to control these transactions to ensure the data integrity and to handle database errors.
For example, to ensure data consistency when moving money between two accounts it is necessary to complete two operations (debiting one account and crediting the other). Unless both operations are carried out successfully, the transaction will be rolled back.
Consistency, in the context of databases, states that data cannot be written that would violate the database’s own rules for valid data. If a certain transaction occurs that attempts to introduce inconsistent data, the entire transaction is rolled back and an error returned to the user.
Data integrity is the overall completeness, accuracy and consistency of data. This can be indicated by the absence of alteration between two instances or between two updates of a data record, meaning data is intact and unchanged. Data integrity is usually imposed during the database design phase through the use of standard procedures and rules. Data integrity can be maintained through the use of various error-checking methods and validation procedures.
A transaction, in the context of a database, is a logical unit that is independently executed for data retrieval or updates. In relational databases, database transactions must be atomic, consistent, isolated and durable--summarized as the ACID acronym.
Concurrency control is a database management systems (DBMS) concept that is used to address conflicts with the simultaneous accessing or altering of data that can occur with a multi-user system. If two or more users try to update the contents of a database simultaneously, locks and partitions are put into place to prevent it. Thus enabling greater concurrency.
ACID refers to:
- atomicity ensures that all operations within the work unit are completed successfully. Otherwise, the transaction is aborted at the point of failure and all the previous operations are rolled back to their former state.
- consistency ensures that the database properly changes states upon a successfully committed transaction.
- isolation enables transactions to operate independently of and transparent to each other.
- durability. ensures that the result or effect of a committed transaction persists in case of a system failure.
Query functions A query is a request for data or information from a database table or combination of tables. This data may be generated as results returned by Structured Query Language (SQL) or as pictorials, graphs or complex results, e.g., trend analyses from data-mining tools.
ADD, DELETE UPDATE are all functions that can be performed on the database
UPDATE Query is used to modify the existing records in a table. You can use the WHERE clause with the UPDATE query to update the selected rows, otherwise all the rows would be affected.
Data verification is a way of ensuring the user types in what he or she intends, in other words, to make sure the user does not make a mistake when inputting data. ... Validation is about checking the input data to ensure it conforms with the data requirements of the system to avoid data errors. Data validation is the process of ensuring that data is valid. Data validation rules are used in data validation processes to ensure the validity. The term validity of data mostly denotes the meaningfulness and correctness of the data. Data Validation is done on the original document whereas data verification is done on copies. This is the major difference between data validation and data verification.