Integrity
Module 4.2
Integrity refers to the overall completeness, accuracy We looked at integrity last year, when we
Database management and consistency of data in a database. Data integrity discussed the concepts of referential integrity
in a database is ensured by implementing principles (in the module on multi-tables). Also refer to
of normalisation and including validation in your Module 4.1 Database design and concepts.
In this module we are going to focus on managing data. We will look at things that we can do to database design.
protect data, how data (and databases) can ‘create opportunities’ to be hacked and, finally, at the
different types of jobs involved in managing and working with data and databases. Access control
Access control involves the use of software settings to control users’ access to different resources. It
also involves control over physical access to equipment. Access control is an essential part of
Managing and protecting data protecting data. You need to make sure that
We all know that data can be very valuable. Keep in mind that the value of data changes according x good password policies are used and enforced
to who is looking at the data, what they use the data for, how large the data set is, the age of the x user rights and privileges are implemented and managed, kept up to date, and are applied to
data, etc. both the file / network level and within the database itself.
x devices are physically secured (e.g. users need passwords to access the device itself)
People keep discovering new ways to extract information from data and new ways to store and x locations are physically secured (e.g. key control, access registers, biometric access control, etc.)
process data. The concept of GIGO (Garbage In, Garbage Out) still applies, though, so managing and
protecting data is becoming increasingly important. We will now look at some of the ways to ensure Logging changes (audit trail)
the integrity (‘correctness) and safety of our data. You have already dealt with some of these
An audit trail is a record or ‘trail’ that is kept to keep track of who made changes to the database and
concepts in grade 11.
when they made those changes. This helps you to figure out what went wrong when errors occur –
Validation or, if the database is ever hacked, it helps to identify which user name was used or compromised.
Data validation is the process of ensuring that data meets a set of requirements, (a defined set of To create an audit trail a whole extra set of tables is required to store actions such as inserting,
criteria or allowable values) – e.g. that it is a whole number, that it is a string of no more than 10 deleting or editing data, the time and date the changes were made, as well as who made them. In
letters, that it is only ‘M’ or ‘F’, etc. We can get the software to help us with data validation by using extreme cases, the old and new versions of field values are also stored, so that the exact nature of
existing features or building checking processes into our software. the change can be seen.
Keeping an audit trail slows down the performance of the database as a whole, but is very useful
Verification
when data is extremely sensitive (e.g. a database of financial transactions).
Checking for correctness – verifying data – is something that computers can never really do. When
the computer asks you to fill in your date of birth or gender it can only check that you are not Parallel data sets (mirroring)
entering totally invalid data such as a ‘C’ for your gender, instead of ‘M’ or ‘F’. The computer,
This technique involves keeping multiple copies of data (hence the term mirroring). It means that
however, has no way of ‘knowing’ or verifying whether your data is accurate or not (you may be
you need to, at minimum, double the storage requirements for your system (and if you have more
lying, after all).
than one mirror, then you need to multiply this by as many copies as there are of the data). Doing
Verification therefore involves a person checking data for errors and inconsistencies. It might involve this means that, even if one set of data fails, the parallel set can be used immediately, resulting in no
printing reports that are checked manually, or using algorithms that compare the data in a source system failure or down time. From a safety and security perspective, parallel data sets work best if
and a destination table after the data has been moved. Verification is built into some types of they are kept in separate locations.
systems – for example, accountants use a double entry system (record every transaction twice as
either an income or an expenditure in different accounts) to check that all transactions are properly
recorded.
Data verification is a time-consuming and costly process, and, as such, is only used when it is
essential to check that the data is correct.
1 2
Warehousing An example of using data as a hacking tool is SQL Injection attacks.
Data collects quickly. Think of a bank. It has hundreds of thousands of customers, many of whom do When an application is powered by a database it inevitably makes use of SQL. All SQL takes the form
multiple transactions a day. This quickly adds up to a database containing millions of transactions – of a string that sends instructions to the database – e.g. 'Select * from Invoices where CustID =
and that is for one month only. Part of keeping a database streamlined and functioning well is to "JQ351" '. This string gets the database to search the Invoices table for all invoices connected to a
manage its size. In the case of the bank, the old records (say transactions of 3 months ago) are of specific customer ID.
historical importance but are definitely not needed for daily use. Keeping them in the database
makes the system large and slow. The question becomes what to do with them. The thing is, no program will ever use completely fixed queries such as the example above. Instead,
programmers use string handling and user input to generate SQL queries on the go. The program
One (very bad) solution is to simply delete the old records. Sure it will speed up your database and contains the fixed statement 'Select * from Invoices where CustID = "' and then gets the actual
make it more efficient, but it is irresponsible and is definitely not managing data in a meaningful customer ID value from user input. The program code joins the fixed statement and the user input
way! together into the final SQL statement that is sent off to the database.
Simply backing the data up and then deleting the old records from the current operational database A hacker using the program can actually type in an SQL statement into the data entry field and so get
is also an option – but not a very productive one. It is what was done before the concept of data the database to do things the program does not expect it to do. Here’s an example of one at work…
warehousing was developed.
x Instead of typing in JQ351 (or any valid customer ID), the hacker types in " or "1" = "1" .
A third option – data warehousing – takes records from the current database modifies them into a x The result is an SQL statement which reads as follows:
new format and then stores them in a different type of database optimised for ‘mining’ information 'Select * from Invoices where CustID = " " or "1" = "1"'.
from a huge collection of data. The old, unneeded records are then deleted from the current
x This statement actually asks the database to list ALL invoices (the or "1" = "1" part of the new
operational database, which speeds it up and reduces its size. Warehousing allows you to not only
SQL statement will always evaluate as true for all records and so the query will list all records in
keep a copy of old records as you would in a backup, it also makes that old data valuable in a new
the table.
way.
A clever hacker who knows SQL well – and knows your DBMS well will know all sorts of tricks to be
Warehousing is a topic on its own and we will cover it in more detail in Module 4.3.
able to get access to (or even change) data in the
database that they should not be able to gain access to! SQL Injection and the Web
Data as a hacking tool We are not going to spend a lot of time looking at ways
Most dynamic websites use a
database to store the data used to
to stop SQL injection attacks, except to point out that
Did you know that data can be used as a hacking tool? This is especially true during the input phase create the web page that the user
good data validation structures make them far more
of any computer system. ends up seeing. Just like a program
difficult to carry out. For example, it is difficult to ‘inject
running on your local computer, the
During input, a program receives data from the user, external devices and sensors, communications SQL’ if the data entry field in the program is
data entry fields on a web page can
devices or storage. The programmer of the system usually expects this data to be in a certain format. x limited to the length of the field in the database
be used to send SQL commands to
The data used to hack a computer is usually invalid or false, i.e. not the format or type of data that x strictly typed (e.g. no letters when a number is the database and so carry out an SQL
the program expects to have to deal with. If a hacker knows how a system works it is not difficult to being entered) Injection attack. This is even true for
feed the system data that can cause it to crash. E.g. if you know a number will be used in a division x checked to only allow legal characters (e.g. a the user name and password fields
calculation in the program and you enter the value 0 then you know you have a chance to make the customer ID only contains uppercase letters and that you use to logon to the site!
program either crash or behave unpredictably. numbers – if you check that only these characters
have been entered then the SQL Injection containing ‘ and = becomes invalid data and is
In fact many hackers again unauthorised access to networks by using ‘bugs’ in the design of the
rejected).
system software (such an operating system or web browser) or an application program such an SQL
database program. Every program has rights (often based on the user who initiated it) which are
granted by the operating system. If the program has a high level of rights of permissions and the
program crashes, it can be possible for a hacker to gain access to the computer or network the
program was running on.
3 4
b) When your friend exited the table, the following message appeared:
Database management – Who does what?
Large-scale business databases are usually very complex. The number of people required to manage
and support a database depends on the size and complexity of the database. The minimum
requirement tends to be a database administrator and programmers.
Database administrator (DBA) Briefly explain what the concept of data integrity refers to
3. What is the process of ‘checking for correctness’ called and why does it always involve a
The DBA is responsible for
person?
x designing the database (i.e. deciding on tables, fields in records, keys, relationships, etc.)
x security (setting up and enforcing user rights, encryption, etc.) 4. Verification involves a person checking data for errors and inconsistencies.
x backup and restoration plans and policies Give TWO ways in which this might be done.
x monitoring of the database performance (disk space used, memory used, average transaction 5. Name TWO general ways in which data integrity can be ensured in a database.
time) to determine if the database is out-growing the hardware that it runs on 6. Access control is an essential part of protecting data.
x maintenance. List TWO methods of controlling access to data.
In most organisations the database is accessed by multiple divisions, each of which may have their 7. Audit trails are often used together with databases.
own separate program, website or web application that makes use of the database. In these cases Explain what an audit trail is, what it is used for, as well as how it is used
the DBA needs to be a senior management position that ensures that there is consistency and 8. A parallel data set is a term associated with databases.
stability in the database across all divisions. In large organisations, the DBA is supported by
a) Explain what a parallel data set is and what it is used for.
specialised people tasked with specific database support functions (e.g. backup and restoration).
b) Give the main disadvantage of implementing parallel data sets.
Programmers c) Why should parallel data sets be kept in separate locations?
9. In large database systems thousands of records get added and deleted on a daily basis. The
Just having a database is not enough. You need to have software that allows your employees or
deleted records can be kept on the system and just be indicated or ‘flagged’ as deleted.
customers to work with the database. In most cases this means that custom software needs to be
written (or websites / web applications created). It is the programmers task to develop the software a) Give TWO reasons why a company might want to remove deleted records from their current
– but work within the rules and limitations set by the DBA in terms of the database structure. In operational database.
many cases a business may have multiple applications using the same database simultaneously. Each b) Give ONE reason why a large sales company would not just simply delete old records from
programmer cannot simply create and define tables as they like as this would result in chaos and a its computer systems.
lack of consistency within the organisation. 10. Hackers frequently use ‘invalid or ‘unexpected’ data to cause programs to crash in order to hack
computer systems.
As with the DBA, in most cases a team of programmers work on developing the software, resulting in
a division of labour and responsibility. a) Give TWO examples of data that could be used to achieve this.
b) Briefly explain why causing a program to crash can potentially be used as a hacking tool.
Activity 11. Briefly explain what an SQL Injection attack is, how it is executed in broad terms and why it is so
dangerous.
1. The relative ‘value’ of data can change according to value to who is looking at the data.
12. Your father’s firm needs to appoint a database administrator (DBA) to maintain their DBMS.
Name TWO other factors that can influence the ‘value’ of data.
Give FOUR key responsibilities of the DBA that the firm should place in the advert for this
2. Your friend made changes to the design of a database table in Access by adding a validation
position.
rule.
13. Just having a database is not enough. You need to have software that allows your employees or
a) Briefly explain what data validation is in general by referring to a suitable example.
customers to work with the database.
Would this generally be the responsibility of the database administrator (DBA)?
Briefly motivate your answer.
5 6