Data Warehousing Lab Manual Guide
Data Warehousing Lab Manual Guide
Lab Manual
Year/Sem : III/V
DHAANISH AHMED COLLEGE OF ENGINEERING
Vision
Mission
To produce eminent engineers and managers with academic excellence in their chosen fields,
which would be able to take up the challenges in the modern era and fulfill the expectations of the
organization they join, with moral values and social ethics.
Department of Artificial Intelligence and Data Science
Vision
To impart quality Education, Industry Collaboration, promote Research and produce Graduate
Industry-ready Engineers in the field of Artificial Intelligence and Data Science to serve the society.
Mission
• To provide a conducive learning environment for quality education in the field of Artificial
Intelligence and Data Science.
• To promote industry-institute interaction and collaborative research activities.
• To empower the students with ethical values and social responsibilities in their profession.
• Show proficiency in the knowledge of basic sciences, mathematics, Artificial Intelligence, data
science and statistics to build systems that require management and analysis of large volume
of data.
• Demonstrate technical skills to pursue pioneering research in the field of AI and Data Science
and create disruptive and sustainable solutions for the welfare of ecosystems.
• Exhibit effective communication skills, team work and lead their profession with ethics.
PSO2: Create, select and apply the theoretical knowledge of AI and Analytics along with practical
industrial tools and techniques to manage and solve societal problems.
INDEX
MODELLING
4 OLAP OPERATIONS
APPLICATIONS
6 Knowledge Flow
each
dataset.
Case Study of an Online Analytical
8
Processing (OLAP) Implementation
Aim:
Procedure:
Step 1: Using the Open file ... option under the Preprocess tag select the weather-
[Link] file . When you open the file, your screen looks like as shown here.
This screen tells us several things about the loaded data, which are discussed
further in this chapter.
Step 2: Let us first look at the highlighted Current relation sub window. It shows the
name of the database that is currently loaded. You can infer two points from this sub
window
• There are 14 instances - the number of rows in the table.
• The table contains 5 attributes - the fields, which are discussed
in the upcoming sections.
On the left side, notice the Attributes sub window that displays the various fields in
the database.
Step 3: The weather database contains five fields - outlook, temperature, humidity,
windy and play. When you select an attribute from this list by clicking on it, further
details on the attribute itself are displayed on the right hand side. This will show us
the result.
Let us select the temperature attribute first. When you click on it, you would see the following
screen,and the final result will be like the following page which is
Step 4: In the Selected Attribute subwindow, you can observe the following −
• The name and the type of the attribute are displayed.
• The type for the temperature attribute is Nominal.
• The number of Missing values is zero.
• There are three distinct values with no unique value.
• The table underneath this information shows the nominal values for this field
as hot, mild and cold.
• It also shows the count and weight in terms of a percentage for each
nominal value.
At the bottom of the window, you see the visual representation of the class values.
If you click on the Visualize All button,
you will be able to see all features in one single window a
Many a time, the data that you want to use for model building comes with many
irrelevant fields. For example, the customer database may contain his mobile
number which is relevant in analysing his credit rating.
shown here −
To remove Attribute/s select them and click on the Remove button at the bottom.
The selected attributes would be removed from the database. After you fully
preprocess the data, you can save it for model building. Next, you will learn to
preprocess the data by applying filters on this data.
Some of the machine learning techniques such as association rule mining requires
categorical data. To illustrate the use of filters, we will use weather- [Link]
database that contains two numeric attributes - temperature and humidity.
We will convert these to nominal by applying a filter on our raw data. Click onthe
Choose button in the Filter subwindow and select the following filter −
weka→filters→supervised→attribute→Di scretize
Click on the Apply button and examine the temperature and/or humidity
attribute. You will notice that these have changed from numeric to nominal
types
Let us look into another filter now. Suppose you want to select the best attributes
for decidingthe play. Select and apply the following filter
weka→filters→supervised→attribute→AttributeSelection
You will notice that it removes the temperature and humidity attributes from the
database.
After you are satisfied with the preprocessing of your data, save the data by
clicking the Save ... button. You will use this saved file for model building.
In the next chapter, we will explore the model building using several predefined ML
algorithms.
RESULT:
Thus the demonstration of Weather database in preprocessing the data using weka
tool was implemented successfully.
[Link]
Date: APPLY WEKA TOOL FOR DATA VALIDATION
Aim:
Procedure:
Step 1: Loading the data. We can load the dataset into weka by clicking on open button in
preprocessing interface and selecting the appropriate file.
Step 2: Once the data is loaded, weka will recognize the attributes and during the
scan of the data weka will compute some basic strategies on each attribute. The left
panel in the above figure shows the list of recognized attributes while the top panel
indicates the names of the base relation or table and the current working relation
(which are same initially).
Step 3: Clicking on an attribute in the left panel will show the basic statistics on the
attributes for the categorical attributes the frequency of each attribute value is shown,
while for continuous attributes we can obtain min, max, mean, standard deviation
and deviation etc.,
Step 4: The visualization in the right button panel in the form of cross- tabulation
across two attributes.
c) Click the apply button to apply filter to this data. This will
remove the attribute and create new working relation.
e
x
c
e
l
l
e
n
t
,
n
o
Result:
3
Thus the experiment to illustrates some of the basic data preprocessing
operations
0 that can be performed using WEKA-Explorer was implemented successfully.
-
4
[Link]
Aim:
To Design multi-dimensional data models namely Star, snowflake and Fact
constellation schemas for any one enterprise.
Procedure:
Multi-Dimensional model was developed for implementing data warehouses
& it provides both a mechanism to store data and a way for business analysis. The
primary components of dimensional model are dimensions & facts. There are
different of types of multi- dimensional data models. They are:
[Link] Schema Model
[Link] Flake Schema Model
3. Fact Constellation Model.
Now, we are going to design these multi-dimensional models for the Marketing
enterprise.
First, we need to built the tables in a database through SQL log as shown below.
In the above window, left side navigation bar consists of a database named
as
―sales_dw‖ in which there are six different tables (dimcustdetails, dimcustomer,
dimproduct, dimsalesperson,dimstores, factproductsales) has been created.
In the above window, we are seeing Microsoft Visual Studio before creating
a project In which right side navigation bar contains different options like Data
Sources,Data Source Views,Cubes, Dimensions etc.
By data source views & cubes, we can see our retrieved tables in
multi- dimensionalmodels. We need to add dimensions also through
dimensions [Link] general, Multi- dimensional models consists of
dimension tables & fact tables.
A Star schema model is a join between a fact table and a no. of dimension
tables. Each dimensional table are joined to the fact table using primary key to
foreign key join but dimensional tables are not joined to each other. It is the
simplest style of dataware house schema.
Star schema is a entity relationship diagram of this schema resembles a
star with pointradiating from central table as we seen in the below implemented
window.
Snow Flake Schema:
It is slightly different from star schema in which dimensional tables from a
starschema are organized into a hierarchy by normalizing them.
Snow flake schema is represented by centralized fact table which are
connected tomultiple dimension tables. Snow flake effects only dimension tables not
fact tables.
Fact Constellation Schema:
Result:
Thus to design multi-dimensional data models namely Star, snowflake and Fact
constellation schemas for any one enterprise using WEKA-Explorer was implemented
successfully.
[Link]
Aim:
To perform the various OLAP operations in multidimensional data is grouped into cities
rather than countries using SQL server.
Procedure:
Roll-up (Drill-up):
Pivot (rotate):
The pivot operation is also known as rotation. It rotates the data axes in view in
order to provide an alternative presentation of data.
Now, we are practically implementing all these OLAP Operations using Microsoft
Procedure for OLAP Operations:
1. Open Microsoft Excel, go to Data tab in top & click on ―Existing Connections”.
2. Existing Connections window will be opened, there “Browse for more” option
should be clicked for importing .cub extension file for performing OLAP
Operations. For sample, I took music. cub file.
After Swapping (rotating), we will get resultant as represented below with a pie-chart
forCategory-Classical& Year Wise data.
Result:
Aim:
To design DataWare house for real time applications.
Introduction:
A "Data Warehousing Approach" Driven Labor Market Decision-Making in Kuwait.
Data warehousing projects are frequently side-tracked or derailed completely by
nontechnical factors, Because data warehouses are infrastructure for
sociotechnical systems. Politics and the exercise of power are inherent in data
warehousing projects, and data warehouse designers have to adopt work
practices and methods from non- technical disciplines, think of themselves in new
ways, and employ some fairly sophisticated qualitatively sociological methods in
order to optimize the chances for successful deployment of data warehouses as in
Marc (1997). Technically different people have different definitions for a data
warehouse. The most popular definition came from Inmon (1995) defines a data
warehouse as “ a subject-oriented, integrated, time-variant, and nonvolatile
collection of data in support of management’s decision- making process ”.Kimball
(1996) provided a more concise definition of a data warehouse, states that a data
warehouse is " a copy of transaction data specifically structured for query and
analysis ". Greenfield in (1996) " analysis " with " reporting ". Ahmad and Ahmad
(2010) propose the crucial role of data warehouse for statistical bureau, “ The
development of data warehouse for an organization helps in storing the statistical
information that can be helpful in decision making process later on ”. The strategy
of Information and Communications Technology (ICT) system development in the
Central Statistical Bureau (CSB) is targeting two frameworks: Meta- Information
Framework ensures the systematic use of meta-information inside and outside of
the statistical information system and provides tools for the security of the internal
and external integration of the statistical information system. Information-
Technological Framework to enhance CSB role, improve decision making,
increase productivity and improve access to statistical information. This project
employs the fulfillment of this strategy. Information-Technological Framework:
supports the functionality of individual processes necessary
for the fulfillment of the framework content, also includes tools for: data collection and their
elaboration, implementation of mathematical models and mathematical-statistical methods
and system of confirmation, release and dissemination of data. The paper structured as
following:
section 2 Problem Statement and Context. Section 3 Methodology. Section
4Conclusion closing the paper.
The two most important aspects of demographic development in Kuwait are : The
first is the natural increase of the indigenous population (crude birth and death
rates, infant and child mortality rates, and the distribution of the population by
age) and the country’s fertility policy.
The second is the growth of the foreign population in Kuwait (which from the early
1960s onwards formed the majority of the total population), and the country’s
migration and labor policies as said in Onn (1998). As per the Central Statistical
Bureau records 64% approximately of total population are foreigners (June /2011)
in a small country like Kuwait which raise up the need to develop a labor market
policy- based indicators. A professional review at 2012 conducted for labor market
information (LMI) in Kuwait indicated that no one of relevant Kuwaiti Agencies has
fully integrated data sets for Labor Market and Labor Statistics or the full picture
(i.e. provides quantitative and the qualitative information and intelligence on the
labor market that can assist labor market agents in making informed plans,
choices, and decisions related to their business requirements, career planning,
education and training offerings, job search, recruitment, labor policies and
workforce investment strategies) for the Labor Market Information. A Labor Market
Information System (LMIS) is a set of institutional arrangements, processes, and
tools for the collection, integration, analysis, support to policy formulation, and
dissemination of labor market information. A LMIS will assist in minimizing
information gaps that lead to mismatches and distortions; and to make available
signals of supply and demand for skills to the various stakeholders responsible for
the formulation and implementation of human resource development and
employment policies and programmes, and private sector decisions. The LMIS
would also provide policy makers the ability to:
Field Visit to relevant agencies / Ministries and dedicate the time necessary to
examine – in detail – the data and connectivity methods available at each, in
addition to the database structures and technologies used.
2a: Developing Data warehouse Approach at CSB.
2b: Design data Warehouse.
2c: Design Mapping table, a mapping of Labor Market Data (LMD)for each of the
Agency / Ministry that captures the available data, time series of data, database
technology used, database structure/ER diagrams (to the extent possible),
development applications used, and connectivity options available for each
agency.
3: Labor Information will be stored centrally in the data warehouse and their
release will be done through a dependent Data-mart (including public database).
Result:
Thus the real time application of Data Warehousing Approach" Driven Labor Market
Decision-Making was implemented.
[Link]
Aim:
To analyze and train a model for the given data using Knowledge Flow in Weka.
PROCEDURE:
The Simple CLI provides full access to all Weka classes, i.e., classifiers,
filters, clusterers,etc., but without the hassle of the CLASSPATH (it facilitates
the one, with which Weka was started). It offers a simple Weka shell with
separated command line and output.
Commands
The following commands are available in the Simple CLI:
• Break
• Cls
Clears the output area
Lists the capabilities of the specified class, e.g., for a classifier with its.
• option
• exit
• help [<command>]
Invocation
In order to invoke a Weka class, one has only to prefix the class
with ”java”. This command tells the Simple CLI to load a class and execute it
with any given parameters. E.g., the J48 classifier can be invoked on the iris
dataset with the following command:
java [Link].J48 -t c:/temp/[Link]
This results in the following output:
Command redirection
Starting with this version of Weka one can perform a basic redirection: java
[Link].J48 -t [Link] > [Link]
Note: the > must be preceded and followed by a space, otherwise it is not
recognized as redirection, but part of another parameter.
Command completion
Commands starting with java support completion for classnames and
filenames via Tab (Alt+BackSpace deletes parts of the command again). In
case that there are several matches,Weka lists all possible matches.
• Package Name Completion java [Link]<Tab>
Results in the following output of possible matches
packagenames: Possible matches:
[Link]
rs
[Link]
rs
• Classname completion
Possible matches:
[Link].AdaBoostM1
[Link]
[Link]
• Filename Completion
In order for Weka to determine whether a the string under the cursor is a
classname or a filename, filenames need to be absolute (Unix/Linx:
/some/path/file;Windows: C:\Some\Path\file)or relative and starting with a dot
(Unix/Linux:./some/other/path/file; Windows:
.\Some\Other\Path\file).
Result:
To analyze and train a model for the given data using Knowledge Flow in Weka was
implemented successfully.
[Link]
Aim:
To explore various options in Weka for Preprocessing data and apply (like
Discretization
Filters, Resample filter, etc.) in each dataset.
Procedure:
Preprocess Tab
1. Loading Data
The first four buttons at the top of the preprocess section enable you to load data
into WEKA:
2. Open file Brings up a dialog box allowing you to browse for the data file on the
local file system.
3. Open URL: Asks for a Uniform Resource Locator address for where the data is stored.
4. Open DB: Reads data from a database. (Note that to make this work you might
have to edit the file in weka/experiment/Database [Link].)
5. Generate: Enables you to generate artificial data from a variety of Data Generators.
Using the Open file ...button you can read files in a variety of formats: WEKA’s ARFF
format, CSV format, C4.5 format, or serialized Instances format. ARFF files
typically have [Link] extension, CSV files a .csv extension, C4.5 files a .data and
.names extension, and serialized Instances objects a .bsi extension.
Current Relation: Once some data has been loaded, the Preprocess panel shows a
variety of information. The Current relation box (the “current relation” is the currently loaded
data, which can be interpreted as a single relational table in database terminology) has
three entries:
1. Relation. The name of the relation, as given in the file it was loaded from. Filters
(described below) modify the name of a relation.
Below the Current relation box is a box titled Attributes. There are four
buttons, and beneath them is a list of the attributes in the current relation.
1. No.. A number that identifies the attribute in the order they are specified in the data file.
2. Selection tick boxes. These allow you select which attributes are present in the relation.
3. Name. The name of the attribute, as it was declared in the data file. When you
click on different rows in the list of attributes, the fields change in the box to the right
titled Selected attribute.
This box displays the characteristics of the currently highlighted attribute in the list:
1. Name. The name of the attribute, the same as that given in the attribute list.
4. Missing. The number (and percentage) of instances in the data for which this
attribute is missing (unspecified).
5. Distinct. The number of different values that the data contains for this attribute.
[Link]. The number (and percentage) of instances in the data having a value for this
attribute that no other instances have.
Below these statistics is a list showing more information about the values stored in this
attribute, which differ depending on its type. If the attribute is nominal, the list consists
of each possible value for the attribute along with the number of instances that have that
value. If the attribute is numeric, the list gives four statistics describing the distribution of
values in the data— the minimum, maximum, mean and standard deviation.
And below these statistics there is a coloured histogram, colour-coded according to the
attribute chosen as the Class using the box above the histogram. (This box will bring up
a drop- down list of available selections when clicked.) Note that only nominal Class
attributes will result in a colour-coding. Finally, after pressing the Visualize All button,
histograms for all the attributes in the data are shown in a separate window.
They can be toggled on/off by clicking on them individually. The four buttons above
can also be used to change the selection:
PREPROCESSING
4. Pattern. Enables the user to select attributes based on a Perl 5 Regular Expression.
E.g.,
.* idselects all attributes which name ends with id.
Once the desired attributes have been selected, they can be removed by clicking the
Remove button below the list of attributes. Note that this can be undone by clicking
the Undo button, which is located next to the Edit button in the top-right corner of the
Preprocess panel.
The GenericObjectEditor dialog box lets you configure a filter. The same kind
of dialog box is used to configure other objects, such as classifiers and clusterers.
The fields in the window reflect the available options.
Right-clicking on such a field will bring up a popup menu, listing the following
options:
1. Show properties has the same effect as left-clicking on the field, i.e., a dialog
appears allowing you to alter the settings.
3. Enter configuration is the “receiving” end for configurations that got copied to
theclipboard earlier on. In this dialog you can enter a class name followed by options (if the
class supports these). This also allows you to transfer a filter setting from the Preprocess
panel to a Filtered Classifier used in the Classify panel.
Left-Clicking on any of these gives an opportunity to alter the filters settings. For
example,the setting may take a text string, in which case you type the string into the text
field [Link] it may give a drop-down box listing several states to choose from. Or it
may do something else,depending on the information required.
Information on the options is provided in a tool tip if you let the mouse pointer hover of
the corresponding field. More information on the filter and its options can be obtained by
clicking on the More button in the About panel at the top of the
GenericObjectEditor window.
Applying Filters
Once you have selected and configured a filter, you can apply it to the data by pressing
theApplybutton at the right end of the Filter panel in the Preprocess panel. The
Preprocess panel willthenshow the transformed data. The change can be undone by
pressing the Undo button. You can also use the Edit...button to modify your data
manually in a dataset editor. Finally, the Save... button at the top right of the Preprocess
panel saves the current version of the relation in file formats that can represent
the relation, allowing it to be kept for future use.
➢ Steps for run preprocessing tab in WEKA
Aim:
Procedure:
Information systems (IS) is a relatively new discipline compared with accounting that has a
history of several centuries. However, the two disciplines are closely linked together both in
practice and academic research. Early research work showed the relationship and linkage
between IS and accounting (Firmin and Linn, 1968; Godfrey and Price, 1971; Colantoni et al.,
1971). Since then accounting information systems (AIS) has developed as one of the core
courses in universities' business curriculum. AIS has also become a separate area of research
within the IS domain.
Among the various professions, accountants are the pioneers in the application of
computers and IT to process large volumes of data beginning from the 1960s. Whilst
today's accountants place heavy reliance on computerised AIS to compile information
required by managers, a plethora of IS technologies have emerged over the last two
decades, examples include Decision Support System (DSS), Executive Information
System (EIS), On-line Analytical Processing (OLAP) or
Business Intelligence System, to name a few. In view of the significant impact of IS
development on accounting activities, the traditional roles of accountants are changing
continuously.
The purpose of this paper is to investigate the IS role of accountants and the benefits of
using OLAP to accountants in generating complex management reports. A case study is
conducted and a public sector organization is selected as the research site. The case reports the
implementation of OLAP by the accountants to improve the value of AIS and quality of financial
reporting. It also illustrates that the emerging role of systems accounting is important in
bridging the gap between the accounting and IT specialists in the process of maintaining and
upgrading the AIS
Background:
Previous research has demonstrated that the value of computerised accounting
information is critical to survival and success of a business (Cathey and Phillips,
1994). The performance of the IS function has a direct impact on the performance of
the AIS. However, the measure of IS performance is difficult, if not impossible, and
subject to much controversy. Cases of IS failure in business firms are common place
in the literature (Reimus, 1997).
Different types of managerial decision-making relate to different roles of accounting
(Burchell et al., 1980). The increasing adoption of IT in organisations will changethe
computation and judgment roles to the compromise and inspiration roles (Stambaugh
and Carpenter, 1992). The IT impact on corporate financial reporting is significant,
though its degree and pattern are contingent.
Prior to July 1998, there was much dissatisfaction from users across the entire
organisation with the delivery of IT services. During the period from July 1994 to July 1997, a
new management information system (MIS) encompassing all functional areas, including
finance; payroll; human resources; supply and maintenance, was implemented to deliver
information to all levels of staff. A post- installation review was conducted in September 1997
to highlight any deficiencies of the MIS. It was discovered that the lack of user friendly and
efficient reporting tools did not meet the reporting requirements of ABC. The following are
extracts of comments from the users documented in the review report:
Available reporting tool is complex and difficult to use.
Report generation is slow.
Screens are busy and are costly/difficult to change. Lack of
user friendly tools to extract and download data.
Since July 1997, an interim spreadsheet-based (using Microsoft Excel) solution has been in
operation.
However, the Excel solution had the following problems:
Complexity and logistics of reporting process/systems means less time and commitment
available for quality analysis.
Cannot drill down to the underlying numbers in the reports.
Maintenance of reporting templates and the processes or systems necessary to generate and
update forecasts is slow, complex, not properly documented and reliant on one person.
Systems Solution: OLAP
In view of this reporting dilemma, the corporate finance division (CFD) took an active role to
improve the financial management reporting and supporting systems and technology. In
April 1998,the CFD initiated implementing a more efficient, effective and robust financial
reporting tool using OLAP technology to meet the sophisticated reporting requirements in ABC.
The Systems Accounting Manager (SAM) was in charge of this project.
In August 1998, the OLAP reporting system was fully operational with
improved speed and quality of information available to management. The
structure of the new system is depicted in Figure 1 and the main features of the
system are:
An OLAP database is linked to the MIS storing transactions from different
operational systems.
The OLAP database allows data to be accessed by users on-line with
extremely fast response times. Financial management reporting
templates will be used as a window to the data in the OLAP database
and will allow users to drill down to the underlying data and update
forecasts directly in the OLAP database.
The results of changes to forecasts will be available to all levels of management
immediately (on-line).Systems accountants are responsible for accessing the data from
the OLAP database and generating sophisticated management reports through the
creation of multi-dimensional models.
Figure 1
OLAP
MIS
Databas
Standard Complex
Reports & Reports &
Enquiries Enquiries
Benefits of OLAP to Accountants
The traditional two-dimensional view of the AIS has been replaced by a sophisticated,
real-time multi- dimensional environment. Accountants can now manipulate the
various multi- dimensional cubes to produce top- quality information to management.
At ABC, the benefits of the new reporting tool can be summarised as follows:
Reduce the time and streamline the processes/systems associated with distributing,
analysing, forecasting and consolidating financial management reports periodically.
Provide better tools to analyse data underlying the line items in the monthly financial
management reports more effectively and efficiently.
Reduce the risk associated with undocumented and complex processes and
reliance on one person to maintain the systems that support financial management
reporting.
Streamline and simplify the maintenance of reporting templates and supporting
systems. Provide the potential to realise productivity gains in areas currently using
inefficient reporting tools and
areas responsible for coordinating, staging, distributing and consolidating financial
management information.
Provide opportunities to implement improvements in other areas such as budgeting and
project reporting.
The case has demonstrated that, in light of the failure of the IT function to deliver
satisfactory and quality system solutions to the accounting users, accountants play a critical
role to add value to the AIS through the use of multi- dimensional reporting tools. It is clear that
the "systems accounting" role has emerged in today's organisations having intensive use of IT
and further research is required to develop solid theories.
This role is certainly critical to the survival and success of firms operating in today's
competitive world. While Mouritsen's (1996) five aspects (bookkeeping, consulting, banking,
controlling and administration) of accounting departments' work encompass a wide range of
work performed by accountants, the new systems role seems to complicate the dimensions.
Support and enhance MIS financial modules. Develop and support management
reporting systems using different state-of-the-art tools.
Streamline and improve processes, systems and procedures.
Assist line department plan, design, project, manage and implement systems initiatives.
Specifically, the lessons learned from this case and the directions for future research are:
The IS role of accountants exists but is unclear. Further research is required to investigate
empirical relationship between the accounting and IS functions, especially on
their roles in different types and sizes of organisations.
Accountants can complement any unsatisfactory IS services by having a proactive attitude
towards systems design and development. Further research is necessary in exploring the
the systems accounting role of accountants.
Conclusion
From the empirical evidence of this case study, the image of accountants is no longer bean
counters but value- added internal systems consultants (Baker, 1994). They are even ready to
expand their roles to act as IS planners and managers (Falconer and Hodgett, 1997), or
assume the position of the CIO in managing the IT activities. This indicates that the traditional
role of the accountants as information- gatherers has diminished, rather they are the
information brokers and analysts, also acting as interpreters and advisers in CIS design and
development.
Since IT is now an indispensable weapon for achieving businesses, CFOs have to take an
active role to improve under-performing IT departments. Accountants are becoming more involved
in IS planning, design and implementation. The demands on them to make use of IT to improve
reporting capability are great. The emerging area of systems accounting is important to the
accounting practitioners and academics in the years ahead.
[Link]: 9 CASE STUDY OF AN OLTP IMPLEMENTATION
Date:
Aim:
To do a case study of an OLTP IMPLEMENTATION.
Procedure:
Online transaction processing, or OLTP, refers to a class of systems that
facilitate and manage transaction-oriented applications, typically for data entry and
retrieval transaction processing. The term is somewhat ambiguous; some understand
a "transaction" in the context of computer or database transactions, while others
(such as the Transaction Processing Performance Council) define it in terms of
business or commercial transactions.
OLTP has also been used to refer to processing in which the system responds
immediately to user requests. An automatic teller machine (ATM) for a bank is an
example of a commercial transaction processing application.
definition –
REQUIREMENTS:-
ADVANTAGES OF OLTP:-
Online Transaction Processing (OLTP) has the following advantages:
It provides faster and more accurate forecast for revenues and expenses.
It provides a concrete foundation for a stable organization because of
Timely modification of all transactions.
It makes the transactions much easier on behalf of the customers by allowing them to make the
payments according to their choice.
It broadens the customer base for an organization by simplifying and speeding up individual
processes.
DISADVANTAGES OF OLTP:-
As with any information processing system, security and reliability are considerations.
Online transaction systems are generally more susceptible to direct attack and abuse
than their offline counterparts. When organizations choose to rely on OLTP,
operations can be severely impacted if the transaction system or database is
unavailable due to data corruption, systems failure, or network availability issues.
Additionally, like many modern online information technology solutions, some
systems require offline maintenance which further affects the
cost-benefit analysis.
1. Today, with the ubiquity of the internet, more and more people even from
those remote areas are not doing transactions online through an e-commerce
environment. The term transaction processing is often associated with the
process wherein an online shop or ecommerce website accepts and
processes payments through a customer's credit or debit card in real time in
return for purchased goods and services.
2. During the process of online transactions, a merchant payment system will
automatically connect to the bank or credit card company of the customer and
carry out security, fraud and other checking for validity after which
authorization to take the payment follows. In is strongly advised that when a
company looks for other companies that will handle online transactions and
processing, the company should have a system infrastructure that is robust,
secure and reliable that give customers fast, seamless and secure checkout
time.
4. There are also many OLTP brokering programs which can distribute
transaction processing among multiple computers on a network that can
enhance the functions of an OLTP working on a more demanding
decentralized database system. Service oriented architectures and web
services are now commonly integrated with OLTP.
5. The two main benefits with using OLTP are simplicity and efficiency. OLTP
helps simplify a business operation by reducing paper trails and helping draw
faster and more accurate forecasting for revenues and expenses.
OLTP applications are often used to capture new data or update existing
data. An order-entry system is a typical example of an OLTP application.
The data warehouse and the OLTP data base are both
relational databases. However, the objectives of both these
databases are different.
• The OLTP database records transactions in real time and aims to automate
clerical data entry processes of a business entity.
Optimized for bulk loads and large, Optimized for a common set of
complex, unpredictable queries that transactions, usually adding or
access many rows per table. retrieving a single row at a time per
table.
Supports few
Supports thousands ofconcurrent
concurre
users.
nt users relative to
OLTP
OLAP
OLTP
1. Current data. 1. Current and historical data.
2. Short database transactions . 2. Long database transactions.
3. Short database transactions . 3. Batch update/insert/delete.
4. Normalization is promoted . 4. Denormalization is promoted .
5. High volume transactions . 5. Low volume transactions.
6. Transaction recovery is 6. Transaction recovery is not
necessary. necessary.
Real-time processing requires the master file to be available more often for
updating and reference than batch processing. The database is not accessible all
of the time for batch processing.
Features
Rapid response
Reliability
Inflexibility
A TPS wants every transaction to be processed in the same way regardless of the
user, the customer or the time for day. If a TPS were flexible, there would be too many
opportunities for non-standard operations, for example, a commercial airline needs to
consistently accept airline reservations from a range of travel agents, accepting
different transactions data from different travel agents would be a problem. The
processing in a TPS must support an organization's operations. For example if an
organization allocates roles and responsibilities to
particular employees, then the TPS should enforce and
maintain this requirement. Example : ATM Transaction
ACID test properties: first
definition Atomicity
Consistency
A transaction is a correct transformation of the state. The actions taken as a
group do not violate any of the integrity constraints associated with the state.
This requires that the
transaction be a correct
program![2]Isolation
Durability
Concurrency
Ensures that two users cannot change the same data at the same time.
That is, one user cannot change a piece of data before another user has
finished with it. For example, if an airline ticket agent starts to reserve the
last seat on a flight, then another agent cannot tell another passenger that a
seat is available Storing and retrieving information from a TPS must be
efficient and effective.
. The data are stored in warehouses or other databases,
the system must be well designed for its backup and recovery procedures.
A relational structure.
A hierarchical structure. A network structure.
The following features are included in real time transaction processing systems:
In a TPS, there are 5 different types of files. The TPS uses the files to
store and organize its transaction data:
Master file: Contains information about an organization’s business
situation. Most transactions and databases are stored in the master
file.
Transaction file: It is the collection of transaction records. It helps to
update themaster file and also serves as audit trailsand transaction
history.
Report file: Contains data that has been formatted
forpresentation to auser.
Work file: Temporary files in the system used during theprocessing.
Program file: Contains the instructions for the processing ofdata.
A data warehouse is a database that collects information from different
sources. When it's gathered in real-time transactions it can be used for
analysis efficiently if it's stored in a data warehouse. It provides data that
are consolidated, subject-oriented, historical and read-only:
Recovery process
A TPS may fail for many reasons. These reasons could include a system failure,
human errors, hardware failure, incorrect or invalid data, computer viruses, software
application errors or natural or man- made disasters. As it's not possible to prevent all
TPS failures, a TPS must be able to cope with failures. The TPS must be able to
detect and correct errors when they occur. A TPS will go through a recovery of the
database to cope when the system fails, it involves the backup, journal, checkpoint,
and recovery manager:
Journal:
A journal maintains an audit trail of transactions and database
changes. Transaction logs and Database change logs are used,
a transaction log records all the essential data.
Each transactions, including data values, time of transaction and
Terminal number.
A database changelog contains before and after copies of records that have been
modified by transactions.
• Checkpoint: A checkpoint record contains necessary information to
restart the system. These should be taken frequently, such as several
times an hour. It is possible to resume processing from the most-recent
checkpoint when a failure occurs with only a few minutes of processing
work that needs to be repeated.
Depending on how the system failed, there can be two different recovery
procedures used. Generally, the procedures involves restoring data that
has been collected from a backup device and then running the
transaction processing again. Two types of recovery are backward
recovery and forward recovery:
There are two main types of Back-up Procedures: Grandfather- father-son and
Partial backups:
Grandfather-father-son
Partial
backups
This only occurs when parts of the master file are backed up. The master
file is usually backed up to magnetic tape at regular times, this could be
daily, weekly or monthly. Completed transactions since the last backup are
stored separately and are called journals, or journal files. The master file
can be recreated from the journal files on the backup tape if the system is to
fail.
Updating in a batch
This is used when transactions are recorded on paper (such as bills and
invoices) or when it's being stored on a magnetic tape.
Transactions will be collected and updated as a batch at when it's
convenient or economical to process them. Historically, this was the most
common method as the information technology did not exist to allow real-
time processing.
The two stages in batch processing are:
Processing the data by updating the master file - which can be difficult, this may
involve data additions, updates and deletions that may require to happen in a
certain order. If an error occurs, then the entire batch fails.
Updating in real-time
Updating in real-time uses direct access of data. This occurs when data are
accessed without accessing previous data items. The storage device stores data
in a particular location based on a mathematical procedure. This will then be
calculated to find an approximate location of the data. If data are not found at this
location, it will search through successive locations until it's found. The information
technology used could be a secondary storage medium that can store large
amounts of data and provide quick access (thus the common choice of a magnetic
disk). It requires a user- friendly interface as it's important for rapid response time.
The distinction between wholesale and retail electronic payment systems parallels
the distinction that has evolved in regulating these systems.
This regulatory distinction reflects the kinds of parties involved-rules for retail
electronic payment systems are fashioned with the consumer in mind whereas
rules for wholesale payment systems are fashioned with commercial parties in
mind.
Conclusion:
The term 'electronic payment' is a collective phrase for the many different
kinds of electronic payment methods available (also meaning online
payment), and the processing of transactions and their application within
online merchants and ecommerce websites. It is essential for all online
businesses to be able to accept and process electronic payments in a fast
and secure way. Businesses can gain a significant advantage over their
competitors by providing an instant electronic payment service as it lets
customers pay by their preferred way of credit or debit card. Electronic
payments systems can also increase your cash flow, reduce administrative
costs and labour and provide yet another way for your customers to pay.
Care must be taken when choosing an electronic payment solution as it will
need to fit within the constraints of your particular online business and
integrate seamlessly within your website.