0% found this document useful (0 votes)
8 views60 pages

Data Warehousing Lab Manual Guide

The document is a lab manual for the Data Warehousing Laboratory course at Dhaanish Ahmed College of Engineering, detailing the vision and mission of the institution and the Department of Artificial Intelligence & Data Science. It outlines the program's educational objectives and specific outcomes, along with a series of experiments using the WEKA tool for data processing, validation, and OLAP operations. The manual includes step-by-step procedures for various data modeling techniques and OLAP operations, emphasizing the practical application of theoretical concepts in data science.

Uploaded by

Manimaran S
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views60 pages

Data Warehousing Lab Manual Guide

The document is a lab manual for the Data Warehousing Laboratory course at Dhaanish Ahmed College of Engineering, detailing the vision and mission of the institution and the Department of Artificial Intelligence & Data Science. It outlines the program's educational objectives and specific outcomes, along with a series of experiments using the WEKA tool for data processing, validation, and OLAP operations. The manual includes step-by-step procedures for various data modeling techniques and OLAP operations, emphasizing the practical application of theoretical concepts in data science.

Uploaded by

Manimaran S
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

DHAANISH AHMED COLLEGE OF ENGINEERING

Dhaanish Nagar, Padappai, Chennai – 601301


Approved By AICTE, New Delhi,
Affiliated to Anna University, Chennai.
[Link]

Department of Artificial Intelligence & Data Science

Lab Manual

CCS334 – Data warehousing Laboratory

Year/Sem : III/V
DHAANISH AHMED COLLEGE OF ENGINEERING

Vision

To establish a world-class institution that is recognized as a “Centre of Excellence” offering


education and research in engineering, technology and management with a blend of social and
moral values to serve the community with a futuristic perspective.

Mission

To produce eminent engineers and managers with academic excellence in their chosen fields,
which would be able to take up the challenges in the modern era and fulfill the expectations of the
organization they join, with moral values and social ethics.
Department of Artificial Intelligence and Data Science

Vision

To impart quality Education, Industry Collaboration, promote Research and produce Graduate
Industry-ready Engineers in the field of Artificial Intelligence and Data Science to serve the society.

Mission

• To provide a conducive learning environment for quality education in the field of Artificial
Intelligence and Data Science.
• To promote industry-institute interaction and collaborative research activities.
• To empower the students with ethical values and social responsibilities in their profession.

PROGRAMME EDUCATIONAL OBJECTIVES (PEOs)

• Show proficiency in the knowledge of basic sciences, mathematics, Artificial Intelligence, data
science and statistics to build systems that require management and analysis of large volume
of data.

• Demonstrate technical skills to pursue pioneering research in the field of AI and Data Science
and create disruptive and sustainable solutions for the welfare of ecosystems.

• Exhibit effective communication skills, team work and lead their profession with ethics.

Program Specific Outcome (PSO)


PSO1: Evolve AI based efficient domain specific processes for Effective decision making in several
domains such as business and governance domains.

PSO2: Create, select and apply the theoretical knowledge of AI and Analytics along with practical
industrial tools and techniques to manage and solve societal problems.
INDEX

S. No. Date Name of the Experiment Page Marks Sign


No.
1 PROCESSING THE DATA USING
WEKA
2 APPLY WEKA TOOL FOR DATA
VALIDATION
3 ANALYSE THE DIMENSIONAL
DATA

MODELLING
4 OLAP OPERATIONS

DESIGN DATAWAREHOUSE FOR


5 REAL TIME

APPLICATIONS
6 Knowledge Flow

Explore various options in Weka for

7 Preprocessing data and apply in

each

dataset.
Case Study of an Online Analytical
8
Processing (OLAP) Implementation

CASE STUDY OF AN OLTP


9
IMPLEMENTATION
[Link]

Date: PROCESSING THE DATA USING WEKA

Aim:

To demonstrate the available features in preprocessing, we will use


the Weather database that is provided in the installation.

Procedure:

Step 1: Using the Open file ... option under the Preprocess tag select the weather-
[Link] file . When you open the file, your screen looks like as shown here.

This screen tells us several things about the loaded data, which are discussed
further in this chapter.

Step 2: Let us first look at the highlighted Current relation sub window. It shows the
name of the database that is currently loaded. You can infer two points from this sub
window
• There are 14 instances - the number of rows in the table.
• The table contains 5 attributes - the fields, which are discussed
in the upcoming sections.
On the left side, notice the Attributes sub window that displays the various fields in
the database.
Step 3: The weather database contains five fields - outlook, temperature, humidity,
windy and play. When you select an attribute from this list by clicking on it, further
details on the attribute itself are displayed on the right hand side. This will show us
the result.
Let us select the temperature attribute first. When you click on it, you would see the following
screen,and the final result will be like the following page which is

Step 4: In the Selected Attribute subwindow, you can observe the following −
• The name and the type of the attribute are displayed.
• The type for the temperature attribute is Nominal.
• The number of Missing values is zero.
• There are three distinct values with no unique value.
• The table underneath this information shows the nominal values for this field
as hot, mild and cold.
• It also shows the count and weight in terms of a percentage for each
nominal value.
At the bottom of the window, you see the visual representation of the class values.
If you click on the Visualize All button,
you will be able to see all features in one single window a

Step 5: Removing Attributes

Many a time, the data that you want to use for model building comes with many
irrelevant fields. For example, the customer database may contain his mobile
number which is relevant in analysing his credit rating.
shown here −
To remove Attribute/s select them and click on the Remove button at the bottom.
The selected attributes would be removed from the database. After you fully
preprocess the data, you can save it for model building. Next, you will learn to
preprocess the data by applying filters on this data.

Step 6: Applying Filters

Some of the machine learning techniques such as association rule mining requires
categorical data. To illustrate the use of filters, we will use weather- [Link]
database that contains two numeric attributes - temperature and humidity.
We will convert these to nominal by applying a filter on our raw data. Click onthe
Choose button in the Filter subwindow and select the following filter −
weka→filters→supervised→attribute→Di scretize

Click on the Apply button and examine the temperature and/or humidity
attribute. You will notice that these have changed from numeric to nominal
types

Let us look into another filter now. Suppose you want to select the best attributes
for decidingthe play. Select and apply the following filter

weka→filters→supervised→attribute→AttributeSelection

You will notice that it removes the temperature and humidity attributes from the
database.
After you are satisfied with the preprocessing of your data, save the data by
clicking the Save ... button. You will use this saved file for model building.
In the next chapter, we will explore the model building using several predefined ML
algorithms.

RESULT:

Thus the demonstration of Weather database in preprocessing the data using weka
tool was implemented successfully.
[Link]
Date: APPLY WEKA TOOL FOR DATA VALIDATION

Aim:

To experiment some of the basic data preprocessing operations that can be


performed using WEKA-Explorer. The sample dataset used for this example is the
student data available in arff format.

Procedure:
Step 1: Loading the data. We can load the dataset into weka by clicking on open button in
preprocessing interface and selecting the appropriate file.

Step 2: Once the data is loaded, weka will recognize the attributes and during the
scan of the data weka will compute some basic strategies on each attribute. The left
panel in the above figure shows the list of recognized attributes while the top panel
indicates the names of the base relation or table and the current working relation
(which are same initially).

Step 3: Clicking on an attribute in the left panel will show the basic statistics on the
attributes for the categorical attributes the frequency of each attribute value is shown,
while for continuous attributes we can obtain min, max, mean, standard deviation
and deviation etc.,

Step 4: The visualization in the right button panel in the form of cross- tabulation
across two attributes.

Step 5: Removing an attribute-When we need to remove an attribute, we can do


this by using the attribute filters in weka. In the filter model panel,click on choose
button,This will show a popup window with a list of available filters.
Scroll down the list and select the “[Link]” filters.

Step 6: a)Next click the textbox immediately to the right of the


choose button. In the resulting dialog box enter the index of the
attribute to be filtered out.

b) Make sure that invert selection option is set to false. The


click OK now in the filter box. You will see “Remove-R-7”.

c) Click the apply button to apply filter to this data. This will
remove the attribute and create new working relation.

d) Save the new working relation as an arff


file by clicking save button on the
top(button)panel.([Link])
Discretization:

1) Sometimes association rule mining can only be performed on


categorical data. This requires performing discretization on
numeric or continuous attributes. In the following example let
us discretize age attribute.

2) Let us divide the values of age attribute into three bins(intervals).


3) First load the dataset into weka ([Link])
4) Select the age attribute.
5) Activate filter-dialog box and select
“[Link]”from the list.

6) To change the defaults for the filters, click on the


box immediately to the right of htcho o se button.
7) We enter the index for the attribute to be discretized. In this case
the attribute isage. So we must enter ‘1’ corresponding to the age
attribute.
8) Enter ‘3’ as the number of bins. Leave the remaining field values as they are.
9) € Click OK button.
10) Click apply in the filter [Link] will result in a new
workingrelation with the selected attribute partition into 3 bins.
11) Save the new working relation in a file called [Link].

DATASET STUDENT .ARFF:


@relation student
@attribute age {<30,30-40,>40}
@attribute income {low, medium,
high}@attribute student {yes, no}
@attribute credit-rating {fair,
excellent}@attribute buyspc
{yes, no} @data
%
<30, high, no, fair, no
<30, high, no, excellent, no 30-40, high, no, fair, yes

>40, medium, no, fair, yes

>40, low, yes, fair, yes

>40, low, yes, excellent, no


0, low, yes,
>
excellent, yes
4
<30, medium, no, fair, no
0
<30, low, yes, fair, no
,
>40, medium, yes, fair, yes
<30, medium, yes,
l
excellent, yes
o
30-40,
w
medium, no,
,
excellent, yes
30-40, high,
y
yes, fair, yes
e
>40, medium, no, excellent, no
s
%
,
The following screenshot shows the effect of discretization

e
x
c
e
l
l
e
n
t
,

n
o

Result:
3
Thus the experiment to illustrates some of the basic data preprocessing
operations
0 that can be performed using WEKA-Explorer was implemented successfully.

-
4
[Link]

Date: ANALYSE THE DIMENSIONAL DATA


MODELLING

Aim:
To Design multi-dimensional data models namely Star, snowflake and Fact
constellation schemas for any one enterprise.

Procedure:
Multi-Dimensional model was developed for implementing data warehouses
& it provides both a mechanism to store data and a way for business analysis. The
primary components of dimensional model are dimensions & facts. There are
different of types of multi- dimensional data models. They are:
[Link] Schema Model
[Link] Flake Schema Model
3. Fact Constellation Model.

Now, we are going to design these multi-dimensional models for the Marketing
enterprise.
First, we need to built the tables in a database through SQL log as shown below.
In the above window, left side navigation bar consists of a database named
as
―sales_dw‖ in which there are six different tables (dimcustdetails, dimcustomer,
dimproduct, dimsalesperson,dimstores, factproductsales) has been created.

In the above window, we are seeing Microsoft Visual Studio before creating
a project In which right side navigation bar contains different options like Data
Sources,Data Source Views,Cubes, Dimensions etc.

Through Data Sources, we can connect to our MySQL database named as


“sales_dw”.
Then, automatically all the tables in that database will be retrieved to this tool
forcreating multi-dimensional models.

By data source views & cubes, we can see our retrieved tables in
multi- dimensionalmodels. We need to add dimensions also through
dimensions [Link] general, Multi- dimensional models consists of
dimension tables & fact tables.

Star Schema Model:

A Star schema model is a join between a fact table and a no. of dimension
tables. Each dimensional table are joined to the fact table using primary key to
foreign key join but dimensional tables are not joined to each other. It is the
simplest style of dataware house schema.
Star schema is a entity relationship diagram of this schema resembles a
star with pointradiating from central table as we seen in the below implemented
window.
Snow Flake Schema:
It is slightly different from star schema in which dimensional tables from a
starschema are organized into a hierarchy by normalizing them.
Snow flake schema is represented by centralized fact table which are
connected tomultiple dimension tables. Snow flake effects only dimension tables not
fact tables.
Fact Constellation Schema:

Fact Constellation is a set of fact tables that share some dimension


tables. In this schema there are two or more fact tables. We developed fact
constellation in visual studio as shown below. Fact tables are labelled in yellow
color.

Result:

Thus to design multi-dimensional data models namely Star, snowflake and Fact
constellation schemas for any one enterprise using WEKA-Explorer was implemented
successfully.
[Link]

Date: OLAP OPERATIONS

Aim:
To perform the various OLAP operations in multidimensional data is grouped into cities
rather than countries using SQL server.
Procedure:

Roll-up (Drill-up):

Roll-up performs aggregation on a data cube in any of the following ways


• By climbing up a concept hierarchy for a dimension
• By dimension reduction
• Roll-up is performed by climbing up a concept hierarchy for the dimension location.
• Initially the concept hierarchy was "street < city < province < country".
• On rolling up, the data is aggregated by ascending the location
hierarchy from the level of city to the level of country.
When roll-up is performed, one or more dimensions from the data
cube are removed.
Drill-down:
Drill-down is the reverse operation of roll-up. It is performed by either of the following ways
• By stepping down a concept hierarchy for a dimension
• By introducing a new dimension.
• Drill-down is performed by stepping down a concept hierarchy
for the dimension time.
• Initially the concept hierarchy was "day < month < quarter < year."
• On drilling down, the time dimension is descended from the
level of quarter to the level of month.
• When drill-down is performed, one or more dimensions from the data
cube are added.
• It navigates the data from less detailed data to highly detailed data.
Slice:
The slice operation selects one particular dimension from a given cube and provides a
new sub-cube.
Dice:
Dice selects two or more dimensions from a given cube and provides a new sub-cube.

Pivot (rotate):
The pivot operation is also known as rotation. It rotates the data axes in view in
order to provide an alternative presentation of data.
Now, we are practically implementing all these OLAP Operations using Microsoft
Procedure for OLAP Operations:

1. Open Microsoft Excel, go to Data tab in top & click on ―Existing Connections”.
2. Existing Connections window will be opened, there “Browse for more” option
should be clicked for importing .cub extension file for performing OLAP
Operations. For sample, I took music. cub file.

3. As shown in above window, select ―PivotTable Report” and click “OK”.

4. We got all the [Link] data for analyzing different OLAP


Operations. Firstly, we performed
drill-down operation as shown below.

In the above window, we selected year „2008‟ in „Electronic‟ Category, then


automatically the Drill-Down option is enabled on top navigation options. We will
click on.
In the above window, we selected year „2008‟ in „Electronic‟ Category, then
automatically the Drill-Down option is enabled on top navigation options.
We will click on
„Drill-Down‟ option, then the below window will be displayed.

5. Now we are going to perform roll-up (drill-up) operation, in the above


window I selected January month then automatically Drill-up option is enabled
on top. We will click on Drill-up option, then the below window will be displayed.

6. Next OLAP operation Slicing is performed by inserting slicer as


shown in top navigation options.
While inserting slicers for slicing operation, we select 2 Dimensions (for e.g.
Category Name & Year) only with one Measure (for e.g. Sum of sales).After
inserting a slice& adding a filter (Category Name: AVANT ROCK & BIG BAND;
Year: 2009 & 2010), we will gett able as shown below.

[Link] operation is similar to Slicing operation. Here we are selecting 3


dimensions (CategoryName, Year, RegionCode)& 2 Measures (Sum of
Quantity, Sum of Sales) through „insert slicer‟ option. After that adding a filter
for CategoryName, Year& RegionCode asshown below.
[Link], the Pivot (rotate) OLAP operation is performed by swapping rows
(Order Date-Year)& columns (Values-Sum of Quantity & Sum of Sales)
through right side bottom navigation bar as shown below.

After Swapping (rotating), we will get resultant as represented below with a pie-chart
forCategory-Classical& Year Wise data.

Result:

Thus the various OLAP operations in multidimensional data is grouped into


cities rather thancountries using SQL server was implemented successfully.
[Link]

Date: DESIGN DATAWAREHOUSE FOR REAL TIME APPLICATIONS

Aim:
To design DataWare house for real time applications.

Introduction:
A "Data Warehousing Approach" Driven Labor Market Decision-Making in Kuwait.
Data warehousing projects are frequently side-tracked or derailed completely by
nontechnical factors, Because data warehouses are infrastructure for
sociotechnical systems. Politics and the exercise of power are inherent in data
warehousing projects, and data warehouse designers have to adopt work
practices and methods from non- technical disciplines, think of themselves in new
ways, and employ some fairly sophisticated qualitatively sociological methods in
order to optimize the chances for successful deployment of data warehouses as in
Marc (1997). Technically different people have different definitions for a data
warehouse. The most popular definition came from Inmon (1995) defines a data
warehouse as “ a subject-oriented, integrated, time-variant, and nonvolatile
collection of data in support of management’s decision- making process ”.Kimball
(1996) provided a more concise definition of a data warehouse, states that a data
warehouse is " a copy of transaction data specifically structured for query and
analysis ". Greenfield in (1996) " analysis " with " reporting ". Ahmad and Ahmad
(2010) propose the crucial role of data warehouse for statistical bureau, “ The
development of data warehouse for an organization helps in storing the statistical
information that can be helpful in decision making process later on ”. The strategy
of Information and Communications Technology (ICT) system development in the
Central Statistical Bureau (CSB) is targeting two frameworks: Meta- Information
Framework ensures the systematic use of meta-information inside and outside of
the statistical information system and provides tools for the security of the internal
and external integration of the statistical information system. Information-
Technological Framework to enhance CSB role, improve decision making,
increase productivity and improve access to statistical information. This project
employs the fulfillment of this strategy. Information-Technological Framework:
supports the functionality of individual processes necessary
for the fulfillment of the framework content, also includes tools for: data collection and their
elaboration, implementation of mathematical models and mathematical-statistical methods
and system of confirmation, release and dissemination of data. The paper structured as
following:
section 2 Problem Statement and Context. Section 3 Methodology. Section
4Conclusion closing the paper.

PROBLEM STATEMENT AND CONTEXT :


Development and Sustainability is what we all do in attempting to improve our lot
and measures impacts or success in relation to three dimensions : environment,
social systems, and economy as in UNDP (2011) on one hand. On the other hand,
the demographic and socio-economic trends in the GCC (Gulf Cooperation
Council) countries are crucial issues.

The two most important aspects of demographic development in Kuwait are : The
first is the natural increase of the indigenous population (crude birth and death
rates, infant and child mortality rates, and the distribution of the population by
age) and the country’s fertility policy.

The second is the growth of the foreign population in Kuwait (which from the early
1960s onwards formed the majority of the total population), and the country’s
migration and labor policies as said in Onn (1998). As per the Central Statistical
Bureau records 64% approximately of total population are foreigners (June /2011)
in a small country like Kuwait which raise up the need to develop a labor market
policy- based indicators. A professional review at 2012 conducted for labor market
information (LMI) in Kuwait indicated that no one of relevant Kuwaiti Agencies has
fully integrated data sets for Labor Market and Labor Statistics or the full picture
(i.e. provides quantitative and the qualitative information and intelligence on the
labor market that can assist labor market agents in making informed plans,
choices, and decisions related to their business requirements, career planning,
education and training offerings, job search, recruitment, labor policies and
workforce investment strategies) for the Labor Market Information. A Labor Market
Information System (LMIS) is a set of institutional arrangements, processes, and
tools for the collection, integration, analysis, support to policy formulation, and
dissemination of labor market information. A LMIS will assist in minimizing
information gaps that lead to mismatches and distortions; and to make available
signals of supply and demand for skills to the various stakeholders responsible for
the formulation and implementation of human resource development and
employment policies and programmes, and private sector decisions. The LMIS
would also provide policy makers the ability to:

(i) monitor the short term developments in the labor market,


(ii) labor market patterns, the impact by events/crisis, changes in
unemploymentratessystemscanrespond;
(iii) develop targeted and evidence-based policy interventions; and
(iv) (iv) assess the impact of policies on specific groups
– i.e. nationals, expatriate workers, youth, women Methodology:

Practically we develop our methodology which consist of the following steps;


1a:Labor Market Situation Analysis: Professional questionnaire used to
thoroughly analyze all LMIS-related labor market databases Kuwait-wide. 1b:

Field Visit to relevant agencies / Ministries and dedicate the time necessary to
examine – in detail – the data and connectivity methods available at each, in
addition to the database structures and technologies used.
2a: Developing Data warehouse Approach at CSB.
2b: Design data Warehouse.
2c: Design Mapping table, a mapping of Labor Market Data (LMD)for each of the
Agency / Ministry that captures the available data, time series of data, database
technology used, database structure/ER diagrams (to the extent possible),
development applications used, and connectivity options available for each
agency.
3: Labor Information will be stored centrally in the data warehouse and their
release will be done through a dependent Data-mart (including public database).

Result:
Thus the real time application of Data Warehousing Approach" Driven Labor Market
Decision-Making was implemented.
[Link]

Date: KNOWLEDGE FLOW

Aim:
To analyze and train a model for the given data using Knowledge Flow in Weka.

PROCEDURE:

The Knowledge Flow provides an alternative to the Explorer as a graphical


front end to
WEKA’s core algorithms.
The Knowledge Flow presents a data-flow inspired interface to WEKA. The user
can select WEKA components from a palette, place them on a layout canvas and connect
them together in order to form a knowledge flow for processing and analyzing data. At
present, all of WEKA’s classifiers, filters, clusterers, associators, loaders and savers
are available in the Knowledge Flowalong with some extra tools.
The Knowledge Flow can handle data either incrementally or in batches (the Explorer handles
batch data only). Of course learning from data incrementally requires a classifier that can be
updated.

Currently in WEKA there are ten classifiers that can


handle data incrementally. The Knowledge Flow offers the
following features:
• Intuitive data flow style layout.
• Process data in batches or incrementally.
• Process multiple batches or streams in parallel (each separate flow executes
in its own thread) .
• Process multiple streams sequentially via a user-specified order of execution.
• Chain filters together.
• View models produced by classifiers for each fold in a cross validation.
• Visualize performance of incremental classifiers during processing
(scrolling plots of classification accuracy, RMS error, predictions etc.).
• Plugin “perspectives” that add major new functionality (e.g. 3D data
visualization, time series forecasting environment etc.).
1. Simple CLI

The Simple CLI provides full access to all Weka classes, i.e., classifiers,
filters, clusterers,etc., but without the hassle of the CLASSPATH (it facilitates
the one, with which Weka was started). It offers a simple Weka shell with
separated command line and output.
Commands
The following commands are available in the Simple CLI:

• Java <classname> [<args>]

Invokes a java class with the given arguments (if any).

• Break

Stops the current thread, e.g., a running classifier, in a friendly manner


kill stops the current thread in an unfriendly fashion.

• Cls
Clears the output area

• Capabilities <classname> [<args>]

Lists the capabilities of the specified class, e.g., for a classifier with its.

• option

Capabilities [Link] -W [Link].Id3

• exit

Exits the Simple CLI

• help [<command>]

Provides an overview of the available commands if without a


command name as argument, otherwise more help on the
specified command

Invocation
In order to invoke a Weka class, one has only to prefix the class
with ”java”. This command tells the Simple CLI to load a class and execute it
with any given parameters. E.g., the J48 classifier can be invoked on the iris
dataset with the following command:
java [Link].J48 -t c:/temp/[Link]
This results in the following output:
Command redirection
Starting with this version of Weka one can perform a basic redirection: java
[Link].J48 -t [Link] > [Link]
Note: the > must be preceded and followed by a space, otherwise it is not
recognized as redirection, but part of another parameter.

Command completion
Commands starting with java support completion for classnames and
filenames via Tab (Alt+BackSpace deletes parts of the command again). In
case that there are several matches,Weka lists all possible matches.
• Package Name Completion java [Link]<Tab>
Results in the following output of possible matches
packagenames: Possible matches:
[Link]
rs
[Link]
rs
• Classname completion

java [Link].A<Tab> lists the following classes

Possible matches:
[Link].AdaBoostM1
[Link]
[Link]

• Filename Completion

In order for Weka to determine whether a the string under the cursor is a
classname or a filename, filenames need to be absolute (Unix/Linx:
/some/path/file;Windows: C:\Some\Path\file)or relative and starting with a dot
(Unix/Linux:./some/other/path/file; Windows:
.\Some\Other\Path\file).

Result:
To analyze and train a model for the given data using Knowledge Flow in Weka was
implemented successfully.
[Link]

Date: EXPLORE VARIOUS OPTIONS IN WEKA FOR


PREPROCESSING DATA AND APPLY IN
EACH DATASET.

Aim:
To explore various options in Weka for Preprocessing data and apply (like
Discretization
Filters, Resample filter, etc.) in each dataset.

Procedure:

Preprocess Tab

1. Loading Data
The first four buttons at the top of the preprocess section enable you to load data
into WEKA:
2. Open file Brings up a dialog box allowing you to browse for the data file on the
local file system.

3. Open URL: Asks for a Uniform Resource Locator address for where the data is stored.
4. Open DB: Reads data from a database. (Note that to make this work you might
have to edit the file in weka/experiment/Database [Link].)
5. Generate: Enables you to generate artificial data from a variety of Data Generators.
Using the Open file ...button you can read files in a variety of formats: WEKA’s ARFF
format, CSV format, C4.5 format, or serialized Instances format. ARFF files
typically have [Link] extension, CSV files a .csv extension, C4.5 files a .data and
.names extension, and serialized Instances objects a .bsi extension.
Current Relation: Once some data has been loaded, the Preprocess panel shows a
variety of information. The Current relation box (the “current relation” is the currently loaded
data, which can be interpreted as a single relational table in database terminology) has
three entries:

1. Relation. The name of the relation, as given in the file it was loaded from. Filters
(described below) modify the name of a relation.

2. Instances. The number of instances (data points/records) in the data.

3. Attributes. The number of attributes (features) in the data.

Working With Attributes

Below the Current relation box is a box titled Attributes. There are four
buttons, and beneath them is a list of the attributes in the current relation.

The list has three columns:

1. No.. A number that identifies the attribute in the order they are specified in the data file.

2. Selection tick boxes. These allow you select which attributes are present in the relation.
3. Name. The name of the attribute, as it was declared in the data file. When you
click on different rows in the list of attributes, the fields change in the box to the right
titled Selected attribute.
This box displays the characteristics of the currently highlighted attribute in the list:

1. Name. The name of the attribute, the same as that given in the attribute list.

[Link]. The type of attribute, most commonly Nominal or Numeric.

4. Missing. The number (and percentage) of instances in the data for which this
attribute is missing (unspecified).
5. Distinct. The number of different values that the data contains for this attribute.

[Link]. The number (and percentage) of instances in the data having a value for this
attribute that no other instances have.

Below these statistics is a list showing more information about the values stored in this
attribute, which differ depending on its type. If the attribute is nominal, the list consists
of each possible value for the attribute along with the number of instances that have that
value. If the attribute is numeric, the list gives four statistics describing the distribution of
values in the data— the minimum, maximum, mean and standard deviation.
And below these statistics there is a coloured histogram, colour-coded according to the
attribute chosen as the Class using the box above the histogram. (This box will bring up
a drop- down list of available selections when clicked.) Note that only nominal Class
attributes will result in a colour-coding. Finally, after pressing the Visualize All button,
histograms for all the attributes in the data are shown in a separate window.

They can be toggled on/off by clicking on them individually. The four buttons above
can also be used to change the selection:

PREPROCESSING

1. All. All boxes are ticked.


2. None. All boxes are cleared (unticked).
3. Invert. Boxes that are ticked become unticked and vice versa.

4. Pattern. Enables the user to select attributes based on a Perl 5 Regular Expression.
E.g.,
.* idselects all attributes which name ends with id.

Once the desired attributes have been selected, they can be removed by clicking the
Remove button below the list of attributes. Note that this can be undone by clicking
the Undo button, which is located next to the Edit button in the top-right corner of the
Preprocess panel.

Working with Filters:-


The preprocess section allows filters to be defined that transform the data in
various ways. The Filter box is used to set up the filters that are required. At the left
of the Filter box is a Choose button. By clicking this button, it is possible to select
one of the filters in WEKA. Once a filter has been selected, its name and options are
shown in the field next to the Choose button. Clicking on this box with the left mouse
button brings up a Generic Object Editor dialog box. A click with the right mouse
button (or Alt+Shift+left click) brings up a menu where you can choose, either to
display the properties in a Generic Object Editor dialog box, or to copy the current
setup string to the clipboard.

The GenericObjectEditor Dialog Box

The GenericObjectEditor dialog box lets you configure a filter. The same kind
of dialog box is used to configure other objects, such as classifiers and clusterers.
The fields in the window reflect the available options.
Right-clicking on such a field will bring up a popup menu, listing the following
options:
1. Show properties has the same effect as left-clicking on the field, i.e., a dialog
appears allowing you to alter the settings.

2. Copy configuration to clipboard copies the currently displayed configuration


string to the system’s clipboard and therefore can be used anywhere else in WEKA or
in the console. This is rather handy if you have to setup complicated, nested schemes.

3. Enter configuration is the “receiving” end for configurations that got copied to
theclipboard earlier on. In this dialog you can enter a class name followed by options (if the
class supports these). This also allows you to transfer a filter setting from the Preprocess
panel to a Filtered Classifier used in the Classify panel.

Left-Clicking on any of these gives an opportunity to alter the filters settings. For
example,the setting may take a text string, in which case you type the string into the text
field [Link] it may give a drop-down box listing several states to choose from. Or it
may do something else,depending on the information required.
Information on the options is provided in a tool tip if you let the mouse pointer hover of
the corresponding field. More information on the filter and its options can be obtained by
clicking on the More button in the About panel at the top of the
GenericObjectEditor window.

Applying Filters

Once you have selected and configured a filter, you can apply it to the data by pressing
theApplybutton at the right end of the Filter panel in the Preprocess panel. The
Preprocess panel willthenshow the transformed data. The change can be undone by
pressing the Undo button. You can also use the Edit...button to modify your data
manually in a dataset editor. Finally, the Save... button at the top right of the Preprocess
panel saves the current version of the relation in file formats that can represent
the relation, allowing it to be kept for future use.
➢ Steps for run preprocessing tab in WEKA

1. Open WEKA Tool.


2. Click on WEKA Explorer.
2. Click on Preprocessing tab button.
3. Click on open file button.
5. Choose WEKA folder in C drive.
• Select and Click on data option
button.
4. Choose labor data set and open file.
5. Choose filter button and select the Unsupervised-Discritize
option and apply Dataset [Link].

The following screenshot shows the effect of discretization


Ex no: 8 Case Study of an Online
Analytical Processing (OLAP)
Date: Implementation

Aim:

To do a case study of an Online Analytical Processing (OLAP) Implementation

Procedure:

Information systems (IS) is a relatively new discipline compared with accounting that has a
history of several centuries. However, the two disciplines are closely linked together both in
practice and academic research. Early research work showed the relationship and linkage
between IS and accounting (Firmin and Linn, 1968; Godfrey and Price, 1971; Colantoni et al.,
1971). Since then accounting information systems (AIS) has developed as one of the core
courses in universities' business curriculum. AIS has also become a separate area of research
within the IS domain.

Among the various professions, accountants are the pioneers in the application of
computers and IT to process large volumes of data beginning from the 1960s. Whilst
today's accountants place heavy reliance on computerised AIS to compile information
required by managers, a plethora of IS technologies have emerged over the last two
decades, examples include Decision Support System (DSS), Executive Information
System (EIS), On-line Analytical Processing (OLAP) or
Business Intelligence System, to name a few. In view of the significant impact of IS
development on accounting activities, the traditional roles of accountants are changing
continuously.
The purpose of this paper is to investigate the IS role of accountants and the benefits of
using OLAP to accountants in generating complex management reports. A case study is
conducted and a public sector organization is selected as the research site. The case reports the
implementation of OLAP by the accountants to improve the value of AIS and quality of financial
reporting. It also illustrates that the emerging role of systems accounting is important in
bridging the gap between the accounting and IT specialists in the process of maintaining and
upgrading the AIS

Background:
Previous research has demonstrated that the value of computerised accounting
information is critical to survival and success of a business (Cathey and Phillips,
1994). The performance of the IS function has a direct impact on the performance of
the AIS. However, the measure of IS performance is difficult, if not impossible, and
subject to much controversy. Cases of IS failure in business firms are common place
in the literature (Reimus, 1997).
Different types of managerial decision-making relate to different roles of accounting
(Burchell et al., 1980). The increasing adoption of IT in organisations will changethe
computation and judgment roles to the compromise and inspiration roles (Stambaugh
and Carpenter, 1992). The IT impact on corporate financial reporting is significant,
though its degree and pattern are contingent.

The Case Study


The company, hereinafter called ABC (true name disguised to protect
identity), is a public sector organization In the state of New South Wales in
Australia. The
firm has9,300 employees with an annual revenue of A$1.5 billion. The Australian
public sector has undergone a series of drastic reforms in the past decade. However, it is not
the intention of this paper to investigate the impact of those institutional changes on the IS
and accounting activities.
Systems Problem

Prior to July 1998, there was much dissatisfaction from users across the entire
organisation with the delivery of IT services. During the period from July 1994 to July 1997, a
new management information system (MIS) encompassing all functional areas, including
finance; payroll; human resources; supply and maintenance, was implemented to deliver
information to all levels of staff. A post- installation review was conducted in September 1997
to highlight any deficiencies of the MIS. It was discovered that the lack of user friendly and
efficient reporting tools did not meet the reporting requirements of ABC. The following are
extracts of comments from the users documented in the review report:
Available reporting tool is complex and difficult to use.
Report generation is slow.
Screens are busy and are costly/difficult to change. Lack of
user friendly tools to extract and download data.
Since July 1997, an interim spreadsheet-based (using Microsoft Excel) solution has been in
operation.
However, the Excel solution had the following problems:
Complexity and logistics of reporting process/systems means less time and commitment
available for quality analysis.
Cannot drill down to the underlying numbers in the reports.
Maintenance of reporting templates and the processes or systems necessary to generate and
update forecasts is slow, complex, not properly documented and reliant on one person.
Systems Solution: OLAP

In view of this reporting dilemma, the corporate finance division (CFD) took an active role to
improve the financial management reporting and supporting systems and technology. In
April 1998,the CFD initiated implementing a more efficient, effective and robust financial
reporting tool using OLAP technology to meet the sophisticated reporting requirements in ABC.
The Systems Accounting Manager (SAM) was in charge of this project.

technology. In April 1998, the CFD initiated implementing a more efficient,


effective and robust financial reporting tool using OLAP technology to meet the
sophisticated reporting requirements in ABC. The Systems Accounting
Manager (SAM) was in charge of this project.

In August 1998, the OLAP reporting system was fully operational with
improved speed and quality of information available to management. The
structure of the new system is depicted in Figure 1 and the main features of the
system are:
An OLAP database is linked to the MIS storing transactions from different
operational systems.
The OLAP database allows data to be accessed by users on-line with
extremely fast response times. Financial management reporting
templates will be used as a window to the data in the OLAP database
and will allow users to drill down to the underlying data and update
forecasts directly in the OLAP database.
The results of changes to forecasts will be available to all levels of management
immediately (on-line).Systems accountants are responsible for accessing the data from
the OLAP database and generating sophisticated management reports through the
creation of multi-dimensional models.

Figure 1
OLAP
MIS
Databas

Standard Complex
Reports & Reports &
Enquiries Enquiries
Benefits of OLAP to Accountants

The traditional two-dimensional view of the AIS has been replaced by a sophisticated,
real-time multi- dimensional environment. Accountants can now manipulate the
various multi- dimensional cubes to produce top- quality information to management.
At ABC, the benefits of the new reporting tool can be summarised as follows:

Reduce the time and streamline the processes/systems associated with distributing,
analysing, forecasting and consolidating financial management reports periodically.
Provide better tools to analyse data underlying the line items in the monthly financial
management reports more effectively and efficiently.
Reduce the risk associated with undocumented and complex processes and
reliance on one person to maintain the systems that support financial management
reporting.
Streamline and simplify the maintenance of reporting templates and supporting

systems. Provide the potential to realise productivity gains in areas currently using
inefficient reporting tools and
areas responsible for coordinating, staging, distributing and consolidating financial
management information.
Provide opportunities to implement improvements in other areas such as budgeting and
project reporting.

Implications from the Case

The case has demonstrated that, in light of the failure of the IT function to deliver
satisfactory and quality system solutions to the accounting users, accountants play a critical
role to add value to the AIS through the use of multi- dimensional reporting tools. It is clear that
the "systems accounting" role has emerged in today's organisations having intensive use of IT
and further research is required to develop solid theories.

This role is certainly critical to the survival and success of firms operating in today's
competitive world. While Mouritsen's (1996) five aspects (bookkeeping, consulting, banking,
controlling and administration) of accounting departments' work encompass a wide range of
work performed by accountants, the new systems role seems to complicate the dimensions.

The systems accounting role in ABC include the following:

Support and enhance MIS financial modules. Develop and support management
reporting systems using different state-of-the-art tools.
Streamline and improve processes, systems and procedures.
Assist line department plan, design, project, manage and implement systems initiatives.

Specifically, the lessons learned from this case and the directions for future research are:

The IS role of accountants exists but is unclear. Further research is required to investigate
empirical relationship between the accounting and IS functions, especially on
their roles in different types and sizes of organisations.
Accountants can complement any unsatisfactory IS services by having a proactive attitude
towards systems design and development. Further research is necessary in exploring the
the systems accounting role of accountants.

Conclusion

From the empirical evidence of this case study, the image of accountants is no longer bean
counters but value- added internal systems consultants (Baker, 1994). They are even ready to
expand their roles to act as IS planners and managers (Falconer and Hodgett, 1997), or
assume the position of the CIO in managing the IT activities. This indicates that the traditional
role of the accountants as information- gatherers has diminished, rather they are the
information brokers and analysts, also acting as interpreters and advisers in CIS design and
development.
Since IT is now an indispensable weapon for achieving businesses, CFOs have to take an
active role to improve under-performing IT departments. Accountants are becoming more involved
in IS planning, design and implementation. The demands on them to make use of IT to improve
reporting capability are great. The emerging area of systems accounting is important to the
accounting practitioners and academics in the years ahead.
[Link]: 9 CASE STUDY OF AN OLTP IMPLEMENTATION
Date:

Aim:
To do a case study of an OLTP IMPLEMENTATION.

Procedure:
Online transaction processing, or OLTP, refers to a class of systems that
facilitate and manage transaction-oriented applications, typically for data entry and
retrieval transaction processing. The term is somewhat ambiguous; some understand
a "transaction" in the context of computer or database transactions, while others
(such as the Transaction Processing Performance Council) define it in terms of
business or commercial transactions.

OLTP has also been used to refer to processing in which the system responds
immediately to user requests. An automatic teller machine (ATM) for a bank is an
example of a commercial transaction processing application.

The technology is used in a number of industries, including banking, airlines, mail


order, supermarkets, and manufacturing. Applications include electronic banking,
order processing, employee time clock systems, e-commerce, and e Trading. The
most widely used OLTP system is probably IBM's CICS.

In computer science, transaction processing is information processing that is


divided into individual, indivisible operations, called transactions. Each transaction
must succeed or fail as a complete unit; it cannot remain in an intermediate state.

definition –

Databases must often allow the real-time processing of SQL transactions to


support e- commerce and other time-critical applications.

This type of processing is known as online transaction processing (OLTP).

OLTP (online transaction processing) is a class of program that facilitates and


manages transaction-oriented applications, typically for data entry and retrieval
transactions in a number of industries, including banking, airlines, mail order, supermarkets, and
manufacturers.
Probably the most widely installed OLTP product is IBM's CICS (Customer
Information Control System).

Today's online transaction processing increasingly requires support for transactions


that span a network and may include more than one company. For this reason, new
OLTP software uses client/server processing and brokering software that allows
transactions to run on different computer platforms in a network.
A system consisting of interconnected computers that share a central storage
system and various peripheral devices such as a printer, scanner, or router. Each
computer connected to the system can operate independently, but has the ability to
communicate with other external devices and computers.

REQUIREMENTS:-

Online transaction processing increasingly requires support for transactions that


span a network and may include more than one company. For this reason, new
OLTP software uses client/server processing and brokering software that allows
transactions to run on different computer platforms in a network.
In large applications, efficient OLTP may depend on sophisticated transaction
management software (such as CICS) and/or database optimization tactics to
facilitate the processing of large numbers of concurrent updates to an OLTP-
oriented database.
For even more demanding Decentralized database systems, OLTP brokering
programs can distribute transaction processing among multiple computers on a
network. OLTP is often integrated into service- oriented architecture (SOA) and
Web services.
BENEFITS:-
Online Transaction Processing has two key benefits: simplicity and
efficiency. Reduced paper trails and the faster, more accurate
forecasts for revenues and expenses are both examples of how OLTP
makes things simpler for businesses.

ADVANTAGES OF OLTP:-
Online Transaction Processing (OLTP) has the following advantages:
It provides faster and more accurate forecast for revenues and expenses.
It provides a concrete foundation for a stable organization because of
Timely modification of all transactions.
It makes the transactions much easier on behalf of the customers by allowing them to make the
payments according to their choice.
It broadens the customer base for an organization by simplifying and speeding up individual
processes.

DISADVANTAGES OF OLTP:-

As with any information processing system, security and reliability are considerations.
Online transaction systems are generally more susceptible to direct attack and abuse
than their offline counterparts. When organizations choose to rely on OLTP,
operations can be severely impacted if the transaction system or database is
unavailable due to data corruption, systems failure, or network availability issues.
Additionally, like many modern online information technology solutions, some
systems require offline maintenance which further affects the
cost-benefit analysis.

Online Transaction Processing (OLTP) has the following disadvantages:

It makes the database much more susceptible to intruders and hackers


because it makes the database available worldwide.
For B2B (business-to-business) transactions, businesses must go offline to
complete certain steps of an individual process, causing buyers and
suppliers to miss out on some of the efficiency benefits that the system
provides. As simple as OLTP is, the simplest disruption in the system has
the potential to cause a great deal of problems, causing a waste of both
time and money.
It can lead to server failure, which may cause delays or even wipe out large
amounts of data from the database.
IMPLEMENTATION OF OLTP:-
On-Line Transaction Processing is a processing that supports the daily
business operations. Also know as operational processing and
OLTP. An OLTP is a database which must typically allow the real- time processing of
SQL transactions to support traditional retail processes, e- commerce and other time-
critical applications. It is also a class of program that helps to manage or facilitate
transaction oriented applications such as data entry and retrieval transactions in a
number of industries, including banking, airlines, mail order, supermarkets, and
manufacturers.
With today's business environment, it is impossible to run a business without having to
rely on data.
Processing online transactions these days increasingly requires support for
transactions spanning a large network or even the global internet and may
include many companies.

1. Today, with the ubiquity of the internet, more and more people even from
those remote areas are not doing transactions online through an e-commerce
environment. The term transaction processing is often associated with the
process wherein an online shop or ecommerce website accepts and
processes payments through a customer's credit or debit card in real time in
return for purchased goods and services.
2. During the process of online transactions, a merchant payment system will
automatically connect to the bank or credit card company of the customer and
carry out security, fraud and other checking for validity after which
authorization to take the payment follows. In is strongly advised that when a
company looks for other companies that will handle online transactions and
processing, the company should have a system infrastructure that is robust,
secure and reliable that give customers fast, seamless and secure checkout
time.

3. An OLTP implementation tends to be very large involving very high volume of


data at any given time. Business organizations have invested in sophisticated
transaction management software like Customer Information Control System
(CICS) and database optimization tactics that can help OLTP process very large
numbers and volumes of concurrent updates on an OLTP-oriented database.

4. There are also many OLTP brokering programs which can distribute
transaction processing among multiple computers on a network that can
enhance the functions of an OLTP working on a more demanding
decentralized database system. Service oriented architectures and web
services are now commonly integrated with OLTP.
5. The two main benefits with using OLTP are simplicity and efficiency. OLTP
helps simplify a business operation by reducing paper trails and helping draw
faster and more accurate forecasting for revenues and expenses.

6. OLTP helps provide a concrete foundation with timely updating of corporate


data.
7. for an enterprise' customers, OLTP allows the more choices on how they want
to pay giving them more flexible time and enticing them to make more
transactions. Most OLTP transactions offer services to customers 24 hours a
day seven days a week .
APPLICATION OF OLTP:-

OLTP applications are often used to capture new data or update existing
data. An order-entry system is a typical example of an OLTP application.

OLTP applications have the following characteristics:

• Transactions that involve small amounts of data


• Indexed access to data
• Many users
• Frequent queries and updates
• Fast response times.

Compare Data Warehouse database and OLTP database?

The data warehouse and the OLTP data base are both
relational databases. However, the objectives of both these
databases are different.
• The OLTP database records transactions in real time and aims to automate
clerical data entry processes of a business entity.

8. Addition, modification and deletion of data in the OLTP database is


essential and the semantics of the application used in the front end impact
on the organization of the data in the database.
• The data warehouse on the other hand does not cater to real time
operational requirements of the enterprise. It is more a storehouse
of current and historical data and may also contain data extracted
from external data sources.
Differences
Data warehousedatabase
OLTP database

Designed for analysis of


business measures by Designed for real time business
categories and attributes operations.

Optimized for bulk loads and large, Optimized for a common set of
complex, unpredictable queries that transactions, usually adding or
access many rows per table. retrieving a single row at a time per
table.

Optimized for validation of incoming


Loaded with consistent, valid data;
data during transactions; uses
requires no real time validation
validation data tables.

Supports few
Supports thousands ofconcurrent
concurre
users.
nt users relative to
OLTP

• However, the data warehouse supports OLTP system by providing a


place for the latter to offload data as it accumulates and by providing
services which would otherwise degrade the performance of the
database.
OLTP vs. OLAP

We can divide IT systems into transactional (OLTP) and analytical


(OLAP). In general we can assume that OLTP systems provide source
data to data warehouses, whereas OLAP systems help to analyze it.

- OLTP (On-line Transaction Processing) is characterized by a large


number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis
for OLTP system sis puton very fast query processing, maintaining data integrity in
multi-access environments and an effectiveness measured by number of transactions
per second. In OLTP database there is detailed and current data, and schema used to
store transactional databases is the entity model (usually 3NF).

- OLAP (On-line Analytical Processing) is characterized by relatively low


volume of transactions. Queries are often very complex and involve aggregations. For
OLAP systems a response time is an effectiveness measure. OLAP applications are
widely used by Data Mining techniques. In OLAP database there is aggregated,
historical data, stored in multi- dimensional schemas (usually star schema).
The following table summarizes the major differences betweenOLTP and OLAP
system design.

OLTP System Online OLAP System Online


Transaction Processing AnalyticalProcessing
(Operational System) (Data Warehouse)

Operational data; OLTPsare Consolidation data; OLAP


Source of data the original source ofthe data. data comes from the
various OLTP
Databases
To control and To help with planning,
Purpose Of
run fundamental problem solving, and decision
data
businesstasks support
Reveals a snapshot of Multi-dimensional views of various
What the data ongoing business processes kinds of business activities

Short and fast inserts


Inserts an Periodic long-running batch
d andupdates initiated by end
Update jobs refresh the data
users
s
Relatively standardized and
Queries simple queries Returning Often complex
relatively few records queries involving
aggregations
Depends on the amount of data
involved; batch data refreshes
Processin and complex queries may take
g Typically very fast many hours; query speed can be
Speed improved by creating indexes
Larger due to the existence of
Space Can be relatively if aggregation structures and
Requiremen small historical data history data; requires more
ts is archived indexes than
OLTP
Database Highly normalized with many Typically de-normalized with
Design tables fewer tables; use of

star and/or snowflake


schemas

Backup religiously; Instead of regular backups, some


operational data is critical to run environments may consider simply
the business, data loss is likely reloading the OLTP data as a
to entail recovery method
significant monetary loss
Backup
and
Recove
ry

and legal liability


DIFFERENCE BETWEEN OLAP & OLTP:-

OLAP
OLTP
1. Current data. 1. Current and historical data.
2. Short database transactions . 2. Long database transactions.
3. Short database transactions . 3. Batch update/insert/delete.
4. Normalization is promoted . 4. Denormalization is promoted .
5. High volume transactions . 5. Low volume transactions.
6. Transaction recovery is 6. Transaction recovery is not
necessary. necessary.

HOW OLTP WORKS :-


TRANSACTION
PROCESSING SYSTEM:
A Transaction Processing System or Transaction Processing Monitor is
a set of information which processes a data transaction in a database
system that monitors transaction programs(a special kind of program). The
essence of a transaction program is that it manages data that must be left in
a consistent state. E.g. if an electronic payment is made, the amount must
be both withdrawn from one account and added to the other; it cannot
complete only one of those steps. Either both must occur, or neither. In case of
a failure preventing transaction completion, the partially executed transaction
must be 'rolled back' by the TPS. While this type of integrity must be provided
also for batch transaction processing.
The seat reservation data must be locked until the reservation is made, otherwise
another user may get the impression a seat is still free while it is actually being
booked at the time. Without proper transaction monitoring, double bookings may
occur. Other transaction monitor functions include deadlock detection and resolution
(deadlocks may be inevitable in certain cases of cross-dependence on data), and
transaction logging (in 'journals') for 'forward recovery' in case of massive failures.
TYPES:-

Contrasted with batch processing.


Batch processing is not transaction processing. Batch processing involves
processing several transactions at the same time, and the results of each
transaction are not immediately

available when the transaction is being entered;[1] there is a time delay.


transactions are accumulated for a certain period (say for day) where updates are
made specially after work.

Real-time and batch processing

There are a number of differences between real-time and batch


processing. These are outlined below:

Each transaction in real-time processing is unique. It is not part of a group of


transactions, even though those transactions are processed in the same
manner. Transactions in real-time processing are stand-alone both in the
entry to the system and also in the handling of output.

Real-time processing requires the master file to be available more often for
updating and reference than batch processing. The database is not accessible all
of the time for batch processing.

Real-time processing has fewer errors than batch processing, as transaction


data is validated and entered immediately. With batch processing,
the data is organised and stored before the master file is updated. Errors can occur during
these steps.
Infrequent errors may occur in real-time processing; however, they are often
tolerated. It is not practical to shut down the system for infrequent errors.

More computer operators are required in real-time processing, as the


operations are not centralised. It is more difficult to maintain a real-
time processing system than a batch processing system.

Features

Rapid response

Fast performance with a rapid response time is critical. Businesses cannot


afford to have customers waiting for a TPS to respond, the turnaround time
from the input of the transaction to the production for the output must be a
few seconds or less.

Reliability

Many organizations rely heavily on their TPS; a breakdown will disrupt


operations or even stop the business. For a TPS to be effective its failure
rate must be very low. If a TPS does fail, then quick and accurate recovery
must be possible. This makes well–designed backup and recovery
procedures essential.

Inflexibility

A TPS wants every transaction to be processed in the same way regardless of the
user, the customer or the time for day. If a TPS were flexible, there would be too many
opportunities for non-standard operations, for example, a commercial airline needs to
consistently accept airline reservations from a range of travel agents, accepting
different transactions data from different travel agents would be a problem. The
processing in a TPS must support an organization's operations. For example if an
organization allocates roles and responsibilities to
particular employees, then the TPS should enforce and
maintain this requirement. Example : ATM Transaction
ACID test properties: first
definition Atomicity

Main article: Atomicity (database systems)


A transaction’s changes to the state are atomic: either all happen or none
happen. These changes include database changes, messages, and actions on
transducers.[2]

Consistency
A transaction is a correct transformation of the state. The actions taken as a
group do not violate any of the integrity constraints associated with the state.
This requires that the
transaction be a correct
program![2]Isolation

Even though transactions execute concurrently, it appears to each


transaction T, that others executed either before T or after T, but
not both.[2]

Durability

Once a transaction completes successfully (commits), its changes to the


state survive failures.

Concurrency
Ensures that two users cannot change the same data at the same time.
That is, one user cannot change a piece of data before another user has
finished with it. For example, if an airline ticket agent starts to reserve the
last seat on a flight, then another agent cannot tell another passenger that a
seat is available Storing and retrieving information from a TPS must be
efficient and effective.
. The data are stored in warehouses or other databases,
the system must be well designed for its backup and recovery procedures.

Databases and files


The storage and retrieval of data must be accurate as it is used many times
throughout the day. A database is a collection of data neatly organized,
which stores the accounting and operational records in the database.
Databases are always protective of their delicate data, so they usually
have a restricted view of certain data. Databases are designed using
hierarchical, network or relational structures; each structure is effective in
its own sense.

Hierarchical structure: organizes data in a series of levels, hence


why it is called hierarchal. Its top to bottom like structure consists
of nodes and branches; each child node has branches and is
only linked to one higher level parent node.
Network structure: Similar to hierarchical, network structures also
organizes data using nodes and branches. But, unlike hierarchical,
each child node can be linked to multiple, higher parent nodes.
Relational structure: Unlike network and hierarchical, a relational
database organizes its data in a series of related tables. This
gives flexibility as relationships between the tables are built.

A relational structure.
A hierarchical structure. A network structure.

The following features are included in real time transaction processing systems:

Good data placement: The database should be designed to access


patterns of data from many simultaneous users.
Short transactions: Short transactions enables quick processing.
This avoids concurrency and paces the systems.
Real-time backup: Backup should be scheduled between low times
of activity to prevent lag of the server.
High normalization: This lowers redundant information to increase
the speed and improve concurrency, this also improves backups.
Archiving of historical data: Uncommonly used data are moved into
other databases or backed up tables. This keeps tables small and
also improves backup times.
Good hardware configuration: Hardware must be able to handle many
users and provide quick response times.

In a TPS, there are 5 different types of files. The TPS uses the files to
store and organize its transaction data:
Master file: Contains information about an organization’s business
situation. Most transactions and databases are stored in the master
file.
Transaction file: It is the collection of transaction records. It helps to
update themaster file and also serves as audit trailsand transaction
history.
Report file: Contains data that has been formatted
forpresentation to auser.
Work file: Temporary files in the system used during theprocessing.
Program file: Contains the instructions for the processing ofdata.
A data warehouse is a database that collects information from different
sources. When it's gathered in real-time transactions it can be used for
analysis efficiently if it's stored in a data warehouse. It provides data that
are consolidated, subject-oriented, historical and read-only:

Consolidated: Data are organised with consistent naming


conventions, measurements, attributes and semantics. It allows data from
a data warehouse from across the organization to be effectively used in a
consistent manner.

Subject-oriented: Large amounts of data are stored across an


organization, some data could be irrelevant for reports and makes
querying the data difficult. It organizes only key business information from
operational sources so that it's available for analysis.

Historical: Real-time TPS represent the current value at any time,


an example could be stock levels.

Read-only: Once data are moved into a data warehouse, it


becomes read-only, unless it was incorrect. Since it representsa snapshot
of a certain time, it must never be updated. Only operations which occur in
a data warehouse are loading and querying data.
Backup procedures

A Dataflow Diagram of backup and recovery procedures.

Since business organizations have become very dependent on TPSs, a breakdown


in their TPS may stop the business' regular routines and thus stopping its operation
for a certain amount of time. In order to prevent data loss and minimize disruptions
when a TPS breaks down a well- designed backup and recovery procedure is put
into use. The recovery process can rebuild the system when it goes down.

Recovery process

A TPS may fail for many reasons. These reasons could include a system failure,
human errors, hardware failure, incorrect or invalid data, computer viruses, software
application errors or natural or man- made disasters. As it's not possible to prevent all
TPS failures, a TPS must be able to cope with failures. The TPS must be able to
detect and correct errors when they occur. A TPS will go through a recovery of the
database to cope when the system fails, it involves the backup, journal, checkpoint,
and recovery manager:

Journal:
A journal maintains an audit trail of transactions and database
changes. Transaction logs and Database change logs are used,
a transaction log records all the essential data.
Each transactions, including data values, time of transaction and
Terminal number.
A database changelog contains before and after copies of records that have been
modified by transactions.
• Checkpoint: A checkpoint record contains necessary information to
restart the system. These should be taken frequently, such as several
times an hour. It is possible to resume processing from the most-recent
checkpoint when a failure occurs with only a few minutes of processing
work that needs to be repeated.

• Recovery Manager: A recovery manager is a program which restores


the database to a correct condition which can restart the transaction
processing.

Depending on how the system failed, there can be two different recovery
procedures used. Generally, the procedures involves restoring data that
has been collected from a backup device and then running the
transaction processing again. Two types of recovery are backward
recovery and forward recovery:

• Backward recovery: used to undo unwanted changes to the database. It


reverses the changes made by transactions which have been aborted. It
involves the logic of reprocessing each transaction, which is very time-
consuming.

• Forward recovery: it starts with a backup copy of the database. The


transaction will then reprocess according to the transaction journal that
occurred between the time the backup was made and the present time. It's
much faster and more accurate.

Types of back-up procedures

There are two main types of Back-up Procedures: Grandfather- father-son and
Partial backups:
Grandfather-father-son

This procedure refers to at least three generations of backup master files.


thus, the most recent backup is the son, the oldest backup is the
grandfather. It's commonly used for a batch transaction processing system
with a magnetic tape. If the system fails during a batch run, the master file
is recreated by using the son backup and then restarting the batch.
However if the son backup fails, is corrupted or destroyed, then the next
generation up backup (father) is required. Likewise, if that fails, then the
next generation up backup (grandfather) is required. Of course the older
the generation, the more the data may be out of date. Organizations can
have up to twenty generations of backup.

Partial
backups

This only occurs when parts of the master file are backed up. The master
file is usually backed up to magnetic tape at regular times, this could be
daily, weekly or monthly. Completed transactions since the last backup are
stored separately and are called journals, or journal files. The master file
can be recreated from the journal files on the backup tape if the system is to
fail.

Updating in a batch

This is used when transactions are recorded on paper (such as bills and
invoices) or when it's being stored on a magnetic tape.
Transactions will be collected and updated as a batch at when it's
convenient or economical to process them. Historically, this was the most
common method as the information technology did not exist to allow real-
time processing.
The two stages in batch processing are:

• Collecting and storage of the transaction data into a


transaction file - this involves sorting the data into
sequential order.

Processing the data by updating the master file - which can be difficult, this may
involve data additions, updates and deletions that may require to happen in a
certain order. If an error occurs, then the entire batch fails.

Updating in batch requires sequential access - since it uses a


magnetict ape this is the only way to access data. A batch will
start at the beginning of the tape, then reading it from the
order it was stored; it's very time- consuming to locate specific
transactions.

The information technology used includes a secondary


storage medium which can store large quantities of data
inexpensively (thus the common choice of a magnetic tape).
The software used to collect data does not have to be
online - it doesn't even need a user interface.

Updating in real-time

This is the immediate processing of data. It provides instant


confirmation of a transaction. This involves a large amount of
users who are simultaneously performing transactions to
change data.
Because of advances in technology (such as the increase in
the speed of data transmission and larger bandwidth), real-
time updating is now possible.

Steps in a real-time update involve the sending of a transaction


data to an online database in a master file. The person
providing information is usually able to help with error
correction and receives confirmation of the transaction
completion.

Updating in real-time uses direct access of data. This occurs when data are
accessed without accessing previous data items. The storage device stores data
in a particular location based on a mathematical procedure. This will then be
calculated to find an approximate location of the data. If data are not found at this
location, it will search through successive locations until it's found. The information
technology used could be a secondary storage medium that can store large
amounts of data and provide quick access (thus the common choice of a magnetic
disk). It requires a user- friendly interface as it's important for rapid response time.

Reservation Systems Reservation systems are used for any type of


business where a service or a product is set aside for a customer to use
for a future time.

MERCHANT PAYMENT SYSTEM:-

-Electronic payment systems exist in a variety of forms which can be divided


into two groups: wholesale payment systems and retail payment systems.

Wholesale payment systems exist for non consumer transactions-


transactions initiated among and between banks, corporations,
governments, and other financial service firms. High-value wholesale
payments flow through the three major interbank funds transfer systems:
the Clearing House Interbank Payment Systems (CHIPS),6 the Society
for Worldwide Interbank Financial Telecommunications (SWIFT),7 and
Fedwire.8 Electronic transfers utilizing these types of payment
systems are beyond the scope of this Note.

Retail electronic payment systems encompass those


transactions involving consumers. These transactions involve
the use of such payment mechanisms as credit cards,
automated teller machines (ATMs), debit cards, point-of-sale
(POS) terminals, home banking, and telephone bill-paying
services. Payments for these mechanisms are conducted
online and flow through the check truncation system9 and the
ACH.10 Electronic transfers involving these types of payment
mechanisms and payment systems are also beyond the scope
of this Note.

The distinction between wholesale and retail electronic payment systems parallels
the distinction that has evolved in regulating these systems.

Wholesale electronic payment systems are regulated by Article 4A of the Uniform


Commercial Code.

Retail electronic payment systems are regulated by the Consumer Credit


Protection Act;11 the Truth in Lending Act12 and its adjunct, Regulation Z;13 and
the Electronic Funds Transfer Act (EFTA)14 and its adjunct, Part 205 of
Regulation E.15

This regulatory distinction reflects the kinds of parties involved-rules for retail
electronic payment systems are fashioned with the consumer in mind whereas
rules for wholesale payment systems are fashioned with commercial parties in
mind.
Conclusion:
The term 'electronic payment' is a collective phrase for the many different
kinds of electronic payment methods available (also meaning online
payment), and the processing of transactions and their application within
online merchants and ecommerce websites. It is essential for all online
businesses to be able to accept and process electronic payments in a fast
and secure way. Businesses can gain a significant advantage over their
competitors by providing an instant electronic payment service as it lets
customers pay by their preferred way of credit or debit card. Electronic
payments systems can also increase your cash flow, reduce administrative
costs and labour and provide yet another way for your customers to pay.
Care must be taken when choosing an electronic payment solution as it will
need to fit within the constraints of your particular online business and
integrate seamlessly within your website.

You might also like