
Data Science

The document outlines the foundations of data science, focusing on the concepts of data and information, their definitions, types, and differences. It emphasizes the importance of data processing and modeling, detailing the data processing cycle and various data models used in information systems. Additionally, it highlights the significance of transforming raw data into meaningful information for decision-making in business contexts.

Uploaded by

Ruson Berkmons

VELTECH MULTITECH Dr. RANGARAJAN Dr. SAKUNTHALA ENGINEERING COLLEGE

Pavithra Babu B.E., M.E

Assistant Professor

Artificial Intelligence and Data Science

COURSE CODE: 191AI221

FOUNDATIONS OF DATA SCIENCE

Text Books
1. Blum, Avrim, John Hopcroft, and Ravindran Kannan. Foundations of Data Science.
Cambridge University Press, 2020.

2. Hopcroft, John, and Ravi Kannan. "Foundations of data science." Microsoft (2014).

Reference Books
1. Fan, Jianqing, et al. Statistical Foundations of Data Science. CRC press, 2020.

2. Kubben, Pieter, Michel Dumontier, and Andre Dekker. Fundamentals of clinical data
science. Springer Nature, 2019.

UNIT I- DATA AND INFORMATION

• DATA

• INFORMATION

• DIFFERENCE BETWEEN DATA AND INFORMATION

• DATA MODELS

• DATA TYPES

• FILE SYSTEM VERSUS DATABASE SYSTEM.


DATA
DATA DEFINITION:

Data are units of information, often numeric, that are collected through observation. In a more
technical sense, data are a set of values of qualitative or quantitative variables about one or
more persons or objects, while a datum (singular of data) is a single value of a single variable.

EXAMPLE, WHERE IT IS APPLIED PRACTICALLY:

Data are used in

• scientific research,

• business management (e.g., sales data, revenue, profits, stock price),

• finance, governance (e.g., crime rates, unemployment rates, literacy rates), and

• virtually every other form of human organizational activity (e.g., censuses of the
number of homeless people by non-profit organizations).

Data are measured, collected, reported, and analysed, and are then used to produce
visualizations such as graphs, tables or images.

Data as a general concept refers to the fact that some existing information or knowledge is
represented or coded in a form suitable for better usage or processing.

Raw data ("unprocessed data") is a collection of numbers or characters before it has been
"cleaned" and corrected by researchers. Raw data needs to be corrected to remove outliers or
obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic
location recording a tropical temperature).
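
The cleaning step described above can be sketched in a few lines of Python. This is only an illustration: the readings and the plausibility window (`PLAUSIBLE_MIN_C`, `PLAUSIBLE_MAX_C`) are made-up assumptions, and a real study would have to justify its own limits.

```python
# Hypothetical raw readings (degrees Celsius) from an outdoor Arctic thermometer.
raw_readings = [-31.5, -29.8, 27.4, -30.2, -28.9]

# Assumed plausibility window for this location (illustrative values only).
PLAUSIBLE_MIN_C = -70.0
PLAUSIBLE_MAX_C = 15.0


def clean(readings, low, high):
    """Split readings into plausible values and rejected (suspect) values."""
    kept = [r for r in readings if low <= r <= high]
    rejected = [r for r in readings if not (low <= r <= high)]
    return kept, rejected


cleaned, rejected = clean(raw_readings, PLAUSIBLE_MIN_C, PLAUSIBLE_MAX_C)
print(cleaned)   # [-31.5, -29.8, -30.2, -28.9]
print(rejected)  # [27.4]
```

The rejected values are returned rather than silently dropped, so a researcher can still inspect them before deciding they are errors.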

Data processing commonly occurs by stages, and the "processed data" from one stage may be
considered the "raw data" of the next stage.

Field data is raw data that is collected in an uncontrolled "in situ" environment. Experimental
data is data that is generated within the context of a scientific investigation by observation and
recording.
Kinds of data documents include:

i. data repository

ii. data study

iii. data set

iv. software
v. data paper

vi. database

vii. data handbook

viii. data journal

Technical definition of data:

In general, data is any set of characters that is gathered and translated for some purpose,
usually analysis. If data is not put into context, it means nothing to a human or a computer.

Data can be defined as a representation of facts, concepts, or instructions in a formalized
manner, which should be suitable for communication, interpretation, or processing by a human
or an electronic machine.

Data is represented with the help of characters such as letters (A-Z, a-z), digits (0-9) or
special characters (+, -, /, *, <, >, =, etc.).

There are multiple types of data. Some of the more common types of data include the following:

• Single character

• Boolean (true or false)

• Text (string)

• Number (integer or floating-point)

• Picture

• Sound

• Video

In a computer's storage, digital data is a series of bits (binary digits) that have the value one or
zero. Data is processed by the CPU, which uses logical operations to produce new data (output)
from source data (input).
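
As a small illustration of the paragraph above, the following Python sketch shows the bits behind a single character and one logical operation that produces new data from it. The choice of the letter "a" and of the bit mask is ours, purely for demonstration.

```python
text = "a"
code = text.encode("ascii")[0]   # the character stored as one byte: 97
bits = format(code, "08b")       # the same value written as 8 binary digits
print(code, bits)                # 97 01100001

# A logical operation (bitwise AND) on the input bits yields new output data:
# clearing the bit worth 32 turns an ASCII lower-case letter into upper case.
upper_code = code & 0b11011111
print(upper_code, chr(upper_code))  # 65 A
```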

There are two types of data:

Primary Data

• Qualitative Data

• Quantitative Data

Secondary Data

• Internal Data

• External Data

Data Processing Cycle:

Data processing is the re-structuring or re-ordering of data by people or machines to increase
its usefulness and add value for a particular purpose. Data processing consists of the
following basic steps - input, processing, and output. These three steps constitute the data
processing cycle.

• Input − In this step, the input data is prepared in some convenient form for processing.
The form will depend on the processing machine. For example, when electronic
computers are used, the input data can be recorded on any one of several types of
input media, such as magnetic disks, tapes, and so on.

• Processing − In this step, the input data is changed to produce data in a more useful
form. For example, pay-checks can be calculated from the time cards, or a summary of
sales for the month can be calculated from the sales orders.

• Output − At this stage, the result of the preceding processing step is collected. The
particular form of the output data depends on the use of the data. For example, output
data may be pay-checks for employees.
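
The three steps can be sketched as a tiny Python program that follows the pay-check example. The employee names, hours, and hourly rates are invented for illustration, and the functions `process` and `output` are hypothetical helpers, not part of any payroll system.

```python
from decimal import Decimal

# Input: hypothetical time cards as (employee, hours worked, hourly rate).
time_cards = [
    ("Asha", Decimal("40"), Decimal("18.50")),
    ("Ravi", Decimal("35.5"), Decimal("20.00")),
]


def process(cards):
    """Processing: turn time cards into gross pay per employee."""
    return {name: hours * rate for name, hours, rate in cards}


def output(pay):
    """Output: present the result in the form the user needs (pay-check lines)."""
    return [f"{name}: {amount:.2f}" for name, amount in pay.items()]


pay_checks = output(process(time_cards))
print(pay_checks)  # ['Asha: 740.00', 'Ravi: 710.00']
```

`Decimal` is used instead of floating-point numbers so that money amounts are not affected by binary rounding.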

INFORMATION
What is Information?

Information is organized or classified data, which has some meaningful value for the
receiver. Information is the processed data on which decisions and actions are based. For the
decision to be meaningful, the processed data must have the following characteristics −

• Timely − Information should be available when required.

• Accuracy − Information should be accurate.

• Completeness − Information should be complete.


1. Information can be transmitted in time, via data storage, and space,
via communication and telecommunication. Information is expressed either as the
content of a message or through direct or indirect observation. That which
is perceived can be construed as a message in its own right, and in that sense,
information is always conveyed as the content of a message.

2. Information can be encoded into various forms for transmission and interpretation (for
example, information may be encoded into a sequence of signs, or transmitted via
a signal). It can also be encrypted for safe storage and communication.

What is Information in Business?

A. Data is the raw material from which information is established, and information in turn
underpins strategic success. Without data, none of the later steps can take place. A good
business stands on market analysis, which is built on data analysis that sifts the raw
data for important insights. With information in hand, most business ventures have a
greater scope for success.

B. From data to information and from information to business intelligence, every business
relies on the data generated. Businesses are taking advantage of this process to create a
difference in their market approach.

C. Business information, like the other segments of the information industry, takes several
forms, i.e., news, credit and financial information, market research, IT research, and
industry analysis. These can be further categorized into directories, periodicals,
statistics, government information, guides, handbooks, and almanacs.

D. The Internet has made it relatively easy for publishers to deliver business information,
especially through subscription models that deliver content to their user base.

E. Market research does not stem from a single, linear source of data. It is an exhaustive
process in which analysts separate out the good data, which is the cornerstone of any
business strategy.

Business information systems are designed to help organizations make important decisions
and attain their objectives. Such a system uses the resources provided by the IT
infrastructure to meet the needs of the various entities within a business enterprise.

McCreadie and Rice Concept:

Information as a representation of knowledge: Information is stored knowledge.
Traditionally the storage medium has been books, but increasingly electronic media are
becoming important.

Information as data in the environment: Information can be obtained from a range of
environmental stimuli and phenomena; not all of which are intended to ‘convey’ a message,
but which can be informative when appropriately interpreted.

Information as part of the communication process: Timing and social factors play a
significant role in the processing and interpretation of information.

Information as a resource or commodity: Information is transmitted in a message from
sender to receiver. The receiver interprets the message as intended by the sender. There may
be added value as the information is disseminated or exchanged.

Summary as follows:

What is Data?

Data is a raw and unorganized fact that must be processed to make it meaningful. Data can be
simple, yet it remains unorganized until it is organized. Generally, data comprises
facts, observations, perceptions, numbers, characters, symbols, images, etc.

Data must always be interpreted, by a human or a machine, to derive meaning. So, on its own,
data is meaningless. Data contains numbers, statements, and characters in a raw form.

What is Information?

Information is a set of data which is processed in a meaningful way according to the given
requirement. Information is processed, structured, or presented in a given context to make it
meaningful and useful.

It is processed data that possesses context, relevance, and purpose. It also
involves manipulation of raw data.

Information assigns meaning and improves the reliability of the data. It helps to reduce
uncertainty. So, when data is transformed into information, it no longer carries useless
details.

DIFFERENCE BETWEEN DATA AND INFORMATION

KEY DIFFERENCES:

• Data is a raw and unorganized fact that must be processed to make it meaningful,
whereas Information is a set of data that is processed in a meaningful way according to
the given requirement.

• Data does not have any specific purpose, whereas Information carries a meaning that
has been assigned by interpreting data.

• Data alone has no significance, while Information is significant by itself.

• Data never depends on Information, while Information is dependent on Data.

• Data is measured in bits and bytes, whereas Information is measured in
meaningful units like time, quantity, etc.

• Data can be structured as tabular data, a graph, or a data tree, whereas Information is
language, ideas, and thoughts based on the given data.

Data Vs. Information

Parameters | Data | Information
Description | Qualitative or quantitative variables which help to develop ideas or conclusions. | It is a group of data which carries news and meaning.
Etymology | Data comes from a Latin word, datum, which means "to give something." Over time "data" has become the plural of datum. | The word information has Old French and Middle English origins. It has referred to the "act of informing." It is mostly used for education or other known communication.
Format | Data is in the form of numbers, letters, or a set of characters. | Ideas and inferences.
Represented in | It can be structured, tabular data, graph, data tree, etc. | Language, ideas, and thoughts based on the given data.
Meaning | Data does not have any specific purpose. | It carries meaning that has been assigned by interpreting data.
Interrelation | Information that is collected. | Information that is processed.
Feature | Data is a single unit and is raw. It alone doesn't have any meaning. | Information is the product and group of data which jointly carry a logical meaning.
Dependence | It never depends on Information. | It depends on Data.
Measuring unit | Measured in bits and bytes. | Measured in meaningful units like time, quantity, etc.
Support for decision making | It can't be used for decision making. | It is widely used for decision making.
Contains | Unprocessed raw factors. | Processed in a meaningful way.
Knowledge level | It is low-level knowledge. | It is the second level of knowledge.
Characteristic | Data is the property of an organization and is not available for sale to the public. | Information is available for sale to the public.
Dependency | Data depends upon the sources for collecting data. | Information depends upon data.
Example | Ticket sales on a band on tour. | Sales report by region and venue. It gives information on which venue is profitable for that business.
Significance | Data alone has no significance. | Information is significant by itself.
Meaning | Data is based on records and observations, which are stored in computers or remembered by a person. | Information is considered more reliable than data. It helps the researcher to conduct a proper analysis.
Usefulness | The data collected by the researcher may or may not be useful. | Information is useful and valuable as it is readily available to the researcher for use.
Dependency | Data is never designed to the specific need of the user. | Information is always specific to the requirements and expectations because all the irrelevant facts and figures are removed during the transformation process.
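
The "Example" row of the comparison (ticket sales as data, a sales report by region and venue as information) can be sketched in Python. All figures, venue names, and the `venue_costs` table are invented assumptions used only to show the transformation.

```python
from collections import defaultdict

# Data: hypothetical raw ticket sales as (region, venue, revenue) records.
sales = [
    ("North", "Arena A", 30000),
    ("North", "Arena A", 25000),
    ("North", "Hall B", 8000),
    ("South", "Club C", 12000),
]
# Assumed cost of playing each venue (illustrative numbers).
venue_costs = {"Arena A": 40000, "Hall B": 10000, "Club C": 9000}


def sales_report(records, costs):
    """Information: total revenue and profit per (region, venue)."""
    revenue = defaultdict(int)
    for region, venue, amount in records:
        revenue[(region, venue)] += amount
    return {
        key: {"revenue": total, "profit": total - costs[key[1]]}
        for key, total in revenue.items()
    }


report = sales_report(sales, venue_costs)
profitable = sorted(venue for (_, venue), row in report.items() if row["profit"] > 0)
print(profitable)  # ['Arena A', 'Club C']
```

The individual sales records support no decision by themselves, while the aggregated report answers the question of which venues are profitable.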

DATA MODELS
What Is a Data Model?

A data model is a visual representation of data elements and the relationships between them.
Data models help business and technical resources collaborate in the design of information
systems and the databases that power them. They show what data is required and how it needs
to be structured to support various business processes.

A data model is an abstract model that organizes elements of data and standardizes how they
relate to one another and to the properties of real-world entities. For instance, a data model may
specify that the data element representing a car be composed of a number of other elements
which, in turn, represent the color and size of the car and define its owner.

Why Is Data Modelling Important?

Data modelling is a critical component of metadata management, data governance and data
intelligence. It provides an integrated view of conceptual, logical and physical data models to
help business and IT stakeholders understand data structures and their meaning.

Why use a Data Model?

The primary goals of using a data model are:

1) To ensure that all data objects required by the database are accurately represented.
Omission of data will lead to the creation of faulty reports and produce incorrect results.

2) A data model helps design the database at the conceptual, logical and physical levels.

3) The data model structure helps to define the relational tables, primary and foreign keys and
stored procedures.

4) It provides a clear picture of the base data and can be used by database developers to
create a physical database.

5) It is also helpful to identify missing and redundant data.

6) Though the initial creation of a data model is labour-intensive and time-consuming, in the
long run it makes IT infrastructure upgrades and maintenance cheaper and faster.

TYPES OF DATA MODEL

There are mainly three different types of data models: conceptual data models, logical data
models, and physical data models, and each one has a specific purpose. The data models are
used to represent the data and how it is stored in the database and to set the relationship between
data items.

A conceptual data model is a rough draft, containing the relevant concepts or entities and the
relationships between them.

A logical data model, also referred to as information modelling, is the second stage of data
modelling. It is a graphical representation of the information requirements for a given business
area.

A physical data model provides the database-specific context, elaborating on the conceptual
and logical models produced prior. Accordingly, physical data models are often treated as the
blueprint for a proposed database.

Conceptual Data Model

A Conceptual Data Model is an organized view of database concepts and their relationships.
The purpose of creating a conceptual data model is to establish entities, their attributes, and
relationships. In this data modelling level, there is hardly any detail available on the actual
database structure. Business stakeholders and data architects typically create a conceptual data
model.

The 3 basic tenets of the Conceptual Data Model are:

• Entity: A real-world thing

• Attribute: Characteristics or properties of an entity

• Relationship: Dependency or association between two entities

Data model example:

• Customer and Product are two entities. Customer number and name are attributes of the
Customer entity

• Product name and price are attributes of product entity

• Sale is the relationship between the customer and product
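
The example above can be written down as a short Python sketch. The class and field names mirror the entities and attributes listed. The sample values ("Alice", "Pen") are invented, and a conceptual model would normally be drawn as a diagram rather than coded, so treat this only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:      # entity
    number: int      # attribute
    name: str        # attribute


@dataclass(frozen=True)
class Product:       # entity
    name: str        # attribute
    price: float     # attribute


@dataclass(frozen=True)
class Sale:          # relationship between a customer and a product
    customer: Customer
    product: Product


alice = Customer(number=1, name="Alice")
pen = Product(name="Pen", price=1.5)
sale = Sale(customer=alice, product=pen)
print(sale.customer.name, "bought", sale.product.name)  # Alice bought Pen
```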


Characteristics of a conceptual data model

i. Offers organisation-wide coverage of the business concepts.

ii. This type of data model is designed and developed for a business audience.

Conceptual data models, also known as domain models, create a common vocabulary for all
stakeholders by establishing basic concepts and scope.

Logical Data Model

The Logical Data Model is used to define the structure of data elements and to set
relationships between them. The logical data model adds further information to the conceptual
data model elements. The advantage of using a logical data model is that it provides a
foundation for the physical model. However, the modelling structure remains generic.

Characteristics of a Logical data model

i. Describes data needs for a single project but could integrate with other logical data
models based on the scope of the project.

ii. Designed and developed independently from the DBMS.

iii. Data attributes will have datatypes with exact precisions and length.

Physical Data Model

A Physical Data Model describes a database-specific implementation of the data model. It
offers database abstraction and helps generate the schema. This is because of the richness of
meta-data offered by a Physical Data Model.

Characteristics of a physical data model:

i. The physical data model describes the data needs of a single project or application, though
it may be integrated with other physical data models based on project scope.

ii. The data model contains relationships between tables that address the cardinality and
nullability of the relationships.
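
A physical model for the Customer, Product and Sale example used earlier might look like the sketch below. It uses Python's built-in `sqlite3` module. The table and column names are our own assumptions, and another DBMS would use its own data types and syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_number INTEGER PRIMARY KEY,
    name            TEXT NOT NULL            -- nullability: a name is required
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    price      REAL NOT NULL
);
CREATE TABLE sale (
    sale_id         INTEGER PRIMARY KEY,
    -- cardinality: many sales may refer to one customer and one product
    customer_number INTEGER NOT NULL REFERENCES customer(customer_number),
    product_id      INTEGER NOT NULL REFERENCES product(product_id)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.execute("INSERT INTO product VALUES (10, 'Pen', 1.5)")
conn.execute("INSERT INTO sale VALUES (100, 1, 10)")

try:
    # Customer 99 does not exist, so the foreign key should reject this row.
    conn.execute("INSERT INTO sale VALUES (101, 99, 10)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

sale_count = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
print(fk_rejected, sale_count)  # True 1
```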

METHODS OF DATA MODELLING:

A database model is a specification describing how a database is structured and used.

Flat model

This may not strictly qualify as a data model. The flat (or table) model consists of a single, two-
dimensional array of data elements, where all members of a given column are assumed to be
similar values, and all members of a row are assumed to be related to one another.

Hierarchical model

In this type of data model, the data is organized into a tree-like structure that has a single root,
to which all the data is linked. The hierarchy begins at the root and expands like a tree: the root
has child nodes, which in turn have child nodes of their own. Each child node has exactly one
parent node, but one parent can have multiple child nodes. Because the data is stored as a tree,
retrieving it means traversing the tree from the root node. The hierarchical data model therefore
represents one-to-many relationships between various types of data. The data is stored in the
form of records, which are connected through links.
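
A minimal Python sketch of this idea follows. The `Node` class, the `find_path` helper, and the sample college hierarchy are all hypothetical. The point is only that every child has one parent and that a lookup walks down from the root.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        """Attach a child; each child belongs to exactly one parent."""
        self.children.append(child)
        return child


def find_path(root, target):
    """Depth-first search from the root; returns the names on the path, or None."""
    if root.name == target:
        return [root.name]
    for child in root.children:
        sub_path = find_path(child, target)
        if sub_path is not None:
            return [root.name] + sub_path
    return None


college = Node("College")
department = college.add(Node("AI and Data Science"))
department.add(Node("Foundations of Data Science"))
college.add(Node("Mechanical Engineering"))

print(find_path(college, "Foundations of Data Science"))
# ['College', 'AI and Data Science', 'Foundations of Data Science']
```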

Network model

This model organizes data using two fundamental constructs, called records and sets. Records
contain fields, and sets define one-to-many relationships between records: one owner, many
members. The network data model is an abstraction of the design concept used in the
implementation of databases. It is a database model designed as a flexible way of representing
objects and the relationships that exist among them. Its schema can be represented as a graph
in which the nodes represent objects and the edges represent relationships.

Relational model

In this data model, tables are used to group related data elements into relations. Both the data
and the relationships among the data are represented using interrelated tables. Each table has
multiple rows and columns, in which each column represents an attribute of the entity and each
row represents a record.

Object-relational model

Similar to a relational database model, but objects, classes and inheritance are directly
supported in database schemas and in the query language.

Object-role modelling

A method of data modelling that has been defined as "attribute free", and "fact-based". The
result is a verifiably correct system, from which other common artifacts, such as ERD, UML,
and semantic models may be derived. Associations between data objects are described during
the database design procedure, such that normalization is an inevitable result of the process.

Star schema

The simplest style of data warehouse schema. The star schema consists of a few "fact tables"
(possibly only one, justifying the name) referencing any number of "dimension tables". The
star schema is considered an important special case of the snowflake schema.
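
A very small star schema can be sketched with Python's built-in `sqlite3` module: one fact table that references two dimension tables. The table names (`fact_sales`, `dim_venue`, `dim_date`) and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/where/when" of each fact.
CREATE TABLE dim_venue (venue_id INTEGER PRIMARY KEY, venue TEXT, region TEXT);
CREATE TABLE dim_date  (date_id  INTEGER PRIMARY KEY, month TEXT);

-- The fact table holds the measurements and points at the dimensions.
CREATE TABLE fact_sales (
    venue_id INTEGER REFERENCES dim_venue(venue_id),
    date_id  INTEGER REFERENCES dim_date(date_id),
    tickets  INTEGER,
    revenue  INTEGER
);

INSERT INTO dim_venue VALUES (1, 'Arena A', 'North'), (2, 'Club C', 'South');
INSERT INTO dim_date  VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO fact_sales VALUES (1, 1, 600, 30000), (1, 2, 500, 25000), (2, 1, 300, 12000);
""")

rows = conn.execute("""
    SELECT v.region, d.month, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_venue AS v ON v.venue_id = f.venue_id
    JOIN dim_date  AS d ON d.date_id  = f.date_id
    GROUP BY v.region, d.month
    ORDER BY v.region, d.month
""").fetchall()
print(rows)
# [('North', '2024-01', 30000), ('North', '2024-02', 25000), ('South', '2024-01', 12000)]
```

Queries in this style join the fact table to whichever dimensions are needed and then aggregate the measures.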

DATA STRUCTURE DIAGRAM

A data structure diagram (DSD) is a diagram and data model used to describe conceptual data
models by providing graphical notations which document entities and their relationships, and
the constraints that bind them. The basic graphic elements of DSDs are boxes, representing
entities, and arrows, representing relationships. Data structure diagrams are most useful for
documenting complex data entities. Data structure diagrams are an extension of the entity-
relationship model (ER model).

What is Data Modelling?

Data modelling is the process of producing a diagram (e.g., an ERD) of the relationships
between the various types of information that are to be stored in a database. It helps us to
think systematically about the key data points to be stored and retrieved, and how they should
be grouped and related.

A data model describes information in a systematic way that allows it to be stored and retrieved
efficiently in a Relational Database System. It can be thought of as a way of translating the
logic of accurately describing things in the real world, and the relationships between them, into
rules that can be followed and enforced by computer code. One of the goals of data modelling
is to create the most efficient method of storing information while still providing for complete
access and reporting.

Entity Relationship Diagram for Data Modelling

Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships
among them. While formulating real-world scenario into the database model, the ER Model
creates entity set, relationship set, general attributes and constraints.

ER Model is best used for the conceptual design of a database.

ER Model is based on −

• Entities and their attributes.

• Relationships among entities.

ER modelling is an important technique for any database designer to master and forms the basis
of the methodology.

Entity type: It is a group of objects with the same properties that are identified by the enterprise
as having an independent existence. The basic concept of the ER model is the entity type that
is used to represent a group of ‘objects’ in the ‘real world’ with the same properties. An entity
type has an independent existence within a database.

Entity − An entity in an ER Model is a real-world entity having properties called attributes.
Every attribute is defined by its set of values called domain. For example, in a school database,
a student is considered as an entity. Student has various attributes like name, age, class, etc.

Attributes are the properties of entities that are represented using ellipse-shaped figures. Every
elliptical figure represents one attribute and is directly connected to its entity (which is
represented as a rectangle).

A relationship type is a set of associations between one or more participating entity types.
Each relationship type is given a name that describes its function. There are four types of
relationships. These are:

• One-to-one: When each instance of one entity is associated with at most one instance of
the other entity, and vice versa, the relationship is termed ‘1:1’.

• One-to-many: When one instance of the entity on the left can be associated with more
than one instance of the entity on the right, it is termed ‘1:N’.

• Many-to-one: When more than one instance of the entity on the left can be associated
with a single instance of the entity on the right, it is termed ‘N:1’.

• Many-to-many: When more than one instance of the entity on the left and more than one
instance of the entity on the right can be linked by the relationship, it is termed an
‘M:N’ relationship.

Top Six Benefits of Data Modelling

Data modelling is the first step to ensuring mission-critical information is used, understood and
trusted across the enterprise. It has many benefits. Following are the top six benefits of data
modelling organizations can realize:

1. Improve discovery, standardization and documentation of data sources.

2. Successfully design and implement databases.

3. Support regulatory compliance now and into the future by governing data modelling
teams, processes, portfolios and lifecycles.

4. Empower employees by enabling self-service data access and foster collaboration by
improving inter-departmental/IT and business alignment.

5. Improve business intelligence and make it easier to identify new opportunities by
expanding data capability, literacy and accountability across the enterprise.

6. Encourage more cohesive integrations of existing information systems as new systems
are implemented with a greater perspective of the organization’s current state.

The four criteria of a good data model:

(1) Data in a good model can be easily consumed.

(2) Large data changes in a good model are scalable.

(3) A good model provides predictable performance.

(4) A good model can adapt to changes in requirements, but not at the expense of the first
three criteria.

Advantages and Disadvantages of Data Model:

Advantages of Data model:

• The main goal of designing a data model is to make certain that the data objects offered by
the functional team are represented accurately.

• The data model should be detailed enough to be used for building the physical database.

• The information in the data model can be used for defining the relationships between
tables, primary and foreign keys, and stored procedures.

• A data model helps businesses to communicate within and across organizations.

• A data model helps to document data mappings in the ETL process.

• It helps to recognize the correct sources of data to populate the model.

Disadvantages of Data model:

• To develop a data model, one should know the characteristics of the physically stored data.

• It is a navigational system, which makes application development and management
complex. Thus, it requires a knowledge of the biographical truth.

• Even a small change made to the structure requires modification of the entire application.

DATA TYPES
A data type is a data storage format that can hold a distinct type or range of values. When
computer programs store data in variables, each variable must be designated a distinct data
type. Some common data types are integers, characters, strings, floating point numbers and
arrays. More specific data types include varchar (variable character) formats, Boolean values,
dates and timestamps.

Database applications also use data types. The fields within a database often require a
specific type of data to be entered. For example, a school record for a student may use a string
data type for the student’s first and last name. The student’s date of birth would be stored in a
date format and the student’s GPA can be stored as a decimal. Similarly, a company's record for
an employee may use a string data type for the employee's first and last name, a date format for
the date of hire, and an integer for the salary. By keeping the data types consistent across
multiple records, database applications can easily perform calculations, comparisons, searching
and sorting on fields in different records.
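
The student-record example can be sketched in Python as follows. The class `StudentRecord`, its field names, and the two sample students are assumptions made for illustration. The point is that consistent field types make sorting and comparison straightforward.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class StudentRecord:
    first_name: str       # string
    last_name: str        # string
    date_of_birth: date   # date
    gpa: Decimal          # decimal


records = [
    StudentRecord("Alice", "Kumar", date(2004, 5, 17), Decimal("8.70")),
    StudentRecord("Bob", "Rao", date(2003, 11, 2), Decimal("9.10")),
]

# Because every record uses the same types, the fields compare cleanly.
by_gpa = sorted(records, key=lambda r: r.gpa, reverse=True)
oldest = min(records, key=lambda r: r.date_of_birth)

print([r.first_name for r in by_gpa])  # ['Bob', 'Alice']
print(oldest.first_name)               # Bob
```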

Data Type | Used for | Example
String | Alphanumeric characters | hello world, Alice, Bob123
Integer | Whole numbers | 7, 12, 999
Float (floating point) | Number with a decimal point | 3.15, 9.06, 00.13
Character | Encoding text numerically | 97 (in ASCII, 97 is a lower case 'a')
Boolean | Representing logical values | TRUE, FALSE

Common Database Data Types

• Integer – is a whole number that can have a positive, negative or zero value. It cannot
be a fraction, nor can it have decimal places. It is commonly used in programming,
especially for values that are incremented. Addition, subtraction and multiplication of two
integers result in an integer, but division of two integers may result in an integer or a
decimal. The resulting decimal can be rounded off or truncated to produce an integer.

• Character – refers to any number, letter, space or symbol that can be entered in a
computer. In single-byte encodings such as ASCII, each character occupies one byte of space.

• String – is used to represent text. It is composed of a set of characters that can include
spaces and numbers. Strings are enclosed in quotation marks to identify the data as a
string and not a variable name or a number.

• Floating Point Number – is a number that contains decimals. Numbers that contain
fractions are also considered floating point numbers.

• Array – contains a group of elements, which can be of the same data type, such as integer
or string. It is used to organise data for easier sorting and searching of related sets of
values.

• Varchar – as the name implies, is a variable character type: the storage it occupies varies
with the length of the value. Each character occupies one byte of space, plus 2 bytes for
length information.
Note: Use Character for data entries with a fixed length, like a phone number. Use
Varchar for data entries with a variable length, like an address.

• Boolean – is used for true or false values. To combine Boolean values, the following
operators are used: AND, OR, XOR, and NOT.

• Date, Time and Timestamp – these data types are used to work with data containing
dates and times.
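
The integer rules in the list above can be checked with a short Python sketch. Note that this shows Python's behaviour specifically. Other languages and database systems handle integer division differently, and the sample values 7 and 2 are arbitrary.

```python
a, b = 7, 2

# Addition, subtraction and multiplication of integers stay integers.
total, difference, product = a + b, a - b, a * b
print(total, difference, product)  # 9 5 14

# In Python, "/" always produces a float, even when the result is whole.
quotient = a / b
print(quotient, 6 / 3)             # 3.5 2.0

# Producing an integer again: truncate, floor, or round.
truncated = int(quotient)          # drops the fractional part
floored = a // b                   # rounds towards negative infinity
rounded = round(quotient)          # Python rounds halves to the nearest even number
print(truncated, floored, rounded) # 3 3 4

# Truncation and floor differ for negative results.
print(int(-7 / 2), -7 // 2)        # -3 -4
```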

Boolean Operator    Result    Condition

x AND y             True      If both x and y are True

x AND y             False     If either x or y is False

x OR y              True      If either x or y, or both x and y, are True

x OR y              False     If both x and y are False

x XOR y             True      If exactly one of x and y is True

x XOR y             False     If x and y are both True or both False

NOT x               True      If x is False

NOT x               False     If x is True
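
The table above can be reproduced programmatically. The sketch below is in Python and assumes that XOR on two Boolean values can be written as `x != y`; the function name `truth_table` is hypothetical.

```python
from itertools import product


def truth_table():
    """Return one row (x, y, AND, OR, XOR, NOT x) per combination of x and y."""
    rows = []
    for x, y in product([True, False], repeat=2):
        rows.append((x, y, x and y, x or y, x != y, not x))
    return rows


for row in truth_table():
    print(row)
```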

What is the purpose of data types?

A data type constrains the values that an expression, such as a variable or a function, might
take. It defines the operations that can be done on the data, the meaning of the data,
and the way values of that type can be stored.
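
As a small illustration of how the type decides which operations are allowed and what they mean, consider the Python sketch below. The variable names are only for illustration.

```python
# The same "+" symbol means different things for different data types.
number_sum = 3 + 4        # integer addition
text_sum = "3" + "4"      # string concatenation

# Mixing a string and an integer is not a defined operation in Python,
# so the language rejects it instead of guessing.
try:
    mixed = "3" + 4
except TypeError:
    mixed = None

print(number_sum, text_sum, mixed)
```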

What are Access data types?

Data types are the building blocks of databases. A field's data type not only influences other
important characteristics of that field, such as field size, but also how the field is used
throughout the database, such as in objects, calculations, expressions, and so forth. Using the
right data type is a key to success.

FILE SYSTEM VERSUS DATABASE SYSTEM.


What is a File system?

  A file system is a technique for arranging files on a storage medium such as a hard disk,
 pen drive, DVD, etc. It helps you organize the data and allows easy retrieval of files
 when they are required. It mostly consists of different types of files, such as mp3, mp4, txt,
 doc, etc., that are grouped into directories.

  A file system controls how data is read from and written to the storage
 medium. It is installed on the computer together with an operating system such as
 Windows or Linux.
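
To make this concrete, the following Python sketch creates a directory, writes a file into it and reads it back through the file system. The directory name `notes` and file name `todo.txt` are hypothetical; a temporary directory is used so that nothing is left behind.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Files are grouped into directories.
    notes_dir = Path(root) / "notes"
    notes_dir.mkdir()

    # Writing and reading go through the operating system's file system.
    todo = notes_dir / "todo.txt"
    todo.write_text("buy milk\n", encoding="utf-8")
    content = todo.read_text(encoding="utf-8")

    # List the files stored in the directory.
    listing = sorted(p.name for p in notes_dir.iterdir())

print(content, listing)
```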

What is DBMS?

A Database Management System (DBMS) is software for storing and retrieving users' data
while applying appropriate security measures. It consists of a group of programs that
manipulate the database. The DBMS accepts a request for data from an application and
instructs the DBMS engine to provide the specific data. In large systems, a DBMS helps users
and other third-party software to store and retrieve data.
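
The request/response pattern described above can be sketched with SQLite, a small DBMS that ships with Python. This is only an illustration: the table `student` and its rows are made up, and an in-memory database is used.

```python
import sqlite3

# The application opens a connection to the DBMS (in-memory database here).
conn = sqlite3.connect(":memory:")

# Store data: the application sends SQL requests, the DBMS does the work.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO student (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi")],
)

# Retrieve data: the application asks for specific rows.
rows = conn.execute("SELECT name FROM student WHERE id = ?", (2,)).fetchall()
print(rows)

conn.close()
```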

KEY DIFFERENCES:

1. A file system is software that manages and organizes the files in a storage medium,
whereas a DBMS is a software application that is used for accessing, creating, and
managing databases.

2. The file system doesn't have a crash recovery mechanism; the DBMS, on the other hand,
provides one.

3. Data inconsistency is higher in the file system, whereas it is
low in a database management system.

4. A file system does not provide support for complicated transactions, while in a DBMS
it is easy to implement complicated transactions using SQL.

5. A file system does not offer concurrency control, whereas a DBMS provides a concurrency
facility.
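
Difference 4 can be illustrated with a money transfer that must either happen completely or not at all. The sketch below uses Python's built-in `sqlite3` module; the `account` table, the `transfer` function and the amounts are hypothetical. It relies on the connection being used as a context manager, which commits on success and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commit if the block succeeds, roll back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit to dst is undone as well.
        return False


overdraft_ok = transfer(conn, "A", "B", 500)
after_overdraft = dict(conn.execute("SELECT name, balance FROM account"))

normal_ok = transfer(conn, "A", "B", 30)
after_normal = dict(conn.execute("SELECT name, balance FROM account"))
print(overdraft_ok, after_overdraft, normal_ok, after_normal)
```

With plain files, the program itself would have to detect the half-finished update and undo it.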

Features of a File system

Here are important elements of the file system:

 It helps you to store data in a group of files.

  The data in the files are dependent on each other.

 C/C++ and COBOL languages were used to design the files.

 Shared File System Support

 Fast File System Recovery.

Features of DBMS

Here are essential features of DBMS:

 A user-accessible catalog of data


 Transaction support

 Concurrency control with Recovery services

 Authorization services

 The value of data is the same at all places.

 Offers support for data communication

 Independent utility services

 Allows multiple users to share a file at the same time
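
The first feature, a user-accessible catalog, means that the DBMS stores a description of its own tables and lets users query it. As a hedged illustration, SQLite exposes its catalog through the `sqlite_master` table and the `PRAGMA table_info` statement; other DBMS products use different catalog names. The tables `book` and `member` below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT)")

# Ask the catalog which tables exist.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Ask the catalog for the column names and data types of one table.
# Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(book)")]

print(tables, columns)
conn.close()
```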


Advantages of File system

Here are pros/benefits of file system:

 Enforcement of development and maintenance standards.

 Helps you to reduce redundancy

  Helps avoid inconsistency during file maintenance, which supports data integrity.

 Firm theoretical foundation (for the relational model).

  It is more efficient and costs less than a DBMS in certain situations.

  The design of file processing is simpler than designing a database.

Advantages of DBMS system

Here are the pros/benefits of a DBMS system:

 DBMS offers a variety of techniques to store & retrieve data

 Uniform administration procedures for data

  Application programmers are never exposed to details of data representation and storage.

 A DBMS uses various powerful functions to store and retrieve data efficiently.

 Offers Data Integrity and Security

  The DBMS applies integrity constraints to get a high level of protection against
 prohibited access to data.

 Reduced Application Development Time

  Consumes less space

 Reduction of redundancy.

 Data independence.
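
The data integrity benefit listed above can be demonstrated with constraints declared on a table. The following is a minimal sketch using Python's built-in `sqlite3` module; the `employee` table, the `try_insert` helper and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE,"
    " salary INTEGER CHECK (salary > 0))"
)
conn.execute(
    "INSERT INTO employee (email, salary) VALUES (?, ?)",
    ("a@example.com", 1000),
)


def try_insert(email, salary):
    """Try to insert one row; report whether the DBMS accepted it."""
    try:
        conn.execute(
            "INSERT INTO employee (email, salary) VALUES (?, ?)",
            (email, salary),
        )
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert("a@example.com", 2000),  # duplicate email (UNIQUE)
    try_insert(None, 1500),             # missing email (NOT NULL)
    try_insert("b@example.com", -5),    # invalid salary (CHECK)
    try_insert("b@example.com", 1200),  # valid row
]
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(results, row_count)
```

The rules are stated once in the table definition, so every application that uses the database is checked in the same way.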

Application of File system

 Language-specific run-time libraries

 API programs using it to make requests of the file system

 It is used for data transfer and positioning.

 Helps you to update the metadata

  Managing directories.

Application of the DBMS system

Here are important applications of the DBMS system:

  Admission System, Examination System, Library System

  Payroll & Personnel Management System

  Accounting System, Hotel Reservation System, Airline Reservation System

  It is used in the banking system for customer information, account activities,
 payments, deposits, loans, etc.

  Used by airlines for reservations and schedules

  DBMS systems are also used by telecommunications companies to keep call records,
 monthly bills, account balances, etc.

  Used in finance for storing information about holdings, sales, and purchases of financial
 instruments like stocks and bonds.

Disadvantages of File system

Here are the cons/drawbacks of the file system:

  Each application has its own data file, so the same data may have to be recorded and stored
 many times.

  Programs in a file processing system are data-dependent, which leads to problems with
 incompatible file formats.

 Limited data sharing.

 The problem with security.

 Time-consuming.

  It is difficult to maintain the records of a big firm having a large number of items.

  Requires a lot of manual work.
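
The first drawback, the same data stored many times, is easy to reproduce. The Python sketch below assumes two hypothetical applications (billing and shipping) that each keep their own CSV file with a customer's address; the file names and data are made up.

```python
import csv
import tempfile
from pathlib import Path


def address_in(path):
    """Read the address stored in the first data row of a CSV file."""
    with path.open(newline="", encoding="utf-8") as f:
        return next(csv.DictReader(f))["address"]


with tempfile.TemporaryDirectory() as root:
    billing = Path(root) / "billing.csv"
    shipping = Path(root) / "shipping.csv"

    # Each application records the same customer data in its own file.
    for path in (billing, shipping):
        path.write_text("customer,address\nAsha,12 Lake Road\n", encoding="utf-8")

    # The customer moves, but only the billing application updates its copy.
    billing.write_text("customer,address\nAsha,7 Hill Street\n", encoding="utf-8")

    billing_address = address_in(billing)
    shipping_address = address_in(shipping)

# The two copies now disagree: the data has become inconsistent.
inconsistent = billing_address != shipping_address
print(billing_address, shipping_address, inconsistent)
```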

Disadvantages of the DBMS system

Here are some cons/drawbacks of the DBMS system:

  The cost of hardware and software for a DBMS is quite high, which increases the budget
 of your organization.

  Database management systems are often complex, so users need training
 to use the DBMS.

  Use of the same program by many users at the same time sometimes leads to the loss of
 some data.

 DBMS can't perform sophisticated calculations

  Data sets begin to grow large as it provides a more predictable query response time.

  It requires a processor with a high speed of data processing.

  The database can fail because of a power failure or because the whole system stops.

  The cost of a DBMS depends on the environment, the functions required, and the recurrent
 annual maintenance cost.
