Understanding Data Science and Roles

The document provides an overview of data science, including definitions of key roles such as data scientist, data analyst, data engineer, and data architect. It outlines various facets of data, including structured and unstructured data, and discusses the importance of data mining and data warehousing in extracting insights from large datasets. Additionally, it highlights the benefits and applications of data science across different sectors, emphasizing improved decision-making, efficiency, and innovation.

Uploaded by

prithivipt

1. What is Data Science?

• Data science is the domain of study that deals with vast volumes of data using modern tools
and techniques to find unseen patterns, derive meaningful information, and make business
decisions.
• Data science uses complex machine learning algorithms to build predictive models.
• The data used for analysis can come from many different sources and be presented in various
formats.
2. What is a Data Scientist?
A data scientist is someone who uses their skills to mine data, understand it, and extract
insights from it. Data scientists usually work with a team of engineers and analysts to create
models that can be used for various purposes.
3. Define Data Analyst
A data analyst gathers information from various sources such as offline or online
databases, spreadsheets, and surveys. Analysts use tools such as Excel, PowerPoint, and
Tableau, but rely mostly on statistical techniques to present their findings in a readable
format.
4. Define Data Engineer
A data engineer builds applications that collect and process data using technologies such as
Hadoop and Spark, while ensuring data quality so that other teams, such as analysts or
scientists, can use the data later without issues.
5. Define Data Architect
A data architect designs databases according to specific requirements so that they
function efficiently within an organization’s infrastructure.
6. List the different facets of data
• In data science and big data you’ll come across many different types of data, and each of
them tends to require different tools and techniques. The main categories of data are these:
■ Structured
■ Unstructured
■ Natural language
■ Machine-generated
■ Graph-based
■ Audio, video, and images
■ Streaming
7. Define Data Mining.
Data mining is the extraction of interesting patterns or knowledge from huge amounts of data.
It extracts previously unknown information from large databases and uses it to make
organizational decisions.
It is concerned with the discovery of hidden knowledge.
It is useful in making critical organizational decisions, particularly those of a strategic nature.
Examples of data sources: relational databases, data warehouses, transactional databases,
advanced databases, spatial and temporal databases, time-series data, stream data, and
multimedia data.
4. Draw KDD process
8. List the properties of Data Warehouse.
A data warehouse is subject-oriented, integrated, non-volatile, and time-variant. These
characteristics define how data is organized and used within the warehouse for analysis and
decision-making.

1. Explain the Benefits and uses of Data Science.

Data science offers numerous benefits and applications across various sectors. It enables better
decision-making, improved customer experiences, increased efficiency, and new opportunities
for innovation. By analyzing data, businesses can identify trends, personalize services,
automate tasks, and ultimately drive growth.
Benefits:

• Improved Decision-Making:
Data science provides insights and evidence-based information to support better
business decisions, moving away from guesswork and towards informed strategies.
• Enhanced Customer Experience:
By understanding customer behavior and preferences, businesses can personalize services,
tailor marketing efforts, and improve overall customer satisfaction.
• Increased Efficiency:
Data science can automate repetitive tasks, optimize processes, and streamline operations,
leading to increased efficiency and reduced costs.
• Innovation and Growth:
Analyzing data can uncover new opportunities, facilitate product development, and drive
innovation across various industries.
• Risk Management and Fraud Detection:
Data science can identify patterns and anomalies in data to detect fraud, manage risks, and
prevent potential losses.

Uses:

• Healthcare:
Data science helps in disease prediction, personalized treatment plans, and optimizing
hospital operations.
• Finance:
It is used for fraud detection, risk management, and providing personalized financial advice.
• E-commerce:
Data science powers recommendation systems, optimizes supply chains, and personalizes
online shopping experiences.
• Transportation:
Data science helps optimize routes, manage traffic, and improve predictive maintenance for
vehicles.
• Marketing:
Data science enables targeted advertising, customer segmentation, and personalized marketing
campaigns.
• Education:
It helps in designing personalized learning experiences, tracking student performance, and
improving administrative efficiency.
• Manufacturing:
Data science optimizes production processes, predicts equipment failures, and improves overall
efficiency.

In essence, data science transforms raw data into actionable insights, enabling organizations to
make better decisions, improve efficiency, and drive innovation across various sectors.

2. Explain about Facets of data

A very large amount of data is generated in big data and data science. This data is of various
types, and the main categories are as follows:

a) Structured

b) Natural language

c) Graph-based

d) Streaming

e) Unstructured

f) Machine-generated

g) Audio, video and images

Structured Data

• Structured data is arranged in a row-and-column format, which makes it easy for applications
to retrieve and process. A database management system is used to store structured data.
• The term structured data refers to data that is identifiable because it is organized in a structure.
The most common form of structured data is a database in which specific information
is stored in columns and rows.

• Structured data is also searchable by data type within content. It is understood
by computers and is also efficiently organized for human readers.

• An Excel table is an example of structured data.
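
As a small illustration of the row-and-column organization described above, the following Python sketch reads a tiny table with the standard csv module. The column names and values are invented for the example; the point is only that a known structure makes retrieval a simple filter on a column.

```python
import csv
import io

# Hypothetical table: the column names and rows are made up for illustration.
raw = """student_id,name,marks
1,Asha,82
2,Ravi,74
3,Meena,91
"""

# Each row becomes a dict keyed by column name, so fields are looked up by name.
rows = list(csv.DictReader(io.StringIO(raw)))

# Because the structure is known in advance, retrieval is a filter on one column.
high_scorers = [r["name"] for r in rows if int(r["marks"]) > 80]
print(high_scorers)  # ['Asha', 'Meena']
```

The same question asked of unstructured text (for example, marks mentioned inside free-form emails) would first require extracting the fields, which is the difficulty the next section describes.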

Unstructured Data

• Unstructured data is data that does not follow a specified format. Rows and columns are not
used for unstructured data, so it is difficult to retrieve the required information.
Unstructured data has no identifiable structure.

• Unstructured data can take the form of text (documents, email messages, customer
feedback), audio, video, or images. Email is an example of unstructured data.

• Even today, in most organizations more than 80% of the data is in unstructured form.
This data carries a lot of information, but extracting information from these varied sources
is a major challenge.

• Characteristics of unstructured data:

1. There is no structural restriction or binding for the data.

2. Data can be of any type.

3. Unstructured data does not follow any structural rules.

4. There are no predefined formats, restrictions, or sequences for unstructured data.

5. Since there is no structural binding for unstructured data, it is unpredictable in nature.

Natural Language

• Natural language is a special type of unstructured data.

• Natural language processing enables machines to recognize characters, words and sentences,
then apply meaning and understanding to that information. This helps machines to understand
language as humans do.

• Natural language processing is the driving force behind machine intelligence in many modern
real-world applications. The natural language processing community has had success in entity
recognition, topic recognition, summarization, text completion and sentiment analysis.

• For natural language processing to help machines understand human language, it typically
combines stages such as speech recognition, natural language understanding, and machine
translation. It is an iterative process composed of several layers of text analysis.

Machine - Generated Data

• Machine-generated data is information that is created without human interaction as a result
of a computer process or application activity. This means that data entered manually by an
end user is not considered machine-generated.

• Machine data contains a definitive record of all activity and behavior of our customers, users,
transactions, applications, servers, networks, factory machinery and so on.

• It includes configuration data, data from APIs and message queues, change events, the output
of diagnostic commands, call detail records, sensor data from remote equipment, and more.

• Examples of machine data are web server logs, call detail records, network event logs and
telemetry.

• Both Machine-to-Machine (M2M) and Human-to-Machine (H2M) interactions generate
machine data. Machine data is generated continuously by every processor-based system, as
well as many consumer-oriented systems.

• It can be either structured or unstructured. In recent years, the volume of machine data has
surged. The expansion of mobile devices, virtual servers and desktops, as well as cloud-based
services and RFID technologies, is making IT infrastructures more complex.
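
To make the web server log example concrete, here is a minimal Python sketch that turns raw log lines into structured records. The log lines, the simplified log layout, and the field names are assumptions made for illustration; real servers may use different formats.

```python
import re
from collections import Counter

# Hypothetical log lines in a simplified access-log layout; all values are invented.
log_lines = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:01 +0000] "GET /missing HTTP/1.1" 404 128',
    '10.0.0.1 - - [01/Jan/2024:10:00:02 +0000] "POST /login HTTP/1.1" 200 64',
]

# Named groups give each captured field a column-like name.
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)

# Keep only the lines that match the expected layout.
records = [m.groupdict() for line in log_lines if (m := pattern.match(line))]

# Once the lines are structured, simple aggregation becomes possible.
status_counts = Counter(r["status"] for r in records)
print(status_counts["200"], status_counts["404"])  # 2 1
```

This shows why machine data can be "either structured or unstructured": the raw lines are text, but a fixed layout lets them be converted into records.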

Graph-based or Network Data

• Graphs are data structures that describe relationships and interactions between entities in
complex systems. In general, a graph contains a collection of entities called nodes and a
collection of interactions between pairs of nodes called edges.

• Nodes represent entities, which can be of any object type that is relevant to our problem
domain. By connecting nodes with edges, we will end up with a graph (network) of nodes.

• A graph database stores nodes and relationships instead of tables or documents. Data is stored
just like we might sketch ideas on a whiteboard. Our data is stored without restricting it to a
predefined model, allowing a very flexible way of thinking about and using it.

• Graph databases are used to store graph-based data and are queried with specialized query
languages such as SPARQL.

• Graph databases are capable of sophisticated fraud prevention. With graph databases, we
can use relationships to process financial and purchase transactions in near-real time. With fast
graph queries, we are able to detect that, for example, a potential purchaser is using the same
email address and credit card as included in a known fraud case.

• Graph databases can also help users easily detect relationship patterns, such as multiple people
associated with one personal email address or multiple people sharing the same IP address but
residing at different physical addresses.

• Graph databases are a good choice for recommendation applications. With graph databases,
we can store in a graph relationships between information categories such as customer interests,
friends and purchase history. We can use a highly available graph database to make product
recommendations to a user based on which products are purchased by others who follow the
same sport and have similar purchase history.

• Graph theory was probably the main method of social network analysis in the early history of
the social network concept. The approach is applied to social network analysis in order to
determine important features of the network such as the nodes and links (for example
influencers and their followers).
• Influencers on a social network have been identified as users who have an impact on the
activities or opinions of other users by way of followership or influence on decisions made by
other users on the network, as shown in Fig. 1.2.1.

• Graph theory has proved to be very effective on large-scale datasets such as social network
data. This is because it is capable of bypassing the building of an actual visual representation
of the data to run directly on data matrices.
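
The node-and-edge idea, and the fraud-style check for people who share an email or IP address, can be sketched in plain Python. This is a minimal sketch using an in-memory adjacency list rather than a graph database; the people, addresses, and helper names (`shared_attributes`, `connected`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical edges: (person, attribute) pairs. All names and values are invented.
edges = [
    ("alice", "email:a@example.com"),
    ("bob", "email:a@example.com"),
    ("bob", "ip:10.0.0.7"),
    ("carol", "ip:10.0.0.7"),
    ("dave", "email:d@example.com"),
]

# Adjacency list: each node maps to the set of nodes it shares an edge with.
graph = defaultdict(set)
for person, attribute in edges:
    graph[person].add(attribute)
    graph[attribute].add(person)

def shared_attributes(graph):
    """Return attribute nodes that are linked to more than one person."""
    return {
        node: sorted(neighbours)
        for node, neighbours in graph.items()
        if ":" in node and len(neighbours) > 1
    }

def connected(graph, start):
    """Collect every node reachable from start with a depth-first traversal."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node] - seen)
    return seen

print(shared_attributes(graph))
ring = {n for n in connected(graph, "alice") if ":" not in n}
print(sorted(ring))  # ['alice', 'bob', 'carol']
```

Following edges links alice to carol even though they share no attribute directly, which is the kind of relationship pattern that is awkward to express with row-and-column lookups.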

Audio, Image and Video

• Audio, image and video are data types that pose specific challenges to a data scientist. Tasks
that are trivial for humans, such as recognizing objects in pictures, turn out to be challenging
for computers.

• The terms audio and video commonly refer to time-based media storage formats for
sound/music and moving-picture information. Digital audio and video recordings, which are
encoded by audio and video codecs, can be uncompressed, losslessly compressed, or lossily
compressed, depending on the desired quality and use case.
• Multimedia data is one of the most important sources of information and knowledge; the
integration, transformation and indexing of multimedia data bring significant challenges in
data management and analysis. Many challenges have to be addressed, including big data, the
multidisciplinary nature of data science, and heterogeneity.

Streaming Data

• Streaming data is data that is generated continuously by thousands of data sources, which
typically send data records simultaneously and in small sizes (on the order of kilobytes).

• Streaming data includes a wide variety of data such as log files generated by customers using
your mobile or web applications, ecommerce purchases, in-game player activity, information
from social networks, financial trading floors or geospatial services and telemetry from
connected devices or instrumentation in data centers.
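
Because streaming records arrive continuously, they are usually processed one at a time rather than loaded as a whole. The following Python sketch shows that idea with a generator and a rolling mean. The source function, the readings, and the window size are assumptions for illustration only.

```python
from collections import deque

def sensor_stream():
    """Hypothetical source: yields small records one at a time (values invented)."""
    for i, reading in enumerate([21.0, 21.5, 22.0, 30.0, 22.5]):
        yield {"seq": i, "temperature": reading}

def rolling_mean(stream, window=3):
    """Yield the mean of the last `window` readings as each record arrives."""
    recent = deque(maxlen=window)  # older readings fall out automatically
    for record in stream:
        recent.append(record["temperature"])
        yield record["seq"], sum(recent) / len(recent)

results = list(rolling_mean(sensor_stream()))
for seq, mean in results:
    print(seq, round(mean, 2))
```

Only the last few readings are kept in memory, so the same loop would work for a source that never ends.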
3. Explain in detail the data modeling phase in the data science process.
1. Understanding the Business Problem and Data Requirements:

• The first step is to clearly define the business problem that the data science project aims
to solve.
• This involves understanding the specific questions the data needs to answer and the
goals the project aims to achieve.
• This stage also involves identifying the data sources and the type of data needed to
address the problem.
2. Conceptual Data Modeling:

• This stage involves creating a high-level, abstract representation of the data, focusing
on the core entities and their relationships.
• It is independent of any specific technology or database.
• For example, in a customer relationship management (CRM) system, entities might
include "Customer," "Order," and "Product," with relationships like "Customer places
Order" and "Order contains Product".
3. Logical Data Modeling:

• This stage refines the conceptual model by adding more detail, including specific data
types, attributes, and constraints.
• It defines how data will be organized within a specific database or data management
system.
• For example, it might specify that the "Customer" entity has attributes like
"CustomerID" (integer), "Name" (string), and "Address" (string).
4. Physical Data Modeling:

• This stage involves translating the logical model into a specific database schema,
including tables, columns, indexes, and relationships.
• It focuses on performance optimization and storage considerations.
• This stage is typically handled by database administrators and developers.
5. Validation and Refinement:

• Once the data model is created, it is crucial to validate it against the business
requirements and data quality standards.
• This involves ensuring that the model accurately represents the data and supports the
intended analysis and decision-making processes.
• The model might be refined based on feedback from stakeholders or during the data
exploration and analysis phase.
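
The Customer, Order, and Product example from the stages above can be carried through to a physical schema. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The Customer attributes come from the text; the `OrderItem` link table, the `Title` column, the index name, and the sample rows are assumptions added for illustration, not part of the original notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Physical model: tables, columns, keys, and one index.
conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    Address    TEXT
);
CREATE TABLE Product (
    ProductID INTEGER PRIMARY KEY,
    Title     TEXT NOT NULL            -- assumed attribute
);
CREATE TABLE "Order" (                 -- "Customer places Order"
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID)
);
CREATE TABLE OrderItem (               -- assumed link table for "Order contains Product"
    OrderID   INTEGER NOT NULL REFERENCES "Order"(OrderID),
    ProductID INTEGER NOT NULL REFERENCES Product(ProductID),
    PRIMARY KEY (OrderID, ProductID)
);
CREATE INDEX idx_order_customer ON "Order"(CustomerID);
""")

# Invented sample rows.
conn.execute("INSERT INTO Customer VALUES (1, 'Asha', 'Chennai')")
conn.execute("INSERT INTO Product VALUES (10, 'Notebook')")
conn.execute('INSERT INTO "Order" VALUES (100, 1)')
conn.execute("INSERT INTO OrderItem VALUES (100, 10)")

# Validation-style check: the model can answer "which customer bought which product?"
rows = conn.execute("""
    SELECT c.Name, p.Title
    FROM Customer c
    JOIN "Order" o ON o.CustomerID = c.CustomerID
    JOIN OrderItem i ON i.OrderID = o.OrderID
    JOIN Product p ON p.ProductID = i.ProductID
""").fetchall()
print(rows)  # [('Asha', 'Notebook')]
```

The foreign-key constraints are one way the model protects data integrity: an order that points to a customer who does not exist is rejected by the database.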
Key Benefits of Data Modeling:

• Data Integrity and Consistency: Ensures data accuracy, reliability, and uniformity
across the system.
• Efficient Querying and Analysis: Facilitates faster and more efficient data retrieval
and analysis.
• Improved Communication: Provides a common language for stakeholders to
understand and discuss data-related concepts.
• Better Decision-Making: Enables informed decision-making based on accurate and
reliable data.
• Compliance and Security: Helps in adhering to data governance policies and security
regulations.
4. Explain in detail Data Mining and Data Warehousing.
Data warehousing is the process of collecting, storing, and managing large volumes of data from
various sources in a central repository, while data mining is the process of analyzing that data to
discover patterns, trends, and insights that can be used for decision-making.
Data Warehousing:

• Purpose:
Data warehousing aims to consolidate data from multiple sources into a single,
consistent, and reliable repository. This allows for efficient querying, reporting, and
analysis of historical data.
• Characteristics:
Data warehouses are typically subject-oriented (organized around specific business areas),
integrated (combining data from different sources), time-variant (containing historical data),
and non-volatile (data is not frequently updated).
• Key Processes:
The core process in data warehousing is ETL (Extract, Transform, Load), which involves
extracting data from various sources, transforming it into a suitable format, and loading it into
the warehouse.
• Benefits:
Data warehousing improves data quality, provides a comprehensive view of business
operations, supports informed decision-making, and enhances system performance by
separating analytical processing from transactional databases.
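
The ETL process described above can be sketched in a few lines of Python. This is a toy example under stated assumptions: the two sources, their field names, the day/month/year date format of the second source, and the `sales` table are all invented, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Extract: two hypothetical sources with inconsistent formats (all values invented).
store_sales = [{"date": "2024-01-05", "amount": "120.50", "region": "north"}]
web_sales = [{"day": "05/01/2024", "total": 80, "region": "North "}]

def transform(store_rows, web_rows):
    """Bring both sources to one schema: ISO date, float amount, lower-case region."""
    unified = []
    for r in store_rows:
        unified.append((r["date"], float(r["amount"]), r["region"].strip().lower(), "store"))
    for r in web_rows:
        d, m, y = r["day"].split("/")  # assumes day/month/year order
        unified.append((f"{y}-{m}-{d}", float(r["total"]), r["region"].strip().lower(), "web"))
    return unified

# Load: write the unified rows into one central table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (sale_date TEXT, amount REAL, region TEXT, source TEXT)"
)
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)", transform(store_sales, web_sales)
)

# After integration, one query covers both sources.
total_by_region = warehouse.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
print(total_by_region)  # [('north', 200.5)]
```

The transform step is where the "integrated" property of a warehouse comes from: "north" and "North " become the same region, and both date formats become one.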
Data Mining:

• Purpose:
Data mining utilizes computational techniques to uncover hidden patterns, correlations,
and anomalies within large datasets.
• Key Techniques:
Common data mining techniques include association rule mining, classification, clustering, and
regression analysis.
• Applications:
Data mining is used in various industries, such as marketing (customer segmentation, targeted
advertising), finance (fraud detection, risk management), and healthcare (disease prediction,
personalized medicine).
• Benefits:
Data mining provides actionable insights that can be used to improve business strategies,
enhance customer relationships, optimize operations, and predict future trends.
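
As one concrete instance of the techniques listed above, association rule mining scores a rule such as "bread implies milk" by its support and confidence. The sketch below computes both measures directly; the transactions are invented, and a full algorithm such as Apriori would add candidate generation on top of these two functions.

```python
# Hypothetical market-basket data; the items and baskets are invented.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"bread", "milk"}, transactions))                 # 0.5
print(round(confidence({"bread"}, {"milk"}, transactions), 3))  # 0.667
```

Here bread and milk appear together in 2 of 4 baskets (support 0.5), and 2 of the 3 baskets with bread also contain milk (confidence about 0.667).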

Relationship between Data Warehousing and Data Mining:

• Data Warehousing as a Foundation:
Data warehousing provides the necessary infrastructure and data foundation for
effective data mining. Without a well-structured and organized data warehouse, data
mining efforts would be significantly hampered.
• Data Mining as an Analytical Tool:
Data mining leverages the data stored in the warehouse to extract valuable knowledge and
insights. It is the process of turning raw data into actionable intelligence.
• Complementary Processes:
Data warehousing and data mining work together to enable businesses to make data-driven
decisions and gain a competitive edge.

5. Explain the various basic statistical descriptions of data.

Measures of Central Tendency:

• Mean: The average of all data points, calculated by summing all values and dividing
by the number of values. Sensitive to outliers.
• Median: The middle value in a sorted dataset. More robust to outliers than the mean.
• Mode: The most frequently occurring value in the dataset.

Measures of Variability:

• Range: The difference between the highest and lowest values in a dataset.
• Variance: Measures how spread out the data is from the mean, calculated by averaging
the squared differences between each data point and the mean.
• Standard Deviation: The square root of the variance. Provides a measure of spread in
the same units as the original data, making it more interpretable than variance.
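
The measures above can be computed with Python's standard statistics module. The data values are invented, with one deliberate outlier (39) to show the difference between mean and median. The population forms `pvariance` and `pstdev` are used because the text defines variance as the average of the squared differences; the sample forms (`statistics.variance`, `statistics.stdev`) divide by n - 1 instead.

```python
import statistics

# Invented data with one large outlier (39).
data = [2, 4, 4, 5, 7, 9, 39]

mean = statistics.mean(data)           # sum of values / number of values
median = statistics.median(data)       # middle value of the sorted data
mode = statistics.mode(data)           # most frequent value
value_range = max(data) - min(data)    # highest minus lowest
variance = statistics.pvariance(data)  # average squared distance from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

print(mean, median, mode, value_range)  # 10 5 4 37
print(round(variance, 2), round(std_dev, 2))
```

The single outlier pulls the mean up to 10 while the median stays at 5, which is what "sensitive to outliers" means in practice.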


Measures of Distribution:

• Skewness:
Describes the asymmetry of the data distribution. A positive skew indicates a long tail
on the right, and a negative skew indicates a long tail on the left.
• Kurtosis:
Describes how heavy the tails of the distribution are, often loosely called its "peakedness".
High kurtosis indicates a sharp peak and heavy tails, while low kurtosis indicates a flatter
peak and lighter tails.
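
Skewness and kurtosis can be computed from central moments. The sketch below uses the simple population (moment-based) definitions and reports excess kurtosis, meaning kurtosis minus 3 so that a normal distribution scores 0. These are assumptions of the example; statistical libraries often apply bias corrections and may return slightly different numbers. The data values are invented.

```python
def central_moment(data, k):
    """Average of (x - mean)**k over the data (population form)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Third central moment divided by the cube of the standard deviation."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth central moment divided by the squared variance, minus 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))              # 0.0 (no asymmetry)
print(skewness(right_tailed) > 0)       # True (long tail on the right)
print(round(excess_kurtosis(symmetric), 2))  # -1.3 (lighter tails than a normal distribution)
```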
