DA Notes: Data Analytics Overview

The document provides an overview of data analytics, emphasizing its role in transforming raw data into actionable insights across various industries. It discusses the sources and nature of data, classifying it into structured, semi-structured, and unstructured types, and highlights the importance of understanding these characteristics for effective analysis. Additionally, it introduces Big Data platforms like Hadoop and Apache Spark, which facilitate the management and analysis of large datasets.

Uploaded by

optionalmail512

Data Analytics (DA) - Chapter 1 Notes

Introduction to Data Analytics

Data analytics is the science of analyzing raw data to draw conclusions from it. It covers the techniques and
processes used to examine datasets, identify patterns, and extract meaningful insights, which helps
organizations make data-driven decisions. It is widely applied in industries such as healthcare, finance,
marketing, and logistics. Modern data analytics relies on tools such as Python, R, and specialized software
platforms.
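
As a minimal sketch of what "raw data to insight" can look like in Python, the example below summarizes a
small set of daily sales figures and flags unusually high days. The data, the function name summarize, and
the 1.5 standard deviation threshold are all assumptions made for illustration, not part of the notes.

```python
from statistics import mean, pstdev

# Hypothetical raw data: units sold per day (invented for illustration).
daily_sales = {"Mon": 120, "Tue": 135, "Wed": 128, "Thu": 310, "Fri": 140}

def summarize(sales):
    """Return the average and the days whose sales are unusually high."""
    values = list(sales.values())
    avg = mean(values)
    spread = pstdev(values)
    # Assumed rule: flag any day more than 1.5 standard deviations above the mean.
    unusual = [day for day, units in sales.items() if units > avg + 1.5 * spread]
    return avg, unusual

average, unusual_days = summarize(daily_sales)
print(f"average={average:.1f}, unusual days={unusual_days}")
```

The flagged day is the kind of pattern an analyst would then investigate further, for example by checking
whether a promotion ran on that day.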

Sources and Nature of Data

Data originates from many sources, such as business transactions, sensors, social media, and web logs. By
nature, data can be structured (organized in rows and columns), semi-structured (e.g., JSON, XML), or
unstructured (text, images, videos). Each type calls for different tools: relational databases manage
structured data efficiently, while text mining techniques process unstructured text. Understanding where data
comes from and what form it takes is vital for selecting appropriate analytical methods.
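
The difference between the three forms shows up in how each one is read. The sketch below uses only the
Python standard library; the sample inputs are invented, and the word count is a crude stand-in for real
text mining.

```python
import csv
import io
import json
import re
from collections import Counter

# All sample inputs below are invented for illustration.
structured = "order_id,amount\n1,19.99\n2,5.00\n"
semi_structured = '{"user": "a1", "tags": ["promo", "mobile"], "device": {"os": "android"}}'
unstructured = "Great product, fast delivery. Delivery was great!"

# Structured: fixed columns, so every row can be read by column name.
rows = list(csv.DictReader(io.StringIO(structured)))
total_amount = sum(float(row["amount"]) for row in rows)

# Semi-structured: self-describing keys, with nesting and optional fields.
record = json.loads(semi_structured)
device_os = record.get("device", {}).get("os")

# Unstructured: no schema at all; tokenize and count the most frequent words.
words = re.findall(r"[a-z]+", unstructured.lower())
top_words = Counter(words).most_common(2)

print(total_amount, device_os, top_words)
```

Note that the structured input needs no defensive lookups, the semi-structured input needs .get() calls
because fields may be missing, and the unstructured input needs a processing step before anything can be
counted.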

Data Classification

Data classification is the process of organizing data into categories to enhance its utility and security. By
structure, there are three main types of data: structured (organized in tables), semi-structured (partially
organized, with tags or keys but no fixed schema), and unstructured (e.g., text, images). Structured data fits
neatly into relational databases, while semi-structured data requires flexible tools such as JSON or XML
parsers. Unstructured data, often the largest in volume, demands advanced techniques such as natural language
processing (NLP) or image recognition for meaningful analysis.
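
One way to make the three categories concrete is a small heuristic that labels a text payload. This is a
sketch under stated assumptions: classify_payload is a hypothetical helper, it only handles text, and its
rules (parses as JSON or XML, or looks like a consistent table) are simplifications that real data will
sometimes defeat.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def classify_payload(text):
    """Heuristically label a text payload; a hypothetical helper, not a standard API."""
    stripped = text.strip()
    # JSON objects/arrays and XML documents carry their own keys or tags.
    try:
        if isinstance(json.loads(stripped), (dict, list)):
            return "semi-structured"
    except ValueError:
        pass
    try:
        ET.fromstring(stripped)
        return "semi-structured"
    except ET.ParseError:
        pass
    # Tabular text: at least two rows, more than one column, same width in every row.
    rows = [row for row in csv.reader(io.StringIO(stripped)) if row]
    widths = {len(row) for row in rows}
    if len(rows) >= 2 and len(widths) == 1 and widths.pop() > 1:
        return "structured"
    return "unstructured"

print(classify_payload("id,name\n1,Ana\n2,Ben"))
print(classify_payload('{"id": 1}'))
print(classify_payload("The delivery was late, and the box was damaged."))
```

The order of the checks matters: the self-describing formats are tested first, and free text is the
fallback when nothing else matches.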

Characteristics of Data

Key characteristics of data, often called the five Vs, are volume (size), velocity (speed of generation),
variety (different formats), veracity (trustworthiness), and value (insights derived). Volume describes the
growing size of datasets, especially in Big Data. Velocity refers to the rapid flow of data from sources like
social media and sensors. Variety covers the diversity of formats, such as text, images, and videos. Veracity
addresses the accuracy and reliability of data, and value concerns the meaningful insights that can be
extracted for decision-making.
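
Four of these characteristics can be given rough numeric stand-ins for a batch of records, as sketched
below. The event batch, the profile function, and the chosen metrics are assumptions for illustration;
value is left out because it depends on the decision the data supports rather than on the data alone.

```python
# Hypothetical event batch: (seconds since start, format, payload, passed validation).
events = [
    (0.0, "json", '{"temp": 21.5}', True),
    (0.5, "json", '{"temp": null}', False),
    (1.0, "text", "sensor rebooted", True),
    (2.0, "csv", "3,21.7", True),
]

def profile(batch):
    """Compute rough stand-ins for volume, velocity, variety, and veracity."""
    duration = batch[-1][0] - batch[0][0]
    return {
        # Volume: total payload size in bytes.
        "volume_bytes": sum(len(payload.encode("utf-8")) for _, _, payload, _ in batch),
        # Velocity: records arriving per second over the batch window.
        "velocity_per_sec": len(batch) / duration if duration else float("inf"),
        # Variety: number of distinct formats seen.
        "variety_formats": len({fmt for _, fmt, _, _ in batch}),
        # Veracity: share of records that passed validation.
        "veracity_ratio": sum(ok for *_, ok in batch) / len(batch),
    }

stats = profile(events)
print(stats)
```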


Introduction to Big Data Platforms

A Big Data platform is a system designed to store, process, and analyze massive volumes of data. Platforms
like Hadoop, Apache Spark, and Google BigQuery enable organizations to store and analyze extensive datasets
efficiently. Depending on the platform, they support distributed computing, real-time (streaming) data
processing, and integration with data visualization tools. They are widely used in industries where handling
large data volumes is crucial, such as finance, healthcare, and e-commerce.
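
The core idea behind distributed computing on such platforms is to split the data, process each part
independently, and merge the partial results. The sketch below imitates that map and reduce pattern in
plain Python. It does not use the Hadoop or Spark APIs; the partitions are invented, and threads stand in
for worker nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical partitions: on a real cluster each would live on a different node.
partitions = [
    ["error timeout", "ok"],
    ["ok", "error disk"],
    ["ok ok"],
]

def map_partition(lines):
    """Map step: count words within one partition only."""
    return Counter(word for line in lines for word in line.split())

def merge(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right

# Threads stand in for worker nodes; each handles one partition.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(map_partition, partitions))

totals = reduce(merge, partial_counts, Counter())
print(totals)
```

Because each partition is counted without looking at the others, the map step can run on as many machines
as there are partitions, which is what lets these platforms scale to datasets that do not fit on one
computer.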
