Data Analytics (DA) - Chapter 1 Notes
Introduction to Data Analytics
Data analytics is the science of analyzing raw data to draw conclusions from it. It encompasses the
techniques and processes used to examine datasets, identify patterns, and extract meaningful insights. Data
analytics helps organizations make data-driven decisions by turning complex information into actionable
insights. It is widely applied in industries such as healthcare, finance, marketing, and logistics. Modern data
analytics relies on tools such as Python, R, and specialized software platforms.
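As a minimal illustration of turning raw data into an insight, the sketch below groups a small, made-up list of sales records by region and reports the total and mean per region, using only the Python standard library. The records and the `summarize_by_region` function are hypothetical and stand in for a real dataset and analysis pipeline.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (region, sale amount)
sales = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 150.0),
    ("east", 95.0),
    ("south", 110.0),
]

def summarize_by_region(records):
    """Group amounts by region and return the total and mean for each region."""
    groups = defaultdict(list)
    for region, amount in records:
        groups[region].append(amount)
    return {
        region: {"total": sum(amounts), "mean": mean(amounts)}
        for region, amounts in groups.items()
    }

summary = summarize_by_region(sales)
# The "insight": which region brings in the most revenue
best = max(summary, key=lambda region: summary[region]["total"])
print(summary)
print("Highest-revenue region:", best)  # Highest-revenue region: north
```

Even this tiny aggregation follows the pattern described above: raw records go in, a grouped summary comes out, and a decision-relevant fact is read from it.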
Sources and Nature of Data
Data originates from many sources, such as business transactions, sensors, social media, and web logs. By
nature, data can be structured (organized in rows and columns), semi-structured (e.g., JSON, XML), or
unstructured (e.g., text, images, videos). Each type calls for different tools: relational databases efficiently
manage structured data, while text mining techniques process unstructured text. Understanding where data
comes from and what form it takes is vital for selecting appropriate analytical methods.
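The three forms of data call for different handling. The sketch below uses Python's standard `csv` and `json` modules plus simple string operations; the sample strings are invented for illustration, and counting words stands in for the much richer text-mining techniques mentioned above.

```python
import csv
import io
import json
from collections import Counter

# Structured: fixed columns, every row has the same fields
structured = "id,name,amount\n1,Alice,30\n2,Bob,45\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total_amount = sum(int(row["amount"]) for row in rows)

# Semi-structured: self-describing keys; fields may be missing or nested
semi_structured = '[{"id": 1, "tags": ["new"]}, {"id": 2, "meta": {"source": "web"}}]'
records = json.loads(semi_structured)
# .get() tolerates records that lack the "tags" field
tag_counts = [len(record.get("tags", [])) for record in records]

# Unstructured: free text with no schema; split into lowercase words
unstructured = "Great product. Great price, slow delivery."
words = [w.strip(".,").lower() for w in unstructured.split()]
word_freq = Counter(words)

print(total_amount)              # 75
print(tag_counts)                # [1, 0]
print(word_freq.most_common(1))  # [('great', 2)]
```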
Data Classification
Data classification is the process of organizing data into categories to improve its usability and security. By
structure, data falls into three main types: structured (organized in tables), semi-structured (partially
organized, with tags or keys but no fixed schema), and unstructured (e.g., text, images). Structured data fits
neatly into relational databases, while semi-structured data requires flexible tools such as JSON parsers.
Unstructured data, often the largest share by volume, demands advanced techniques such as natural language
processing (NLP) or image recognition for meaningful analysis.
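One way to make the three-way classification concrete is a heuristic that inspects a raw text sample. The `classify_data` function below is a hypothetical rule of thumb, not a standard algorithm: it labels input as semi-structured if it parses as a JSON object or array, as structured if it looks like comma-separated rows with a consistent column count, and as unstructured otherwise. It does not recognize XML, and real systems usually rely on schemas, file metadata, or content-type information instead.

```python
import csv
import io
import json

def classify_data(sample: str) -> str:
    """Hypothetical heuristic: label a text sample by its degree of structure."""
    text = sample.strip()
    # Semi-structured: a valid JSON object or array (self-describing keys)
    try:
        if isinstance(json.loads(text), (dict, list)):
            return "semi-structured"
    except json.JSONDecodeError:
        pass
    # Structured: at least two non-empty comma-separated rows, all with the same width (> 1)
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    widths = {len(row) for row in rows}
    if len(rows) >= 2 and len(widths) == 1 and min(widths) > 1:
        return "structured"
    # Anything else is treated as unstructured
    return "unstructured"

print(classify_data("id,name\n1,Alice\n2,Bob"))        # structured
print(classify_data('{"id": 1, "tags": ["new"]}'))     # semi-structured
print(classify_data("Great product, slow delivery."))  # unstructured
```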
Characteristics of Data
Key characteristics of data, often called the five Vs, are volume (size), velocity (speed of generation), variety
(different formats), veracity (trustworthiness), and value (insights derived). Volume describes the ever-growing
size of datasets, especially in Big Data. Velocity refers to the rapid flow of data from sources such as social
media and sensors. Variety refers to the diversity of formats, such as text, images, and videos. Veracity
concerns the accuracy and reliability of data, and value focuses on extracting meaningful insights for
decision-making.
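The five Vs are qualitative, but rough proxies can be computed for a batch of records. The sketch below is illustrative only: the `Record` type, its fields, and the chosen proxies (payload bytes for volume, records per second for velocity, distinct formats for variety, share of non-empty payloads for veracity) are assumptions made for this example, not standard definitions. Value is left out because it depends on the business question being asked.

```python
from dataclasses import dataclass

@dataclass
class Record:
    # Hypothetical record: arrival time in seconds, format label, raw payload
    timestamp: float
    fmt: str
    payload: str

def profile(records: list[Record]) -> dict:
    """Compute rough proxies for volume, velocity, variety, and veracity."""
    if not records:
        raise ValueError("need at least one record")
    volume_bytes = sum(len(r.payload.encode("utf-8")) for r in records)
    span = max(r.timestamp for r in records) - min(r.timestamp for r in records)
    # Records per second over the observed span; infinite if all arrived at once
    velocity = len(records) / span if span > 0 else float("inf")
    variety = len({r.fmt for r in records})
    # Assumption for this sketch: an empty payload counts as untrustworthy
    valid = sum(1 for r in records if r.payload.strip())
    veracity = valid / len(records)
    return {"volume_bytes": volume_bytes, "velocity_per_s": velocity,
            "variety": variety, "veracity": veracity}

batch = [
    Record(0.0, "csv", "1,Alice,30"),
    Record(0.5, "json", '{"id": 2}'),
    Record(1.0, "text", "Great product"),
    Record(2.0, "csv", ""),
]
result = profile(batch)
print(result)  # {'volume_bytes': 32, 'velocity_per_s': 2.0, 'variety': 3, 'veracity': 0.75}
```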
Introduction to Big Data Platform
A Big Data platform is a system designed to store, process, and analyze massive volumes of data. Platforms
such as Hadoop, Apache Spark, and Google BigQuery let organizations store and analyze very large datasets
efficiently. These platforms rely on distributed storage and computing; depending on the platform, they also
support real-time (streaming) processing and integration with data visualization tools. They are widely used in
industries where handling large data volumes is crucial, such as finance, healthcare, and e-commerce.
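Distributed platforms such as Hadoop split work across machines and then combine the partial results. The sketch below imitates the map, shuffle, and reduce phases of the MapReduce programming model in a single Python process, with a thread pool standing in for a cluster of worker nodes. It is a teaching sketch, not Hadoop's or Spark's API; the function names and input chunks are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(chunk: str) -> list[tuple[str, int]]:
    # Emit (word, 1) for every word in one input split
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle_phase(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between map and reduce
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Combine all values for each key into a final count
    return {key: sum(values) for key, values in groups.items()}

def word_count(chunks: list[str], workers: int = 2) -> dict[str, int]:
    # Run map tasks concurrently; each chunk plays the role of a data block on one node
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle_phase(mapped))

chunks = ["big data needs big tools", "data flows fast", "big platforms scale"]
counts = word_count(chunks)
print(counts)
```

On a real cluster the chunks would be blocks of a distributed file, the map tasks would run on the nodes holding those blocks, and the shuffle would move data over the network; the single-process version only shows how the three phases fit together.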