1. Data
Raw facts and figures.
Example: The values 37.5, "Nairobi", and 2024-03-01 in a spreadsheet are data; on their own they carry no interpretation.
2. Information
Processed and meaningful data.
Example: Totaling daily sales records to show that revenue rose from one month to the next turns data into information.
3. Dataset
A structured collection of related data.
Example: A file of customer orders, where every row is one order and every column is one field, is a dataset.
4. Variable
A characteristic that can take different values from one observation to another.
Example: In a survey dataset, age, income, and city are variables because their values differ between respondents.
5. Observation
A single record or row in a dataset.
Example: In a table of patients, the row holding one patient's age, weight, and blood pressure is one observation.
6. Attribute
A column describing a feature of data.
Example: In a product table, the price column is an attribute that describes every product.
7. Data Type
The category of data, such as integer, float, or string.
Example: In an employee table, age might be stored as an integer, salary as a float, and name as a string.
8. Structured Data
Organized data stored in rows and columns.
Example: A relational table or a spreadsheet of transactions with fixed columns is structured data.
9. Unstructured Data
Data without a fixed format, such as text or images.
Example: Customer review text, scanned invoices, and product photos are unstructured data.
10. Semi-structured Data
Partially organized data like JSON or XML.
Example: A JSON response from an API has named keys, but records can carry different or nested fields, so it is semi-structured.
11. Data Cleaning
Fixing errors, removing duplicates, and handling missing values.
Example: Correcting misspelled city names, dropping rows that were entered twice, and filling in or removing blank ages are data cleaning steps.
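The three cleaning steps named above can be sketched with the Python standard library. The records, field names, and the choice to drop (rather than fill) rows with a missing age are assumptions made for illustration, not a prescribed method.

```python
# Hypothetical records: one exact duplicate, one missing age, untidy city names.
records = [
    {"id": 1, "city": "nairobi ", "age": 34},
    {"id": 2, "city": "Lagos", "age": None},
    {"id": 1, "city": "nairobi ", "age": 34},
]


def clean(rows):
    """Fix formatting errors, drop exact duplicates, drop rows with a missing age."""
    seen = set()
    cleaned = []
    for row in rows:
        # Fix errors: strip stray spaces and normalize capitalization.
        fixed = {**row, "city": row["city"].strip().title()}
        # Remove duplicates: identical rows produce the same key.
        key = tuple(sorted(fixed.items(), key=lambda item: item[0]))
        if key in seen:
            continue
        seen.add(key)
        # Handle missing values: here the row is simply dropped.
        if fixed["age"] is None:
            continue
        cleaned.append(fixed)
    return cleaned


print(clean(records))
```

Dropping rows is only one way to handle missing values; filling them with a default or an estimate is another, and which is appropriate depends on the analysis.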
12. Data Wrangling
Transforming raw data into a usable format.
Example: Merging several exported log files and reshaping them into one tidy table ready for analysis is data wrangling.
13. Data Transformation
Changing the structure or format of data.
Example: Converting date strings to a date type, or reshaping a table from wide to long format, is a data transformation.
14. Preprocessing
Preparing data before analysis.
Example: Encoding categories as numbers and scaling numeric columns before training a model are preprocessing steps.
15. Missing Values
Blank or null values in a dataset.
Example: A survey row in which the respondent left the income field blank contains a missing value.
16. Outliers
Values that differ significantly from the rest of the data.
Example: A recorded age of 210 in a column where most ages fall between 18 and 90 is an outlier.
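One common way to flag outliers is the interquartile range (IQR) rule. The sketch below is a minimal illustration using Python's `statistics.quantiles`; the 1.5 multiplier is the conventional choice, and the ages are made-up values.

```python
from statistics import quantiles


def find_outliers(values):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


ages = [18, 22, 25, 30, 35, 41, 47, 52, 210]  # hypothetical data
print(find_outliers(ages))
```

Other rules, such as a z-score threshold, are also used; the IQR rule is shown here because it is not itself distorted by the extreme value.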
17. Duplicate Data
Repeated records in a dataset.
Example: The same customer order saved twice with identical fields is duplicate data.
18. Normalization
Scaling data into a specific range, commonly 0 to 1.
Example: Min-max normalization maps a column's smallest value to 0 and its largest to 1, with the other values in between.
19. Standardization
Rescaling data to mean 0 and standard deviation 1.
Example: Subtracting a column's mean from each value and dividing by its standard deviation gives z-scores, a standardized version of the column.
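Normalization and standardization are easy to confuse, so a side-by-side sketch may help. The function names are illustrative, min-max scaling is assumed as the normalization method, and the population standard deviation is used for the z-scores.

```python
from statistics import mean, pstdev


def min_max(values):
    """Normalization: rescale values into the range 0 to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]  # assumes hi != lo


def z_scores(values):
    """Standardization: rescale to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]  # assumes sigma != 0


heights = [10, 20, 30]  # hypothetical data
print(min_max(heights))
print(z_scores(heights))
```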
20. Filtering
Selecting rows based on conditions.
Example: Keeping only the orders where the country is Kenya and the amount exceeds 100 is filtering.
21. Sorting
Arranging data in ascending or descending order.
Example: Ordering a sales table by revenue from highest to lowest is a descending sort.
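Filtering and sorting often appear together. A minimal sketch in plain Python, using a hypothetical list of orders:

```python
# Hypothetical order records.
orders = [
    {"id": 1, "country": "Kenya", "amount": 250},
    {"id": 2, "country": "Ghana", "amount": 90},
    {"id": 3, "country": "Kenya", "amount": 40},
    {"id": 4, "country": "Kenya", "amount": 410},
]

# Filtering: keep rows that satisfy both conditions.
large_kenya = [o for o in orders if o["country"] == "Kenya" and o["amount"] > 100]

# Sorting: arrange the filtered rows by amount, largest first.
by_amount_desc = sorted(large_kenya, key=lambda o: o["amount"], reverse=True)

print([o["id"] for o in by_amount_desc])
```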
22. Aggregation
Summarizing data using functions like sum or average.
Example: Computing total sales per region from individual transactions is aggregation.
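As a rough sketch of aggregation, the following groups hypothetical (region, amount) pairs and computes a sum and an average per region. The function name and data are assumptions for illustration.

```python
from collections import defaultdict


def aggregate_by_region(sales):
    """Return {region: (total, average)} for (region, amount) pairs."""
    groups = defaultdict(list)
    for region, amount in sales:
        groups[region].append(amount)
    return {r: (sum(a), sum(a) / len(a)) for r, a in groups.items()}


sales = [("East", 100), ("West", 50), ("East", 200), ("West", 150)]
print(aggregate_by_region(sales))
```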
23. Pivot Table
A table that summarizes data by grouping it along rows and columns and aggregating the values in each cell.
Example: A pivot table with regions as rows, months as columns, and summed sales in each cell.
24. Measure
A numeric value that can be aggregated in analysis.
Example: Sales amount, quantity sold, and profit are measures.
25. Dimension
A category used to group or slice measures, such as time, product, or region.
Example: Breaking down total sales by region uses region as a dimension.
26. KPI
A Key Performance Indicator used to measure success.
Example: Monthly customer churn rate can serve as a KPI for a subscription business.
27. Metric
A measurable value used to track performance.
Example: Average page load time is a metric a web team might track each week.
28. Trend
A pattern or direction of movement in data over time.
Example: Website visits rising steadily each month over a year show an upward trend.
29. Correlation
A statistical relationship in which two variables tend to change together.
Example: Ice cream sales and temperature tend to rise together, so they are positively correlated.
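Correlation is usually quantified with the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand on made-up numbers; the function name is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # assumes neither sequence is constant


temperature = [20, 25, 30, 35]  # hypothetical data
ice_cream_sales = [200, 250, 300, 350]
print(pearson_r(temperature, ice_cream_sales))  # close to 1: strong positive correlation
```

A value near 1 or -1 indicates a strong linear relationship, but it does not by itself show that one variable causes the other.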
30. Causation
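A relationship in which a change in one variable directly produces a change in another.
Example: If raising a drug dose lowers blood pressure in a controlled trial, that is evidence of causation; correlation alone does not establish it.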
31. Regression
Modeling the relationship between variables in order to predict a numeric value.
Example: Fitting a line that predicts house price from floor area is a regression.
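The simplest case is a straight-line fit by ordinary least squares. This is a minimal sketch with one input variable and invented numbers; real regression work usually relies on a statistics or machine learning library.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


areas = [50, 70, 90, 110]      # hypothetical floor areas
prices = [150, 190, 230, 270]  # hypothetical prices
slope, intercept = fit_line(areas, prices)
print(slope * 100 + intercept)  # predicted price for an area of 100
```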
32. Model
A simplified mathematical representation of a real-world process.
Example: An equation that estimates delivery time from distance and traffic is a model.
33. Hypothesis
An assumption that can be tested.
Example: "Customers who receive a discount code spend more per order" is a hypothesis.
34. Hypothesis Testing
Evaluating assumptions using statistics.
Example: Using a t-test to check whether average order value differs between two customer groups is hypothesis testing.
35. Population
The entire group being studied.
Example: In a study of a company's customers, all of its customers form the population.
36. Sample
A subset of the population.
Example: Surveying 500 randomly chosen customers instead of all customers uses a sample.
37. Mean
The average value: the sum of the values divided by their count.
Example: The mean of 2, 4, and 9 is 5.
38. Median
The middle value in sorted data.
Example: The median of 3, 1, and 8 is 3, the middle value once the data is sorted as 1, 3, 8.
39. Mode
The most frequently occurring value.
Example: In the values 2, 3, 3, 5, the mode is 3.
40. Variance
The average squared distance of values from the mean.
Example: The values 2, 4, and 6 have mean 4 and population variance (4 + 0 + 4) / 3, about 2.67.
41. Standard Deviation
A measure of data spread, equal to the square root of the variance.
Example: A variance of 4 corresponds to a standard deviation of 2, expressed in the same units as the data.
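The five summary statistics defined above (mean, median, mode, variance, standard deviation) are all available in Python's standard `statistics` module. The sketch uses the population versions of variance and standard deviation and a made-up list of values.

```python
from statistics import mean, median, mode, pstdev, pvariance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

summary = {
    "mean": mean(values),          # sum divided by count
    "median": median(values),      # middle of the sorted values
    "mode": mode(values),          # most frequent value
    "variance": pvariance(values), # average squared distance from the mean
    "std_dev": pstdev(values),     # square root of the variance
}
print(summary)
```

When the values are a sample rather than the whole population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) are the usual choice.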
42. Distribution
How data values are spread across their possible range.
Example: Exam scores that cluster around 70, with few very low or very high scores, form a roughly bell-shaped distribution.
43. Histogram
A graph showing how often values fall into each of a set of ranges (bins).
Example: A histogram of customer ages with 10-year bins shows how many customers fall in each age band.
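The counts behind a histogram can be computed before anything is drawn. A minimal sketch, assuming fixed-width bins and a hypothetical list of ages:

```python
from collections import Counter


def bin_counts(values, width):
    """Count how many values fall into each bin of the given width.

    Each key is the lower edge of a bin, e.g. 30 stands for 30 to 39 when width is 10.
    """
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


ages = [21, 25, 34, 37, 38, 45, 52]  # hypothetical data
for lower_edge, count in bin_counts(ages, 10).items():
    print(f"{lower_edge}-{lower_edge + 9}: {'#' * count}")
```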
44. Bar Chart
A chart comparing categories.
Example: A bar chart with one bar per product category compares total sales across categories.
45. Pie Chart
A circular chart showing proportions.
Example: A pie chart can show each browser's share of total site visits.
46. Line Chart
A chart showing trends over time.
Example: A line chart of daily closing prices shows how a stock moved over a year.
47. Scatter Plot
A plot showing relationships between variables.
Example: Plotting advertising spend on the x-axis against sales on the y-axis shows whether the two move together.
48. Dashboard
A visual summary of key metrics.
Example: A sales dashboard might show revenue, order count, and conversion rate on one screen.
49. Visualization
Graphical representation of data.
Example: Drawing monthly revenue as a line chart instead of listing it in a table is a visualization.
50. Insights
Meaningful findings extracted from data.
Example: Discovering that most product returns come from a single product line is an insight.
51. SQL
Structured Query Language, the language used to interact with relational databases.
Example: SELECT name FROM customers WHERE country = 'Kenya'; is a SQL statement.
52. Query
A request for data from a database.
Example: Asking a database for all orders placed in the last 30 days is a query.
53. Table
A database object that stores structured data in rows and columns.
Example: A customers table holds one row per customer, with columns such as id, name, and email.
54. Schema
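The structure of a database: its tables, columns, data types, and relationships.
Example: A schema might specify that the orders table has an integer id, a date, and a customer_id column that references the customers table.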
55. Primary Key
A unique identifier for a row.
Example: customer_id in a customers table is a primary key when no two rows share the same value.
56. Foreign Key
A field in one table that refers to the primary key of another table.
Example: customer_id in an orders table is a foreign key that points to the customers table.
57. Join
Combining data from multiple tables.
Example: Joining orders to customers on customer_id shows each order alongside the customer's name.
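The database terms above (table, schema, primary key, foreign key, query, join) fit together in a few lines of SQL. The sketch below runs them against an in-memory SQLite database through Python's built-in `sqlite3` module; the table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Schema: two tables, each with a primary key; orders has a foreign key to customers.
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total       REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 35.5), (12, 2, 80.0);
    """
)

# Query with a join: combine each order with its customer's name.
rows = conn.execute(
    """
    SELECT c.name, o.order_id, o.total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
    """
).fetchall()
print(rows)
```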
58. Index
A data structure that speeds up lookups on one or more columns in a database.
Example: An index on the email column lets the database find a user by email without scanning every row.
59. NoSQL
Non-relational database systems.
Example: MongoDB, a document store, and Redis, a key-value store, are NoSQL databases.
60. API
Application Programming Interface: an interface that lets systems exchange data.
Example: A weather service API returns the current forecast as JSON when an application requests it.
61. ETL
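Extract, Transform, Load: a process that pulls data from sources, reshapes it, and writes it to a destination.
Example: Pulling orders from a sales system, converting currencies, and loading the result into a data warehouse is an ETL job.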
62. Data Pipeline
An automated sequence of steps that moves data from a source to a destination.
Example: A nightly job that collects logs, cleans them, and refreshes a reporting table is a data pipeline.
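An ETL process and a data pipeline can be illustrated at toy scale with three small functions chained together. Everything here is an assumption made for the sketch: the CSV text stands in for a source system, an in-memory SQLite database stands in for a warehouse, and the column names are invented.

```python
import csv
import io
import sqlite3

# Hypothetical source data; order 2 has a missing amount.
RAW_CSV = "order_id,amount_usd\n1,10.50\n2,\n3,7.25\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount and convert dollars to integer cents."""
    return [
        (int(r["order_id"]), round(float(r["amount_usd"]) * 100))
        for r in rows
        if r["amount_usd"]
    ]


def load(rows, conn):
    """Load: write the transformed rows to the destination table."""
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)


def run_pipeline(text):
    """Run the three steps in order and return what ended up in the destination."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn.execute("SELECT order_id, amount_cents FROM orders ORDER BY order_id").fetchall()


print(run_pipeline(RAW_CSV))
```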
63. Big Data
Datasets too large or complex to process with conventional tools.
Example: Billions of clickstream events collected each day by a large website are big data.
64. Analytics
The systematic analysis of data to find patterns and support decisions.
Example: Examining purchase histories to find which products are often bought together is analytics.
65. Exploratory Data Analysis
Initial investigation to understand data.
Example: Plotting histograms and checking summary statistics for each column before modeling is exploratory data analysis.
66. Feature
Input variable used in modeling.
Example: In a model that predicts house prices, floor area and number of bedrooms are features.
67. Label
Output variable for prediction.
Example: In a spam filter's training data, the "spam" or "not spam" tag on each email is the label.
68. Bias
Systematic error in data or model.
Example: A survey conducted only online under-represents people without internet access, which biases the results.
69. Variance (Model)
Sensitivity of a model to changes in its training data.
Example: A model whose predictions change sharply when it is retrained on a slightly different sample has high variance.
70. Overfitting
A model learns noise in the training data instead of the underlying pattern.
Example: A model that is far more accurate on its training data than on new data is likely overfitting.
71. Underfitting
A model is too simple to capture the patterns in the data.
Example: Fitting a straight line to clearly curved data underfits, giving poor results on both training data and new data.
72. Data Engineer
A professional who builds and maintains the systems that collect, store, and move data.
Example: A data engineer might build the pipeline that loads application logs into a data warehouse.
73. Data Analyst
A professional who analyzes data to answer business questions.
Example: A data analyst might query sales data and report which regions are underperforming.
74. Data Scientist
A professional who builds predictive models and applies statistics to data.
Example: A data scientist might train a model that predicts which customers are likely to cancel.
75. Business Intelligence
Technologies and practices for analyzing business data to support decisions.
Example: Reports and dashboards that let managers track sales and costs are business intelligence.
76. Power BI
Microsoft's business analytics tool.
Example: An analyst can connect Power BI to a sales database and publish an interactive report.
77. Tableau
Interactive data visualization software.
Example: An analyst can drag fields onto a Tableau worksheet to build an interactive chart of sales by region.
78. Python
A general-purpose programming language widely used for data analysis.
Example: A short Python script can read a CSV file, compute averages, and plot the result.
79. Pandas
Python library for data manipulation.
Example: pandas loads a CSV file into a DataFrame with read_csv, after which rows can be filtered, grouped, and joined.
80. NumPy
Python library for numerical computing with arrays.
Example: NumPy can add two arrays of a million numbers element by element without an explicit loop.
81. Matplotlib
Python library for creating charts and plots.
Example: Matplotlib's pyplot.plot function draws a line chart from lists of x and y values.
82. Seaborn
Python statistical visualization library built on Matplotlib.
Example: Seaborn can draw a box plot for each category of a DataFrame column in a single call.
83. CSV
Comma-separated values: a plain-text file format with one record per line and fields separated by commas.
Example: A file whose first line is name,age and whose second line is Ana,31 is a CSV file.
84. JSON
JavaScript Object Notation: a lightweight text format for storing and exchanging data.
Example: {"name": "Ana", "age": 31} is a JSON object.
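CSV and JSON can both be read with Python's standard library, and the difference between them shows up in the parsed result. The sample text below is made up for the sketch.

```python
import csv
import io
import json

csv_text = "name,age\nAna,31\nBen,45\n"
json_text = '[{"name": "Ana", "age": 31}, {"name": "Ben", "age": 45, "tags": ["vip"]}]'

# CSV: every field arrives as a string, and every row has the same columns.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: values keep their types, and records may carry extra or nested fields.
json_rows = json.loads(json_text)

print(csv_rows[0]["age"], type(csv_rows[0]["age"]).__name__)
print(json_rows[0]["age"], type(json_rows[0]["age"]).__name__)
print(json_rows[1]["tags"])
```

The CSV reader returns the age as text, so it has to be converted before arithmetic, while the JSON parser returns it as a number.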
85. Metadata
Information describing data.
Example: A photo's capture date, camera model, and file size are metadata about the image.
86. Time Series Data
Data recorded over time.
Example: Hourly temperature readings from a weather station form a time series.
87. Real-time Data
Data that is delivered and processed with minimal delay after it is generated.
Example: Live vehicle positions on a ride-hailing map are real-time data.
88. Structured Query
SQL request to retrieve data.
Example: SELECT * FROM orders WHERE total > 100; is a structured query that retrieves the orders above a given amount.
89. Data Governance
Policies and processes for managing data access, quality, and security.
Example: A rule that only the finance team may view salary data is part of data governance.
90. Data Quality
Accuracy and reliability of data.
Example: A customer table with valid emails, no duplicate rows, and up-to-date addresses has high data quality.
91. Anomaly
Unexpected or unusual data point.
Example: A sudden spike in login attempts at 3 a.m. is an anomaly that may indicate an attack.
92. Feature Engineering
Creating new features to improve models.
Example: Deriving a day_of_week column from a timestamp to help a model predict demand is feature engineering.
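A small sketch of feature engineering: deriving new columns from a raw timestamp. The field names (`timestamp`, `day_of_week`, `is_weekend`, `hour`) are hypothetical and chosen only for illustration.

```python
from datetime import datetime


def add_time_features(row):
    """Return a copy of the row with features derived from its ISO timestamp."""
    ts = datetime.fromisoformat(row["timestamp"])
    return {
        **row,
        "day_of_week": ts.weekday(),     # Monday is 0, Sunday is 6
        "is_weekend": ts.weekday() >= 5,
        "hour": ts.hour,
    }


order = {"order_id": 7, "timestamp": "2024-03-02T14:30:00"}  # a Saturday
print(add_time_features(order))
```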
93. Segmentation
Dividing data into groups that share characteristics.
Example: Splitting customers into new, returning, and lapsed groups is segmentation.
94. Clustering
Grouping similar data points.
Example: Running k-means on purchase behavior to group similar customers, without predefined group labels, is clustering.
95. Classification
Predicting which category an item belongs to.
Example: Deciding whether an email is spam or not spam is classification.
96. Prediction
Estimating unknown or future values from known data.
Example: Estimating next month's sales from past sales is a prediction.
97. Benchmark
Standard used for comparison.
Example: Comparing a store's conversion rate with the industry average uses that average as a benchmark.
98. Automation
Using tools to perform repetitive tasks with little or no manual effort.
Example: Scheduling a script to refresh a report every morning is automation.
99. Tokenization
Breaking text into words or pieces.
Example: Tokenizing "Data is useful" by word gives the tokens "Data", "is", and "useful".
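A very simple word-level tokenizer can be written with a regular expression. This is only a sketch: lowercasing and splitting on non-word characters are assumptions, and practical tokenizers handle punctuation, contractions, and subword pieces more carefully.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens, discarding punctuation."""
    return re.findall(r"\w+", text.lower())


print(tokenize("Data cleaning takes time."))
```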
100. A/B Testing
Comparing two versions to see which performs better.
Example: Showing half of the visitors a green button and the other half a blue one, then comparing click rates, is an A/B test.
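One common way to judge an A/B test on conversion rates is a two-proportion z-test. The sketch below is a minimal version with invented visitor and conversion counts; the function name is illustrative, and other tests or a Bayesian analysis may suit a given experiment better.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate if the versions were identical
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: version A converts 100 of 1000 visitors, version B 130 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between the two versions is unlikely to be due to chance alone; the threshold for "small" (often 0.05) should be fixed before the test is run.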