
Data Concepts and Terminology Guide

The document provides definitions and examples for key data-related concepts, including data, information, datasets, and various data processing techniques. It covers topics such as data types, data cleaning, data analysis, and visualization tools, along with roles like data engineers and data scientists. Additionally, it discusses methods for data manipulation, statistical analysis, and performance measurement.

Uploaded by hisham985375
Copyright © All Rights Reserved

1. Data
Raw facts and figures.
Example: The entries 120, "Blue", and 2024-05-01 in a sales log are raw data until someone interprets them.

2. Information
Processed and meaningful data.
Example: "Blue shirts sold 120 units in May" is information derived from raw sales records.

3. Dataset
A structured collection of related data.
Example: A spreadsheet of all customer orders for one year is a dataset.

4. Variable
A characteristic that can change.
Example: Age, price, and city are variables because their values differ from one record to the next.

5. Observation
A single record or row in a dataset.
Example: One row describing a single customer order is an observation.

6. Attribute
A column describing a feature of data.
Example: The "price" column in an orders table is an attribute.

7. Data Type
The category of data, such as integer, float, or string.
Example: A quantity column holds integers, a price column holds floats, and a name column holds strings.

8. Structured Data
Organized data stored in rows and columns.
Example: A database table of employees, with one column per field and one row per employee, is structured data.

9. Unstructured Data
Data without a fixed format, such as text or images.
Example: Customer support emails and product photos are unstructured data.

10. Semi-structured Data
Partially organized data, such as JSON or XML.
Example: A JSON API response has named fields, but different records can contain different or nested fields.

11. Data Cleaning
Fixing errors, removing duplicates, and handling missing values.
Example: Correcting misspelled city names, deleting repeated orders, and filling empty price cells in a sales file.
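The three cleaning steps in this definition can be sketched in plain Python. This is a minimal illustration, not a standard recipe. The `orders` records and the `clean_records` helper are hypothetical. Filling gaps with the median is only one option; dropping incomplete rows or using the mean are common alternatives.

```python
from statistics import median


def clean_records(records):
    """Strip text fields, drop exact duplicates, and fill missing prices."""
    # Step 1: strip stray whitespace so "Paris " and "Paris" compare equal.
    rows = [
        {key: value.strip() if isinstance(value, str) else value
         for key, value in row.items()}
        for row in records
    ]

    # Step 2: drop exact duplicate rows, keeping the first occurrence.
    seen = set()
    unique = []
    for row in rows:
        signature = tuple(sorted(row.items()))
        if signature not in seen:
            seen.add(signature)
            unique.append(row)

    # Step 3: fill missing prices (None) with the median of the known prices.
    known = [row["price"] for row in unique if row["price"] is not None]
    fill_value = median(known) if known else None
    return [
        {**row, "price": fill_value if row["price"] is None else row["price"]}
        for row in unique
    ]


# Hypothetical input records.
orders = [
    {"id": 1, "city": "Paris ", "price": 10.0},
    {"id": 1, "city": "Paris", "price": 10.0},   # duplicate once whitespace is stripped
    {"id": 2, "city": "Berlin", "price": None},  # missing price
    {"id": 3, "city": "Rome", "price": 30.0},
]
cleaned = clean_records(orders)
```

In pandas, the corresponding operations would typically be `str.strip()`, `drop_duplicates()`, and `fillna()`.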

12. Data Wrangling
Transforming raw data into a usable format.
Example: Merging monthly CSV exports, renaming inconsistent columns, and reshaping them into one analysis-ready table.

13. Data Transformation
Changing the structure or format of data.
Example: Converting date strings such as "2024-05-01" into date values, or reshaping a wide table into a long one.

14. Preprocessing
Preparing data before analysis.
Example: Before training a model, text is lowercased, missing values are filled, and numeric columns are scaled.

15. Missing Values
Blank or null values in a dataset.
Example: A customer record with an empty email field has a missing value.

16. Outliers
Values that differ significantly from the rest of the data.
Example: An order of 10,000 units in a table where most orders are under 50 units is an outlier.
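The definition does not say what "differs significantly" means in practice. This sketch assumes one widely used rule of thumb, Tukey's fences; the text itself does not name it. A value is flagged when it lies more than 1.5 interquartile ranges (IQR) below the first quartile or above the third. The sketch relies on `statistics.quantiles` (Python 3.8+). The `iqr_outliers` helper and the sample values are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical order sizes: most are small, one is extreme.
order_sizes = [12, 15, 14, 10, 13, 11, 16, 10000]
flagged = iqr_outliers(order_sizes)
```

Tools compute quartiles in slightly different ways, so borderline values can be flagged by one tool and not by another.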

17. Duplicate Data
Repeated records in a dataset.
Example: The same order appears twice because a file was imported twice.

18. Normalization
Scaling data into a specific range, such as 0 to 1.
Example: Min-max normalization maps ages from 18 to 90 onto the range 0 to 1, so age 18 becomes 0 and age 90 becomes 1.

19. Standardization
Rescaling data to mean 0 and standard deviation 1.
Example: After standardization, a test score one standard deviation above the mean becomes 1.
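Normalization (entry 18) and standardization are easiest to compare side by side. This minimal sketch assumes min-max normalization, x' = (x - min) / (max - min), and z-scores built on the population standard deviation, z = (x - mean) / std. Both helper names are hypothetical. Constant columns are rejected because both formulas would divide by zero.

```python
from statistics import mean, pstdev


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale linearly so min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def standardize(values):
    """Convert values to z-scores using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


ages = [18, 36, 54, 72, 90]  # hypothetical sample
normalized = min_max_normalize(ages)
z_scores = standardize(ages)
```

Normalization keeps values inside a fixed range. Standardization does not bound the values, but it centers them at 0 with unit spread.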

20. Filtering
Selecting rows based on conditions.
Example: Keeping only the orders placed in 2024, or only the customers located in Germany.

21. Sorting
Arranging data in ascending or descending order.
Example: Listing products from highest to lowest revenue.

22. Aggregation
Summarizing data using functions like sum or average.
Example: Calculating total sales per region or average order value per month.
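Filtering, sorting, and aggregation (entries 20 to 22) are often chained together. Here is a minimal standard-library sketch over hypothetical sales records. The field names `region`, `year`, and `amount` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sales records used only for illustration.
sales = [
    {"region": "North", "year": 2024, "amount": 120.0},
    {"region": "South", "year": 2023, "amount": 80.0},
    {"region": "North", "year": 2023, "amount": 50.0},
    {"region": "South", "year": 2024, "amount": 200.0},
    {"region": "East", "year": 2024, "amount": 75.0},
]

# Filtering: keep only rows that satisfy a condition.
sales_2024 = [row for row in sales if row["year"] == 2024]

# Sorting: order the filtered rows by amount, largest first.
by_amount = sorted(sales_2024, key=lambda row: row["amount"], reverse=True)

# Aggregation: total and average amount per region over all years.
totals = defaultdict(float)
counts = defaultdict(int)
for row in sales:
    totals[row["region"]] += row["amount"]
    counts[row["region"]] += 1
averages = {region: totals[region] / counts[region] for region in totals}
```

In pandas the same steps are usually written with a boolean mask such as `df[df["year"] == 2024]`, then `sort_values`, then `groupby(...).agg(...)`.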

23. Pivot Table
A table that summarizes a large dataset by grouping rows and aggregating values.
Example: A pivot table with regions as rows, months as columns, and total sales in each cell.
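The core of a pivot table is grouping by two keys and aggregating the values in each cell. This is a minimal sketch under the assumption that the aggregate is a sum. The data and the `pivot_sum` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, month, amount).
sales = [
    ("North", "Jan", 100), ("North", "Feb", 150),
    ("South", "Jan", 80), ("South", "Jan", 20), ("South", "Feb", 60),
]


def pivot_sum(rows):
    """Group by region (rows) and month (columns), summing amounts per cell."""
    table = defaultdict(lambda: defaultdict(int))
    for region, month, amount in rows:
        table[region][month] += amount
    # Convert the nested defaultdicts to plain dicts for display.
    return {region: dict(months) for region, months in table.items()}


pivot = pivot_sum(sales)
for region, cells in pivot.items():
    print(region, cells)
```

With pandas, a comparable result comes from `df.pivot_table(index="region", columns="month", values="amount", aggfunc="sum")`.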

24. Measure
A numeric data value used in analysis.
Example: Sales amount and quantity sold are measures that can be summed or averaged.

25. Dimension
A category such as time, product, or region.
Example: Sales can be broken down by the dimensions region, product, and month.

26. KPI
A key performance indicator: a metric tied to a business goal and used to measure success.
Example: Monthly revenue growth or customer retention rate, tracked against a target.

27. Metric
A measurable value used to track performance.
Example: Daily website page views is a metric; it becomes a KPI only when it is tied to a key business goal.

28. Trend
Pattern or movement in data over time.
Example: Online sales rising steadily month over month show an upward trend.

29. Correlation
A statistical relationship between two variables, where they tend to change together.
Example: Ice cream sales and temperature tend to rise and fall together, so they are positively correlated.
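The definition does not name a specific measure. This sketch assumes the most common one, Pearson's correlation coefficient r, which measures linear correlation. It ranges from -1 (perfect negative) through 0 (no linear relationship) to 1 (perfect positive). The temperature and sales figures are hypothetical. Python 3.10+ also provides `statistics.correlation`, but the calculation is written out here to show the formula.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient r, between -1 and 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/n factors in covariance and standard deviations cancel, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical daily temperatures (°C) and ice cream sales (units).
temperature = [18, 21, 24, 27, 30]
ice_cream = [110, 135, 160, 190, 205]
r = pearson(temperature, ice_cream)
```

A value of r close to 1 shows the two series move together. On its own it says nothing about causation (entry 30).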

30. Causation
One variable directly producing a change in another.
Example: Ice cream sales and drowning incidents are correlated, but both are driven by hot weather; neither causes the other.

31. Regression
Predicting a numeric value from one or more input variables using a mathematical relationship.
Example: Estimating a house's price from its size and location.
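The simplest case is linear regression with one input, fitted by ordinary least squares. The definition does not specify the method; this sketch assumes least squares. The fitted line is y = slope * x + intercept, with slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)². The data and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical advertising spend (thousands) and units sold.
spend = [1, 2, 3, 4, 5]
units_sold = [3, 5, 7, 9, 11]
slope, intercept = fit_line(spend, units_sold)
predicted = slope * 6 + intercept  # prediction for a spend of 6
```

Real data rarely falls exactly on a line. The same formulas still give the line that minimizes the sum of squared errors.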

32. Model
A representation of a real-world process.
Example: A regression equation that estimates sales from advertising spend is a model.

33. Hypothesis
A testable statement about data or a population.
Example: "Customers who receive a discount email spend more than those who do not."

34. Hypothesis Testing
Evaluating a hypothesis using statistical tests.
Example: Using a t-test to check whether the difference in average spend between two customer groups is statistically significant.

35. Population
The entire group being studied.
Example: All customers of an online store.

36. Sample
A subset of the population.
Example: 500 customers randomly selected from the store's full customer base for a survey.

37. Mean
The arithmetic average: the sum of the values divided by their count.
Example: The mean of 2, 4, and 9 is (2 + 4 + 9) / 3 = 5.

38. Median
The middle value in sorted data, or the average of the two middle values when the count is even.
Example: The median of 3, 1, 7 is 3; the median of 1, 2, 3, 10 is 2.5.

39. Mode
The most frequently occurring value.
Example: In 2, 3, 3, 5, the mode is 3.

40. Variance
How far values spread out from the mean, measured as the average squared deviation from the mean.
Example: For 2, 4, 6 (mean 4), the population variance is ((-2)² + 0² + 2²) / 3 ≈ 2.67.

41. Standard Deviation
The square root of the variance; a measure of spread in the same units as the data.
Example: If exam scores have a variance of 25, their standard deviation is 5 points.
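Entries 37 to 41 correspond directly to functions in Python's standard `statistics` module, as this sketch shows. One distinction the definitions above do not make is population versus sample variance. Population variance divides the sum of squared deviations by n. Sample variance divides by n - 1 and is used when the data is a sample that stands in for a larger population. The values are hypothetical.

```python
import statistics as st

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

summary = {
    "mean": st.mean(values),
    "median": st.median(values),                # average of the two middle values here
    "mode": st.mode(values),
    "population_variance": st.pvariance(values),  # divides by n
    "population_std": st.pstdev(values),
    "sample_variance": st.variance(values),       # divides by n - 1
    "sample_std": st.stdev(values),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```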

42. Distribution
How data values are spread.
Example: The distribution of customer ages shows how many customers fall into each age range, for instance whether most are between 25 and 40.

43. Histogram
A graph showing frequency distribution.
Example: Bars showing how many orders fall into the ranges $0-10, $10-20, and so on.

44. Bar Chart
A chart comparing categories.
Example: One bar per region, with bar height showing total sales.

45. Pie Chart
A circular chart showing proportions.
Example: A pie chart showing each product category's share of total revenue.

46. Line Chart
A chart showing trends over time.
Example: Daily website visitors plotted over the past year.

47. Scatter Plot
A plot showing the relationship between two numeric variables.
Example: Advertising spend plotted against sales, with one point per month.

48. Dashboard
A visual summary of key metrics.
Example: A single screen showing revenue, active users, and conversion rate, refreshed daily.

49. Visualization
Graphical representation of data.
Example: Turning a table of monthly sales into a line chart.

50. Insights
Meaningful findings extracted from data.
Example: Discovering that cancellations cluster in customers' first month would be an insight from subscription data.

51. SQL
Structured Query Language, used to query and manage relational databases.
Example: SELECT name FROM customers WHERE country = 'Germany'; returns the names of customers located in Germany.

52. Query
A request for data from a database.
Example: A request for all orders placed in the last 30 days.

53. Table
Structured data stored in rows and columns.
Example: A customers table with the columns id, name, and email.

54. Schema
The structure of a database.
Example: A schema defines which tables exist, their columns and data types, and how the tables relate to each other.

55. Primary Key
A unique identifier for a row.
Example: customer_id uniquely identifies each row in a customers table.

56. Foreign Key
A column in one table that refers to the primary key of another table, linking the two.
Example: orders.customer_id points to customers.customer_id.

57. Join
Combining data from multiple tables.
Example: Joining orders with customers on customer_id to show each order next to the customer's name.
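Entries 51 to 57 (SQL, query, table, schema, primary key, foreign key, join) fit together in one small example. This sketch uses Python's built-in `sqlite3` module with an in-memory database. The table and column names and the rows are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, so the sketch turns that on first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Schema: two tables linked by a primary key / foreign key pair.
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Query with a join: total order amount per customer name.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()

# The foreign key rejects an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```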

58. Index
A data structure that speeds up searches in a database.
Example: An index on customers.email lets the database find a customer by email without scanning every row.
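One way to see whether a database uses an index is to ask for its query plan. This sketch uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module. The table and index names are hypothetical. The exact wording of the plan differs between SQLite versions, so the sketch only checks that the index name appears in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")

# EXPLAIN QUERY PLAN describes how SQLite intends to run the query;
# the last column of each row is a human-readable detail string.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT customer_id FROM customers WHERE email = ?",
    ("alice@example.com",),
).fetchall()
uses_index = any("idx_customers_email" in row[-1] for row in plan)
for row in plan:
    print(row[-1])
```

Indexes speed up reads, but every insert or update must also maintain them. They are usually added only for columns that are searched or joined often.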

59. NoSQL
Non-relational database systems, such as document, key-value, or graph stores.
Example: MongoDB stores records as JSON-like documents instead of rows in tables.

60. API
An interface that allows systems to exchange data.
Example: A weather service's API returns the current temperature as JSON when an application requests it.

61. ETL
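Extract, Transform, Load: a process that extracts data from sources, transforms it, and loads it into a target system.
Example: Extracting sales from CSV files, cleaning and combining them, and loading the result into a data warehouse.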
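The three ETL stages can be sketched as three small functions. This is a minimal illustration under stated assumptions. An in-memory CSV string stands in for a source file and an in-memory SQLite database stands in for the warehouse. The column names, the `extract`/`transform`/`load` helpers, and the cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data: inconsistent region names and one missing amount.
RAW_CSV = """date,region,amount
2024-01-05,north,120.50
2024-01-06,South,
2024-01-07,south,80
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize region names, drop rows without an amount, convert types."""
    out = []
    for row in rows:
        if not row["amount"]:
            continue
        out.append((row["date"], row["region"].strip().title(), float(row["amount"])))
    return out


def load(records, conn):
    """Load: write the cleaned records into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (date TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT region, amount FROM sales ORDER BY date").fetchall()
```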

62. Data Pipeline
An automated flow of data from sources, through processing steps, to a destination.
Example: A nightly job that pulls new orders, cleans them, and updates a reporting dashboard.

63. Big Data
Extremely large datasets that are difficult to process with traditional tools.
Example: Clickstream logs from millions of website visitors each day.

64. Analytics
The study of data to find patterns and insights.
Example: Analyzing purchase histories to find which products are often bought together.

65. Exploratory Data Analysis
An initial investigation to understand a dataset's structure, quality, and patterns.
Example: Checking summary statistics, missing values, and histograms before building a model.

66. Feature
Input variable used in modeling.
Example: Square footage and number of bedrooms are features in a house-price model.

67. Label
The output variable that a model learns to predict.
Example: The sale price is the label in a house-price model.

68. Bias
Systematic error in data or model.
Example: A survey conducted only online under-represents people without internet access.

69. Variance (Model)
How sensitive a model's predictions are to changes in the training data.
Example: A deep decision tree whose predictions change a lot when it is trained on a slightly different sample has high variance.

70. Overfitting
A model learns the noise in the training data instead of the underlying pattern.
Example: A model that scores almost perfectly on training data but poorly on new data is overfitting.

71. Underfitting
A model is too simple to capture the underlying patterns.
Example: A straight line fitted to data with a clearly curved pattern underfits.

72. Data Engineer
A professional who builds and maintains data systems.
Example: Building the pipelines that load daily sales data into a data warehouse.

73. Data Analyst
A professional who analyzes data to answer questions.
Example: Creating a report that explains why sales dropped in one region.

74. Data Scientist
A professional who builds predictive models.
Example: Developing a model that predicts which customers are likely to cancel a subscription.

75. Business Intelligence
Technologies and practices for analyzing business data.
Example: Company-wide dashboards and reports built in tools such as Power BI or Tableau.

76. Power BI
Microsoft's business analytics tool.
Example: Building an interactive sales dashboard connected to an Excel file or a database.

77. Tableau
Interactive data visualization software.
Example: Creating an interactive map of sales by country.

78. Python
A general-purpose programming language widely used for data work.
Example: Writing a script that reads a CSV file and calculates summary statistics.

79. Pandas
Python library for data manipulation.
Example: Loading a CSV file into a DataFrame with pd.read_csv and grouping it by region with groupby.

80. NumPy
Python library for numerical computing with arrays.
Example: Computing the mean of a large array of numbers with np.mean.

81. Matplotlib
Python library for creating charts, mainly 2D.
Example: Drawing a line chart of monthly sales with plt.plot.

82. Seaborn
Python statistical visualization library built on Matplotlib.
Example: Plotting the distribution of a column with sns.histplot.

83. CSV
Comma-separated values file format.
Example: A file whose first line is name,age,city and whose next line is Alice,30,Paris.
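A common CSV pitfall is a field that itself contains a comma. Such a field must be wrapped in quotes, and splitting lines on "," by hand breaks it. Python's standard `csv` module handles the quoting in both directions, as this sketch shows. The sample rows are hypothetical.

```python
import csv
import io

text = 'name,city,note\nAlice,Paris,"likes tea, not coffee"\nBob,Berlin,\n'

# Reading: DictReader uses the header row as keys and respects quoted fields.
rows = list(csv.DictReader(io.StringIO(text)))

# Writing: DictWriter re-quotes any field that contains a comma.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "city", "note"])
writer.writeheader()
writer.writerows(rows)
written = out.getvalue()
```

Every value read from a CSV file is a string, so numbers and dates need explicit conversion afterwards.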

84. JSON
JavaScript Object Notation, a lightweight text format for storing and exchanging data.
Example: {"name": "Alice", "age": 30}
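JSON maps naturally onto Python dictionaries and lists. This is why it works well for semi-structured data with nested or optional fields. This sketch uses the standard `json` module to parse, modify, and serialize a hypothetical customer record.

```python
import json

payload = '{"id": 7, "name": "Alice", "tags": ["vip", "newsletter"], "address": {"city": "Paris"}}'

record = json.loads(payload)        # JSON text -> Python dict
city = record["address"]["city"]    # nested objects become nested dicts
record["tags"].append("beta")       # JSON arrays become Python lists
serialized = json.dumps(record, sort_keys=True)  # Python dict -> JSON text
```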

85. Metadata
Information describing data.
Example: A file's creation date, its author, and its column descriptions are metadata.

86. Time Series Data
Data recorded at successive points in time.
Example: Hourly temperature readings or daily closing stock prices.
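A common first step with time series data is smoothing out day-to-day noise so the trend (entry 28) is easier to see. This sketch assumes a simple trailing moving average, one of several smoothing techniques. The visit counts and the `moving_average` helper are hypothetical.

```python
def moving_average(series, window):
    """Trailing simple moving average; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    # Keep a running sum: add the newest value and drop the oldest at each step.
    total = sum(series[:window])
    averages = [total / window]
    for i in range(window, len(series)):
        total += series[i] - series[i - window]
        averages.append(total / window)
    return averages


daily_visits = [100, 120, 110, 130, 150, 140, 160]  # hypothetical data
smoothed = moving_average(daily_visits, 3)
```

A larger window gives a smoother curve but reacts more slowly to real changes.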

87. Real-time Data
Data that is delivered and processed as it is generated, with little or no delay.
Example: Live GPS positions of delivery vehicles on a tracking map.

88. Structured Query
An SQL request to retrieve data.
Example: SELECT product, SUM(amount) FROM sales GROUP BY product; returns total sales per product.

89. Data Governance
Managing data access and quality.
Example: Policies that define who may view customer data and how long it is retained.

90. Data Quality
The accuracy, completeness, and reliability of data.
Example: Checking that every order has a valid date and a non-negative amount.

91. Anomaly
Unexpected or unusual data point.
Example: A sudden spike in login attempts at 3 a.m. may point to a security problem.

92. Feature Engineering
Creating new features to improve models.
Example: Deriving "day of week" from an order timestamp to help predict sales.
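Deriving date-based features, as in the example above, can be sketched with the standard `datetime` module. The order fields and the `add_date_features` helper are hypothetical. The sketch uses the numeric `weekday()` instead of a day name so the result does not depend on the system locale.

```python
from datetime import date


def add_date_features(order):
    """Return a copy of the order with features derived from its date."""
    d = date.fromisoformat(order["order_date"])
    return {
        **order,
        "weekday": d.weekday(),          # Monday = 0 through Sunday = 6
        "is_weekend": d.weekday() >= 5,  # Saturday or Sunday
        "month": d.month,
    }


features = add_date_features({"order_id": 1, "order_date": "2024-05-04", "amount": 42.0})
```

A model cannot easily learn "weekend" from a raw date string. The engineered `is_weekend` column hands it that signal directly.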

93. Segmentation
Dividing data into groups.
Example: Dividing customers into groups by age range or purchase frequency.

94. Clustering
Grouping similar data points without predefined labels.
Example: Grouping customers with similar purchasing behavior using an algorithm such as k-means.
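k-means is one common clustering algorithm. This sketch shows its basic form (Lloyd's algorithm): assign each point to the nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. It assumes the starting centroids are supplied by hand. Practical implementations usually choose them randomly or with k-means++. The points and the `kmeans` helper are hypothetical.

```python
from math import dist


def kmeans(points, initial_centroids, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and update steps until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # converged: assignments will not change again
            break
        centroids = updated
    return centroids, clusters


# Hypothetical customers described by two scaled behavior scores.
customers = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(customers, [(0, 0), (10, 10)])
```

If the loop stops at `max_iterations` without converging, the returned clusters come from the last assignment step.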

95. Classification
Predicting which category an item belongs to.
Example: Labeling incoming emails as spam or not spam.

96. Prediction
Forecasting future values.
Example: In a real dataset, prediction is applied when forecasting future values.

97. Benchmark
Standard used for comparison.
Example: Comparing a new model's accuracy against a simple baseline model.

98. Automation
Using tools or scripts to perform repetitive tasks without manual effort.
Example: A scheduled script that refreshes a sales report every morning.

99. Tokenization
Breaking text into smaller units (tokens), such as words, subwords, or punctuation marks.
Example: "Data is useful." becomes the tokens "Data", "is", "useful", and ".".

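A minimal word-level tokenizer can be built from one regular expression. This is a sketch under a simplifying assumption: a token is either a run of word characters or a single punctuation mark. Under that rule "don't" splits into "don", "'", "t". NLP libraries and the subword tokenizers used by language models apply more elaborate rules. The `tokenize` helper is hypothetical.

```python
import re

# A token is a run of word characters, or any single non-space, non-word character.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Data is useful.")
```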
100. A/B Testing
Comparing two versions to see which performs better.
Example: Half of the visitors see a green button and half see a blue one, and their click-through rates are compared.
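Deciding whether version B really performs better is a hypothesis test (entry 34). For conversion rates, one standard choice is the two-proportion z-test. This sketch assumes that test and its normal approximation, which is reasonable only when both groups are large. Other tests, such as chi-squared or Fisher's exact test, are also used. The visitor counts and the helper name are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero (all or no visitors converted)")
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value


# Hypothetical results: A converts 200 of 2000 visitors, B converts 260 of 2000.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
```

A small p-value (commonly below 0.05) suggests the difference is unlikely to be due to chance alone. It says nothing about whether the difference is large enough to matter for the business.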
