
7 Mark Answers Data Analytics

The document provides an overview of data science, its applications in business, and the differences between data analytics and data analysis. It covers various types of analytics, the concept of big data, and the importance of data preparation, visualization, and handling missing data. Additionally, it discusses R programming, regression models, textual data analysis, and key concepts such as correlation, multicollinearity, and the significance of statistical coefficients.

Uploaded by

riya1203m

7-Mark Answers: Data Analytics & R Questions

Concept of Data Science and its use in Business


Data Science is the process of collecting, organizing, analyzing, and interpreting data to
extract useful insights. It combines statistics, programming, machine learning, and business
knowledge.

Uses in Business:
1. Helps in decision-making using data.
2. Improves customer experience through recommendations.
3. Predicts sales and market trends.
4. Detects fraud in banking and finance.
5. Optimizes inventory and supply chain.
6. Supports targeted marketing.
7. Increases efficiency and profit.

Example: Amazon uses data science to recommend products to customers.

Data Analytics vs Data Analysis


Data Analysis focuses on examining data to find conclusions. Data Analytics is broader and
includes data collection, processing, prediction, and decision-making.

Differences:
1. Data Analysis studies past data.
2. Data Analytics uses tools and models for future prediction.
3. Analysis is a part of Analytics.
4. Analysis gives insights; Analytics supports business strategy.
5. Analytics includes machine learning and forecasting.

Example: Sales report checking is analysis, while predicting future sales is analytics.

Types of Analytics and real-life uses


1. Descriptive Analytics – explains what happened.
Example: Monthly sales reports.

2. Diagnostic Analytics – explains why it happened.
Example: Finding reasons for a decrease in sales.

3. Predictive Analytics – predicts future outcomes.
Example: Weather forecasting or stock prediction.

4. Prescriptive Analytics – suggests actions.
Example: Google Maps suggesting the fastest route.

These analytics help businesses make better decisions.

Define Big Data, characteristics, applications and challenges


Big Data refers to extremely large and complex data that cannot be managed using
traditional methods.

Characteristics (5Vs):
1. Volume – huge amount of data.
2. Velocity – fast speed of generation.
3. Variety – different data types.
4. Veracity – data accuracy.
5. Value – usefulness of data.

Applications:
- Healthcare
- Banking
- E-commerce
- Social media
- Education

Challenges:
- Data security
- Storage issues
- Data quality
- Processing speed
- Privacy concerns

How does classification of analytics help in decision-making?


Classification of analytics helps organizations understand past performance, identify
problems, predict future trends, and choose the best actions.

1. Descriptive analytics gives past information.
2. Diagnostic analytics identifies causes.
3. Predictive analytics forecasts future outcomes.
4. Prescriptive analytics recommends solutions.

Benefits:
- Better planning
- Faster decisions
- Reduced risks
- Improved business performance

Example: Companies predict customer demand before launching products.

Process of data preparation and cleaning in spreadsheet


Data preparation and cleaning means organizing and correcting raw data before analysis.

Steps:
1. Remove duplicate records.
2. Handle missing values.
3. Correct spelling and formatting errors.
4. Standardize data format.
5. Remove unnecessary columns.
6. Check outliers.
7. Validate data accuracy.

Importance:
- Improves data quality.
- Gives accurate results.
- Reduces errors in analysis.
- Saves time during reporting.
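
The same cleaning steps can also be scripted outside a spreadsheet. The sketch below is a minimal base R illustration, assuming a small made-up table; the column names name, city, and sales are hypothetical, and median replacement is only one of several possible choices for missing values.

```r
# Hypothetical raw data with a duplicate row, inconsistent text, and a missing value
raw <- data.frame(
  name  = c("Asha", "Ravi", "Ravi", "Meena"),
  city  = c(" Pune", "mumbai", "mumbai", "PUNE "),
  sales = c(120, 95, 95, NA),
  stringsAsFactors = FALSE
)

# Standardize text format: trim stray spaces and use one letter case
raw$city <- tolower(trimws(raw$city))

# Remove duplicate records (keeps the first copy of each repeated row)
clean <- raw[!duplicated(raw), ]

# Handle missing values: here NA is replaced with the column median
clean$sales[is.na(clean$sales)] <- median(clean$sales, na.rm = TRUE)

print(clean)
```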

How can outliers be identified in a dataset using spreadsheet?


Outliers are unusual values that are very different from other data.

Methods:
1. Sort data to find extreme values.
2. Use conditional formatting.
3. Create box plots or charts.
4. Use formulas like mean and standard deviation.
5. Apply IQR method.

Steps in spreadsheet:
- Calculate Q1 and Q3.
- Find IQR = Q3 – Q1.
- Values below Q1 – 1.5 × IQR or above Q3 + 1.5 × IQR are outliers.

Outlier detection improves accuracy of analysis.
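
The IQR rule can be checked with a short calculation. This is a minimal sketch in base R, assuming made-up sample values; spreadsheet quartile functions may interpolate slightly differently, so the fences can differ a little between tools.

```r
# Hypothetical sample values; 95 is deliberately extreme
values <- c(12, 15, 14, 13, 16, 15, 14, 95)

q1  <- quantile(values, 0.25, names = FALSE)   # first quartile
q3  <- quantile(values, 0.75, names = FALSE)   # third quartile
iqr <- q3 - q1                                 # interquartile range

lower <- q1 - 1.5 * iqr   # lower fence
upper <- q3 + 1.5 * iqr   # upper fence

# Values outside the fences are flagged as outliers
outliers <- values[values < lower | values > upper]
print(outliers)
```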

Use of Pivot Table and Pivot Charts


Pivot Tables summarize large data quickly.

Uses:
1. Group data easily.
2. Calculate totals, averages, and counts.
3. Filter information.
4. Compare categories.
5. Create reports quickly.

Pivot Charts visually represent Pivot Table data using graphs.

Benefits:
- Easy visualization
- Better understanding
- Faster analysis
- Supports decision-making
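
For comparison, the kind of summary a Pivot Table produces can be sketched in base R. The data and the column names region, product, and amount are hypothetical; aggregate() and xtabs() are used here only as rough equivalents, not as how a spreadsheet works internally.

```r
# Hypothetical sales records
sales <- data.frame(
  region  = c("North", "South", "North", "South", "North"),
  product = c("A", "A", "B", "B", "A"),
  amount  = c(100, 150, 200, 50, 300)
)

# Total amount per region (region in rows, sum of amount as the value)
totals <- aggregate(amount ~ region, data = sales, FUN = sum)
print(totals)

# Region x product cross-table of totals (rows and columns, as in a pivot layout)
pivot <- xtabs(amount ~ region + product, data = sales)
print(pivot)
```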

Handling missing data in spreadsheet


Missing data refers to blank or unavailable values in a dataset.

Methods:
1. Remove incomplete rows.
2. Replace with mean or median.
3. Carry the previous value forward (forward fill).
4. Fill manually if possible.
5. Use formulas or interpolation.

Implications:
- Incorrect analysis
- Biased results
- Reduced accuracy
- Wrong business decisions

Proper handling improves data quality.
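
Three of the methods above (mean replacement, using previous values, and interpolation) can be compared on one small series. This is a minimal base R sketch with made-up readings; the forward-fill loop assumes the first value is not missing.

```r
# Hypothetical daily readings with gaps
x <- c(10, NA, 14, NA, NA, 20)

# Replace missing values with the mean of the observed values
x_mean <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

# Carry the previous value forward (assumes the first value is present)
x_ffill <- x
for (i in seq_along(x_ffill)[-1]) {
  if (is.na(x_ffill[i])) x_ffill[i] <- x_ffill[i - 1]
}

# Linear interpolation between the neighbouring known values
x_interp <- approx(seq_along(x), x, xout = seq_along(x))$y

print(x_mean)
print(x_ffill)
print(x_interp)
```

Mean replacement keeps the average unchanged but flattens variation, while forward fill and interpolation respect the order of the data, so they suit time-ordered values better.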

Interactive dashboard in spreadsheet


An interactive dashboard is a visual display of key information using charts, tables, and
filters.

Steps:
1. Organize data.
2. Create Pivot Tables.
3. Add charts and graphs.
4. Use slicers and filters.
5. Apply conditional formatting.
6. Design clear layout.

Benefits:
- Real-time insights
- Easy monitoring
- Better decision-making
- User-friendly visualization

Role of scatter plots, line charts and histograms


Scatter Plot:
Shows relationship between two variables.

Line Chart:
Shows trends over time.

Histogram:
Shows frequency distribution of data.

Importance:
- Identifies patterns
- Detects trends
- Helps comparison
- Makes data easy to understand
- Supports analysis and forecasting

Techniques available in spreadsheet for data visualization


1. Bar charts
2. Pie charts
3. Line charts
4. Histograms
5. Scatter plots
6. Pivot charts
7. Conditional formatting

Importance:
- Improves understanding
- Highlights trends
- Detects errors
- Helps accurate reporting
- Supports decision-making

What is R? How do we install an R package?


R is a programming language used for statistics, data analysis, and visualization.

Features:
- Open-source
- Statistical computing
- Graphical tools
- Data analysis support

Installing package:
Use command:
install.packages("package_name")

Loading package:
library(package_name)

Example:
install.packages("ggplot2")
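
A package only needs to be installed once, but it must be loaded in every session. The sketch below shows one way to install only when needed; ensure_package is a hypothetical helper name, not a built-in R function.

```r
# Hypothetical helper: install a package only if it is not already available
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from CRAN; needs an internet connection
  }
  requireNamespace(pkg, quietly = TRUE)   # TRUE when the package can be loaded
}

# "stats" ships with R, so this call does not trigger any download
ensure_package("stats")
```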

Features of R
1. Open-source software
2. Supports statistical analysis
3. Powerful data visualization
4. Large package library
5. Cross-platform support
6. Supports machine learning
7. Easy data manipulation

R is widely used in research and business analytics.

Difference between setwd() and getwd()


setwd():
Used to set/change the current working directory.

Example:
setwd("C:/Data")

getwd():
Used to display the current working directory.

Example:
getwd()

Difference:
setwd() changes location, while getwd() shows location.
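
A short sketch of the two functions used together. It assumes nothing beyond base R; tempdir() is used so that the target folder is guaranteed to exist.

```r
old_dir <- getwd()     # remember the current working directory

setwd(tempdir())       # change to the session's temporary folder
print(getwd())         # now shows the temporary folder

setwd(old_dir)         # change back to where we started
print(getwd())
```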

How do you remove NA values from a data frame?


NA values represent missing data in R.

Methods:
1. na.omit(dataframe)
2. complete.cases()
3. Replace NA with mean/median.

Example:
data <- na.omit(data)

Benefits:
- Improves accuracy
- Prevents calculation errors
- Makes analysis reliable
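
The methods can be compared on a small data frame. This is a minimal base R sketch with made-up values; the column names id, score, and age are hypothetical, and complete.cases() is shown as one common base R way to filter rows.

```r
# Hypothetical data frame with missing values
df <- data.frame(
  id    = 1:5,
  score = c(80, NA, 65, 90, NA),
  age   = c(21, 22, NA, 23, 24)
)

# Drop every row that contains at least one NA
no_na <- na.omit(df)

# complete.cases() returns TRUE for rows without NA; use it as a row filter
also_no_na <- df[complete.cases(df), ]

# Keep all rows and replace NA in one column with that column's mean
filled <- df
filled$score[is.na(filled$score)] <- mean(filled$score, na.rm = TRUE)

print(no_na)
print(filled)
```

Dropping rows is simplest but loses data (here 3 of 5 rows), so replacement is often preferred when many rows have gaps.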

Logical operators in R
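Logical operators combine or negate conditions, while relational operators compare values and return TRUE or FALSE.

Operators:
1. & → AND
2. | → OR
3. ! → NOT
4. == → Equal to
5. != → Not equal to
6. >, <, >=, <= → Greater than, less than, greater than or equal to, less than or equal to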

Example:
x > 5 & y < 10

Used in filtering and decision-making.
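
A minimal sketch of these operators on two made-up vectors, showing how the result can be used to filter values.

```r
x <- c(3, 7, 9, 12)
y <- c(4, 8, 15, 2)

both   <- x > 5 & y < 10   # AND: TRUE only where both conditions hold
either <- x > 5 | y < 10   # OR: TRUE where at least one condition holds
not_x  <- !(x > 5)         # NOT: flips TRUE and FALSE

print(both)
print(either)
print(not_x)

# A logical vector can be used directly to filter data
print(x[both])
```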

What does a histogram represent?


A histogram represents frequency distribution of continuous data.

Features:
- Bars touch each other.
- Shows spread of data.
- Displays patterns and distribution.

Uses:
- Detects skewness
- Identifies outliers
- Understands data distribution

Example: Marks distribution of students.
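
The marks example can be sketched in R. The marks below are made up, and the bin width of 10 is an assumption chosen for illustration; plot = FALSE is used so that the bin counts can be inspected without drawing.

```r
# Hypothetical marks of 12 students
marks <- c(35, 42, 48, 51, 55, 58, 62, 64, 67, 71, 78, 91)

# Count how many marks fall in each bin of width 10
h <- hist(marks, breaks = seq(30, 100, by = 10), plot = FALSE)
print(h$counts)

# Remove plot = FALSE to draw the histogram itself:
# hist(marks, breaks = seq(30, 100, by = 10), main = "Marks distribution")
```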


Difference between bar chart and histogram
Bar Chart:
- Used for categorical data.
- Bars are separated.
- Compares categories.

Histogram:
- Used for continuous data.
- Bars touch each other.
- Shows frequency distribution.

Example:
Bar chart for subjects, histogram for marks distribution.

When is a line graph most appropriate?


A line graph is best used to show trends and changes over time.

Uses:
- Stock prices
- Temperature changes
- Monthly sales
- Population growth

Advantages:
- Easy trend analysis
- Shows increase/decrease clearly
- Useful for forecasting

Correlation vs Covariance
Correlation measures strength and direction of relationship between variables.

Covariance measures how two variables vary together.

Differences:
1. Correlation ranges from -1 to +1.
2. Covariance has no fixed range.
3. Correlation is standardized.
4. Covariance depends on units.

Correlation is easier to interpret.
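
The effect of units can be seen directly. In this minimal sketch with made-up heights and weights, changing height from centimetres to metres changes the covariance but leaves the correlation the same.

```r
# Hypothetical heights (cm) and weights (kg)
height_cm <- c(150, 160, 165, 172, 180)
weight_kg <- c(50, 56, 61, 68, 77)

# Same heights expressed in metres
height_m <- height_cm / 100

cov_cm <- cov(height_cm, weight_kg)
cov_m  <- cov(height_m,  weight_kg)   # 100 times smaller: depends on units

cor_cm <- cor(height_cm, weight_kg)
cor_m  <- cor(height_m,  weight_kg)   # unchanged: correlation is standardized

print(c(cov_cm, cov_m))
print(c(cor_cm, cor_m))
```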


Linear Regression Model
Linear Regression shows relationship between dependent and independent variables.

Equation:
Y = a + bX

Where:
Y = dependent variable
X = independent variable
a = intercept
b = slope

Uses:
- Sales prediction
- Trend analysis
- Forecasting

Advantages:
- Simple
- Easy interpretation
- Useful for prediction
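
The equation Y = a + bX can be fitted in R with lm(). This is a minimal sketch with made-up data; the variable names ad_spend and sales are hypothetical.

```r
# Hypothetical data: advertising spend (X) and sales (Y)
ad_spend <- c(1, 2, 3, 4, 5)
sales    <- c(12, 15, 19, 21, 26)

fit <- lm(sales ~ ad_spend)

a <- coef(fit)[["(Intercept)"]]   # intercept a
b <- coef(fit)[["ad_spend"]]      # slope b
print(c(a = a, b = b))

# Predict Y for a new X using Y = a + bX
new_sales <- predict(fit, newdata = data.frame(ad_spend = 6))
print(new_sales)
```

For this data the fitted line is Y = 8.4 + 3.4X, so each extra unit of spend is associated with 3.4 more units of sales.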

Multiple Regression
Multiple Regression uses two or more independent variables to predict one dependent
variable.

Equation:
Y = a + b1X1 + b2X2 + ...

Example:
Predicting house price using size, location, and age.

Advantages:
- Better accuracy
- Handles multiple factors
- Useful in business forecasting
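
A minimal sketch of the house price example in R, assuming made-up data. Only size and age are used here; location is left out because it is a category and would need to be coded as a factor.

```r
# Hypothetical houses: size (sq ft), age (years), price (in thousands)
houses <- data.frame(
  size  = c(800, 1000, 1200, 1500, 1800, 2000),
  age   = c(20, 15, 10, 8, 5, 2),
  price = c(40, 52, 63, 80, 95, 108)
)

# Y = a + b1*X1 + b2*X2 with X1 = size and X2 = age
fit <- lm(price ~ size + age, data = houses)
print(coef(fit))

# Predicted price for a hypothetical new house
new_house <- data.frame(size = 1600, age = 6)
print(predict(fit, newdata = new_house))
```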

Multicollinearity in Regression
Multicollinearity occurs when independent variables are highly correlated.

Effects:
- Makes coefficient estimates unstable
- Makes individual coefficients difficult to interpret
- Inflates the standard errors of the coefficients

Detection:
- Correlation matrix
- VIF (Variance Inflation Factor)

Solution:
- Remove or combine highly correlated variables
- Use feature selection
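
Both detection methods can be sketched in base R. The data below is simulated so that x2 is almost a copy of x1; vif_manual is a hypothetical helper that applies the usual definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others.

```r
set.seed(1)   # make the simulated data reproducible

# Simulated predictors: x2 is almost a copy of x1, x3 is unrelated
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)
x3 <- rnorm(100)

# Correlation matrix: a value near 1 or -1 flags a problem pair
print(round(cor(cbind(x1, x2, x3)), 2))

# Hypothetical helper: VIF of one predictor given the other predictors
vif_manual <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

vif_x1 <- vif_manual(x1, data.frame(x2, x3))
vif_x3 <- vif_manual(x3, data.frame(x1, x2))
print(c(x1 = vif_x1, x3 = vif_x3))
```

A common rule of thumb treats a VIF above about 5 or 10 as a sign of serious multicollinearity; here x1 is far above that level and x3 is close to 1.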

Heteroscedasticity in Regression
Heteroscedasticity occurs when error variance is not constant.

Effects:
- Unreliable standard errors and intervals (coefficient estimates stay unbiased but become less efficient)
- Incorrect statistical tests

Causes:
- Outliers
- Poorly specified data (for example, a missing variable or an untransformed skewed variable)

Detection:
- Residual plots
- Statistical tests

Solutions:
- Transform data
- Remove outliers
- Use weighted regression
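
Detection and one remedy can be sketched in base R. The data is simulated so that the error spread grows with x. The test shown is a hand-built version of the studentized Breusch-Pagan idea (regress squared residuals on x and use n × R²); it is an illustration, not a replacement for a tested package function.

```r
set.seed(42)

# Simulated data where the error spread grows with x
x <- seq(1, 200)
y <- 2 + 0.5 * x + rnorm(200, sd = 0.1 * x)

fit <- lm(y ~ x)
res <- residuals(fit)

# Residual plot: a funnel shape suggests non-constant variance
# plot(fitted(fit), res)

# Auxiliary regression of squared residuals on x:
# LM = n * R^2, compared with chi-squared, df = number of predictors
aux     <- lm(I(res^2) ~ x)
lm_stat <- length(x) * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)
print(c(LM = lm_stat, p = p_value))

# One possible remedy: weighted regression, weights proportional to 1 / variance
fit_w <- lm(y ~ x, weights = 1 / x^2)
print(coef(fit_w))
```

A small p-value indicates that the residual variance changes with x. The weights 1 / x^2 are correct here only because the simulation sets the error standard deviation proportional to x.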

Textual Data Analysis


Textual Data Analysis means analyzing text data to extract useful information.

Steps:
1. Data collection
2. Cleaning text
3. Tokenization
4. Sentiment analysis
5. Interpretation

Applications:
- Social media analysis
- Customer feedback
- Review analysis

Role of Residuals in Regression
Residuals are differences between actual and predicted values.

Formula:
Residual = Actual – Predicted

Importance:
- Measures prediction error
- Checks model accuracy
- Detects outliers
- Helps validate regression assumptions

Good models have small residuals that scatter randomly around zero with no visible pattern.
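
The formula can be verified on a small fitted model. This is a minimal base R sketch with made-up values.

```r
# Hypothetical actual values and a simple predictor
actual <- c(10, 12, 15, 19, 24)
x      <- 1:5

fit       <- lm(actual ~ x)
predicted <- fitted(fit)

# Residual = Actual - Predicted
manual_res <- actual - predicted
print(round(manual_res, 2))

# residuals() returns the same values
same <- isTRUE(all.equal(unname(manual_res), unname(residuals(fit))))
print(same)

# With an intercept in the model, least-squares residuals sum to about zero
print(sum(manual_res))
```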

Purpose of Confidence Interval in Regression


A confidence interval gives a range within which the true parameter value (such as a regression coefficient) is expected to lie.

Importance:
- Measures reliability
- Shows uncertainty
- Helps statistical inference

Example:
A 95% confidence interval comes from a method that captures the true parameter value in about 95% of repeated samples.

Prediction Interval in Regression


A prediction interval estimates the range within which a single future observation is expected to fall.

Features:
- Wider than the confidence interval at the same point
- Includes both the uncertainty in the fitted line and the random variation of individual observations

Uses:
- Forecasting
- Future value estimation

Example:
Predicting the range of next month's sales.
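
The two kinds of interval can be compared with predict() in base R. This is a minimal sketch with made-up monthly sales; the names month and sales are hypothetical.

```r
# Hypothetical monthly data: month number and sales
month <- 1:10
sales <- c(20, 23, 25, 24, 28, 31, 30, 34, 36, 37)

fit        <- lm(sales ~ month)
next_month <- data.frame(month = 11)

# Confidence intervals for the coefficients themselves
print(confint(fit, level = 0.95))

# Range for the average sales expected at month 11
conf_int <- predict(fit, newdata = next_month, interval = "confidence", level = 0.95)

# Range for one actual future observation at month 11
pred_int <- predict(fit, newdata = next_month, interval = "prediction", level = 0.95)

print(conf_int)
print(pred_int)
```

Both intervals are centred on the same fitted value, but the prediction interval is wider because it also covers the scatter of individual observations around the line.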

Importance of checking statistical significance of coefficients


Statistical significance checks whether a variable's coefficient differs from zero by more than chance alone would explain.

Methods:
- p-value
- t-test

Importance:
1. Identifies useful variables.
2. Improves model accuracy.
3. Removes unnecessary variables.
4. Supports reliable conclusions.

By convention, a p-value below 0.05 is treated as statistically significant.
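
In R the t-test and p-value for each coefficient appear in the model summary. The sketch below uses simulated data in which y depends on x1 but not on x2, so the two coefficients can be compared.

```r
set.seed(7)

# Simulated data: y depends on x1 but not on x2
x1 <- rnorm(80)
x2 <- rnorm(80)
y  <- 3 + 2 * x1 + rnorm(80)

fit <- lm(y ~ x1 + x2)

# Coefficient table: estimate, standard error, t value, p-value
coefs <- summary(fit)$coefficients
print(round(coefs, 4))

# Flag coefficients with a p-value below 0.05
significant <- coefs[, "Pr(>|t|)"] < 0.05
print(significant)
```

An unrelated variable such as x2 can still fall below 0.05 by chance in about 1 of 20 samples, which is why significance is treated as evidence and not as proof.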

Goal of Textual Data Analysis


Goals:
1. Extract useful information from text.
2. Identify sentiments and opinions.
3. Classify documents.
4. Detect patterns and trends.
5. Support business decisions.

Applications:
- Chatbots
- Review analysis
- Social media monitoring

Difference between Structured and Unstructured Data


Structured Data:
- Organized in rows and columns.
- Easy to store and analyze.
Example: Excel tables.

Unstructured Data:
- No fixed format.
- Difficult to process.
Example: Images, videos, emails.

Structured data is easier for analysis.

Tokenization in Text Analysis


Tokenization is the process of breaking text into smaller units called tokens.

Tokens may be:
- Words
- Sentences
- Characters

Example:
“I love data science” → [I, love, data, science]

Importance:
- Text processing
- Sentiment analysis
- Machine learning
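
The three token types can be produced with strsplit() in base R. This is a minimal sketch; the splitting patterns are simple assumptions and would not handle abbreviations or unusual punctuation.

```r
text <- "I love data science"

# Word tokens: split on one or more spaces
words <- strsplit(text, " +")[[1]]
print(words)

# Character tokens: split between every character (spaces included)
chars <- strsplit(text, "")[[1]]
print(length(chars))

# Sentence tokens: split on whitespace that follows ., ! or ?
paragraph <- "R is free. It is popular! Do you use it?"
sentences <- strsplit(paragraph, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
print(sentences)
```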

Text Mining, Text Categorization and Sentiment Analysis


Text Mining:
Extracting useful patterns from text data.

Text Categorization:
Classifying text into categories.

Sentiment Analysis:
Finding emotions/opinions from text.

Applications:
- Product review analysis
- Spam detection
- Social media monitoring
- Customer feedback analysis
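
A very simple form of sentiment analysis and text categorization can be sketched in base R. The word lists, the reviews, and the helper names sentiment_score and categorize are all hypothetical; real projects use much larger lexicons or trained models.

```r
# Tiny hypothetical word lists
positive_words <- c("good", "great", "love", "excellent")
negative_words <- c("bad", "poor", "hate", "terrible")

# Score = number of positive words minus number of negative words
sentiment_score <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  sum(tokens %in% positive_words) - sum(tokens %in% negative_words)
}

# Categorize a review from its score
categorize <- function(score) {
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

reviews <- c(
  "Great phone, I love the camera",
  "Terrible battery and poor service",
  "It arrived on Monday"
)

scores <- vapply(reviews, sentiment_score, numeric(1), USE.NAMES = FALSE)
labels <- vapply(scores, categorize, character(1))
print(data.frame(review = reviews, score = scores, label = labels))
```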
