1. Perform the analysis for the following.
a) Input the data warehouse data in Microsoft Excel and create the Pivot table
and Pivot chart.
Step 1. Open Microsoft Excel.
Step 2. Enter data in table format (Date, Product, Category, Sales, Quantity).
Step3. Select the data → Go to Insert → Pivot Table.
Step 4. Choose New Worksheet → OK.
Step 5. Drag fields: - Rows: Product - Values: Sum of Sales - Columns: Category
(optional)
Step 6. After Pivot Table is created: - Click inside Pivot Table. - Go to Insert →
PivotChart. - Select Column chart → OK. Your Pivot Table and Pivot Chart are
created successfully.
b) Input the cube in Microsoft Excel and create the Pivot table and Pivot chart
to perform data analysis.
Step 1 :Open blank excel file.
Step 2:Go to open option.
Step 3:Select the file and open it.
Step 4: Select the all data.
Step 5: Go to insert option and Click on PivotTable, then Click on ok.
Step 6: Select PivotTable Fields. (nbrhood, improval, saltopr)
Step 7: Return go to book1 sheet, select all data.
Step 8 :Go to the insert and click on PivotTable , click on ok.
Step 9: Select PivotTable Fields. (nbrhood, totalval, saltopr)
Step 10:Again, return on Book1 sheet, Select all data.
Step 11: Go to insert, click on PivotTable then click on Ok.
Step 12: Select PivotTable Fields. (nbrhood, landval, saltopr)
Step 13: Copy data Sheet1 and Sheet2, and Paste on Sheet3.
Step 14: Copy sheet wise data and Paste picture. (Down to the data)
Step 15: Select the picture and go to Picture Effects.
Step 16: Select 3D Rotation, choose the shape.
Step 17: Adjust picture rotation like cube.
2. Apply the what – if Analysis for data visualization. Design and generate
necessary reports based on the data warehouse data. Use Excel.
Scenario Manager
In scenario manager, we create different scenarios by proving different input
values for the same variable than by comparing scenarios to choose the correct
result. For Example, To check the cost of revenue for three different months.
Step 1: Given a data set, for Revenue Cost of Jan, with Expenses and Cost as
its columns.
Step 2: Select the numerical value cell and Go to the Data.
Step 3: Under the forecast section, click on the What-if analysis.
Step 4: A drop-down appears. Select the Scenario manager.
Step 5: A dialog box appears in the dialog box select add option.
Step 6: A new dialog appears to write the name of the new scenario in the first
column. Under Scenario name, write "Revenue of Feb".
Step
7: In
the second column select the changing cell. The changing cells for this
example, are $D$6:$D$9.
Step 8: A new dialogue box name Scenario Values appears to write the
changed value in the box. Enter the values as per shown in the image.
Click Ok.
Step 9: Repeat step5, step6, step 7and step8.
Step 10: Click Ok then select summary.
Step 11: A new Dialog box name Scenario Summary appears. Select Result
cells: $D$10/D10.
Step 12: See the result.
Goal seek
In goal seek we already know our output value we have to find the correct input
value. For example, if a student wants to know his English marks and he
knows all the rest of the marks and total marks in all subjects.
Step 1: Write all subjects and their marks in an excel sheet and do the sum by
applying the formula sum.
Step 2: Go into the data tab of the Toolbar.
Step 3: Under the Data Table section, Select the What-if analysis.
Step 4: A drop-down appears. Select the Goal Seek.
Step 5: The dialogue box appears in the first column write the name of the
cell in which you apply the formula sum. Type D10 in Set cell.
Step 6: In the second column write the value of the target. The target value for
this example is 440.
Step 7: In the third column write the name of the cell in which you want to get
marks in English. Provide absolute cell reference, i.e. $D$5.
Step 8: Click ok and see the result. The estimated marks for English are 71.
Data Table
In data, we create a table with different input values for the same variables. It is
one of the most helpful features in what-if analysis. One can change different
values in x and can achieve different outputs accordingly for research as well as
business-driven purposes.
Data table in two Variable
In the Data table in two variables, we can change two input values in both row
and column. It includes two input cells. For example, A person wants to know
about per month installments of loan by the different rates of interest
and for the different time periods for the same principal amount.
Step 1: Create a table to find PMT.
Step 2: Copy the last cell in which you get output in another cell
Step 3: Write both values you want to change in both columns and rows.
Step 4: Go to the Data tab of the toolbar.
Step 5: Select the what-if analysis.
Step 6: Select the Data Table.
Step 7: A dialogue box appears in which you have to select the cell in which
you want to change the value in both row and column. The Row input cell
value is $E$7 and the column input cell value is $E$8.
Step 8: Click ok and see the result.
3. Perform the data classification using classification algorithm using our
R/Python.
# Get the data points in form of a R vector.
> rainfall <-
c(799,1174.8,865.1,1334.6,635.4,918.5,685.5,998.6,784.2,985,882.8,1071)
> # Convert it to a time series object.
> [Link] <- ts(rainfall,start = c(2012,1),frequency = 12)
> # Print the timeseries data.
> print([Link])
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov
2012 799.0 1174.8 865.1 1334.6 635.4 918.5 685.5 998.6 784.2 985.0 882.8
Dec
2012 1071.0
> # Give the chart file a name.
> png(file = "[Link]")
> # Plot a graph of the time series.
> plot([Link])
> # Save the file.
> [Link]()
null device
1
> #After this again plot to get chart
> plot([Link])
OUTPUT:-
4. Perform the data clustering using clustering algorithm using R/Python.
#k-means clustering using R
#apply K means to iris and store result
newiris <- iris
newiris$Species <- NULL
(kc <- kmeans(newiris,3))
#Compare the Species label with the clustering result
table(iris$Species,kc$cluster)
#Plot the clusters and their centers
plot(newiris[c("[Link]","[Link]")],col=kc$cluster)
points(kc$centers[,c("[Link]","[Link]")],col=1:3,pch=8,cex=2)
[Link]()
#Plot the clusters and their centre
plot(newiris[c("[Link]","[Link]")],col=kc$cluster)
OUTPUT:-
5. Perform the linear regression on the given data warehouse data using
R/Python
Input Data Below is the sample data representing the observations –
# Values of height 151, 174, 138, 186, 128, 136, 179, 163, 152, 131
# Values of weight. 63, 81, 56, 91, 47, 57, 76, 72, 62, 48
lm() Function :- This function creates the relationship model between the
predictor and the response variable.
Syntax :-
The basic syntax for lm() function in linear regression is –
lm(formula,data)
Following is the description of the parameters used :−
• formula is a symbol presenting the relation between x and y.
• data is the vector on which the formula will be applied.
# Linear Regression Example in R
#
# A. Create Relationship Model & Get the Coefficients
#
# Values of height (predictor variable)
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# Values of weight (response variable)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Create the linear regression model
relation <- lm(y ~ x)
# Print the relationship model
print(relation)
# B. Get the Summary of the Relationship
#
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)
print(summary(relation))
# C. Predict the Weight of a New Person
#
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)
# Predict weight for a person with height = 190 cm
a <- [Link](x = 190)
result <- predict(relation, a)
print(result)
# Create predictor and response variables
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Open a PNG graphics device
png(file = "[Link]")
# Plot the chart (the plot is saved to the file, not displayed in RStudio)
plot(x, y,
col = "blue",
main = "Height & Weight Regression",
cex = 1.3,
pch = 16,
xlab = "Height in cm",
ylab = "Weight in Kg")
# Add regression line
abline(lm(y ~ x), col = "blue", lwd = 2)
# Close the device
[Link]()
# Display the plot again on screen
plot(y, x, col = "blue", main = "Height & Weight Regression",
xlab = "Weight in Kg", ylab = "Height in cm",
pch = 16, cex = 1.3)
abline(lm(x ~ y), col = "blue", lwd = 2)
OUTPUT:-
6. Perform the logistic regression on the given data warehouse data using
R/Python.
input <- mtcars[, c("am", "cyl", "hp", "wt")]
> print(head(input))
am cyl hp wt
Mazda RX4 1 6 110 2.620
Mazda RX4 Wag 1 6 110 2.875
Datsun 710 1 4 93 2.320
Hornet 4 Drive 0 6 110 3.215
Hornet Sportabout 0 8 175 3.440
Valiant 0 6 105 3.460
> [Link] <- glm(am ~ cyl + hp + wt,
+ data = input,
+ family = binomial)
> summary([Link])
Call:
glm(formula = am ~ cyl + hp + wt, family = binomial, data = input)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 19.70288 8.11637 2.428 0.0152 *
cyl 0.48760 1.07162 0.455 0.6491
hp 0.03259 0.01886 1.728 0.0840 .
wt -9.14947 4.15332 -2.203 0.0276 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 43.2297 on 31 degrees of freedom
Residual deviance: 9.8415 on 28 degrees of freedom
AIC: 17.841
OUTPUT:-
Number of Fisher Scoring iterations: 8
7. Write a python program to read data from a CSV file, perform simple data
analysis, and generate basic insights.(Use Pandas is a Python library).
import pandas as pd;
# 1. Read Program
file_path = "data_sales.csv"
try:
df = pd.read_csv(file_path)
print("\nCSV Loaded Successfully\n")
except FileNotFoundError:
print("\n File not found. Ensure the CSV is in the same folder.")
exit()
print("Print 5 rows")
print([Link]())
print("Data structure")
print([Link]())
print("If empty Value Present in the data")
print([Link]().sum())
print("Stats Summary of the data")
print([Link]())
OUTPUT:-
8. Perform data visualization
a. Perform data visualization using Python on any sales data.
import pandas as pd;
import [Link] as plt
file_path = "sale_data.csv"
df = pd.read_csv(file_path)
# print(df)
# Line Plot Month Wise Sales
[Link](figsize=(10, 5))
[Link](df["Month"], df["Total_Sales"], marker="o")
[Link]("Monthly Sales Trend")
[Link]("Month")
[Link]("Sales Amount")
[Link]()
[Link]()
# Bar Chart Month Wise Sales Comparison
[Link](figsize=(10, 5))
[Link](df["Month"], df["Total_Sales"])
[Link]("Sales Comparison by Month")
[Link]("Month")
[Link]("Sales Amount")
[Link]()
# Pie Chart
[Link](figsize=(7, 7))
[Link](df["Total_Sales"], labels=df["Month"], autopct="%1.1f%%")
[Link]("Sales Contribution by Month")
[Link]()
OUTPUT:-
b. Perform data visualization using PowerBI on any sales data.
Step 1: Open Power BI Desktop
• Launch Power BI Desktop.
• You will see a blank report canvas with options to import data.
Step 2: Import Data into Power BI
1. Go to Home → Get Data.
2. Choose Excel or Text/CSV depending on your file.
3. Select file sale_data.csv from your computer.
4. Preview window opens showing all columns (Month, Year, Product,
Units_Sold, Unit_Price, Total_Sales).
5. Click Load to import the dataset into Power BI.
Step 3: View the Imported Dataset
• After loading, the dataset sale_data appears in the Data pane on the right side.
• Columns such as Month, Total_Sales, Units_Sold, etc. become available for
visualization
Step 4: Create a Pie Chart of Monthly Total Sales
1. Go to Visualizations pane and click the Pie Chart icon.
2. A blank chart appears on the canvas.
3. Set the fields:
o Legend: Month
o Values: Total_Sales
4. The pie chart will display the percentage share of each month’s total sales.
Step 5: Create a Line Chart of Total Sales by Month
1. Click on the Line Chart icon in the Visualizations pane.
2. Drag fields into these areas:
o X-axis: Month
o Y-axis: Total_Sales
3. The line chart displays month-wise sales trend.
9. Create the Data staging area for the selected database using SQL.
Step 1: Open MySQL Workbench
Launch MySQL Workbench and connect to the localhost server using:
• Hostname: [Link]
• Port: 3306
• User: root
Enter your password and click OK.
Step 2: Create a New Database (Schema)
In the SQL Query window, write the following command:
Step 3: Select the Database
Choose the database for further operations:
Step 4: Create a Staging Table
Create a table to store customer sales data:
Step 5: Insert Sales Data into the Staging Table
Insert the given records into the staging table:
Step 6: View the Staging Table Data
To display the inserted data: