Salary and Experience Data Analysis

Uploaded by ngak1214
Perform the following operations using Python on the dataset
Salary_Data.csv (Experience, Salary):

1. Print all data from Salary_Data.csv.
2. Find the empty cells in Salary_Data.csv.
3. Count the missing values in the column Experience.
4. List the descriptive statistics for the given dataset.
5. Reset the default index of the given dataset.
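
One possible way to approach these five steps with pandas is sketched below. Salary_Data.csv is not reproduced here, so the block reads a small made-up sample from a string instead; the values are assumptions for illustration only.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for Salary_Data.csv; the values are made up.
SAMPLE_CSV = """Experience,Salary
1.1,39343
1.3,
,37731
2.0,43525
2.2,39891
"""

# With the real file this would be: df = pd.read_csv("Salary_Data.csv")
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# 1. Print all rows (to_string avoids pandas' row truncation).
print(df.to_string())

# 2. Find empty cells as (row index, column name) pairs.
rows, cols = np.where(df.isna().to_numpy())
empty_cells = [(int(r), df.columns[c]) for r, c in zip(rows, cols)]
print(empty_cells)

# 3. Count the missing values in the Experience column.
missing_experience = int(df["Experience"].isna().sum())
print(missing_experience)

# 4. Descriptive statistics for the numeric columns.
stats = df.describe()
print(stats)

# 5. Reset the index, shown here after dropping incomplete rows
#    so that the effect of the reset is visible.
cleaned = df.dropna().reset_index(drop=True)
print(cleaned)
```

Passing `drop=True` to `reset_index` discards the old index rather than keeping it as a new column.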

Create a DataFrame Product (Product_Name, Price) with some
missing values and perform the following operations on it:

1. Count the missing values in the column Product_Name.
2. Count the missing values in the entire DataFrame Product.
3. Count the missing values in each row.
4. Count the missing values in the row with index 7.
5. Remove the duplicates in the column Product_Name.
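
A minimal sketch of these operations follows. The product names and prices are made up, and ten rows are used only so that a row with index 7 exists.

```python
import numpy as np
import pandas as pd

# Made-up sample data; ten rows so that index 7 exists.
product = pd.DataFrame({
    "Product_Name": ["Pen", "Pencil", np.nan, "Eraser", "Pen",
                     "Ruler", np.nan, np.nan, "Marker", "Pencil"],
    "Price": [10.0, 5.0, 12.0, np.nan, 10.0,
              np.nan, 8.0, np.nan, 25.0, 5.0],
})

# 1. Missing values in the Product_Name column.
name_missing = int(product["Product_Name"].isna().sum())

# 2. Missing values in the whole DataFrame.
total_missing = int(product.isna().sum().sum())

# 3. Missing values per row (axis=1 sums across the columns).
per_row_missing = product.isna().sum(axis=1)

# 4. Missing values in the row labelled 7.
row7_missing = int(product.loc[7].isna().sum())

# 5. Drop rows whose Product_Name repeats an earlier one
#    (pandas treats repeated NaN values as duplicates too).
deduplicated = product.drop_duplicates(subset="Product_Name")

print(name_missing, total_missing, row7_missing)
print(per_row_missing)
print(deduplicated)
```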

Create a pandas DataFrame with two columns (Value1 and Value2)
with some numeric and some categorical values and perform the
following operations on it:

1. Convert all values in the DataFrame to float format and print it.
2. Drop all rows with NaN values from the DataFrame.
3. Replace the NaN values with 0s.
4. Transpose the given DataFrame.
5. Rename the default index to X, Y, Z and then transpose the
DataFrame.
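
The following sketch shows one way these steps might look. The cell values are made up, and it assumes that "categorical" values are non-numeric strings which should become NaN during conversion.

```python
import pandas as pd

# Made-up values: numeric strings mixed with non-numeric ones.
df = pd.DataFrame({
    "Value1": ["2.5", "abc", "7"],
    "Value2": ["xyz", "4", "9.1"],
})

# 1. Convert to float; strings that are not numbers become NaN.
as_float = df.apply(pd.to_numeric, errors="coerce").astype(float)
print(as_float)

# 2. Drop every row that contains at least one NaN.
dropped = as_float.dropna()

# 3. Replace NaN with 0.
filled = as_float.fillna(0)

# 4. Transpose: rows become columns and vice versa.
transposed = as_float.T

# 5. Rename the default index 0, 1, 2 to X, Y, Z, then transpose.
renamed_t = as_float.rename(index={0: "X", 1: "Y", 2: "Z"}).T
print(renamed_t)
```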

Perform the following operations on the Iris data set:
1. Apply the Standard Scaler and the MinMax Scaler to the Iris data set.
2. Scale the data to the range 5 to 10 using the MinMax Scaler on the
Iris data set.
3. Write Python code for outlier detection using the Z-score.
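
As a rough sketch, the scaling and Z-score steps could be done with scikit-learn and NumPy as below. It assumes the copy of Iris bundled with scikit-learn is acceptable, and the outlier threshold of 3 is a common convention rather than something the assignment specifies.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = load_iris().data  # 150 samples, 4 numeric features

# 1. Standard scaling (zero mean, unit variance) and min-max scaling to [0, 1].
standardized = StandardScaler().fit_transform(X)
minmax_01 = MinMaxScaler().fit_transform(X)

# 2. Min-max scaling to the range [5, 10].
minmax_5_10 = MinMaxScaler(feature_range=(5, 10)).fit_transform(X)

# 3. Z-score outlier detection: flag rows where any feature lies more than
#    THRESHOLD standard deviations from that feature's mean.
THRESHOLD = 3  # assumed cut-off
z = (X - X.mean(axis=0)) / X.std(axis=0)
outlier_rows = np.where((np.abs(z) > THRESHOLD).any(axis=1))[0]

print(minmax_5_10.min(axis=0), minmax_5_10.max(axis=0))
print("Outlier row indices:", outlier_rows)
```

The manual Z-score uses the population standard deviation (NumPy's default), which is also what `StandardScaler` uses, so `z` and `standardized` agree.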

1. Perform the following operations on the mtcars data set.
   1. Display all descriptive statistics of the mtcars data set.
   2. Get the mean, median and mode of each column of the mtcars
      data set.
   3. Get the mean of each row of the mtcars data set.
2. Write a Python program to display some basic statistical details,
   such as mean and standard deviation, of the species ‘Iris-setosa’,
   ‘Iris-versicolor’ and ‘Iris-virginica’ of the Iris dataset.
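
The sketch below illustrates both tasks on tiny made-up frames, since neither file is reproduced here. The column names (`mpg`, `cyl`, `hp`, `Species`, `SepalLengthCm`) are assumptions about how the real files are laid out; with the actual data, each frame would be loaded with `pd.read_csv` instead.

```python
import pandas as pd

# Made-up rows standing in for the mtcars data set.
cars = pd.DataFrame({
    "mpg": [21.0, 22.8, 21.0, 18.7],
    "cyl": [6, 4, 6, 8],
    "hp": [110, 93, 110, 175],
})

summary = cars.describe()          # all descriptive statistics
col_mean = cars.mean()             # mean of each column
col_median = cars.median()         # median of each column
col_mode = cars.mode().iloc[0]     # first mode of each column
row_mean = cars.mean(axis=1)       # mean of each row

# Made-up rows standing in for the Iris dataset.
iris = pd.DataFrame({
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
})

# Count, mean, standard deviation, min, quartiles and max per species.
per_species = iris.groupby("Species")["SepalLengthCm"].describe()

print(summary)
print(per_species)
```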

Create a Linear Regression model using Python to predict home
prices using the Boston Housing Dataset.
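
A rough outline of the modelling workflow is sketched below. The `load_boston` helper was removed from scikit-learn in version 1.2, so this block uses synthetic data as a stand-in; the two features and the coefficients that generate the prices are invented for illustration and are not taken from the real dataset. With the actual data, `X` and `price` would come from the loaded file.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): two features and a noisy linear target.
rng = np.random.default_rng(0)
n = 200
rooms = rng.uniform(4, 9, n)      # hypothetical "rooms per dwelling" feature
lstat = rng.uniform(2, 35, n)     # hypothetical "lower-status %" feature
price = 5.0 * rooms - 0.6 * lstat + rng.normal(0, 1.0, n)

X = np.column_stack([rooms, lstat])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

# Fit on the training split, evaluate on the held-out split.
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

mse = mean_squared_error(y_test, pred)
r2 = r2_score(y_test, pred)
print("Coefficients:", model.coef_, "Intercept:", model.intercept_)
print("MSE:", mse, "R^2:", r2)
```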

1. Implement logistic regression using Python to perform
classification on the Social_Network_Ads.csv dataset.
2. Compute the confusion matrix to find TP, FP, TN, FN, accuracy, error
rate, precision and recall on the given dataset.
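
The two steps might be sketched as follows. Social_Network_Ads.csv is not available here, so the block generates synthetic data; the feature names (age, salary), the purchase rule and all values are assumptions made up for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): age, salary and a 0/1 label.
rng = np.random.default_rng(42)
n = 400
age = rng.uniform(18, 60, n)
salary = rng.uniform(15000, 150000, n)
score = 0.1 * (age - 38) + 0.00003 * (salary - 70000) + rng.normal(0, 1, n)
purchased = (score > 0).astype(int)

X = np.column_stack([age, salary])
X_train, X_test, y_train, y_test = train_test_split(
    X, purchased, test_size=0.25, random_state=0
)

# Scale the features (fit on training data only), then fit the classifier.
scaler = StandardScaler().fit(X_train)
clf = LogisticRegression().fit(scaler.transform(X_train), y_train)
y_pred = clf.predict(scaler.transform(X_test))

# For binary labels [0, 1], ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
error_rate = (fp + fn) / total
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"accuracy={accuracy:.3f} error_rate={error_rate:.3f} "
      f"precision={precision:.3f} recall={recall:.3f}")
```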

1. Implement a simple Naïve Bayes classification algorithm using
Python on the [Link] dataset.
2. Compute the confusion matrix to find TP, FP, TN, FN, accuracy, error
rate, precision and recall on the given dataset.
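
The dataset name is lost in the text above, so the sketch below assumes the Iris data bundled with scikit-learn purely as a stand-in, and uses Gaussian Naïve Bayes as one simple variant. Because Iris has three classes, TP, FP, FN and TN are computed per class in a one-vs-rest fashion.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumed stand-in dataset: Iris as shipped with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

y_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

# One-vs-rest counts for each class.
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp
fn = cm.sum(axis=1) - tp
tn = cm.sum() - (tp + fp + fn)

accuracy = tp.sum() / cm.sum()
error_rate = 1 - accuracy
precision = tp / (tp + fp)   # per class
recall = tp / (tp + fn)      # per class

print(cm)
print("accuracy:", accuracy, "error rate:", error_rate)
print("precision per class:", precision)
print("recall per class:", recall)
```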

1. Extract a sample document and apply the following document
preprocessing methods:
   - Tokenization
   - POS tagging
   - Stop-word removal
   - Stemming
   - Lemmatization
2. Create a representation of the document by calculating Term Frequency
and Inverse Document Frequency.
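
A dependency-free sketch of tokenization, stop-word removal and TF-IDF is given below. The sample documents and the short stop-word list are made up, and the IDF formula used is the plain ln(N / df) variant, which is one of several in common use. POS tagging, stemming and lemmatization are not shown because they normally rely on a library such as NLTK together with its downloadable resources.

```python
import math
import re
from collections import Counter

# Made-up sample documents and a tiny illustrative stop-word list.
DOCS = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "Dogs and cats are pets.",
]
STOP_WORDS = {"the", "on", "and", "are", "a", "an", "of"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequency(tokens):
    """TF = occurrences of the term / number of tokens in the document."""
    counts = Counter(tokens)
    return {term: n / len(tokens) for term, n in counts.items()}


def inverse_document_frequency(token_lists):
    """IDF = ln(N / number of documents that contain the term)."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


token_lists = [remove_stop_words(tokenize(d)) for d in DOCS]
idf = inverse_document_frequency(token_lists)
tfidf = [
    {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
    for tokens in token_lists
]

for tokens, weights in zip(token_lists, tfidf):
    print(tokens, weights)
```

Without stemming or lemmatization, "dogs" and "dog" remain separate terms here, which is exactly the gap those two steps are meant to close.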

- Use the inbuilt dataset 'titanic', which contains information about
the passengers who boarded the unfortunate Titanic ship.
- Write code to check how the price of the ticket (column name:
'fare') for each passenger is distributed by plotting a histogram
with and without kernel density estimation.

Use the inbuilt dataset 'titanic' and plot a box plot of the distribution
of age with respect to each gender, along with the information about
whether they survived or not. (Column names: 'sex' and 'age')
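
A possible sketch with seaborn follows. It assumes the survival flag lives in a column named `survived`, which the text above does not name, and it runs on a few made-up rows so that no download is needed; `plot_age_by_sex` is a hypothetical helper name.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_age_by_sex(df):
    """Box plot of age per sex, split by the (assumed) 'survived' column."""
    fig, ax = plt.subplots(figsize=(6, 4))
    sns.boxplot(data=df, x="sex", y="age", hue="survived", ax=ax)
    ax.set_title("Age by sex and survival")
    return fig


# Made-up passengers so that the sketch runs offline.
demo = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "male", "female", "male", "female"],
    "age": [22, 38, 26, 35, 54, 27, 2, 14],
    "survived": [0, 1, 1, 0, 1, 0, 0, 1],
})
fig = plot_age_by_sex(demo)
fig.savefig("age_by_sex.png")
```

Using `hue` places the survived and not-survived boxes next to each other within each gender, which makes the two groups easy to compare.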

Perform the following operations on the [Link] data set:

1. List the features and their types (e.g., numeric, nominal)
available in the dataset.
2. Create a histogram for each feature in the dataset to illustrate
the feature distributions.
3. Create a box plot for each feature in the dataset.
4. Compare the distributions and identify outliers.
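
Since the dataset name is lost in the text above, the sketch below works on a small made-up frame with invented column names. It covers listing feature types and flagging outliers with the 1.5 × IQR rule, which is the rule box-plot whiskers conventionally use; the helper names are hypothetical.

```python
import pandas as pd

# Made-up frame standing in for the unnamed dataset.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.1, 12.0],
    "sepal_width": [3.5, 3.0, 3.3, 2.7, 3.0, 3.1],
    "species": ["setosa", "setosa", "virginica",
                "virginica", "virginica", "setosa"],
})


def feature_types(frame):
    """Label each column 'numeric' or 'nominal' based on its dtype."""
    return {
        col: "numeric" if pd.api.types.is_numeric_dtype(frame[col]) else "nominal"
        for col in frame.columns
    }


def iqr_outliers(series, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series[(series < q1 - k * iqr) | (series > q3 + k * iqr)]


types = feature_types(df)
numeric_cols = [col for col, kind in types.items() if kind == "numeric"]
outliers = {col: iqr_outliers(df[col]).tolist() for col in numeric_cols}

print(types)
print(outliers)
```

With matplotlib installed, `df[numeric_cols].hist()` and `df[numeric_cols].plot.box()` would produce the per-feature histograms and box plots for visual comparison.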

Write Java code for a simple Word Count application that counts
the number of occurrences of each word in a given input set using the
Hadoop Map-Reduce framework on a local standalone set-up.
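
The assignment asks for Java on Hadoop; the block below is only a Python sketch of the same map, shuffle and reduce logic, meant to make the data flow concrete before writing the Java Mapper and Reducer classes. It does not use any Hadoop API, and the function names and input lines are made up.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reducer: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


result = word_count(["Hello Hadoop", "hello map reduce", "Reduce map"])
print(result)
```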

Design a distributed application using Map-Reduce which processes a
log file of a system.
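
As one illustration of such a job, the sketch below counts log entries per severity level in a map-reduce style. The log layout (`<date> <time> <LEVEL> <message>`), the level names and the sample lines are all assumptions; a real system log would need its own parsing rule in the mapper.

```python
from collections import defaultdict

# Made-up lines in an assumed "<date> <time> <LEVEL> <message>" layout.
LOG_LINES = [
    "2024-01-01 10:00:01 INFO service started",
    "2024-01-01 10:00:05 ERROR disk full",
    "2024-01-01 10:00:09 WARN retrying write",
    "2024-01-01 10:00:12 ERROR write failed",
    "malformed line",
]
LEVELS = {"INFO", "WARN", "ERROR"}


def map_line(line):
    """Mapper: emit (level, 1) for a well-formed line, nothing otherwise."""
    parts = line.split(maxsplit=3)
    if len(parts) >= 3 and parts[2] in LEVELS:
        yield parts[2], 1


def run_job(lines):
    """Group mapper output by key, then reduce each group by summing."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


level_counts = run_job(LOG_LINES)
print(level_counts)
```

Skipping malformed lines in the mapper keeps a single bad record from failing the whole job, which matters more as the log grows.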

Locate a dataset (e.g., sample_weather.txt) for working on weather
data, and write a program which reads the text input files and finds
the average temperature, dew point and wind speed.
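
The layout of sample_weather.txt is not given above, so the sketch below assumes whitespace-separated lines of the form `<date> <temperature> <dew_point> <wind_speed>` and reads made-up values from a string. With a real file, the column positions in `average_weather` would need to match its actual format.

```python
import io

# Made-up sample in the assumed "<date> <temp> <dew_point> <wind_speed>" layout.
SAMPLE = """\
20240101 20.5 10.0 5.0
20240102 22.5 12.0 7.0
20240103 bad 11.0 6.0
20240104 23.0 14.0 9.0
"""

FIELDS = ("temperature", "dew_point", "wind_speed")


def average_weather(lines):
    """Average the three fields over all lines that parse cleanly."""
    totals = dict.fromkeys(FIELDS, 0.0)
    count = 0
    for line in lines:
        parts = line.split()
        try:
            values = [float(p) for p in parts[1:4]]
        except ValueError:
            continue  # skip lines with non-numeric readings
        if len(values) != 3:
            continue  # skip lines with missing columns
        for name, value in zip(FIELDS, values):
            totals[name] += value
        count += 1
    if count == 0:
        return {}
    return {name: total / count for name, total in totals.items()}


# With a real file: averages = average_weather(open("sample_weather.txt"))
averages = average_weather(io.StringIO(SAMPLE))
print(averages)
```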
