
Pandas Assignment for Data Science Course

This document outlines five tasks to complete using Pandas: 1. Write a function that returns a Pandas Series for a numeric range, with default values for the start and end. 2. Create a function that returns a dataframe from a list of column names and a list of values. 3. Create a function that concatenates two dataframes and resets the index. 4. Load the cars.csv data into a dataframe and print descriptive statistics. 5. Write a method that returns the column most correlated with an input column, using the cars data.

Uploaded by

rashid

Python for Data Science Certification Course

Pandas Assignment - 1

Problem Statement:
You work in XYZ Company as a Python developer. The company officials want you to build a Python program.

Link to Dataset

Tasks to be performed:

1. Write a function that takes the start and end of a range and returns a Pandas Series object containing the
numbers within that range.

If the user does not pass start, end, or both, they should default to 1 and 10 respectively.
e.g.
range_series() -> should return a Pandas Series from 1 to 10

range_series(5) -> should return a Pandas Series from 5 to 10

range_series(5, 10) -> should return a Pandas Series from 5 to 10
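A minimal sketch of one way to approach task 1. It assumes that "from 1 to 10" means both endpoints are included and that the range holds integers; the assignment states neither explicitly.

```python
import pandas as pd


def range_series(start=1, end=10):
    """Return a Series with the integers from start to end (assumed inclusive)."""
    # range() excludes its stop value, so add 1 to include `end`.
    return pd.Series(range(start, end + 1))


print(range_series().tolist())       # uses both defaults
print(range_series(5).tolist())      # start=5, end defaults to 10
print(range_series(5, 10).tolist())  # both values given
```

Because start is the first positional parameter, range_series(5) sets start and leaves end at its default, which matches the second example above.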

2. Create a function that takes two lists, named keys and values, as arguments.

keys is a list containing n strings.

values is a list containing n lists.

The function should return a new Pandas dataframe with keys as the column names and values as
their corresponding column values.

e.g. create_dataframe(["One", "Two"], [["X", "Y"], ["A", "B"]]) -> should return the dataframe

   One Two
0    X   A
1    Y   B
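A possible sketch for task 2, assuming each inner list in values holds the contents of one column (which is what the expected output above implies). The length check is an added safeguard, not something the assignment asks for.

```python
import pandas as pd


def create_dataframe(keys, values):
    """Build a DataFrame whose i-th column is named keys[i] and holds values[i]."""
    if len(keys) != len(values):
        # Added safeguard (an assumption): the task says both lists have n items.
        raise ValueError("keys and values must have the same length")
    # zip pairs each column name with its list of values.
    return pd.DataFrame(dict(zip(keys, values)))


df = create_dataframe(["One", "Two"], [["X", "Y"], ["A", "B"]])
print(df)
```

Passing values directly as pd.DataFrame(values, columns=keys) would instead treat each inner list as a row, which gives a different table from the one in the example.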
support@[Link] - +91-7022374614 - US: 1-800-216-8930 (Toll Free)

3. Create a function that concatenates two dataframes. Use the previously created function to create
two dataframes and pass them as parameters. Make sure that the indexes are reset before
returning.
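One way task 3 might look. The function name concat_dataframes is hypothetical, since the assignment does not name it, and the two inputs are built directly with pd.DataFrame here only to keep the sketch self-contained.

```python
import pandas as pd


def concat_dataframes(df1, df2):
    """Stack df2 below df1 and renumber the rows 0..n-1."""
    # ignore_index=True discards the original row labels, so the result
    # has a fresh sequential index instead of repeated labels.
    return pd.concat([df1, df2], ignore_index=True)


first = pd.DataFrame({"One": ["X", "Y"], "Two": ["A", "B"]})
second = pd.DataFrame({"One": ["P", "Q"], "Two": ["C", "D"]})
combined = concat_dataframes(first, second)
print(combined)
```

Calling pd.concat([df1, df2]).reset_index(drop=True) should give the same result; ignore_index=True simply does it in one step.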

4. Write code to load data from [Link] into a dataframe and print its summary statistics: 'count',
'mean', 'std', 'min', '25%', '50%', '75%', 'max'.
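A hedged sketch for task 4. The helper name load_and_describe is hypothetical, and because the dataset link is not available here, the demonstration reads a tiny in-memory CSV with placeholder values rather than the real cars data. For the real file, pass its path to the same function.

```python
import io

import pandas as pd


def load_and_describe(source):
    """Read a CSV (path or file-like object) and return its summary statistics."""
    df = pd.read_csv(source)
    # describe() reports count, mean, std, min, quartiles and max
    # for every numeric column.
    return df.describe()


# Placeholder rows for illustration only; these are NOT the cars dataset.
sample_csv = io.StringIO("a,b\n1,10\n2,20\n3,30\n4,40\n")
stats = load_and_describe(sample_csv)
print(stats)
```

The row labels of the returned table are exactly the eight details listed in the task.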

5. Write a method that takes a column name as an argument and returns the name of the column
with which the given column has the highest correlation.

The data to be used is the cars dataset.

The returned value should not be the name of the column that was passed as the parameter.

e.g. get_max_correlated_column('mpg') -> should return 'drat'
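A sketch of task 5 under two assumptions. First, the dataframe is passed in as a second argument (the assignment's signature takes only the column name, so the real solution would presumably read the cars dataframe from an outer scope). Second, "highest" is taken to mean the largest signed correlation; the assignment does not say whether absolute values are intended. The demonstration data is a placeholder, not the cars dataset.

```python
import pandas as pd


def get_max_correlated_column(column, df):
    """Return the name of the other column most correlated with `column`."""
    # Keep numeric columns only, since correlation is undefined for text.
    corr = df.select_dtypes(include="number").corr()
    # Drop the column itself: its self-correlation of 1.0 would always win.
    others = corr[column].drop(labels=column)
    # Signed maximum (an assumption); use others.abs().idxmax() to rank
    # by strength regardless of direction.
    return others.idxmax()


# Placeholder data for illustration only; NOT the cars dataset.
demo = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 11],   # rises with x
    "z": [5, 4, 3, 2, 1],    # falls exactly as x rises
})
print(get_max_correlated_column("x", demo))
```

In this placeholder data, z is perfectly negatively correlated with x, so ranking by absolute value would pick z, while the signed ranking picks y. The choice between the two changes the answer, so it is worth checking against the expected result for 'mpg'.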



Common questions


When two DataFrames are concatenated, each keeps its original index labels, so the result can contain duplicate labels (for example 0, 1, 0, 1). Resetting the index, either by passing ignore_index=True to pd.concat or by calling reset_index(drop=True) on the result, gives a unique, sequential index. With duplicate labels, a label-based lookup such as df.loc[0] returns several rows instead of one, which can silently break later processing.

A function that builds a DataFrame from two lists uses one list as the column names (keys) and the other as the column contents (values), where each inner list holds the values for one column. Pairing them, for example as a dictionary passed to pd.DataFrame, gives columns that can be accessed by name and makes the rest of the Pandas manipulation and analysis tools available.

Identifying the column most correlated with a given column helps reveal relationships between variables in a dataset. This can guide feature selection, predictive modeling, and data interpretation. A high correlation may indicate redundant features, but it does not by itself establish causation.

Default values make a function such as range_series usable when arguments are omitted. Without defaults, calling the function with missing arguments raises a TypeError. With start defaulting to 1 and end to 10, the function returns a result with no input at all, which covers the common case.

To summarize the distribution and spread of numerical data in Pandas, use the count, mean, standard deviation (std), minimum (min), quartiles (25%, 50%, 75%), and maximum (max). DataFrame.describe() reports all of these for each numeric column in one call, giving a quick view of central tendency, variability, and range, and a first check for anomalies.
