Data Science Concepts and Techniques

The document outlines a series of questions related to data science, including topics such as the differences between data structures, machine learning techniques, data collection strategies, and the data science lifecycle. It also includes practical coding tasks involving NumPy arrays and Series and DataFrame operations in pandas. Additionally, it covers the roles of data science professionals and various applications of data science across different fields.

Uploaded by

prem prasad


Q1. Attempt any four parts. (4x5=20)

a) What is a Series, and how is it different from a 1-D array, a list, and a dictionary?
b) Differentiate between supervised and unsupervised learning techniques.
c) Specify any four Python libraries and their applications.
d) Describe any five data collection strategies.
e) Give four ways of creating NumPy arrays.
f) Explain four major tasks in data pre-processing.

Q2. (a) Explain the roles and responsibilities of any five data science professionals. (5)
(b) What are the various types of data in statistics? Explain with examples. (5)

Q3. (a) What is the Data Science Lifecycle? Explain all stages with a diagram. (5)
(b) What are missing values? What are the strategies to handle them? Explain four methods of imputation, giving an example of each. (5)

Q4. (a) Explain four methods of creating a DataFrame, using: (5)

i. Multiple lists of different lengths
ii. Multiple Series objects
iii. A nested dictionary
iv. A NumPy array
(b) Explain five applications of data science in different fields. (5)

Q5. Give four ways of creating NumPy arrays. (10)

Give the code or syntax to perform the following operations on two 2D NumPy arrays, array1 and array2, and a
1D array, array3.
a. Add array1 and array2.
b. Find the sum of array1 elements over a given axis.
c. Find the product of array2 elements over a given axis.
d. Change the dimension of array3 to 2D.
e. Transpose the array created in part d.
f. Display 2 rows and the third column of the 2D array array1.
g. Join two 2D arrays along rows.
h. Convert array2 to a 1D array.
i. Split array1 into multiple subarrays.

Q6. Give four ways of creating a Series, using a list, an array, a dictionary, and a scalar value. (10)
a) Write Python code to create the following Series:
101 Harsh
102 Arun
103 Ankur
104 Harpal
105 Divya
106 Jeet
b) Show details of the first 3 employees using the head function.
c) Show details of the last 3 employees using the tail function.
d) Show details of the first 3 employees without using the head function.
e) Show details of the last 3 employees without using the tail function.
f) Show the value at index 102.
g) Show the 2nd to 4th records.
h) Show the values at indexes 101, 103, and 105.
i) Show details of “Arun”.

Q7. Create a DataFrame for the data given below. (10)

Write code to perform the following operations on the above DataFrame:
i. Print the batsman name along with runs scored in Test and T20 using column names and dot
notation.
ii. Display the batsman name along with runs scored in ODI using loc.
iii. Display the details of batsmen who scored:
     more than 2000 in ODI
     less than 2500 in Test
     more than 1500 in T20
iv. Display the columns using column index numbers like 0, 2, 4.
v. Display the alternate rows.
vi. Reindex the DataFrame created above by batsman name, and delete the data of Hardik Pandya and
Shikhar Dhawan from the original DataFrame by their index.
vii. Insert 2 rows in the DataFrame and delete the rows whose index is 1 and 4.
viii. Delete the column Test, add one more column, total, at the end (next to the T20 column), and store the total of ODI
and T20 runs in that column.
ix. Rename column T20 to “T20I Runs”.
x. Print the DataFrame without headers.

OR
Q8. Create the following DataFrame Sales containing year-wise sales figures for five salespersons in INR. Use
the years as column labels, and salesperson names as row labels. (10)

          2014    2015    2016    2017
Madhu     100.5   12000   2000    50000
Kusum     150.8   18000   5000    60000
Kinshuk   200.9   22000   70000   70000
Ankit     30000   30000   1000    80000
Shruti    40000   45000   1250    90000

a. Display the row labels of Sales.
b. Display the column labels of Sales.
c. Display the dimensions, shape, size and values of Sales.
d. Display the last two rows of Sales.
e. Display the first two columns of Sales.
f. Change the DataFrame Sales such that it becomes its transpose.
g. Add data to Sales for salesman Sumeet where the sales made are [196.2, 37800, 52000, 78438] in the
years [2014, 2015, 2016, 2017] respectively.
h. Delete the data for the year 2014 from the DataFrame Sales.
i. Update the sale made by Shruti in 2017 to 100000.
j. Write the values of DataFrame Sales to a comma-separated file [Link] on the disk. Do not
write the row labels and column labels.
k. Change the name of the salesperson Ankit to Vivaan and Kinshuk to Shailesh.
l. Delete the data for salesman Madhu from the DataFrame Sales.

Common questions

Common Python libraries used in data analysis include: 1) NumPy, which supports large, multi-dimensional arrays and matrices, along with mathematical functions to operate on these arrays; 2) pandas, which provides labeled data structures (Series and DataFrame) and tools for data manipulation and analysis; 3) Matplotlib, for creating static, interactive, and animated visualizations; and 4) scikit-learn, which provides machine learning tools for tasks such as classification, regression, and clustering.

Handling missing data can be approached with several strategies: 1) Deletion - removing the records or features with missing values; 2) Mean/Median Imputation - replacing missing values with the mean or median of the column; 3) Mode Imputation - using the most frequent value in the column to fill in missing entries; and 4) Prediction Model - using a predictive model to estimate and replace missing values based on other features. Each method has its own merit and depends on the data and context; deletion is straightforward but can lead to substantial data loss, while predictive modeling may give the most accurate estimates but is computationally intensive.
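As a rough illustration of the first three strategies, the sketch below applies deletion, mean/median imputation, and mode imputation with pandas. The column names and values are made up for the example, and model-based imputation is left out because it needs a separate estimator.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in a numeric and a categorical column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0, np.nan],
    "city": ["Delhi", "Pune", None, "Delhi", "Delhi"],
})

# 1) Deletion: drop every row that has at least one missing value.
dropped = df.dropna()

# 2) Mean / median imputation for the numeric column.
age_mean = df["age"].fillna(df["age"].mean())
age_median = df["age"].fillna(df["age"].median())

# 3) Mode imputation: fill with the most frequent value of the column.
city_mode = df["city"].fillna(df["city"].mode()[0])
```

Deletion keeps only two of the five rows here, which shows how quickly it can shrink a small dataset.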

Whether pandas accepts lists of different lengths depends on how they are passed. A dictionary of plain lists with unequal lengths raises a ValueError, because every column must have the same length. Wrapping each list in a Series first lets pandas align the columns on their index labels and fill the missing positions with NaN; passing the lists as rows (a list of lists) likewise pads the shorter rows with NaN. The padded result keeps the data aligned but may need imputation to deal with the missing values.
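The outcome depends on how the lists are handed to pandas. The following sketch, with made-up list names and values, shows three cases: a dictionary of plain lists, a dictionary of Series, and a list of rows.

```python
import pandas as pd

# Hypothetical lists of unequal length.
runs = [120, 85, 40]
wickets = [3, 1]

# A dict of plain lists with different lengths is rejected.
try:
    pd.DataFrame({"runs": runs, "wickets": wickets})
    raised = False
except ValueError:
    raised = True

# Wrapping each list in a Series lets pandas align on the index and pad with NaN.
df = pd.DataFrame({"runs": pd.Series(runs), "wickets": pd.Series(wickets)})

# Passing the lists as rows pads the shorter row with NaN.
rows = pd.DataFrame([runs, wickets])
```

In the Series-based version each list becomes a column; in the list-of-rows version each list becomes a row, so the two results have different shapes.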

Effective data collection strategies include: 1) Surveys - structured questionnaires which, if designed well, can gather wide-ranging data; 2) Observations - collecting data by monitoring subjects, often used in behavioral studies; 3) Interviews - obtaining detailed data through interactive conversation; 4) Experiments - collecting data under controlled conditions for causal inference; and 5) Transactions - automatic logging of events in systems, ideal for large and high-velocity data. The quality of data collected by these methods depends on design, execution, and the minimization of bias and errors. Good data collection practices result in high-quality, reliable data which is crucial for accurate analysis.

The Data Science Lifecycle consists of a series of iterative stages: 1) Problem Definition - understanding and defining the problem to solve; 2) Data Collection - gathering data relevant to the problem; 3) Data Cleaning and Preparation - processing raw data for analysis; 4) Exploratory Data Analysis - summarizing main characteristics using visual and quantitative methods; 5) Modeling - selecting and applying machine learning algorithms; 6) Evaluation - assessing the model's performance; 7) Deployment - integrating the model into the decision-making process; and 8) Monitoring and Maintenance - ensuring that the model remains relevant and accurate.

A data scientist's role involves extracting insights from data through the application of statistical, analytical, and machine learning techniques; this includes building models, testing hypotheses, and interpreting data. In contrast, a data engineer focuses on the design, construction, and maintenance of systems to collect, store, and analyze data. They ensure that the infrastructure for data generation and processing is robust and efficient. While data scientists create models and derive insights, data engineers build the pipelines that support that work.

Data imputation is the process of replacing missing data with substituted values to maintain dataset integrity. This is crucial in pre-processing as missing data can result in biased estimates and affect data analysis outcomes. Imputation techniques like mean, median, or mode filling, using predictive models, or neighbor-based imputations, help maintain consistency and comprehensiveness of datasets without discarding useful data. Proper imputation aids in preserving statistical power and ensures more accurate and robust analysis results.

Supervised learning techniques involve training a model on a labeled dataset, meaning each training example is paired with an output label. This allows the model to learn the mapping from inputs to outputs, aiding tasks such as classification and regression. In contrast, unsupervised learning methods work with unlabeled data, and the system tries to learn patterns and structures from the data itself, commonly used in clustering and association tasks.
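The contrast can be sketched with NumPy alone. The supervised part fits a line to inputs that come with known targets; the unsupervised part groups unlabeled points with a tiny hand-written 2-means loop. The data and the starting centers are assumptions chosen for the example, not a recommended implementation.

```python
import numpy as np

# Supervised: inputs x come with known targets y (here y = 2x + 1).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
slope, intercept = np.polyfit(x, y, 1)   # least-squares line fit
prediction = slope * 5.0 + intercept     # predict for an unseen input

# Unsupervised: only the points are given, with no labels.
points = np.array([1.0, 1.2, 0.8, 9.0, 9.5, 10.1])
centers = np.array([points.min(), points.max()])  # simple deterministic start
for _ in range(10):
    # Assign each point to its nearest center.
    assignment = np.abs(points[:, None] - centers[None, :]).argmin(axis=1)
    # Move each center to the mean of the points assigned to it.
    centers = np.array([points[assignment == k].mean() for k in range(2)])
```

The regression uses the labels `y` to learn the mapping, while the clustering loop discovers the two groups from the positions of the points alone.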

A pandas Series is a one-dimensional labeled array, similar to a column in a table. Like a NumPy 1-D array it supports vectorized operations, but it also carries an index of labels, so values can be accessed by label as well as by position. (Both a Series and an array can hold mixed types only by falling back to the generic object dtype.) Compared to a list, a Series offers vectorized arithmetic and built-in statistical methods. A dictionary also pairs keys with values and, since Python 3.7, preserves insertion order, but it has no positional indexing, no vectorized operations, and requires unique keys, whereas a Series index may contain duplicate labels.
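A short comparison may make these differences concrete. The values and labels below are arbitrary; the sketch builds the same data as a Series, a NumPy array, a list, and a dictionary, and then exercises the operations that set a Series apart.

```python
import numpy as np
import pandas as pd

values = [10, 20, 30]
labels = ["a", "b", "c"]

s = pd.Series(values, index=labels)
arr = np.array(values)
d = dict(zip(labels, values))

by_label = s["b"]        # label-based access, like a dict
by_position = s.iloc[1]  # position-based access, like a list or array
doubled = s * 2          # vectorized arithmetic, like a NumPy array
repeated = values * 2    # a list is repeated instead of doubled
total = s.sum()          # built-in statistical methods

# Arithmetic between two Series aligns on labels, not on positions.
other = pd.Series([1, 2, 3], index=["c", "b", "a"])
aligned = s + other
```

Label alignment is the behavior with no counterpart in the other three structures: `aligned` pairs the values that share a label even though the two Series list them in opposite order.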

Converting a 2D NumPy array to a 1D array means flattening it with `flatten()`, `ravel()`, or `reshape(-1)`. `flatten()` always returns a copy, while `ravel()` returns a view of the original data whenever the memory layout allows, which avoids copying. The result places all elements in a single sequence, row by row by default. This is useful when a function expects one-dimensional input or when the elements need to be processed in a single pass.
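The two methods differ in whether the result shares memory with the original array. The sketch below uses a small example array (the name `array2` follows the question paper; the values are made up) and shows that writing to the `flatten()` result leaves the original untouched, while writing to the `ravel()` result changes it.

```python
import numpy as np

array2 = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = array2.flatten()   # always a new copy
flat_view = array2.ravel()     # a view here, since array2 is contiguous

flat_copy[0] = 99   # does not touch array2
flat_view[1] = 77   # writes through to array2

shares_view = np.shares_memory(array2, flat_view)
shares_copy = np.shares_memory(array2, flat_copy)
```

When the original array must stay unchanged, `flatten()` is the safer choice; `ravel()` is preferable when avoiding a copy matters.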



Common questions

What Python libraries are most commonly used in data analysis and what are their specific applications?

What are the key strategies to handle missing data and how does each method work?

How can pandas be used to create a DataFrame from multiple lists of different lengths, and what implications does this have for data alignment?

What are some effective data collection strategies and their implications on data quality?

Describe the lifecycle of data science and the stages it includes.

What is the role of a data scientist and how does it differ from a data engineer's role?

Discuss the concept of data imputation and its importance in the preprocessing phase of data analysis.

What distinguishes supervised learning techniques from unsupervised learning techniques in machine learning?

How does a Series data structure differ from a 1-D array, list, and dictionary in Python?

Explain the process and benefits of converting a 2D numpy array to a 1D array.
