Data Parsing and File Handling in Python

The document outlines various data parsing techniques in Python, including parsing text, CSV, JSON, HTML, and XML files. It also covers operations with NumPy arrays, reading and writing binary files, handling missing data with pandas, and performing data manipulations such as merging, filtering, and plotting. Additionally, it includes examples of writing to and reading from text files, demonstrating error handling and data aggregation.

Uploaded by

srilathadg

✅ 1. Parse Text File
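The source gives no code for this step. What follows is only a minimal sketch of one way to parse a text file, assuming "parsing" means reading the file line by line and dropping blank lines. The helper name `parse_text_file` and the temporary sample file are hypothetical, chosen so the example runs without relying on any particular path.

```python
import os
import tempfile


def parse_text_file(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path, "r", encoding="utf-8") as file:
        return [line.strip() for line in file if line.strip()]


# Demo on a temporary file (an assumption) so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as file:
        file.write("Laptop\n\nMobile\n")
    lines = parse_text_file(sample_path)

print(lines)
```

The `with` statement closes the file automatically, even if reading raises an exception.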

✅ 2. Parse CSV File
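The source gives no code for this step either. One possible approach, sketched here under the assumption that the CSV file has a header row, uses `csv.DictReader` from the standard library. The helper name `parse_csv_file`, the column names, and the sample rows are all hypothetical.

```python
import csv
import os
import tempfile


def parse_csv_file(path):
    """Read a CSV file with a header row into a list of dictionaries."""
    with open(path, "r", newline="", encoding="utf-8") as file:
        return list(csv.DictReader(file))


# Demo on a temporary file (an assumption) so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", newline="", encoding="utf-8") as file:
        file.write("Name,Age\nAnusha,25\nAjay,35\n")
    rows = parse_csv_file(sample_path)

for row in rows:
    print(row["Name"], row["Age"])
```

Note that the `csv` module returns every value as a string, so `row["Age"]` would need `int(...)` before any arithmetic.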

✅ 3. Parse JSON File
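No code for this step survives in the source. A minimal sketch using the standard-library `json` module might look like the following. The helper name `parse_json_file` and the sample record are hypothetical.

```python
import json
import os
import tempfile


def parse_json_file(path):
    """Load a JSON file and return the decoded Python object."""
    with open(path, "r", encoding="utf-8") as file:
        return json.load(file)


# Demo on a temporary file (an assumption) so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.json")
    with open(sample_path, "w", encoding="utf-8") as file:
        json.dump({"Name": "Lily", "Age": 25, "Skills": ["Python", "SQL"]}, file)
    record = parse_json_file(sample_path)

print(record["Name"], record["Skills"])
```

Unlike CSV, JSON keeps its types: numbers come back as `int` or `float`, and arrays come back as lists.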
✅ 4. Parse HTML File (using BeautifulSoup)
📦 You need to install BeautifulSoup: pip install beautifulsoup4

from bs4 import BeautifulSoup

# Open and read the HTML file
with open('E:/[Link]', 'r') as file:
    html_content = file.read()

# Parse the HTML using Python's built-in 'html.parser' backend
soup = BeautifulSoup(html_content, 'html.parser')

# Print the title of the page
print("Title:", soup.title.string)

# Print the text inside the h1 tag
print("Heading:", soup.h1.text)
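If installing BeautifulSoup is not an option, the same two values can be pulled out with the standard library alone. This sketch swaps in `html.parser.HTMLParser` in place of BeautifulSoup. The class name `TitleAndHeadingParser` and the inline sample HTML are hypothetical, and the approach assumes simple, well-formed markup.

```python
from html.parser import HTMLParser


class TitleAndHeadingParser(HTMLParser):
    """Collect the text of the first <title> and first <h1> elements."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag whose text is currently being collected
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1") and tag not in self.found:
            self._current = tag
            self.found[tag] = ""

    def handle_data(self, data):
        if self._current:
            self.found[self._current] += data

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None


# Inline sample document (an assumption) instead of a file on disk
sample_html = (
    "<html><head><title>Demo Page</title></head>"
    "<body><h1>Welcome</h1><p>Hello</p></body></html>"
)

parser = TitleAndHeadingParser()
parser.feed(sample_html)
parser.close()

print("Title:", parser.found.get("title"))
print("Heading:", parser.found.get("h1"))
```

BeautifulSoup remains the more convenient choice for real pages, since it tolerates broken markup and offers searching by tag, class, and attribute.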

✅ 5. Parse XML File (using ElementTree)

import xml.etree.ElementTree as ET

# Parse the XML file
tree = ET.parse('E:/[Link]')
root = tree.getroot()

# Print the root tag
print("Root tag:", root.tag)

# Print each child element and its text
for i in root:
    print(i.tag + ":", i.text)
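To try the same calls without an XML file on disk, the document can be parsed from a string with `ET.fromstring`. This is only a sketch: the `<student>` document and its element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, parsed from a string instead of a file
sample_xml = """<student>
    <name>Anusha</name>
    <age>25</age>
</student>"""

root = ET.fromstring(sample_xml)
print("Root tag:", root.tag)

# Collect each direct child's tag and text
children = {child.tag: child.text for child in root}
for tag, text in children.items():
    print(tag + ":", text)

# findtext() looks up one child by tag name and returns its text
age = root.findtext("age")
print("Age only:", age)
```

Element text is always returned as a string, so numeric values such as the age need an explicit `int(...)` conversion.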

✅ Notes:
• ElementTree is part of Python's standard library, so there is no need to install anything.
import numpy as np

# 1. Create arrays
a1 = np.array([1, 2, 3, 4, 5])  # 1D array
a2 = np.arange(6, 16).reshape(2, 5)  # 2D array with shape (2, 5)

print("Array1:\n", a1)
print("Array2:\n", a2)

# 2. Reshape array
reshaped = a2.reshape(5, 2)
print("\nReshaped Array2 to (5,2):\n", reshaped)

# 3. Slice arrays
print("\nSlice first 3 elements of Array1:", a1[:3])
print("Slice second row of Array2:", a2[1])

# 4. Modify an element by index
a1[2] = 99  # change third element
print("\nModified Array1 (index 2 changed to 99):", a1)

# 5. Apply arithmetic (element-wise)
x = a1 + 10
print("\nArray1 + 10:\n", x)
y = a1 - 5
print("\nArray1 - 5:\n", y)
z = a1 * 10
print("\nArray1 * 10:\n", z)
p = a1 / 10
print("\nArray1 / 10:\n", p)

# 6. Apply logic (element-wise comparison gives a boolean array)
mask = a1 > 3
print("\nArray1 > 3:\n", mask)

# 7. Aggregation over the whole array
# np.sum/np.max are used explicitly; the built-in max() fails on a 2D array
print("\nSum of Array2:", np.sum(a2))
print("Mean of Array2:", np.mean(a2))
print("Max of Array2:", np.max(a2))
2. Program for reading and writing binary file.

try:
    f = open('E:/Data engineering/[Link]', 'rb')
    f1 = open('E:/Data engineering/[Link]', 'wb')

    # Copy the source file to the destination piece by piece
    for i in f:
        f1.write(i)

    print("Image copied successfully.")

except FileNotFoundError:
    print("Error: Source image not found.")

except IOError as e:
    print("I/O error occurred:", e)

finally:
    # Make sure files are closed properly
    try:
        f.close()
        f1.close()
    except NameError:
        # If f or f1 was never successfully opened
        pass
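The copy above can also be written with `with` blocks, which close both files automatically and make the `finally` clause unnecessary. The sketch below is one possible variant, not the source's method: it reads fixed-size chunks rather than iterating by line, since binary data has no meaningful line boundaries. The function name `copy_binary_file`, the chunk size, and the temporary files are assumptions.

```python
import os
import tempfile


def copy_binary_file(source_path, target_path, chunk_size=64 * 1024):
    """Copy source_path to target_path in fixed-size chunks; return bytes copied."""
    copied = 0
    with open(source_path, "rb") as source, open(target_path, "wb") as target:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            target.write(chunk)
            copied += len(chunk)
    return copied


# Demo with a generated binary payload (an assumption) in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "source.bin")
    target_path = os.path.join(tmp, "copy.bin")
    payload = bytes(range(256)) * 1000
    with open(source_path, "wb") as file:
        file.write(payload)

    copied = copy_binary_file(source_path, target_path)

    with open(target_path, "rb") as file:
        copy_matches = file.read() == payload

print("Bytes copied:", copied, "identical:", copy_matches)
```

Reading in chunks keeps memory use bounded regardless of file size.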
import pandas as pd

# Single-level indexing
df_single = pd.DataFrame({
    'Name': ['Anusha', 'Keerthana', 'Ajay'],
    'Age': [25, 30, 35]
})
print("Single-level Indexing:")
print(df_single)

# Hierarchical (Multi-level) indexing
data = pd.Series(
    [100, 200, 300, 400],
    index=[['A', 'A', 'B', 'B'], ['Math', 'Science', 'Math', 'Science']]
)
print("\nHierarchical Indexing:")
print(data)

Program for handling missing data

import pandas as pd
import numpy as np

data = {
    'Name': ['Lily', 'Khushi', 'Mohan'],
    'Age': [25, np.nan, 35],
    'Score': [90, 85, np.nan]
}
df = pd.DataFrame(data)

print("Original DataFrame with NaNs:")
print(df)

# Fill missing values
df_filled = df.fillna(0)
print("\nFilled Missing Values with 0:")
print(df_filled)

# Drop rows with any NaN
df_dropped = df.dropna()
print("\nDropped Rows with Any NaN:")
print(df_dropped)

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

print("Original DataFrame:")
print(df)

# Arithmetic operation: Add a column
df['C'] = df['A'] + df['B']
print("\nAfter Arithmetic Operation (A + B):")
print(df)

# Boolean operation: Filter rows
filtered = df[df['C'] > 6]
print("\nFiltered Rows (C > 6):")
print(filtered)

import pandas as pd

# Merging
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Khushi', 'Reshama']})
df2 = pd.DataFrame({'ID': [1, 2], 'Score': [90, 85]})

merged = pd.merge(df1, df2, on='ID')

print("Merged DataFrame:")
print(merged)

# Aggregation
df = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT'],
    'Salary': [50000, 55000, 60000, 65000]
})

agg = df.groupby('Department').mean()
print("\nAggregated Average Salary by Department:")
print(agg)

7. Plot Individual Columns and Entire Table.

import pandas as pd
import matplotlib.pyplot as plt

# Load or create a sample DataFrame
data = {
    'Year': [2018, 2019, 2020, 2021, 2022],
    'Sales': [100, 150, 130, 170, 200],
    'Profit': [20, 35, 30, 50, 60]
}
df = pd.DataFrame(data)

# Plot individual columns
plt.figure(figsize=(8, 5))
plt.plot(df['Year'], df['Sales'], marker='o', label='Sales')
plt.title('Sales Over Years')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.grid(True)
plt.legend()
plt.show()

# Plot another individual column
plt.figure(figsize=(8, 5))
plt.plot(df['Year'], df['Profit'], color='green', marker='s', label='Profit')
plt.title('Profit Over Years')
plt.xlabel('Year')
plt.ylabel('Profit')
plt.grid(True)
plt.legend()
plt.show()

# Plot the whole table (all numeric columns vs Year)
plt.figure(figsize=(8, 5))
for column in df.columns:
    if column != 'Year':
        plt.plot(df['Year'], df[column], marker='o', label=column)
plt.title('Sales & Profit Over Years')
plt.xlabel('Year')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.show()
Reading data from a file and writing data to a file

#writing data in the file

try:
    f = open("E:/Data engineering/[Link]", 'w')
    # End each item with a newline so it can be read back line by line
    f.write("Laptop\n")
    f.write("Mobile\n")
    print("Data written successfully.")
except IOError as e:
    print("An I/O error occurred:", e)
finally:
    try:
        f.close()
    except NameError:
        # If 'f' was not successfully created
        print("File was not opened properly.")
#appending data in the file
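The appending step has no code in the source. A minimal sketch, assuming the goal is to add a line to the end of an existing file without overwriting it, opens the file in `'a'` (append) mode. The helper name `append_line`, the file name `items.txt`, and the item `"Tablet"` are hypothetical.

```python
import os
import tempfile


def append_line(path, text):
    """Append one line of text to the end of the file at path."""
    with open(path, "a", encoding="utf-8") as file:
        file.write(text + "\n")


# Demo on a temporary file (an assumption) so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "items.txt")

    # Start with the two items used in the writing example
    with open(sample_path, "w", encoding="utf-8") as file:
        file.write("Laptop\nMobile\n")

    append_line(sample_path, "Tablet")

    with open(sample_path, "r", encoding="utf-8") as file:
        contents = file.read().splitlines()

print(contents)
```

Mode `'w'` truncates the file before writing, whereas `'a'` keeps the existing contents and creates the file if it does not exist.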
#reading data from file
try:
    f = open("E:/Data engineering/[Link]", 'r')
    # readline() keeps the trailing newline, so suppress print's own
    print(f.readline(), end="")
    print(f.readline())
except FileNotFoundError:
    print("Error: The file '[Link]' was not found.")
except IOError as e:
    print("An I/O error occurred:", e)
finally:
    try:
        f.close()
    except NameError:
        print("File could not be opened.")
