ALL METHODS OF REQUESTS & BEAUTIFULSOUP
Theory + Code Examples (Complete Reference Notes)
PART A: REQUESTS LIBRARY – ALL IMPORTANT METHODS
1. requests.get()
Used to retrieve data from a server. Most commonly used method in web scraping.
import requests
response = requests.get("[Link]")
print(response.status_code)
print(response.text)
2. requests.post()
Used to send data to the server (forms, login, APIs).
data = {"username": "user", "password": "pass"}
response = requests.post("[Link]", data=data)
print(response.text)
3. requests.put()
Used to update existing data on a server (mostly APIs).
data = {"name": "Updated Name"}
requests.put("[Link]", json=data)
4. requests.delete()
Used to delete data on the server.
requests.delete("[Link]")
5. requests.head()
Fetches only headers, no response body. Useful for checking availability.
response = requests.head("[Link]")
print(response.headers)
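6. requests.options()
Checks allowed HTTP methods for a resource.
response = requests.options("[Link]")
# Not every server sends an Allow header, so use .get() to avoid a KeyError
print(response.headers.get("Allow"))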
7. Headers & Params
headers = {"User-Agent": "Mozilla/5.0"}
params = {"page": 1}
requests.get("[Link]", headers=headers, params=params)
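To see what headers and params actually do, the request can be prepared without sending it. This is a minimal sketch; the example.com URL and the "q" parameter are placeholders, not taken from these notes.

```python
import requests

# Placeholder endpoint and query values, used only for illustration.
url = "https://example.com/search"
headers = {"User-Agent": "Mozilla/5.0"}
params = {"page": 1, "q": "laptop"}

# Build the request without sending it, so nothing goes over the network.
prepared = requests.Request("GET", url, headers=headers, params=params).prepare()

print(prepared.url)                    # params are URL-encoded into the query string
print(prepared.headers["User-Agent"])  # custom header attached to the request
```

The params dictionary ends up in the query string, which is usually safer than building the URL by hand.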
8. Sessions
A Session object persists cookies across requests and reuses the underlying connection, which improves performance.
session = requests.Session()
session.get("[Link]")
session.get("[Link]")
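The sketch below shows how a session carries shared state. It only prepares a request and does not send it; the URL and the cookie name are hypothetical.

```python
import requests

with requests.Session() as session:
    # Headers set on the session are merged into every request it makes.
    session.headers.update({"User-Agent": "Mozilla/5.0"})
    # Cookies live in the session's cookie jar (normally filled by server responses).
    session.cookies.set("theme", "dark")

    request = requests.Request("GET", "https://example.com/profile")
    prepared = session.prepare_request(request)

    print(prepared.headers["User-Agent"])
    print(session.cookies.get("theme"))
```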
PART B: BEAUTIFULSOUP – ALL IMPORTANT METHODS
1. BeautifulSoup()
Creates a parse tree from HTML.
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "lxml")
2. find()
Returns the first matching tag.
soup.find("h1")
3. find_all()
Returns all matching tags as a list.
soup.find_all("a")
4. select()
Uses CSS selectors.
soup.select("[Link] > p")
5. get_text()
Extracts only text content.
soup.find("p").get_text()
6. attrs & get()
link = soup.find("a")
print(link.attrs)
print(link.get("href"))
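A small sketch of attrs and get() on inline HTML (the markup is invented for illustration). It shows why get() is usually preferred over indexing when an attribute may be missing.

```python
from bs4 import BeautifulSoup

# Invented markup: one link with attributes, one without an href.
html = '<a href="/mobile" class="btn primary">View</a><a>No link</a>'
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("a")

print(first.attrs)              # all attributes as a dict; class is a list
print(first.get("href"))        # value of an existing attribute
print(second.get("href"))       # None instead of a KeyError
print(second.get("href", "#"))  # optional default value
```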
7. parent / children / descendants
tag = soup.find("p")
print(tag.parent)
for child in tag.children:
    print(child)
8. next_sibling / previous_sibling
tag = soup.find("h1")
print(tag.next_sibling)
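One common surprise is that next_sibling often returns the whitespace between tags rather than the next tag. The sketch below, on invented markup, contrasts it with find_next_sibling().

```python
from bs4 import BeautifulSoup

# Invented markup with line breaks between tags, as in most real pages.
html = "<div>\n<h1>Title</h1>\n<p>First paragraph</p>\n</div>"
soup = BeautifulSoup(html, "html.parser")
h1 = soup.find("h1")

print(repr(h1.next_sibling))        # the newline text node, not the <p> tag
print(h1.find_next_sibling("p"))    # skips text nodes and returns the <p> tag
```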
9. find_next() / find_previous()
soup.find("h1").find_next("p")
10. prettify()
Formats HTML for readability.
print(soup.prettify())
11. Decompose & Extract
tag = soup.find("script")
tag.decompose()
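The difference between the two can be sketched on a small invented document: decompose() discards the tag, while extract() hands it back for later use.

```python
from bs4 import BeautifulSoup

# Invented markup for illustration.
html = "<div><script>track()</script><p>Keep me</p><span>Ad</span></div>"
soup = BeautifulSoup(html, "html.parser")

# decompose(): remove the tag and destroy it.
soup.find("script").decompose()

# extract(): remove the tag from the tree but keep it as a standalone object.
removed = soup.find("span").extract()

print(soup)                 # tree without <script> and <span>
print(removed.get_text())   # the extracted tag is still usable
```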
12. Summary Table (Conceptual)
Requests handles HTTP communication.
BeautifulSoup handles HTML parsing and navigation.
WEB SCRAPING USING REQUESTS & BEAUTIFULSOUP
Complete Theory + Code + MCQs + Interview Q&A + Mini Project
1. Introduction to Web Scraping
Web scraping is an automated technique to extract data from websites using software. It simulates how a
browser requests a web page and then processes the returned HTML to collect useful information. Web
scraping is widely used in data science, research, price comparison, news aggregation, and machine learning
dataset creation.
Applications of Web Scraping:
• Price monitoring (Amazon, Flipkart)
• Job portals data collection
• News and article aggregation
• Data collection for ML models
2. Requests Library – Detailed Explanation
The Requests library is used to send HTTP requests in Python. It supports GET, POST, PUT, DELETE
methods and handles sessions, cookies, and headers.
import requests
url = "[Link]"
response = requests.get(url)
print(response.status_code)
print(response.text)
Status Codes:
200 – Success
404 – Page not found
403 – Forbidden
500 – Server error
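A scraper usually branches on the status code class rather than on individual codes. The helper below is a hypothetical sketch using only the standard library; with Requests itself, response.raise_for_status() raises an exception for 4xx and 5xx responses.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the code, its standard phrase, and a coarse category."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"

    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{code} {phrase} ({category})"


for code in (200, 404, 403, 500):
    print(describe_status(code))
```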
3. HTTP Headers (IMPORTANT)
Headers carry metadata about the request. A browser-like User-Agent makes the request resemble one sent by a real browser; some websites reject requests that arrive with the default python-requests User-Agent.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
}
response = requests.get(url, headers=headers)
4. BeautifulSoup Library
BeautifulSoup parses HTML/XML documents and creates a navigable tree structure. It allows searching data
using tags, attributes, and CSS selectors.
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, "lxml")
print(soup.title)
Common Methods:
find(), find_all(), select(), get_text()
5. Real Website Scraping Example
import requests
from bs4 import BeautifulSoup
url = "[Link]"
response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")
quotes = soup.find_all("span", class_="text")
authors = soup.find_all("small", class_="author")
for q, a in zip(quotes, authors):
    print(q.text, "-", a.text)
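Pairing two separate find_all() lists with zip() silently misaligns rows if one item lacks an author. A more defensive sketch iterates over each item's container instead. The inline HTML and the "quote" wrapper class are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Stand-in for a downloaded page; the second quote has no author on purpose.
html = """
<div class="quote"><span class="text">Quote one</span><small class="author">Author A</small></div>
<div class="quote"><span class="text">Quote two</span></div>
<div class="quote"><span class="text">Quote three</span><small class="author">Author C</small></div>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for block in soup.find_all("div", class_="quote"):
    text = block.find("span", class_="text")
    author = block.find("small", class_="author")
    rows.append((
        text.get_text(strip=True) if text else None,
        author.get_text(strip=True) if author else "Unknown",
    ))

for quote, author in rows:
    print(quote, "-", author)
```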
6. Pagination Handling
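page = 1
while True:
    url = f"[Link]"  # the page URL should interpolate {page}
    r = requests.get(url)
    if r.status_code != 200:
        break  # stop on a missing page or an error
    soup = BeautifulSoup(r.text, "lxml")
    quotes = soup.find_all("span", class_="text")
    if not quotes:
        break  # stop when a page has no more items
    for q in quotes:
        print(q.text)
    page += 1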
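The same loop can be written as a function with a page limit and a delay, which makes it testable without a network. In this sketch, fetch_page is a hypothetical callable that returns a page's HTML or None; here it is backed by a dictionary of fake pages.

```python
import time

from bs4 import BeautifulSoup


def scrape_pages(fetch_page, max_pages=50, delay=0.0):
    """Collect quote texts page by page until a page is missing or empty."""
    results = []
    for page in range(1, max_pages + 1):
        html = fetch_page(page)
        if html is None:          # page does not exist
            break
        soup = BeautifulSoup(html, "html.parser")
        quotes = soup.find_all("span", class_="text")
        if not quotes:            # page exists but has no items
            break
        results.extend(q.get_text(strip=True) for q in quotes)
        time.sleep(delay)         # pause between pages to limit load on the server
    return results


# Fake site used instead of real HTTP requests.
fake_site = {
    1: '<span class="text">A</span><span class="text">B</span>',
    2: '<span class="text">C</span>',
    3: "<p>No quotes here</p>",
}
all_quotes = scrape_pages(fake_site.get)
print(all_quotes)
```

In a real scraper, fetch_page would wrap requests.get() and return None for non-200 responses.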
7. MINI PROJECT: Job Listings Scraper
Objective: Scrape job title, company name, and location from a job listing website and store data in CSV
format.
import csv
import requests
from bs4 import BeautifulSoup
url = "[Link]"
r = requests.get(url)
soup = BeautifulSoup(r.text, "lxml")
jobs = soup.find_all("div", class_="job")
with open("[Link]", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Company", "Location"])
    for job in jobs:
        title = job.find("h2").text
        company = job.find("span", class_="company").text
        location = job.find("span", class_="location").text
        writer.writerow([title, company, location])
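Calling .text directly on the result of find() raises AttributeError when a listing lacks a field. The sketch below adds a small hypothetical helper, field(), and writes to an in-memory buffer so it runs without a website or a file. The sample HTML is invented.

```python
import csv
import io

from bs4 import BeautifulSoup

# Invented listings; the second one has no location.
html = """
<div class="job"><h2> Data Analyst </h2><span class="company">Acme</span><span class="location">Pune</span></div>
<div class="job"><h2>ML Engineer</h2><span class="company">Globex</span></div>
"""


def field(job, name, **attrs):
    """Return the stripped text of the first matching tag, or 'N/A' if absent."""
    tag = job.find(name, **attrs)
    return tag.get_text(strip=True) if tag else "N/A"


soup = BeautifulSoup(html, "html.parser")
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Title", "Company", "Location"])
for job in soup.find_all("div", class_="job"):
    writer.writerow([
        field(job, "h2"),
        field(job, "span", class_="company"),
        field(job, "span", class_="location"),
    ])

print(buffer.getvalue())
```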
8. Advanced Topics
• Sessions & cookies
• Login-based scraping
• Delays using time.sleep()
• Avoiding IP blocking
• robots.txt rules
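Two of the topics above, robots.txt rules and delays, can be combined in a short sketch using the standard library's urllib.robotparser. The rules, URLs, and the fetch callable are invented for illustration; a real scraper would load the site's own robots.txt and call requests.get().

```python
import time
from urllib.robotparser import RobotFileParser

# Invented robots.txt content, parsed from a string instead of being downloaded.
robots_txt = """\
User-agent: *
Disallow: /private/
"""
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def crawl(urls, fetch, delay):
    """Fetch only the URLs that robots.txt allows, pausing between requests."""
    pages = []
    for url in urls:
        if not parser.can_fetch("*", url):
            continue              # skip disallowed paths
        pages.append(fetch(url))
        time.sleep(delay)         # spread requests out over time
    return pages


urls = ["https://example.com/products", "https://example.com/private/admin"]
pages = crawl(urls, fetch=lambda u: f"<html>{u}</html>", delay=0.1)
print(pages)
```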
9. MCQs
1. Which library is used to parse HTML?
A) NumPy B) Requests C) BeautifulSoup D) Pandas
Answer: C
2. Which HTTP status code means Forbidden?
A) 200 B) 404 C) 403 D) 500
Answer: C
10. Interview Questions & Answers
Q1. What is web scraping?
A. Automated extraction of data from websites.
Q2. Difference between Requests and BeautifulSoup?
A. Requests fetches data, BeautifulSoup parses HTML.
Q3. What is User-Agent?
A. It identifies the browser to the server.
11. Legal & Ethical Considerations
Always follow robots.txt, avoid excessive requests, and scrape only public data. Never scrape private or
copyrighted content without permission.
Mini Project: Flipkart Web Scraping (Industry-Oriented)
Project Objective:
To collect publicly available product information from Flipkart using Python in a clean,
industry-standard approach suitable for data analysis tasks.
This project demonstrates how companies collect market price data for analysis and comparison.
Tools & Technologies
• Python
• Jupyter Notebook
• requests – HTTP communication
• BeautifulSoup (bs4) – HTML parsing
• pandas – data storage & analysis
Business Use Case
E-commerce companies and analysts scrape product data to:
• Track competitor pricing
• Analyze product popularity
• Build pricing dashboards
• Support business decisions
Step 1: Import Required Libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
Step 2: Send Request with Browser Headers
url = "[Link]"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
}
response = requests.get(url, headers=headers)
print(response.status_code)
Step 3: Parse HTML Response
soup = BeautifulSoup(response.text, "lxml")
Step 4: Extract Required Fields
products = soup.find_all("div", class_="_1AtVbE")
records = []
for item in products:
    name = item.find("div", class_="_4rR01T")
    price = item.find("div", class_="_30jeq3")
    rating = item.find("div", class_="_3LWZlK")
    if name and price:
        records.append({
            "Product Name": name.text,
            "Price": price.text,
            "Rating": rating.text if rating else "N/A"
        })
Step 5: Create Structured Dataset
df = pd.DataFrame(records)
print(df.head())
Sample Output
Product Name          Price     Rating
--------------------------------------
Samsung Galaxy F14    ₹9,999    4.2
Redmi 12C             ₹7,499    4.1
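The scraped Price and Rating values are strings, so analysis usually needs them converted to numbers first. The helpers below are a hypothetical sketch, not part of the project code; they assume prices look like "₹9,999" and that missing ratings are stored as "N/A".

```python
import re


def parse_price(text):
    """Convert a price string such as '₹9,999' to an integer, or None if empty."""
    digits = re.sub(r"\D", "", text)   # drop the currency sign and commas
    return int(digits) if digits else None


def parse_rating(text):
    """Convert a rating string such as '4.2' to a float, or None for 'N/A'."""
    try:
        return float(text)
    except ValueError:
        return None


print(parse_price("₹9,999"))
print(parse_rating("4.2"))
print(parse_rating("N/A"))
```

With pandas, these functions could be applied to the columns using df["Price"].apply(parse_price).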
Step 6: Export Data
df.to_csv("flipkart_products.csv", index=False)
Conclusion
This project follows a clean and professional workflow used in industry:
• Ethical data collection
• Structured data storage
• Reusable and scalable code
The same approach is used in real-world data engineering and analytics roles.
HTML Tags (Basics for Web Scraping)
HTML tags define the structure of a webpage.
Common tags:
<div> container
<a> link
<span> inline container
<img> image
Example:
<h1>Product</h1>
<p>Price ₹999</p>
Tag in BeautifulSoup
Tag represents an HTML element.
Example:
from bs4 import BeautifulSoup
soup = BeautifulSoup("<h1>Flipkart</h1>", "html.parser")
type(soup.h1)
NavigableString
NavigableString represents text inside a tag.
Example:
<p>Price ₹999</p>
type(soup.p.string)
BeautifulSoup – All Important Functions
find(), find_all(), select(), get_text()
attrs, get()
parent, children, next_sibling
find_next(), find_previous()
decompose(), extract()
HTML Comments in BeautifulSoup
from bs4 import Comment
soup.find_all(string=lambda text: isinstance(text, Comment))
HTML TAGS – DETAILED WITH EXAMPLES
HTML tags define elements of a webpage.
Example HTML:
<html>
<body>
<div class="product">
<h1>Mobile Phone</h1>
<p class="price">₹9999</p>
<a href="/mobile">View</a>
</div>
</body>
</html>
In BeautifulSoup:
soup.div → returns first <div> tag
soup.h1.text → Mobile Phone
TAG OBJECT (BeautifulSoup)
A Tag represents an HTML element.
Code:
type(soup.div)
Output:
<class 'bs4.element.Tag'>
Access attributes:
soup.div['class']
Output:
['product']
NAVIGABLESTRING – DETAILED
NavigableString represents text inside a tag.
Code:
type(soup.h1.string)
Output:
<class 'bs4.element.NavigableString'>
Text value:
soup.h1.string
Output:
Mobile Phone
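A detail worth knowing: .string only works when a tag has a single child. The sketch below, using a condensed copy of the example HTML, shows that it returns None for a tag with several children, where get_text() is the usual alternative.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html = '<div class="product"><h1>Mobile Phone</h1><p class="price">₹9999</p></div>'
soup = BeautifulSoup(html, "html.parser")

print(isinstance(soup.h1.string, NavigableString))  # text inside a tag is a NavigableString
print(soup.h1.string)                               # single child: the text itself
print(soup.div.string)                              # several children: None
print(soup.div.get_text(" ", strip=True))           # joins all text under the tag
```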
find() FUNCTION
find() returns ONLY the first matching tag.
Code:
soup.find("p")
Output:
<p class="price">₹9999</p>
If not found:
soup.find("table")
Output:
None
find_all() FUNCTION
find_all() returns ALL matching tags as a list.
Code:
soup.find_all("a")
Output:
[<a href="/mobile">View</a>]
select() FUNCTION (CSS SELECTOR)
select() uses CSS selectors.
Code:
soup.select("div.product p.price")
Output:
[<p class="price">₹9999</p>]
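A few more selector forms, sketched on a condensed copy of the example HTML. select_one() is the CSS counterpart of find(): it returns the first match or None.

```python
from bs4 import BeautifulSoup

html = (
    '<div class="product"><h1>Mobile Phone</h1>'
    '<p class="price">₹9999</p><a href="/mobile">View</a></div>'
)
soup = BeautifulSoup(html, "html.parser")

direct = soup.select("div.product > p.price")   # direct child only
link = soup.select_one("div.product a")         # first match, like find()
missing = soup.select_one("table")              # None when nothing matches
by_attr = soup.select('a[href^="/"]')           # attribute starts with "/"

print(direct[0].get_text())
print(link["href"])
print(missing)
print(len(by_attr))
```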
get_text() FUNCTION
Extracts only text content.
Code:
soup.find("div").get_text(" ", strip=True)
Output:
Mobile Phone ₹9999 View
attrs & get()
Code:
tag = soup.find("a")
tag.attrs
Output:
{'href': '/mobile'}
tag.get("href")
Output:
/mobile
PARENT, CHILDREN, SIBLINGS
soup.h1.parent
Output:
<div class="product">...</div>
for child in soup.div.children:
    print(child)
Output (whitespace-only text nodes between the tags are omitted here):
<h1>Mobile Phone</h1>
<p class="price">₹9999</p>
<a href="/mobile">View</a>
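With the example HTML as written, with line breaks between tags, .children also yields the newline text nodes. A minimal sketch of keeping only the element children:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """<div class="product">
<h1>Mobile Phone</h1>
<p class="price">₹9999</p>
<a href="/mobile">View</a>
</div>"""
soup = BeautifulSoup(html, "html.parser")

children = list(soup.div.children)
print(len(children))   # three tags plus the newline strings around them

# Keep only Tag objects and ignore the whitespace text nodes.
tag_names = [child.name for child in children if isinstance(child, Tag)]
print(tag_names)
```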
find_next() & find_previous()
soup.h1.find_next("p")
Output:
<p class="price">₹9999</p>
soup.p.find_previous("h1")
Output:
<h1>Mobile Phone</h1>
decompose() & extract()
decompose() removes a tag from the tree and destroys it
Code:
soup.p.decompose()
extract() removes a tag from the tree and returns it
tag = soup.a.extract()
HTML COMMENTS
HTML Comment Example:
<!-- Product End -->
Code:
from bs4 import Comment
comment = soup.find(string=lambda t: isinstance(t, Comment))
Output:
Product End
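Comments are often stripped before further processing. The sketch below, on a condensed copy of the example HTML, collects every comment and then removes them from the tree.

```python
from bs4 import BeautifulSoup, Comment

html = '<div class="product"><h1>Mobile Phone</h1><!-- Product End --></div>'
soup = BeautifulSoup(html, "html.parser")

# Collect all comment nodes; each one behaves like a string.
comments = soup.find_all(string=lambda t: isinstance(t, Comment))
cleaned = [c.strip() for c in comments]
print(cleaned)

# Remove the comments from the tree.
for c in comments:
    c.extract()
print(soup)
```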
INDUSTRY SUMMARY
• Tags represent elements
• NavigableString stores text
• find() → first match
• find_all() → list
• select() → CSS selector
• get_text() → clean text
• These are core industry scraping concepts