Friday, 9 May 2025

Web Scraping with Beautiful Soup in Python 3

Beautiful Soup is a Python library for parsing HTML and XML documents. It builds a parse tree that you can search and navigate to extract data. This tutorial shows how to fetch a page with the requests library and scrape it with Beautiful Soup.

Installing Beautiful Soup

pip install beautifulsoup4 requests

Basic Example

The following Python code fetches and parses the content of a web page:

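import requests
from bs4 import BeautifulSoup

# Send a request to the website
url = 'https://example.com'
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop early on HTTP errors such as 404 or 500

# Parse the content
soup = BeautifulSoup(response.content, 'html.parser')

# Print the page title (soup.title is None if the page has no <title> tag)
if soup.title is not None:
    print(soup.title.text)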
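
The same parsing step works on any HTML string, which is handy for experimenting without network access. The following is a minimal sketch, not part of the original example: the `html_doc` snippet and the `page_title` helper are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration
html_doc = """
<html>
  <head><title>Sample Page</title></head>
  <body><p>Hello, world.</p></body>
</html>
"""


def page_title(html):
    """Return the stripped <title> text, or None if the page has no title."""
    soup = BeautifulSoup(html, 'html.parser')
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)


print(page_title(html_doc))
print(page_title('<p>No title here</p>'))
```

Returning None for a missing title keeps the caller from hitting an AttributeError on pages that have no <title> tag.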

Extracting Links

To extract every link on a page, loop over its <a> tags. Note that link.get('href') returns None for anchors that have no href attribute:

for link in soup.find_all('a'):
    href = link.get('href')
    text = link.text.strip()
    print(f'Text: {text}, URL: {href}')
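
Many pages use relative URLs in their links, so the raw href values are often not directly usable. One possible approach, sketched here under the assumption that you know the URL the page was fetched from, is to resolve each href with `urljoin` from the standard library. The `base_url`, the sample markup, and the `absolute_links` helper are hypothetical names chosen for this illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page URL and markup, used only for illustration
base_url = 'https://example.com/blog/'
html_doc = """
<a href="/about">About</a>
<a href="post-1.html">First post</a>
<a href="https://example.org/">External</a>
<a name="top">No href</a>
"""


def absolute_links(html, base):
    """Return (text, absolute_url) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, 'html.parser')
    return [
        (a.get_text(strip=True), urljoin(base, a['href']))
        for a in soup.find_all('a', href=True)
    ]


for text, url in absolute_links(html_doc, base_url):
    print(f'Text: {text}, URL: {url}')
```

Passing `href=True` to `find_all` skips anchors without an href, so the list comprehension never sees a missing attribute.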

Finding Elements by Class or ID

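# Find the first element with a given class name (find returns None if nothing matches)
item = soup.find('div', class_='item-class')
if item is not None:
    print(item.text)

# Find element by ID
header = soup.find(id='main-header')
if header is not None:
    print(header.text)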
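
To see how these lookups behave, it can help to run them against a small, known document. The sketch below reuses the `item-class` and `main-header` names from the snippet above; the markup itself and the `main-footer` ID are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup reusing the class and ID names from the example above
html_doc = """
<h1 id="main-header">Products</h1>
<div class="item-class">Widget</div>
<div class="item-class featured">Gadget</div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# find_all returns every match; class_ matches any one of an element's classes
items = [div.get_text(strip=True) for div in soup.find_all('div', class_='item-class')]
print(items)

# find returns None when nothing matches, so check before reading the text
footer = soup.find(id='main-footer')
footer_text = footer.get_text(strip=True) if footer is not None else None
print(footer_text)
```

Use `find` when you expect a single element and `find_all` when you want every match as a list.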

Conclusion

Beautiful Soup is an excellent tool for web scraping when used responsibly. Always check the website's terms of service and robots.txt before scraping.
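
As one way to act on the robots.txt advice, Python's standard library includes `urllib.robotparser`. The sketch below parses a made-up set of rules from a string so that it runs offline; the rules and the `my-scraper` user-agent name are placeholders, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, parsed from a string so the sketch runs offline
robots_lines = """
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_lines)

# 'my-scraper' is a placeholder user-agent name
print(parser.can_fetch('my-scraper', 'https://example.com/index.html'))
print(parser.can_fetch('my-scraper', 'https://example.com/private/data.html'))
```

Against a live site, you would typically call `parser.set_url(...)` with the site's robots.txt address and then `parser.read()` instead of `parse`. A permissive robots.txt does not replace reading the terms of service.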
