import requests – allows us to make HTTP requests to web pages.
from bs4 import BeautifulSoup – parses HTML content so we can extract data from it.
import pandas as pd – organizes and manipulates data in table format.
import re – enables pattern matching with regular expressions.
from time […]
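As a rough illustration of how these imports fit together, here is a minimal sketch that does not touch a live site. It parses a small hard-coded HTML snippet, uses `re` to pull the number out of each distance label, and collects the results in a pandas DataFrame. The markup and the `race`, `name` and `distance` class names are hypothetical. In a real scraper the HTML string would come from `requests.get(url).text`.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page fetched with requests.get(url).text
html = """
<ul>
  <li class="race"><span class="name">Rocky Mountain 50</span> <span class="distance">50 miles</span></li>
  <li class="race"><span class="name">Mountain Madness</span> <span class="distance">100 miles</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for race in soup.find_all("li", class_="race"):
    name = race.find("span", class_="name").get_text(strip=True)
    distance_text = race.find("span", class_="distance").get_text(strip=True)
    # Pull the first run of digits out of a label such as "50 miles"
    match = re.search(r"\d+", distance_text)
    miles = int(match.group()) if match else None
    rows.append({"name": name, "miles": miles})

# One dict per race becomes one row per race
df = pd.DataFrame(rows)
print(df)
```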
Web Scraping with Python
import requests
from urllib.parse import urljoin
import urllib.robotparser

Web Scraping with Python, Part 1: Getting Your First Page

def response_code(response):
    if response.status_code == 200:
        print("Page fetched successfully!")
    else:
        print("Failed to retrieve page:", response.status_code)

URL = "http://books.toscrape.com/"
url_response = requests.get(URL)
response_code(url_response)

Failed to retrieve page: 403 | a client does not have the necessary permissions to […]
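The excerpt imports `urljoin` and `urllib.robotparser` without showing them in use. A sketch of one plausible use follows, assuming a hypothetical robots.txt with a single `Disallow: /admin/` rule. It resolves relative links against the base URL and checks each one against the rules before fetching. It runs offline by feeding the rules to `RobotFileParser.parse()`. Against a live site you would call `set_url(urljoin(BASE_URL, "robots.txt"))` and then `read()` instead. The `check_links` helper is illustrative, not from the original post.

```python
import urllib.robotparser
from urllib.parse import urljoin

BASE_URL = "http://books.toscrape.com/"

# Hypothetical robots.txt content; a real scraper would download the site's file
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /admin/",
]


def check_links(base_url, relative_links, robots_lines, user_agent="*"):
    """Hypothetical helper: map each absolute link to whether robots.txt allows it."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    results = {}
    for link in relative_links:
        # urljoin resolves a relative href against the page it was found on
        absolute = urljoin(base_url, link)
        results[absolute] = parser.can_fetch(user_agent, absolute)
    return results


results = check_links(BASE_URL, ["catalogue/page-2.html", "/admin/login"], ROBOTS_LINES)
for url, allowed in results.items():
    print(url, "->", "allowed" if allowed else "disallowed")
```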
BeautifulSoup4 find vs find_all
https://youtu.be/CSVnEKCWh5M

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

html = """ Ultra Running Events Ultra Running Events 50 Mile Races 100 Mile Races 50 Mile Races Rocky Mountain 50 Date: August 10, 2025 Location: Boulder, Colorado Desert Dash 50 Date: September 14, 2025 Location: Moab, Utah 100 Mile Races Mountain […]
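The core difference between the two methods can be sketched with the race data from the excerpt. `find()` returns the first matching tag, or `None` when nothing matches. `find_all()` returns a list of every match, which is empty when nothing matches. The HTML tags below are assumed, because the excerpt preserves only the page text.

```python
from bs4 import BeautifulSoup

# Assumed markup: only the text of the original HTML survived in the excerpt
html = """
<h2>50 Mile Races</h2>
<div class="race"><h3>Rocky Mountain 50</h3><p>Date: August 10, 2025</p><p>Location: Boulder, Colorado</p></div>
<div class="race"><h3>Desert Dash 50</h3><p>Date: September 14, 2025</p><p>Location: Moab, Utah</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None if nothing matches
first_race = soup.find("div", class_="race")
print(first_race.h3.get_text())  # Rocky Mountain 50

# find_all() returns a list-like ResultSet with every match
all_races = soup.find_all("div", class_="race")
print([race.h3.get_text() for race in all_races])

# The difference matters most when nothing matches
print(soup.find("table"))      # None
print(soup.find_all("table"))  # []
```

In practice, this means a `find()` result should be checked for `None` before calling methods on it. A `find_all()` result can always be looped over safely.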
BeautifulSoup4 Selectors
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

html = """ Ultra Running Events Ultra Running Events 50 Mile Races 100 Mile Races 50 Mile Races Rocky Mountain 50 Date: August 10, 2025 Location: Boulder, Colorado Desert Dash 50 Date: September 14, 2025 Location: Moab, Utah 100 Mile Races Mountain Madness […]
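As a sketch of the CSS-selector style, here is the same race data queried with `select()` and `select_one()`. `select()` returns every element matching a CSS selector. `select_one()` returns only the first match, or `None`. The `#fifty`/`#hundred` section ids and the `race`/`location` classes are hypothetical, because the excerpt lost the original tags.

```python
from bs4 import BeautifulSoup

# Assumed markup; the section ids and class names are hypothetical
html = """
<section id="fifty">
  <h2>50 Mile Races</h2>
  <div class="race"><h3>Rocky Mountain 50</h3><p class="location">Boulder, Colorado</p></div>
  <div class="race"><h3>Desert Dash 50</h3><p class="location">Moab, Utah</p></div>
</section>
<section id="hundred">
  <h2>100 Mile Races</h2>
  <div class="race"><h3>Mountain Madness</h3></div>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list of every element matching the CSS selector
fifty_names = [h3.get_text() for h3 in soup.select("#fifty .race h3")]

# select_one() returns just the first match (or None)
first_hundred = soup.select_one("#hundred .race h3")

# Selectors can combine a tag name with a class, like p.location
locations = [p.get_text() for p in soup.select("p.location")]

print(fifty_names)
print(first_hundred.get_text())
print(locations)
```

One selector string such as `"#fifty .race h3"` expresses a nested lookup that would otherwise take several chained `find()` calls.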
BeautifulSoup4 extract table
import requests
import pandas as pd
from bs4 import BeautifulSoup

Basic Example HTML Code -> Runners
html = """ Personal Running Bests Personal Running Bests Distance Time 5k 18:30 10k 37:50 Half Marathon 1:25:11 Marathon 3:17:00 50 Miler 9:14:30 100 Miler 32:11:11 """

Extract headers
headers = [th.get_text(strip=True) for th in table.find_all("th")]
headers

Step 5: […]
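Here is a minimal end-to-end sketch of the table extraction, using the running bests from the excerpt. It assumes a plain `<table>` with one `<th>` header row and one `<tr>` per distance; the original tags were not preserved. Header cells become column names, and each row of `<td>` cells becomes a DataFrame row. The `to_seconds` helper is a hypothetical addition. It normalizes times such as `18:30` (mm:ss) and `1:25:11` (h:mm:ss) into seconds so they can be compared.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Assumed markup for the "Personal Running Bests" table
html = """
<table>
  <tr><th>Distance</th><th>Time</th></tr>
  <tr><td>5k</td><td>18:30</td></tr>
  <tr><td>10k</td><td>37:50</td></tr>
  <tr><td>Half Marathon</td><td>1:25:11</td></tr>
  <tr><td>Marathon</td><td>3:17:00</td></tr>
  <tr><td>50 Miler</td><td>9:14:30</td></tr>
  <tr><td>100 Miler</td><td>32:11:11</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Header cells become the column names
headers = [th.get_text(strip=True) for th in table.find_all("th")]

# Each row with <td> cells becomes a data row; the header row has none and is skipped
rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(cells)

df = pd.DataFrame(rows, columns=headers)


def to_seconds(time_text):
    """Hypothetical helper: convert 'mm:ss' or 'h:mm:ss' into total seconds."""
    total = 0
    for part in time_text.split(":"):
        total = total * 60 + int(part)
    return total


df["Seconds"] = df["Time"].map(to_seconds)
print(df)
```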