A quantile is a statistical term for a point or value below which a certain proportion of the data falls. In other words, quantiles split the data into intervals. We start by importing NumPy and pandas. NumPy is used for high-performance numerical computation. Pandas is used for data manipulation, data analysis and […]
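As a minimal sketch of the idea, the snippet below computes a median and quartiles on a small made-up series. The data and variable names are illustrative assumptions, not taken from the original post; both calls use the default linear interpolation.

```python
import numpy as np
import pandas as pd

# Made-up sample data, purely for illustration.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# The 0.5 quantile is the median: half of the data falls below it.
median = values.quantile(0.5)

# The quartiles split the data into four intervals.
quartiles = np.quantile(values, [0.25, 0.5, 0.75])

print(median)
print(quartiles)
```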
Python Pandas Data Cleaning
https://www.espncricinfo.com/records/highest-career-batting-average-282910 Here, we read the CSV file named ‘CricketTestMatchData.csv’ into a DataFrame called df using read_csv. Next, we check for missing (null) values in the DataFrame df. This returns a Boolean result for each column: True if the column has any missing values and False if it doesn’t. This line filters the […]
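The read-then-check pattern described above can be sketched as follows. Because the original CSV file is not available here, this example reads a small in-memory CSV instead; the column names and values are hypothetical stand-ins.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file; columns and values are made up.
csv_text = """Player,Matches,Average
Player A,52,99.94
Player B,20,
Player C,,60.97
Player D,33,58.45
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# One Boolean per column: True if that column contains any missing value.
has_missing = df.isnull().any()
print(has_missing)
```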
Pandas Columns
Pandas DataFrames are composed of rows and columns. In this guide, we are going to cover everything you need to know about working with columns. The article is based on a tutorial we published on our YouTube channel. Feel free to check it out below. Let’s start by importing Pandas and NumPy. Here we […]
Pandas Resample
The .resample() method in pandas works similarly to .groupby(), but it is specifically designed for time-series data. It groups data into defined time intervals and then applies one or more functions to each group. This method is useful for both upsampling—where missing data points can be filled or interpolated—and downsampling, which involves aggregating data over […]
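A minimal sketch of both directions, using made-up series: downsampling daily values into 2-day sums, and upsampling a sparse series to daily frequency with linear interpolation. The data and variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Downsampling: four daily values aggregated into 2-day bins.
daily = pd.Series(
    [10, 20, 30, 40],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)
down = daily.resample("2D").sum()

# Upsampling: two points two days apart, filled to daily frequency.
sparse = pd.Series(
    [10.0, 30.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-03"]),
)
up = sparse.resample("D").interpolate()  # linear interpolation by default

print(down)
print(up)
```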
Python Pandas JSON
JSON (JavaScript Object Notation) is a lightweight, human-readable data interchange format that is widely used for both data storage and transfer. It is structured using key-value pairs and supports various data types, including strings, numbers, booleans, arrays, and nested objects. JSON is a standard format commonly used in APIs and web data, which makes it […]
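To make the structure concrete, here is a small sketch that parses a made-up JSON string containing strings, booleans, arrays, and a nested object, then flattens it into a DataFrame with pd.json_normalize. The records are invented for illustration.

```python
import json
import pandas as pd

# Made-up JSON payload with nested objects and arrays.
raw = """
[
  {"name": "Ann", "active": true, "tags": ["a", "b"], "address": {"city": "Oslo"}},
  {"name": "Bob", "active": false, "tags": [], "address": {"city": "Lima"}}
]
"""

# json.loads turns the text into Python lists and dicts.
records = json.loads(raw)

# json_normalize flattens nested objects into dotted column names.
df = pd.json_normalize(records)
print(df)
```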
BeautifulSoup Pagination
import requests – Allows us to make HTTP requests to web pages. from bs4 import BeautifulSoup – It is used to parse and extract data from HTML content. import pandas as pd – It is used for organizing and manipulating data in table format. import re – It enables pattern matching using regular expressions. from time […]
AdaBoost Classifier
Adaptive Boosting, or AdaBoost, is a boosting algorithm that combines multiple low-accuracy (weak) models to form a single high-accuracy (strong) model. It works by sequentially training these weak learners, each one focusing more on the errors made by the previous ones. Any machine learning algorithm that supports weighted training samples—such as Decision Trees, Logistic Regression, […]
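A minimal sketch using scikit-learn's AdaBoostClassifier on a synthetic dataset. The dataset, split sizes, and hyperparameters are assumptions picked for illustration; by default, scikit-learn uses depth-1 decision trees (stumps) as the weak learners.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Up to 50 weak learners are trained one after another; each round
# reweights the training samples the previous learners got wrong.
model = AdaBoostClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(accuracy)
```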
Gradient Boosting Classifier
Gradient Boosting is an ensemble technique that builds a strong model by combining multiple weak decision trees. While it may seem similar to a Random Forest, there’s a key difference: in Random Forests, each tree is built independently, whereas in Gradient Boosting, trees are built sequentially, with each new tree correcting the errors of the […]
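The sequential nature can be sketched with scikit-learn's GradientBoostingClassifier, whose staged_predict method yields the ensemble's predictions after each added tree. The synthetic dataset and hyperparameters below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Shallow trees are added one at a time; each is fit to the errors
# (gradients of the loss) left by the ensemble built so far.
model = GradientBoostingClassifier(
    n_estimators=50, learning_rate=0.1, max_depth=2, random_state=0
)
model.fit(X_train, y_train)

# Test accuracy after 1 tree, 2 trees, ..., 50 trees.
staged_accuracy = [
    (pred == y_test).mean() for pred in model.staged_predict(X_test)
]
print(staged_accuracy[0], staged_accuracy[-1])
```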
Kaggle House Price Prediction Regression Analysis
train_df = train_df.drop(columns=['PoolQC', 'MiscFeature', 'Alley', 'Fence', 'GarageYrBlt', 'GarageCond', 'BsmtFinType2'])
test_df = test_df.drop(columns=['PoolQC', 'MiscFeature', 'Alley', 'Fence', 'GarageYrBlt', 'GarageCond', 'BsmtFinType2'])
#drop GarageArea or GarageCars
#build models
Kaggle Titanic Tutorial
#military – Capt, Col, Major
#noble – Jonkheer, the Countess, Don, Lady, Sir
#unmarried Female – Mlle, Ms, Mme
#NEW Drop Sibsp, Parch, TicketNumberCounts
#OLD
#X = train_df.drop(['Survived'], axis=1)
#y = train_df['Survived']
#X_test = test_df.drop(['Age_Cut', 'Fare_Cut'], axis=1)
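One way the title groupings listed in the comments above might be applied is sketched below. The Title series, the group labels, and the variable names are hypothetical; the original notes only list which titles belong together.

```python
import pandas as pd

# Mapping built from the groupings in the notes; the label strings are assumed.
title_groups = {
    "Capt": "Military", "Col": "Military", "Major": "Military",
    "Jonkheer": "Noble", "the Countess": "Noble", "Don": "Noble",
    "Lady": "Noble", "Sir": "Noble",
    "Mlle": "Unmarried Female", "Ms": "Unmarried Female", "Mme": "Unmarried Female",
}

# Hypothetical column of titles already extracted from passenger names.
titles = pd.Series(["Capt", "Mlle", "Sir", "Mr"], name="Title")

# Titles found in the mapping are replaced; others (e.g. "Mr") stay unchanged.
grouped = titles.replace(title_groups)
print(grouped.tolist())
```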