Gradient boosting classifier
Gradient Boosting is an ensemble technique that builds a strong model by combining multiple weak decision trees. While it may seem similar to a Random Forest, there’s a key difference: in Random Forests, each tree is built independently, whereas in Gradient Boosting, trees are built sequentially, with each new tree correcting the errors of the previous ones.
The goal is to minimize the loss function at each stage, gradually improving the model’s performance. Gradient Boosting is versatile and can be used for both regression and classification tasks—but in this example, we focus specifically on classification.
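To make the sequential idea concrete, here is a minimal, simplified sketch of boosting with a squared-error loss, where each new tree is fit to the residuals (the errors) of the current model. This is an illustration of the mechanism only, not how GradientBoostingClassifier works internally (classification boosts log-odds with a different loss); the toy data and names below are made up for the example.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression data (illustrative only)
rng = np.random.RandomState(0)
X_toy = rng.uniform(0, 5, size=(200, 1))
y_toy = np.sin(X_toy.ravel()) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y_toy, y_toy.mean())  # start from a constant model
trees = []

for _ in range(50):
    residuals = y_toy - prediction                     # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2).fit(X_toy, residuals)
    prediction += learning_rate * tree.predict(X_toy)  # each new tree corrects the previous errors
    trees.append(tree)

print("Training MSE after boosting:", np.mean((y_toy - prediction) ** 2))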
We import pandas as pd, and we also import datasets from sklearn.
import pandas as pd
from sklearn import datasets
Next we load the wine dataset from scikit-learn.
With as_frame=True, the data and target are returned as pandas objects.
wine = datasets.load_wine(as_frame=True)
Next we assign the ‘data’ entry to X, which holds the feature matrix.
X = wine['data']
Here we assign the target labels from the wine dataset to the variable y.
y = wine['target']
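As an optional sanity check, we can inspect what we just loaded; the wine dataset has 178 samples, 13 numeric features, and three target classes.
print(X.shape)             # (178, 13): 178 samples, 13 chemical features
print(X.columns.tolist())  # feature names such as 'alcohol' and 'malic_acid'
print(y.value_counts())    # three classes: 0, 1, 2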
Next we import train_test_split to split our data into training and test sets.
from sklearn.model_selection import train_test_split
Here we split the data, holding out 20% of the samples as a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=17)
Here, we import the cross_val_score function from scikit-learn, which is used to evaluate the performance of a model using cross-validation.
from sklearn.model_selection import cross_val_score
Next we import GradientBoostingClassifier from scikit-learn’s ensemble module and create an instance of it.
from sklearn.ensemble import GradientBoostingClassifier
gbr = GradientBoostingClassifier()
Next we train the model with the training data using the .fit() method.
gbr.fit(X_train, y_train)

We then evaluate the baseline model with 5-fold cross-validation on the training data, averaging the accuracy across folds.
cross_val_score(gbr, X_train, y_train, scoring='accuracy', cv=5, n_jobs=-1).mean()
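Cross-validation scores the model on held-out folds of the training data; as an additional optional check, we can also score the fitted model on the test set we set aside earlier.
test_accuracy = gbr.score(X_test, y_test)  # accuracy on the held-out test set
print(test_accuracy)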

Hyperparameters in GradientBoostingClassifier (a short configuration sketch follows the list):
learning_rate: Controls the contribution of each tree to the final model. Lower values reduce overfitting risk by slowing down the learning process, often requiring more trees to compensate.
criterion: The function used to evaluate and determine the best feature and threshold to split the data at each node.
max_depth: Specifies the maximum depth of each individual decision tree. Shallower trees help prevent overfitting but may underfit.
n_estimators: The total number of trees (iterations) used in the boosting process. More estimators usually improve performance but increase computation time.
init: An initial estimator used to make the first predictions before boosting begins. By default, this is based on the log-odds of the target class (converted to probabilities).
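For illustration, here is what setting a few of these hyperparameters looks like; the values below are arbitrary examples, not tuned recommendations, and the variable name is just for this sketch.
example_gbc = GradientBoostingClassifier(
    n_estimators=200,          # more trees, typically paired with a smaller learning_rate
    learning_rate=0.05,        # each tree contributes less, so learning is slower but steadier
    max_depth=3,               # shallow trees act as weak learners
    criterion='friedman_mse',  # default criterion for choosing splits
)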
Here, we define the range of hyperparameters for a grid search to find the best combination of values for our GradientBoostingClassifier.
param_grid = {
    'n_estimators': [10, 50, 100, 500],
    'learning_rate': [0.0001, 0.001, 0.01, 0.1, 1.0],
    'max_depth': [3, 7, 9],
}
Next we import GridSearchCV, which is used to systematically search through a specified set of hyperparameter combinations to find the best one for a machine learning model.
from sklearn.model_selection import GridSearchCV
gbr2 = GridSearchCV(gbr, param_grid, cv=3, n_jobs=-1)
Next we fit the grid search to the training data.
gbr2.fit(X_train, y_train)

The best hyperparameter combination found by the grid search:
gbr2.best_params_

And the corresponding mean cross-validated accuracy:
gbr2.best_score_
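Since GridSearchCV refits the best estimator on the full training set by default, we can also check how the tuned model performs on the held-out test set.
best_model = gbr2.best_estimator_          # the refit best model
print(best_model.score(X_test, y_test))    # accuracy on the test set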

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.