Extra Trees Classifier
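An extra trees classifier (ETC) aggregates the results from a group of decision trees, much like a random forest.
Difference from a random forest:
1. ETC picks the value at which to split each feature at random, unlike a standard decision tree classifier (DTC), which searches for the best split value.
2. This makes ETC more random and faster to train, which can help with noisy data.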
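The split difference described above can be sketched for a single feature. This is a simplified illustration written with NumPy, not scikit-learn's actual implementation: the helper names (`gini`, `weighted_gini`, `best_threshold`, `random_threshold`) and the toy data are assumptions made up for this example. The DTC-style function scores every candidate threshold, while the ETC-style function draws one threshold at random and skips the search.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 1-D array of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

def weighted_gini(feature, labels, threshold):
    """Size-weighted Gini impurity of the two sides of a split."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    return (left.size * gini(left) + right.size * gini(right)) / labels.size

def best_threshold(feature, labels):
    """DTC style: score every midpoint between sorted unique values, keep the best."""
    values = np.unique(feature)
    candidates = (values[:-1] + values[1:]) / 2
    scores = [weighted_gini(feature, labels, t) for t in candidates]
    return candidates[int(np.argmin(scores))]

def random_threshold(feature, rng):
    """ETC style: draw one threshold uniformly between the feature's min and max."""
    return rng.uniform(feature.min(), feature.max())

# Hypothetical toy data: one feature, two well-separated classes
feature = np.array([1.0, 2.0, 3.0, 7.0, 8.0, 9.0])
labels = np.array([0, 0, 0, 1, 1, 1])
rng = np.random.default_rng(0)

t_best = best_threshold(feature, labels)   # exhaustive search over 5 candidates
t_rand = random_threshold(feature, rng)    # single random draw, no search

print("best threshold:", t_best, "impurity:", weighted_gini(feature, labels, t_best))
print("random threshold:", t_rand, "impurity:", weighted_gini(feature, labels, t_rand))
```

The random draw is cheaper because it skips the search, but any single split may be worse than the best one. Averaging over many trees is what compensates for that.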
from sklearn.datasets import make_classification
X, y = make_classification(n_features=11, random_state=21)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=16)
from sklearn.ensemble import ExtraTreesClassifier
ETC = ExtraTreesClassifier(random_state=0)
ETC.fit(X_train, y_train)

from sklearn.model_selection import cross_val_score
cross_val_score(ETC, X_train, y_train, scoring='accuracy', cv=5, n_jobs=-1).mean()

from sklearn.model_selection import GridSearchCV
# Hyperparameter values to try; every combination is evaluated with 3-fold cross-validation
param_grid = {'n_estimators': [100, 300, 500], 'min_samples_leaf': [5, 10, 25], 'max_features': [2, 3, 4, 6]}
ETC2 = GridSearchCV(ETC, param_grid, cv=3, n_jobs=-1)
ETC2.fit(X_train, y_train)

ETC2.best_params_

ETC2.best_score_
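The held-out test split created earlier is never scored, so one possible final step is to check the tuned model against it. The sketch below repeats the setup so it runs on its own. The names `small_grid`, `search`, `best_model` and `test_accuracy` are made up for this example, and the grid is deliberately smaller than the one above only to keep the run short. With 100 samples in total, the test set has just 20 rows, so the resulting accuracy is a rough estimate at best.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Same data and split as in the walkthrough
X, y = make_classification(n_features=11, random_state=21)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=16)

# Assumption: a reduced grid, chosen only so this sketch finishes quickly
small_grid = {'n_estimators': [100], 'min_samples_leaf': [5, 10], 'max_features': [2, 4]}
search = GridSearchCV(ExtraTreesClassifier(random_state=0), small_grid, cv=3)
search.fit(X_train, y_train)

# With the default refit=True, the best combination is refit on all of X_train
best_model = search.best_estimator_

# Accuracy on data the search never saw
test_accuracy = best_model.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

Comparing `test_accuracy` with `search.best_score_` gives a quick sense of whether the cross-validated score was optimistic.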

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.