Extra Trees Classifier


Aggregates the results from a group of decision trees (like a random forest).

Differences

1. An ETC randomly selects the threshold at which to split each feature, unlike a DTC, which searches for the best split.
2. This makes an ETC more random and faster, which can help with noisy data (the timing sketch after this list shows the speed difference).
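
The speed claim is easy to check directly. Below is a minimal sketch (my own addition, not part of the original walkthrough) that fits an ExtraTreesClassifier and a RandomForestClassifier on the same synthetic data and compares cross-validated accuracy and wall-clock time; the n_samples=5000 and n_estimators=300 values are arbitrary choices just to make the timing gap visible.

  # Sketch: compare Extra Trees vs. Random Forest on the same data.
  # Extra Trees picks split thresholds at random, so it typically fits faster.
  import time
  from sklearn.datasets import make_classification
  from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
  from sklearn.model_selection import cross_val_score

  X, y = make_classification(n_samples=5000, n_features=11, random_state=21)

  for Model in (ExtraTreesClassifier, RandomForestClassifier):
      clf = Model(n_estimators=300, random_state=0, n_jobs=-1)
      start = time.perf_counter()
      acc = cross_val_score(clf, X, y, scoring='accuracy', cv=5, n_jobs=-1).mean()
      elapsed = time.perf_counter() - start
      print(f"{Model.__name__}: accuracy={acc:.3f}, fit+CV time={elapsed:.2f}s")

The main example below then builds a classifier on a synthetic dataset and tunes it with a grid search.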


  # Imports used throughout the example
  from sklearn.datasets import make_classification
  from sklearn.ensemble import ExtraTreesClassifier
  from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV

  # Build a synthetic classification dataset and hold out 20% for testing
  X, y = make_classification(n_features=11, random_state=21)
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=16)

  # Fit an Extra Trees Classifier with default hyperparameters
  ETC = ExtraTreesClassifier(random_state=0)
  ETC.fit(X_train, y_train)

  # Baseline: mean accuracy over 5-fold cross-validation on the training set
  cross_val_score(ETC, X_train, y_train, scoring='accuracy', cv=5, n_jobs=-1).mean()

  # Tune key hyperparameters with a grid search
  param_grid = {
      'n_estimators': [100, 300, 500],
      'min_samples_leaf': [5, 10, 25],
      'max_features': [2, 3, 4, 6],
  }
  ETC2 = GridSearchCV(ETC, param_grid, cv=3, n_jobs=-1)
  ETC2.fit(X_train, y_train)

  # Best hyperparameter combination and its cross-validated accuracy
  ETC2.best_params_
  ETC2.best_score_
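
A natural follow-up (my addition, not in the original code) is to score the tuned model on the held-out test set. GridSearchCV refits the best parameter combination on the full training set by default and exposes it as best_estimator_:

  # Accuracy of the refit best model on the held-out test set
  ETC2.best_estimator_.score(X_test, y_test)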

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.
