Boosting in machine learning is a technique that combines multiple simple models, often decision trees, into a single, stronger model. It works with regression trees and improves performance by training each new model on the mistakes of the previous ones. According to the scikit-learn documentation, at each stage, a regression tree is fit on the negative gradient of […]
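To make the "learning from mistakes" idea concrete, here is a minimal hand-rolled sketch of the stage-wise loop, not scikit-learn's actual implementation. It assumes squared-error loss, where the negative gradient is simply the residual; the synthetic data, tree depth, learning rate and number of stages are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem (illustrative data only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # stage 0: a constant model
mse_per_stage = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    # For squared-error loss the negative gradient is the residual.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    # Add a damped correction from the new tree to the running prediction.
    prediction += learning_rate * tree.predict(X)
    mse_per_stage.append(np.mean((y - prediction) ** 2))

print(f"training MSE: {mse_per_stage[0]:.3f} -> {mse_per_stage[-1]:.3f}")
```

In practice you would reach for GradientBoostingRegressor rather than a loop like this; the sketch only shows why each stage shrinks the training error.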
Random Forest Regressor
Random forest regressor is a variant of the random forest classifier. It is primarily used for regression tasks. This model is an ensemble of decision trees. It combines the predictions of multiple individual trees to improve performance. By aggregating the results from those trees, typically through averaging (voting in the classifier case), it produces a final prediction that […]
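The aggregation step can be checked directly. The sketch below, on synthetic data with illustrative parameter values, averages the predictions of the individual fitted trees by hand and compares the result with what the forest returns.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (illustrative only).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

# Predict with every individual tree, then average across trees.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# The forest's own prediction for the same rows.
forest_prediction = forest.predict(X[:5])
print(np.allclose(manual_average, forest_prediction))
```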
Machine Learning Imbalanced Classes
#Read over #data professor #emma Ding #mahesh huddar #ritvik math. Part 1: Load a Dataset. Part 2: Simple EDA. Part 3: Set Up the Data. Part 4: Baseline Model (No Fixing the Imbalance). Part 5: Oversampling Example 1, RandomOverSampler. To start we’re going to create a simple dataframe in Python. Led to overfitting. Part 6: Oversampling Method Example 2, SMOTE […]
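As a rough sketch of what random oversampling does, the snippet below duplicates minority-class rows until both classes are the same size. The post names RandomOverSampler (from the imbalanced-learn package); this example swaps in sklearn.utils.resample so it needs nothing beyond scikit-learn, and the 90/10 synthetic dataset is an assumption for illustration.

```python
import numpy as np
from sklearn.utils import resample

# Synthetic imbalanced data: 90 majority rows (0), 10 minority rows (1).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)

X_majority, X_minority = X[y == 0], X[y == 1]

# Sample minority rows with replacement until they match the majority count.
X_minority_up = resample(
    X_minority, replace=True, n_samples=len(X_majority), random_state=0
)

X_balanced = np.vstack([X_majority, X_minority_up])
y_balanced = np.array([0] * len(X_majority) + [1] * len(X_minority_up))
print(np.bincount(y_balanced))
```

Because the new minority rows are exact copies, a model can memorize them, which is one way oversampling leads to overfitting. Resampling only the training split keeps the evaluation honest.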
Column Transformer
#drop #Example Pass through some columns, drop others
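A minimal sketch of that idea: ColumnTransformer accepts the special strings "passthrough" and "drop" in place of a transformer. The array, the column meanings and the step names ("scale", "keep", "discard") are made up for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Columns: 0 = age, 1 = income, 2 = row id, 3 = binary flag (hypothetical).
X = np.array([
    [20.0, 50000.0, 101.0, 0.0],
    [30.0, 60000.0, 102.0, 1.0],
    [40.0, 70000.0, 103.0, 0.0],
])

ct = ColumnTransformer(
    transformers=[
        ("scale", StandardScaler(), [0, 1]),  # transform these columns
        ("keep", "passthrough", [3]),         # keep this column unchanged
        ("discard", "drop", [2]),             # remove this column
    ]
)

out = ct.fit_transform(X)
print(out.shape)  # two scaled columns plus the passed-through flag
```

Columns not listed in any step are handled by the remainder parameter, which defaults to "drop".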
Extra Trees Classifier
The Extra Trees Classifier is an ensemble machine learning method that combines predictions from many individual trees. https://youtu.be/S2e70seVw3k It aggregates the results from a group of decision trees (like a random forest). Differences: 1. ETC randomly selects the value at which to split a feature, unlike a DTC, which looks for the best split. 2. This makes ETC more random and a faster algorithm, which […]
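For reference, a minimal usage sketch on synthetic data; the dataset shape and n_estimators value are illustrative assumptions rather than settings from the post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Split thresholds are drawn at random for each candidate feature.
# By default every tree sees the full training set (bootstrap=False).
etc = ExtraTreesClassifier(n_estimators=100, random_state=0)
etc.fit(X_train, y_train)

accuracy = etc.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```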
Lasso Regression
https://youtu.be/LmpBt0tenJE #LASSO stands for Least Absolute Shrinkage and Selection Operator #L1 regularization #addresses overfitting: a model that is too complex may fit the training data very well but perform poorly on new, unseen data #will get rid of useless features (sets the coefficients of those independent variables to 0) #automatic feature selection #leads to a simpler model that […]
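The feature-selection effect is easy to see on synthetic data. In this sketch only the first two of five features drive the target; the alpha value and the data-generating coefficients are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Five features, but only the first two actually influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1).fit(X, y)

# The L1 penalty sets the coefficients of the useless features to exactly 0.
n_zero = int(np.sum(model.coef_ == 0.0))
print(model.coef_)
print(f"coefficients set to zero: {n_zero}")
```

The surviving coefficients are also shrunk a little toward zero, which is the price of the penalty.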
Ridge Regressor
https://youtu.be/GMF4Td7KtB0 #Ridge Regression, which is considered #L2 regularization #helps with overfitting in linear regression models #keeps the coefficients small #leads to a model that is less prone to overfitting #balance between fitting the data and keeping the coefficients small #more robust and stable models, particularly when dealing with datasets that have highly correlated predictor variables […]
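Here is a small sketch of the "keeps the coefficients small" point with two nearly identical predictors, the case where plain least squares tends to be unstable. The data and alpha=1.0 are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two highly correlated predictors: x2 is almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# The L2 penalty pulls the coefficient vector toward zero.
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```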
Stacking Regressor
See all null values. Voting classifier. Hyperparameter tuning.
Multicollinearity
Dividing the total number of bases a player records by their total number of at-bats. Maybe replace this with something else? Correlation matrix. VIF. Instead of using raw height, you might normalize or categorize height into bins, which could reduce the numerical interdependence. Calculate Condition Index (CI). How to address multicollinearity: drop a feature (At Bats); look at […]
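One way to compute VIF without extra dependencies is to regress each feature on all the others and use VIF = 1 / (1 - R²). The sketch below does that with scikit-learn; the vif helper is hypothetical, and the at_bats, hits and height columns are synthetic stand-ins, not the post's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(X):
    """Variance inflation factor for each column of a 2-D array."""
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        # R^2 from regressing column j on the remaining columns.
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic stats: hits depends strongly on at_bats, height is independent.
rng = np.random.default_rng(0)
at_bats = rng.normal(500, 50, size=200)
hits = 0.27 * at_bats + rng.normal(0, 3, size=200)
height = rng.normal(185, 7, size=200)

X = np.column_stack([at_bats, hits, height])
vifs = vif(X)
print(dict(zip(["at_bats", "hits", "height"], vifs.round(2))))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity. Dropping one of the two entangled features brings the other's VIF back toward 1.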
Sklearn Gaussian Mixture Models
In scikit-learn, Gaussian Mixture Models let you represent clusters of data as a mixture of multiple normal distributions. This tutorial will walk you through two different examples of using GMMs: one with generated blobs and another with baseball card values. If you want to watch a video based around the tutorial, we have […]
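A minimal sketch of the generated-blobs case, assuming three well-separated clusters. The centers, spread and n_init value are illustrative choices, not the tutorial's settings.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Three well-separated synthetic blobs (illustrative centers).
X, true_labels = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=1.0,
    random_state=0,
)

gmm = GaussianMixture(n_components=3, n_init=3, random_state=0).fit(X)

labels = gmm.predict(X)       # hard assignment: most likely component
probs = gmm.predict_proba(X)  # soft assignment: one probability per component

ari = adjusted_rand_score(true_labels, labels)
print(gmm.means_.round(2))
print(f"adjusted Rand index vs. true blobs: {ari:.3f}")
```

Unlike k-means, each point gets a probability for every component, which is useful when clusters overlap.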