We will be looking at tuning hyperparameters with Scikit-Learn. Scikit-Learn is a powerful machine learning library for Python that provides simple, efficient tools for data analysis and modeling. Hyperparameter tuning is the process of finding the best values for the settings of a machine learning model that are not learned from data, but set […]
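As a minimal sketch of the idea (the model, dataset, and parameter values below are illustrative assumptions, not taken from the post), a grid search with cross-validation might look like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate hyperparameter values; these are not learned from the data,
# so we search over them with cross-validation.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # best combination found
print(search.best_score_)    # its mean cross-validated accuracy
```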
Principal Component Analysis (Scikit-Learn)
PCA (Principal Component Analysis) in Python using Scikit-learn is a technique used to reduce the number of features in a dataset while preserving most of the variance (information). It works by: Finding new axes (principal components) that capture the most variance. Projecting the data onto these fewer dimensions. It’s useful for visualization, speeding up models, […]
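A short sketch of that workflow, assuming the iris dataset purely as an example:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Scale features first so no single feature dominates the variance.
X_scaled = StandardScaler().fit_transform(X)

# Keep the two principal components that capture the most variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                 # (150, 2)
print(pca.explained_variance_ratio_)   # variance captured by each component
```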
Simple Imputer
When working with data in Python, especially using pandas, handling missing values is a crucial step in data cleaning. Missing values can occur in both categorical and numeric columns. There are several common strategies to address them: you can choose to ignore them (though this is rarely recommended), remove the rows that contain them using […]
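A hedged sketch of imputation with SimpleImputer, using a small hypothetical DataFrame (the column names and values are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical frame with missing numeric and categorical values.
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "city": ["Lagos", "Accra", np.nan, "Accra"],
})

# Numeric column: fill missing values with the mean.
num_imputer = SimpleImputer(strategy="mean")
df[["age"]] = num_imputer.fit_transform(df[["age"]])

# Categorical column: fill missing values with the most frequent value.
cat_imputer = SimpleImputer(strategy="most_frequent")
df[["city"]] = cat_imputer.fit_transform(df[["city"]])

print(df)
```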
Logistic Regression
Logistic regression is a statistical model used for binary classification problems, where the goal is to predict one of two possible outcomes. Unlike linear regression, which predicts continuous values, logistic regression estimates the probability that a given input belongs to a particular class. It uses the logistic (sigmoid) function to map predicted values between 0 […]
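A minimal sketch of binary classification with logistic regression, assuming the built-in breast cancer dataset as a stand-in for whatever data the post uses:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Binary classification dataset (malignant vs. benign).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)

# predict_proba returns the sigmoid-mapped class probabilities in [0, 1].
print(clf.predict_proba(X_test[:3]))
print(clf.score(X_test, y_test))
```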
Decision Tree
A decision tree is a non-parametric supervised learning algorithm that is used for both classification and regression tasks. It has a hierarchical tree structure consisting of a root node, branches, internal nodes, and leaf nodes. Note: parametric supervised learning refers to a type of machine learning where the model assumes a specific functional form and estimates […]
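A quick sketch of fitting a tree and inspecting its root, branches, and leaves; the dataset and max_depth value are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth limits how deep the hierarchy of internal nodes can grow.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print(tree.score(X_test, y_test))
# Print the learned tree structure as text.
print(export_text(tree))
```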
Voting Classifier
Boosting Accuracy with Voting Classifiers In machine learning, combining multiple models often leads to better performance than relying on a single one. A Voting Classifier is a simple ensemble method that does just that — it aggregates predictions from several models to improve accuracy. There are two types: Hard Voting: Takes the majority vote from […]
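A hedged sketch of a hard-voting ensemble over three base models (the choice of base models and dataset is an assumption for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three different base models whose predictions are combined.
voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    voting="hard",  # majority vote; use voting="soft" to average probabilities
)
voting.fit(X_train, y_train)
print(voting.score(X_test, y_test))
```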
Elastic Net Regressor
Elastic Net regression is a linear regression method that merges the strengths of both Lasso (L1) and Ridge (L2) regression techniques. It helps reduce overfitting and is especially effective when working with datasets that have many features, particularly when some of those features are highly correlated. The model’s regularization is controlled by two key hyperparameters: […]
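A minimal sketch showing those two hyperparameters on synthetic data (alpha and l1_ratio values here are arbitrary examples):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split

# Synthetic regression data with many features.
X, y = make_regression(n_samples=200, n_features=50, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# alpha controls the overall regularization strength;
# l1_ratio balances the L1 (Lasso) and L2 (Ridge) penalties.
model = ElasticNet(alpha=1.0, l1_ratio=0.5)
model.fit(X_train, y_train)

print(model.score(X_test, y_test))   # R^2 on held-out data
```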
Gradient Boosting Classifier
Boosting in machine learning is a technique that combines multiple simple models, often decision trees, into a single, stronger model. It builds regression trees sequentially, with each new tree learning from the mistakes of the previous ones. According to the scikit-learn documentation, at each stage a regression tree is fit on the negative gradient of […]
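A short sketch of the classifier with a few common hyperparameters; the dataset and the specific values are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the n_estimators stages fits a small regression tree to the
# negative gradient of the loss from the previous stages.
gbc = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
gbc.fit(X_train, y_train)
print(gbc.score(X_test, y_test))
```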
Random Forest Regressor
The random forest regressor is a variant of the random forest classifier and is used for regression tasks. This model is an ensemble of decision trees: it combines the predictions of multiple individual trees to improve performance. By aggregating the results from those trees, typically through averaging, it produces a final prediction that […]
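A minimal sketch on synthetic regression data (dataset and n_estimators are assumptions for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=20, noise=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of decision trees; the final prediction is the average
# of the individual trees' predictions.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))   # R^2 score on held-out data
```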
Machine Learning: Imbalanced Classes
Sources to read over: Data Professor, Emma Ding, Mahesh Huddar, ritvik math. Part 1: Load a dataset. Part 2: Simple EDA. Part 3: Set up the data. Part 4: Baseline model with no fixing of the imbalance. Part 5: Oversampling example 1 with RandomOverSampler (to start, we create a simple dataframe in Python; naive oversampling led to overfitting). Part 6: Oversampling example 2 with SMOTE […]
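A hedged sketch of the two oversampling steps, assuming the imbalanced-learn (imblearn) library and a synthetic 9:1 dataset in place of the post's dataframe:

```python
from collections import Counter

from imblearn.over_sampling import RandomOverSampler, SMOTE
from sklearn.datasets import make_classification

# Hypothetical imbalanced binary dataset (roughly 9:1 class ratio).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("original:", Counter(y))

# Random oversampling duplicates minority-class rows until classes balance.
X_ros, y_ros = RandomOverSampler(random_state=0).fit_resample(X, y)
print("RandomOverSampler:", Counter(y_ros))

# SMOTE synthesizes new minority-class points instead of duplicating them.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X, y)
print("SMOTE:", Counter(y_sm))
```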