scikit-learn

scikit-learn

Multicollinearity

May 21, 2025 Ryan Nolan

…dividing the total number of bases a player records by their total number of at-bats. Correlation Matrix. VIF. Instead of using raw height, you might normalize or categorize height into bins, which could reduce the numerical interdependence. Calculate Condition Index (CI). How to Address Multicollinearity: Drop a Feature (At Bats), look at […]
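The three diagnostics named in this excerpt (correlation matrix, VIF, condition index) can be sketched in a few lines. This is a minimal sketch, not the code from the post: the player stats are synthetic, randomly generated values, and the helper `vif_table` is a hypothetical name. VIF is computed directly from its definition, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing feature j on all the other features. The condition index uses one common variant, the square root of the largest-to-each eigenvalue ratio of the correlation matrix.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def vif_table(X: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: VIF_j = 1 / (1 - R^2_j), where R^2_j comes
    from regressing column j on all of the other columns."""
    vifs = {}
    for col in X.columns:
        others = X.drop(columns=col)
        r2 = LinearRegression().fit(others, X[col]).score(others, X[col])
        vifs[col] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(vifs, name="VIF")


# Synthetic, made-up player stats (NOT the post's dataset)
rng = np.random.default_rng(0)
at_bats = rng.integers(300, 650, size=200)
hits = (at_bats * rng.uniform(0.22, 0.32, size=200)).round()
total_bases = (hits * rng.uniform(1.3, 1.7, size=200)).round()
height_in = rng.normal(73, 2, size=200)  # unrelated to the other columns

X = pd.DataFrame({"at_bats": at_bats, "hits": hits,
                  "total_bases": total_bases, "height_in": height_in})

print(X.corr().round(2))          # correlation matrix
vif = vif_table(X)
print(vif.round(1))               # hits / total_bases / at_bats should be inflated

# Condition index from the correlation matrix's eigenvalues;
# values above roughly 30 are often read as severe collinearity.
eig = np.linalg.eigvalsh(X.corr().to_numpy())
ci = np.sqrt(eig.max() / eig)
print(ci.round(1))
```

Dropping one of the strongly related columns (for example `at_bats`) and re-running `vif_table` is a quick way to see whether the remaining VIFs come down.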

scikit-learn

Sklearn Gaussian Mixture Models

April 3, 2025 Ryan Nolan

In scikit-learn, Gaussian Mixture Models (GMMs) let you represent clusters of data as a mixture of multiple normal distributions. This tutorial will walk you through two different examples of using GMMs: one with generated blobs and another with baseball card values. If you want to watch a video based on the tutorial, we have […]
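For the generated-blobs case, a minimal sketch might look like the following. The `make_blobs` settings (500 samples, 3 centers, `random_state=42`) are assumptions, not the post's values. The point it shows is that a GMM gives both a hard label (`predict`) and a soft membership probability for every component (`predict_proba`).

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Generated blobs; sample count, centers, and seed are assumptions
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=1.0, random_state=42)

# One Gaussian per expected cluster, each with its own full covariance matrix
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=42)
gmm.fit(X)

labels = gmm.predict(X)        # hard assignment: most likely component
probs = gmm.predict_proba(X)   # soft assignment: probability per component

print(gmm.means_.round(2))     # estimated center of each Gaussian
print(probs[:3].round(3))
print("BIC:", round(gmm.bic(X), 1))  # lower BIC is better when comparing n_components
```

Comparing `gmm.bic(X)` across several `n_components` values is one common way to pick the number of clusters when you don't know it in advance.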

scikit-learn

Sklearn Support Vector Machine

April 2, 2025 admin

A popular supervised classification algorithm in scikit-learn is the Support Vector Machine (SVM). SVM works by finding a hyperplane (a decision boundary) that separates data points from different classes. It does this by maximizing the distance (margin) between the hyperplane and the nearest data points from each class, which are called support vectors. If you […]
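To make the hyperplane and margin concrete, here is a minimal sketch on assumed toy data (two synthetic blobs), not the post's example. With a linear kernel the fitted boundary is w·x + b = 0, and the margin width is 2 / ||w||. A large `C` is used here as an assumption to approximate a hard margin.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic classes (assumed toy data)
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# Large C penalizes margin violations heavily, approximating a hard margin
clf = SVC(kernel="linear", C=1000)
clf.fit(X, y)

# For a linear kernel the decision boundary is w . x + b = 0
w, b = clf.coef_[0], clf.intercept_[0]
margin = 2 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("support vectors per class:", clf.n_support_)
print("margin width:", margin)
```

Only the points in `clf.support_vectors_` determine the boundary; removing any other training point and refitting leaves `w` and `b` unchanged.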

scikit-learn

SKLearn Naive Bayes

April 2, 2025 Ryan Nolan

In this machine learning lesson, we are going to explore Naive Bayes, an algorithm commonly used within scikit-learn. It's quite a simple algorithm to use, and we will practice using it on a dataset that predicts whether a concert sells out. Before we jump into this lesson, though, I did want to […]
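Here's a minimal sketch of the concert idea using `GaussianNB`. The two features and all ten rows are hypothetical, made-up values, not the post's dataset. `GaussianNB` models each feature as normally distributed within each class and treats features as independent given the class, which is the "naive" part.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical, made-up concert data (NOT the post's dataset):
# columns = [social media followers (millions), ticket price (USD)]
X = np.array([[1.0, 80], [2.2, 60], [1.7, 95], [4.1, 55], [0.5, 120],
              [3.5, 70], [0.8, 110], [2.9, 65], [0.6, 100], [3.8, 50]])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])  # 1 = sold out

model = GaussianNB()
model.fit(X, y)

new_concert = np.array([[3.0, 60]])       # many followers, cheap tickets
print(model.predict(new_concert))         # predicted class
print(model.predict_proba(new_concert))   # [P(not sold out), P(sold out)]
```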

scikit-learn

SKlearn Multiple Linear Regressions

April 2, 2025 Ryan Nolan

This beginner scikit-learn lesson covers multiple linear regression, which models the relationship between a target (dependent variable) and two or more features (independent variables). If you'd like to watch the video this tutorial is based on, it's embedded below. https://youtu.be/R2Zb5s_RrDU Let's start by importing everything we need for […]
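A quick way to see what multiple linear regression estimates is to fit it on data generated from a known rule and check that the coefficients come back. This is a sketch under that assumption (noise-free synthetic data with y = 3·x1 − 2·x2 + 5), not the post's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data from a known rule: y = 3*x1 - 2*x2 + 5
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 2))   # two features
y = 3 * X[:, 0] - 2 * X[:, 1] + 5

model = LinearRegression()
model.fit(X, y)

print(model.coef_)       # one coefficient per feature, close to [3, -2]
print(model.intercept_)  # close to 5
pred = model.predict([[1.0, 2.0]])
print(pred)              # 3*1 - 2*2 + 5 = 4
```

With real data there is noise, so the coefficients won't match any rule exactly, but each one still reads as the expected change in the target per unit change in that feature, holding the others fixed.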

scikit-learn

Scikit-learn Pipelines

April 1, 2025 admin

import pandas as pd
import numpy as np
import joblib
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline, Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

d1 = {'Social_media_followers': [1000000, np.nan, 2000000, 1310000, 1700000, np.nan, 4100000, 1600000, 2200000, 1000000],
      'Sold_out': [1, 0, 0, 1, 0, 0, 0, 1, 0, 1]}
df1 = […]
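The `d1` data in this excerpt has missing follower counts, which is exactly where a pipeline helps: imputation, scaling, and the model are fit together as one object. This is a minimal sketch, not the post's full code. The step names (`'imputer'`, `'scaler'`, `'model'`) and the mean-imputation strategy are assumptions, and it fits on all ten rows only because the dataset is so small.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same values as the d1 dictionary in the excerpt
d1 = {'Social_media_followers': [1000000, np.nan, 2000000, 1310000, 1700000,
                                 np.nan, 4100000, 1600000, 2200000, 1000000],
      'Sold_out': [1, 0, 0, 1, 0, 0, 0, 1, 0, 1]}
df1 = pd.DataFrame(d1)
X = df1[['Social_media_followers']]
y = df1['Sold_out']

# Steps run in order on fit/predict; the last step is the estimator
pipe = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),  # fill NaN with the column mean
    ('scaler', StandardScaler()),                 # standardize followers
    ('model', LogisticRegression()),
])
pipe.fit(X, y)

print(pipe.predict(X))
print(pipe.named_steps['imputer'].statistics_)  # the learned fill value
```

Because the imputer and scaler live inside the pipeline, anything they learn (like the fill value) comes only from the data passed to `fit`, which keeps test data from leaking into preprocessing once you add a train/test split.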

scikit-learn

Train Test Split

April 1, 2025 Ryan Nolan

Train test split is an important concept that aspiring data scientists and machine learning engineers need to pick up early on. When building models, you'll want to split your data into two sets: one for training a model and one for testing it. This article is based on the popular YouTube video on […]
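In scikit-learn the split is a single call to `train_test_split`. The toy arrays below are assumptions for illustration. `test_size=0.2` holds out 20% of the rows, `random_state` makes the shuffle reproducible, and `stratify=y` keeps the class balance the same in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features
y = np.array([0, 1] * 5)           # balanced toy labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
print(y_test)                       # one sample of each class, thanks to stratify
```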

scikit-learn

K-Nearest Neighbors

December 20, 2024 adeyanju victor

A Comprehensive Understanding of K-Nearest Neighbors (KNN) in Supervised Machine Learning. K-Nearest Neighbors (KNN) is a simple, widely used supervised learning algorithm in data science and machine learning. It was developed by Evelyn Fix and Joseph Hodges in 1951. Known for its usefulness and versatility, KNN can handle both classification and regression tasks. https://youtu.be/Nz73vXn5afE […]
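Since KNN covers both classification and regression, this minimal sketch on tiny, made-up 1-D data shows the two side by side. The classifier takes a majority vote among the k nearest training points; the regressor averages their targets. `n_neighbors=3` is an assumed choice.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Tiny made-up 1-D data: two clearly separated groups
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_class = np.array([0, 0, 0, 1, 1, 1])
y_reg = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])

# Classification: majority vote among the 3 nearest neighbors
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
class_pred = clf.predict([[2.5], [10.5]])
print(class_pred)   # [0 1]

# Regression: mean target of the 3 nearest neighbors (1, 2, 3 for x = 2.5)
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)
reg_pred = reg.predict([[2.5]])
print(reg_pred)     # [2.]
```

Because KNN relies on distances, features on very different scales usually need scaling (for example with `StandardScaler`) before fitting.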

scikit-learn

Optuna Hyperparameter Tuning

June 27, 2024 Ryan Nolan

Optuna is a hyperparameter optimization framework for machine learning models. It helps automate and streamline hyperparameter tuning. It's quite popular among Kaggle users, and you'll often see it used in competitions. In this article, we will go over an example of using it on a basic dataset. There is also a […]

scikit-learn

Ordinal Encoder

June 20, 2024 Ryan Nolan

When working with real-world data, you'll often have to deal with categorical information. This can be a problem for machine learning models, as most of them cannot use it directly. Instead, data scientists and machine learning engineers need to convert it into a numerical format. This is where the Ordinal Encoder in scikit-learn can help. […]
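Here's a minimal sketch of `OrdinalEncoder` on a hypothetical shirt-size column (made-up values). Passing `categories` explicitly makes the integer codes follow the real order (small < medium < large); without it, scikit-learn sorts the categories, which for strings means alphabetical order.

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical categorical column (2-D: one row per sample, one column)
sizes = [["small"], ["large"], ["medium"], ["small"]]

# Explicit order: small -> 0, medium -> 1, large -> 2
enc = OrdinalEncoder(categories=[["small", "medium", "large"]])
encoded = enc.fit_transform(sizes)
print(encoded.ravel())   # [0. 2. 1. 0.]

# Default behavior: categories sorted alphabetically (large, medium, small)
default_enc = OrdinalEncoder().fit(sizes)
print(default_enc.categories_)
```

The alphabetical default is fine for labels with no natural order, but for genuinely ordinal data like sizes or ratings, spelling out `categories` keeps the numbers meaningful.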


Helping data professionals further their careers

Important Links
  • Blog
  • Sponsorships
  • Mentorships
LinkedIn
  • Ryan Nolan
  • Matt Payne
Get in touch
  • ryannolandata@gmail.com

© Ryan & Matt Data Science

  • Terms & Conditions
  • Privacy Policy