Blog - Ryan & Matt Data Science

Sklearn Gaussian Mixture Models

April 3, 2025 Ryan Nolan No comments yet

In Scikit-Learn, Gaussian Mixture Models (GMMs) let you represent clusters of data as a mixture of multiple normal distributions. This tutorial will walk you through two different examples of using GMMs: one with generated blobs and another with baseball card values. If you want to watch a video based around the tutorial, we have […]
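
A minimal sketch of the generated-blobs case, assuming three blob centers and `random_state=42` (our choices, not necessarily the tutorial's). `GaussianMixture` fits one normal distribution per component, and `predict_proba` returns each point's soft membership in every component.

```python
# Fit a Gaussian Mixture Model to generated blobs.
# n_samples, centers and random_state are illustrative assumptions.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

gmm = GaussianMixture(n_components=3, random_state=42)
gmm.fit(X)

labels = gmm.predict(X)        # hard assignment: most likely component per point
probs = gmm.predict_proba(X)   # soft assignment: one probability per component

print(gmm.means_.round(2))     # estimated mean of each normal distribution
print(probs[:3].round(3))      # membership probabilities for the first three points
```

Unlike k-means, each point gets a probability for every cluster rather than only a single hard label.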

scikit-learn

Sklearn Support Vector Machine

April 2, 2025 admin No comments yet

A popular supervised classification algorithm in scikit-learn is the Support Vector Machine (SVM). SVM works by finding a hyperplane (a decision boundary) that separates data points from different classes. It does this by maximizing the distance (margin) between the hyperplane and the nearest data points from each class, which are called support vectors. If you […]
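
To see the margin idea in code, here is a small sketch using `SVC` with a linear kernel on two generated blobs (the dataset and `C=1.0` are our assumptions). After fitting, `support_vectors_` holds the points that define the margin, and for a linear kernel the margin width is 2 / ||w||, where w is `coef_`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters; the data settings are illustrative assumptions.
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]                   # normal vector of the separating hyperplane
b = clf.intercept_[0]              # offset of the hyperplane
margin = 2 / np.linalg.norm(w)     # distance between the two margin boundaries

print("support vectors:", len(clf.support_vectors_))
print("hyperplane: w =", w.round(3), "b =", round(b, 3))
print("margin width:", round(margin, 3))
```

Only the support vectors determine `w` and `b`; moving any other training point (without crossing the margin) leaves the boundary unchanged.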

scikit-learn

SKLearn Naive Bayes

April 2, 2025 Ryan Nolan No comments yet

In this Machine Learning lesson we are going to explore Naive Bayes, an algorithm that is commonly used within scikit-learn. It’s quite a simple algorithm to use, and we will practice using it on a dataset predicting whether a concert sells out. Before we jump into this lesson though, I did want to […]
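
A minimal sketch of the concert idea with `GaussianNB`. The two features and all rows below are made-up toy values, not the tutorial's dataset. Gaussian Naive Bayes estimates a per-class mean and variance for each feature and treats the features as independent given the class.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: [social media followers (millions), ticket price (USD)]
X = np.array([
    [1.0, 50], [2.0, 80], [1.3, 45], [4.1, 60], [3.5, 55],  # sold out
    [0.2, 120], [0.5, 90], [0.3, 100],                       # did not sell out
])
y = np.array([1, 1, 1, 1, 1, 0, 0, 0])  # 1 = sold out, 0 = did not sell out

nb = GaussianNB()
nb.fit(X, y)

new_concert = [[3.0, 55]]
proba = nb.predict_proba(new_concert)[0]  # [P(not sold out), P(sold out)]
print(nb.predict(new_concert), proba.round(3))
print(nb.theta_)  # per-class mean of each feature
```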

scikit-learn

SKlearn Multiple Linear Regressions

April 2, 2025 Ryan Nolan No comments yet

This beginner Scikit-learn lesson will cover multiple linear regression. This is when there is a relationship between the target (dependent variable) and two or more features (independent variables). If you’d like to watch the video this tutorial is based on, it’s embedded down below. https://youtu.be/R2Zb5s_RrDU Let’s start by importing everything we need for […]
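
A small sketch with two features, assuming noise-free synthetic data generated from y = 3 + 2*x1 - 1.5*x2, so you can check that `LinearRegression` recovers the intercept and one coefficient per feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an illustrative assumption): y = 3 + 2*x1 - 1.5*x2, no noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))  # two features per row
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 1]

model = LinearRegression()
model.fit(X, y)

print(model.intercept_)         # close to 3
print(model.coef_)              # close to [2, -1.5], one coefficient per feature
print(model.predict([[4, 2]]))  # approximately 3 + 2*4 - 1.5*2 = 8
```

With real, noisy data the coefficients will only approximate the underlying relationship.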

scikit-learn

Scikit-learn Pipelines

April 1, 2025 admin No comments yet

import pandas as pd
import numpy as np
import joblib
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline, Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
d1 = {'Social_media_followers': [1000000, np.nan, 2000000, 1310000, 1700000, np.nan, 4100000, 1600000, 2200000, 1000000], 'Sold_out': [1, 0, 0, 1, 0, 0, 0, 1, 0, 1]}
df1 = […]
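
The excerpt is cut off at `df1 = […]`, so the following is our own minimal sketch rather than the rest of the tutorial. It reuses the `d1` values from the excerpt, fills the missing follower counts with `SimpleImputer` (mean strategy, our assumption), scales them with `StandardScaler`, and fits a `LogisticRegression`, all chained with `make_pipeline`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

d1 = {'Social_media_followers': [1000000, np.nan, 2000000, 1310000, 1700000,
                                 np.nan, 4100000, 1600000, 2200000, 1000000],
      'Sold_out': [1, 0, 0, 1, 0, 0, 0, 1, 0, 1]}
df = pd.DataFrame(d1)
X = df[['Social_media_followers']]
y = df['Sold_out']

# Each step's output feeds the next: impute NaNs -> standardize -> classify
pipe = make_pipeline(SimpleImputer(strategy='mean'), StandardScaler(), LogisticRegression())
pipe.fit(X, y)

preds = pipe.predict(X)
print(list(pipe.named_steps))                            # auto-generated step names
print(pipe.named_steps['simpleimputer'].statistics_)    # mean used to fill the NaNs
```

Because the imputer and scaler live inside the pipeline, their statistics are learned only from the data passed to `fit`, which helps keep test data from leaking into preprocessing.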

scikit-learn

Train Test Split

April 1, 2025 Ryan Nolan No comments yet

Train Test Split is an important concept that future Data Scientists or Machine Learning Engineers need to pick up early on. When building models, you’ll want to split your data into two different sets: one for training a model and one for testing it. This article is based on the popular YouTube video on […]
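
A minimal sketch of the split using `train_test_split`, with `test_size=0.2` and `random_state=42` as our assumed settings. The toy arrays are built so you can confirm each row of `X` still lines up with its label after shuffling.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: row i of X is [2*i, 2*i + 1] and its label is i
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # hold out 20%; fixed seed for reproducibility
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
print(y_test)
```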

Time Series

PACF Partial Autocorrelation Function

April 1, 2025 Ryan Nolan No comments yet

In this Data Science article, we are going to take a look at the Partial Autocorrelation Function (PACF). We will go over the background and then look at plotting both non-stationary and stationary data. If you want to watch a video based around this tutorial, it is embedded below. https://youtu.be/XstPVx78yi8 PACF Background The PACF […]
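
Libraries such as statsmodels provide PACF functions and plots; to show what the number means, here is a hand-rolled NumPy sketch (`pacf_ols` is our own hypothetical helper). The PACF at lag k is the coefficient on x_{t-k} when x_t is regressed on all lags 1 through k, so the influence of the intermediate lags is removed. On a simulated stationary AR(1) series with coefficient 0.7 (an assumption for illustration), lag 1 should land near 0.7 and later lags near 0.

```python
import numpy as np

def pacf_ols(x, nlags):
    """Hypothetical helper: PACF by regression.

    For each lag k, regress x_t on an intercept and x_{t-1}, ..., x_{t-k};
    the PACF at lag k is the coefficient on x_{t-k}.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    values = [1.0]  # lag 0 is 1 by definition
    for k in range(1, nlags + 1):
        target = x[k:]
        # Column j-1 holds x_{t-j}, aligned with the target x_t
        lags = np.column_stack([x[k - j : n - j] for j in range(1, k + 1)])
        design = np.column_stack([np.ones(n - k), lags])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        values.append(coef[-1])  # coefficient on the farthest lag, x_{t-k}
    return np.array(values)

# Simulated stationary AR(1) series: x_t = 0.7 * x_{t-1} + noise (an assumption)
rng = np.random.default_rng(0)
n = 2000
noise = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + noise[t]

pacf_values = pacf_ols(x, nlags=5)
print(pacf_values.round(3))  # lag 1 should be near 0.7, lags 2-5 near 0
```

A sharp cutoff after lag 1 like this is the usual PACF signature of an AR(1) process.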

Time Series

ACF Autocorrelation Function

April 1, 2025 Ryan Nolan No comments yet

In this Data Science lesson we are going to take a look at the Autocorrelation Function. Often abbreviated as ACF, it can help us tell whether our data is stationary. We will go over some of the background behind it and plot it with the help of Python. If you want to network with […]
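
A minimal NumPy sketch of the sample ACF (the `acf` helper and the simulated series are our own illustrative assumptions). A random walk is non-stationary, and its ACF stays near 1 and decays slowly; differencing it gives white noise, whose ACF drops to roughly 0 after lag 0.

```python
import numpy as np

def acf(x, nlags):
    """Hypothetical helper: sample autocorrelation for lags 0..nlags.

    r_k = sum_t (x_t - mean)(x_{t-k} - mean) / sum_t (x_t - mean)^2
    """
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[k:], d[: len(d) - k]) / denom for k in range(nlags + 1)])

rng = np.random.default_rng(1)
steps = rng.normal(size=2000)
random_walk = np.cumsum(steps)      # non-stationary: the level wanders
differenced = np.diff(random_walk)  # first difference recovers the stationary steps

print(acf(random_walk, 5).round(3))  # stays close to 1 and decays slowly
print(acf(differenced, 5).round(3))  # 1 at lag 0, then near 0
```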

Python Pandas

Pandas Series

March 28, 2025 Ryan Nolan 2 comments

What is a Series in Python Pandas? In Python’s Pandas library, one of the foundational data structures you’ll encounter is the Series. At first glance, a Series may seem simple, much like a single column or row in a spreadsheet. However, there’s more to it than meets the eye. A Pandas Series is a one-dimensional labeled […]
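
A short sketch of the "labeled" part, using made-up values: each value carries an index label, so you can select by label with `.loc` or by position with `.iloc`, and arithmetic applies element-wise while keeping the labels.

```python
import pandas as pd

# Made-up values with string labels as the index
s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="points")

print(s.loc["b"])  # 20, selected by label
print(s.iloc[0])   # 10, selected by position
print(s * 2)       # element-wise; labels a, b, c are kept
print(s.ndim)      # 1, a Series is one-dimensional
```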

Python Pandas

Pandas Index

March 23, 2025 Ryan Nolan 1 comment

An index in Python Pandas is a way to identify a specific row within a DataFrame. In this lesson we will be going over string and integer indexes as well as MultiIndexes. If you want to watch a video based on the tutorial, it is linked down below. https://youtu.be/eEXju_yrxpM Indexes vs Indices Often you’ll hear […]
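
A minimal sketch of the three index types mentioned, on a small hypothetical DataFrame: the default integer `RangeIndex`, a string index created with `set_index`, and a two-level MultiIndex.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "season": [2023, 2024, 2023, 2024],
    "wins": [10, 12, 8, 9],
})

# 1. Default integer index (a RangeIndex from 0 to 3)
print(df.index)

# 2. String index: promote a column of text labels to the index
by_label = df.assign(label=["A-23", "A-24", "B-23", "B-24"]).set_index("label")
print(by_label.loc["B-24", "wins"])  # 9

# 3. MultiIndex: two levels, team then season
multi = df.set_index(["team", "season"])
print(multi.loc[("B", 2024), "wins"])  # 9
print(multi.loc["A"])                  # every season for team A
```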
