July, 2025 - Ryan & Matt Data Science

Pandas Resample

July 19, 2025 Ryan Nolan No comments yet

The .resample() method in pandas works similarly to .groupby(), but it is specifically designed for time-series data. It groups data into defined time intervals and then applies one or more functions to each group. This method is useful for both upsampling—where missing data points can be filled or interpolated—and downsampling, which involves aggregating data over […]
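As a rough illustration of the two directions described above, here is a minimal sketch. The dates and values are made up for the example and are not taken from the post.

```python
import pandas as pd

# Hypothetical daily readings for two full weeks (2025-01-06 is a Monday).
idx = pd.date_range("2025-01-06", periods=14, freq="D")
daily = pd.Series(range(14), index=idx, dtype="float64")

# Downsampling: aggregate the daily values into weekly means.
# "W" bins are weeks ending on Sunday.
weekly = daily.resample("W").mean()

# Upsampling: a series with a gap, resampled to daily frequency.
# The newly created row is filled by linear interpolation.
sparse = pd.Series(
    [0.0, 4.0],
    index=pd.to_datetime(["2025-01-06", "2025-01-08"]),
)
filled = sparse.resample("D").interpolate()

print(weekly)
print(filled)
```

Swapping `.mean()` for `.sum()` or `.agg([...])` changes the aggregation, much as it would after a `.groupby()`.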

Python Pandas

Python Pandas JSON

July 19, 2025 Ryan Nolan 1 comment

JSON (JavaScript Object Notation) is a lightweight, human-readable data interchange format that is widely used for both data storage and transfer. It is structured using key-value pairs and supports various data types, including strings, numbers, booleans, arrays, and nested objects. JSON is a standard format commonly used in APIs and web data, which makes it […]
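One way to get nested JSON like this into a table is sketched below. The records are invented for illustration; `pd.json_normalize` flattens nested objects into dotted column names.

```python
import json

import pandas as pd

# Hypothetical API-style payload: key-value pairs with a nested object.
raw = """
[
  {"id": 1, "name": "Ada",   "active": true,  "address": {"city": "London"}},
  {"id": 2, "name": "Grace", "active": false, "address": {"city": "New York"}}
]
"""

records = json.loads(raw)          # JSON text -> list of Python dicts
df = pd.json_normalize(records)    # nested keys become "address.city"

print(df)

# Back to JSON text, one object per row.
as_json = df.to_json(orient="records")
```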

Web Scraping

beautifulsoup pagination

July 19, 2025 Ryan Nolan No comments yet

import requests – Makes HTTP requests to web pages.
from bs4 import BeautifulSoup – Parses and extracts data from HTML content.
import pandas as pd – Organizes and manipulates data in table format.
import re – Enables pattern matching using regular expressions.
from time […]

scikit-learn

adaboost classifier

July 12, 2025 Ryan Nolan No comments yet

Adaptive Boosting, or AdaBoost, is a boosting algorithm that combines multiple low-accuracy (weak) models to form a single high-accuracy (strong) model. It works by sequentially training these weak learners, each one focusing more on the errors made by the previous ones. Any machine learning algorithm that supports weighted training samples—such as Decision Trees, Logistic Regression, […]
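A minimal sketch of the idea with scikit-learn's `AdaBoostClassifier` follows. The dataset, split, and `n_estimators=50` are assumptions chosen for the example, not settings from the post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Example dataset bundled with scikit-learn (an assumption for this sketch).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# With no estimator given, the weak learner is a depth-1 decision tree
# (a "stump"). Each round reweights the samples the previous rounds got wrong.
model = AdaBoostClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```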

Python

Gradient boosting classifier

July 12, 2025 Ryan Nolan No comments yet

Gradient Boosting is an ensemble technique that builds a strong model by combining multiple weak decision trees. While it may seem similar to a Random Forest, there’s a key difference: in Random Forests, each tree is built independently, whereas in Gradient Boosting, trees are built sequentially, with each new tree correcting the errors of the […]
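The sequential nature can be seen by scoring the model after each added tree. This is a sketch under assumed settings (dataset, 100 trees, learning rate 0.1), not the post's own code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Example dataset bundled with scikit-learn (an assumption for this sketch).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

gbc = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
gbc.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after 1 tree, 2 trees, ...
# so the list tracks how test accuracy evolves as trees are added in sequence.
staged_accuracy = [
    accuracy_score(y_test, pred) for pred in gbc.staged_predict(X_test)
]

print(f"After 1 tree:    {staged_accuracy[0]:.3f}")
print(f"After 100 trees: {staged_accuracy[-1]:.3f}")
```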

scikit-learn

Kaggle House price prediction Regression Analysis

July 12, 2025 Ryan Nolan No comments yet

train_df = train_df.drop(columns=['PoolQC', 'MiscFeature', 'Alley', 'Fence', 'GarageYrBlt', 'GarageCond', 'BsmtFinType2'])
test_df = test_df.drop(columns=['PoolQC', 'MiscFeature', 'Alley', 'Fence', 'GarageYrBlt', 'GarageCond', 'BsmtFinType2'])
#drop GarageArea or GarageCars
#build models

scikit-learn

kaggle titanic tutorial

July 12, 2025 Ryan Nolan No comments yet

#military – Capt, Col, Major
#noble – Jonkheer, the Countess, Don, Lady, Sir
#unmarried Female – Mlle, Ms, Mme
#NEW Drop Sibsp, Parch, TicketNumberCounts
#OLD
#X = train_df.drop(['Survived'], axis=1)
#y = train_df['Survived']
#X_test = test_df.drop(['Age_Cut', 'Fare_Cut'], axis=1)

Statistics

python variance and standard deviation

July 6, 2025 Ryan Nolan No comments yet

https://youtu.be/p4H2b2x_nWc

#population and sample variance/std deviation

Variance measures how far each data point in the set is from the mean, and thus from every other point in the set. It is the average of the squared differences from the mean. Population variance is calculated when you have data for the entire population. It gives a measure of the dispersion of […]
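To make the definition concrete, the sketch below computes the population variance by hand and checks it against Python's standard `statistics` module. The data values are invented for the example.

```python
import statistics

# Hypothetical data set; treat it first as the entire population.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = sum(data) / len(data)                      # 5.0
squared_diffs = [(x - mean) ** 2 for x in data]   # squared distance from the mean

# Population variance: average of the squared differences (divide by N).
pop_var_manual = sum(squared_diffs) / len(data)   # 4.0

pop_var = statistics.pvariance(data)   # 4.0
pop_std = statistics.pstdev(data)      # 2.0

# Sample variance divides by N - 1 instead, so it comes out slightly larger.
sample_var = statistics.variance(data)  # 32 / 7, about 4.57
sample_std = statistics.stdev(data)

print(pop_var, pop_std, sample_var, sample_std)
```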

LangChain

FAISS LangChain

July 5, 2025 Pere No comments yet

FAISS (Facebook AI Similarity Search) is a vector library developed by Facebook that is used to store and search embeddings efficiently. It is particularly useful for tasks like question answering within documents, where you need to retrieve relevant parts of the content based on semantic similarity. By converting text into embeddings, FAISS allows you to […]

scikit-learn

hyperparameter tuning with scikit learn

July 5, 2025 Ryan Nolan No comments yet

This post looks at tuning hyperparameters with scikit-learn, a powerful machine learning library for Python that provides simple, efficient tools for data analysis and modeling. Hyperparameter tuning is the process of finding the best values for the settings of a machine learning model that are not learned from data, but set […]
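One common way to do this in scikit-learn is an exhaustive grid search with cross-validation. The sketch below is illustrative only: the model, the dataset, and the candidate values in `param_grid` are assumptions, not the post's choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Example dataset bundled with scikit-learn (an assumption for this sketch).
X, y = load_iris(return_X_y=True)

# Hyperparameters are set before training; list the candidate values to try.
param_grid = {
    "max_depth": [2, 3, 4],
    "min_samples_leaf": [1, 5],
}

# Fit one model per combination, scoring each with 5-fold cross-validation.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

For larger grids, `RandomizedSearchCV` from the same module samples a fixed number of combinations instead of trying them all.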

