Blog

scikit-learn

adaboost classifier

July 12, 2025 Ryan Nolan No comments yet

Adaptive Boosting, or AdaBoost, is a boosting algorithm that combines multiple low-accuracy (weak) models to form a single high-accuracy (strong) model. It works by sequentially training these weak learners, each one focusing more on the errors made by the previous ones. Any machine learning algorithm that supports weighted training samples—such as Decision Trees, Logistic Regression, […]
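As a rough illustration of the idea in this excerpt, here is a minimal sketch using scikit-learn's `AdaBoostClassifier`. The synthetic dataset and the choice of 50 estimators are assumptions made for the example and do not come from the post itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# With no base estimator given, scikit-learn boosts depth-1 decision trees
# (stumps), which are the classic "weak learners".
model = AdaBoostClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Mean accuracy on the held-out split.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Each stump is fit in sequence on reweighted samples, so later stumps concentrate on the points earlier ones misclassified.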

Python

Gradient boosting classifier

July 12, 2025 Ryan Nolan No comments yet

Gradient Boosting is an ensemble technique that builds a strong model by combining multiple weak decision trees. While it may seem similar to a Random Forest, there’s a key difference: in Random Forests, each tree is built independently, whereas in Gradient Boosting, trees are built sequentially, with each new tree correcting the errors of the […]

scikit-learn

Kaggle House price prediction Regression Analysis

July 12, 2025 Ryan Nolan No comments yet

train_df = train_df.drop(columns=['PoolQC', 'MiscFeature', 'Alley', 'Fence', 'GarageYrBlt', 'GarageCond', 'BsmtFinType2'])
test_df = test_df.drop(columns=['PoolQC', 'MiscFeature', 'Alley', 'Fence', 'GarageYrBlt', 'GarageCond', 'BsmtFinType2'])
#drop GarageArea or GarageCars
#build models

scikit-learn

kaggle titanic tutorial

July 12, 2025 Ryan Nolan No comments yet

#military – Capt, Col, Major
#noble – Jonkheer, the Countess, Don, Lady, Sir
#unmarried Female – Mlle, Ms, Mme
#NEW Drop Sibsp, Parch, TicketNumberCounts
#OLD
#X = train_df.drop(['Survived'], axis=1)
#y = train_df['Survived']
#X_test = test_df.drop(['Age_Cut', 'Fare_Cut'], axis=1)

Statistics

python variance and standard deviation

July 6, 2025 Ryan Nolan No comments yet

https://youtu.be/p4H2b2x_nWc
#population and sample variance/std deviation
Variance measures how far each data point in the set is from the mean and thus from every other point in the set. It is the average of the squared differences from the mean. Population variance is calculated when you have data for the entire population. It gives a measure of the dispersion of […]
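To make the population versus sample distinction concrete, here is a small sketch using Python's standard `statistics` module. The data values are made up for illustration and are not taken from the post.

```python
import statistics

# Illustrative data (an assumption); the mean is 5.0.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population variance: average of squared differences from the mean (divide by N).
pop_var = statistics.pvariance(data)
# Sample variance: same sum of squares, divided by N - 1 instead.
samp_var = statistics.variance(data)

# Standard deviation is the square root of the matching variance.
pop_std = statistics.pstdev(data)
samp_std = statistics.stdev(data)

print(pop_var, pop_std)
print(samp_var, samp_std)
```

For this data the squared differences sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7, which is slightly larger because of the smaller divisor.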

LangChain

FAISS LangChain

July 5, 2025 Pere No comments yet

FAISS (Facebook AI Similarity Search) is a vector library developed by Facebook that is used to store and search embeddings efficiently. It is particularly useful for tasks like question answering within documents, where you need to retrieve relevant parts of the content based on semantic similarity. By converting text into embeddings, FAISS allows you to […]

scikit-learn

hyperparameter tuning with scikit learn

July 5, 2025 Ryan Nolan No comments yet

This post looks at tuning hyperparameters with Scikit-Learn. Scikit-Learn is a powerful machine learning library for Python. It provides simple, efficient tools for data analysis and modeling. Hyperparameter tuning is the process of finding the best values for the settings of a machine learning model that are not learned from data, but set […]
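One common way to do this in scikit-learn is an exhaustive grid search with cross-validation. The sketch below assumes a decision tree on the built-in iris dataset and a small, hand-picked grid. These choices are illustrative and may differ from what the full post uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate values for two hyperparameters (an illustrative grid).
param_grid = {"max_depth": [2, 3, 4], "min_samples_leaf": [1, 5]}

# Every combination in the grid is scored with 5-fold cross-validation.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

`best_params_` holds the winning combination, and `best_estimator_` is a model refit on all the data with those settings.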

scikit-learn

principal component analysis scikit learn

July 5, 2025 Ryan Nolan No comments yet

PCA (Principal Component Analysis) in Python using Scikit-learn is a technique used to reduce the number of features in a dataset while preserving most of the variance (information). It works by:
  • Finding new axes (principal components) that capture the most variance.
  • Projecting the data onto these fewer dimensions.
It’s useful for visualization, speeding up models, […]
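A minimal sketch of those two steps with scikit-learn's `PCA` follows. The iris dataset, the standardization step, and the choice of two components are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)  # 150 samples, 4 features

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Find the 2 directions of greatest variance and project the data onto them.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)  # (150, 2)
# Fraction of the total variance captured by each component.
print(pca.explained_variance_ratio_)
```

Summing `explained_variance_ratio_` gives a quick check on how much information the reduced representation keeps.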

Python

Reflexion Prompting

July 3, 2025 Ryan Nolan No comments yet

This technique is highly effective for chatbots and problem-solving tasks. It also helps reduce hallucinations by incorporating a form of quality control. The process involves:
  • Starting with an initial prompt
  • Getting the AI’s first response
  • Sending a reflexion prompt asking the AI to review and reflect on its first answer
  • Receiving an optimized response, improved […]
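The loop described in this excerpt could be sketched roughly as below. `ask_model` is a hypothetical callable standing in for whatever LLM client is used, `fake_model` is a stub so the example runs offline, and the wording of the reflexion prompt is an assumption, not the post's own text.

```python
from typing import Callable


def reflexion(prompt: str, ask_model: Callable[[str], str]) -> str:
    """Ask once, then ask the model to review and improve its own answer."""
    # Steps 1-2: send the initial prompt and collect the first response.
    first_answer = ask_model(prompt)

    # Step 3: build a reflexion prompt that includes the first answer.
    reflexion_prompt = (
        f"Question: {prompt}\n"
        f"Your first answer: {first_answer}\n"
        "Review this answer for mistakes or unsupported claims, "
        "then give an improved final answer."
    )

    # Step 4: the second response is treated as the refined answer.
    return ask_model(reflexion_prompt)


def fake_model(text: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    if text.startswith("Question:"):
        return "Revised answer"
    return "Draft answer"


final = reflexion("What is 17 * 3?", fake_model)
print(final)  # prints "Revised answer"
```

In practice `fake_model` would be replaced by a function that calls a real model. The second pass doubles the number of calls, which is the cost of the added quality check.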

Python Pandas

Python Pandas Lambda Function

July 3, 2025 Ryan Nolan No comments yet

Lambda functions in Python are small, anonymous functions defined using the lambda keyword. They are typically used for short, throwaway functions that are needed for a brief period, such as within map(), filter(), or sorted() calls. A lambda can take any number of arguments but only one expression, which is evaluated and returned. For example, […]
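A few short uses of `lambda` with the built-ins named in this excerpt are shown below. The list of numbers is made up for illustration.

```python
numbers = [5, 2, 9, 1]

# map(): apply the lambda to every element.
squares = list(map(lambda x: x * x, numbers))        # [25, 4, 81, 1]

# filter(): keep only elements for which the lambda returns True.
evens = list(filter(lambda x: x % 2 == 0, numbers))  # [2]

# sorted(): the lambda supplies the sort key (negated for descending order).
descending = sorted(numbers, key=lambda x: -x)       # [9, 5, 2, 1]

# A lambda may take several arguments but has a single expression as its body.
add = lambda a, b: a + b
total = add(3, 4)                                    # 7

print(squares, evens, descending, total)
```

For anything longer than one expression, a regular `def` function is usually easier to read and debug.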

© Ryan & Matt Data Science
