import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
# Generate synthetic stationary and non-stationary data
np.random.seed(17)
# Stationary data: white noise
stationary_data = np.random.normal(size=82)
# Non-stationary data: random walk
non_stationary_data = np.cumsum(np.random.normal(size=82))
# Plot the data
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(stationary_data)
plt.title('Stationary Data')
plt.subplot(1, 2, 2)
plt.plot(non_stationary_data)
[…]
KPSS Test
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import kpss
# Generate synthetic stationary and non-stationary data
np.random.seed(17)
n = 100  # series length
# Stationary data: white noise
stationary_data = np.random.normal(size=n)
# Create a random walk with a larger step size to make it more volatile
random_walk = np.cumsum(np.random.normal(scale=2, size=n))
# Increase the scale for […]
Multicollinearity
dividing the total number of bases a player records by their total number of at-bats.
Correlation Matrix
VIF
Instead of using raw height, you might normalize or categorize height into bins, which could reduce the numerical interdependence.
Calculate Condition Index (CI)
How to Address Multicollinearity
Drop a Feature (At Bats)
Look at […]
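A minimal sketch of the diagnostics listed above on synthetic data (all numbers are generated, not real player stats; the column names `AtBats`, `TotalBases` and `Height` are hypothetical). It uses the identity that each feature's VIF equals the matching diagonal entry of the inverse correlation matrix, i.e. 1 / (1 - R²) from regressing that feature on the others. Condition indices are computed from the eigenvalues of the correlation matrix, which is one common variant; some texts scale the raw design matrix instead.

```python
import numpy as np
import pandas as pd

# Synthetic, hypothetical data: TotalBases is built from AtBats, so the two are collinear.
rng = np.random.default_rng(0)
n = 200
at_bats = rng.integers(300, 650, size=n).astype(float)
total_bases = 0.45 * at_bats + rng.normal(0, 15, size=n)
height_in = rng.normal(73, 2, size=n)
df = pd.DataFrame({"AtBats": at_bats, "TotalBases": total_bases, "Height": height_in})

# Correlation matrix
corr = df.corr()
print(corr.round(2))

# VIF_j = j-th diagonal entry of the inverse correlation matrix
vif = pd.Series(np.diag(np.linalg.inv(corr.to_numpy())), index=df.columns)
print(vif.round(2))

# Condition indices: sqrt(largest eigenvalue / each eigenvalue)
eigvals = np.linalg.eigvalsh(corr.to_numpy())
condition_index = np.sqrt(eigvals.max() / eigvals)
print(np.sort(condition_index).round(2))

# Drop one of the collinear features (AtBats) and recompute the VIFs
reduced = df.drop(columns="AtBats")
vif_reduced = pd.Series(np.diag(np.linalg.inv(reduced.corr().to_numpy())), index=reduced.columns)
print(vif_reduced.round(2))
```

VIF values above roughly 5 to 10 are commonly quoted rules of thumb for problematic collinearity; after dropping `AtBats` the remaining VIFs should fall back near 1.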
Pandas Sample
https://youtu.be/REhRhRUcluI
Example 1 – if else state location
To start, we’re going to create a simple DataFrame in Python:
# DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)
import pandas as pd
import random
import string
import numpy as np
Prep the DataFrame
# Function to […]
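A short sketch of the main `DataFrame.sample` parameters from the signature above, on a small hypothetical DataFrame (`player` and `hr` are illustrative names). Passing `random_state` makes each draw reproducible; `ignore_index=True` requires pandas 1.3 or later.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "player": list("ABCDEFGHIJ"),
    "hr": [10, 25, 3, 40, 18, 7, 33, 12, 29, 5],
})

three_rows = df.sample(n=3, random_state=42)              # exactly 3 rows, reproducible
half = df.sample(frac=0.5, random_state=42)                # 50% of the rows
boot = df.sample(frac=1.0, replace=True, random_state=42,
                 ignore_index=True)                        # bootstrap resample, index reset to 0..n-1
weighted = df.sample(n=3, weights="hr", random_state=42)   # rows with more HR are more likely
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)  # shuffle every row
print(three_rows, weighted, sep="\n\n")
```

Use either `n` or `frac`, not both; with `replace=False` (the default) `n` cannot exceed the number of rows.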
Simple Exponential Smoothing
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.api import SimpleExpSmoothing
df = pd.read_csv('/content/all_stocks_5yr.csv')
# Keep only the Apple rows and index them by date
apple_df = df[df["Name"] == "AAPL"].copy()
apple_df["date"] = pd.to_datetime(apple_df["date"])
apple_df.sort_values("date", inplace=True)
apple_df.set_index("date", inplace=True)
# Put the series on a business-day calendar; business days with no row (e.g. market holidays) become NaN
apple_df = apple_df.asfreq('B')
# Fill those gaps by linear interpolation
apple_df["close"] = apple_df["close"].interpolate()
apple_close = apple_df["close"]
plt.figure(figsize=(10, 4))
plt.plot(apple_close, label="Apple Closing Price", color="black")
plt.title("Apple Stock Closing Prices")
plt.xlabel("Date")
plt.ylabel("Price")
plt.legend()
plt.grid(True)
[…]
Pandas MultiIndex
https://www.youtube.com/watch?v=XHOmBV4js_E
https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.html
https://pandas.pydata.org/pandas-docs/version/1.2.1/user_guide/advanced.html
https://jessicastringham.net/2019/12/10/multiindex/
You have now created a multi-index, or hierarchical index (get comfortable with both terms, as you’ll find them used interchangeably).
Note that although we can move the contents of more than one column into the index, this does not give a row several indexes: each row still has a single index label, now a tuple with one value per level. […]
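A minimal illustration of that point, using hypothetical sales data: after `set_index` with two columns, each row's label is one tuple, and you can select with the full tuple, with the outer level alone, or across an inner level with `xs`.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "year": [2023, 2024, 2023, 2024],
    "sales": [100, 120, 90, 95],
})

indexed = df.set_index(["region", "year"])   # two columns become one hierarchical index
print(indexed.index.nlevels)                 # 2 levels...
print(indexed.index[0])                      # ...but one label per row: ('East', 2023)
print(indexed.loc[("East", 2024), "sales"])  # select with the full key tuple
print(indexed.loc["West"])                   # partial key: all years for West
print(indexed.xs(2023, level="year"))        # cross-section on the inner level
wide = indexed["sales"].unstack("year")      # pivot the inner level into columns
print(wide)
```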
Pandas Replace
import pandas as pd
import numpy as np
# Similar to map
# .replace() does not operate on the contents of the DataFrame as strings; it tries to match the entire value (unless regex=True is passed)
data = {
    'Player': ['Barry Bonds', 'Hank Aaron', 'Babe Ruth', 'Alex Rodriguez', 'Albert Pujols', 'Willie Mays', 'Ken Griffey Jr.'],
    'HR': [762, 755, 714, 696, 703, […]
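A small sketch of the whole-value behaviour described in the comment, using three rows from the data above. The replacement strings are illustrative. It also contrasts the dict form of `replace` with `map`: `replace` keeps unmatched values, while `map` turns them into NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Player": ["Barry Bonds", "Hank Aaron", "Babe Ruth"],
    "HR": [762, 755, 714],
})

# Whole-value match: only cells exactly equal to "Babe Ruth" change.
exact = df.replace("Babe Ruth", "George Herman Ruth")

# A substring does NOT match by default, so nothing changes here.
unchanged = df.replace("Babe", "George")

# Substring replacement: regex=True on the DataFrame, or the .str accessor on one column.
regex_version = df.replace("Babe", "George", regex=True)
str_version = df["Player"].str.replace("Babe", "George", regex=False)

# Dict form: replace keeps unmatched values; map turns them into NaN.
renamed = df["Player"].replace({"Hank Aaron": "Henry Aaron"})
mapped = df["Player"].map({"Hank Aaron": "Henry Aaron"})
print(renamed.tolist(), mapped.tolist())
```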
Pandas Where
https://www.youtube.com/watch?v=Y7HMkDuR_DA&feature=youtu.be
The where() function in Pandas replaces values in a DataFrame or Series where a condition is not met. It checks each element against one or more conditions, keeps the values for which the condition is True, and replaces the rest (with NaN by default, or with the value passed as the other argument).
To start, we import pandas and NumPy:
import pandas as pd
import numpy as np
[…]
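A minimal sketch of `where()` on a hypothetical Series, showing the default NaN replacement, the `other` argument, and conditions combined with `&`.

```python
import pandas as pd

s = pd.Series([5, -3, 12, 0, -8])  # hypothetical values

kept = s.where(s > 0)                    # values failing the condition become NaN
floored = s.where(s > 0, 0)              # ...or take the `other` value instead
both = s.where((s > 0) & (s < 10), -1)   # combine conditions with & (and) / | (or)
print(kept.tolist(), floored.tolist(), both.tolist())
```

Each condition needs its own parentheses, because `&` binds more tightly than `>` and `<` in Python.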
Pandas Mask
https://www.youtube.com/watch?v=XHOmBV4js_E
Prep the Data
To start, we’re going to create a simple DataFrame in Python:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Hourly_Salary': ['500.00', '10000.00', '200.00', '20.00', np.nan]
})
# Convert the salary strings to floats (NaN stays NaN)
df['Hourly_Salary'] = pd.to_numeric(df['Hourly_Salary'])
Example 1 – if else state location
To […]
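`mask()` is the inverse of `where()`: it replaces values where the condition is True. This sketch reuses the salary data above; the 1,000-per-hour cutoff is an arbitrary assumption for illustration.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "Hourly_Salary": ["500.00", "10000.00", "200.00", "20.00", np.nan]
})
df["Hourly_Salary"] = pd.to_numeric(df["Hourly_Salary"])

too_high = df["Hourly_Salary"] > 1000   # assumed cutoff; NaN compares as False

cleaned = df["Hourly_Salary"].mask(too_high)        # flagged values become NaN
capped = df["Hourly_Salary"].mask(too_high, 1000)   # ...or are replaced by 1000

# Equivalent where() call: keep values where the condition is NOT met.
same_as_where = df["Hourly_Salary"].where(~too_high)
print(cleaned.tolist(), capped.tolist())
```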
Pandas Interpolation
https://youtu.be/BJHwPeRvyPE?si=lvsDqXBjb0mcC4ae
To start, we’re going to create a simple DataFrame in Python:
import pandas as pd
import numpy as np
data = {
    'day': pd.date_range(start='2025-04-19', periods=7),
    'temperature': [np.nan, 30, np.nan, np.nan, 45, 40, np.nan]
}
df = pd.DataFrame(data)
# Independent copies, so each interpolation method can be compared on the same data
df2 = df.copy()
df3 = df.copy()
df4 = df.copy()
Example 1
To start, we’re going to create […]
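A sketch comparing a few fill strategies on the temperature data above. With the default forward direction, linear interpolation leaves leading NaNs untouched and repeats the last known value for trailing NaNs; `limit_direction="both"` also fills the leading gap. The new column names are illustrative.

```python
import pandas as pd
import numpy as np

data = {
    "day": pd.date_range(start="2025-04-19", periods=7),
    "temperature": [np.nan, 30, np.nan, np.nan, 45, 40, np.nan],
}
df = pd.DataFrame(data)

# Linear (the default): evenly spaced fill between known points.
df["linear"] = df["temperature"].interpolate(method="linear")

# Fill from both directions, so the leading NaN is filled too.
df["both"] = df["temperature"].interpolate(method="linear", limit_direction="both")

# method="time" weights by the DatetimeIndex spacing
# (identical to linear here because the days are evenly spaced).
df["time"] = df.set_index("day")["temperature"].interpolate(method="time").to_numpy()

# Forward/backward fill copy a neighbour instead of interpolating.
df["ffill"] = df["temperature"].ffill()
df["bfill"] = df["temperature"].bfill()
print(df)
```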