Pandas Columns
import pandas as pd import numpy as np
Here we define a dataset containing records of eight runners.
Each entry includes:
The runners first and last name.
Thier finish times in hours for threee race types:
- Marathon
- 50-miler
- 100-miler
The date of their last race and a running_clud_id showing which running club each runner belongs to.
data = { 'Runner': ['Alice', 'Bob', 'Charlie', 'Diana', 'Ethan', 'Fiona', 'George', 'Hannah'], 'last_name': ['Morgan', 'Thompson', 'Reed', 'Santiago', 'Kim', 'Nguyen', 'Patel', 'Brooks'], 'Marathon_Time_hrs': [3.5, 4.0, 3.2, 4.5, 3.8, 4.1, 3.9, 4.3], 'Fifty_Miler_Time_hrs': [8.0, 9.5, 7.8, 10.2, 9.0, 10.0, 9.2, 10.5], 'Hundred_Miler_Time_hrs': [20.5, 23.0, 18.7, 25.2, 21.5, 24.1, 22.3, 26.0], 'last_race_date': ['04/27/2025', '05/24/2025', '12/13/2025', '04/27/2025', '05/24/2025', '12/13/2025', '04/27/2025', '05/24/2025'], 'running_club_id': [123, 345, 123, 678, 342, 123, 345, 123], }
Next, we create a tabular structure with pandas DataFrame.
We create this so we can perform data anaylsis and manipulation using pandas.
Â
df = pd.DataFrame(data)
df.head()

Example 1 Access Column Version 1
Here we access the Runner column using df[‘Runner’]
df['Runner']

Example 2 Access Column Version 2
We can also access the Runners column by using df.Runner
df.Runner

Example 3 access multiple columns
Here we get two columns ‘Runner’ and ‘Marathon_Time_hrs’.
Note the double square bracket, we use it when we want to access multiple columns
df[['Runner', 'Marathon_Time_hrs']]

Example 4 get all columns Version 1
We use .columns to see all the columns in the DataFrame.
df.columns

unlike .columns whihc returns the index,Â
.columns.values returns a numpy array of the column names
df.columns.values

Example 5 get all columns Version 2 - list
Using the list() metthod is a quick way to get a list of all column names.
list(df)

list(df.columns)

Example 6 get all columns Version 3 - Sorted
we use the sorted() method to sort the column names and return them as a list.
It does not chnage the DataFrame, it only gives us the sorted list of column headers.
sorted(df)

sorted(df.columns)

Example 6 get all columns Version 4 - Loop
We use a for loop to iterate over df.columns so that we can display each colum name individually.
for column_name in df.columns: print(column_name)

Example 7 get all columns Version 5 - keys
.keys() returns the DataFrame’s column labels,
.tolist() converts the keys into a regular Python list.
df.keys().tolist()

Example 8 what if a dataframe has hundreds of columns - two ways to display max columns
Here we create a dataframe with 100 columns.
We use random.rand to generate a 2D array with the shape defined by num_rows and num_cols.
Each element is a random float between 0 and 1.
num_rows = 10 num_cols = 100
data = np.random.rand(num_rows, num_cols)
Here, we use list comprehension to generate names like ‘col_1’, ‘col_2’ up to the number of columns (num_cols).
column_names = [f'col_{i+1}' for i in range(num_cols)]
Next, we create a new DataFrame using the random data and assigns the generated column names.
df_100 = pd.DataFrame(data, columns=column_names)
df_100

pd.set_option('display.max_columns', None)
pd.options.display.max_columns = None
sets the same option as pd.set_option('display.max_columns', None)
, ensuring no columns get hidden in the output view. It’s just a different syntax to control display settings.
pd.options.display.max_columns = None
df_100

Example 9 column types
This returns a Series listing the tyoe (e.g int, float or object) for every column.Â
df.dtypes

Example 10 convert column types - date time
.to_datetime() transforms string dates into panda’s datetime format.
df['last_race_date'] = pd.to_datetime(df['last_race_date'])
df.dtypes

Example 11 convert column types - numeric - TO DO
Here, using .astyoe(‘float’) we convert interger IDs into floating-point numbers.
df['running_club_id'] = df['running_club_id'].astype('float')
df.dtypes

Example 12 convert column types - string
Here we chnage the data type of ‘runnung_clud_id’ to text(strings).
df['running_club_id'] = df['running_club_id'].astype(str)
df.dtypes

Example 13 Rename Column
Here we use .rename() to change the column name ‘Marathon_Tine_hrs’ to ‘262_Timr_hrs’.
The inplace=True updates te DataFrame directly without needing to assign it to new variables.
df.rename(columns={'Marathon_Time_hrs': '262_Time_hrs'}, inplace=True)
Example 14 Create new column from numbers
Here, we add up the hours from different columns and create a new one called ‘total_times’.
df['total_times'] = df['262_Time_hrs'] + df['Fifty_Miler_Time_hrs'] + df['Hundred_Miler_Time_hrs']
Example 15 create new column from strings
Here, we combine ‘Runner’ and ‘last_name’ columns with a space in between.
The full_name column stores the complete name of each person.
df['full_name'] = df['Runner'] + ' ' + df['last_name']
Example 16 create new column from new data
shirt_sizes = ['S', 'M', 'M', 'L', 'M', 'S', 'L', 'XL']
Here we create a new column called ‘race_shirt_size’ using the created shirt_sizes data
df['race_shirt_size'] = shirt_sizes
Example 17 drop column
Here we delete the specified column from the DataFrame.
df.drop(columns=['race_shirt_size'], inplace=True)
Example 18 reorder columns
Here we reorder and selec specific columns from the DataFrame.
df = df[['full_name', 'running_club_id', 'Hundred_Miler_Time_hrs', 'Fifty_Miler_Time_hrs', '262_Time_hrs']]
Example 19 - Set Index
Here we use the .set_index() to make the full_name column the new row labels(index).
df.set_index("full_name")

Example 20 set multiple columns as index
Here we set a multi-level index using both ‘full_name’ and ‘running_club_id’
df = df.set_index(['full_name', 'running_club_id'])
Example 21 - reset index
Here we use .reset_index() to restore ‘full_name’ and ‘running_club_id’ from the index to normal columns, making the DataFrame flat again.
df = df.reset_index()
df

Example 22 - Get only numeric columns
This extracts the names of all numeric columns in the DataFrame.
numeric_columns = df._get_numeric_data().columns
numeric_columns

Example 23 Describe
.describe() provide key stats likeÂ
mean, standard deviation, min, max and percentile.
df.describe()

Example 24 Set Column Names post creation
Here we build a DataFrame from a list of lists where each inner list is a row.
df2 = pd.DataFrame([['23:12', '22:11', '21:00', '20:30', '19:55', '18:55']])
df2

Next we assign custom column names to the DataFrame df2.
df2.columns = ['Race_1','Race_2','Race_3','Race_4','Race_5','Race_6']
SECTION ON CLEANING UP COLUMNS Values
The data includes runners’ names, temperatures, times, distances, and locations. Some entries use np.nan
to represent missing or unknown values
data3 = { 'Runner': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'], 'Temperature (C)': [20, 18, np.nan, 22, 21], 'Time (min)': [25.5, 30.2, np.nan, 22.1, 28.4], 'Distance (km)': [5.0, np.nan, 5.0, 4.5, 5.0], 'Location': ['Park', 'Track', 'Trail', np.nan, 'Park'], }
Using pd.DataFrame(data3)
we organize the raw dictionary data into a tabular format with rows and columns,
df3 = pd.DataFrame(data3)
Example 25 Missing Values single column
This returns a boolean Series where “True” marks rows with missing time values, helping identify incomplete data.
df3['Time (min)'].isna()

Example 26 all columns that have a null value
df3.isna().any()
checks which columns contain any NaNs, and using df3.columns[...]
selects their names. Finally, .tolist()
converts the result into a regular Python list.
df3.columns[df3.isna().any()].tolist()

Example 27 Fill Value
fillna(30)
replaces all NaN
entries with the number 30.
df3['Time (min)'] = df3['Time (min)'].fillna(30)

Example 27 Drop Rows will null value in column
f3.dropna(subset=['Distance (km)'])
drops all rows with NaN
in the 'Distance (km)'
column, helping clean the data by keeping only entries with valid distances.
df3_drop = df3.dropna(subset=['Distance (km)'])

Example 28 drop the column if null value
dropna(axis=1)
drops columns with at least one NaN
.
df3_drop_column = df3_drop.dropna(axis=1)
df3_drop_column

Example 29 Insert
fahrenheit = df3['Temperature (C)'] * 9/5 + 32
Here, we insert the new fahrenheit column into the DataFrame
df3.insert(2, 'Temperature (F)', fahrenheit)
df3

SECTION ON CLEANUP COLuMN NAMES
df4 = pd.DataFrame({ ' Runner Name ': ['Alex', 'Bri', 'Chris'], ' Total Time ': [45, 50, 42], 'Average Pace': [5.0, 5.5, 4.8], ' Calories Burned ': [300, 320, 310] })

Example 30 remove spaces on the end
Here, we remove any leading or trailing spaces from column names using .strip() method.
df4.columns = df4.columns.str.strip()
df4

df4['Runner Name']

Example 31 remove spaces and replace with underscore
This replaces spaces in all column names with underscores (_
), by using .replace()
df4.columns = df4.columns.str.replace(' ', '_')
df4

Example 32 make lowercase
Here we use the .lower() method to make the columns lower case.
df4.columns = df4.columns.str.lower()

Example 33 Filter Column
Here we filter the DataFrame by checking the column (avaerage_pace) thats greater than 5
df4[df4['average_pace'] > 5]

Example 34 Filter Column AND
The &
operator combines both conditions
This filters df4
to show rows where:
'calories_burned'
is less than 350, and'total_time'
is greater than 43.
df4[(df4['calories_burned'] < 350) & (df4['total_time'] > 43)]

Example 35 Filter Column OR
This filters df4
to show rows where either condition is true:
'calories_burned'
is less than 350, or'total_time'
is greater than 43.
The |
operator means “OR”, so rows meeting at least one of the conditions will be included in the result.
df4[(df4['calories_burned'] < 350) | (df4['total_time'] > 43)]

Example 36 loc
df4.loc[0, 'total_time']
uses .loc[]
to access data by label/index, returning a single cell’s value.
This retrieves the value from the 'total_time'
column in the first row (index 0).
df4.loc[0, 'total_time']
Example 37 loc
This selects all rows and the columns from 'total_time'
to 'calories_burned'
, inclusive.
Using df4.loc[:, 'total_time':'calories_burned']
slices the DataFrame to return only those columns in that range, across all rows
df4.loc[:, 'total_time':'calories_burned']

Example 38 loc
This selects all rows and only the 'total_time'
and 'calories_burned'
columns from df4
.
df4.loc[:, ['total_time','calories_burned']]
uses label-based indexing to extract specific columns across the entire dataset
df4.loc[:, ['total_time','calories_burned']]

Example 39 iloc
This gets the value at the first row and first column of df4
df4.iloc[0, 0]
uses integer-based indexing to return a single cell’s value
df4.iloc[0, 0]

Example 40 iloc
This selects a specific range of rows and columns by position.
df4.iloc[0:2, 1:3]
returns rows 0 and 1 (Python excludes the end index) and columns at positions 1 and 2.
df4.iloc[0:2, 1:3]

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.