Pandas Columns

July 19, 2025 Ryan Nolan No comments yet

Pandas Dataframes are composed of Rows and Columns. In this guide we are going to cover everything you need to know about working with columns.

The article is based on a tutorial we published on our YouTube channel. Feel free to check it out below.

Let’s start with importing in Pandas and NumPy.

				
					import pandas as pd
import numpy as np

Here we define a dataset containing records of eight runners.

Each entry includes:

The runners first and last name.

Thier finish times in hours for threee race types:

Marathon
50-miler
100-miler

The date of their last race and a running_clud_id showing which running club each runner belongs to.

				
					data = {
    'Runner': ['Alice', 'Bob', 'Charlie', 'Diana', 'Ethan', 'Fiona', 'George', 'Hannah'],
    'last_name': ['Morgan', 'Thompson', 'Reed', 'Santiago', 'Kim', 'Nguyen', 'Patel', 'Brooks'],
    'Marathon_Time_hrs': [3.5, 4.0, 3.2, 4.5, 3.8, 4.1, 3.9, 4.3],
    'Fifty_Miler_Time_hrs': [8.0, 9.5, 7.8, 10.2, 9.0, 10.0, 9.2, 10.5],
    'Hundred_Miler_Time_hrs': [20.5, 23.0, 18.7, 25.2, 21.5, 24.1, 22.3, 26.0],
    'last_race_date': ['04/27/2025', '05/24/2025', '12/13/2025', '04/27/2025', '05/24/2025', '12/13/2025', '04/27/2025', '05/24/2025'],
    'running_club_id': [123, 345, 123, 678, 342, 123, 345, 123],
}

Next, we create a tabular structure with pandas DataFrame.

We create this so we can perform data anaylsis and manipulation using pandas.

				
					df = pd.DataFrame(data)

				
					df.head()

Example 1 Access Column Version 1

Here we access the Runner column using df[‘Runner’]

				
					df['Runner']

Example 2 Access Column Version 2

We can also access the Runners column by using df.Runner

				
					df.Runner

Example 3 access multiple columns

Here we get two columns ‘Runner’ and ‘Marathon_Time_hrs’.

Note the double square bracket, we use it when we want to access multiple columns

				
					df[['Runner', 'Marathon_Time_hrs']]

Example 4 get all columns Version 1

We use .columns to see all the columns in the DataFrame.

				
					df.columns

unlike .columns whihc returns the index,

.columns.values returns a numpy array of the column names

				
					df.columns.values

Example 5 get all columns Version 2 - list

Using the list() metthod is a quick way to get a list of all column names.

				
					list(df)

				
					list(df.columns)

Example 6 get all columns Version 3 - Sorted

we use the sorted() method to sort the column names and return them as a list.

It does not chnage the DataFrame, it only gives us the sorted list of column headers.

				
					sorted(df)

				
					sorted(df.columns)

Example 6 get all columns Version 4 - Loop

We use a for loop to iterate over df.columns so that we can display each colum name individually.

				
					
for column_name in df.columns:
	print(column_name)

Example 7 get all columns Version 5 - keys

.keys() returns the DataFrame’s column labels,

.tolist() converts the keys into a regular Python list.

				
					df.keys().tolist()

Example 8 what if a dataframe has hundreds of columns - two ways to display max columns

Here we create a dataframe with 100 columns.

We use random.rand to generate a 2D array with the shape defined by num_rows and num_cols.

Each element is a random float between 0 and 1.

				
					num_rows = 10
num_cols = 100

				
					data = np.random.rand(num_rows, num_cols)

Here, we use list comprehension to generate names like ‘col_1’, ‘col_2’ up to the number of columns (num_cols).

				
					column_names = [f'col_{i+1}' for i in range(num_cols)]

Next, we create a new DataFrame using the random data and assigns the generated column names.

				
					df_100 = pd.DataFrame(data, columns=column_names)

				
					df_100

pd.set_option(‘display.max_columns’, None) tells pandas to show all columns when printing a DataFrame.

				
					pd.set_option('display.max_columns', None)

pd.options.display.max_columns = None sets the same option as pd.set_option('display.max_columns', None), ensuring no columns get hidden in the output view. It’s just a different syntax to control display settings.

				
					pd.options.display.max_columns = None

				
					df_100

Example 9 column types

This returns a Series listing the tyoe (e.g int, float or object) for every column.

				
					df.dtypes

Example 10 convert column types - date time

.to_datetime() transforms string dates into panda’s datetime format.

				
					df['last_race_date'] = pd.to_datetime(df['last_race_date'])

				
					df.dtypes

Example 11 convert column types - numeric - TO DO

Here, using .astype(‘float’) we convert interger IDs into floating-point numbers.

				
					df['running_club_id'] = df['running_club_id'].astype('float')

				
					df.dtypes

Example 12 convert column types - string

Here we chnage the data type of ‘runnung_clud_id’ to text(strings).

				
					df['running_club_id'] = df['running_club_id'].astype(str)

				
					df.dtypes

Example 13 Rename Column

Here we use .rename() to change the column name ‘Marathon_Tine_hrs’ to ‘262_Timr_hrs’.

The inplace=True updates te DataFrame directly without needing to assign it to new variables.

				
					df.rename(columns={'Marathon_Time_hrs': '262_Time_hrs'}, inplace=True)

Example 14 Create new column from numbers

Here, we add up the hours from different columns and create a new one called ‘total_times’.

				
					df['total_times'] = df['262_Time_hrs'] + df['Fifty_Miler_Time_hrs'] + df['Hundred_Miler_Time_hrs']

Example 15 create new column from strings

Here, we combine ‘Runner’ and ‘last_name’ columns with a space in between.

The full_name column stores the complete name of each person.

				
					df['full_name'] = df['Runner'] + ' ' + df['last_name']

Example 16 create new column from new data

				
					shirt_sizes = ['S', 'M', 'M', 'L', 'M', 'S', 'L', 'XL']

Here we create a new column called ‘race_shirt_size’ using the created shirt_sizes data

				
					df['race_shirt_size'] = shirt_sizes

Example 17 drop column

Here we delete the specified column from the DataFrame.

				
					df.drop(columns=['race_shirt_size'], inplace=True)

Example 18 reorder columns

Here we reorder and selec specific columns from the DataFrame.

				
					df = df[['full_name', 'running_club_id', 'Hundred_Miler_Time_hrs', 'Fifty_Miler_Time_hrs', '262_Time_hrs']]

Example 19 - Set Index

Here we use the .set_index() to make the full_name column the new row labels(index).

				
					df.set_index("full_name")

Example 20 set multiple columns as index

Here we set a multi-level index using both ‘full_name’ and ‘running_club_id’

				
					df = df.set_index(['full_name', 'running_club_id'])

Example 21 - reset index

Here we use .reset_index() to restore ‘full_name’ and ‘running_club_id’ from the index to normal columns, making the DataFrame flat again.

				
					df = df.reset_index()

df

Example 22 - Get only numeric columns

This extracts the names of all numeric columns in the DataFrame.

				
					numeric_columns = df._get_numeric_data().columns

				
					numeric_columns

Example 23 Describe

.describe() provide key stats like

mean, standard deviation, min, max and percentile.

				
					df.describe()

Example 24 Set Column Names post creation

Here we build a DataFrame from a list of lists where each inner list is a row.

				
					df2 = pd.DataFrame([['23:12', '22:11', '21:00', '20:30', '19:55', '18:55']])

df2

Next we assign custom column names to the DataFrame df2.

				
					df2.columns = ['Race_1','Race_2','Race_3','Race_4','Race_5','Race_6']

SECTION ON CLEANING UP COLUMNS Values

The data includes runners’ names, temperatures, times, distances, and locations. Some entries use np.nan to represent missing or unknown values

				
					data3 = {
    'Runner': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Temperature (C)': [20, 18, np.nan, 22, 21],
    'Time (min)': [25.5, 30.2, np.nan, 22.1, 28.4],
    'Distance (km)': [5.0, np.nan, 5.0, 4.5, 5.0],
    'Location': ['Park', 'Track', 'Trail', np.nan, 'Park'],
}

Using pd.DataFrame(data3) we organize the raw dictionary data into a tabular format with rows and columns,

				
					df3 = pd.DataFrame(data3)

Example 25 Missing Values single column

This returns a boolean Series where “True” marks rows with missing time values, helping identify incomplete data.

				
					df3['Time (min)'].isna()

Example 26 all columns that have a null value

df3.isna().any() checks which columns contain any NaNs, and using df3.columns[...] selects their names. Finally, .tolist() converts the result into a regular Python list.

				
					df3.columns[df3.isna().any()].tolist()

Example 27 Fill Value

fillna(30) replaces all NaN entries with the number 30.

				
					df3['Time (min)'] = df3['Time (min)'].fillna(30)

Example 27 Drop Rows will null value in column

f3.dropna(subset=['Distance (km)']) drops all rows with NaN in the 'Distance (km)' column, helping clean the data by keeping only entries with valid distances.

				
					df3_drop = df3.dropna(subset=['Distance (km)'])

Example 28 drop the column if null value

dropna(axis=1) drops columns with at least one NaN.

				
					df3_drop_column = df3_drop.dropna(axis=1)

				
					df3_drop_column

Example 29 Insert

				
					fahrenheit = df3['Temperature (C)'] * 9/5 + 32

Here, we insert the new fahrenheit column into the DataFrame

				
					df3.insert(2, 'Temperature (F)', fahrenheit)

df3

SECTION ON CLEANUP COLuMN NAMES

				
					df4 = pd.DataFrame({
    ' Runner Name ': ['Alex', 'Bri', 'Chris'],
    ' Total Time ': [45, 50, 42],
    'Average Pace': [5.0, 5.5, 4.8],
    '  Calories Burned  ': [300, 320, 310]
})

Example 30 remove spaces on the end

Here, we remove any leading or trailing spaces from column names using .strip() method.

				
					df4.columns = df4.columns.str.strip()

df4

				
					df4['Runner Name']

Example 31 remove spaces and replace with underscore

This replaces spaces in all column names with underscores (_), by using .replace()

				
					df4.columns = df4.columns.str.replace(' ', '_')

df4

Example 32 make lowercase

Here we use the .lower() method to make the columns lower case.

				
					df4.columns = df4.columns.str.lower()

Example 33 Filter Column

Here we filter the DataFrame by checking the column (avaerage_pace) thats greater than 5

				
					df4[df4['average_pace'] > 5]

Example 34 Filter Column AND

The & operator combines both conditions

This filters df4 to show rows where:

'calories_burned' is less than 350, and
'total_time' is greater than 43.

				
					df4[(df4['calories_burned'] < 350) & (df4['total_time'] > 43)]

Example 35 Filter Column OR

This filters df4 to show rows where either condition is true:

'calories_burned' is less than 350, or
'total_time' is greater than 43.

The | operator means “OR”, so rows meeting at least one of the conditions will be included in the result.

				
					df4[(df4['calories_burned'] < 350) | (df4['total_time'] > 43)]

Example 36 loc

df4.loc[0, 'total_time'] uses .loc[] to access data by label/index, returning a single cell’s value.

This retrieves the value from the 'total_time' column in the first row (index 0).

				
					df4.loc[0, 'total_time']

Example 37 loc

This selects all rows and the columns from 'total_time' to 'calories_burned', inclusive.

Using df4.loc[:, 'total_time':'calories_burned'] slices the DataFrame to return only those columns in that range, across all rows

				
					df4.loc[:, 'total_time':'calories_burned']

Example 38 loc

This selects all rows and only the 'total_time' and 'calories_burned' columns from df4.

df4.loc[:, ['total_time','calories_burned']] uses label-based indexing to extract specific columns across the entire dataset

				
					df4.loc[:, ['total_time','calories_burned']]

Example 39 iloc

This gets the value at the first row and first column of df4

df4.iloc[0, 0] uses integer-based indexing to return a single cell’s value

				
					df4.iloc[0, 0]

Example 40 iloc

This selects a specific range of rows and columns by position.

df4.iloc[0:2, 1:3] returns rows 0 and 1 (Python excludes the end index) and columns at positions 1 and 2.

				
					df4.iloc[0:2, 1:3]

Final Thoughts

Thank you for checking out this tutorial on Pandas Columns. Make sure to read our other Pandas content on the blog

Ryan Nolan

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.