Pandas Series

What is a Series in Python Pandas

In Python Pandas, a series acts much like a single row or column in a spreadsheet. It’s a one-dimensional, labeled list of values that can hold multiple different datatypes.

This article is based on a YouTube video we have on our channel. If you want to watch the video, it’s linked below.

To start, we’re going to import Pandas and NumPy. Don’t worry if you aren’t familiar with NumPy; we will only use it to generate null values a bit later in the tutorial.

  import pandas as pd
  import numpy as np

Example 1 - Create a Series From List

Let’s build your first Pandas series. To do so, we will convert a list into a series. This is done by passing the list to pd.Series().

  points = [30, 25, 15, 10, 20]
  player_points_1 = pd.Series(points)
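To see what the conversion produced, it can help to look at the index and the values separately. The short sketch below rebuilds the same series and prints both; when no index is given, the labels simply count up from 0.

```python
import pandas as pd

points = [30, 25, 15, 10, 20]
player_points_1 = pd.Series(points)

# Without an explicit index, pandas labels the values 0, 1, 2, ...
print(player_points_1.index.tolist())  # [0, 1, 2, 3, 4]
print(player_points_1.tolist())        # [30, 25, 15, 10, 20]
```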

Example 2 - Create a series from list & assign index

Like dataframes, series also have an index. We can assign a custom index from the start. This replaces the default integers (0, 1, 2, ...) that were present in the first example.

  player_points_2 = pd.Series(
      [30, 25, 15, 10, 20],  # Points scored by each player in a match
      index=["Player 1", "Player 2", "Player 3", "Player 4", "Player 5"]  # Player names
  )

Example 3 - Dictionary

By utilizing a dictionary, we can set the index while creating a series. The dictionary’s keys become the index and its values become the data.

  data_dictionary = {'Player 1': 30, 'Player 2': 25, 'Player 3': 15, 'Player 4': 10, 'Player 5': 20}
  player_points_3 = pd.Series(data_dictionary)
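As a small illustration of keys becoming the index, the sketch below builds a series from a shortened dictionary and then passes an explicit index alongside it. The label "Player 9" is made up for the example. When an index is supplied together with a dictionary, pandas keeps only the listed keys, in the listed order, and a label that is missing from the dictionary gets a null value.

```python
import pandas as pd

data_dictionary = {"Player 1": 30, "Player 2": 25, "Player 3": 15}

# Keys become the index, values become the data
from_dict = pd.Series(data_dictionary)
print(from_dict.index.tolist())  # ['Player 1', 'Player 2', 'Player 3']

# "Player 9" is a hypothetical label that is not in the dictionary,
# so its value comes back as NaN
selected = pd.Series(data_dictionary, index=["Player 3", "Player 1", "Player 9"])
print(selected)
```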

Example 4 - Numpy Array

It’s also quite easy to convert a numpy array into a series.

  data_numpy = np.array([30, 25, 15, 10, 20])
  series_numpy = pd.Series(data_numpy)

Example 5 - multiple data types

While the examples above each used a single datatype, we can have a mixture within a series. In this example we have an integer, a string, a float, and a boolean. Pandas stores a mixed series like this with the generic object dtype.

  multiple_data_types = [10, 'Hello', 3.14, True]
  series_multiple_data_types = pd.Series(multiple_data_types)

Example 6 - Creating an Empty Pandas Series

If you want an empty series, call pd.Series() without passing any data.

  empty_series = pd.Series()
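Some pandas versions warn when an empty series is created without a datatype, so one option is to state the dtype explicitly. The sketch below assumes float64 is the type you want; the .empty attribute confirms that the series holds no data.

```python
import pandas as pd

# float64 is an assumption here; pick whatever dtype you plan to store
empty_series = pd.Series(dtype="float64")

print(empty_series.empty)  # True
print(len(empty_series))   # 0
```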

Example 7 - Head

One of the ways we can quickly see data in a series is by using head. This will grab the first n values. By default n is 5, but you can specify a different number.

  points_4 = [32, 18, 27, 12, 24, 24, 24, 29, 10, 19, 30, 22, 13, 28, 17, 25, 14, 27, 16, 26, 11, 20, 31, 27, 27, 19, 12, 24, 30, 14]
  player_points_4 = pd.Series(points_4)
  player_points_4.head()

Now let’s pass in 10 to see the first 10 values.

  player_points_4.head(10)

Example 8 - tail

Tail is the opposite of head. Instead of getting the first values, we get the last values of the series.

  player_points_4.tail()

Pass in 10 to see the last 10 records.

  player_points_4.tail(10)
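Both methods also accept a negative number, which is easy to overlook. The sketch below uses a shortened, made-up series to show that head(-2) returns everything except the last two values, while tail(-2) returns everything except the first two.

```python
import pandas as pd

s = pd.Series([32, 18, 27, 12, 24, 24])

first_two = s.head(2)          # first 2 values
last_two = s.tail(2)           # last 2 values
all_but_last_two = s.head(-2)  # everything except the last 2
all_but_first_two = s.tail(-2) # everything except the first 2

print(first_two.tolist())          # [32, 18]
print(last_two.tolist())           # [24, 24]
print(all_but_last_two.tolist())   # [32, 18, 27, 12]
print(all_but_first_two.tolist())  # [27, 12, 24, 24]
```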

Example 9 - Accessing Elements in a Series (iloc)

If you want to grab values by position, you can use iloc. iloc takes an integer and selects based on position.

One thing to note: positions start at 0 in Python, so when the code says iloc[5] we are grabbing the 6th value.

  player_points_4.iloc[5]
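iloc is not limited to a single integer. As a brief sketch on a shortened, made-up series, the block below shows negative positions, slices, and lists of positions. Slices follow the usual Python rule: the end position is excluded.

```python
import pandas as pd

s = pd.Series([32, 18, 27, 12, 24, 24])

first = s.iloc[0]        # first value
last = s.iloc[-1]        # negative positions count from the end
middle = s.iloc[1:3]     # positions 1 and 2 (position 3 is excluded)
picked = s.iloc[[0, 2]]  # a list of positions

print(first, last)      # 32 24
print(middle.tolist())  # [18, 27]
print(picked.tolist())  # [32, 27]
```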

Example 10 - Accessing Elements in a Series (loc)

loc selects by index label instead of integer position. Here the labels are the player names we assigned in Example 2.

  player_points_2.loc['Player 1']
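Label-based selection also works with lists and slices of labels. The sketch below rebuilds the player series from Example 2; note that, unlike position-based slicing, a label slice includes its end label.

```python
import pandas as pd

s = pd.Series(
    [30, 25, 15, 10, 20],
    index=["Player 1", "Player 2", "Player 3", "Player 4", "Player 5"],
)

one = s.loc["Player 1"]                  # a single label
some = s.loc[["Player 1", "Player 3"]]   # a list of labels
span = s.loc["Player 2":"Player 4"]      # label slice, end label included

print(one)            # 30
print(some.tolist())  # [30, 15]
print(span.tolist())  # [25, 15, 10]
```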

Example 11 - Add More Data to a series

If we want to add new data to a series, we assign a value to an index label that doesn’t exist yet. Our series currently uses the labels 0 through 29, so 30 is the next free one.

  player_points_4[30] = 22
  player_points_4.tail()
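Assigning to a new label adds one value at a time. For adding several values at once, one common approach is pd.concat, sketched below with made-up numbers. Passing ignore_index=True renumbers the result from 0 so that labels are not repeated.

```python
import pandas as pd

s = pd.Series([32, 18, 27])
extra = pd.Series([22, 19])  # hypothetical new scores

# concat returns a new series; the two inputs are left unchanged
combined = pd.concat([s, extra], ignore_index=True)

print(combined.tolist())        # [32, 18, 27, 22, 19]
print(combined.index.tolist())  # [0, 1, 2, 3, 4]
```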

Example 12 - value counts

Value counts allow you to see how often each value appears within your series. The result is sorted from most to least frequent, so adding head(5) shows the five most common values.

  player_points_4.value_counts().head(5)
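If proportions are more useful than raw counts, value_counts accepts normalize=True. The sketch below uses a small, made-up series so the numbers are easy to check by hand.

```python
import pandas as pd

s = pd.Series([24, 24, 24, 27, 27, 10])

counts = s.value_counts()                # most frequent value first
shares = s.value_counts(normalize=True)  # fractions that sum to 1

print(counts.index.tolist())  # [24, 27, 10]
print(counts.tolist())        # [3, 2, 1]
print(shares.loc[24])         # 0.5
```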

Example 13 - number of non-NA/null observations in the Series

Count tells us the number of non-null values in our series.

  player_points_4.count()

Example 14 - size (includes null values)

If you want to see the number of values including null ones, use size. Note that size is an attribute, so it takes no parentheses.

  player_points_4.size
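The difference between count and size only shows up when a series contains null values, which player_points_4 does not. The sketch below uses a small, made-up series with one null value to make the difference visible.

```python
import numpy as np
import pandas as pd

s = pd.Series([25, 10, np.nan, 8])

non_null = s.count()  # ignores the NaN
total = s.size        # includes the NaN

print(non_null)  # 3
print(total)     # 4
```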

Example 15 - number of unique values

nunique shows us the number of unique values. Null values are not counted by default.

  player_points_4.nunique()

Example 16 - Check whether all values in the series are unique

If you want to see whether every value is different, use .is_unique. It is an attribute that returns a single boolean.

  player_points_4.is_unique

Example 17 - datatypes for a series

As shown earlier, a series can hold one or multiple datatypes. By using .dtype we can see which datatype a series has. A series with mixed datatypes reports object.

  player_points_4.dtype
  series_multiple_data_types.dtype
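To make the difference concrete, the sketch below checks the dtype of three small, made-up series. The exact integer width can depend on the platform, so the comments only name the kind of dtype to expect.

```python
import pandas as pd

ints = pd.Series([30, 25, 15])
floats = pd.Series([30.5, 25.0])
mixed = pd.Series([10, "Hello", 3.14, True])

print(ints.dtype)    # an integer dtype, typically int64
print(floats.dtype)  # float64
print(mixed.dtype)   # object

# Inside an object series each element keeps its own Python type
element_types = [type(value).__name__ for value in mixed]
print(element_types)  # ['int', 'str', 'float', 'bool']
```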

Example 18 - datatypes conversion

You can also convert the datatype of a series. Below we change an int series to float.

  player_points_4 = player_points_4.astype("float64")
  player_points_4.dtype

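Example 19 - Update Values in a Series

To update every value in a series at once, apply an arithmetic operation to it. The operation returns a new series and leaves the original unchanged, so we store the result in a new variable.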

  player_points_updated = player_points_4 + 5

Example 20 - sum two series

If, instead of adding a static number, you want to add two different series together, you can. Values are matched up by index.

  runs1 = pd.Series([3, 5, 4])
  runs2 = pd.Series([11, 3, 4])
  runs_3 = runs1 + runs2
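Because addition lines values up by index label rather than by position, two series with different labels do not fail; they produce null values where a label is missing on one side. The sketch below uses made-up labels to show this, along with the add method and its fill_value parameter as one way to treat a missing value as 0.

```python
import pandas as pd

a = pd.Series([3, 5, 4], index=["x", "y", "z"])
b = pd.Series([11, 3], index=["x", "y"])  # no "z" label

total = a + b                    # "z" has no partner, so the result is NaN
filled = a.add(b, fill_value=0)  # treat the missing "z" in b as 0

print(total.isna().tolist())  # [False, False, True]
print(filled.tolist())        # [14.0, 8.0, 4.0]
```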

Example 21 - Descriptive statistics

It’s quite easy to grab statistics from a series. Below we cover some of the most basic ones.

  print("Mean:", player_points_4.mean())
  print("Median:", player_points_4.median())
  print("Standard Deviation:", player_points_4.std())
  print("Sum:", player_points_4.sum())
  print("Min:", player_points_4.min())
  print("Max:", player_points_4.max())
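Several of these statistics can also be requested in one call with describe, which returns them as a new series. The sketch below runs it on a small, made-up series so the results are easy to verify.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# describe returns count, mean, std, min, quartiles, and max
summary = s.describe()

print(summary["count"])  # 4.0
print(summary["mean"])   # 25.0
print(summary["50%"])    # 25.0 (the median)
print(summary["max"])    # 40.0
```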

Example 22 - Sort by index

We can also sort a Pandas series by its index. By default the order is ascending, but there is also the option to sort in descending order.

  player_points_updated.sort_index(inplace=True)
  player_points_updated.sort_index(ascending=False, inplace=True)

Example 23 - Sort by Values

Instead of sorting by the index, we can sort by values. Again, both ascending and descending order are available.

  player_points_updated.sort_values(inplace=True)
  player_points_updated.sort_values(ascending=False, inplace=True)
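The calls above use inplace=True, which modifies the series directly. Without it, both sort methods return a new, sorted series and leave the original alone. The sketch below shows that alternative on a small, made-up series.

```python
import pandas as pd

s = pd.Series([27, 12, 24], index=["c", "a", "b"])

by_value = s.sort_values(ascending=False)  # new series, highest value first
by_index = s.sort_index()                  # new series, labels a, b, c

print(by_value.tolist())        # [27, 24, 12]
print(by_index.index.tolist())  # ['a', 'b', 'c']
print(s.tolist())               # [27, 12, 24] (original order is untouched)
```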

Example 24 - Element Wise Comparison

It’s quite easy to compare each element in a series to a value. The result is a new series holding True or False for each element.

  player_points_updated > 30

If you want to only see the values that meet the criteria, put the comparison inside .loc.

  player_points_updated.loc[player_points_updated > 30]
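Comparisons can be combined when one condition is not enough. The sketch below, on made-up scores, keeps values that are above 20 and below 33. Each condition needs its own parentheses, and pandas uses & and | here rather than the Python keywords and and or.

```python
import pandas as pd

s = pd.Series([35, 17, 29, 32, 15])

# True only where both conditions hold
mask = (s > 20) & (s < 33)

print(mask.tolist())         # [False, False, True, True, False]
print(s.loc[mask].tolist())  # [29, 32]
```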

Example 25 - Check for Null Values

Looking for null values is simple. All we have to do is use .isna(). It’ll return true or false for each value in the series.

  null_series = pd.Series([25, 10, 5, np.nan, 8, 41, np.nan])
  null_series.isna()

Example 26 - Remove Null Values

To remove null values, use .dropna(). It returns a new series without the null values and leaves the original series unchanged.

  null_series_removed = null_series.dropna()

Example 27 - Fill Null Values

There are quite a few ways to fill null values. In our first example we will just use the value of 0.

  null_series_filled = null_series.fillna(0)

Another common technique is to use the median or mean.

  null_series_filled_2 = null_series.fillna(null_series.median())
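As a self-contained sketch of this technique, the block below computes the median and the mean from the series itself and uses each as the fill value. Both statistics skip null values by default, so they are calculated from the five known numbers only.

```python
import numpy as np
import pandas as pd

null_series = pd.Series([25, 10, 5, np.nan, 8, 41, np.nan])

median_value = null_series.median()  # median of 5, 8, 10, 25, 41
mean_value = null_series.mean()      # (25 + 10 + 5 + 8 + 41) / 5

filled_with_median = null_series.fillna(median_value)
filled_with_mean = null_series.fillna(mean_value)

print(median_value)                 # 10.0
print(filled_with_median.tolist())  # [25.0, 10.0, 5.0, 10.0, 8.0, 41.0, 10.0]
```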

Example 28 - Apply

Apply allows you to create a new series by running a function on every value. While this example showcases a custom function, you can also use a lambda or built-in functions from libraries like NumPy.

  def points_times_two(x):
      return x * 2
  player_points_updated_applied = player_points_updated.apply(points_times_two)
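The lambda and NumPy variants mentioned above can be sketched as follows, using made-up scores. For simple arithmetic like doubling, operating on the whole series at once gives the same result as apply and is usually the shorter option.

```python
import numpy as np
import pandas as pd

s = pd.Series([35, 17, 29])

doubled_lambda = s.apply(lambda x: x * 2)  # function called once per value
doubled_vectorized = s * 2                 # same result without apply
roots = np.sqrt(s)                         # NumPy functions accept a series directly

print(doubled_lambda.tolist())      # [70, 34, 58]
print(doubled_vectorized.tolist())  # [70, 34, 58]
```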

Example 29 - Turn dataframe column into a series

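By using .squeeze() we can extract a series from a dataframe. This example grabs columns, and the next one looks at rows. Note that selecting a single column with df['Player A'] already returns a series, so .squeeze() leaves it unchanged here; it matters most when you hold a dataframe with exactly one column.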

  score_dictionary = {'Player A': [10, 12, 23], 'Player B': [14, 16, 16], 'Player C': [17, 22, 29]}
  df = pd.DataFrame(score_dictionary)
  df.head(10)
  series_from_df_1 = df['Player A'].squeeze()
  series_from_df_2 = df['Player B'].squeeze()
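The sketch below shows a case where .squeeze() actually changes the type: selecting with double brackets keeps a one-column dataframe, and squeezing it collapses that into a series. The variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"Player A": [10, 12, 23], "Player B": [14, 16, 16]})

one_column_df = df[["Player A"]]    # double brackets: still a DataFrame
squeezed = one_column_df.squeeze()  # collapses the single column to a Series

print(type(one_column_df).__name__)  # DataFrame
print(type(squeezed).__name__)       # Series
print(squeezed.name)                 # Player A
```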

Example 30 - Turn Dataframe row into series

.squeeze() can be used again. In this case we grab a row with iloc, which we covered earlier. The result is a series whose index holds the column names.

  game1 = df.iloc[0].squeeze()

Example 31 - Create a DataFrame from two Series:

Since we grabbed two columns from a dataframe and turned them into a series, we can now combine them into a dataframe through concat.

  df_2 = pd.concat([series_from_df_1, series_from_df_2], axis=1)

Example 32 - Turn Series into list

One of the first examples was turning a list into a series. It only feels appropriate to end the tutorial by doing the opposite.

  series_from_df_1.to_list()
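A list keeps only the values and drops the index. If the labels matter, to_dict is one way to keep them, as sketched below with made-up game labels.

```python
import pandas as pd

s = pd.Series([10, 12, 23], index=["Game 1", "Game 2", "Game 3"])

as_list = s.to_list()  # values only
as_dict = s.to_dict()  # index labels become the keys

print(as_list)  # [10, 12, 23]
print(as_dict)  # {'Game 1': 10, 'Game 2': 12, 'Game 3': 23}
```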

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.
