Title Image - Pandas: Why and How to Use idxmin() and idxmax()

Pandas: Why and How to Use idxmin() and idxmax()

Yashmeet Singh · · 7 minute read

Today we’ll closely examine two pandas methods - idxmin() and idxmax().

You can use these methods to get the index labels for the minimum or maximum values of a Series or DataFrame.

We’ll explore these operations using practical examples. Let’s dive in!

The Example Dataset

Suppose you own a small bookstore and want to review the sales data for the past week.

Let’s create a pandas Series containing these sales numbers (US dollars). We’ll set the day as the index:

# Load pandas
import pandas as pd
 
# Day of the past week
days = [
  'Sunday', 'Monday', 'Tuesday', 'Wednesday',
  'Thursday', 'Friday', 'Saturday'
]
# and corresponding sales
sales = [987, 600, 320, 437, 520, 680, 1067]
 
store_sales = pd.Series(
    data=sales, # Sales numbers
    name='Sales', # Series name
    index=days # Use days as the index
)
store_sales
Sunday        987
Monday        600
Tuesday       320
Wednesday     437
Thursday      520
Friday        680
Saturday     1067
Name: Sales, dtype: int64

idxmin() - Index of the Minimum Value

Suppose you want to know - which day of the last week had the lowest sales amount?

Since we used the day as the index, we can translate this question to: What’s the index of the minimum value of the Series store_sales?

The method idxmin() will get you the answer:

# The series method idxmin() returns
# the index of the minimum value
store_sales.idxmin()
'Tuesday'

Thus the lowest sales of the week occurred on Tuesday. And what was that amount? Let’s get that using the method min():

# Get the lowest sales amount
store_sales.min()
320

idxmax() - Index of the Maximum Value

The Series method idxmax() returns the index of the maximum value. We’ll use it to find the day when the store generated the highest sales amount:

# The series method idxmax() returns
# the index of the maximum value
store_sales.idxmax()
'Saturday'

Let’s get the highest amount using the max() method:

store_sales.max()
1067

Thus the store had the best sales ($1067) on Saturday.

Repeated Min / Max Values

What if the dataset contains the minimum or maximum value more than once? How would the methods idxmin() or idxmax() behave? Let’s find out.

Suppose you calculate the average sales per weekday for the last year. And store them in a pandas Series:

# week day
days = [
  'Sunday', 'Monday', 'Tuesday', 'Wednesday',
  'Thursday', 'Friday', 'Saturday'
]
# Average sale for each day
sales = [1100, 550, 300, 450, 300, 700, 1100]
 
avg_sales = pd.Series(
    data=sales,
    name='Sales',
    index=days
)
# print as DataFrame for readability
avg_sales.to_frame()
Sales
Sunday 1100
Monday 550
Tuesday 300
Wednesday 450
Thursday 300
Friday 700
Saturday 1100

The minimum value, 300, occurs twice on Tuesday and Thursday. The maximum value, 1100, is repeated on Sunday and Saturday.

We expect the method idxmin() to return multiple index labels. But it only returns Tuesday, the row label of the first occurrence of the minimum value:

avg_sales.idxmin()
'Tuesday'

So idxmin() failed to get us all the days when the minimum value occurred. What’s the alternative?

We can use the boolean mask followed by the select operation. Let’s see this action.

First, we create a boolean mask that returns True when the data value equals the minimum value:

# Boolean mask -
# Entries that equal the minimum value
# will be set to True
avg_sales == avg_sales.min()
Sunday       False
Monday       False
Tuesday       True
Wednesday    False
Thursday      True
Friday       False
Saturday     False
Name: Sales, dtype: bool

The mask is set to True for both Tuesday and Thursday. Next, use this mask to select the True entries:

avg_sales[avg_sales == avg_sales.min()]
Tuesday     300
Thursday    300
Name: Sales, dtype: int64

We got the expected answer! Both Tuesday and Thursday had minimum sales of $300.

You can also extract the index labels (days) from the above query using the index attribute:

avg_sales[avg_sales == avg_sales.min()].index.to_list()
['Tuesday', 'Thursday']

Let’s use this technique to extract the days of maximum average sales:

avg_sales[avg_sales == avg_sales.max()].index.tolist()
['Sunday', 'Saturday']

idxmin() & idxmax() for DataFrames

The method idxmin() provides additional functionality for DataFrames. You can use it to get the index label of the minimum value across rows or columns.

Let me illustrate with an example.

Suppose you have employed three sales agents in your bookstore - Ava, Maya, and Priya. Let’s look at their average sales per day. We’ll store this data in a DataFrame:

days = [
  'Sunday', 'Monday', 'Tuesday', 'Wednesday',
  'Thursday', 'Friday', 'Saturday'
]
# Sales per day for each agent
sales = {
    'Ava': [ 168,  186,  107,  155,  152,  212, 175],
    'Maya': [ 200,  168, 106,  173,  164,  188, 184],
    'Priya': [ 176,  172,  158,  164,  91,  198, 234]
}
 
# Load in a DataFrame
agent_sales = pd.DataFrame(
    data=sales,
    index=pd.Index(days, name='Days')
)
agent_sales
Ava Maya Priya
Days
Sunday 168 200 176
Monday 186 168 172
Tuesday 107 106 158
Wednesday 155 173 164
Thursday 152 164 91
Friday 212 188 198
Saturday 175 184 234

Suppose you want to know: Get me the day when each agent generated the minimum sales?

You can use the DataFrame idxmin() method to get this information.

By default, it gets the minimum value across the row index. That is, it returns the row index (day) of the minimum value (sales) for each column (agent):

agent_sales.idxmin()
Ava       Tuesday
Maya      Tuesday
Priya    Thursday
dtype: object

However, what if you ask a different question: Which sales agent had the minimum sales on each day?

We’ll need to use idxmin() with the axis parameter. When you set the axis to 'columns', idxmin() finds the index of minimum value across columns.

Thus, the below code will return the agent (column index) with minimum sales for each day (row):

# Index of the minimum values across columns
agent_sales.idxmin(axis='columns')
Days
Sunday         Ava
Monday        Maya
Tuesday       Maya
Wednesday      Ava
Thursday     Priya
Friday        Maya
Saturday       Ava
dtype: object

We can perform similar operations to get the index of the maximum values.

The default idxmax() call returns the index of the maximum value across rows. Thus below code finds the day (row index) of the maximum sales for each agent (column):

# DataFrame idxmax() call
# Find the index of the maximum value across rows
agent_sales.idxmax()
Ava        Friday
Maya       Sunday
Priya    Saturday
dtype: object

When you set the axis parameter to 'columns', idxmax() finds the index of maximum value across columns.

Thus it returns the agent (column index) with the maximum sales for each day (row index):

agent_sales.idxmax(axis='columns')
Days
Sunday        Maya
Monday         Ava
Tuesday      Priya
Wednesday     Maya
Thursday      Maya
Friday         Ava
Saturday     Priya
dtype: object

Summary

This quick tutorial showed you how to use pandas methods - idxmin() and idxmax().

You can use them to get the index of the minimum or maximum value of a pandas Series.

Things get interesting with DataFrames. The idxmin() and idxmax() methods allow you to work across the row or column index. You can use the parameter axis to control that behavior.

The tutorial included plenty of examples. So you should feel confident using these methods on your own.

Title Image by Pexels