Pandas: Why and How to Use idxmin() and idxmax()
Today we’ll closely examine two pandas methods - idxmin() and idxmax().
You can use these methods to get the index labels for the minimum or maximum values of a Series or DataFrame.
We’ll explore these operations using practical examples. Let’s dive in!
The Example Dataset
Suppose you own a small bookstore and want to review the sales data for the past week.
Let’s create a pandas Series containing these sales numbers (US dollars). We’ll set the day as the index:
# Load pandas
import pandas as pd
# Day of the past week
days = [
'Sunday', 'Monday', 'Tuesday', 'Wednesday',
'Thursday', 'Friday', 'Saturday'
]
# and corresponding sales
sales = [987, 600, 320, 437, 520, 680, 1067]
store_sales = pd.Series(
data=sales, # Sales numbers
name='Sales', # Series name
index=days # Use days as the index
)
store_sales
Sunday 987
Monday 600
Tuesday 320
Wednesday 437
Thursday 520
Friday 680
Saturday 1067
Name: Sales, dtype: int64
idxmin() - Index of the Minimum Value
Suppose you want to know - which day of the last week had the lowest sales amount?
Since we used the day as the index, we can translate this question to: What’s the index of the minimum value of the Series store_sales
?
The method idxmin() will get you the answer:
# The series method idxmin() returns
# the index of the minimum value
store_sales.idxmin()
'Tuesday'
Thus the lowest sales of the week occurred on Tuesday. And what was that amount? Let’s get that using the method min():
# Get the lowest sales amount
store_sales.min()
320
idxmax() - Index of the Maximum Value
The Series method idxmax() returns the index of the maximum value. We’ll use it to find the day when the store generated the highest sales amount:
# The series method idxmax() returns
# the index of the maximum value
store_sales.idxmax()
'Saturday'
Let’s get the highest amount using the max() method:
store_sales.max()
1067
Thus the store had the best sales ($1067) on Saturday.
Repeated Min / Max Values
What if the dataset contains the minimum or maximum value more than once? How would the methods idxmin() or idxmax() behave? Let’s find out.
Suppose you calculate the average sales per weekday for the last year. And store them in a pandas Series:
# week day
days = [
'Sunday', 'Monday', 'Tuesday', 'Wednesday',
'Thursday', 'Friday', 'Saturday'
]
# Average sale for each day
sales = [1100, 550, 300, 450, 300, 700, 1100]
avg_sales = pd.Series(
data=sales,
name='Sales',
index=days
)
# print as DataFrame for readability
avg_sales.to_frame()
Sales | |
---|---|
Sunday | 1100 |
Monday | 550 |
Tuesday | 300 |
Wednesday | 450 |
Thursday | 300 |
Friday | 700 |
Saturday | 1100 |
The minimum value, 300, occurs twice on Tuesday and Thursday. The maximum value, 1100, is repeated on Sunday and Saturday.
We expect the method idxmin() to return multiple index labels. But it only returns Tuesday, the row label of the first occurrence of the minimum value:
avg_sales.idxmin()
'Tuesday'
So idxmin() failed to get us all the days when the minimum value occurred. What’s the alternative?
We can use the boolean mask followed by the select operation. Let’s see this action.
First, we create a boolean mask that returns True
when the data value equals the minimum value:
# Boolean mask -
# Entries that equal the minimum value
# will be set to True
avg_sales == avg_sales.min()
Sunday False
Monday False
Tuesday True
Wednesday False
Thursday True
Friday False
Saturday False
Name: Sales, dtype: bool
The mask is set to True
for both Tuesday and Thursday. Next, use this mask to select the True
entries:
avg_sales[avg_sales == avg_sales.min()]
Tuesday 300
Thursday 300
Name: Sales, dtype: int64
We got the expected answer! Both Tuesday and Thursday had minimum sales of $300.
You can also extract the index labels (days) from the above query using the index
attribute:
avg_sales[avg_sales == avg_sales.min()].index.to_list()
['Tuesday', 'Thursday']
Let’s use this technique to extract the days of maximum average sales:
avg_sales[avg_sales == avg_sales.max()].index.tolist()
['Sunday', 'Saturday']
idxmin() & idxmax() for DataFrames
The method idxmin() provides additional functionality for DataFrames. You can use it to get the index label of the minimum value across rows or columns.
Let me illustrate with an example.
Suppose you have employed three sales agents in your bookstore - Ava, Maya, and Priya. Let’s look at their average sales per day. We’ll store this data in a DataFrame:
days = [
'Sunday', 'Monday', 'Tuesday', 'Wednesday',
'Thursday', 'Friday', 'Saturday'
]
# Sales per day for each agent
sales = {
'Ava': [ 168, 186, 107, 155, 152, 212, 175],
'Maya': [ 200, 168, 106, 173, 164, 188, 184],
'Priya': [ 176, 172, 158, 164, 91, 198, 234]
}
# Load in a DataFrame
agent_sales = pd.DataFrame(
data=sales,
index=pd.Index(days, name='Days')
)
agent_sales
Ava | Maya | Priya | |
---|---|---|---|
Days | |||
Sunday | 168 | 200 | 176 |
Monday | 186 | 168 | 172 |
Tuesday | 107 | 106 | 158 |
Wednesday | 155 | 173 | 164 |
Thursday | 152 | 164 | 91 |
Friday | 212 | 188 | 198 |
Saturday | 175 | 184 | 234 |
Suppose you want to know: Get me the day when each agent generated the minimum sales?
You can use the DataFrame idxmin() method to get this information.
By default, it gets the minimum value across the row index. That is, it returns the row index (day) of the minimum value (sales) for each column (agent):
agent_sales.idxmin()
Ava Tuesday
Maya Tuesday
Priya Thursday
dtype: object
However, what if you ask a different question: Which sales agent had the minimum sales on each day?
We’ll need to use idxmin() with the axis
parameter. When you set the axis
to 'columns'
, idxmin() finds the index of minimum value across columns.
Thus, the below code will return the agent (column index) with minimum sales for each day (row):
# Index of the minimum values across columns
agent_sales.idxmin(axis='columns')
Days
Sunday Ava
Monday Maya
Tuesday Maya
Wednesday Ava
Thursday Priya
Friday Maya
Saturday Ava
dtype: object
We can perform similar operations to get the index of the maximum values.
The default idxmax() call returns the index of the maximum value across rows. Thus below code finds the day (row index) of the maximum sales for each agent (column):
# DataFrame idxmax() call
# Find the index of the maximum value across rows
agent_sales.idxmax()
Ava Friday
Maya Sunday
Priya Saturday
dtype: object
When you set the axis
parameter to 'columns'
, idxmax() finds the index of maximum value across columns.
Thus it returns the agent (column index) with the maximum sales for each day (row index):
agent_sales.idxmax(axis='columns')
Days
Sunday Maya
Monday Ava
Tuesday Priya
Wednesday Maya
Thursday Maya
Friday Ava
Saturday Priya
dtype: object
Summary
This quick tutorial showed you how to use pandas methods - idxmin() and idxmax().
You can use them to get the index of the minimum or maximum value of a pandas Series.
Things get interesting with DataFrames. The idxmin() and idxmax() methods allow you to work across the row or column index. You can use the parameter axis
to control that behavior.
The tutorial included plenty of examples. So you should feel confident using these methods on your own.