# Summary Statistics Using Pandas

Sometimes the default statistics provided by pandas' `describe()` method are not enough. In such cases, you can generate custom statistics using the `agg()` method.

I’ll explain this using the following list of numbers:

```python
import numpy as np
import pandas as pd

# 30 random points from a normal distribution
# with mean = 5 and standard deviation = 15
data = np.random.normal(5, 15, 30)
data_df = pd.DataFrame({"data": data})
```
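Because no seed is set, every run draws different numbers, so your output will not match the tables in this post exactly. If you want reproducible results, here is a minimal sketch using NumPy's `default_rng` with a fixed seed. The seed value `42` is an arbitrary choice, and the numbers it produces will still differ from the ones shown below.

```python
import numpy as np
import pandas as pd

# Seeded random number generator; 42 is an arbitrary example seed.
rng = np.random.default_rng(seed=42)

# 30 random points from a normal distribution
# with mean = 5 and standard deviation = 15
data = rng.normal(loc=5, scale=15, size=30)
data_df = pd.DataFrame({"data": data})

# Re-creating the generator with the same seed reproduces the same draws.
again = np.random.default_rng(seed=42).normal(loc=5, scale=15, size=30)
print(np.array_equal(data, again))  # True
```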

## The Problem

pandas' `describe()` method gives you a standard set of statistics:

```python
data_df.describe()
```

|       | data   |
|:------|-------:|
| count | 30.00  |
| mean  | 5.92   |
| std   | 15.47  |
| min   | -23.72 |
| 25%   | -3.37  |
| 50%   | 4.62   |
| 75%   | 13.86  |
| max   | 37.45  |

But suppose you need the following statistics:

• Minimum Value
• Maximum Value
• Range
• Mean
• Median
• Standard Deviation
• Variance
• Interquartile Range (IQR)

The output of `describe()` doesn’t include the range, variance, or IQR.
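Some of the missing statistics can be rebuilt from the `describe()` output itself: the range is `max - min`, the IQR is `75% - 25%`, and the variance is the square of `std`. Here is a minimal sketch that recreates `data_df` with an arbitrary seed so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Same setup as above; the seed is an arbitrary choice for reproducibility.
rng = np.random.default_rng(seed=0)
data_df = pd.DataFrame({"data": rng.normal(5, 15, 30)})

desc = data_df["data"].describe()

derived = pd.Series({
    "range": desc["max"] - desc["min"],  # maximum minus minimum
    "IQR": desc["75%"] - desc["25%"],    # 75th minus 25th percentile
    "var": desc["std"] ** 2,             # describe() reports the sample std (ddof=1)
})
print(derived)
```

This only works for statistics that can be derived from the fixed set `describe()` returns; the `agg()` approach is more general.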

How can you get all the statistics you want?

## The Solution

You’ll need to do two things:

• Write custom aggregate functions for statistics like range and IQR
• Use pandas' `agg()` method to generate all the statistics

### Custom Aggregate Functions

```python
# Input's interquartile range:
# the distance between the 75th and 25th percentiles
def IQR(column):
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25


# Input's range:
# the difference between the input's maximum and minimum values.
#
# range() is already a built-in function in Python,
# so I chose another name for our custom function.
def range_f(column):
    return column.max() - column.min()
```
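Before passing these functions to `agg()`, it can help to check them on a small series whose answers are easy to work out by hand. This quick check redefines both functions so it runs on its own; the sample values are made up for illustration.

```python
import pandas as pd


def IQR(column):
    # Distance between the 75th and 25th percentiles
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25


def range_f(column):
    # Difference between the maximum and minimum values
    return column.max() - column.min()


s = pd.Series([1, 2, 3, 4, 5])

# With pandas' default linear interpolation, the 25th and 75th
# percentiles of 1..5 are 2 and 4.
print(IQR(s))      # 2.0
print(range_f(s))  # 4
```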

### The `agg()` Method

Let’s prepare a list of all the required statistics.

You can mix a variety of functions in this list:

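• names of pandas' built-in aggregations, passed as strings, such as `'min'` and `'max'`.
• the custom functions we defined above.
• NumPy aggregate functions. For example, we’ll use NumPy’s `var()` to calculate variance. Note that pandas currently swaps `np.var` for its own `var` method, which computes the sample variance (`ddof=1`); recent pandas versions emit a `FutureWarning` about this and suggest passing the string `'var'` instead.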

```python
stats_list = [
    'min', 'max',
    range_f,  # custom function
    'mean', 'median', 'std',
    np.var,   # NumPy function
    IQR,      # custom function
]
```
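One subtlety when mixing NumPy and pandas functions: NumPy’s `np.var()` defaults to the population variance (`ddof=0`), while pandas’ `Series.var()` and the `'var'` string in `agg()` default to the sample variance (`ddof=1`). A small series makes the difference visible:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Population variance: sum of squared deviations / n = 10 / 5
print(np.var(s.to_numpy()))  # 2.0

# Sample variance: sum of squared deviations / (n - 1) = 10 / 4
print(s.var())               # 2.5
print(s.agg("var"))          # 2.5
```

Passing the string `'var'` makes it explicit which definition you get.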

Next, we generate the statistics using pandas' `agg()` method.

```python
summary_stats = data_df.agg(func=stats_list)
summary_stats
```

|         | data   |
|:--------|-------:|
| min     | -23.72 |
| max     | 37.45  |
| range_f | 61.17  |
| mean    | 5.92   |
| median  | 4.62   |
| std     | 15.47  |
| var     | 239.35 |
| IQR     | 17.23  |
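`agg()` applies every function in the list to every column, so the same approach scales to a DataFrame with several columns. Here is a self-contained sketch with a hypothetical two-column frame, passing `'var'` as a string instead of `np.var`:

```python
import pandas as pd


def IQR(column):
    # Distance between the 75th and 25th percentiles
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25


def range_f(column):
    # Difference between the maximum and minimum values
    return column.max() - column.min()


# Hypothetical two-column frame with easy-to-check values
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

stats_list = ["min", "max", range_f, "mean", "median", "std", "var", IQR]
summary = df.agg(stats_list)

# One row per statistic (labeled by string or function name),
# one column per input column
print(summary)
```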

### Pretty Names

The statistics above look good, but you may not like the default row labels, such as `range_f` and `var`.

You can replace them with custom names:

```python
# Custom names for the statistics.
# Make sure they are in the same order
# as in the `stats_list` variable.
pretty_names = [
    'Minimum', 'Maximum', 'Range', 'Mean', 'Median',
    'Standard Deviation', 'Variance', 'IQR',
]

# Replace the index labels with our custom names
summary_stats.index = pretty_names
summary_stats
```

|                    | data   |
|:-------------------|-------:|
| Minimum            | -23.72 |
| Maximum            | 37.45  |
| Range              | 61.17  |
| Mean               | 5.92   |
| Median             | 4.62   |
| Standard Deviation | 15.47  |
| Variance           | 239.35 |
| IQR                | 17.23  |
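Assigning a list to `summary_stats.index` only works if the names stay in the same order as `stats_list`. An order-independent alternative is `DataFrame.rename()` with a dictionary that maps each default label to a pretty name. In this sketch the small `data_df` is hypothetical so the block runs on its own.

```python
import pandas as pd


def range_f(column):
    # Difference between the maximum and minimum values
    return column.max() - column.min()


def IQR(column):
    # Distance between the 75th and 25th percentiles
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25


data_df = pd.DataFrame({"data": [1.0, 2.0, 3.0, 4.0, 5.0]})
summary_stats = data_df.agg(["min", "max", range_f, "mean", "median", "std", "var", IQR])

# Map each default label to a readable one; the dict order does not matter.
pretty_names = {
    "min": "Minimum",
    "max": "Maximum",
    "range_f": "Range",
    "mean": "Mean",
    "median": "Median",
    "std": "Standard Deviation",
    "var": "Variance",
    "IQR": "IQR",
}
summary_stats = summary_stats.rename(index=pretty_names)
print(summary_stats)
```

Labels missing from the dictionary are left unchanged, so a typo shows up as an unrenamed row rather than a silently mislabeled one.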

That’s it! Now you have the statistics just the way you wanted.

Title Image by Pexels