# Summary Statistics Using Pandas

Sometimes the default statistics provided by pandas’ `describe()` method are not enough. In such cases, you can generate custom statistics using the `agg()` method.

I’ll explain this using the following set of random numbers:

```python
import numpy as np
import pandas as pd

# 30 random points from a normal distribution
# with mean = 5 and standard deviation = 15.
# No seed is set, so the values change on every run.
data = np.random.normal(5, 15, 30)
data_df = pd.DataFrame({"data": data})
```
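Because the snippet above draws new random numbers each run, your figures will differ from the tables below. If you want repeatable results, one option is NumPy’s Generator API with a fixed seed. The seed value `42` here is an arbitrary choice, and the resulting numbers still won’t match the tables in this article.

```python
import numpy as np
import pandas as pd

# A seeded Generator makes the "random" sample repeatable across runs.
# The seed (42) is arbitrary; any integer works.
rng = np.random.default_rng(seed=42)

# Same distribution as above: 30 points, mean = 5, standard deviation = 15
data = rng.normal(loc=5, scale=15, size=30)
data_df = pd.DataFrame({"data": data})
```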

## The Problem

You can get some basic statistics with pandas’ `describe()` method:

```python
data_df.describe()
```

| | data |
|---|---|
| count | 30.00 |
| mean | 5.92 |
| std | 15.47 |
| min | -23.72 |
| 25% | -3.37 |
| 50% | 4.62 |
| 75% | 13.86 |
| max | 37.45 |

But let’s say you need the following statistics:

- Minimum Value
- Maximum Value
- Range
- Mean
- Median
- Standard Deviation
- Variance
- Interquartile Range (IQR)

The output of `describe()` doesn’t include the range, variance, or IQR.
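Strictly speaking, these three can be worked out by hand from the `describe()` output: range is `max - min`, IQR is `75% - 25%`, and variance is `std` squared. A small sketch on a made-up five-value column (hypothetical data, chosen so the results are easy to check) shows how clumsy that gets:

```python
import pandas as pd

# Hypothetical toy data, not the random sample above
toy_df = pd.DataFrame({"data": [1.0, 2.0, 3.0, 4.0, 10.0]})

# describe() returns a DataFrame; select the column to get a Series of stats
desc = toy_df.describe()["data"]

data_range = desc["max"] - desc["min"]  # 10 - 1 = 9
iqr = desc["75%"] - desc["25%"]         # 4 - 2 = 2
variance = desc["std"] ** 2             # sample variance, since pandas' std uses ddof=1

print(data_range, iqr, variance)
```

Repeating this for every dataset and every extra statistic quickly becomes tedious.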

How can you get all the statistics you want?

## The Solution

You’ll need to do two things:

- Write **custom aggregate functions** for statistics like range and IQR.
- Use pandas’ `agg()` method to generate all the statistics.

### Custom Aggregate Functions

```python
# Interquartile range (IQR) of the input column:
# the distance between its 75th and 25th percentiles
def IQR(column):
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25


# Range of the input column:
# the difference between its maximum and minimum values.
#
# range() is already a built-in function in Python,
# so I chose another name for this custom function.
def range_f(column):
    return column.max() - column.min()
```
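A quick way to sanity-check the two custom functions is to call them directly on a small Series whose answers you can work out by hand. The values below are made up for illustration:

```python
import pandas as pd


def IQR(column):
    # quantile() with a list returns a Series; unpack its two values
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25


def range_f(column):
    return column.max() - column.min()


# Hypothetical test values: sorted, so the quartiles are easy to read off
s = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])

print(range_f(s))  # 9.0, i.e. 10 - 1
print(IQR(s))      # 2.0, i.e. 75th percentile (4) minus 25th percentile (2)
```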

### The `agg()` Method

Let’s prepare a list of all the required statistics.

You can mix a variety of functions in this list:

- pandas’ built-in functions like `min`, `max`, etc., passed by name as strings.
- The custom functions we defined above.
- NumPy aggregate functions. For example, we’ll use NumPy’s `var()` to calculate variance.

```python
stats_list = [
    'min', 'max',
    range_f,  # custom function
    'mean', 'median', 'std',
    np.var,   # numpy function
    IQR       # custom function
]
```
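One detail worth checking when mixing NumPy and pandas functions: pandas’ `var()` and `std()` compute the sample statistic (`ddof=1`) by default, while `np.var()` applied to a plain NumPy array defaults to the population statistic (`ddof=0`). The `var` value in the output below (roughly 15.47 squared) is consistent with the sample variance, but how `agg()` treats `np.var` may depend on your pandas version, so passing the string `'var'` is the unambiguous choice if you want pandas’ behavior. The sketch uses hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the mean is 4 and the squared deviations sum to 50
s = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])

sample_var = s.var()                   # pandas default ddof=1: 50 / 4
population_var = np.var(s.to_numpy())  # NumPy default ddof=0: 50 / 5

print(sample_var, population_var)      # 12.5 10.0

# Either library can produce the other value by setting ddof explicitly
print(s.var(ddof=0), np.var(s.to_numpy(), ddof=1))  # 10.0 12.5
```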

Next, we generate the statistics using pandas’ `agg()` method.

```python
summary_stats = data_df.agg(func=stats_list)
summary_stats
```

| | data |
|---|---|
| min | -23.72 |
| max | 37.45 |
| range_f | 61.17 |
| mean | 5.92 |
| median | 4.62 |
| std | 15.47 |
| var | 239.35 |
| IQR | 17.23 |
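The same kind of list works unchanged on a DataFrame with several columns: `agg()` applies every function to every column and returns one row per statistic and one column per input column. A sketch with two hypothetical columns, `a` and `b`:

```python
import pandas as pd


def IQR(column):
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25


def range_f(column):
    return column.max() - column.min()


# Hypothetical two-column data, not the article's random sample
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 10.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
})

# Rows are labeled by the string name or by the function's __name__
multi_stats = df.agg(["min", "max", range_f, "mean", IQR])
print(multi_stats)
```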

### Pretty Names

The statistics above look good, but you may not like their default row labels.

You can replace them with custom names:

```python
# Custom names for the statistics.
# Make sure they are in the same order
# as in the 'stats_list' variable.
pretty_names = [
    'Minimum', 'Maximum', 'Range', 'Mean', 'Median',
    'Standard Deviation', 'Variance', 'IQR'
]
# Replace the index labels with our custom names
summary_stats.index = pretty_names
summary_stats
```

| | data |
|---|---|
| Minimum | -23.72 |
| Maximum | 37.45 |
| Range | 61.17 |
| Mean | 5.92 |
| Median | 4.62 |
| Standard Deviation | 15.47 |
| Variance | 239.35 |
| IQR | 17.23 |
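Assigning a list to `summary_stats.index` relies on the list order matching `stats_list` exactly. A less fragile alternative is `DataFrame.rename()` with a dict that maps each default label to its pretty name, so order no longer matters and any label missing from the dict is left unchanged. This sketch swaps `np.var` for the string `'var'` so the row label is predictable, and uses hypothetical toy data:

```python
import pandas as pd


def IQR(column):
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25


def range_f(column):
    return column.max() - column.min()


# Hypothetical toy data standing in for data_df
data_df = pd.DataFrame({"data": [1.0, 2.0, 3.0, 4.0, 10.0]})

stats_list = ["min", "max", range_f, "mean", "median", "std", "var", IQR]
summary_stats = data_df.agg(func=stats_list)

# Map default labels to pretty names; order doesn't matter,
# and "IQR" (not in the dict) keeps its original label
pretty_names = {
    "min": "Minimum",
    "max": "Maximum",
    "range_f": "Range",
    "mean": "Mean",
    "median": "Median",
    "std": "Standard Deviation",
    "var": "Variance",
}
summary_stats = summary_stats.rename(index=pretty_names)
print(summary_stats)
```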

That’s it! We have the statistics just the way you wanted.