## Problem

The documentation demonstrates how to use a dict with the output column names as keys to apply multiple functions on a groupby object at once:

```
In [563]: grouped['D'].agg({'result1' : np.sum,
.....: 'result2' : np.mean})
.....:
Out[563]:
result2 result1
A
bar -0.579846 -1.739537
foo -0.280588 -1.402938
```

This, however, only applies to a Series groupby object. When a dict is supplied to a groupby DataFrame in the same way, the keys are expected to represent the column names to which the function will be applied.

The current method is to go column by column and use lambdas for functions that are dependent on other rows, similar to the code above. However, it is taking a long time (I think it takes a long time to iterate through a groupby object). I’ll have to tweak it so that I iterate through the entire groupby object in a single pass, but I’m wondering if there’s a built-in mechanism to do this in pandas in a tidy way.

For instance, I’ve attempted something similar to

```
grouped.agg({'C_sum' : lambda x: x['C'].sum(),
'C_std': lambda x: x['C'].std(),
'D_sum' : lambda x: x['D'].sum()},
'D_sumifC3': lambda x: x['D'][x['C'] == 3].sum(), ...)
```

However, I get a KeyError as predicted (since the keys have to be a column if agg is called from a DataFrame).

Is there a built-in way to achieve what I want, or a chance that this functionality will be implemented, or will I have to manually iterate through the groupby?

Asked by beardc

## Solution #1

The present acceptable answer’s second half is out of date and contains two deprecations. First and foremost, you can no longer use the agg groupby method to pass a dictionary of dictionaries. Second,.ix should never be used.

If you want to work with two different columns at the same time, you should use the apply method, which passes a DataFrame to the applied function implicitly. Let’s utilize a dataframe that’s comparable to the one we used earlier.

```
df = pd.DataFrame(np.random.rand(4,4), columns=list('abcd'))
df['group'] = [0, 0, 1, 1]
df
a b c d group
0 0.418500 0.030955 0.874869 0.145641 0
1 0.446069 0.901153 0.095052 0.487040 0
2 0.843026 0.936169 0.926090 0.041722 1
3 0.635846 0.439175 0.828787 0.714123 1
```

An aggregate can still be performed using a dictionary that maps column names to aggregation functions.

```
df.groupby('group').agg({'a':['sum', 'max'],
'b':'mean',
'c':'sum',
'd': lambda x: x.max() - x.min()})
a b c d
sum max mean sum <lambda>
group
0 0.864569 0.446069 0.466054 0.969921 0.341399
1 1.478872 0.843026 0.687672 1.754877 0.672401
```

If you don’t like the name of the lambda column, you can use a regular function and give it a custom name using the special __name__ property, as seen below:

```
def max_min(x):
return x.max() - x.min()
max_min.__name__ = 'Max minus Min'
df.groupby('group').agg({'a':['sum', 'max'],
'b':'mean',
'c':'sum',
'd': max_min})
a b c d
sum max mean sum Max minus Min
group
0 0.864569 0.446069 0.466054 0.969921 0.341399
1 1.478872 0.843026 0.687672 1.754877 0.672401
```

You can’t use agg, which implicitly provides a Series to the aggregating function, if you have numerous columns that need to interact. When using apply, the complete group is supplied into the method as a DataFrame.

Making a single custom function that returns a Series of all the aggregations is what I advocate. As labels for the new columns, use the Series index:

```
def f(x):
d = {}
d['a_sum'] = x['a'].sum()
d['a_max'] = x['a'].max()
d['b_mean'] = x['b'].mean()
d['c_d_prodsum'] = (x['c'] * x['d']).sum()
return pd.Series(d, index=['a_sum', 'a_max', 'b_mean', 'c_d_prodsum'])
df.groupby('group').apply(f)
a_sum a_max b_mean c_d_prodsum
group
0 0.864569 0.446069 0.466054 0.173711
1 1.478872 0.843026 0.687672 0.630494
```

If you’re a big fan of MultiIndexes, you can still use one like this to return a Series:

```
def f_mi(x):
d = []
d.append(x['a'].sum())
d.append(x['a'].max())
d.append(x['b'].mean())
d.append((x['c'] * x['d']).sum())
return pd.Series(d, index=[['a', 'a', 'b', 'c_d'],
['sum', 'max', 'mean', 'prodsum']])
df.groupby('group').apply(f_mi)
a b c_d
sum max mean prodsum
group
0 0.864569 0.446069 0.466054 0.173711
1 1.478872 0.843026 0.687672 0.630494
```

Answered by Ted Petrou

## Solution #2

You can pass a dict of column names for keys and a list of functions for values in the first part:

```
In [28]: df
Out[28]:
A B C D E GRP
0 0.395670 0.219560 0.600644 0.613445 0.242893 0
1 0.323911 0.464584 0.107215 0.204072 0.927325 0
2 0.321358 0.076037 0.166946 0.439661 0.914612 1
3 0.133466 0.447946 0.014815 0.130781 0.268290 1
In [26]: f = {'A':['sum','mean'], 'B':['prod']}
In [27]: df.groupby('GRP').agg(f)
Out[27]:
A B
sum mean prod
GRP
0 0.719580 0.359790 0.102004
1 0.454824 0.227412 0.034060
```

UPDATE 1:

Because the aggregate function is based on Series, references to the names of the other columns are lost. To get around this, you can use the lambda function to reference the entire dataframe and index it using the group indices.

Here’s an ingenious workaround:

```
In [67]: f = {'A':['sum','mean'], 'B':['prod'], 'D': lambda g: df.loc[g.index].E.sum()}
In [69]: df.groupby('GRP').agg(f)
Out[69]:
A B D
sum mean prod <lambda>
GRP
0 0.719580 0.359790 0.102004 1.170219
1 0.454824 0.227412 0.034060 1.182901
```

The resulting ‘D’ column contains the total of the ‘E’ values.

UPDATE 2:

Here’s a way that, in my opinion, will fulfill all of your requirements. Make a custom lambda function first. The letter g refers to the group. g will be a Series when aggregating. The current group from df is selected when g.index is sent to df.ix[]. After that, I check to see if column C is less than 0.5. The boolean series that is returned is sent to g[], which chooses just the rows that fulfill the criterion.

```
In [95]: cust = lambda g: g[df.loc[g.index]['C'] < 0.5].sum()
In [96]: f = {'A':['sum','mean'], 'B':['prod'], 'D': {'my name': cust}}
In [97]: df.groupby('GRP').agg(f)
Out[97]:
A B D
sum mean prod my name
GRP
0 0.719580 0.359790 0.102004 0.204072
1 0.454824 0.227412 0.034060 0.570441
```

Answered by Zelazny7

## Solution #3

Since pandas version 0.25.0 or higher, we are moving away from the dictionary based aggregation and renaming, and moving towards named aggregations which accepts a tuple. Now we can simultaneously aggregate + rename to a more informative column name:

Example:

```
df = pd.DataFrame(np.random.rand(4,4), columns=list('abcd'))
df['group'] = [0, 0, 1, 1]
a b c d group
0 0.521279 0.914988 0.054057 0.125668 0
1 0.426058 0.828890 0.784093 0.446211 0
2 0.363136 0.843751 0.184967 0.467351 1
3 0.241012 0.470053 0.358018 0.525032 1
```

With named aggregation, use GroupBy.agg:

```
df.groupby('group').agg(
a_sum=('a', 'sum'),
a_mean=('a', 'mean'),
b_mean=('b', 'mean'),
c_sum=('c', 'sum'),
d_range=('d', lambda x: x.max() - x.min())
)
a_sum a_mean b_mean c_sum d_range
group
0 0.947337 0.473668 0.871939 0.838150 0.320543
1 0.604149 0.302074 0.656902 0.542985 0.057681
```

Answered by Erfan

## Solution #4

I found that a little more condensed listing was preferable to Ted Petrou’s answer (primarily for aesthetic reasons). Please don’t accept it; it’s just a lot more extensive comment on Ted’s response, as well as code and data. Although Python/pandas is not my first/favorite programming language, I thought this to be a good read:

```
df.groupby('group') \
.apply(lambda x: pd.Series({
'a_sum' : x['a'].sum(),
'a_max' : x['a'].max(),
'b_mean' : x['b'].mean(),
'c_d_prodsum' : (x['c'] * x['d']).sum()
})
)
a_sum a_max b_mean c_d_prodsum
group
0 0.530559 0.374540 0.553354 0.488525
1 1.433558 0.832443 0.460206 0.053313
```

It reminds me of dplyr pipes and data.table chained commands more than anything else. They aren’t always better, but they are more recognizable to me. (I realize the power of utilizing more formalized def functions for these types of operations, as well as the desire of many.) This is merely an alternative; it is not always superior.)

I generated data in the same way Ted did, but I’ll add a seed for consistency.

```
import numpy as np
np.random.seed(42)
df = pd.DataFrame(np.random.rand(4,4), columns=list('abcd'))
df['group'] = [0, 0, 1, 1]
df
a b c d group
0 0.374540 0.950714 0.731994 0.598658 0
1 0.156019 0.155995 0.058084 0.866176 0
2 0.601115 0.708073 0.020584 0.969910 1
3 0.832443 0.212339 0.181825 0.183405 1
```

Answered by r2evans

## Solution #5

In version 0.25.0, there is a new feature.

Pandas accepts the special syntax known as “named aggregation” in GroupBy.agg() to support column-specific aggregation with control over the output column names.

```
>>> animals = pd.DataFrame({
... 'kind': ['cat', 'dog', 'cat', 'dog'],
... 'height': [9.1, 6.0, 9.5, 34.0],
... 'weight': [7.9, 7.5, 9.9, 198.0]
... })
>>> print(animals)
kind height weight
0 cat 9.1 7.9
1 dog 6.0 7.5
2 cat 9.5 9.9
3 dog 34.0 198.0
>>> print(
... animals
... .groupby('kind')
... .agg(
... min_height=pd.NamedAgg(column='height', aggfunc='min'),
... max_height=pd.NamedAgg(column='height', aggfunc='max'),
... average_weight=pd.NamedAgg(column='weight', aggfunc=np.mean),
... )
... )
min_height max_height average_weight
kind
cat 9.1 9.5 8.90
dog 6.0 34.0 102.75
```

pandas. NamedAgg is just a namedtuple. Plain tuples are allowed as well.

```
>>> print(
... animals
... .groupby('kind')
... .agg(
... min_height=('height', 'min'),
... max_height=('height', 'max'),
... average_weight=('weight', np.mean),
... )
... )
min_height max_height average_weight
kind
cat 9.1 9.5 8.90
dog 6.0 34.0 102.75
```

The aggregation functions do not accept any additional keyword parameters. **kwargs should only contain pairs of (column, aggfunc). If your aggregation functions require additional arguments, use functools to partially apply them. partial().

Series groupby aggregations can also use named aggregation. There is no column selection in this example, thus the values are just the functions.

```
>>> print(
... animals
... .groupby('kind')
... .height
... .agg(
... min_height='min',
... max_height='max',
... )
... )
min_height max_height
kind
cat 9.1 9.5
dog 6.0 34.0
```

Answered by exan

**Post is based on https://stackoverflow.com/questions/14529838/apply-multiple-functions-to-multiple-groupby-columns**