# Using pandas, count the number of unique values per group [duplicate]

## Problem

In each domain, I need to count the number of unique ID values.

I have data:

```
ID, domain
123, 'vk.com'
123, 'vk.com'
456, 'vk.com'
456, 'vk.com'
789, 'vk.com'
```

I tried `df.groupby(['domain', 'ID']).count()`, but I'd like to get something like this:

```
domain, count
vk.com   3
```
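For reproducibility, the sample above can be rebuilt as a small frame (column names and values taken from the question):

```python
import pandas as pd

# Rebuild the question's sample data
df = pd.DataFrame({
    'ID': [123, 123, 456, 456, 789],
    'domain': ["'vk.com'"] * 5,
})

print(df)
```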

## Solution #1

You need `nunique`:

```
df = df.groupby('domain')['ID'].nunique()

print(df)
domain
'vk.com'    3
Name: ID, dtype: int64
```

If you need to strip the `'` characters, do as follows:

```
df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print(df)
domain
vk.com    3
Name: ID, dtype: int64
```

Or, as Jon Clements suggested:

```
df.groupby(df.domain.str.strip("'"))['ID'].nunique()
```

You can keep the column name as follows (the output below is for a sample with several domains):

```
df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
  domain  ID
0     fb   1
1    ggl   1
2     vk   3
```

The distinction between `nunique()` and `agg()` is that `nunique()` returns a Series, while `agg()` returns a DataFrame.
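A minimal sketch of that difference, using made-up data matching the question:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [123, 123, 456, 456, 789],
    'domain': ['vk.com'] * 5,
})

s = df.groupby('domain')['ID'].nunique()                                 # -> Series
f = df.groupby('domain', as_index=False).agg({'ID': pd.Series.nunique})  # -> DataFrame

print(type(s).__name__)  # Series
print(type(f).__name__)  # DataFrame
```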

## Solution #2

In general, you can use `Series.value_counts` to count the occurrences of each distinct value in a single column:

```
df.domain.value_counts()

#'vk.com'          5
#Name: domain, dtype: int64
```

Use `Series.nunique` to count the number of unique values in a column:

```
df.domain.nunique()
# 1
```

You can use `unique` or `drop_duplicates` to obtain all of these distinct values; the only difference between the two methods is that `unique` returns a `numpy.ndarray`, while `drop_duplicates` returns a `pandas.Series`:

```
df.domain.unique()

df.domain.drop_duplicates()
#0          'vk.com'
#Name: domain, dtype: object
```

In this case, since you want to count distinct values with respect to another variable, instead of the groupby approach suggested in the other answers, you can simply drop duplicates first and then use `value_counts()`:

```
import pandas as pd
df.drop_duplicates().domain.value_counts()

# 'vk.com'          3
# Name: domain, dtype: int64
```

## Solution #3

```
>>> df.domain.value_counts()

vk.com          5

Name: domain, dtype: int64
```

## Solution #4

If I understand correctly, you want to know how many distinct IDs each domain has. Then give this a shot:

```
output = df.drop_duplicates()
output.groupby('domain').size()
```

Output:

```
domain
vk.com          3
dtype: int64
```

`value_counts` is another option, but it is somewhat less efficient. Jezrael's `nunique` answer is the fastest:

```
%timeit df.drop_duplicates().groupby('domain').size()
1000 loops, best of 3: 939 µs per loop
%timeit df.drop_duplicates().domain.value_counts()
1000 loops, best of 3: 1.1 ms per loop
%timeit df.groupby('domain')['ID'].nunique()
1000 loops, best of 3: 440 µs per loop
```