Count distinct in Pandas aggregation
import pandas as pd
import numpy as np
Create a dataframe
# Create a dataframe of viewing sessions: one row per (date, user, duration)
df = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
                   'user_id': ['0001', '0001', '0002', '0002', '0002'],
                   'duration': [30, 15, 20, 15, 30]})
df
| | date | duration | user_id |
|---|---|---|---|
| 0 | 2013-04-01 | 30 | 0001 |
| 1 | 2013-04-01 | 15 | 0001 |
| 2 | 2013-04-01 | 20 | 0002 |
| 3 | 2013-04-02 | 15 | 0002 |
| 4 | 2013-04-02 | 30 | 0002 |
Count distinct in Pandas aggregation
# For each day, sum the viewing duration and count the distinct users.
# String aggregation names avoid the warning recent pandas versions emit for np.sum.
df = df.groupby("date").agg({"duration": "sum", "user_id": "nunique"})
df
| date | duration | user_id |
|---|---|---|
| 2013-04-01 | 65 | 2 |
| 2013-04-02 | 45 | 1 |
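The same result can also be written with named aggregation, which lets the output columns say what they hold instead of reusing the input names. The sketch below is one possible variant, not the only way to do it: it assumes pandas 0.25 or later, and the names `daily`, `total_duration`, `distinct_users`, and `check` are made up for illustration. It rebuilds the example dataframe so that it runs on its own, then cross-checks the distinct count by dropping repeated (date, user_id) pairs and counting what is left.

```python
import pandas as pd

# Same example data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    "date": ["2013-04-01", "2013-04-01", "2013-04-01", "2013-04-02", "2013-04-02"],
    "user_id": ["0001", "0001", "0002", "0002", "0002"],
    "duration": [30, 15, 20, 15, 30],
})

# Named aggregation: output_column=(input_column, aggregation)
# (the output column names here are illustrative, not from the original post)
daily = df.groupby("date").agg(
    total_duration=("duration", "sum"),
    distinct_users=("user_id", "nunique"),
)

# Cross-check: keep one row per (date, user_id) pair, then count rows per date
check = df.drop_duplicates(["date", "user_id"]).groupby("date").size()

print(daily)
print(check)
```

Assigning the result to a new name such as `daily` also keeps the original `df` available for further analysis, which the reassignment `df = df.groupby(...)` does not.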