Select Pandas dataframe rows between two dates
import modules
import pandas as pd
import numpy as np
create dummy dataframe
raw_data = {'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
            'age': [20, 19, 22, 21],
            'favorite_color': ['blue', 'red', 'yellow', 'green'],
            'grade': [88, 92, 95, 70],
            'birth_date': ['01-02-1996', '08-05-1997', '04-28-1996', '12-16-1995']}
df = pd.DataFrame(raw_data, index=['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'])
df
| | age | birth_date | favorite_color | grade | name |
|---|---|---|---|---|---|
| Willard Morris | 20 | 01-02-1996 | blue | 88 | Willard Morris |
| Al Jennings | 19 | 08-05-1997 | red | 92 | Al Jennings |
| Omar Mullins | 22 | 04-28-1996 | yellow | 95 | Omar Mullins |
| Spencer McDaniel | 21 | 12-16-1995 | green | 70 | Spencer McDaniel |
Select Pandas dataframe rows between two dates
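We can do this with a boolean mask. First, let's make sure the 'birth_date' column holds datetimes rather than strings, so comparisons are made by date and not alphabetically:
df['birth_date'] = pd.to_datetime(df['birth_date'], format='%m-%d-%Y')  # the raw strings are month-day-year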
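Dates like '01-02-1996' are ambiguous, which is why spelling out the format can be worth it. The sketch below, using only the birth dates from the dummy dataframe, shows that pandas reads such strings month-first by default (dayfirst=False), that an explicit format='%m-%d-%Y' gives the same result, and that dayfirst=True would silently produce a different date.

```python
import pandas as pd

# Strings in the same MM-DD-YYYY layout as the birth_date column
dates = pd.Series(['01-02-1996', '08-05-1997', '04-28-1996', '12-16-1995'])

# Default parsing treats the first field as the month (dayfirst=False)
inferred = pd.to_datetime(dates)
# An explicit format pins that interpretation down
explicit = pd.to_datetime(dates, format='%m-%d-%Y')

print(inferred.dt.month.tolist())  # [1, 8, 4, 12]
print(explicit.equals(inferred))   # True

# Reading the first value day-first gives 1 February instead of 2 January
print(pd.to_datetime('01-02-1996', dayfirst=True))  # 1996-02-01 00:00:00
```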
next, set the start and end dates to filter df with. These can be pandas Timestamps, NumPy datetime64 values, Python datetime objects, or strings that pandas can parse as dates (here month-day-year, matching the data)
start_date = '03-01-1996'
end_date = '06-01-1997'
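To illustrate that the bound types are interchangeable, here is a small sketch that compares the converted birth dates against the same start date (1 March 1996) written four ways. It assumes a reasonably recent pandas in which comparing a datetime Series with a string, Timestamp, datetime64, or datetime all coerce to the same point in time.

```python
import datetime

import numpy as np
import pandas as pd

birth_date = pd.to_datetime(
    pd.Series(['01-02-1996', '08-05-1997', '04-28-1996', '12-16-1995']),
    format='%m-%d-%Y',
)

# The same start date written four different ways
starts = [
    '03-01-1996',                   # string, parsed month-first
    pd.Timestamp(1996, 3, 1),       # pandas Timestamp
    np.datetime64('1996-03-01'),    # NumPy datetime64
    datetime.datetime(1996, 3, 1),  # standard-library datetime
]

# Each comparison produces an identical boolean Series
masks = [birth_date > s for s in starts]
print(all(m.equals(masks[0]) for m in masks))  # True
print(masks[0].tolist())  # [False, True, True, False]
```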
next, build the mask: it is True for rows whose birth_date is after start_date (exclusive) and on or before end_date (inclusive)
mask = (df['birth_date'] > start_date) & (df['birth_date'] <= end_date)
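The same mask can also be written with Series.between. A minimal sketch, assuming pandas 1.3 or later (where inclusive accepts 'both', 'neither', 'left', or 'right'); inclusive='right' reproduces the exclusive start and inclusive end used above.

```python
import pandas as pd

birth_date = pd.to_datetime(
    pd.Series(['01-02-1996', '08-05-1997', '04-28-1996', '12-16-1995']),
    format='%m-%d-%Y',
)
start_date = pd.Timestamp('1996-03-01')
end_date = pd.Timestamp('1997-06-01')

# Hand-written mask: start exclusive, end inclusive
manual = (birth_date > start_date) & (birth_date <= end_date)

# Equivalent bounds via Series.between (inclusive='right' needs pandas >= 1.3)
via_between = birth_date.between(start_date, end_date, inclusive='right')

print(manual.equals(via_between))  # True
print(via_between.tolist())        # [False, False, True, False]
```

Choosing inclusive='both' instead would also keep a row born exactly on start_date, which the hand-written mask excludes.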
apply the mask with .loc to keep only the rows whose birth_date falls between the start and end dates
df = df.loc[mask]
df
| | age | birth_date | favorite_color | grade | name |
|---|---|---|---|---|---|
| Omar Mullins | 22 | 1996-04-28 | yellow | 95 | Omar Mullins |
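An alternative to a boolean mask is to move the dates into a sorted DatetimeIndex and slice it by label with .loc. This sketch uses a trimmed copy of the dummy data (name and birth_date only). Note that label slicing includes both endpoints, so unlike the mask above it would also keep a row born exactly on the start date; ISO-style 'YYYY-MM-DD' strings are used for the bounds to avoid month/day ambiguity.

```python
import pandas as pd

raw_data = {
    'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
    'birth_date': ['01-02-1996', '08-05-1997', '04-28-1996', '12-16-1995'],
}
df = pd.DataFrame(raw_data)
df['birth_date'] = pd.to_datetime(df['birth_date'], format='%m-%d-%Y')

# Sorting the DatetimeIndex lets .loc slice between dates not present in the index
by_date = df.set_index('birth_date').sort_index()

# Label-based slicing: both bounds are inclusive
subset = by_date.loc['1996-03-01':'1997-06-01']

print(subset['name'].tolist())  # ['Omar Mullins']
```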