Problem
I have a DataFrame, df, with the following column:
df['ArrivalDate'] =
...
936 2012-12-31
938 2012-12-29
965 2012-12-31
966 2012-12-31
967 2012-12-31
968 2012-12-31
969 2012-12-31
970 2012-12-29
971 2012-12-31
972 2012-12-29
973 2012-12-29
...
The column’s elements are pandas.tslib.Timestamp.
I’d like to include only the year and month. I assumed there would be an easy method to do that, but I’m stumped.
Here’s what I’ve done so far:
df['ArrivalDate'].resample('M', how = 'mean')
I received the following error message:
Only valid with DatetimeIndex or PeriodIndex
Then I tried:
df['ArrivalDate'].apply(lambda(x):x[:-2])
I received the following error message:
'Timestamp' object has no attribute '__getitem__'
Any suggestions?
I worked it out in the end.
df.index = df['ArrivalDate']
The index can then be used to resample another column.
However, I’d still prefer a way to reformat the entire column. Do you have any suggestions?
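The index-based workaround described above can be sketched roughly as follows. This is only an illustration: the Value column and its numbers are hypothetical, and 'MS' (month start) is used as the resampling frequency as one reasonable choice of monthly alias. In current pandas the aggregation is chained as a method on the resampler rather than passed through a how= argument.

```python
import pandas as pd

# Hypothetical data: 'Value' is an illustrative column, not from the question.
df = pd.DataFrame({
    "ArrivalDate": pd.to_datetime(["2012-12-29", "2012-12-31", "2013-01-15"]),
    "Value": [1.0, 3.0, 10.0],
})

# resample requires a DatetimeIndex, so move the date column into the index.
indexed = df.set_index("ArrivalDate")

# Monthly mean of the other column; 'MS' labels each bin by the month's first day.
monthly_mean = indexed["Value"].resample("MS").mean()
print(monthly_mean)
```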
Asked by monkeybiz7
Solution #1
If you wish to add new columns that show the year and month individually, follow these steps:
df['year'] = pd.DatetimeIndex(df['ArrivalDate']).year
df['month'] = pd.DatetimeIndex(df['ArrivalDate']).month
or…
df['year'] = df['ArrivalDate'].dt.year
df['month'] = df['ArrivalDate'].dt.month
Then you can combine them or work with them as they are.
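As a small end-to-end sketch of the .dt approach, assuming a toy frame with three made-up dates, the two new columns can then be used together, for example as group keys:

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "ArrivalDate": pd.to_datetime(["2012-12-31", "2012-12-29", "2013-01-02"]),
})

# Extract the components with the .dt accessor.
df["year"] = df["ArrivalDate"].dt.year
df["month"] = df["ArrivalDate"].dt.month

# One way to combine them: count rows per (year, month) pair.
counts = df.groupby(["year", "month"]).size()
print(counts)
```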
Answered by KieranPC
Solution #2
The column df['date_column'] must already have a datetime dtype.
df['month_year'] = df['date_column'].dt.to_period('M')
For other sampling intervals, you may use D for Day, 2M for 2 Months, and so on. If you have time series data with a time stamp, you can use granular sampling intervals like 45Min for 45 minutes, 15Min for 15 minutes, and so on.
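A minimal sketch of what to_period('M') produces, using made-up dates. The resulting column holds monthly Period values, which still sort and group chronologically; converting to str gives plain 'YYYY-MM' labels if that is all you need.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "date_column": pd.to_datetime(["2012-12-31", "2012-12-29", "2013-01-02"]),
})

# Monthly periods: the day is dropped, year and month are kept.
df["month_year"] = df["date_column"].dt.to_period("M")

# Plain string labels, if a text representation is wanted.
month_labels = df["month_year"].astype(str)

# Periods can be used directly as group keys.
per_month = df.groupby("month_year").size()
print(per_month)
```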
Answered by kabrapankaj32
Solution #3
You can directly access the year and month attributes, or request a datetime.datetime:
In [15]: t = pandas.Timestamp.now()  # pandas.tslib is no longer available in current pandas
In [16]: t
Out[16]: Timestamp('2014-08-05 14:49:39.643701', tz=None)
In [17]: t.to_pydatetime() #datetime method is deprecated
Out[17]: datetime.datetime(2014, 8, 5, 14, 49, 39, 643701)
In [18]: t.day
Out[18]: 5
In [19]: t.month
Out[19]: 8
In [20]: t.year
Out[20]: 2014
One way to combine year and month is to encode them as a single integer, such as 201408 for August 2014. This can be done across a whole column as:
df['YearMonth'] = df['ArrivalDate'].map(lambda x: 100*x.year + x.month)
or any number of variations on that.
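One such variation, sketched here with made-up dates, does the same arithmetic with the .dt accessor instead of a Python-level lambda, and shows how the encoding can be split back into its parts:

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "ArrivalDate": pd.to_datetime(["2012-12-31", "2012-12-29", "2013-01-02"]),
})

# Vectorized version of the integer encoding: 201212 means December 2012.
df["YearMonth"] = df["ArrivalDate"].dt.year * 100 + df["ArrivalDate"].dt.month

# Decoding needs the same convention: integer-divide and take the remainder.
decoded_year = df["YearMonth"] // 100
decoded_month = df["YearMonth"] % 100
print(df)
```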
However, I’m not a big fan of this approach: it makes date alignment and arithmetic harder later on, and it’s especially confusing for anyone who comes across your code or data without knowing the convention. A better option is to choose a day-of-month convention, such as the first day or the last non-US-holiday weekday, and keep the data in a date/time format using that convention.
The calendar module is useful for obtaining the day number of particular days, such as the last weekday of a month. You could then do something like this:
import calendar
import datetime
df['AdjustedDateToEndOfMonth'] = df['ArrivalDate'].map(
    lambda x: datetime.datetime(
        x.year,
        x.month,
        max(calendar.monthcalendar(x.year, x.month)[-1][:5])
    )
)
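For the simpler first-day-of-month convention, one possible sketch is to round-trip through a monthly period. The column name MonthStart and the sample dates are made up for illustration. Because the result is still a datetime column, ordinary date arithmetic keeps working.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "ArrivalDate": pd.to_datetime(["2012-12-31", "2012-12-29", "2013-01-02"]),
})

# Snap every date to the first day of its month, keeping a datetime dtype.
df["MonthStart"] = df["ArrivalDate"].dt.to_period("M").dt.to_timestamp()

# Date arithmetic still works, e.g. shifting forward by one calendar month.
next_month = df["MonthStart"] + pd.DateOffset(months=1)
print(df)
```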
If you only need to convert the datetime column into a stringified representation, you may simply use the strftime method from the datetime.datetime class, as seen below:
In [5]: df
Out[5]:
date_time
0 2014-10-17 22:00:03
In [6]: df.date_time
Out[6]:
0 2014-10-17 22:00:03
Name: date_time, dtype: datetime64[ns]
In [7]: df.date_time.map(lambda x: x.strftime('%Y-%m-%d'))
Out[7]:
0 2014-10-17
Name: date_time, dtype: object
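If only the year and month are wanted in the string, the format can be shortened. The sketch below uses the .dt.strftime accessor on the column rather than map with a lambda; the frame mirrors the one-row example above.

```python
import pandas as pd

# Same single-row example as in the session above.
df = pd.DataFrame({"date_time": pd.to_datetime(["2014-10-17 22:00:03"])})

# Year and month only, as strings; the result has object (string) dtype,
# so it no longer supports date arithmetic.
year_month = df["date_time"].dt.strftime("%Y-%m")
print(year_month)
```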
Answered by ely
Solution #4
If you want a month-year pair that is unique, using apply is a good option.
df['mnth_yr'] = df['date_column'].apply(lambda x: x.strftime('%B-%Y'))
This puts the month and year together in a single column.
Don’t forget to convert the column to datetime first; I usually forget.
df['date_column'] = pd.to_datetime(df['date_column'])
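Putting the two steps in the order they need to run, a minimal sketch might look like this. The dates are made up, and the month names produced by %B depend on the active locale, so the exact strings may differ on your machine.

```python
import pandas as pd

# Dates stored as plain strings, as often happens after reading a CSV.
df = pd.DataFrame({"date_column": ["2018-03-04", "2018-11-20"]})
is_datetime_before = pd.api.types.is_datetime64_any_dtype(df["date_column"])

# Convert first; plain strings have no strftime method.
df["date_column"] = pd.to_datetime(df["date_column"])
is_datetime_after = pd.api.types.is_datetime64_any_dtype(df["date_column"])

# Now the month-year label can be built (month name is locale-dependent).
df["mnth_yr"] = df["date_column"].apply(lambda x: x.strftime("%B-%Y"))
print(df)
```

Note that labels like these sort alphabetically rather than chronologically, which matters if you later sort or plot by them.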
Answered by kabrapankaj32
Solution #5
For example, to extract the year from ['2018-03-04']:
df['Year'] = pd.DatetimeIndex(df['date']).year
Assigning to df['Year'] creates a new column. To extract the month instead, use .month.
Answered by Douglas
Post is based on https://stackoverflow.com/questions/25146121/extracting-just-month-and-year-separately-from-pandas-datetime-column