Coder Perfect

Extracting just the month and year separately from a Pandas datetime column

Problem

I have a DataFrame, df, with the following column:

df['ArrivalDate'] =
...
936   2012-12-31
938   2012-12-29
965   2012-12-31
966   2012-12-31
967   2012-12-31
968   2012-12-31
969   2012-12-31
970   2012-12-29
971   2012-12-31
972   2012-12-29
973   2012-12-29
...

The column’s elements are pandas.tslib.Timestamp objects.

I’d like to keep only the year and month. I assumed there would be an easy way to do this, but I’m stumped.

Here’s what I’ve done so far:

df['ArrivalDate'].resample('M', how = 'mean')

I received the following error message:

Only valid with DatetimeIndex or PeriodIndex 

Then I tried:

df['ArrivalDate'].apply(lambda(x):x[:-2])

I received the following error message:

'Timestamp' object has no attribute '__getitem__' 

Any suggestions?

I worked it out in the end.

df.index = df['ArrivalDate']

The index can then be used to resample another column.
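The index-based workaround above can be sketched with a recent pandas as follows. The Guests column is a hypothetical stand-in for "another column", and the dates are a few illustrative values in the style of the question.

```python
import pandas as pd

# Hypothetical sample data: arrival dates plus an illustrative numeric column
df = pd.DataFrame({
    "ArrivalDate": pd.to_datetime(["2012-12-31", "2012-11-15", "2012-12-29"]),
    "Guests": [4, 1, 2],
})

# resample() needs a DatetimeIndex, so move the dates into the index first
indexed = df.set_index("ArrivalDate").sort_index()

# Monthly mean of the other column, labelled by month end
monthly = indexed["Guests"].resample(pd.offsets.MonthEnd()).mean()
print(monthly)
```

Passing pd.offsets.MonthEnd() instead of a string alias is a deliberate choice: newer pandas prefers "ME" for month-end resampling, while older versions only understand "M".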

However, I’d still like a way to convert the entire column. Any suggestions?

Asked by monkeybiz7

Solution #1

If you want new columns showing the year and month separately, you can do this:

df['year'] = pd.DatetimeIndex(df['ArrivalDate']).year
df['month'] = pd.DatetimeIndex(df['ArrivalDate']).month

or…

df['year'] = df['ArrivalDate'].dt.year
df['month'] = df['ArrivalDate'].dt.month

Then you can mix and match them or work with them as is.
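A small self-contained sketch of both variants follows. The sample dates are illustrative, and grouping on the two new columns is just one possible way to combine them.

```python
import pandas as pd

# Sample data modeled on the question's ArrivalDate column
df = pd.DataFrame({"ArrivalDate": pd.to_datetime(["2012-12-31", "2012-12-29", "2013-01-05"])})

# Vectorized .dt accessors: one integer column each for year and month
df["year"] = df["ArrivalDate"].dt.year
df["month"] = df["ArrivalDate"].dt.month

# One way to combine them: count rows per (year, month) pair
counts = df.groupby(["year", "month"]).size()
print(df)
print(counts)
```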

Answered by KieranPC

Solution #2

df['date_column'] must already have a datetime dtype:

df['month_year'] = df['date_column'].dt.to_period('M')

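Other frequency aliases work the same way, for example D for day or 2M for two-month periods. If the data includes a time of day, finer-grained periods such as 45min or 15min are also possible. Note that a multiplied period begins at the period containing each timestamp rather than snapping to a fixed grid, so for regular 15-minute buckets df['date_column'].dt.floor('15min') is usually the better fit.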
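A quick sketch with illustrative dates, showing the resulting period dtype and how periods group naturally; the column name date_column follows the answer.

```python
import pandas as pd

df = pd.DataFrame({"date_column": pd.to_datetime(["2012-12-31", "2012-12-29", "2013-01-05"])})

# Period dtype: each value represents a whole calendar month, e.g. 2012-12
df["month_year"] = df["date_column"].dt.to_period("M")

# Periods sort and group by month without any string formatting
per_month = df.groupby("month_year").size()
print(df.dtypes)
print(per_month)
```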

Answered by kabrapankaj32

Solution #3

You can access the year and month attributes directly, or request a datetime.datetime:

In [15]: t = pandas.tslib.Timestamp.now()

In [16]: t
Out[16]: Timestamp('2014-08-05 14:49:39.643701', tz=None)

In [17]: t.to_pydatetime() #datetime method is deprecated
Out[17]: datetime.datetime(2014, 8, 5, 14, 49, 39, 643701)

In [18]: t.day
Out[18]: 5

In [19]: t.month
Out[19]: 8

In [20]: t.year
Out[20]: 2014
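pandas.tslib is not available in recent pandas releases; the same class is exposed as pandas.Timestamp. A minimal sketch of the session above with a fixed timestamp, so the output is reproducible:

```python
import pandas as pd

# Fixed timestamp taken from the session above; pd.Timestamp.now() works the same way
t = pd.Timestamp("2014-08-05 14:49:39.643701")

print(t.year, t.month, t.day)  # component attributes
dt = t.to_pydatetime()         # plain datetime.datetime
print(repr(dt))
```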

One way to combine year and month is to encode them as an integer, such as 201408 for August 2014. This could be done for a whole column as:

df['YearMonth'] = df['ArrivalDate'].map(lambda x: 100*x.year + x.month)

or many variations thereof.
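The same encoding can also be computed without a Python-level lambda by using the .dt accessor; a short sketch with illustrative dates:

```python
import pandas as pd

df = pd.DataFrame({"ArrivalDate": pd.to_datetime(["2012-12-31", "2014-08-05"])})

# Vectorized version of the map() call: 100 * year + month, e.g. 201408
dates = df["ArrivalDate"]
df["YearMonth"] = 100 * dates.dt.year + dates.dt.month
print(df)
```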

However, I’m not a big fan of this, because it makes later date alignment and arithmetic harder, and it’s especially painful for anyone who encounters your code or data without knowing the convention. A better approach is to pick a day-of-month convention, such as the last non-US-holiday weekday or the first day of the month, and keep the data in a date/time format using that convention.
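For the first-day-of-month convention, one possible sketch (not taken from the original answer) round-trips each date through a monthly period, so the result stays a real datetime column. The dates are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"ArrivalDate": pd.to_datetime(["2012-12-31", "2012-12-29", "2013-01-05"])})

# Snap every date to the first day of its month while keeping a datetime64 dtype
df["MonthStart"] = df["ArrivalDate"].dt.to_period("M").dt.to_timestamp()
print(df)
```

Because MonthStart is still a datetime column, ordinary date arithmetic and alignment keep working on it.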

The calendar module can be used to compute the day number of specific days, such as the last weekday of a month. You could then do something like this:

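import calendar
import datetime

# Last weekday (Mon-Fri) of each date's month: monthcalendar() lists weeks
# Monday-first by default and pads days outside the month with 0
df['AdjustedDateToEndOfMonth'] = df['ArrivalDate'].map(
    lambda x: datetime.datetime(
        x.year,
        x.month,
        max(calendar.monthcalendar(x.year, x.month)[-1][:5])
    )
)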
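As an alternative (a swapped-in technique, not the answer's method), pandas' business-month-end offset yields the same last-weekday dates without a per-row lambda. The sketch starts from the first of each month, because adding BMonthEnd(0) directly to a date that falls after its month's last weekday (such as Sunday 2012-09-30) would roll forward into the next month. Like the calendar version, it ignores public holidays; the dates are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"ArrivalDate": pd.to_datetime(["2012-09-30", "2012-12-29"])})

# First day of each date's month, which is always earlier than that month's last weekday
month_start = df["ArrivalDate"].dt.to_period("M").dt.to_timestamp()

# BMonthEnd(0) rolls forward to the last Mon-Fri of the same month
df["LastWeekdayOfMonth"] = month_start + pd.offsets.BMonthEnd(0)
print(df)
```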

If you only need to convert the datetime column into a stringified representation, you may simply use the strftime method from the datetime.datetime class, as seen below:

In [5]: df
Out[5]: 
            date_time
0 2014-10-17 22:00:03

In [6]: df.date_time
Out[6]: 
0   2014-10-17 22:00:03
Name: date_time, dtype: datetime64[ns]

In [7]: df.date_time.map(lambda x: x.strftime('%Y-%m-%d'))
Out[7]: 
0    2014-10-17
Name: date_time, dtype: object
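pandas also provides a vectorized .dt.strftime, which avoids the per-row map. A minimal sketch that keeps just the year and month, using the same sample value as the session above:

```python
import pandas as pd

df = pd.DataFrame({"date_time": pd.to_datetime(["2014-10-17 22:00:03"])})

# Vectorized alternative to map(lambda x: x.strftime(...)); '%Y-%m' keeps year and month
df["year_month"] = df["date_time"].dt.strftime("%Y-%m")
print(df)
```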

Answered by ely

Solution #4

If you want a unique month-year pair, apply is a neat option:

df['mnth_yr'] = df['date_column'].apply(lambda x: x.strftime('%B-%Y')) 

This shows the month and year together in one column, e.g. December-2012.

Don’t forget to convert the column to datetime first; I often forget:

df['date_column'] = pd.to_datetime(df['date_column'])
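Putting the two steps in the right order, a minimal sketch assuming the dates start out as plain strings (the sample values are hypothetical). Month names produced by %B follow the current locale; the default C locale gives English names.

```python
import pandas as pd

# Hypothetical input where the dates are still plain strings
df = pd.DataFrame({"date_column": ["2012-12-31", "2013-01-05"]})

# Step 1: convert to datetime (str values have no strftime method)
df["date_column"] = pd.to_datetime(df["date_column"])

# Step 2: the answer's apply() call, producing e.g. 'December-2012'
df["mnth_yr"] = df["date_column"].apply(lambda x: x.strftime("%B-%Y"))
print(df)
```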

Answered by kabrapankaj32

Solution #5

For example, to extract the year from a date such as '2018-03-04':

df['Year'] = pd.DatetimeIndex(df['date']).year  

This creates a new column, df['Year']. To extract the month instead, use .month in place of .year.
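A short sketch with string dates (illustrative values), showing that pd.DatetimeIndex parses them before the components are read:

```python
import pandas as pd

df = pd.DataFrame({"date": ["2018-03-04", "2019-11-20"]})

# pd.DatetimeIndex parses the strings; .year and .month return the components
df["Year"] = pd.DatetimeIndex(df["date"]).year
df["Month"] = pd.DatetimeIndex(df["date"]).month
print(df)
```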

Answered by Douglas

Post is based on https://stackoverflow.com/questions/25146121/extracting-just-month-and-year-separately-from-pandas-datetime-column