Extracting the year and month from a datetime column in Pandas is a common task. This article explores three efficient methods, comparing their strengths and weaknesses to help you choose the best approach for your needs.
Using the .dt accessor
The .dt accessor provides a straightforward and efficient way to extract datetime components as integers. It’s often the preferred method due to its readability and conciseness.
import pandas as pd
data = {'date': pd.to_datetime(['2024-03-15', '2023-11-20', '2024-05-10'])}
df = pd.DataFrame(data)
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
print(df)
This code will output:
        date  year  month
0 2024-03-15  2024      3
1 2023-11-20  2023     11
2 2024-05-10  2024      5
Utilizing the strftime() method
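The strftime() method offers greater flexibility, allowing you to customize the output format. This is particularly useful when you need specific string representations of the year and month for reporting or other purposes. Note that strftime() returns strings rather than integers, so the resulting columns have object dtype and the month keeps its leading zero.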
import pandas as pd
data = {'date': pd.to_datetime(['2024-03-15', '2023-11-20', '2024-05-10'])}
df = pd.DataFrame(data)
df['year'] = df['date'].dt.strftime('%Y')
df['month'] = df['date'].dt.strftime('%m') # Use '%b' for abbreviated month name, '%B' for full name
print(df)
This will produce:
        date  year month
0 2024-03-15  2024    03
1 2023-11-20  2023    11
2 2024-05-10  2024    05
Remember to consult Python’s strftime() documentation for a complete list of format codes.
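As a small illustration of the formatting flexibility described above, the sketch below combines year and month into a single string and casts a formatted month back to an integer. The column names year_month, month_name, and month_num are made up for this example, and the text produced by '%b' depends on the active locale.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2024-03-15', '2023-11-20', '2024-05-10'])})

# Combine year and month into one string column, e.g. '2024-03'
df['year_month'] = df['date'].dt.strftime('%Y-%m')

# Abbreviated month name; the exact text depends on the active locale
df['month_name'] = df['date'].dt.strftime('%b')

# strftime() returns strings, so cast back when numbers are needed
df['month_num'] = df['date'].dt.strftime('%m').astype(int)

print(df)
```

If you only need numeric values, the .dt.year and .dt.month attributes avoid the round trip through strings.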
Direct Access with DatetimeIndex
If your dates are already held in a DatetimeIndex (for example, as the DataFrame’s index), you can access the year and month attributes directly, without the .dt accessor. While less common, this can be efficient if your data is already in this format.
import pandas as pd
data = {'date': pd.to_datetime(['2024-03-15', '2023-11-20', '2024-05-10'])}
df = pd.DataFrame(data)
date_index = pd.DatetimeIndex(df['date'])
df['year'] = date_index.year
df['month'] = date_index.month
print(df)
This produces the same output as the first example. However, this method is generally less preferred unless you are already working with a DatetimeIndex object.
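The case where this approach fits most naturally is when the dates already form the DataFrame's index. A minimal sketch, assuming a hypothetical 'sales' column that is not part of the examples above:

```python
import pandas as pd

dates = pd.to_datetime(['2024-03-15', '2023-11-20', '2024-05-10'])

# 'sales' is a made-up column used only for illustration
df = pd.DataFrame({'sales': [120, 95, 143]}, index=dates)

# df.index is a DatetimeIndex, so .year and .month are available without .dt
df['year'] = df.index.year
df['month'] = df.index.month
print(df)
```

Here no intermediate pd.DatetimeIndex(...) call is needed, because the index is already of that type.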
Conclusion: The .dt accessor offers the most concise and efficient approach for most scenarios and returns integers. strftime() provides greater formatting control but returns strings, while direct DatetimeIndex access is situationally useful. Choose the method that best aligns with your needs and coding style. Always ensure your ‘date’ column is of datetime dtype, converting it with pd.to_datetime() if necessary.
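The dtype check mentioned above can be sketched as follows, assuming the dates arrive as plain strings (for instance, after reading a CSV file without date parsing):

```python
import pandas as pd

# Dates stored as plain strings, so the column starts with object dtype
df = pd.DataFrame({'date': ['2024-03-15', '2023-11-20', '2024-05-10']})

# Convert only if the column is not already a datetime type
if not pd.api.types.is_datetime64_any_dtype(df['date']):
    df['date'] = pd.to_datetime(df['date'])

# The .dt accessor now works as in the first example
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
print(df.dtypes)
```

Using .dt on a column that still holds strings raises an AttributeError, so the conversion has to come first.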