Introduction
In this post, we will dive into the date- and time-related modules of Python and Pandas. Handling dates and times is often the most tedious part of any programming language, and much of the time we can meet our requirements without digging deep into these modules. But once we understand them structurally, they are not that boring, and they make life much easier when handling a time-series dataset.
We will try to develop a mind map along with this post. We will cover:
- Datetime objects in Python
- Operations and arithmetic on Python datetime objects
- Reading a datetime from a string and formatting it back to a string
- Datetime objects in Pandas
- Operating on time-series data with a datetime index
- Understanding and applying deltas, offsets, and time zones
Python Date and Time ecosystem
We will dive deep into Python's datetime module and the relevant modules of Pandas. Together, these cover all our date and time needs.
Python Internal packages/modules
Time module
time is the first module we will discuss. You may not need it often, because the datetime module covers most of what this module offers for working with dates and times.
Create a Time object
There are three ways to input the information for a time:
- Epoch - Seconds since a reference instant, known as the epoch. Midnight UTC on January 1, 1970 is a popular epoch used on both Unix and Windows platforms.
- As a tuple - As an alternative to seconds since the epoch, a time instant can be represented by a tuple of nine integers, called a timetuple, as shown below:
tm_year=2005, tm_mon=8, tm_mday=7, tm_hour=23, tm_min=21, tm_sec=29, tm_wday=6, tm_yday=219, tm_isdst=0
This is an intuitive approach, since every value has a named field. The same approach appears in other modules, but the underlying class has a different name in each. In the time module, the class is struct_time.
- From a string - We can also read from strings like '2020-11-18 23:59:59'.
Let's look at the functions that implement these methods.
import calendar
import time

tm = time.gmtime(1123456889.5)  # epoch --> time.struct_time object (in UTC)
calendar.timegm(tm)  # UTC struct_time --> epoch (the inverse of gmtime)
time.mktime(time.localtime(1123456889.5))  # mktime is the inverse of localtime: it expects local time, not UTC
time.struct_time((2005, 8, 7, 23, 21, 29, 6, 219, 0))  # Create struct_time explicitly
time.time()  # Current time in epoch
# Get the individual attributes
print(tm.tm_year, tm.tm_mon, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec)
Code-explanation
We have simply used a few functions of the time module along with its struct_time class.
The rest of the code is trivial and self-explanatory.
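To see that the three input methods describe the same instant, here is a small sketch that builds a struct_time from an epoch value, from an explicit nine-tuple, and from a string, and then compares them. It uses calendar.timegm from the standard calendar module (not part of the original snippet) as the UTC inverse of gmtime. The comparison skips tm_isdst, because strptime sets it to -1 ("unknown").

```python
import calendar
import time

# 1. From seconds since the epoch (interpreted as UTC)
from_epoch = time.gmtime(1123456889)

# 2. From an explicit nine-field time tuple
from_tuple = time.struct_time((2005, 8, 7, 23, 21, 29, 6, 219, 0))

# 3. From a string; strptime also fills in tm_wday and tm_yday
from_string = time.strptime("2005-08-07 23:21:29", "%Y-%m-%d %H:%M:%S")

# Compare the first eight fields (year .. yday); tm_isdst differs:
# gmtime sets it to 0, strptime sets it to -1
print(from_epoch[:8] == from_tuple[:8] == from_string[:8])  # True

# calendar.timegm treats a struct_time as UTC and returns the epoch value
print(calendar.timegm(from_string))  # 1123456889
```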
With the snippet above, we are equipped to create and store time data. Next, let's read a time from a string and format it back to a string.
read_time = time.strptime("2018-04-02 23:59:50", '%Y-%m-%d %H:%M:%S')
str_time = time.strftime('%d-%b-%Y %H:%M:%S', read_time)
Code-explanation
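Two functions do the work here: strptime ("string parse time") parses a string into a struct_time, and strftime ("string format time") formats a struct_time as a string.
The meaning of each format code (such as %Y, %m or %b) is listed in the strftime() and strptime() format-code table of the Python documentation.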
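As a quick reference, the sketch below parses the same timestamp as above and prints the output of a handful of common format codes. Names such as %b, %A and %p are locale-dependent. The outputs noted in the test assume the default C locale, which Python uses for LC_TIME unless a program changes it.

```python
import time

t = time.strptime("2018-04-02 23:59:50", "%Y-%m-%d %H:%M:%S")

# A few common format codes and the part of the timestamp each one extracts
for code in ["%Y", "%m", "%d", "%b", "%A", "%a", "%j", "%H", "%I", "%p", "%M", "%S"]:
    print(code, "->", time.strftime(code, t))
```

In the C locale this prints 2018, 04, 02, Apr, Monday, Mon, 092 (day of the year), 23, 11, PM, 59 and 50.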
Datetime module
The datetime module covers almost all the date and time functionality of the time module and adds many APIs on top of it. So, you can mostly ignore the time module.
The datetime module has classes for dates, times, and their combination: date, time, and datetime. The first two hold a calendar date and a time of day respectively, and the last one is the superset of the two. Hence the datetime class is sufficient for most of our tasks.
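To make the relationship concrete, the sketch below builds a date and a time separately and combines them into a datetime with datetime.combine(). It then splits the datetime back into its parts with the .date() and .time() methods. Note that importing datetime.time this way shadows the time module inside this snippet.

```python
from datetime import date, datetime, time  # this `time` is datetime.time, not the time module

d = date(2018, 4, 2)   # calendar date only
t = time(23, 59, 50)   # time of day only (no date)

dt = datetime.combine(d, t)  # date + time --> datetime
print(dt)                    # 2018-04-02 23:59:50

# ...and back again
print(dt.date(), dt.time())  # 2018-04-02 23:59:50
```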
Why we need datetime when we have the time module
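The high-level reason is that the time module mostly handles time as a float (seconds since the epoch) or as a bare struct_time. It was not designed with humans in mind. datetime has the APIs a human needs to handle dates and times, such as arithmetic, comparison, and time zone support. Check this answer on Reddit for more background.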
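Since the two modules often meet in practice, here is a small sketch of moving between them. fromtimestamp() turns a float epoch value into a datetime, timetuple() turns a datetime back into a struct_time, and timestamp() turns it back into an epoch value. Passing tz=timezone.utc is a choice made here so that the output does not depend on the machine's local time zone.

```python
from datetime import datetime, timezone

epoch = 1123456889.5

# float seconds --> aware datetime in UTC (the fractional second is kept)
dt = datetime.fromtimestamp(epoch, tz=timezone.utc)
print(dt)  # 2005-08-07 23:21:29.500000+00:00

# datetime --> time.struct_time (what the time module works with)
tt = dt.timetuple()
print(tt.tm_year, tt.tm_yday)  # 2005 219

# datetime --> float seconds since the epoch
print(dt.timestamp())  # 1123456889.5
```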
Let's check the datetime module with the required code. Be mindful that the class whose objects store the values is called datetime, and that the top-level module is also named datetime.
from datetime import datetime  # the datetime class lives in the datetime module
dtm = datetime(2000, 5, 23, hour=0, minute=0, second=0, microsecond=0, tzinfo=None)  # from individual fields, like a time tuple
dtm = datetime.fromtimestamp(1123456889.5)  # epoch --> local datetime, similar to time.localtime
datetime.now()  # Current local time
# Read from string
d = datetime.strptime("2018-04-02 23:59:50", '%Y-%m-%d %H:%M:%S')  # string --> datetime
# Back to string
d.strftime('%d-%b-%Y %H:%M:%S')  # datetime --> string
# Individual attributes of datetime
d.year, d.month, d.day, d.hour, d.minute, d.second
# Weekday names are not directly available as attributes
d.strftime("%A"), d.strftime("%a")
Code-explanation
The code is quite intuitive. In addition, we now have an option for a time zone (the tzinfo parameter), which we will use later.
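Weekday names need strftime(), but the weekday number is available directly through methods. A short sketch using the same timestamp follows. Looking the name up in calendar.day_name from the standard calendar module is one assumed alternative to strftime("%A"), and both return locale-dependent names.

```python
import calendar
from datetime import datetime

d = datetime.strptime("2018-04-02 23:59:50", "%Y-%m-%d %H:%M:%S")

print(d.weekday())                     # 0 (Monday=0 ... Sunday=6)
print(d.isoweekday())                  # 1 (Monday=1 ... Sunday=7)
print(calendar.day_name[d.weekday()])  # Monday
print(d.strftime("%A"))                # Monday
```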
Datetime arithmetic and the timedelta class
We now know how to create, parse, and format a datetime. So, let's learn how to do arithmetic with datetimes.
timedelta is the class that represents the difference between two datetimes. If the delta is known, we can also calculate a future date from a given one.
Instances of the timedelta class represent time intervals with three read-only integer attributes days, seconds, and microseconds.
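Whatever units you pass to the constructor, timedelta normalizes them into those three attributes. The sketch below shows three behaviours. Hours and minutes fold into days and seconds. A negative value is normalized so that only days is negative. total_seconds() collapses everything into one float.

```python
from datetime import timedelta

delta = timedelta(hours=25, minutes=30)
print(delta.days, delta.seconds, delta.microseconds)  # 1 5400 0

neg = timedelta(seconds=-1)
print(neg.days, neg.seconds)  # -1 86399

print(timedelta(days=395, seconds=86242).total_seconds())  # 34214242.0
```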
Let's check the timedelta class with the required code.
from datetime import datetime, timedelta
d1 = datetime.strptime("2018-04-02 23:59:50", '%Y-%m-%d %H:%M:%S')
d2 = datetime.strptime("2019-05-03 23:57:12", '%Y-%m-%d %H:%M:%S')
d2 - d1  # >>> datetime.timedelta(days=395, seconds=86242) # this is a timedelta object
delta = timedelta(days=395, seconds=86242)  # create the same timedelta explicitly
delta.days, delta.seconds  # Check attributes
# Add the delta to d1
(d1 + delta).strftime('%d-%b-%Y %H:%M:%S')  # Same as d2: '03-May-2019 23:57:12'
str(d1 + delta)  # str() of the resulting datetime: '2019-05-03 23:57:12'
Code-explanation
As mentioned above, a timedelta is stored in only three attributes: days, seconds, and microseconds.
The difference of two datetime objects is a timedelta object.
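Beyond subtraction, timedelta objects support the usual arithmetic with datetimes and with each other. The small sketch below reuses the dates above. It moves a datetime forward and backward, scales a timedelta, and counts whole weeks between two dates with floor division.

```python
from datetime import datetime, timedelta

d1 = datetime.strptime("2018-04-02 23:59:50", "%Y-%m-%d %H:%M:%S")
d2 = datetime.strptime("2019-05-03 23:57:12", "%Y-%m-%d %H:%M:%S")

print(d1 + timedelta(days=30))   # 2018-05-02 23:59:50 (30 days later)
print(d1 - timedelta(hours=24))  # 2018-04-01 23:59:50 (one day earlier)

gap = d2 - d1
print(gap // timedelta(weeks=1))  # 56 (whole weeks between d1 and d2)
print(gap * 2)                    # a timedelta can be scaled by a number
print(d1 < d2)                    # True: datetimes compare chronologically
```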
Pytz package
pytz is a third-party package for time zone manipulation. Time zone handling is prone to bugs, so here are some words of wisdom from "Python in a Nutshell":
The best way to program around the traps and pitfalls of time zones is to always use the UTC time zone internally, converting from other time zones on input, and to other time zones only for display purposes.
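Here is a minimal sketch of that advice that uses only the standard library. The input is tagged with its offset, converted to UTC for storage, and converted to another zone only for display. To stay dependency-free, it uses a fixed-offset datetime.timezone object instead of pytz. That is safe for India (UTC+05:30, no daylight saving time) but would be wrong for zones with DST, which need a real time zone database such as pytz. The helper names to_utc and for_display are hypothetical.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30), "IST")  # fixed offset; India has no DST

def to_utc(local_dt: datetime) -> datetime:
    """Hypothetical helper: convert an aware datetime to UTC for storage."""
    if local_dt.tzinfo is None:
        raise ValueError("refusing to guess the time zone of a naive datetime")
    return local_dt.astimezone(timezone.utc)

def for_display(utc_dt: datetime, tz: timezone) -> str:
    """Hypothetical helper: convert a stored UTC datetime only when showing it."""
    return utc_dt.astimezone(tz).strftime("%d-%b-%Y %H:%M %Z")

# Input arrives as Indian local time
entered = datetime(2021, 11, 11, 9, 0, tzinfo=IST)
stored = to_utc(entered)
print(stored)                    # 2021-11-11 03:30:00+00:00
print(for_display(stored, IST))  # 11-Nov-2021 09:00 IST
```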
Let's check a quick code snippet for handling time zones with datetime.
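!pip install pytz  # notebook syntax; run `pip install pytz` in a terminal otherwise
from datetime import datetime
import pytz
# Get the list of commonly used time zones (pytz.all_timezones has the full list)
pytz.common_timezones #1
# Time zones for a particular country # Use the ISO country code
pytz.country_timezones('IN') # >>> ['Asia/Kolkata'] #2
# localize() attaches the zone with the correct offset for that date
inp_ny = pytz.timezone('America/New_York').localize(datetime(2021, 11, 11)) # datetime in New York time #3
# Use the astimezone method of the datetime object
out_ind = inp_ny.astimezone(pytz.timezone('Asia/Kolkata')) #4
Code-explanation
#1 - Fetch the list of commonly used time zones
#2 - Fetch the list of all time zones for a country [India here]
#3 - Attach the time zone with the zone's localize() method. Passing a pytz zone to the tzinfo parameter of the datetime constructor does not work as expected: pytz then falls back to the zone's earliest historical offset (local mean time), which is off by several minutes.
#4 - Convert to the desired time zone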
When we create a datetime without a tzinfo, it's a naive datetime, i.e. just a datetime without any time zone attached. When we attach a time zone, the datetime becomes aware: it now refers to that wall-clock time in that zone.
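A quick way to tell the two apart is utcoffset(), which returns None for a naive datetime. Python also refuses to mix the two in arithmetic or ordering comparisons. The small standard-library sketch below shows this, with timezone.utc standing in for any time zone.

```python
from datetime import datetime, timezone

naive = datetime(2021, 11, 11)
aware = datetime(2021, 11, 11, tzinfo=timezone.utc)

print(naive.tzinfo, naive.utcoffset())  # None None
print(aware.tzinfo, aware.utcoffset())  # UTC 0:00:00

try:
    aware - naive  # Python refuses to mix naive and aware datetimes
except TypeError as exc:
    print("TypeError:", exc)
```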
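Let's do a small exercise: create two datetimes with the same values but pinned to different time zones, then calculate the timedelta between them.
time_1 = pytz.timezone('America/New_York').localize(datetime(2021, 11, 11))
time_2 = pytz.timezone('Pacific/Auckland').localize(datetime(2021, 11, 11))
time_1 - time_2 # >>> datetime.timedelta(seconds=64800) | 18 hours (EST is UTC-5, NZDT is UTC+13 on this date)
If you pass the zones through tzinfo= instead of localize(), the result comes out at roughly 16.6 hours, because pytz then uses each zone's local mean time offset.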
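As a sanity check that needs no third-party package, the same exercise can be written with fixed-offset datetime.timezone objects. The sketch assumes the offsets in effect on 11 November 2021: New York on EST (UTC-5, daylight saving time having ended on 7 November) and Auckland on NZDT (UTC+13). These fixed offsets are valid only for that date, and arbitrary dates need a real time zone database. The difference between the two datetimes is simply the gap between their UTC offsets.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), "EST")    # assumed New York offset on 2021-11-11
NZDT = timezone(timedelta(hours=13), "NZDT")  # assumed Auckland offset on 2021-11-11

time_1 = datetime(2021, 11, 11, tzinfo=EST)
time_2 = datetime(2021, 11, 11, tzinfo=NZDT)

print(time_1 - time_2)                             # 18:00:00
print(NZDT.utcoffset(None) - EST.utcoffset(None))  # 18:00:00, the gap between the offsets
```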
Conclusion
That's all for this post. If you keep these few snippets in mind, datetime will never haunt you. In the next post, we will continue with the Pandas library. That post will focus not just on the core Pandas datetime objects but also on time-series data.
You may also want to try:
- The dateutil module - a third-party package that offers powerful extensions for manipulating dates.
- The calendar module - the standard-library calendar module supplies calendar-related functions.
- The arrow library - it offers a sensible and human-friendly approach to creating, manipulating, formatting and converting dates, times and timestamps.