Datetime Information
dataretrieval
attempts to normalize time data to UTC time when converting
web service data into dataframes. To do this, in-built pandas functions are
used; either pandas.to_datetime()
during the initial datetime object
conversion, or pandas.DataFrame.tz_localize()
if the datetime objects
exist but are not UTC-localized. In most cases (single-site and multi-site),
dataretrieval
assigns the datetime information as the dataframe index,
the exception to this is when incomplete datetime information is available, in
these cases integers are used as the dataframe index (see PR#58 for more
details).
Inspecting Timestamps
For single sites, the index of the returned dataframe contains pandas timestamps.
>>> import dataretrieval.nwis as nwis
>>> site = '03339000'
>>> df = nwis.get_record(sites=site, service='peaks',
... start='2015-01-01', end='2017-12-31')
>>> print(df)
agency_cd site_no peak_tm peak_va peak_cd gage_ht gage_ht_cd year_last_pk ag_dt ag_tm ag_gage_ht ag_gage_ht_cd
datetime
2015-06-08 00:00:00+00:00 USGS 03339000 17:30 25100 C 22.83 NaN NaN NaN NaN NaN NaN
2015-12-29 00:00:00+00:00 USGS 03339000 18:45 37600 C 26.66 NaN NaN NaN NaN NaN NaN
2017-05-05 00:00:00+00:00 USGS 03339000 04:45 17000 C 18.47 NaN NaN NaN NaN NaN NaN
Here the index of the dataframe df
is a set of datetime objects. Each has
the format, YYYY-MM-DD HH:MM:SS+HH:MM
. Because these timestamps are
localized to be in UTC, the expected offset (+HH:MM
) is +00:00
.
These values can be converted to a local timezone of your choosing using
pandas
functionality.
>>> df.index = df.index.tz_convert(tz='America/New_York')
>>> print(df)
agency_cd site_no peak_tm peak_va peak_cd gage_ht gage_ht_cd year_last_pk ag_dt ag_tm ag_gage_ht ag_gage_ht_cd
datetime
2015-06-07 20:00:00-04:00 USGS 03339000 17:30 25100 C 22.83 NaN NaN NaN NaN NaN NaN
2015-12-28 19:00:00-05:00 USGS 03339000 18:45 37600 C 26.66 NaN NaN NaN NaN NaN NaN
2017-05-04 20:00:00-04:00 USGS 03339000 04:45 17000 C 18.47 NaN NaN NaN NaN NaN NaN
Above, the index was converted to localize the timestamps to New York.
In the updated dataframe index, the resulting timestamps now have offsets of
-04:00
and -05:00
as New York is either 4 or 5 hours behind UTC
depending on the time of year (due to daylight savings).
When information for multiple sites is requested, dataretrieval
creates a
dataframe with a multi-index, with the first entry containing the site number,
and the second containing the datetime information.
>>> import dataretrieval.nwis as nwis
>>> sites = ['180049066381200', '290000095192602']
>>> df = nwis.get_record(sites=sites, service='gwlevels',
... start='2021-10-01', end='2022-01-01')
>>> df
agency_cd site_tp_cd lev_dt lev_tm lev_tz_cd ... lev_dt_acy_cd lev_acy_cd lev_src_cd lev_meth_cd lev_age_cd
site_no datetime ...
180049066381200 2021-10-04 19:54:00+00:00 USGS GW 2021-10-04 19:54 +0000 ... m NaN S S A
2021-11-16 14:28:00+00:00 USGS GW 2021-11-16 14:28 +0000 ... m NaN S S A
2021-12-09 10:43:00+00:00 USGS GW 2021-12-09 10:43 +0000 ... m NaN S S A
290000095192602 2021-12-08 19:07:00+00:00 USGS GW 2021-12-08 19:07 +0000 ... m NaN S S P
[4 rows x 15 columns]
Here note that the default datetime index information returned is also UTC
localized, and therefore the offset values are +00:00
.