Date/Time Data
Overview
For many purposes (ecological data, technical records, timestamps, etc.), tracking an instant in time is an obvious and desirable goal. Storing and using such values, however, has proven technically challenging.
Simply put, the goal should be to represent a point in time in a consistent and unambiguous manner.
Language-specific and database-specific implementations may vary, but this goal should be the desired end result.
Stumbling blocks
A great many problems arise around time zones and date/time values. Languages, operating systems, database layers, ORMs, etc. often make assumptions about time zones when manipulating, storing, or displaying values. This can effectively result in loss (or corruption) of data, or, more accurately, ambiguity in data.
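The following is a minimal Python illustration of that ambiguity. The fixed UTC offsets are chosen only for the example.

It shows that the same naive wall-clock value becomes two different instants, depending on which zone a layer assumes. Aware values that carry their offset, by contrast, compare as the instants they denote.

```python
from datetime import datetime, timedelta, timezone

# A "naive" datetime has no tzinfo: nothing records which zone it belongs to.
naive = datetime(2018, 3, 1, 12, 0, 0)

# Two assumptions a layer might silently make (fixed offsets for illustration).
utc = timezone.utc
pacific_std = timezone(timedelta(hours=-8))

as_utc = naive.replace(tzinfo=utc)
as_pacific = naive.replace(tzinfo=pacific_std)

# Same wall-clock digits, two different instants eight hours apart.
print(as_utc.isoformat())      # 2018-03-01T12:00:00+00:00
print(as_pacific.isoformat())  # 2018-03-01T12:00:00-08:00
print(as_pacific - as_utc)     # 8:00:00

# Aware values in different zones that denote the same instant compare equal.
same_instant = datetime(2018, 3, 1, 4, 0, 0, tzinfo=pacific_std)
print(same_instant == as_utc)  # True
```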
Example of trouble
Here is an example from the current implementation (within houston) that may be introducing ambiguity. Note the values of created and updated in User objects, and how they are represented.
The schema (in sqlite3):
Data as stored:
Are these values in UTC? Local to the server's time zone? Local to some time zone set in Python?
If there is, for example, an assumption that these values are UTC, then:
- be sure to document (wiki, code, etc.) that this is the case
- ensure you are getting the results you expect: make sure no layer (ORM, database) tries to "smartly" convert values, etc.
- when presenting date/time data (e.g. via JSON output), be explicit and consistent (see below)
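A round-trip test is one way to check that no layer converts values behind your back.

The sketch below uses Python's built-in sqlite3 module with a hypothetical user_example table; this is not the actual houston schema. It converts the value to an explicit ISO 8601 string with its offset, rather than relying on driver adapters.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration only.
conn.execute("CREATE TABLE user_example (id INTEGER PRIMARY KEY, created TEXT NOT NULL)")

created = datetime(2020, 6, 15, 9, 30, tzinfo=timezone.utc)

# Serialize explicitly so no adapter decides the format or the zone.
conn.execute("INSERT INTO user_example (created) VALUES (?)", (created.isoformat(),))

(stored,) = conn.execute("SELECT created FROM user_example").fetchone()
print(stored)  # 2020-06-15T09:30:00+00:00

# Round-trip check: same instant, and the offset survived.
restored = datetime.fromisoformat(stored)
assert restored == created and restored.utcoffset() is not None

# For contrast: SQLite's CURRENT_TIMESTAMP is UTC, but its text has no zone marker,
# so every reader must already know that convention.
(now_text,) = conn.execute("SELECT CURRENT_TIMESTAMP").fetchone()
print(now_text)  # 'YYYY-MM-DD HH:MM:SS', no offset
```

SQLite has no dedicated date/time storage class. Whatever text or number is written is what comes back, so the convention has to live in code and documentation.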
Standards
When a "default" or base time zone must be set, UTC is preferred over any "local" time zone. This should minimize confusion when moving between systems and installs. Beware that some systems (e.g. those set up by third parties) may not follow this convention at some important level. For example, a different setting in PostgreSQL may have ripple effects.
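Here is a small Python sketch of treating UTC as the default; utc_now and to_utc are hypothetical helper names. Note the difference between two standard calls:

- datetime.utcnow() returns a naive value and is deprecated as of Python 3.12.
- datetime.now(timezone.utc) returns an aware value.

```python
from datetime import datetime, timedelta, timezone

def utc_now() -> datetime:
    """Current instant as an aware UTC datetime (unlike the naive datetime.utcnow())."""
    return datetime.now(timezone.utc)

def to_utc(value: datetime) -> datetime:
    """Normalize an aware datetime to UTC; refuse naive input instead of guessing.

    Calling astimezone() on a naive value would silently assume the
    machine's local zone, which is exactly the hidden assumption to avoid.
    """
    if value.utcoffset() is None:
        raise ValueError(f"naive datetime {value!r}: time zone unknown")
    return value.astimezone(timezone.utc)

print(utc_now().tzinfo)  # UTC
print(to_utc(datetime(2020, 1, 1, 9, 0, tzinfo=timezone(timedelta(hours=2)))).isoformat())
# 2020-01-01T07:00:00+00:00
```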
Store and represent values with a time zone whenever possible. This is much easier said than done: values can get lost or altered at each layer of representation they pass through (ORM, database, JSON serialization, etc.). In particular, user-provided data (e.g. ecological) must retain the original (user-provided) time zone. Simply converting the value to UTC is insufficient, as it loses the local/relative time zone.
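This short Python sketch shows why converting user-provided values to UTC alone is lossy; the -05:00 offset is an arbitrary example.

A fixed offset preserves both the instant and the local wall-clock time. It does not carry daylight-saving rules, however. If later local-time arithmetic matters, one option would be to store an IANA zone name alongside the value (e.g. via zoneinfo).

```python
from datetime import datetime, timedelta, timezone

# A user-reported observation at 06:45 local time, UTC-05:00 (illustrative offset).
observed = datetime(2019, 7, 4, 6, 45, tzinfo=timezone(timedelta(hours=-5)))

# Lossy: only the instant survives; "06:45 local" can no longer be recovered.
utc_only = observed.astimezone(timezone.utc)
print(utc_only.isoformat())  # 2019-07-04T11:45:00+00:00

# Lossless: keep the original offset; the UTC instant is still derivable from it.
stored = observed.isoformat()
print(stored)  # 2019-07-04T06:45:00-05:00

restored = datetime.fromisoformat(stored)
print(restored.strftime("%H:%M %z"))  # 06:45 -0500
print(restored == utc_only)           # True: same instant
```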
Presentation and export should use ISO 8601 unless a specific purpose warrants otherwise. In particular, date/time values in JSON output from the API should be ISO 8601 strings.
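For JSON output, here is a Python sketch using the standard json module's default hook; iso8601_default is a hypothetical name. It refuses naive datetimes rather than guessing their zone.

```python
import json
from datetime import date, datetime, timezone

def iso8601_default(value):
    """json.dumps hook: emit ISO 8601 strings for dates and aware datetimes."""
    # datetime is a subclass of date, so check it first.
    if isinstance(value, datetime):
        if value.utcoffset() is None:
            raise TypeError(f"refusing to serialize naive datetime {value!r}")
        return value.isoformat()
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError(f"{type(value).__name__} is not JSON serializable")

payload = {
    "created": datetime(2021, 2, 3, 4, 5, 6, tzinfo=timezone.utc),
    "birthday": date(1990, 12, 31),
}
print(json.dumps(payload, default=iso8601_default))
# {"created": "2021-02-03T04:05:06+00:00", "birthday": "1990-12-31"}
```

isoformat() writes UTC as +00:00, which is valid ISO 8601. If API consumers expect a trailing Z, that substitution would be a separate, explicit choice.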
Implementation notes
EDM/Java
The class org.ecocean.ComplexDateTime was developed to handle the ambiguity introduced by existing solutions. It is based on java.time.ZonedDateTime, but ensures the original time zone data is retained when persisted.
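How ComplexDateTime actually persists values is not documented here. The following is therefore only a hypothetical Python analogue of the idea, not the EDM implementation.

The sketch keeps two columns:

- a UTC instant, for sorting and comparison;
- the original offset, in minutes.

The names to_columns, from_columns and the observation table are illustrative.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def to_columns(value: datetime) -> tuple[str, int]:
    """Split an aware datetime into (UTC instant text, original offset in minutes)."""
    offset = value.utcoffset()
    if offset is None:
        raise ValueError("naive datetime: original time zone unknown")
    # Fixed-width text so ORDER BY on this column is chronological.
    instant = value.astimezone(timezone.utc).isoformat(timespec="microseconds")
    return instant, int(offset.total_seconds() // 60)

def from_columns(instant: str, offset_minutes: int) -> datetime:
    """Rebuild the value in its original offset."""
    original_tz = timezone(timedelta(minutes=offset_minutes))
    return datetime.fromisoformat(instant).astimezone(original_tz)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (instant_utc TEXT NOT NULL, utc_offset_minutes INTEGER NOT NULL)")

values = [
    datetime(2019, 7, 4, 6, 45, tzinfo=timezone(timedelta(hours=-5))),  # 11:45 UTC
    datetime(2019, 7, 4, 13, 0, tzinfo=timezone(timedelta(hours=2))),   # 11:00 UTC
]
conn.executemany("INSERT INTO observation VALUES (?, ?)", [to_columns(v) for v in values])

rows = conn.execute(
    "SELECT instant_utc, utc_offset_minutes FROM observation ORDER BY instant_utc"
).fetchall()
for row in rows:
    print(from_columns(*row).isoformat())
# 2019-07-04T13:00:00+02:00
# 2019-07-04T06:45:00-05:00
```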
Non-specific date/time values
WORK IN PROGRESS
For some data (e.g. ecological), accommodating broad, "non-instant" date/time values is desirable. For example, a user may have a date without a time, such as "2012-05-01", or something as vague as "March 2018". The EDM attempts to handle these values.
- document internal storage
- how values are represented in json for api
- document conversion to approximate instant (e.g. for sake of sorting etc)
- dealing with ranges (start/end instants)
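The EDM's handling of these values is still to be documented, so this Python sketch shows just one possible approach. It maps a partial date to a half-open [start, end) date range and uses that range as a sort key.

partial_date_range is a hypothetical helper. It accepts:

- YYYY
- YYYY-MM
- YYYY-MM-DD
- "Month YYYY" (English month names, as under Python's default C locale)

```python
from datetime import date, datetime, timedelta

def partial_date_range(text: str) -> tuple[date, date]:
    """Map a partial date to a half-open range [start, end).

    Accepted forms (an assumption for this sketch): 'YYYY-MM-DD', 'YYYY-MM',
    'Month YYYY' (English month names), and 'YYYY'.
    """
    # Most precise format first, so '2012-05-01' is read as a day, not a month.
    formats = (("%Y-%m-%d", "day"), ("%Y-%m", "month"), ("%B %Y", "month"), ("%Y", "year"))
    for fmt, precision in formats:
        try:
            start = datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # not this form; try the next one
        if precision == "day":
            end = start + timedelta(days=1)
        elif precision == "month":
            end = date(start.year + 1, 1, 1) if start.month == 12 else date(start.year, start.month + 1, 1)
        else:
            end = date(start.year + 1, 1, 1)
        return start, end
    raise ValueError(f"unrecognized partial date: {text!r}")

print(partial_date_range("March 2018"))
# (datetime.date(2018, 3, 1), datetime.date(2018, 4, 1))

# Sorting by (start, end): earliest start first; ties go to the narrower range.
print(sorted(["March 2018", "2012-05-01", "2012"], key=partial_date_range))
# ['2012', '2012-05-01', 'March 2018']
```

Turning start/end into instants would still require choosing a time zone. That is exactly the kind of assumption worth recording explicitly.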
Random useful(?) links
Why does datetime.datetime.utcnow() not contain timezone information?
ugly world of PostgreSQL and timestamps with/without time zones: 1, 2, etc...
DataNucleus notes on timezone woes and multi-column implementations