DateTimeParser

DateTimeParser is a tokenizer for strings that attempts to parse them into a DateTime, a NaiveDateTime (if the timezone cannot be determined), a Date, or a Time.

You're currently looking at the main branch. Check out the docs for the latest published version.

Documentation

See examples automatically generated by the tests

The biggest ambiguity between datetime formats is whether the order is ymd (year month day), mdy (month day year), or dmy (day month year). This is resolved by checking whether the string uses slashes or dashes. If slashes, then it will try dmy first. All other cases will use the international format ymd. Sometimes, if the conditions are right, it can even parse dmy with dashes if the month is written out as a name (eg, "Jan").

If the string consists of only numbers, then we will try two other parsers depending on the number of digits: Epoch or Serial. Otherwise, we'll try the tokenizer.

If the string is 10-11 digits with optional precision, then we'll try to parse it as a Unix Epoch timestamp.
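The Epoch rule can be approximated with the standard library alone. This is a minimal sketch, not the library's implementation: `EpochSketch` is a hypothetical module, and it ignores the optional fractional precision for brevity.

```elixir
defmodule EpochSketch do
  # Hypothetical helper, not part of DateTimeParser.
  # Accepts only strings made of exactly 10 or 11 digits.
  def parse(string) when is_binary(string) do
    with true <- Regex.match?(~r/\A\d{10,11}\z/, string),
         {seconds, ""} <- Integer.parse(string) do
      # DateTime.from_unix/1 interprets the integer as seconds since 1970-01-01 UTC.
      DateTime.from_unix(seconds)
    else
      _ -> {:error, :not_epoch}
    end
  end
end
```

Calling `EpochSketch.parse("1564154204")` yields a UTC `DateTime` for 2019-07-26 15:16:44, which agrees with the `parse_datetime("1564154204")` example in this README.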

If the string is 1-5 digits with optional precision, then we'll try to parse it as a Serial timestamp (spreadsheet time), treating 1899-12-31 as 1. As a result, Excel-produced dates from 1900-01-01 until 1900-03-01 will be off by one day. Excel counts a nonexistent 1900-02-29, so its own serial numbers for that range are already incorrect.
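The Serial conversion can be sketched with plain date arithmetic. This is an illustration under stated assumptions, not the library's code: `SerialSketch` is a hypothetical module, day zero is taken to be 1899-12-30 (which follows from treating 1899-12-31 as 1), and only non-negative serials are handled.

```elixir
defmodule SerialSketch do
  # Hypothetical helper, not part of DateTimeParser.
  # Assumption: serial 0 is 1899-12-30, so serial 1 is 1899-12-31.
  defp day_zero, do: ~D[1899-12-30]

  # Integers carry no time component, so they map to a Date.
  def to_date(serial) when is_integer(serial) and serial >= 0 do
    Date.add(day_zero(), serial)
  end

  # Floats carry the time of day in the fractional part (a fraction of 24 hours).
  def to_naive_datetime(serial) when is_float(serial) and serial >= 0 do
    days = trunc(serial)
    seconds = round((serial - days) * 86_400)
    {:ok, midnight} = NaiveDateTime.new(Date.add(day_zero(), days), ~T[00:00:00])
    NaiveDateTime.add(midnight, seconds, :second)
  end
end
```

With these assumptions, `SerialSketch.to_date(44262)` gives 2021-03-07 and `SerialSketch.to_naive_datetime(41261.6013888889)` gives 2012-12-18 14:26:00, matching the serial examples shown later in this README.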

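| digits | parser    | range                                                                  | notes                                                  |
|--------|-----------|------------------------------------------------------------------------|--------------------------------------------------------|
| 1-5    | Serial    | low = 1900-01-01, high = 2173-10-15. Negative numbers go to 1626-03-17 | Floats indicate time. Integers do not.                 |
| 6-9    | Tokenizer | any                                                                    | This allows for "20190429" to be parsed as 2019-04-29  |
| 10-11  | Epoch     | low = -1100-02-15 14:13:21, high = 5138-11-16 09:46:39                 | If padded with 0s, then it can capture entire range.   |
| else   | Tokenizer | any                                                                    |                                                        |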
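The routing in the table can be summarised as a small function. This is a sketch only: `RoutingSketch.parser_for/1` is a hypothetical helper, and counting just the digits before the decimal point (and allowing a leading minus sign) is an assumption about how "digits" is measured.

```elixir
defmodule RoutingSketch do
  # Hypothetical helper, not part of DateTimeParser.
  # Returns the parser that the table suggests would be tried for a string.
  def parser_for(string) when is_binary(string) do
    # Assumption: only the integer part counts toward the digit total.
    case Regex.run(~r/\A-?(\d+)(?:\.\d+)?\z/, string) do
      [_full, digits] ->
        case String.length(digits) do
          n when n in 1..5 -> :serial
          n when n in 10..11 -> :epoch
          _ -> :tokenizer
        end

      # Anything that is not purely numeric goes to the tokenizer.
      nil ->
        :tokenizer
    end
  end
end
```

For example, `RoutingSketch.parser_for("20190429")` returns `:tokenizer`, which is why an 8-digit string can still be read as a calendar date.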

Required reading

Examples

iex> DateTimeParser.parse("19 September 2018 08:15:22 AM")
{:ok, ~N[2018-09-19 08:15:22]}

iex> DateTimeParser.parse_datetime("19 September 2018 08:15:22 AM")
{:ok, ~N[2018-09-19 08:15:22]}

iex> DateTimeParser.parse_datetime("2034-01-13", assume_time: true)
{:ok, ~N[2034-01-13 00:00:00]}

iex> DateTimeParser.parse_datetime("2034-01-13", assume_time: ~T[06:00:00])
{:ok, ~N[2034-01-13 06:00:00]}

iex> DateTimeParser.parse("invalid date 10:30pm")
{:ok, ~T[22:30:00]}

iex> DateTimeParser.parse("2019-03-11T99:99:99")
{:ok, ~D[2019-03-11]}

iex> DateTimeParser.parse("2019-03-11T10:30:00pm UNK")
{:ok, ~N[2019-03-11T22:30:00]}

iex> DateTimeParser.parse("2019-03-11T22:30:00.234+00:00")
{:ok, DateTime.from_naive!(~N[2019-03-11T22:30:00.234Z], "Etc/UTC")}
# `~U[2019-03-11T22:30:00.234Z]` in Elixir 1.9+

iex> DateTimeParser.parse_date("2034-01-13")
{:ok, ~D[2034-01-13]}

iex> DateTimeParser.parse_date("01/01/2017")
{:ok, ~D[2017-01-01]}

iex> DateTimeParser.parse_datetime("1564154204")
{:ok, DateTime.from_naive!(~N[2019-07-26T15:16:44Z], "Etc/UTC")}
# `~U[2019-07-26T15:16:44Z]` in Elixir 1.9+

iex> DateTimeParser.parse_datetime("41261.6013888889")
{:ok, ~N[2012-12-18T14:26:00]}

iex> DateTimeParser.parse_date("44262")
{:ok, ~D[2021-03-07]}
# This is a serial number date, commonly found in spreadsheets, eg: `=VALUE("03/07/2021")`

iex> DateTimeParser.parse_datetime("1/1/18 3:24 PM")
{:ok, ~N[2018-01-01T15:24:00]}

iex> DateTimeParser.parse_datetime("1/1/18 3:24 PM", assume_utc: true)
{:ok, DateTime.from_naive!(~N[2018-01-01T15:24:00Z], "Etc/UTC")}
# `~U[2018-01-01T15:24:00Z]` in Elixir 1.9+

iex> DateTimeParser.parse_datetime(~s|"Mar 28, 2018 7:39:53 AM PDT"|, to_utc: true)
{:ok, DateTime.from_naive!(~N[2018-03-28T14:39:53Z], "Etc/UTC")}

iex> {:ok, datetime} = DateTimeParser.parse_datetime(~s|"Mar 1, 2018 7:39:53 AM PST"|)
iex> datetime
#DateTime<2018-03-01 07:39:53-08:00 PST America/Los_Angeles>

iex> DateTimeParser.parse_datetime(~s|"Mar 1, 2018 7:39:53 AM PST"|, to_utc: true)
{:ok, DateTime.from_naive!(~N[2018-03-01T15:39:53Z], "Etc/UTC")}

iex> {:ok, datetime} = DateTimeParser.parse_datetime(~s|"Mar 28, 2018 7:39:53 AM PDT"|)
iex> datetime
#DateTime<2018-03-28 07:39:53-07:00 PDT America/Los_Angeles>

iex> DateTimeParser.parse_time("10:13pm")
{:ok, ~T[22:13:00]}

iex> DateTimeParser.parse_time("10:13:34")
{:ok, ~T[10:13:34]}

iex> DateTimeParser.parse_time("18:14:21.145851000000Z")
{:ok, ~T[18:14:21.145851]}

iex> DateTimeParser.parse_datetime(nil)
{:error, "Could not parse nil"}

Installation

Add date_time_parser to your list of dependencies in mix.exs:

def deps do
  [
    {:date_time_parser, "~> 1.2.0"}
  ]
end

Configuration

You must have a timezone database configured if you want parsing to consider timezones. See tz or tzdata.

# This is the default config
alias DateTimeParser.Parser
config :date_time_parser, parsers: [Parser.Epoch, Parser.Serial, Parser.Tokenizer]

# To enable only specific parsers, include them in the :parsers key.
config :date_time_parser, parsers: [Parser.Tokenizer]

# To consider more timezones from the past at a performance cost:
config :date_time_parser, include_zones_from: ~N[1900-01-01T00:00:00]
# default is 2020-01-01T00:00:00

# Adding the timezone database from Tz
config :elixir, :time_zone_database, Tz.TimeZoneDatabase

# Or at runtime, pass the parsers as an option to the function.
DateTimeParser.parse(mystring, parsers: [Parser.Tokenizer])

Write your own parser

You can write your own parser!

If the built-in parsers are not applicable for your use-case, you may build your own parser to use with this library. Let's write a simple one together.

First I will check DateTimeParser.Parser to see what behaviour my new parser should implement. It needs two functions:

  1. DateTimeParser.Parser.preflight/1
  2. DateTimeParser.Parser.parse/1

These callbacks accept the DateTimeParser.Parser.t/0 struct, which contains the options supplied by the user, the string itself, and the context for which you should return your result. For example, if the context is :time then you should return a %Time{}; if :datetime you should return either a %NaiveDateTime{} or a %DateTime{}; if :date then you should return a %Date{}.

Let's implement a parser that reads a special time string. Our string will represent time, but the hour and the minute are each shifted up by 10, and the string must be prefixed with the secret word: "boomshakalaka:". For example, the real world time of 01:10 is represented as boomshakalaka:11:20 in our toy time format. 12:30 is represented as boomshakalaka:22:40, and 5:55 is represented as boomshakalaka:15:65.

defmodule MyParser do
  @behaviour DateTimeParser.Parser
  @secret_regex ~r|boomshakalaka:(?<time>\d{2}:\d{2})|

  def preflight(%{string: string} = parser) do
    case Regex.named_captures(@secret_regex, string) do
      %{"time" => time} ->
        {:ok, %{parser | preflight: time}}

      nil ->
        {:error, :not_compatible}
    end
  end

  # ... more below
end

We'll stop here first and go through the preflight function. Our special parser will only be attempted if the regex yields named captures for the supplied string. That is, the string must contain boomshakalaka: followed by 2 digits, a colon, and 2 more digits. These digits are extracted out like 00:00 where 0 is any digit. If 05:40 is passed in, it lacks the secret prefix and is not compatible, so the parser will be skipped.

Now let's parse the time:

def parse(%{preflight: time} = parser) do
  [hour, minute] = String.split(time, ":")
  {hour, ""} = Integer.parse(hour)
  {minute, ""} = Integer.parse(minute)
  result = Time.new(hour - 10, minute - 10, 0, {0, 0})
  for_context(parser.context, result)
end

defp for_context(:datetime, _result), do: :error
defp for_context(:date, _result), do: :error
defp for_context(:time, result), do: result

Notice that we need to consider context of the result. If the user asked for a DateTime, then we need to give them one. In our toy format, it only represents time, so therefore we must return an error when the context is a :datetime or :date.

DateTimeParser.parse_time("boomshakalaka:11:11", parsers: [MyParser])
#=> {:ok, ~T[01:01:00]}

DateTimeParser.parse_date("boomshakalaka:11:11", parsers: [MyParser])
#=> {:error, "Could not parse \"boomshakalaka:11:11\""}

DateTimeParser.parse_datetime("boomshakalaka:11:11", parsers: [MyParser])
#=> {:error, "Could not parse \"boomshakalaka:11:11\""}

DateTimeParser.parse("boomshakalaka:11:11", parsers: [MyParser])
#=> {:ok, ~T[01:01:00]}
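The decoding rule of the toy format can also be checked on its own, without the library. The sketch below is a standalone restatement of the preflight and parse steps above; `BoomshakalakaSketch` is a hypothetical module and does not implement the DateTimeParser.Parser behaviour.

```elixir
defmodule BoomshakalakaSketch do
  # Hypothetical standalone helper; it mirrors MyParser but has no library dependency.
  def decode(string) when is_binary(string) do
    regex = ~r/boomshakalaka:(?<time>\d{2}:\d{2})/

    with %{"time" => time} <- Regex.named_captures(regex, string),
         [hour, minute] <- String.split(time, ":"),
         {hour, ""} <- Integer.parse(hour),
         {minute, ""} <- Integer.parse(minute) do
      # Undo the shift of 10 on both fields; Time.new/3 rejects out-of-range values.
      Time.new(hour - 10, minute - 10, 0)
    else
      _ -> {:error, :not_compatible}
    end
  end
end
```

A string without the prefix returns `{:error, :not_compatible}`, while a string with the prefix but an out-of-range shifted value (such as boomshakalaka:05:40) returns the error produced by `Time.new/3`.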

Why aren't timezones recognized?

You might not have a timezone database configured.

You may configure one by using tz or tzdata. Not only should you install it, but you also must configure Elixir to use it.

For example, in a script:

Mix.install([:date_time_parser, :tz])
# :ok
DateTimeParser.parse("2020-02-02 10:00:00 PST")
# {:ok, ~N[2020-02-02 10:00:00]}
Application.put_env(:elixir, :time_zone_database, Tz.TimeZoneDatabase)
# :ok
DateTimeParser.parse("2020-02-02 10:00:00 PST")
# {:ok, #DateTime<2020-02-02 10:00:00-08:00 PST America/Los_Angeles>}

or in a Mix project:

# in mix.exs
defp deps do
  [
    {:date_time_parser, "~> 1.2"},
    {:tz, "~> 0.24"}
  ]
end

# in config/config.exs
config :elixir, :time_zone_database, Tz.TimeZoneDatabase

# then in code
DateTimeParser.parse("2020-02-02 10:00:00 PST")
#> {:ok, #DateTime<2020-02-02 10:00:00-08:00 PST America/Los_Angeles>}

Should I use this library?

Only as a last resort. Parsing dates from strings is educated guessing at best. Since Elixir natively supports ISO-8601 parsing (see from_iso8601/2 functions), it's highly recommended that you rely on that first and foremost.

When designing your API that involves dates and strings, be specific with your requirements and supported DateTime strings, and preferably only support ISO-8601 with no exceptions. There is no ambiguity with this format so parsing to DateTime (or Date or Time) will always be correct.

This library is helpful when you must accept ambiguous DateTime string formats and having incorrect results is acceptable. Do not use this library when the resulting (and possibly incorrect) DateTime has catastrophic and dangerous effects in your system.
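The "ISO-8601 first" advice can be sketched with the standard library's from_iso8601 functions. `StrictFirstSketch` is a hypothetical module; the order in which the types are tried is an assumption chosen so that the most specific type wins.

```elixir
defmodule StrictFirstSketch do
  # Hypothetical helper: accept only ISO-8601 input, using Elixir's standard library.
  def parse(string) when is_binary(string) do
    with {:error, _} <- DateTime.from_iso8601(string),
         {:error, _} <- NaiveDateTime.from_iso8601(string),
         {:error, _} <- Date.from_iso8601(string),
         {:error, _} <- Time.from_iso8601(string) do
      # Every strict parser rejected the input.
      {:error, :not_iso8601}
    else
      # DateTime.from_iso8601/1 also returns the UTC offset; drop it here.
      {:ok, datetime, _offset} -> {:ok, datetime}
      {:ok, value} -> {:ok, value}
    end
  end
end
```

Only when a strict parse like this fails, and an approximate answer is acceptable, would a caller fall back to `DateTimeParser.parse/2`.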

How to store future timestamps

see guide

tldr: rules change, so don't convert to UTC too early. The future might change the timezone conversion rules.
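One way to follow this advice is to store the wall-clock time together with the zone name and resolve it to an instant only when needed. This is a minimal sketch, not something prescribed by the library: `FutureTimestampSketch` and its map shape are hypothetical, and resolving any zone other than "Etc/UTC" assumes a timezone database is configured.

```elixir
defmodule FutureTimestampSketch do
  # Hypothetical helper: keep what the user meant (local time + zone name).
  def store(%NaiveDateTime{} = wall_time, zone) when is_binary(zone) do
    %{wall_time: NaiveDateTime.to_iso8601(wall_time), zone: zone}
  end

  # Resolve as late as possible, so the zone rules in force at that moment apply.
  # With a timezone database this may also return {:ambiguous, _, _} or {:gap, _, _}.
  def resolve(%{wall_time: iso, zone: zone}) do
    with {:ok, naive} <- NaiveDateTime.from_iso8601(iso) do
      DateTime.from_naive(naive, zone)
    end
  end
end
```

Without a configured database, `resolve/1` returns an error tuple for non-UTC zones, which is the same limitation described under "Why aren't timezones recognized?".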

Changelog

View Changelog

Upgrading from 0.x to 1.0

  • If you use parse_datetime/1, then change to parse_datetime/2 and pass the keyword list assume_time: true, to_utc: true as the second argument. In 0.x, it would merge ~T[00:00:00] if the time tokens could not be parsed; in 1.x, you have to opt into this behavior. Also in 0.x, a non-UTC timezone would automatically convert to UTC; in 1.x, the original timezone will be kept instead.
  • If you use parse_date/1, then change to parse_date/2 and pass the keyword list assume_date: true as the second argument. In 0.x, it would merge Date.utc_today() with the found date tokens; in 1.x, you need to opt into this behavior.
  • If you use parse_time, there is no breaking change but parsing has been improved.
  • Not a breaking change, but 1.x introduces parse/2 that will return the best struct from the tokens. This may influence your usage.

Contributing

How to contribute

Special Thanks

https://www.taxjar.com

date_time_parser's Issues

assume date with 4 digits as year

I think that the shorter numbers should be parsed as year and can default to 1 January

iex [11:02 :: 2] > DateTimeParser.parse_date("2022")
{:ok, ~D[1905-07-14]}

Imo it's more likely that it's the current year than 1905.

Support for languages other than US English

Hi, we're looking for a library to parse dates. date_time_parser is working pretty well for American dates, but we also have data from European languages (e.g. German, French). An example would be 04 août 2020.

A similar library in Python is able to support languages other than US English, however this library doesn't seem to have such capabilities. Is this something you might consider adding? Or maybe this topic could be open for a potential PR (we might not have time to submit it soon though)? Thanks.

Moved from taxjar/date_time_parser#38

Remove usage of Timex

Since Timezone support is better supported in Elixir's stdlib, we might try using it instead of using Timex.

Doing this however may require us to write a IANA data file parser so we can perform lookups on our own of all the different combinations of names and abbreviations. The current Calendar.TimeZoneDatabase interface doesn't have this available; and essentially would be re-writing some of what Timex already offers.

However, at the same time, this will allow us to use these during nimble parsec compilation and expand support for all timezone names and abbreviations.

original: taxjar/date_time_parser#49

Is the 1.2.0 release planned?

Is the 1.2.0 release planned?

The 1.1.5 release contains a requirement for the max Timex dependency version of 3.7.2, which is pretty old.

If there's a clear list of things todo, perhaps you need help to make 1.2.0 possible?

Timex dependency excludes latest versions

The dependency of timex >= 3.2.1 and <= 3.7.2 prevents my app from being able to upgrade timex to the latest version. Could you please update the dependency to allow the latest version of timex, 3.7.11?
