date_time_parser's People

Contributors

dbern-stripe, dbernheisel, fcapovilla, tmr08c, valpackett, wojtekmach

date_time_parser's Issues

Add option to turn off parsers

There's a scenario where a user may not want to enable "epoch" or "serial" time parsing. That way, strings like "31" aren't accepted.

Perhaps an interface could look like:

parsers = [DateTimeParser.Serial, DateTimeParser.Epoch, DateTimeParser.Default]
DateTimeParser.parse("42", parsers: [DateTimeParser.Default])
# => {:error, "Could not parse 42"}
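
A minimal sketch of how such a `parsers:` option could behave, assuming each parser returns `{:ok, value}` or `:error`. The module name `ParserSelectionSketch`, the atoms `:serial`, `:epoch`, `:default`, and the toy parsing rules are all hypothetical stand-ins, not the library's actual modules or API:

```elixir
defmodule ParserSelectionSketch do
  # Hypothetical parser names; the real library would use its own modules.
  @parsers [:serial, :epoch, :default]

  def parse(string, opts \\ []) do
    enabled = Keyword.get(opts, :parsers, @parsers)

    # Try each enabled parser in order and stop at the first success.
    Enum.find_value(enabled, {:error, "Could not parse #{string}"}, fn parser ->
      case run(parser, string) do
        {:ok, value} -> {:ok, value}
        :error -> nil
      end
    end)
  end

  # Toy "serial" parser: a bare integer is a day count from 1899-12-30.
  defp run(:serial, string) do
    case Integer.parse(string) do
      {days, ""} -> {:ok, Date.add(~D[1899-12-30], days)}
      _ -> :error
    end
  end

  # Toy "epoch" parser: exactly ten digits are treated as Unix seconds.
  defp run(:epoch, string) do
    if string =~ ~r/^\d{10}$/ do
      {:ok, DateTime.from_unix!(String.to_integer(string))}
    else
      :error
    end
  end

  # Toy "default" parser: ISO 8601 dates only.
  defp run(:default, string) do
    case Date.from_iso8601(string) do
      {:ok, date} -> {:ok, date}
      {:error, _} -> :error
    end
  end
end
```

With every parser enabled, `ParserSelectionSketch.parse("42")` is accepted by the toy serial parser; with `parsers: [:default]` the same string is rejected with `{:error, "Could not parse 42"}`.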

Incorrect handling of midnight when specified as "12:00 am", and of invalid times

(Using date_time_parser 1.1.1)

Noticed this while parsing some date+time strings received from a real service. In particular the 12:00am case is not handled the way I would expect.

iex(1)> date = "2021-03-27"
"2021-03-27"
iex(2)> DateTimeParser.parse!("#{date} 12:00 am")
~N[2021-03-27 12:00:00]
iex(3)> DateTimeParser.parse!("#{date} 12:00 pm")
~N[2021-03-27 12:00:00]
iex(4)> DateTimeParser.parse!("#{date} 00:00 am")
~N[2021-03-27 00:00:00]

Also noticed that "00:00 pm" is (incorrectly?) parsed as valid:

iex(5)> DateTimeParser.parse!("#{date} 00:00 pm")
~N[2021-03-27 12:00:00]

Doing some more experimentation I also found that some other invalid times are parsed as valid:

iex(7)> DateTimeParser.parse!("#{date} 25:00 am")
~N[2021-03-27 02:05:00]
iex(8)> DateTimeParser.parse!("#{date} 41:00 am")
~N[2021-03-27 04:01:00]
iex(9)> DateTimeParser.parse!("#{date} 9000:00 am")
~N[2021-03-27 09:00:00]
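
For reference, a small sketch of the 12-hour conversion I would expect. `MeridiemSketch.to_24h/2` is a hypothetical helper, not part of the library, and it assumes the stricter rule that any hour outside 1..12 is invalid once "am" or "pm" is present (which would also reject "00:00 am"):

```elixir
defmodule MeridiemSketch do
  # Convert a 12-hour clock hour to a 24-hour hour.
  # 12 am maps to 0 (midnight) and 12 pm stays 12 (noon).
  def to_24h(hour, :am) when hour in 1..12, do: {:ok, rem(hour, 12)}
  def to_24h(hour, :pm) when hour in 1..12, do: {:ok, rem(hour, 12) + 12}

  # Anything else (0, 25, 41, 9000, unknown meridiem) is rejected.
  def to_24h(_hour, _meridiem), do: :error
end
```

Under these assumptions `to_24h(12, :am)` gives `{:ok, 0}`, `to_24h(12, :pm)` gives `{:ok, 12}`, and `to_24h(25, :am)` gives `:error`.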

Remove usage of Timex

Since time zone support has improved in Elixir's standard library, we might try using it instead of Timex.

Doing this, however, may require us to write an IANA data file parser so we can perform our own lookups across all the different combinations of names and abbreviations. The current Calendar.TimeZoneDatabase interface doesn't make this available, so we would essentially be rewriting some of what Timex already offers.

At the same time, this would allow us to use the data during NimbleParsec compilation and expand support to all time zone names and abbreviations.

"2010/01/01" parsed incorrectly

iex(1)> DateTimeParser.parse("2010/01/01")
{:ok, ~D[0201-10-20]}

This should be ~D[2010-01-01], but it looks like the parser is fooled by the slash.

Improve CI: matrix testing and caching

We added GitHub Actions in #17, but the starter template only runs one version of Elixir (currently 1.9.1). It'd be really nice to add matrix testing across the Elixir versions that we claim to support (back to 1.3.0).

Also, there are a number of things we can cache:

  • Downloading Hex
  • Mix deps, keyed on the mix.lock hash
  • PLT tables for Dialyzer

Note that Dialyzer can only run on Elixir 1.6+, so matrix testing will have to account for that. Currently the mix tests alias runs Dialyzer.

There's a GitHub action for caching and they have an example for Elixir:
https://github.com/actions/cache/blob/master/examples.md#elixir---mix

Incorrect handling of US month/day

Hey, thanks for this library! One issue we're running into: for dates without a year, we expect them to be parsed as month/day, but they're being parsed in the reverse order:

DateTimeParser.parse_date("2/5", parsers: [DateTimeParser.Parser.DateUS], assume_date: true)
{:ok, ~D[2021-05-02]}

I would expect this to return 2021-02-05, which it does if we add the year. It looks like the order of the combinators might just be flipped at the bottom here:

def us_date do
  choice([
    day_long_month_year(),
    year4_month_day(),
    month_day_year(),
    day_month_year(),
    year_month_day(),
    day_month(),
    month_day()
  ])
end

What do you think?
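
To illustrate why the order matters, here is a plain-Elixir sketch of an ordered choice where the first alternative that succeeds wins. It does not use NimbleParsec; `OrderedChoiceSketch` and its functions are hypothetical and only mimic the "first match wins" behaviour I am assuming for `choice`:

```elixir
defmodule OrderedChoiceSketch do
  # Read "M/D" as month/day; returns {:ok, {month, day}} or :error.
  def month_day(string) do
    with [m, d] <- String.split(string, "/"),
         {month, ""} when month in 1..12 <- Integer.parse(m),
         {day, ""} when day in 1..31 <- Integer.parse(d) do
      {:ok, {month, day}}
    else
      _ -> :error
    end
  end

  # Read "D/M" as day/month by swapping the fields first.
  def day_month(string) do
    case String.split(string, "/") do
      [d, m] -> month_day(m <> "/" <> d)
      _ -> :error
    end
  end

  # Try the alternatives in order; the first success is returned.
  def choice(string, alternatives) do
    Enum.find_value(alternatives, :error, fn alternative ->
      case alternative.(string) do
        {:ok, _} = ok -> ok
        :error -> nil
      end
    end)
  end
end
```

With `day_month` listed first, `"2/5"` comes back as month 5, day 2; listing `month_day` first yields month 2, day 5, which is what I'd expect from a US parser.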

Add negative Epoch support

From Ruby:

Time.at(-1.0001).utc.iso8601(10)
=> "1969-12-31T23:59:58.9999000000Z"
Time.at(-1).utc.iso8601(10)
=> "1969-12-31T23:59:59.0000000000Z"
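
On the Elixir side, `DateTime.from_unix/2` already accepts negative integers, so a sketch could convert the string to integer microseconds and hand it over. `NegativeEpochSketch` is a hypothetical module, and truncating to six fractional digits is an assumption rather than the library's current behaviour:

```elixir
defmodule NegativeEpochSketch do
  # Parse strings such as "-1" or "-1.0001" into a UTC DateTime.
  def parse(string) do
    case Regex.run(~r/^(-?)(\d+)(?:\.(\d+))?$/, string) do
      [_, sign, seconds] -> build(sign, seconds, "")
      [_, sign, seconds, fraction] -> build(sign, seconds, fraction)
      nil -> :error
    end
  end

  defp build(sign, seconds, fraction) do
    # Keep at most six fractional digits, padded on the right with zeros.
    micro = fraction |> String.slice(0, 6) |> String.pad_trailing(6, "0")
    total = String.to_integer(seconds) * 1_000_000 + String.to_integer(micro)
    total = if sign == "-", do: -total, else: total

    # from_unix/2 handles negative values (times before 1970).
    DateTime.from_unix(total, :microsecond)
  end
end
```

For `"-1.0001"` this yields a DateTime whose ISO 8601 form is `1969-12-31T23:59:58.999900Z`, in line with the Ruby output above.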

Invalid strings parsed as times

Related to #42

Examples:

iex> date = "2021-03-27"
"2021-03-27"
iex> DateTimeParser.parse!("#{date} 25:00 am")
~N[2021-03-27 02:05:00]
iex> DateTimeParser.parse!("#{date} 41:00 am")
~N[2021-03-27 04:01:00]
iex> DateTimeParser.parse!("#{date} 9000:00 am")
~N[2021-03-27 09:00:00]

Looks like we need to improve time detection and tokenization.

Support for month-year formats

This might be out of scope, but it would be nice to predictably parse strings like Jan 2020, October 1995, May 1442 etc. What happens today is that all of these get interpreted as e.g. October 19 95, so basically %b %d%y instead of %b %Y.

iex(1)> DateTimeParser.parse("Jan 2020")
{:ok, ~D[2020-01-20]}
iex(2)> DateTimeParser.parse("October 1995")
{:ok, ~D[1995-10-19]}
iex(3)> DateTimeParser.parse("May 1442")
{:ok, ~D[2042-05-14]}

I'd expect DateTimeParser.parse("Jan 2020") to produce ~D[2020-01-01], i.e. always default to the first day of the month.
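
A rough sketch of the behaviour I have in mind, written with a regular expression instead of the library's combinators. `MonthYearSketch` is hypothetical, only handles English month names followed by a four-digit year, and is not how the library is structured:

```elixir
defmodule MonthYearSketch do
  @months ~w(january february march april may june july
             august september october november december)

  # Parse "Jan 2020" or "October 1995" into the first day of that month.
  def parse(string) do
    with [_, name, year] <- Regex.run(~r/^([A-Za-z]{3,9}) (\d{4})$/, string),
         lowered = String.downcase(name),
         # Accept full names and abbreviations of three or more letters.
         index when is_integer(index) <-
           Enum.find_index(@months, &String.starts_with?(&1, lowered)) do
      Date.new(String.to_integer(year), index + 1, 1)
    else
      _ -> :error
    end
  end
end
```

Under these assumptions `MonthYearSketch.parse("May 1442")` returns `{:ok, ~D[1442-05-01]}` rather than reading "14" as the day.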

Fix failing tests

Looks like I wrote tests that depend on the current time zone and assert on fields that will change. I don't think the code is wrong; I think the tests are wrong.

Epoch subseconds should retain number of digits after the decimal point

When working on #10, I believe I ran into some unexpected behavior with epoch subseconds.

Current Behavior

Currently, the epoch's subsecond does not track the number of leading 0s. This works for tests like:

test_datetime_parsing(
  "9999999999.999999",
  DateTime.from_naive!(~N[2286-11-20T17:46:39.999999], "Etc/UTC")
)

test_datetime_parsing(
  "9999999999.9999999999",
  DateTime.from_naive!(~N[2286-11-20T17:46:39.999999], "Etc/UTC")
)

because they do not have leading 0s.

However, for a test with leading 0s after the . we get unexpected behavior:

test_datetime_parsing(
  "0000000000.0000000001",
  DateTime.from_naive!(~N[1970-01-01T00:00:00.000001], "Etc/UTC")
)


  1) test parse_datetime/1 - epoch parses datetime 0000000000.000001 (DateTimeParserTest)
     test/date_time_parser_test.exs:168
     Assertion with == failed
     code:  assert datetime == expected
     left:  ~U[1970-01-01 00:00:00.1Z]
     right: ~U[1970-01-01 00:00:00.000001Z]
     stacktrace:
       test/date_time_parser_test.exs:168: (test)

Rather than the subsecond being 000001, we get 1.

Why I think this is happening

I believe this is happening because NimbleParsec.integer/2 (which is how subseconds are parsed) is parsing integers and in integers leading 0s are unimportant.

When running the sample test above (where the input is "0000000000.0000000001"), I receive the following tokens:

Tokens: [unix_epoch: 0, unix_epoch_subsecond: 1]

Expected Behavior

I think we should expect the leading 0s to be preserved.

What Ruby does

I know you check against what Ruby does, and here are some Ruby examples:

# ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
# Rails 6.0.0
Time.at(1.0000000001).utc.iso8601(10)
=> "1970-01-01T00:00:01.0000000001Z"

Time.at(1.0000100001).utc.iso8601(10)
=> "1970-01-01T00:00:01.0000100001Z"

Caveats/Unknown

6 digits

It looks like currently subsecond support only does up to 6 digits. For a case like 1.00000001 I think this would mean dropping the 1, but I'm not sure if it would ever make sense to round (e.g., 1.0000009 to 00:00:01.000001).

Ruby things

Ruby does things that are unexpected to me, but I'm not sure if they are done for a reason.

Some examples:

# why does it use multiple 9s instead of the 1?
Time.at(1.000001).utc.iso8601(10)
=> "1970-01-01T00:00:01.0000009999Z"

# why is it using 3 instead of 4?
Time.at(1.000004).utc.iso8601(6)
=> "1970-01-01T00:00:01.000003Z"
# using more digits seems to show more 9s after the 3
Time.at(1.000004).utc.iso8601(10)
=> "1970-01-01T00:00:01.0000039999Z"

Fixing

Some potential ideas of how to solve this:

  • Parse the subsecond as a string in DateTimeParser.Combinators.Epoch.unix_epoch_subsecond/0 and handle converting it to a float in DateTimeParser.Epoch. I think converting could work like:
> subsecond = "000001"
"000001"
> {f, _} = Float.parse("0." <> subsecond)
{1.0e-6, ""}
> 1.0 - f
0.999999
  • Moving to Regular Expressions. This came up when @dbernheisel was helping me work on #10. I think this would result in something similar to the above solution, but with more changes, so it may be preferable to avoid.
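
As a variation on the first idea, the subsecond string could be kept as digits and turned directly into the `{microsecond, precision}` tuple that Elixir's calendar structs use, which avoids floats altogether. This is only a sketch: `SubsecondSketch.to_microsecond/1` is a hypothetical helper, and truncating (not rounding) past six digits is an assumption:

```elixir
defmodule SubsecondSketch do
  # Turn the digits after the decimal point into {microsecond, precision}.
  # Leading zeros are preserved because the digits are padded on the right
  # before being converted to an integer.
  def to_microsecond(digits) when is_binary(digits) do
    kept = String.slice(digits, 0, 6)
    precision = String.length(kept)
    value = kept |> String.pad_trailing(6, "0") |> String.to_integer()
    {value, precision}
  end
end

# Usage: attach the parsed subsecond to a NaiveDateTime.
naive = %{~N[1970-01-01 00:00:00] | microsecond: SubsecondSketch.to_microsecond("000001")}
IO.puts(NaiveDateTime.to_iso8601(naive))
```

Here `"000001"` becomes `{1, 6}` and `"1"` becomes `{100000, 1}`, so the two inputs no longer collapse into the same value.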

Support for languages other than US English

Hi, we're looking for a library to parse dates. date_time_parser is working pretty well for American dates, but we also have data from European languages (e.g. German, French). An example would be 04 août 2020.

A similar library in Python is able to support languages other than US English, however this library doesn't seem to have such capabilities. Is this something you might consider adding? Or maybe this topic could be open for a potential PR (we might not have time to submit it soon though)? Thanks.
