taxjar / date_time_parser
Parse strings into DateTime, NaiveDateTime, Date, or Time https://hexdocs.pm/date_time_parser
License: MIT License
Serial timestamps have another less-common convention where the timestamps start in year 1904.
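For illustration, the two serial conventions differ only in their starting date. The following is a minimal sketch using only Elixir's standard library. The module and function names are hypothetical and not part of date_time_parser, and the 1900-based start of 1899-12-30 assumes the common spreadsheet convention, which holds for serials from March 1900 onward.

```elixir
defmodule SerialSketch do
  # Hypothetical helper for illustration; not part of date_time_parser.
  # Assumption: the 1900 system counts days from 1899-12-30 (spreadsheet
  # convention), and the 1904 system counts days from 1904-01-01.
  @base_1900 ~D[1899-12-30]
  @base_1904 ~D[1904-01-01]

  def to_date(serial, :base_1900) when is_integer(serial), do: Date.add(@base_1900, serial)
  def to_date(serial, :base_1904) when is_integer(serial), do: Date.add(@base_1904, serial)
end

SerialSketch.to_date(43_831, :base_1900)
# => ~D[2020-01-01]
SerialSketch.to_date(42_369, :base_1904)
# => ~D[2020-01-01]
```

The same calendar day is 1462 serial days apart between the two systems, so a parser cannot tell them apart from the number alone and would need an explicit option.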
There's a scenario where a user may not want to enable "epoch" or "serial" time parsing. That way, strings like "31" aren't accepted.
Perhaps an interface could look like:
parsers = [DateTimeParser.Serial, DateTimeParser.Epoch, DateTimeParser.Default]
DateTimeParser.parse("42", parsers: [DateTimeParser.Default])
# => {:error, "Could not parse 42"}
(Using date_time_parser 1.1.1)
Noticed this while parsing some date+time strings received from a real service. In particular the 12:00am case is not handled the way I would expect.
iex(1)> date = "2021-03-27"
"2021-03-27"
iex(2)> DateTimeParser.parse!("#{date} 12:00 am")
~N[2021-03-27 12:00:00]
iex(3)> DateTimeParser.parse!("#{date} 12:00 pm")
~N[2021-03-27 12:00:00]
iex(4)> DateTimeParser.parse!("#{date} 00:00 am")
~N[2021-03-27 00:00:00]
Also noticed that "00:00 pm" is (incorrectly?) parsed as valid:
iex(5)> DateTimeParser.parse!("#{date} 00:00 pm")
~N[2021-03-27 12:00:00]
Doing some more experimentation I also found that some other invalid times are parsed as valid:
iex(7)> DateTimeParser.parse!("#{date} 25:00 am")
~N[2021-03-27 02:05:00]
iex(8)> DateTimeParser.parse!("#{date} 41:00 am")
~N[2021-03-27 04:01:00]
iex(9)> DateTimeParser.parse!("#{date} 9000:00 am")
~N[2021-03-27 09:00:00]
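The behavior I would expect for 12-hour times can be sketched as follows. This is only an illustration of the conversion rule, not the library's code; the module and function names are hypothetical.

```elixir
defmodule MeridiemSketch do
  # Hypothetical helper for illustration; not part of date_time_parser.
  # Converts a 12-hour clock hour plus meridiem into a 24-hour hour and
  # rejects hours outside 1..12 instead of re-tokenizing them.
  def to_24h(12, :am), do: {:ok, 0}
  def to_24h(12, :pm), do: {:ok, 12}
  def to_24h(hour, :am) when hour in 1..11, do: {:ok, hour}
  def to_24h(hour, :pm) when hour in 1..11, do: {:ok, hour + 12}
  def to_24h(hour, meridiem), do: {:error, "invalid 12-hour time: #{hour} #{meridiem}"}
end

MeridiemSketch.to_24h(12, :am)
# => {:ok, 0}
MeridiemSketch.to_24h(12, :pm)
# => {:ok, 12}
MeridiemSketch.to_24h(25, :am)
# => {:error, "invalid 12-hour time: 25 am"}
```

Under this rule "00:00 pm", "25:00 am", "41:00 am", and "9000:00 am" would all be rejected. Whether "00:00 am" should be accepted leniently is a separate design choice.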
A couple updates on this library:
Since time zone support has improved in Elixir's stdlib, we might try using it instead of Timex.
Doing this, however, may require us to write an IANA data file parser so we can perform our own lookups of all the different combinations of names and abbreviations. The current Calendar.TimeZoneDatabase interface doesn't make this available, so we would essentially be rewriting some of what Timex already offers.
However, at the same time, this will allow us to use these during nimble parsec compilation and expand support for all timezone names and abbreviations.
iex(1)> DateTimeParser.parse("2010/01/01")
{:ok, ~D[0201-10-20]}
This should be ~D[2010-01-01], but it looks like the parser is fooled by the slash.
We added GitHub actions in #17, but the starter template is only running one version of Elixir (currently 1.9.1). It'd be really nice to add matrix testing across Elixir versions that we claim to support (back until 1.3.0).
Also, there's a number of things we can cache:
mix tests alias will run dialyzer.
There's a GitHub action for caching and they have an example for Elixir:
https://github.com/actions/cache/blob/master/examples.md#elixir---mix
Currently this only supports US timezones. It would be great to parse more timezones.
https://en.wikipedia.org/wiki/List_of_time_zone_abbreviations
Hey, thanks for this library! One issue we're running into is for dates without the year, we expect them to be parsed as month/day, but they're being parsed as the reverse:
DateTimeParser.parse_date("2/5", parsers: [DateTimeParser.Parser.DateUS], assume_date: true)
{:ok, ~D[2021-05-02]}
I would expect this to return 2021-02-05, which it does if we add the year. It looks like the order of the combinators might just be flipped at the bottom here:
date_time_parser/lib/combinators/date.ex
Lines 235 to 244 in 1fb7590
What do you think?
From Ruby:
Time.at(-1.0001).utc.iso8601(10)
=> "1969-12-31T23:59:58.9999000000Z"
Time.at(-1).utc.iso8601(10)
=> "1969-12-31T23:59:59.0000000000Z"
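For comparison, Elixir's standard library floors negative timestamps the same way when given an integer number of microseconds. A minimal sketch, using only DateTime.from_unix!/2 and no date_time_parser code:

```elixir
# -1.0001 seconds expressed as an integer number of microseconds.
datetime = DateTime.from_unix!(-1_000_100, :microsecond)

# The seconds are floored to 23:59:58 and the remainder becomes a
# positive subsecond, matching the Ruby output above.
datetime.second
# => 58
datetime.microsecond
# => {999900, 6}
```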
Related #42
Examples:
iex> date = "2021-03-27"
"2021-03-27"
iex> DateTimeParser.parse!("#{date} 25:00 am")
~N[2021-03-27 02:05:00]
iex> DateTimeParser.parse!("#{date} 41:00 am")
~N[2021-03-27 04:01:00]
iex> DateTimeParser.parse!("#{date} 9000:00 am")
~N[2021-03-27 09:00:00]
Looks like we need to improve time detection and tokenization.
The Elixir core team and friends have discussed their testing strategy for major libraries they support, and suggested that testing on
should be sufficient. Testing all the combinations in between shouldn't be necessary.
Currently our GitHub Actions are testing on the combinations, so we can safely remove a bunch of cases.
This might be out of scope, but it would be nice to predictably parse strings like Jan 2020, October 1995, May 1442, etc. What happens today is that all of these get interpreted as e.g. October 19 95, so basically %b %d%y instead of %b %Y.
iex(1)> DateTimeParser.parse("Jan 2020")
{:ok, ~D[2020-01-20]}
iex(2)> DateTimeParser.parse("October 1995")
{:ok, ~D[1995-10-19]}
iex(3)> DateTimeParser.parse("May 1442")
{:ok, ~D[2042-05-14]}
I'd expect DateTimeParser.parse("Jan 2020") to produce {:ok, ~D[2020-01-01]}, i.e. always default to the first day of the month.
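The expected %b %Y behavior could look roughly like the sketch below. It is standalone and uses only the standard library. The module name, the regular expression, and the error message are assumptions for illustration and say nothing about how the library's combinators are structured. It only inspects the first three letters of the month name.

```elixir
defmodule MonthYearSketch do
  # Hypothetical helper for illustration; not part of date_time_parser.
  @months ~w(jan feb mar apr may jun jul aug sep oct nov dec)

  # Accepts a month name followed by exactly four year digits, such as
  # "Jan 2020", and defaults the day to the first of the month.
  def parse(string) when is_binary(string) do
    pattern = ~r/^\s*([A-Za-z]{3})[A-Za-z]*\.?\s+(\d{4})\s*$/

    with [_, abbr, year] <- Regex.run(pattern, string),
         index when is_integer(index) <-
           Enum.find_index(@months, &(&1 == String.downcase(abbr))) do
      Date.new(String.to_integer(year), index + 1, 1)
    else
      _ -> {:error, "Could not parse #{string}"}
    end
  end
end

MonthYearSketch.parse("Jan 2020")
# => {:ok, ~D[2020-01-01]}
MonthYearSketch.parse("October 1995")
# => {:ok, ~D[1995-10-01]}
MonthYearSketch.parse("May 1442")
# => {:ok, ~D[1442-05-01]}
```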
Looks like I wrote tests that depend on the current time zone and assert on fields that will change. I don't think the code is wrong; I think the tests are wrong.
When working on #10, I believe I ran into some unexpected behavior with epoch subseconds.
Currently, the epoch's subsecond does not track the number of leading 0s. This works for tests like:
test_datetime_parsing(
"9999999999.999999",
DateTime.from_naive!(~N[2286-11-20T17:46:39.999999], "Etc/UTC")
)
test_datetime_parsing(
"9999999999.9999999999",
DateTime.from_naive!(~N[2286-11-20T17:46:39.999999], "Etc/UTC")
)
because they do not have leading 0s.
However, for a test with leading 0s after the . we get unexpected behavior:
test_datetime_parsing(
"0000000000.0000000001",
DateTime.from_naive!(~N[1970-01-01T00:00:00.000001], "Etc/UTC")
)
1) test parse_datetime/1 - epoch parses datetime 0000000000.000001 (DateTimeParserTest)
test/date_time_parser_test.exs:168
Assertion with == failed
code: assert datetime == expected
left: ~U[1970-01-01 00:00:00.1Z]
right: ~U[1970-01-01 00:00:00.000001Z]
stacktrace:
test/date_time_parser_test.exs:168: (test)
Rather than the subsecond being 000001, we get 1.
I believe this is happening because NimbleParsec.integer/2 (which is how subseconds are parsed) is parsing integers, and in integers leading 0s are unimportant.
When running the sample test above (where the input is "0000000000.0000000001") I receive the following tokens:
Tokens: [unix_epoch: 0, unix_epoch_subsecond: 1]
I think we should expect the leading 0s to be preserved.
I know you check against what Ruby does, and here are some Ruby examples:
# ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
# Rails 6.0.0
Time.at(1.0000000001).utc.iso8601(10)
=> "1970-01-01T00:00:01.0000000001Z"
Time.at(1.0000100001).utc.iso8601(10)
=> "1970-01-01T00:00:01.0000100001Z"
It looks like currently subsecond support only does up to 6 digits. For a case like 1.00000001 I think this would mean dropping the 1, but I'm not sure if it would ever make sense to round (e.g., 1.0000009 to 00:00:01.000001).
Ruby does things that are unexpected to me, but I'm not sure if they are done for a reason.
Some examples:
# why does it use multiple 9s instead of the 1?
Time.at(1.000001).utc.iso8601(10)
=> "1970-01-01T00:00:01.0000009999Z"
# why is it using 3 instead of 4
Time.at(1.000004).utc.iso8601(6)
=> "1970-01-01T00:00:01.000003Z"
# using more digits seems to show more 9s after the 3
Time.at(1.000004).utc.iso8601(10)
=> "1970-01-01T00:00:01.0000039999Z"
Some potential ideas of how to solve this:
DateTimeParser.Combinators.Epoch.unix_epoch_subsecond/0 and handle converting it to a float in DateTimeParser.Epoch. I think converting could work like:
> subsecond = "000001"
"000001"
> {f, _} = Float.parse("0." <> subsecond)
{1.0e-6, ""}
> 1.0 - f
0.999999
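An alternative that avoids floats entirely is to keep the subsecond digits as a string and convert them to microseconds directly. This is a sketch under that assumption. The module and function names are hypothetical, and it truncates rather than rounds beyond 6 digits, which is itself an open question above.

```elixir
defmodule SubsecondSketch do
  # Hypothetical helper for illustration; not part of date_time_parser.
  # Takes the digits after the decimal point as a string so leading zeros
  # survive, then truncates (does not round) to microsecond precision.
  # Returns the {value, precision} shape used by the microsecond field
  # of Elixir's Time, NaiveDateTime, and DateTime structs.
  def to_microsecond(digits) when is_binary(digits) do
    padded =
      digits
      |> String.slice(0, 6)
      |> String.pad_trailing(6, "0")

    {String.to_integer(padded), 6}
  end
end

SubsecondSketch.to_microsecond("000001")
# => {1, 6}
SubsecondSketch.to_microsecond("1")
# => {100000, 6}
SubsecondSketch.to_microsecond("0000000001")
# => {0, 6}
```

Working on the digit string also sidesteps the float representation artifacts seen in the Ruby output, since no binary floating-point value is ever constructed.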
Hi, we're looking for a library to parse dates. date_time_parser is working pretty well for American dates, but we also have data from European languages (e.g. German, French). An example would be 04 août 2020.
A similar library in Python is able to support languages other than US English, however this library doesn't seem to have such capabilities. Is this something you might consider adding? Or maybe this topic could be open for a potential PR (we might not have time to submit it soon though)? Thanks.