wrandelshofer / fastdoubleparser
A Java port of Daniel Lemire's fast_float project
License: MIT License
Thanks for all the hard work on the double and float parsers. Would there be any chance that you could consider adding support for BigDecimal parsing? A lot of the low level parser could be reused.
There are multiple module-info classes in the v0.9.0 jar. In v0.8.0, there was just the versions/9/module-info.class.
In v0.9.0, there are module-info.class files in all the versions dirs.
This is causing FasterXML/jackson-core#1027
Would it be possible to get some background on the v0.9.0 changes, so that I can work out what to do with the jackson-core issues?
Double.parseDouble and FastDoubleParser.parseDouble return different results for the string "0e555":
Double.parseDouble("0e555"): 0.0
FastDoubleParser.parseDouble("0e555"): Infinity
Edit: I believe that is caused by the special case at
, which does not handle the even more special case of the mantissa being 0.
Parsing hexadecimal float literals like 0x8000000000000000p0 yields an incorrect result.
See merge request #62
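For reference, a minimal sketch of what the JDK returns for the two inputs reported above, using only Double.parseDouble as the baseline. The class and method names are illustrative and not part of fastdoubleparser.

```java
final class JdkReferenceValues {
    /** Parses with the JDK parser, used here as the reference implementation. */
    static double reference(String s) {
        return Double.parseDouble(s);
    }

    public static void main(String[] args) {
        // A zero significand stays zero, however large the exponent is.
        System.out.println(reference("0e555"));
        // Hexadecimal literal: 0x8000000000000000 is 2^63, written 0x1p63 in Java.
        System.out.println(reference("0x8000000000000000p0") == 0x1p63);
    }
}
```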
The parser throws StringIndexOutOfBoundsException/ArrayIndexOutOfBoundsException for some inputs.
For example with the following input: "0x".
This issue has been discovered in FasterXML/jackson-core#809
The only exceptions that the parser may throw are:
@wrandelshofer I'm using v0.5.2 and have found that `JavaBigIntegerParser.parseBigInteger(CharSequence str)` accepts hex values like "AAAA" but `new BigInteger(String)` throws a NumberFormatException with "AAAA".
Would it be possible to support being able to disable hex support?
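Until such an option exists, a caller could pre-check the input. The sketch below is only an assumption about how that might look: isDecimalInteger is a hypothetical helper, not a fastdoubleparser API, and it is stricter than BigInteger because it accepts ASCII digits only.

```java
import java.math.BigInteger;

final class DecimalOnlyBigInteger {
    /** Hypothetical guard: an optional sign followed by at least one ASCII decimal digit. */
    static boolean isDecimalInteger(CharSequence s) {
        int n = s.length();
        int i = 0;
        if (n > 0 && (s.charAt(0) == '-' || s.charAt(0) == '+')) {
            i++;
        }
        if (i == n) {
            return false; // empty input or a sign without digits
        }
        for (; i < n; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    /** Reports whether the JDK's radix-10 constructor accepts the input. */
    static boolean jdkAccepts(String s) {
        try {
            new BigInteger(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isDecimalInteger("AAAA")); // rejected by the guard
        System.out.println(jdkAccepts("AAAA"));       // rejected by new BigInteger(String)
        System.out.println(new BigInteger("AAAA", 16)); // accepted only with an explicit radix
    }
}
```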
Originally posted by @pjfanning in #24 (comment)
FastDoubleParser accepts illegal inputs "." and ".e2".
Double.parseDouble() does not accept these values.
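The JDK behaviour can be checked with a small sketch like the one below. It uses only Double.parseDouble; the jdkAccepts helper is an illustrative name.

```java
final class RejectedByJdk {
    /** Reports whether Double.parseDouble accepts the input. */
    static boolean jdkAccepts(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A decimal point needs at least one digit on one side.
        for (String s : new String[] {".", ".e2", ".5", "5."}) {
            System.out.println("\"" + s + "\" accepted: " + jdkAccepts(s));
        }
    }
}
```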
Hi,
I truly think that https://arxiv.org/abs/2101.11408 is a breakthrough in computer science and that the world would benefit from such parser to be used by default in openjdk (as it is for Go).
(I wonder if an even faster parser couldn't be achieved using jsoniter-scala optimization techniques in addition to Lemire's FasterXML/jackson-core#577 ).
For https://github.com/fastfloat/fast_float, we have extensive tests. I have run them through on FastDoubleParser and found many failures, I have collected them in this gist...
https://gist.github.com/lemire/641a34589c36747f6d24ed6d29ac75f0
The algorithm at https://github.com/fastfloat/fast_float handles all of these cases correctly.
You may refer to https://arxiv.org/abs/2101.11408 or to the C# port at https://github.com/CarlVerret/csFastFloat
Is it possible to publish the 0.5.2 release to maven central?
Thanks for the great project!
There is a bug in the method 'tryHexToFloatWithFastAlgorithm'.
I accidentally removed the if-statements that check whether the fast algorithm succeeded, and didn't notice it because I had the corresponding unit tests commented out.
The FastDoubleParser was recently introduced in Jackson through FasterXML/jackson-core#577 and is 3-4 times faster than the version that's implemented in OpenJDK. This is fantastic news, since many numerical processing workloads would benefit from this.
However, the OpenJDK Double/Float parsers support a variety of input formats that the FastDoubleParser will fail on, so it can cause unexpected regressions when used.
For example, the FastDoubleParser will fail with a NumberFormatException on these example patterns (there are more to be found in the OpenJDK Double/Float tests):
1.1e-23f
0x.003p12f
0x1.17742db862a4P-1d
I think apart from the first one in this list, the rest are all hexadecimal if I'm not mistaken.
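As a baseline, here is a short sketch showing that the JDK parser accepts all three patterns, including the trailing f/d type suffixes and the hexadecimal forms. The class name is illustrative.

```java
final class JdkAcceptedFormats {
    public static void main(String[] args) {
        // Decimal literal with a float type suffix.
        System.out.println(Double.parseDouble("1.1e-23f"));
        // Hexadecimal significand with a binary exponent: 0x.003 * 2^12 = 3.0
        System.out.println(Double.parseDouble("0x.003p12f"));
        // Hexadecimal literal with an upper-case P and a double type suffix.
        System.out.println(Double.parseDouble("0x1.17742db862a4P-1d"));
    }
}
```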
https://github.com/fastfloat/fast_float
is it on your roadmap ? :)
The same trick as in a3c6df6.
Use 0x76 instead of 0x46 byte for detection of invalid digits in "numbers" like 1X345678.
I have found another input string for which the return values of Double.parseDouble and FastDoubleParser.parseDouble differ. This one is less important than #6 though as it implies only a very minor loss in precision:
Double.parseDouble("-2.2222222222223e-322"): -2.2E-322
FastDoubleParser.parseDouble("-2.2222222222223e-322"): 0.0
Both this issue and #6 have been found with the open-source JVM fuzzer Jazzer. If you are interested in these kinds of findings, I could add the fuzzer to the project as a PR.
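The input lies in the subnormal range, where every double is an integer multiple of Double.MIN_VALUE. A small sketch of the JDK result, using only the standard library (the class name is illustrative):

```java
final class SubnormalReference {
    public static void main(String[] args) {
        double parsed = Double.parseDouble("-2.2222222222223e-322");
        // Subnormal doubles are integer multiples of Double.MIN_VALUE (about 4.9e-324).
        // The input is closest to the 45th multiple, so it must not collapse to zero.
        System.out.println(parsed == -45 * Double.MIN_VALUE);
        System.out.println(parsed != 0.0);
    }
}
```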
Could you release a JDK 8 compatible jar to make the library available for broader use cases?
See description of pull request #48.
I'm upgrading jackson in Apache JMeter, and I found the new jackson version depends on fastdoubleparser.
It turns out fastdoubleparser does not ship with the license, so it is problematic for the consumers.
See apache/jmeter#5831, and the build failure: https://github.com/apache/jmeter/actions/runs/4823397202/jobs/8592678119?pr=5831#step:4:1857
I have created a lot of similar requests, and almost all of them got fixed eventually, see Dependency with "manual" license configuration in apache/jmeter#469
Copyright (c) 2021 Werner Randelshofer, Switzerland is a part of the license, and the license text requires that "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software".
It is hard for consumers to comply with the requirement above, especially if fastdoubleparser.jar does not include the license text.
The pom file for fastdoubleparser refers to a different license. See https://repo1.maven.org/maven2/ch/randelshofer/fastdoubleparser/0.8.0/fastdoubleparser-0.8.0.pom
The URL there is http://www.opensource.org/licenses/mit-license.php, which does not mention Werner Randelshofer.
fastdoubleparser.jar lacks a reference to the license. There are cases when fastdoubleparser.jar appears without the corresponding pom.xml, so if you consider fastdoubleparser.jar alone, it is hard to tell what the license for that artifact is.
You might want to consider switching to Apache-2.0 license. It has several advantages for the consumers:
[...] NOTICE file. In general, it becomes easier to review, since every MIT license is different while every Apache-2.0 is the same.
[...] Grant of Patent License while MIT does not mention patents
[...] pom.xml and MANIFEST.MF
If you absolutely like MIT, you might go with MIT or Apache-2.0, however, I'm not sure if you want that complication (as it would be impossible to express in pom.xml)
[...] META-INF/LICENSE, META-INF/NOTICE, etc. It would enable consumers to get up-to-date licenses when they depend on fastdoubleparser.
[...] pom.xml to point to the proper license text (e.g. a permalink to GitHub). The current link http://www.opensource.org/licenses/mit-license.php is invalid as it points to a wrong license text.
[...] Bundle-License: Apache-2.0 (or Bundle-License: MIT; link=...) manifest entry (where Apache-2.0 is SPDX identifier, see https://osgi.org/specification/osgi.core/7.0.0/framework.module.html#framework.module-bundle-license )
Can you document how one would run the benchmarks and how one would use the code as an external library?
In this section of FastFloatMath, it checks the significand against a 53-bit number (as if it were testing to see if it is an exactly representable double), but then casts to float, despite the comments repeatedly referring to the code as using doubles. I think the cast to float should probably be a cast to double (and d should be a double), but I'm not familiar with the code.
"We use your java8 code in jackson-core. If you publish a jar with your java8 branch code that would be great - we would change our build to use your published jars and that shades the class packages to include them in jackson-core jar.
One solution would be to append '-java8' to the artifact name (and '-java17' for the java17 jar). Or maven supports 'classifiers' which basically lead to a similar result."
Originally posted by @pjfanning in #22 (comment)
It appears that wrandelshofer/FastDoubleParser might be tied to https://github.com/lemire/fast_double_parser which is based on RFC 7159 (JSON standard). This means that strings such as 9007199254740992.e-256 which are not valid in JSON will not parse.
I really recommend you follow more closely the approach in https://github.com/fastfloat/fast_float if you mean to solve the general float parsing problem.
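For comparison, the JDK grammar does allow a decimal point with no fraction digits before the exponent. A minimal sketch using only Double.parseDouble (the class name is illustrative):

```java
final class TrailingPointBeforeExponent {
    public static void main(String[] args) {
        // Not valid JSON (RFC 7159 requires digits after the point),
        // but a valid floating-point string for the JDK.
        double parsed = Double.parseDouble("9007199254740992.e-256");
        System.out.println(parsed == 9007199254740992e-256);
    }
}
```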
The test can be done with javac and java directly, but it does NOT work as expected with maven.
After mvn clean package, the command below raises an error "Error: Could not find or load main class ch.randelshofer.fastdoubleparserdemo.Main in module ch.randelshofer.fastdoubleparserdemo":
java -XX:CompileCommand=inline,java/lang/String.charAt -p fastdoubleparser/target:fastdoubleparserdemo/target -m ch.randelshofer.fastdoubleparserdemo/ch.randelshofer.fastdoubleparserdemo.Main --markdown
I checked the jars inside fastdoubleparser/target and fastdoubleparserdemo/target, and found that they contain nothing but a META-INF folder!
jar xvf fastdoubleparser-0.7.0.jar
created: META-INF/
inflated: META-INF/MANIFEST.MF
So, the maven command cannot produce a correct jar, and I think it is caused by an incorrect Maven project structure and POM configuration. BTW, I think the current multi-release jar here is a little overkill.
See description in FasterXML/jackson-core#1161
See proposed fix in FasterXML/jackson-core#1162
Unfortunately the fix is incomplete. We need to replace all calls to parseDigitsRecursive() with last argument null by parseDigitsIterative().
JavaDoubleParser seems to be slower than Double.parseDouble for very large numbers (thousands of digits).
Malicious actors often create input files with large numbers to try to cause denial of service issues.
I have a jmh benchmark at https://github.com/pjfanning/jackson-number-parse-bench
./gradlew jmh
It's worth checking the build.gradle file as I have a param that controls which benchmark to run.
jmh {
includes = ['org.example.jackson.bench.DoubleParserBench']
}
I'm wondering if it would be possible to disregard the least significant digits. If there are 1000 digits, only the first 30 or 40 digits should really impact the double value - even if you were conservative and limited it to 100 or 200, this would limit the risk vector.
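A rough sketch of that idea, under strong assumptions: the input is a plain run of ASCII digits (no sign, decimal point or exponent), and truncate is a hypothetical helper, not an existing API. Dropping digits outright can change the result by one unit in the last place for inputs that sit very close to a rounding boundary, so a real implementation would also need to record whether any dropped digit was non-zero.

```java
final class LeadingDigits {
    /**
     * Hypothetical helper: keeps the first maxDigits digits of a plain digit
     * string and replaces the dropped digits with a decimal exponent.
     * Assumes ASCII digits only (no sign, decimal point or exponent).
     */
    static String truncate(String digits, int maxDigits) {
        if (digits.length() <= maxDigits) {
            return digits;
        }
        int dropped = digits.length() - maxDigits;
        return digits.substring(0, maxDigits) + "e" + dropped;
    }

    public static void main(String[] args) {
        // Build "1" followed by 299 zeros, i.e. the value 1e299 written with 300 digits.
        StringBuilder sb = new StringBuilder("1");
        for (int i = 0; i < 299; i++) {
            sb.append('0');
        }
        String full = sb.toString();
        String shortened = truncate(full, 100);
        // The shortened form parses to the same double as the full 300-digit input.
        System.out.println(Double.parseDouble(shortened) == Double.parseDouble(full));
    }
}
```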
Building a specific revision should be reproducible.
Currently, the multi-release jar created by the build contains the timestamps of the compiled class files. And therefore each time the multi-release jar is built, it has different content.
See
https://maven.apache.org/guides/mini/guide-reproducible-builds.html
Jackson still supports Java 8 but fastdoubleparser has at least some classes that have class file major version 66 - might be java 22
Jackson built fine with fastdoubleparser 0.9.0.
This could be a shortcoming of maven plugins - that don't know about Java 22. In fairness, Java 22 is only early access and many build tools really struggle to keep up.
Error: Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:3.5.1:shade (shade-jackson-core) on project jackson-core: Error creating shaded jar: Problem shading JAR /home/runner/.m2/repository/ch/randelshofer/fastdoubleparser/1.0.0/fastdoubleparser-1.0.0.jar entry META-INF/versions/22/ch/randelshofer/fastdoubleparser/FastDoubleSwar.class: java.lang.IllegalArgumentException: Unsupported class file major version 66
Edit: This seems to be a shortcoming of maven-shade-plugin but I think I have managed to work around it by excluding the java 22 classes that are in META-INF/versions/22/ch/randelshofer/fastdoubleparser
Looks like artifacts are being signed with this key:
https://keyserver.ubuntu.com/pks/lookup?search=6ead752b3e2b38e8e2236d7ba9321edaa5cb3202&fingerprint=on&op=index
If that is the correct key can you add a section to the readme confirming that is the key that is expected to be used for code signing on the artifacts released from this repo? Thanks. :)
Examples of other libs that provide docs for the code signing key used are here:
why allocate here?? by that time you know it's not a NaN for sure... so instead of returning null you can just return Double.NaN or whatever special constant.
lack of jmh tests is also troubling :(
Hi - thanks for all the great work on the double parser. I've been experimenting with it for possible inclusion in jackson-core.
Parsing floats using the double parser is also much faster than using Float.parseFloat but unfortunately casting doubles to floats can often give you a different result from plain Float.parseFloat.
Would it be possible to consider also supporting a dedicated float parser?
An example is 7.006492321624086e-46 which Float.parseFloat returns as 1.4E-45 but using FastDoubleParser:
double dbl = FastDoubleParser.parseDouble("7.006492321624086e-46");
System.out.println("double=" + dbl); //7.006492321624085E-46
System.out.println("float=" + (float)dbl); //0.0
JDK 21 now includes a faster conversion routine from BigDecimal to double.
We can now implement a performant slow path for double values with very few lines of code.
openjdk/jdk#9410
https://bugs.openjdk.org/browse/JDK-8205592