aclex / floaxie
Floating point printing and parsing library based on Grisu2 and Krosh algorithms
License: Apache License 2.0
Hi,
I was recently testing your library and noticed that some numbers fail to round-trip when printed using floaxie::ftoa. As it turned out, the computation of shift_amount in diy_fp::boundaries is incorrect.
Actually, shift_amount currently always equals 1, and the resulting rounding interval is therefore too large if the input is a power of 2.
The nth_bit test in floaxie/include/floaxie/diy_fp.h (line 445 in 1a4243d) could be replaced with something like:
constexpr int Prec = std::numeric_limits<FloatType>::digits;
constexpr int MinExp = std::numeric_limits<FloatType>::min_exponent - 1 - (Prec - 1);
constexpr mantissa_storage_type HiddenBit = static_cast<mantissa_storage_type>(1) << (Prec - 1);
const bool lower_boundary_is_closer = (mi.m_f == HiddenBit && mi.m_e > MinExp);
const unsigned char shift_amount = 1 + lower_boundary_is_closer;
though it turns out that
const unsigned char shift_amount = 1 + (mi.m_f == HiddenBit);
is sufficient for the current implementation.
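To illustrate why a power of 2 needs the extra shift: just below a power of 2 the spacing between adjacent floats halves, so the lower neighbour is only half as far away as the upper one. The sketch below checks that property with std::nextafter. The helper name lower_boundary_is_closer is borrowed from the snippet above for readability, but this standalone function is an assumption for illustration and not floaxie code.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Illustrative helper (not part of floaxie): returns true when the gap to the
// next smaller float is exactly half the gap to the next larger float.
// That is the situation in which the lower rounding boundary is closer.
inline bool lower_boundary_is_closer(float v)
{
    assert(std::isfinite(v) && v > 0.0f);
    const float below = std::nextafter(v, 0.0f);
    const float above = std::nextafter(v, std::numeric_limits<float>::infinity());
    // Differences of adjacent floats are exactly representable,
    // so the comparison below involves no rounding error.
    return (v - below) * 2.0f == (above - v);
}
```

With this helper, 1.0f and 2^56 report a closer lower boundary, while 1.5f does not. The smallest normal float does not either, because its lower neighbour is a subnormal with the same spacing, which matches the mi.m_e > MinExp condition above.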
Here are some test cases which currently fail to round-trip.
(The hexadecimal numbers are the IEEE representations of the floating-point numbers.)
Some incorrect single-precision numbers:
9.8607613e-32f == 2^-103 == 0x0C000000
6.3108872e-30f == 2^-97 == 0x0F000000
8.4703295e-22f == 2^-70 == 0x1C800000
8.6736174e-19f == 2^-60 == 0x21800000
7.1054274e-15f == 2^-47 == 0x28000000
7.2057594e+16f == 2^56 == 0x5B800000
3.7778932e+22f == 2^75 == 0x65000000
7.5557864e+22f == 2^76 == 0x65800000
4.8357033e+24f == 2^82 == 0x68800000
7.7371252e+25f == 2^86 == 0x6A800000
1.5474251e+26f == 2^87 == 0x6B000000
3.0948501e+26f == 2^88 == 0x6B800000
6.1897002e+26f == 2^89 == 0x6C000000
1.2379401e+27f == 2^90 == 0x6C800000
2.4758801e+27f == 2^91 == 0x6D000000
4.9517602e+27f == 2^92 == 0x6D800000
9.9035203e+27f == 2^93 == 0x6E000000
1.9807041e+28f == 2^94 == 0x6E800000
3.9614081e+28f == 2^95 == 0x6F000000
7.9228163e+28f == 2^96 == 0x6F800000
Some incorrect double-precision numbers:
1.7800590868057611e-307 == 2^-1019 == 0x0040000000000000
2.0522684006491881e-289 == 2^-959 == 0x0400000000000000
3.9696644133184383e-264 == 2^-875 == 0x0940000000000000
2.9290953396399042e-244 == 2^-809 == 0x0D60000000000000
2.5160737381238802e-234 == 2^-776 == 0x0F70000000000000
5.5329046628180653e-222 == 2^-735 == 0x1200000000000000
4.5965573598916705e-187 == 2^-619 == 0x1940000000000000
2.8451311993408992e-160 == 2^-530 == 0x1ED0000000000000
5.0052077379577523e-147 == 2^-486 == 0x2190000000000000
4.9569176510071274e-119 == 2^-393 == 0x2760000000000000
4.6816763546921983e-97 == 2^-320 == 0x2BF0000000000000
5.0978941156238473e-57 == 2^-187 == 0x3440000000000000
3.2311742677852644e-27 == 2^-88 == 0x3A70000000000000
3.8685626227668134e+25 == 2^85 == 0x4540000000000000
4.9039857307708443e+55 == 2^185 == 0x4B80000000000000
2.6074060497081422e+92 == 2^307 == 0x5320000000000000
4.8098152095208105e+111 == 2^371 == 0x5720000000000000
4.7634102635436893e+139 == 2^464 == 0x5CF0000000000000
4.4989137945431964e+161 == 2^537 == 0x6180000000000000
4.8988833106573424e+201 == 2^670 == 0x69D0000000000000
8.139666055761541e+236 == 2^787 == 0x7120000000000000
1.3207363278391631e+269 == 2^894 == 0x77D0000000000000
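For anyone who wants to reproduce the single-precision cases, here is a minimal sketch of a round-trip check driven by the bit patterns listed above. The names float_from_bits, float_printer, reference_printer and round_trips are made up for this example. reference_printer uses snprintf with 9 significant digits as a stand-in that is expected to round-trip any binary32 value, and parsing goes through std::strtof rather than the library's own parser.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Rebuild a float from its IEEE-754 bit pattern (e.g. 0x0C000000 for 2^-103).
inline float float_from_bits(std::uint32_t bits)
{
    float value;
    static_assert(sizeof value == sizeof bits, "float is expected to be 32 bits wide");
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Hypothetical printer signature: writes a null-terminated string into buffer.
using float_printer = void (*)(float value, char* buffer, std::size_t size);

// Stand-in printer for this sketch: 9 significant digits via snprintf.
inline void reference_printer(float value, char* buffer, std::size_t size)
{
    std::snprintf(buffer, size, "%.9g", static_cast<double>(value));
}

// Print the value, parse the text back with strtof, and compare bit patterns.
inline bool round_trips(std::uint32_t bits, float_printer print)
{
    char buffer[64];
    print(float_from_bits(bits), buffer, sizeof buffer);
    const float parsed = std::strtof(buffer, nullptr);
    std::uint32_t parsed_bits;
    std::memcpy(&parsed_bits, &parsed, sizeof parsed_bits);
    return parsed_bits == bits;
}
```

A small wrapper around floaxie::ftoa with the same signature could be passed in place of reference_printer to check the library's output for each listed pattern, for example round_trips(0x0C000000u, my_wrapper).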
Hi
I tried to use your library on MS Windows in my Visual Studio project. I copied the files over into my project. I got some compiler warnings; I'm not sure whether they are serious.
However, when running a simple conversion I get an infinite loop here:
while (!(m_f & hidden_bit<original_matissa_bit_width>()))
{
    m_f <<= 1;
    m_e--;
}
It's line 126 in diy_fp.h.
The code I am trying to run is:
char buffer[128];
double num = 55.6666;
floaxie::ftoa(num, buffer);
Has your code ever been run on Windows at all?
Thanks, Peter
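The report does not say why the loop spins, so the following is only a sketch of how a normalization loop of this shape can fail to terminate and how it could be guarded. The loop never exits if no set bit of the significand can reach the hidden-bit position, for example when the significand is zero. One possible way to end up there on Windows is a mask built from a 32-bit unsigned long, since that type is 32 bits wide under MSVC; whether that is what happens in diy_fp.h is an assumption. The type diy_value and the function normalize are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a "do it yourself" floating-point value.
struct diy_value
{
    std::uint64_t f; // significand
    int e;           // binary exponent
};

// Shift the significand left until bit hidden_bit_index is set, adjusting the
// exponent to keep the value unchanged. Returns false instead of looping
// forever when no set bit can ever reach the hidden-bit position.
inline bool normalize(diy_value& v, unsigned hidden_bit_index)
{
    assert(hidden_bit_index < 64);
    // Build the mask from a 64-bit constant so it does not depend on the
    // width of unsigned long on the target platform.
    const std::uint64_t hidden_bit = std::uint64_t{1} << hidden_bit_index;
    // Bits at or below the hidden-bit position are the only ones that a
    // left shift can move onto it.
    const std::uint64_t reachable = hidden_bit | (hidden_bit - 1);
    if ((v.f & reachable) == 0)
        return false;
    while (!(v.f & hidden_bit))
    {
        v.f <<= 1;
        v.e--;
    }
    return true;
}
```

For a double, hidden_bit_index would be 52. Calling normalize on a value with f == 0 returns false immediately rather than hanging, which at least turns the hang into a diagnosable failure.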
floaxie/include/floaxie/ftoa.h
Line 124 in 05c6259
For zero, the null character is written at buf[3].
If we provide an uninitialized buffer, buf[1] and buf[2] contain garbage, and printf/cout do not work well when we don't use the returned string length (e.g. when directly constructing a std::string from the char[] buffer).
Possible fix would be
buffer[0] = '0';
buffer[1] = '\0';
?
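As a minimal sketch of the suggested fix in isolation: the helper below writes the terminator directly after the single digit and returns the length, so the buffer is a valid C string even if it was never initialized. The names write_zero and zero_as_string are made up for this example and are not part of floaxie.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: writes "0" followed by the terminator and returns the
// string length, not counting the terminator.
inline std::size_t write_zero(char* buffer)
{
    buffer[0] = '0';
    buffer[1] = '\0';
    return 1;
}

// Usage that ignores the returned length, as described in the report.
inline std::string zero_as_string()
{
    char buffer[8]; // deliberately left uninitialized
    write_zero(buffer);
    return std::string(buffer); // relies only on the terminator
}
```

With the terminator at buffer[1], zero_as_string() yields "0" no matter what the rest of the buffer holds.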