floaxie's Issues

Incorrect boundary computation

Hi,

I was recently testing your library and noticed that some numbers fail to round-trip when printed using floaxie::ftoa. As it turned out, the computation of shift_amount in diy_fp::boundaries is incorrect.

Actually, shift_amount currently always equals 1 and the resulting rounding interval is therefore too large if the input is a power-of-2.

The nth_bit test in

const unsigned char shift_amount(1 + nth_bit(mi.m_f, std::numeric_limits<FloatType>::digits));
should be replaced by something like

constexpr int Prec
    = std::numeric_limits<FloatType>::digits;
constexpr int MinExp
    = std::numeric_limits<FloatType>::min_exponent - 1 - (Prec - 1);
constexpr mantissa_storage_type HiddenBit
    = static_cast<mantissa_storage_type>(1) << (Prec - 1);

const bool lower_boundary_is_closer
    = (mi.m_f == HiddenBit && mi.m_e > MinExp);
const unsigned char shift_amount
    = 1 + lower_boundary_is_closer;

though it turns out that

const unsigned char shift_amount = 1 + (mi.m_f == HiddenBit);

is sufficient for the current implementation.
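The asymmetry behind this fix can be checked with the standard library alone. The following is a minimal sketch, not floaxie code: gap_above, gap_below and lower_neighbour_is_closer are hypothetical helper names, and it assumes float is an IEEE 754 binary32 type with subnormals enabled.

```cpp
#include <cmath>
#include <limits>

// Distance from x to the next representable float above it.
inline float gap_above(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Distance from x to the next representable float below it.
inline float gap_below(float x) {
    return x - std::nextafter(x, -std::numeric_limits<float>::infinity());
}

// True when the lower neighbour is closer than the upper one, i.e. when
// the rounding interval around x is asymmetric. This is expected for
// powers of two above the smallest normal number, which is the case the
// proposed lower_boundary_is_closer flag is meant to capture.
inline bool lower_neighbour_is_closer(float x) {
    return gap_below(x) < gap_above(x);
}
```

For 1.0f the gap below is half the gap above, so the lower boundary has to be placed a quarter of an ulp away instead of half an ulp. For 1.5f, and for the smallest normal float, both gaps are equal.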


Here are some test cases which currently fail to round-trip.
(The hexadecimal numbers are the IEEE representations of the floating-point numbers.)

Some incorrect single-precision numbers:

9.8607613e-32f == 2^-103 == 0x0C000000
6.3108872e-30f == 2^-97  == 0x0F000000
8.4703295e-22f == 2^-70  == 0x1C800000
8.6736174e-19f == 2^-60  == 0x21800000
7.1054274e-15f == 2^-47  == 0x28000000
7.2057594e+16f == 2^56   == 0x5B800000
3.7778932e+22f == 2^75   == 0x65000000
7.5557864e+22f == 2^76   == 0x65800000
4.8357033e+24f == 2^82   == 0x68800000
7.7371252e+25f == 2^86   == 0x6A800000
1.5474251e+26f == 2^87   == 0x6B000000
3.0948501e+26f == 2^88   == 0x6B800000
6.1897002e+26f == 2^89   == 0x6C000000
1.2379401e+27f == 2^90   == 0x6C800000
2.4758801e+27f == 2^91   == 0x6D000000
4.9517602e+27f == 2^92   == 0x6D800000
9.9035203e+27f == 2^93   == 0x6E000000
1.9807041e+28f == 2^94   == 0x6E800000
3.9614081e+28f == 2^95   == 0x6F000000
7.9228163e+28f == 2^96   == 0x6F800000

Some incorrect double-precision numbers:

1.7800590868057611e-307 == 2^-1019 == 0x0040000000000000
2.0522684006491881e-289 == 2^-959  == 0x0400000000000000
3.9696644133184383e-264 == 2^-875  == 0x0940000000000000
2.9290953396399042e-244 == 2^-809  == 0x0D60000000000000
2.5160737381238802e-234 == 2^-776  == 0x0F70000000000000
5.5329046628180653e-222 == 2^-735  == 0x1200000000000000
4.5965573598916705e-187 == 2^-619  == 0x1940000000000000
2.8451311993408992e-160 == 2^-530  == 0x1ED0000000000000
5.0052077379577523e-147 == 2^-486  == 0x2190000000000000
4.9569176510071274e-119 == 2^-393  == 0x2760000000000000
4.6816763546921983e-97  == 2^-320  == 0x2BF0000000000000
5.0978941156238473e-57  == 2^-187  == 0x3440000000000000
3.2311742677852644e-27  == 2^-88   == 0x3A70000000000000
3.8685626227668134e+25  == 2^85    == 0x4540000000000000
4.9039857307708443e+55  == 2^185   == 0x4B80000000000000
2.6074060497081422e+92  == 2^307   == 0x5320000000000000
4.8098152095208105e+111 == 2^371   == 0x5720000000000000
4.7634102635436893e+139 == 2^464   == 0x5CF0000000000000
4.4989137945431964e+161 == 2^537   == 0x6180000000000000
4.8988833106573424e+201 == 2^670   == 0x69D0000000000000
8.139666055761541e+236  == 2^787   == 0x7120000000000000
1.3207363278391631e+269 == 2^894   == 0x77D0000000000000
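The hexadecimal patterns in the lists above can be turned back into values and checked mechanically. The sketch below is an illustration under assumptions: the helper names are hypothetical, float and double are assumed to be IEEE 754 binary32 and binary64, and the round-trip check uses snprintf and strtof from the standard library as a stand-in. To test floaxie itself, the snprintf call would be replaced by floaxie::ftoa(value, buffer).

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Reinterpret a 32-bit pattern as a float (assumes binary32).
inline float float_from_bits(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Reinterpret a 64-bit pattern as a double (assumes binary64).
inline double double_from_bits(std::uint64_t bits) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "double must be 64 bits");
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Check that a pattern encodes exactly 2^exponent.
inline bool float_bits_are_pow2(std::uint32_t bits, int exponent) {
    return float_from_bits(bits) == std::ldexp(1.0f, exponent);
}

inline bool double_bits_are_pow2(std::uint64_t bits, int exponent) {
    return double_from_bits(bits) == std::ldexp(1.0, exponent);
}

// Print with 9 significant digits (enough for any binary32 value) and
// parse the text back; a correct printer must reproduce the input exactly.
inline bool round_trips_via_stdlib(float value) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.9g", value);
    return std::strtof(buffer, nullptr) == value;
}
```

For example, float_bits_are_pow2(0x0C000000u, -103) and double_bits_are_pow2(0x4540000000000000ull, 85) correspond to the first single-precision entry and the 2^85 double-precision entry.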

Doesn't work on Windows / Visual Studio

Hi

I tried to use your library on Windows in my Visual Studio project. I copied the files over into my project. I got some compiler warnings, but I'm not sure whether they are serious.

However, when running a simple conversion I get an infinite loop here:

        while (!(m_f & hidden_bit<original_matissa_bit_width>()))
        {
            m_f <<= 1;
            m_e--;
        }

It's line 126 in diy_fp.h.

The code I am trying to run is:

    char buffer[128];
    double num = 55.6666;
    floaxie::ftoa(num, buffer);

Has your code ever been run on Windows at all?

Thanks,
Peter
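
A loop of the shape quoted above cannot terminate if the mantissa is zero or if the hidden-bit mask never matches a bit that the shifts can reach. The issue does not say which condition occurred. One possibility, stated here purely as an assumption, is a mask computed in a type narrower than 64 bits, since unsigned long is 32 bits wide under Visual Studio. The sketch below is not floaxie code; hidden_bit_mask and normalize are hypothetical names showing a defensive variant of the same loop.

```cpp
#include <cstdint>

// Mask with only bit (width - 1) set, built explicitly in a 64-bit type
// so that the shift is well defined for every width from 1 to 64.
constexpr std::uint64_t hidden_bit_mask(unsigned width) {
    return std::uint64_t{1} << (width - 1);
}

// Shift f left until its hidden bit is set, decrementing e each time.
// Returns false, leaving f and e untouched, for inputs on which the
// plain loop would spin forever.
inline bool normalize(std::uint64_t& f, int& e, unsigned width) {
    if (width == 0 || width > 64) {
        return false;  // no valid hidden-bit position
    }
    const std::uint64_t mask = hidden_bit_mask(width);
    if (f == 0 || f > (mask | (mask - 1))) {
        return false;  // zero, or bits above the hidden bit
    }
    while (!(f & mask)) {
        f <<= 1;
        --e;
    }
    return true;
}
```

With f = 1, e = 0 and width = 53, the call returns true and leaves f equal to 2^52 and e equal to -52. With f = 0 it returns false instead of hanging.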

`0` is not printed correctly.

buffer[3] = '\0';

For zero, the null character is written at buffer[3].
If we pass an uninitialized buffer, buffer[1] and buffer[2] contain garbage, so printf/cout misbehave when we ignore the returned string length (e.g. when constructing a std::string directly from the char[] buffer).

A possible fix would be

buffer[0] = '0';
buffer[1] = '\0';

?
